A MORPHOLOGICALLY-AWARE TOKENIZER FOR THE UZBEK LANGUAGE
This paper proposes a morphologically-aware tokenizer that uses the Apertium-uzbek morphological analyzer to split words. Results on Sentiment Analysis, Text Classification, and NER experiments show that the proposed tokenizer outperforms the baseline approach, reflecting...

Actual problems in modern technical sciences / 2025 / February

Kuriyozov Elmurod Radjabboy Ugli

Volume 9 | Issue 2

3 February 2025, 21:50

TRANSFORMER-BASED NLP SOLUTIONS FOR THE UZBEK LANGUAGE
This work presents BERTbek, one of the first monolingual transformer-based language models specifically for the Uzbek language. BERTbek is trained on a morphologically-sensitive tokenizer, utilising various text sources and training corpus sizes to evaluate the performance from different...

Actual problems in modern technical sciences / 2025 / February

Kuriyozov Elmurod Radjabboy Ugli

Volume 9 | Issue 2

3 February 2025, 21:50