DICTIONARY-BASED TOKENIZATION ALGORITHM FOR UZBEK TEXTS
Tokenization is the process of dividing a text into smaller parts, called tokens. Tokens can be words, punctuation marks, numbers, or other meaningful elements. Tokenization is primarily used in natural language processing (NLP) and is an essential first step for analyzing, understanding, or...

Actual problems in modern technical sciences / 2025 / November

Abdusobir Saidov, Maksud Sharipov, Ogabek Sobirov

Volume 9 | Issue 11

3 November 2025, 22:00

DESIGN AND IMPLEMENTATION OF A MODEL AND ALGORITHM FOR PART-OF-SPEECH TAGGING IN UZBEK TEXTS USING THE CONDITIONAL RANDOM FIELDS (CRFs) APPROACH
This paper presents a part-of-speech tagging system for the Uzbek language based on the Conditional Random Fields (CRF) approach. Using a manually annotated corpus and a set of language-specific morphological and contextual features, the model was trained and evaluated through 5-fold...

Actual problems in modern technical sciences / 2025 / July

Sharipov Maksud

Volume 9 | Issue 7

CONTEXT-FREE GRAMMAR BASED PARSING ALGORITHM FOR SIMPLE SENTENCES IN THE UZBEK LANGUAGE
Analyzing simple sentences in the Uzbek language is of great importance within the field of Natural Language Processing (NLP). This paper proposes a solution to the problem of developing a parsing algorithm for simple Uzbek sentences based on Context-Free Grammar (CFG). The syntactic rules of Uzbek...

Modern problems of Philology and Linguistics / 2025 / March

Maqsud Siddiqovich Sharipov, Ixtiyor Davlatyorovich Avezmatov

Volume 9 | Issue 3

PUNCTUATION ANALYSIS OF UZBEK TEXTS BASED ON THE N-GRAM MODEL
In this work, we consider the development of an algorithm for punctuation analysis of Uzbek texts. Because Uzbek is considered a low-resource language, no such algorithm has been developed so far. An algorithm was developed using the N-gram model for punctuation...

Modern problems of Philology and Linguistics / 2025 / February

Hushnudbek Saylboyevich Adinaev

Volume 9 | Issue 2

3 February 2025, 21:50

CONTRASTIVE STUDY OF LINGUISTIC ASPECTS OF THE UZBEK AND KOREAN LANGUAGES
This article examines the grammatical and linguistic features of the Uzbek and Korean languages from a comparative-analytical perspective. Uzbek is an agglutinative language, in which suffixes play a major role in word formation. The Korean language is also an...

Modern problems of Philology and Linguistics / 2025 / February

Musaev Farkhod

Volume 9 | Issue 2

3 February 2025, 21:50

A MORPHOLOGICALLY-AWARE TOKENIZER FOR THE UZBEK LANGUAGE
This paper proposes a morphologically-aware tokenizer that leverages the Apertium-uzbek morphological analyzer to enable word splitting. Results on Sentiment Analysis, Text Classification, and NER experiments prove the superiority of the suggested tokenizer over the baseline approach, reflecting...

Actual problems in modern technical sciences / 2025 / February

Kuriyozov Elmurod Radjabboy Ugli

Volume 9 | Issue 2

3 February 2025, 21:50

TRANSFORMER-BASED NLP SOLUTIONS FOR THE UZBEK LANGUAGE
This work presents BERTbek, one of the first monolingual transformer-based language models specifically for the Uzbek language. BERTbek is trained on a morphologically-sensitive tokenizer, utilising various text sources and training corpus sizes to evaluate the performance from different...

Actual problems in modern technical sciences / 2025 / February

Kuriyozov Elmurod Radjabboy Ugli

Volume 9 | Issue 2

3 February 2025, 21:50

EVALUATION OF SENTIMENT ANALYSIS DATASET USING MACHINE LEARNING AND DEEP LEARNING MODELS
This paper presents the results obtained from the experiments. To create baseline models for Uzbek sentiment analysis, various classifiers from different families were chosen, including different Support Vector Machine (SVM) methods and recent Deep Learning...

Actual problems of Mathematics, Physics and Mechanics / 2024 / August

Matlatipov Sanatbek Gayratovich

Volume 8 | Issue 8

3 August 2024, 21:50

COMPARATIVE ANALYSIS OF NEURAL NETWORK MODELS FOR UZBEK TEXT CLASSIFICATION
This study compares a Machine Learning model and a Neural Network model on the problem of Uzbek text classification. The dataset, extracted from a text classification dataset project for the Uzbek language, is organized into 11 classes. The Machine Learning model...

Modern problems of Philology and Linguistics / 2023 / September

Salaev Ulugbek

Volume 7 | Issue 7

15 September 2023, 21:50

AUTOMATIC PART-OF-SPEECH ANNOTATION TOOL FOR UZBEK LANGUAGE
This research paper introduces a Part-of-Speech tagging model for the Uzbek language that uses 16 selected tags. The proposed methodology includes a morphological analysis library for inflected words, which accounts for the highly agglutinative character of the language and is supported by a tagged lexicon...

Modern problems of Philology and Linguistics / 2023 / August

Salaev Ulugbek

Volume 7 | Issue 6

15 August 2023, 21:50