Uzbek language » KhorezmScience.uz

DICTIONARY-BASED TOKENIZATION ALGORITHM FOR UZBEK TEXTS

Tokenization is the process of dividing a text into smaller parts, called tokens. Tokens can be words, punctuation marks, numbers, or other meaningful elements. Tokenization is primarily used in natural language processing (NLP) and is an essential first step for analyzing, understanding, or...

Actual problems in modern technical sciences / 2025 / November

Abdusobir Saidov, Maksud Sharipov, Ogabek Sobirov

Volume 9 | Issue 11

3 November 2025, 22:00


DESIGN AND IMPLEMENTATION OF A MODEL AND ALGORITHM FOR PART-OF-SPEECH TAGGING IN UZBEK TEXTS USING THE CONDITIONAL RANDOM FIELDS (CRFs) APPROACH

This paper presents a part-of-speech tagging system for the Uzbek language based on the Conditional Random Fields (CRF) approach. Using a manually annotated corpus and a set of language-specific morphological and contextual features, the model was trained and evaluated through 5-fold...

Actual problems in modern technical sciences / 2025 / July

Sharipov Maksud

Volume 9 | Issue 7

3 July 2025, 21:50


CONTEXT-FREE GRAMMAR BASED PARSING ALGORITHM FOR SIMPLE SENTENCES IN THE UZBEK LANGUAGE

Analyzing simple sentences in the Uzbek language is of great importance within the field of Natural Language Processing (NLP). This paper proposes a solution to the problem of developing a parsing algorithm for simple Uzbek sentences based on Context-Free Grammar (CFG). The syntactic rules of Uzbek...

Modern problems of Philology and Linguistics / 2025 / March

Maqsud Siddiqovich Sharipov, Ixtiyor Davlatyorovich Avezmatov

Volume 9 | Issue 3

3 March 2025, 21:50


PUNCTUATION ANALYSIS OF UZBEK TEXTS BASED ON THE N-GRAM MODEL

In this work, we consider the development of an algorithm for punctuation analysis of Uzbek texts. Because Uzbek is considered a low-resource language, such algorithms have not been developed so far. An algorithm was developed using the N-gram model for punctuation...

Modern problems of Philology and Linguistics / 2025 / February

Hushnudbek Saylboyevich Adinaev

Volume 9 | Issue 2

3 February 2025, 21:50


CONTRASTIVE STUDY OF LINGUISTIC ASPECTS OF THE UZBEK AND KOREAN LANGUAGES

This article examines the grammatical and linguistic features of the Uzbek and Korean languages from a comparative-analytical perspective. Uzbek is an agglutinative language, in which suffixes play a major role in word formation. The Korean language is also an...

Modern problems of Philology and Linguistics / 2025 / February

Musaev Farkhod

Volume 9 | Issue 2

3 February 2025, 21:50


A MORPHOLOGICALLY-AWARE TOKENIZER FOR THE UZBEK LANGUAGE

This paper proposes a morphologically-aware tokenizer that leverages the Apertium-uzbek morphological analyzer to enable word splitting. Results on Sentiment Analysis, Text Classification, and NER experiments prove the superiority of the suggested tokenizer over the baseline approach, reflecting...

Actual problems in modern technical sciences / 2025 / February

Kuriyozov Elmurod Radjabboy Ugli

Volume 9 | Issue 2

3 February 2025, 21:50


TRANSFORMER-BASED NLP SOLUTIONS FOR THE UZBEK LANGUAGE

This work presents BERTbek, one of the first monolingual transformer-based language models specifically for the Uzbek language. BERTbek is trained on a morphologically-sensitive tokenizer, utilising various text sources and training corpus sizes to evaluate the performance from different...

Actual problems in modern technical sciences / 2025 / February

Kuriyozov Elmurod Radjabboy Ugli

Volume 9 | Issue 2

3 February 2025, 21:50


EVALUATION OF SENTIMENT ANALYSIS DATASET USING MACHINE LEARNING AND DEEP LEARNING MODELS

This paper presents the results obtained from the experiments. To create the baseline models for Uzbek sentiment analysis, various classifiers from different families were chosen, including different methods of Support Vector Machines (SVM), and recent Deep Learning...

Actual problems of Mathematics, Physics and Mechanics / 2024 / August

Matlatipov Sanatbek Gayratovich

Volume 8 | Issue 8

3 August 2024, 21:50


COMPARATIVE ANALYSIS OF NEURAL NETWORK MODELS FOR UZBEK TEXT CLASSIFICATION

This study compares a Machine Learning model and a Neural Network model on the Uzbek text classification problem. The dataset, organized into 11 classes, was extracted from the text classification dataset project for the Uzbek language. The Machine Learning model...

Modern problems of Philology and Linguistics / 2023 / September

Salaev Ulugbek

Volume 7 | Issue 7

15 September 2023, 21:50


AUTOMATIC PART-OF-SPEECH ANNOTATION TOOL FOR UZBEK LANGUAGE

This research paper introduces a Part-of-Speech tagging model for the Uzbek language using 16 selected tags. The proposed methodology includes a morphological analysis library for inflectional words by considering the highly agglutinative character of the language and supported by a tagged lexicon...

Modern problems of Philology and Linguistics / 2023 / August

Salaev Ulugbek

Volume 7 | Issue 6

15 August 2023, 21:50
