The demand for Chinese language learning has surged, yet the unique characteristics of the Chinese writing system make it difficult for learners to improve their writing skills. Reading materials of appropriate difficulty are crucial for language acquisition and literacy development, because a text's readability determines how easily readers can comprehend it. Automatic readability assessment has been widely applied to alphabetic writing systems, chiefly English, whereas research on Chinese text readability started relatively late and is only now emerging. As the number of Chinese learners grows, automatic readability assessment for Chinese texts has gained attention, with the aim of developing evaluation methods and tools tailored to the Chinese writing system and to the way reading in Chinese is acquired. Advances in natural language processing and artificial intelligence have shifted readability assessment from linear regression formulas to machine learning models. Automatic assessment of Chinese texts remains challenging, however, because effective readability metrics must account for both universal and language-specific features.
This section reviews empirical studies on Chinese text readability following the PRISMA guidelines and summarizes progress from several perspectives. The review focuses on the application of Chinese readability assessment in educational settings and analyzes four aspects of the research: the diversity of Chinese readability corpora worldwide, the predictive validity of language-specific linguistic features, the classification paradigms used, and the existing automated assessment platforms. Readability paradigms are categorized into linear regression, machine learning, and deep learning, and the trend in each paradigm is quantified.
This section outlines the methodology of the review of automatic Chinese text readability assessment. First, inclusion and exclusion criteria were established following the PRISMA guidelines, limiting the review to empirical studies published between 2010 and 2024. Second, relevant literature was retrieved from Scopus and CNKI using keywords such as "readability assessment" for English papers and "可读性评估" (readability evaluation) for Chinese papers. After screening, 30 English papers from Scopus and 14 Chinese papers from CNKI were selected, for a total of 44 studies. Key information, including basic bibliographic details, corpus characteristics, linguistic features, and classification methods, was extracted from each study to support the subsequent analysis.
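The screening step can be illustrated with a minimal sketch. It assumes each retrieved record carries a publication year, a source database, and a flag marking it as an empirical study. The `Record` class, its field names, and the sample records are hypothetical and are not taken from the review itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One retrieved paper (hypothetical structure, for illustration only)."""
    title: str
    year: int
    source: str      # "Scopus" or "CNKI"
    empirical: bool  # True if the paper reports an empirical study


def is_eligible(record, first_year=2010, last_year=2024):
    """Apply the two stated criteria: empirical study, published 2010-2024."""
    return record.empirical and first_year <= record.year <= last_year


def screen(records):
    """Return the eligible records and a per-database count of them."""
    kept = [r for r in records if is_eligible(r)]
    counts = {}
    for r in kept:
        counts[r.source] = counts.get(r.source, 0) + 1
    return kept, counts


# Made-up records, not papers from the review.
sample = [
    Record("Paper A", 2019, "Scopus", True),
    Record("Paper B", 2008, "Scopus", True),   # published before 2010
    Record("Paper C", 2021, "CNKI", True),
    Record("Paper D", 2022, "CNKI", False),    # not an empirical study
]
kept, counts = screen(sample)
```

In the review, the per-database counts after screening were 30 (Scopus) and 14 (CNKI), which sum to the 44 included studies.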
Advances in information technology have propelled readability research, with a notable increase in publications after 2018 driven by progress in distributed representations and neural network models. Most studies were authored by researchers from mainland China and address readability for native speakers and second-language learners of Chinese. Corpora and linguistic features play a vital role in readability assessment: many studies use textbook grade levels as difficulty benchmarks and incorporate linguistic traits specific to Chinese. Feature analysis spans the character, word, and sentence levels and includes glyph complexity, the age of acquisition of words, and grammar points. Readability paradigms are divided into linear regression, machine learning, and deep learning; deep learning excels at modeling feature interactions but lacks interpretability. Five readability analysis platforms currently exist, each with distinct features. Comparative analysis reveals differences between native and second-language Chinese readability research in study subjects, data sources, linguistic features, and methodologies. Machine learning is the dominant paradigm, while deep learning has grown rapidly in recent years. Readability analysis is widely applied in Chinese education, where automated tools help select and adapt texts of appropriate difficulty to enhance language learning.
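To make the feature levels and the linear-regression paradigm concrete, the following sketch computes three simple surface features and combines them linearly. It is an illustration under stated assumptions, not a method from any reviewed study: the function names, the choice of features, the `common_chars` list, and the weights are all hypothetical. Word-level features are omitted because they would require a word segmenter.

```python
import re

# Sentence-final punctuation (full-width and ASCII) and the basic CJK block.
SENTENCE_END = re.compile(r"[。！？!?]+")
CJK = re.compile(r"[\u4e00-\u9fff]")


def surface_features(text, common_chars):
    """Compute illustrative character- and sentence-level features.

    `common_chars` is an assumed set of frequent characters; a real study
    would take it from a graded character list.
    """
    sentences = [s for s in SENTENCE_END.split(text) if CJK.search(s)]
    chars = CJK.findall(text)
    if not sentences or not chars:
        raise ValueError("text contains no Chinese sentences")
    return {
        # Sentence level: mean number of Chinese characters per sentence.
        "mean_sentence_length": len(chars) / len(sentences),
        # Character level: share of distinct characters among all characters.
        "char_type_token_ratio": len(set(chars)) / len(chars),
        # Character level: share of characters outside the common list.
        "rare_char_ratio": sum(c not in common_chars for c in chars) / len(chars),
    }


def linear_score(features, weights, intercept=0.0):
    """Linear readability formula: intercept + sum of weight * feature."""
    return intercept + sum(weights[name] * value for name, value in features.items())


text = "我喜欢读书。你喜欢读书吗？"
feats = surface_features(text, common_chars=set("我你喜欢读书"))
# Hypothetical weights; in practice they would be fitted on a graded corpus.
score = linear_score(feats, {name: 1.0 for name in feats})
```

A linear formula of this kind is easy to interpret because each weight shows how one feature shifts the score, which is the property the text notes deep learning models lack.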
This section reviews progress in Chinese text readability assessment from 2010 to 2024 and highlights three limitations: insufficient openness of corpora, small sample sizes in second-language studies, and low interpretability of deep learning models. Future research should expand corpus scale, integrate linguistic theory with AI algorithms, and develop representations tailored to the features of Chinese. Although Chinese readability studies are increasing, they still lag behind research on alphabetic languages. Researchers in mainland China contributed 72.5% of the papers, with additional contributions from Taiwan, Hong Kong, Singapore, and Germany. The Chinese CTAP project holds cross-linguistic significance. The field has expanded to diverse learner groups, with 57% of studies targeting texts for native speakers and 37% focusing on second-language learners. The uniqueness of Chinese texts necessitates language-specific feature sets. Most studies emphasize character- and word-level features and neglect discourse-level analysis. High-quality annotated corpora remain scarce and are primarily sourced from textbooks. While machine learning performs well on small datasets, combining it with deep learning can improve accuracy. Readability research is interdisciplinary, and advances in NLP offer new approaches. Deep learning combined with linguistic features may enhance model performance. This study reviewed 44 papers following PRISMA but excluded unpublished theses; future work could include manual screening and additional databases.
Automatic Chinese text readability assessment has made progress in linguistic features and methodologies, but challenges remain in dataset quality and model interpretability. Automated tools enable precise text difficulty analysis, improving learning efficiency and personalization, which is vital for promoting Chinese language education globally.