Automatic Readability Assessment using Artificial Intelligence
This presentation summarizes recent advances in automatic readability assessment for second language learners using artificial intelligence. Automatic readability assessment is a fundamental task with applications across many language learning situations, such as recommending appropriate texts to learners. Many formulas have been developed for this task using surface-level measures of text, such as the Flesch-Kincaid Grade Level (FKGL). This raises two research questions: which of the many existing formulas should be used, and do we need to develop a better method? We summarize our recent results addressing these questions. In 2018, Vajjala and Lučić published a corpus in which English teachers were asked to annotate the difficulty of each text on a three-point scale. This made it possible to compare how well different formulas and methods agree with the judgment of English teachers. Using this dataset, we report comparisons of the accuracy of various methods, ranging from the conventional FKGL formula to our recent neural natural language processing methods based on models such as Bidirectional Encoder Representations from Transformers (BERT), which has recently been introduced in AI software developer communities. In addition, using results from vocabulary tests administered to learners, we report that the difficulty of a text can be predicted accurately without using the text's difficulty labels at all. Of course, there are settings that these experiments do not cover. For example, the corpus is based on general newspaper texts, which leaves open questions about English learning in STEM and other fields. We will also discuss current open questions, including what kind of data is needed to answer each of them.
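
As a concrete illustration of the surface-level formulas mentioned in the abstract, the FKGL can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the presentation: the coefficients are those of the standard published FKGL formula, while the function names `count_syllables` and `fkgl`, the regex-based sentence and word splitting, and the vowel-group syllable heuristic are all assumptions made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels.

    This is a rough heuristic (an assumption for illustration), not a
    dictionary lookup; it miscounts words with silent vowels.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level from word, sentence, and syllable counts."""
    # Naive segmentation: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard FKGL coefficients.
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    print(fkgl("The cat sat."))
    print(fkgl("Automatic readability assessment is fundamental."))
```

Because the score depends only on counts of words, sentences, and syllables, it ignores meaning and learner vocabulary, which is the limitation that motivates comparing it with teacher judgments and neural methods.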