Announcing the ICLE500 Dataset

cecl
11 July 2024, modified on 6 December 2024

We are delighted to announce the release of the ICLE500 dataset (Thwaites et al., 2024). This dataset contains 500 argumentative essays from the International Corpus of Learner English (ICLE; Granger et al., 2020), each annotated with its corresponding CEFR level.

As part of the CLAP project, we partnered with Kollias (Harry) Charalambos from Polytomous Limited, who oversaw an assessment task aimed at mapping ICLE texts to CEFR levels. This was accomplished using the CEFR's Table C4: Written Assessment Grid (Council of Europe, 2020, pp. 187-189) and the guidelines provided in the manual "Relating Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment" (Council of Europe, 2009). A comprehensive technical report detailing the full assessment procedure accompanies the dataset (Kanistra & Kollias, 2024).

The dataset is available on the UCLouvain Dataverse at https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/RIOSSC.

We hope the research community will find this dataset useful and look forward to seeing it used in future studies.