
Just out: The Core Metadata Schema for Learner Corpora

CECL | 17 September 2024, modified on 6 December 2024

A paper was recently published in the International Journal of Learner Corpus Research: "The Core Metadata Schema for Learner Corpora (LC-meta): Collaborative efforts to advance data discoverability, metadata quality and study comparability in L2 research" (Authors: Magali Paquot (CECL), Alexander König (CLARIN ERIC), Egon W. Stemle (Eurac Research), and Jennifer-Carmen Frey (Eurac Research)).

Metadata is critical throughout the research process, from study design to corpus selection/compilation, result interpretability and cumulative research. To date, however, learner corpus research has not developed community standards or best practices for metadata collection and sharing. In this article, we present the results of a collaborative project aimed at addressing this issue by developing a standardised metadata schema for learner corpora. We first describe the procedure implemented to design the schema, including the ways in which we continuously involved learner corpus researchers in this initiative. We then introduce the Core Metadata Schema for Learner Corpora (LC-meta, Version 2), which consists of a set of obligatory and optional variables that encapsulate crucial information about L2 data (administrative details, corpus design, text-related variables, learner-related variables, annotations, annotators, or transcribers). Finally, we discuss future developments and emphasise the importance of continued maintenance and further refinement of this schema by the research community.