Title: LaVA - Latvian Language Learner corpus
Authors: Darģis, Roberts
Auziņa, Ilze
Kaija, Inga
Levāne-Petrova, Kristīne
Pokratniece, Kristīne
Editors: Calzolari, Nicoletta
Bechet, Frederic
Blache, Philippe
Choukri, Khalid
Cieri, Christopher
Declerck, Thierry
Goggi, Sara
Isahara, Hitoshi
Maegaard, Bente
Mariani, Joseph
Mazo, Helene
Odijk, Jan
Piperidis, Stelios
Rīga Stradiņš University
Keywords: acquisition;annotated;Latvian;learner corpus;5.3 Educational sciences;6.2 Languages and Literature;3.1. Articles or chapters in proceedings/scientific books indexed in Web of Science and/or Scopus database;Language and Linguistics;Library and Information Sciences;Linguistics and Language;Education
Issue Date: 2022
Publisher: European Language Resources Association (ELRA)
Citation: Darģis, R., Auziņa, I., Kaija, I., Levāne-Petrova, K. & Pokratniece, K. 2022, LaVA - Latvian Language Learner corpus. In N. Calzolari, F. Bechet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk & S. Piperidis (eds), 13th Language Resources and Evaluation Conference, LREC 2022: Proceedings. European Language Resources Association (ELRA), pp. 727-731. 13th International Conference on Language Resources and Evaluation, LREC 2022, Marseille, France, 20/06/22. <https://aclanthology.org/2022.lrec-1.77>
Type: conference
Abstract: This paper presents the Latvian Language Learner Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia. The LaVA corpus contains 1015 essays (190k tokens and 790k characters, excluding whitespace) written by foreign students at Latvian higher education institutions who are learning Latvian as a foreign language in their first or second semester and have reached the A1 (possibly A2) Latvian language proficiency level. The corpus has morphological and error annotations. The paper also provides an error analysis and statistics for the LaVA corpus. The corpus is publicly available at: http://www.korpuss.lv/id/LaVA.
Description: Funding Information: The work reported in this paper is part of the project Development of Learner Corpus of Latvian: methods, tools and applications (Project No. lzp-2018/1-0527), which has been implemented at the Institute of Mathematics and Computer Science, University of Latvia (IMCS UL) since September 2018. The project is financed by the Latvian Council of Science. This work is also part of the project Research on Modern Latvian Language and Development of Language Technology (No. VPP-LETONIKA-2021/1-0006) of the Latvian State Research Programme Letonika - Fostering a Latvian and European Society, and has received financial support from the Latvian Language Agency through grant agreement No. 4.6/2019-029. Publisher Copyright: © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
ISBN: 9791095546726
Appears in Collections: Research outputs from Pure / Zinātniskās darbības rezultāti no ZDIS Pure

Files in This Item:
File: LaVA_Latvian_Language_Learner_corpus.pdf
Size: 1.82 MB
Format: Adobe PDF (open access)
