BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models
Abstract
To bridge the gap between performance-oriented benchmarks and the evaluation of cognitively inspired models, we introduce BLiSS 1.0, a Benchmark of Learner Interlingual Syntactic Structure. Our benchmark operationalizes a new paradigm of selective tolerance, testing whether a model finds a naturalistic learner error more plausible than a matched, artificial error within the same sentence. Constructed from over 2.8 million naturalistic learner sentences, BLiSS provides 136,867 controlled triplets (corrected, learner, artificial) for this purpose. Experiments on a diverse suite of models demonstrate that selective tolerance is a distinct capability from standard grammaticality, with performance clustering strongly by training paradigm. This validates BLiSS as a robust tool for measuring how different training objectives impact a model's alignment with the systematic patterns of human language acquisition.
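The selective-tolerance comparison can be illustrated with a short scoring sketch. The snippet below is a minimal, illustrative example, not the paper's evaluation code: it scores each sentence of a (corrected, learner, artificial) triplet with a causal language model and checks whether the naturalistic learner error is judged more plausible than the matched artificial error. The model name, the example triplet, and the use of total log-probability as the plausibility measure are all assumptions for illustration.

```python
# Sketch of a selective-tolerance check on one (corrected, learner, artificial) triplet.
# Model name and triplet are illustrative placeholders, not drawn from BLiSS itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood over the predicted tokens
    # (sequence length minus one), so rescale to get the total log-probability.
    n_scored = enc["input_ids"].shape[1] - 1
    return -out.loss.item() * n_scored

# Hypothetical triplet: the same sentence with a naturalistic learner error
# versus a matched artificial error.
triplet = {
    "corrected":  "She has lived in London for five years.",
    "learner":    "She has lived in London since five years.",
    "artificial": "She has lived in for London five years.",
}

scores = {name: sentence_logprob(text) for name, text in triplet.items()}
# Selective tolerance: the learner error should be more plausible than the
# artificial one (distinct from simply preferring the corrected sentence).
print(scores, "selective tolerance:", scores["learner"] > scores["artificial"])
```

In this sketch, aggregating the per-triplet outcomes across the benchmark would give a selective-tolerance accuracy that can be compared against standard grammaticality preference (corrected over learner), in line with the distinction the abstract draws between the two capabilities.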
Community
Dear authors,
I tried to find the dataset through the link shared in the paper but I couldn't find it. Am I missing something?
Thank you for your work!
Best,
Abder
Hey Abder,
Check out our dataset on HuggingFace here: https://huggingface.co/datasets/ALTACambridge/BLiSS. Due to data redistribution agreements, we cannot provide the processed dataset directly, but this toolkit allows you to reconstruct it from the original sources. The dataset page has information about how to do this; if there are any issues, please reach out to us.
Best wishes,
Suchir
I understand! Thank you for the quick response :)