Evaluating Reliability of Interview Data on Traditional Multilingualism in Highland Daghestan
In this article, we address the reliability of quantitative recall data on multilingualism of the past. More specifically, we investigate whether interviewees’ assessments of the language repertoires of their late relatives (indirect data) yield results that are quantitatively similar to those obtained from people of the same age range reporting on their own repertoires (direct data). The empirical data come from an ongoing field study of traditional multilingualism in Daghestan (Russia). We trained machine learning models to test whether they can detect differences between indirect and direct data. We conclude that our indirect quantitative data on L2s other than Russian are essentially similar to direct data, while there may be a small but systematic underestimation when interviewees report others’ knowledge of Russian.