SUBTLEX-CAT: Subtitle word frequencies and contextual diversity for Catalan.
Author | |
---|---|
Abstract | SUBTLEX-CAT is a word frequency and contextual diversity database for Catalan, obtained from a 278-million-word corpus based on subtitles supplied from broadcast Catalan television. Like all previous SUBTLEX corpora, it comprises subtitles from films and TV series. In addition, it includes a wider range of TV shows (e.g., news, documentaries, debates, and talk shows) than has been included in most previous databases. Frequency metrics were obtained for the whole corpus, on the one hand, and only for films and fiction TV series, on the other. Two lexical decision experiments revealed that the subtitle-based metrics outperformed the previously available frequency estimates, computed from either written texts or texts from the Internet. Furthermore, the metrics obtained from the whole corpus were better predictors than the ones obtained from films and fiction TV series alone. In both experiments, the best predictor of response times and accuracy was contextual diversity. |
Year of Publication | 2020 |
Journal | Behavior Research Methods |
Volume | 52 |
Issue | 1 |
Pages | 360-375 |
ISSN Number | 1554-351X |
URL | https://dx.doi.org/10.3758/s13428-019-01233-1 |
DOI | 10.3758/s13428-019-01233-1 |
Short Title | Behav Res Methods |