The Language Goldmine

Welcome to the Language Goldmine!

This page provides links to several linguistic databases and datasets. We want to make it easier to find data sources that are relevant to your research question, and we want you to explore what is out there in the world of freely available linguistic web data.

At present, the Goldmine features XX data sources, including:

  • corpora of all varieties
  • normed data, ratings, reaction times
  • typological datasets

When using any of the data, make sure to cite the data source appropriately, following the citation instructions given on that source's website!

If you know of linguistic data that we are missing, please send an email with the subject line "goldmine update" to bodo@bodowinter.com. Alternatively, we would appreciate it if you filed an issue on our GitHub repository, where we collect all datasets.