
The Language Goldmine

Welcome to the Language Goldmine!

+++IMPORTANT+++ The Language Goldmine is considered finalized with the version you find here. There are no plans to update the project further or to add new data to it. Those who liked the resource and would like to expand it are cordially invited to take the data and the web presentation code (all open) and create a modified project of their own.

This website serves the final version, v1 from 2017, which is also available on Zenodo under DOI https://doi.org/10.5281/zenodo.15576073.


This page provides links to several linguistic databases and datasets. We want to make it easier to find data sources that are relevant to your research question, and we want to encourage you to explore the freely available linguistic data on the web.

At present, the Goldmine features XX data sources, including:

  • corpora of all varieties
  • normed data, ratings, reaction times
  • typological datasets

When using any of the data, make sure to cite the data source appropriately, following the specifications given on the source's website!

If you know about linguistic data that we are missing, please send an email with the subject line "goldmine update" to bodo@bodowinter.com. Alternatively, we would appreciate it if you filed an issue on our GitHub repository, where we collect all datasets.