Linguistics

English Language Corpora

Time Magazine Corpus
Full text of the articles in Time Magazine (US) from 1923 to the present, providing more than 100 million words. Free login required for some features.

American English Dialect Recordings: The Center for Applied Linguistics Collection
118 hours of recordings documenting North American English dialects, dating from 1900 to 1999. A few recordings of Canadian speakers are included.

IDEA: International Dialects of English Archive
Recordings are principally in English, are of native speakers, and include both English-language dialects and English spoken in the accents of other languages.

Edinburgh University Speech Timing Archive and Corpus of English (EUSTACE)
Comprises 4,608 sentences spoken by six speakers of British English. The sentences were designed to examine a number of durational effects in speech and are controlled for length and phonetic content.

Corpus of Canadian English (Strathy)
50 million words from more than 1,100 texts, spanning spoken language, fiction, magazines, newspapers, and academic writing.

Hip Hop Word Count (Rap Research Lab)
Access by request. V0.2.2 contains syntactic, semantic, and rhyme data for 50 artists. A dataset covering 20,000 artists is in progress.