


A corpus is a collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures. This guide lists corpora across the world's languages in the following areas:

New to corpus linguistics? A useful starting point is the Routledge Companion to Corpus Based Language Studies, which provides an excellent survey of corpora and lists tools useful for corpus-based research. It is a companion to the book Corpus-Based Language Studies: An Advanced Resource Book.

For more language data, see the portals below: