Corpora of English (freely available on the web)

British National Corpus (BNC)

Size

100 million words

Variety

British English

Medium

90% written and 10% spoken language

Genres

Spoken, fiction, magazine, newspaper, non-academic, academic

Sampling period

1970s-1993

Corpus of Contemporary American English (COCA)

Size

More than 400 million words (20 million words added each year)

Variety

American English

Medium

Written and spoken English

Genres

Fiction, popular magazines, newspapers, academic texts, conversation 

Sampling period

1990-2009

Corpus of Historical American English (COHA)

Size

400 million words

Variety

American English

Medium

Written English

Genres

Evenly balanced by genre

Sampling period

1810-2009

TIME Magazine Corpus

Size

More than 100 million words

Variety

American English

Medium

Written English

Genres

Articles from TIME magazine

Sampling period

1923-2006

Michigan Corpus of Academic Spoken English (MICASE)

Size

1.8 million words

Variety

American English

Medium

Spoken English

Genres

Academic English (lectures, seminars, counselling sessions, etc.)

Sampling period

1997-2001

Santa Barbara Corpus of Spoken American English (SBCSAE)

Size

249,000 words

Variety

American English

Medium

Spoken English

Genres

Naturally occurring, mostly face-to-face conversations

Sampling period

1990s

Please note: The SBCSAE is freely downloadable. To work with the transcripts, however, you need corpus software such as WordSmith Tools. Students of English enrolled at the University of Oldenburg can use this software via the university network. If you are interested in using the corpus, please feel free to contact me.
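As a rough illustration of the most basic thing such software does, the following is a minimal keyword-in-context (KWIC) concordance sketch in Python. It is not WordSmith's method. It assumes the transcript has already been reduced to plain text: the distributed transcript files may also contain timing information and speaker labels, which would need to be stripped first. The function name `kwic` and its `width` parameter are hypothetical and belong to no corpus tool.

```python
import re
from typing import List


def kwic(text: str, keyword: str, width: int = 30) -> List[str]:
    """Return one keyword-in-context line per whole-word, case-insensitive match.

    Hypothetical helper for illustration; assumes `text` is plain transcript text.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start, end = match.start(), match.end()
        # Take up to `width` characters of context on each side, flattening line breaks.
        left = text[max(0, start - width):start].replace("\n", " ")
        right = text[end:end + width].replace("\n", " ")
        # Right-align the left context so every keyword lines up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "well I think so. I mean, I think it's fine."
    for line in kwic(sample, "think", width=10):
        print(line)
```

Dedicated concordancers add much more than this sketch, for example sorting by left or right context, wildcard and lemma searches, and collocation statistics.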

Vienna-Oxford International Corpus of English (VOICE)

Size

1 million words

Variety

English spoken as a lingua franca by speakers of more than 50 L1s

Medium

Spoken English

Genres

Naturally occurring, non-scripted face-to-face interactions

Sampling period

2001-2007

A comprehensive list of bookmarks for corpora, corpus linguistics and corpus tools can be found here: Corpus-based Linguistics Links