How to extract data from COHA into Excel or R?

The Corpus of Historical American English (COHA) is a wonderful source for corpus linguistic research on diachronic phenomena in English. It contains about 400 million words from newspapers, magazines, fiction and non-fiction books, covering the period from 1810 to 2009. A very neat web interface is available for searching COHA, and it offers quite a number of useful search features.

However, the COHA web interface does not let you build a really good dataset for corpus linguistic research.

Accountability, recall and precision in corpus linguistics

Many inexperienced linguists who start working with corpora share the misconception that a corpus query leads almost directly to the answer to a research question. Nothing could be further from the truth. A corpus linguistic approach to a research question often involves a lot of work, both on an intellectual and on a technical/mind-numbing level.
