How to extract data from COHA into Excel or R?

The Corpus of Historical American English (COHA) is a wonderful source for corpus linguistic research on diachronic phenomena in English. It contains about 400 million words from newspapers, magazines, fiction and non-fiction books, covering the period from 1810 to 2009. A neat web interface is available for searching COHA, and it offers quite a number of useful search features.

However, the COHA web interface does not let you get the search results out in a form that makes a good dataset for corpus linguistic research, for example in Excel or R.
