Some corpora come without a search interface. How do you search in them? Perhaps you read them into a concordance program like AntConc, but then you notice that the corpus has some weird idiosyncratic format that messes with the lines. AntConc quickly becomes pretty unusable if that is the case. So, what can you do? The simplest solution is to write a small Python script!
The Corpus of Historical American English is a wonderful source for corpus linguistic research on diachronic English phenomena. There are about 400 million words from newspapers, magazines, fiction and non-fiction books, starting in 1810 up to 2009. A very neat web interface is available for searching in the COHA, and there are actually quite a number of neat features available for search.
However, the COHA web interface does not allow you to make a really good dataset for corpus linguistic research.