Some corpora come without a search interface. How do you search in them? Perhaps you read them into a concordance program like AntConc, but then you notice that the corpus has some weird idiosyncratic format that messes with the lines. AntConc quickly becomes pretty unusable if that is the case. So, what can you do? The simplest solution is to write a small Python script!
The very first step of any quantitative study is to get the data into software that can do a quantitative analysis, such as R. In this post, it is explained how this is done. For the explanation in this post, we assume a working R installation, but no extra packages are required.
Although it is a good idea to build your datasets in spreadsheet software, it is an even better idea to save your dataset (after you are ready with the annotation, of course) into the csv format.