In previous posts, you learned how to make a frequency table or a contingency table for categorical variables. Although a table can be very insightful, patterns often only become tangible once they are visualized. In this post, we learn how to turn a frequency or contingency table into a barplot with R.
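A minimal sketch of the idea, assuming a made-up vector `pos` of part-of-speech tags and a made-up second variable `register` (both the data and the names are hypothetical). `table()` builds the frequency table and base R's `barplot()` draws one bar per level. For a contingency table, passing a two-way table with `beside = TRUE` draws grouped bars instead of stacked ones.

```r
# Hypothetical data: part-of-speech tags for ten annotated tokens
pos <- c("noun", "verb", "noun", "adj", "noun",
         "verb", "adj", "noun", "verb", "noun")

# Frequency table: one count per level (levels are sorted alphabetically)
freq <- table(pos)

# One bar per level; bar heights are the counts
barplot(freq, xlab = "Part of speech", ylab = "Frequency")

# Hypothetical second variable: the register each token comes from
register <- c("spoken", "written", "written", "spoken", "spoken",
              "written", "written", "spoken", "spoken", "written")

# Contingency table: rows = part of speech, columns = register
cont <- table(pos, register)

# Grouped bars: one group per register (column), one bar per part of speech (row);
# legend.text = TRUE builds the legend from the row names of the table
barplot(cont, beside = TRUE, legend.text = TRUE,
        xlab = "Register", ylab = "Frequency")
```

Without `beside = TRUE`, the bars for each register are stacked on top of each other, which shows totals per register more clearly but makes the individual counts harder to compare.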
In a previous post, we explained how to make a simple frequency table in R. For a single categorical variable, such a frequency table tells you how often each level (variant) of that variable occurs in your dataset.
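As a minimal sketch, assuming a hypothetical data frame `tokens` with one row per annotated token, base R's `table()` applied to a single column gives the frequency table of that categorical variable:

```r
# Hypothetical dataset: one row per annotated token
tokens <- data.frame(
  word = c("run", "dog", "quickly", "dog", "run", "house"),
  pos  = c("verb", "noun", "adverb", "noun", "verb", "noun")
)

# Frequency table of the categorical variable `pos`:
# how often each level occurs in the dataset
freq_pos <- table(tokens$pos)
print(freq_pos)
```

The counts always add up to the number of rows, since every token has exactly one part-of-speech tag.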
A contingency table does the same thing, but for two categorical variables at the same time: each level of the first variable is crossed with each level of the second, and each cell of the table counts how often that particular combination occurs in the dataset.
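A minimal sketch, assuming a hypothetical data frame with two categorical variables, `pos` and `register`. Passing both columns to `table()` cross-tabulates them, and `addmargins()` (from the `stats` package that ships with R) appends the row and column totals:

```r
# Hypothetical dataset with two categorical variables per token
tokens <- data.frame(
  pos      = c("noun", "verb", "noun", "noun", "verb", "adj"),
  register = c("spoken", "spoken", "written", "written", "written", "spoken")
)

# Contingency table: each cell counts how often a level of `pos`
# co-occurs with a level of `register`
cont <- table(pos = tokens$pos, register = tokens$register)
print(cont)

# Row and column totals make the comparison easier to read
print(addmargins(cont))
```

Combinations that never occur still get a cell, with a count of 0 (here, `adj` in `written`).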
Very often in linguistics, it is simply not possible to give a classical definition of our categories in terms of necessary and sufficient conditions. This holds for most (perhaps all?) linguistic categories: even basic categories such as parts of speech are not clearly defined. In fact, Langacker (1987) takes this as a sign that we should rethink our linguistic ideas as a whole. But how, then, can we as corpus linguists annotate our data correctly? That is where inter-annotator agreement comes into play.
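One widely used agreement measure for two annotators is Cohen's kappa, which corrects the raw proportion of agreement for the agreement you would expect by chance. The sketch below computes it in base R for two hypothetical annotators (`annotator_a`, `annotator_b` and their tags are made up); it illustrates the general idea and is not necessarily the measure the post itself settles on.

```r
# Hypothetical annotations: two annotators tag the same eight tokens
annotator_a <- c("noun", "verb", "noun", "adj", "noun", "verb", "adj", "noun")
annotator_b <- c("noun", "verb", "verb", "adj", "noun", "verb", "noun", "noun")

# Use the same set of levels for both, so the agreement table is square
lv <- sort(unique(c(annotator_a, annotator_b)))
agree_tab <- table(factor(annotator_a, levels = lv),
                   factor(annotator_b, levels = lv))

n <- sum(agree_tab)

# Observed agreement: proportion of tokens on the diagonal (same tag)
p_o <- sum(diag(agree_tab)) / n

# Expected chance agreement: product of the two annotators' marginal proportions
p_e <- sum(rowSums(agree_tab) * colSums(agree_tab)) / n^2

# Cohen's kappa: agreement beyond chance, scaled to the maximum possible
cohen_kappa <- (p_o - p_e) / (1 - p_e)
print(c(observed = p_o, expected = p_e, kappa = cohen_kappa))
```

Here the annotators agree on 6 of 8 tokens (0.75), but because some agreement is expected by chance (0.375), kappa comes out at 0.6.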
Once you have imported your dataset into R, there are countless possibilities for analyzing the data quantitatively, as opposed to the qualitative analysis that went into the annotation. The very first quantitative analysis you may want to perform on a categorical variable is to count how often each of its values occurs relative to the other values. This is useful, for instance, to find out whether you have a fairly balanced dataset.
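To judge balance, relative frequencies are often easier to read than raw counts. A minimal sketch, assuming a hypothetical variable `construction` with two made-up variants: `prop.table()` turns the counts from `table()` into proportions that sum to 1.

```r
# Hypothetical variable: the construction chosen in each of ten observations
construction <- c("ditransitive", "prepositional", "ditransitive", "ditransitive",
                  "prepositional", "ditransitive", "ditransitive", "prepositional",
                  "ditransitive", "ditransitive")

# Absolute frequencies
counts <- table(construction)
print(counts)

# Relative frequencies: each count divided by the total
props <- prop.table(counts)
print(round(props, 2))
```

In this made-up sample, the 70/30 split would suggest that the two variants are not evenly represented.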
The very first step of any quantitative study is to get the data into software that can perform a quantitative analysis, such as R. This post explains how to do that. We assume a working R installation; no extra packages are required.
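A minimal sketch using only base R, assuming the data is stored as a comma-separated file with a header row. To keep the example self-contained, it first writes some hypothetical CSV content to a temporary file; in practice you would pass the path of your own file to `read.csv()` instead.

```r
# Hypothetical CSV content; with real data you would call, e.g.,
# read.csv("my_annotations.csv"), where the file name is made up
csv_text <- "word,pos,register
dog,noun,spoken
run,verb,written
quickly,adverb,spoken"

# Write it to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# Import with base R's read.csv(); no extra packages are needed.
# stringsAsFactors = TRUE stores text columns as factors (categorical variables)
annotations <- read.csv(path, stringsAsFactors = TRUE)

# Inspect the structure: column names, types and the first values
str(annotations)
```

Setting `stringsAsFactors = TRUE` is a design choice: since R 4.0.0 the default is `FALSE`, which keeps text columns as plain character vectors.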