In previous posts, you learned how to make a frequency table or a contingency table for categorical variables. Although a table can be very insightful, patterns usually only become tangible once they are visualized. In this post, we learn how to turn a frequency or contingency table into a barplot with R.
A previous post explained how to make a simple frequency table in R. Such a table tells you, for a single categorical variable, how often each level (variant) of that variable occurs in your dataset.
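As a quick reminder, a frequency table of this kind can be sketched in base R with `table()`. The vector `intensifier` and its values are made up for illustration and do not come from a real dataset.

```r
# Hypothetical example data: one categorical variable stored as a factor
intensifier <- factor(c("real", "really", "really", "real", "really", "very"))

# table() counts how often each level of the variable occurs
freq <- table(intensifier)
print(freq)
```

Each name in `freq` is a level of the variable, and each value is the number of observations with that level.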
A contingency table does the same thing for two categorical variables at once: it cross-tabulates them, counting how often each level of the first variable occurs together with each level of the second.
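A minimal sketch of a contingency table and a barplot built from it might look as follows. The data frame `d`, its columns `intensifier` and `register`, and all values are invented for illustration; only `table()` and `barplot()` from base R are used.

```r
# Hypothetical example data: two categorical variables per observation
d <- data.frame(
  intensifier = c("real", "really", "really", "real", "really", "real"),
  register    = c("spoken", "written", "spoken", "spoken", "written", "spoken")
)

# Cross-tabulate: rows are intensifier levels, columns are register levels
tab <- table(d$intensifier, d$register)
print(tab)

# Grouped barplot: one cluster of bars per column, one bar per row
barplot(tab, beside = TRUE, legend.text = TRUE,
        xlab = "register", ylab = "count")
```

With `beside = TRUE` the bars for each level are drawn next to each other rather than stacked, which usually makes it easier to compare counts within a group.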
Now and then, you hear something and wonder why it was said the way it was said. For me, one such case is the word “real” used as a modifier of adjectives without the prescriptively required adverbial suffix “-ly”:
I just heard some real bad news (Kanye West)
That shirt is real fly! (Fresh Prince of Bel-Air)
Prescriptively, one would expect “really bad” and “really fly”. Things like this catch my attention, so I decided to do a small corpus-linguistic investigation to find out what is going on.
Once you have imported your dataset into R, there are countless possibilities for analyzing the data quantitatively, as opposed to the qualitative analysis that went into the annotation. The very first quantitative analysis that you may want to perform on a categorical variable is to see how often each of its values occurs relative to the others. This is practical if you want to find out whether you have a fairly balanced dataset.
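One way to sketch such a balance check is to compare absolute and relative frequencies with `table()` and `prop.table()`. The vector `variant` below is a made-up stand-in for a column of your imported dataset.

```r
# Hypothetical annotated variable; in practice this would be a column
# of the imported dataset, e.g. mydata$variant (name assumed)
variant <- c("real", "really", "really", "real",
             "really", "really", "really", "real")

counts <- table(variant)      # absolute frequencies per value
shares <- prop.table(counts)  # relative frequencies, summing to 1

print(counts)
print(shares)
```

If the shares are far apart, the dataset is unbalanced with respect to this variable, which is worth knowing before any further analysis.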