Making a barplot in R

In previous posts, you learned how to make a frequency table or a contingency table for categorical variables. Although a table can be very insightful, patterns usually only become tangible once they are visualized. In this post, we learn how to turn a frequency or contingency table into a barplot with R.
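
As a rough preview of the idea, the sketch below passes a frequency table and a contingency table to base R's `barplot()`. The data and the variable names (`animacy`, `register`) are invented for illustration and do not come from the posts themselves.

```r
# Hypothetical toy data; the values and variable names are assumptions
animacy  <- c("animate", "inanimate", "animate", "animate", "inanimate", "animate")
register <- c("spoken", "written", "written", "spoken", "spoken", "written")

# Frequency table for one variable, drawn as one bar per level
freq <- table(animacy)
bar_midpoints <- barplot(freq,
                         main = "Animacy (toy data)",
                         xlab = "Level", ylab = "Frequency")

# Contingency table for two variables, drawn as grouped bars:
# one group per column (register), one bar per row (animacy)
cont <- table(animacy, register)
barplot(cont, beside = TRUE, legend.text = TRUE,
        main = "Animacy by register (toy data)",
        xlab = "Register", ylab = "Frequency")
```

With `beside = TRUE` the bars are placed next to each other; leaving it at the default `FALSE` stacks them instead, which is often harder to read when the groups differ in size.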

How to make a contingency table in R

A previous post explained how to make a simple frequency table in R. Such a frequency table tells you, for a single categorical variable, how often each level (variant) of that variable occurs in your dataset.

A contingency table does the same thing, but for two categorical variables at the same time, in “comparison” to each other. Each level of the first categorical variable is counted in combination with each level of the second categorical variable.
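
A minimal sketch of this cross-tabulation in base R might look as follows. The data frame `corpus` and its columns `animacy` and `register` are hypothetical and only serve to illustrate the principle.

```r
# Hypothetical annotated dataset; names and values are assumptions
corpus <- data.frame(
  animacy  = c("animate", "inanimate", "animate", "animate", "inanimate", "animate"),
  register = c("spoken", "written", "written", "spoken", "spoken", "written")
)

# Cross-tabulate the two variables: rows are levels of animacy,
# columns are levels of register, cells are co-occurrence counts
cont <- table(animacy = corpus$animacy, register = corpus$register)
print(cont)

# Add row and column totals to the table
print(addmargins(cont))
```

Each cell holds the number of observations that combine one level of the first variable with one level of the second, so the cells together add up to the number of rows in the dataset.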

What is inter-annotator agreement?

Very often in linguistics, it is simply not possible to give a classical definition, with necessary and sufficient conditions, for our categories. This is the case for most (perhaps all?) linguistic categories. Even basic categories such as parts of speech are not clearly delimited. In fact, Langacker (1987) takes that as a sign that we should rethink our linguistic ideas altogether. But how can we, as corpus linguists, then annotate our data correctly? Well, that is where inter-annotator agreement comes into play.

How to make a frequency table in R

Once you have imported your dataset into R, there are countless possibilities for analyzing the data in a quantitative way, as opposed to the qualitative analysis that went into the annotation. The very first quantitative analysis that you may want to perform on a categorical variable is to see how often each value occurs relative to the other values of that variable. This is practical if you want to find out whether you have a fairly balanced dataset.
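
As a small illustration, the sketch below counts the levels of one categorical variable with base R's `table()` and converts the counts to proportions with `prop.table()`. The data frame `annotations` and its column `animacy` are invented for this example.

```r
# Hypothetical annotated dataset; the column name and values are assumptions
annotations <- data.frame(
  animacy = c("animate", "inanimate", "animate", "animate", "inanimate", "animate")
)

# Absolute frequencies: how often each level occurs
freq <- table(annotations$animacy)
print(freq)

# Relative frequencies (proportions that sum to 1),
# which make it easier to judge how balanced the dataset is
rel_freq <- prop.table(freq)
print(round(rel_freq, 2))
```

If one level accounts for the vast majority of the observations, the dataset is unbalanced with respect to that variable.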
