In previous posts, you have already learned how to make a frequency table or a contingency table for categorical variables. Although a table can be very insightful, things usually only get tangible when they are visualized. In this post, we learn how to turn a frequency/contingency table into a barplot with R.
Let’s start with a simple barplot for the price categories in the dataset by my Berlin students on pronominal reduction. As you may remember, four price categories were discovered, splitting up the list of shops on the basis of the mean price for jeans and jackets. We already made simple frequency tables, so you know what the following R code does:
ds <- read.delim("https://corpuslinguisticmethods.files.wordpress.com/2014/02/berlin-red.key", header=TRUE, row.names=1, sep="\t")
ds.df <- as.data.frame(ds)
ds.df.pc4 <- table(ds.df$Price_cat_4)
So, the frequency table for the price categories looks as follows:
> ds.df.pc4
 1  2  3  4
19 14  8  8
Since the price categories were numbered from 1 to 4, with 1 the cheapest and 4 the most expensive, this table is a little bit confusing due to the two rows of numbers: the top row holds the category labels, the bottom row the counts. You should be able to read that we sampled 19 of the cheapest stores (1), and 8 of the most expensive stores (4).
Before I make a visualization, I sketch by hand what I expect it to look like. In the case of the four price categories, I expect basically four bars with decreasing heights (19 > 14 > 8 = 8). Something like this:
Now, let’s recreate this image in R with the barplot command.
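The call itself is presumably just barplot() applied to the frequency table, i.e. barplot(ds.df.pc4). Here is a minimal sketch of that step. So that it runs without downloading the data, it assumes a stand-in vector pc4 (a made-up name) holding the counts from the table above:

```r
# Counts copied from the frequency table printed above.
# pc4 is a hypothetical stand-in for ds.df.pc4.
pc4 <- c("1" = 19, "2" = 14, "3" = 8, "4" = 8)

# Draw one bar per price category.
# barplot() invisibly returns the x position of each bar's midpoint.
mids <- barplot(pc4)
```

With the real data loaded, barplot(ds.df.pc4) should give the same picture.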
This already looks pretty close to the drawing above:
There are a couple of things that we now need to do: add a title, make everything a little bit less “fat”, perhaps add some axis labels, have actual names for the price categories, etc. That is fairly simple in R, as well:
barplot(ds.df.pc4,
        main="Shops per price category",
        xlab="Price categories",
        ylab="Amount of shops",
        names.arg=c("Low", "Lower middle", "Upper middle", "Upper"))
And this is what this looks like:
Alright, that looks nice already! Just for completeness, you can save your graphics by using the File > Save as… dialog in R, but you get much better quality if you use the following code:
# call the png generator
png("barplot.png", width=3000, height=3000, res=300)
barplot(ds.df.pc4,
        main="Shops per price category",
        xlab="Price categories",
        ylab="Amount of shops",
        names.arg=c("Low", "Lower middle", "Upper middle", "Upper"))
# save
dev.off()
The png command creates a png file named “barplot.png” in your current working directory (which you can find out with getwd()). The image will be 3000px wide and 3000px high (which is a lot), with a “resolution” of 300 pixels per inch. If your font is too small or too big, simply play around with the res value in steps of 100: a higher res makes the text larger relative to the plot, a lower one makes it smaller. Don’t forget the dev.off() at the end, as that closes the graphics device and actually writes the png.
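If you want to check that the file really ends up on disk, here is a minimal sketch. It assumes the counts from the frequency table above and writes to R's temporary directory; the file name barplot-demo.png and the object names are made up for this illustration.

```r
# Counts copied from the frequency table above (hypothetical stand-in object)
counts <- c("Low" = 19, "Lower middle" = 14, "Upper middle" = 8, "Upper" = 8)

# Write to the session's temporary directory instead of the working directory
out <- file.path(tempdir(), "barplot-demo.png")

# Open the png device, draw, and close the device again
png(out, width = 3000, height = 3000, res = 300)
barplot(counts, main = "Shops per price category")
dev.off()

# The file is complete only after dev.off() has been called
file.exists(out)
```

Replace out with a plain file name such as "barplot.png" to save into your working directory instead.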
The next step is to make a barplot of a contingency table in R. In a previous post, we already made a contingency table with the following command:
> ftable(ds.df$Reduction ~ ds.df$Price_cat_4)
                  ds.df$Reduction f t
ds.df$Price_cat_4
1                                 6 9
2                                 6 7
3                                 6 2
4                                 5 3
As you can see, the price categories are listed as rows, and the observation of reduction or not is in the columns. If we were to put this contingency table directly into the barplot command, R would show us two bars, each divided into four areas (I leave it as an exercise for you to make this graph). The two bars stand for the two columns for reduction, and the four areas are the four price categories. Although the information is valid, what we really want is four bars for the price categories, with each bar divided into two areas that show how much reduction and how much non-reduction there was.
To obtain this, we simply flip the formula for ftable around:
> ftable(ds.df$Price_cat_4 ~ ds.df$Reduction)
                ds.df$Price_cat_4 1 2 3 4
ds.df$Reduction
f                                 6 6 6 5
t                                 9 7 2 3
There you go, four columns with two rows. Now, we can make a simple barplot for this as follows:
price.red <- ftable(ds.df$Price_cat_4 ~ ds.df$Reduction)
barplot(price.red)
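The reason the flip matters is that barplot draws one bar per column of a matrix and stacks the rows inside it. A minimal sketch of the same idea with a plain matrix and t() follows; the counts are copied from the contingency tables above, and by.price and by.red are made-up names, not objects from the code in this post.

```r
# Price categories as rows, reduction (f/t) as columns,
# as in the first contingency table. matrix() fills column by column.
by.price <- matrix(c(6, 6, 6, 5,    # column "f"
                     9, 7, 2, 3),   # column "t"
                   nrow = 4,
                   dimnames = list(Price_cat_4 = c("1", "2", "3", "4"),
                                   Reduction = c("f", "t")))

# Transposing swaps rows and columns: 2 rows (f, t), 4 columns (price categories)
by.red <- t(by.price)

# One stacked bar per column, so four bars with two areas each
mids <- barplot(by.red)
```

Calling barplot(by.price) instead gives the two-bar version described earlier.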
But probably, you are not so terribly interested in the absolute values of reduction/non-reduction per price category. Instead, you would want relative frequencies per price category. In R, that can be calculated with the prop.table command:
price.red <- ftable(ds.df$Price_cat_4 ~ ds.df$Reduction)
# 2, so that the rel.freqs. are per column
price.red.p <- prop.table(price.red, 2)
barplot(price.red.p)
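To see what the 2 in prop.table does to the numbers, here is a small sketch with a plain matrix. The counts are copied from the flipped contingency table above; counts and props are hypothetical stand-ins, not objects from the code in this post.

```r
# Reduction (f/t) as rows, price categories as columns.
# matrix() fills column by column: (6, 9) is category 1, (6, 7) category 2, etc.
counts <- matrix(c(6, 9, 6, 7, 6, 2, 5, 3),
                 nrow = 2,
                 dimnames = list(Reduction = c("f", "t"),
                                 Price_cat_4 = c("1", "2", "3", "4")))

# margin = 2: divide every cell by the total of its column
props <- prop.table(counts, margin = 2)

props["t", "1"]   # 9 / (6 + 9) = 0.6
props["f", "3"]   # 6 / (6 + 2) = 0.75
colSums(props)    # every column adds up to 1
```

Because every column sums to 1, all bars in the resulting barplot have the same height, and only the split between the two areas differs per price category.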
If you then put some effort into making the plot a bit nicer, you could obtain this result:
# make a png with high resolution
png("barplot-contingency.png", width=3000, height=3000, res=400)
# make some space around the plot for the legend
par(mar=c(5.5, 4.5, 4.5, 8.5), xpd=TRUE)
# make the barplot, use "col" to set the color of the areas
barplot(price.red.p,
        col=c("black", "white"),
        names.arg=c("Low", "Lower middle", "Upper middle", "Upper"),
        main="Proportion of (non-)reduction per price category",
        xlab="Price categories",
        ylab="Proportion of (non-)reduction")
# make a legend at x=5 and y=0.6
legend(5, 0.6, fill=c("white", "black"), legend=c("Reduction", "Non-reduction"))
# save with dev.off()
dev.off()