What is inter-annotator agreement?

Very often in linguistics, it is simply not possible to give a classical definition, with necessary and sufficient conditions, for our categories. This holds for most (perhaps all?) linguistic categories: even basic ones such as parts of speech are not entirely clearly defined. In fact, Langacker (1987) takes that as a sign that we should rethink our linguistic ideas as a whole. But how, then, can we as corpus linguists annotate our data correctly? Well, that is where inter-annotator agreement comes into play.


Corpus linguistics is real(ly)? awesome

Now and then, you hear something and wonder why it was said the way it was. For me, one such phenomenon is the word “real” used as a modifier of adjectives without the prescriptively required adverbial “-ly”:

I just heard some real bad news (Kanye West)

That shirt is real fly! (Fresh Prince of Bel-Air)

Prescriptively, one would expect “really bad” and “really fly”. These kinds of things attract my attention, so I decided to do a small corpus-linguistic investigation to find out what is going on.
