Saturday, June 27, 2009

How often do taggers agree with each other?

... or rather, how often did the taggers of my thesis agree with each other?

Nine of my friends (all Computer Science graduates and IT professionals), who helped me proofread my thesis Human-competitive automatic topic indexing, each chose five tags to describe its main topics. Each of them was familiar with my work and had read the abstract and parts of the thesis.

General impression

There was no tag on which all nine people agreed! Five of them picked tagging, although this is only one of the three tasks addressed in the thesis. There was a struggle with compound words like topic indexing (should it be just indexing and topics, or document topics?) and with adjectives (should they be used as separate tags, e.g. automatic, statistical, or as modifiers of existing tags, e.g. automatic tagging?).

One person picked controlled vocabularies, another controlled-vocabulary. When comparing the results, I treated these tags as the same thing; however, I didn't do this with other tags that also represented the same concept but were expressed slightly differently: topic indexing and topic assignment. In general, everyone agreed on the main topics but expressed them differently.
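Collapsing near-identical tags like these requires some normalization. Here is a minimal sketch of how that could work, assuming simple lowercasing, hyphen unification, and plural stripping (these rules are my illustration, not the procedure I actually used). It also shows why purely lexical rules can't catch synonym pairs like topic indexing vs. topic assignment:

```python
def normalize(tag):
    """Illustrative tag normalization: lowercase, treat hyphens as
    spaces, and singularize the last word. (Hypothetical rules.)"""
    tag = tag.lower().replace("-", " ").strip()
    words = tag.split()
    last = words[-1]
    if last.endswith("ies"):
        words[-1] = last[:-3] + "y"      # vocabularies -> vocabulary
    elif last.endswith("s") and not last.endswith("ss"):
        words[-1] = last[:-1]            # topics -> topic
    return " ".join(words)

# Both spelling variants collapse to the same normal form:
print(normalize("controlled vocabularies"))  # controlled vocabulary
print(normalize("controlled-vocabulary"))    # controlled vocabulary

# But lexical rules can't merge true synonyms:
print(normalize("topic indexing") == normalize("topic assignment"))  # False
```

A thesaurus or controlled vocabulary would be needed to merge the synonym cases, which is part of what the thesis is about.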

Two tag clouds (same tags, but different layouts) showed all tags assigned to the thesis. The full set of tags:

algorithm, artificial intelligence, auto-tagging, automatic, automatic tagging, automatic topic indexing, computer science, controlled vocabularies, document categorization, document topics, domain-specific knowledge, encyclopedic knowledge, human competitive, human indexing, indexing, indexing methods, kea, keyphrase extraction, machine learning, natural language processing, semantic, statistical, supervised learning, tag hierarchies, tagging, taxonomies, term assignment, topic assignment, topic indexing, topics, wikipedia

Consistency of taggers of my thesis

Consistency analysis is a traditional way of assessing indexing quality (more on this below). I applied this metric to evaluate the tags assigned to my thesis, and here are the per-tagger results (in %):

A - 22.5
D - 16.1
G - 20.3
J - 2.5
K - 27.8
N - 15.0
S - 5.4
T1 - 8.3
T2 - 29.6

The average consistency in this group is 16.4%, with the best tagger achieving nearly 30%.
These results are based on only one document and are therefore only indicative.
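The numbers above can be reproduced with a few lines of code. This is a minimal sketch that assumes the metric is Rolling's inter-indexer consistency, 2C / (A + B), where C is the number of tags two taggers share and A and B are their tag counts; each tagger's score is then their average consistency with every other tagger. The three taggers and their tags below are a made-up example, not my friends' actual data:

```python
def consistency(tags_a, tags_b):
    """Rolling's inter-indexer consistency: 2C / (A + B),
    where C is the number of tags the two taggers share."""
    a, b = set(tags_a), set(tags_b)
    return 2 * len(a & b) / (len(a) + len(b))

def average_consistency(all_taggers):
    """Each tagger's average pairwise consistency with the others, in %."""
    scores = {}
    for name, tags in all_taggers.items():
        others = [consistency(tags, t)
                  for other, t in all_taggers.items() if other != name]
        scores[name] = 100 * sum(others) / len(others)
    return scores

# Hypothetical example: three taggers, five tags each
taggers = {
    "A": ["tagging", "wikipedia", "machine learning", "indexing", "kea"],
    "B": ["tagging", "topic indexing", "wikipedia", "nlp", "taxonomies"],
    "C": ["tagging", "indexing", "semantic", "statistical", "wikipedia"],
}
print(average_consistency(taggers))  # {'A': 50.0, 'B': 40.0, 'C': 50.0}
```

With five tags per person, two shared tags between a pair already yield 40% consistency, which puts the scores in the table into perspective.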

About indexing consistency

In traditional libraries, inter-indexer consistency analysis is used to assess how consistently professionals assign subject headings to library holdings. The higher the consistency, the better topic-based search in the catalog works. This is a logical consequence: if a librarian agrees with a colleague, both are also likely to agree with the patron.

Admittedly, tagging is slightly different. Taggers who assign tags for their own use choose them based on personal preferences that might not be of use to others. But since tags are widely used by others, their quality is as important as that of subject headings in libraries.

In an experiment reported earlier, I analyzed the consistency of taggers on CiteULike and used it to evaluate automatic tags produced by Maui. The consistency of the taggers varied from 0 to 91%, with an average of 18.5%. Thus, my friends performed about as well as an average CiteULike tagger.

1 comment:

  1. it may really depend on what word they choose for a specific topic..