Wednesday, July 8, 2009
What do subject indexing, keyphrase extraction and autotagging have in common? Terminology clarification
There has been a lot of confusion about tasks related to topic indexing. Here is an overview of these tasks, the terms used to refer to them, and what they stand for.
- Text categorization (or: text classification) - Only a few general categories, such as Politics or News, are assigned to a document, usually from a relatively small vocabulary.
- Term assignment (or: subject indexing) - The document's main topics are expressed using terms from a large vocabulary, e.g. a domain-specific thesaurus.
- Keyphrase extraction (or: keyword extraction, key term extraction) - The document's main topics are expressed using the most prominent words and phrases in the document itself.
- Terminology extraction (similar to back-of-the-book indexing) - All domain-relevant words and phrases are extracted from a document.
- Full-text indexing (or: full indexing, free text indexing) - All words and phrases, sometimes excluding stopwords, are extracted from a document.
- Keyphrase indexing (or: keyphrase assignment) - A general term that covers both term assignment and keyphrase extraction.
- Tagging (or: collaborative tagging, social tagging; when performed automatically: autotagging, automatic tagging) - The user assigns as many topics as desired, and any word or phrase can serve as a tag. Tagging is mostly used on collaborative websites.
- Clustering - Related to topic indexing in that it identifies groups of documents on the same topic; however, these groups are unlabeled.
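To make the differences between some of these tasks more concrete, here is a toy sketch that runs four of them on the same short document. It is an illustration under loose assumptions, not how any real indexing system works: the stopword list, the "thesaurus" and the category triggers are all hypothetical, and plain word frequency stands in for a real keyphrase extraction algorithm. The point is where the output terms come from: full-text indexing and keyphrase extraction take words from the document, term assignment maps them onto a controlled vocabulary, and categorization picks a single broad label.

```python
import re
from collections import Counter

# Hypothetical toy stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "are", "for", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def full_text_index(text: str) -> set[str]:
    """Full-text indexing: every word except stopwords becomes an index term."""
    return {t for t in tokenize(text) if t not in STOPWORDS}


def extract_keyphrases(text: str, k: int = 3) -> list[str]:
    """Keyphrase extraction (toy): the k most frequent non-stopwords,
    taken from the document itself."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    # Rank by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def assign_terms(text: str, thesaurus: dict[str, set[str]]) -> list[str]:
    """Term assignment (toy): return every descriptor from a controlled
    vocabulary whose trigger words occur in the document."""
    words = set(tokenize(text))
    return sorted(d for d, triggers in thesaurus.items() if words & triggers)


def categorize(text: str, categories: dict[str, set[str]]) -> str:
    """Text categorization (toy): pick the one broad category whose
    trigger words occur most often in the document."""
    words = tokenize(text)
    scores = {c: sum(w in triggers for w in words) for c, triggers in categories.items()}
    return max(scores, key=scores.get)


# Hypothetical example data.
doc = ("The parliament debated the new election law. "
       "Voters and parties criticised the election law before the vote.")
thesaurus = {
    "Electoral systems": {"election", "vote", "voters"},
    "Legislation": {"law", "parliament"},
    "Taxation": {"tax"},
}
categories = {
    "Politics": {"parliament", "election", "vote", "parties"},
    "Sports": {"match", "goal", "team"},
}

print(sorted(full_text_index(doc)))   # all non-stopwords from the document
print(extract_keyphrases(doc, k=2))   # most prominent document words
print(assign_terms(doc, thesaurus))   # descriptors from the vocabulary
print(categorize(doc, categories))    # a single broad category
```

Note that "Electoral systems" never appears in the text: only term assignment can produce it, because its output comes from the vocabulary rather than from the document.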