To train a model, I used a collection of 20 computer science papers indexed by graduate students in an indexing experiment for my AAAI paper "Topic indexing with Wikipedia".
Then I applied this model to generate ten keywords and keyphrases for the Introduction chapter of my PhD thesis. I am very pleased with the results:
- Index (search engine)
- Natural language
- Keywords
- Algorithm
- Index (information technology)
- Machine learning
- Training set
- Information retrieval
- Dissertation
- Knowledge
The best thing about keyphrase extraction with Wikipedia is that it links each term to a concept, namely the Wikipedia article explaining that term. Thus, the resulting keyphrase sets will be consistent: they will all refer to "Dissertation" and not "PhD thesis" or "Doctoral thesis", regardless of which term is actually used in the document to express this concept.
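To illustrate the consistency point, here is a minimal Python sketch of mapping different surface forms to one concept. The lookup table `SURFACE_TO_CONCEPT` and the helper names are hypothetical and hand-written for this example. The actual system derives such mappings from Wikipedia itself, which this sketch does not attempt to reproduce.

```python
# Hypothetical, hand-written table: surface form (lowercased) -> Wikipedia article title.
SURFACE_TO_CONCEPT = {
    "phd thesis": "Dissertation",
    "doctoral thesis": "Dissertation",
    "dissertation": "Dissertation",
}


def to_concept(phrase):
    """Return the concept title for a phrase, or None if the phrase is unknown."""
    return SURFACE_TO_CONCEPT.get(phrase.strip().lower())


def concepts_for(phrases):
    """Map phrases to concepts, dropping unknown phrases and duplicates (order kept)."""
    seen = []
    for phrase in phrases:
        concept = to_concept(phrase)
        if concept is not None and concept not in seen:
            seen.append(concept)
    return seen


print(concepts_for(["PhD thesis", "Doctoral thesis"]))
```

Whichever wording a document uses, both phrases collapse to the same concept, so keyphrase sets from different documents stay comparable.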
Each topic can be linked directly to its Wikipedia article by adding "http://en.wikipedia.org/wiki/" in front of the term, as I did in the above list.
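The linking step can be sketched in a few lines of Python. The function name `wikipedia_url` is made up for this example. Replacing spaces with underscores is an assumption based on how Wikipedia article URLs are usually written, not something the extraction tool is known to do.

```python
from urllib.parse import quote

WIKI_PREFIX = "http://en.wikipedia.org/wiki/"


def wikipedia_url(title):
    """Build an article URL by putting the Wikipedia prefix in front of the topic title."""
    # Assumption: spaces are written as underscores in article URLs.
    slug = title.strip().replace(" ", "_")
    # Keep parentheses readable; other special characters are percent-encoded.
    return WIKI_PREFIX + quote(slug, safe="()")


topics = ["Index (search engine)", "Natural language", "Machine learning", "Dissertation"]
for topic in topics:
    print(wikipedia_url(topic))
```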
Interesting stuff. Is the Wiki20 dataset the one being mentioned here? I guess so, and I will check the README inside it, if there is one. BTW, the links to Waikato pages in this post seem to be broken.
Yes, that's correct. Thanks for pointing out the broken links. Unfortunately, the content is gone now...