At workshops on Wikipedia-related research held at prestigious AI conferences, e.g. WikiAI at IJCAI and People's Web meets NLP at ACL, I have learned about some pretty amazing things one can implement using Wikipedia data, from computing semantic relatedness between words at a level comparable to humans to converting folksonomies of tags into ontologies. In general, the organizers distinguish between how Wikipedia can help AI solve language-related tasks and how AI can help Wikipedia improve its quality and fight vandalism.
Let's concentrate on the former.
A major barrier here is that Wikipedia is huge (and keeps getting bigger!). There are several tools for processing the Wikipedia dumps, but the best one I have found so far is the open-source Java library Wikipedia Miner, and not just because it was developed at the University of Waikato, where I studied. The reasons are its easy installation, an intuitive object-oriented model for accessing Wikipedia data, and additional tools for computing similarity between any two English phrases and for wikifying any text (i.e. linking its phrases to concepts explained in Wikipedia). It even comes with ready-made web services. Check out the online demos:
- Search. If you search for a word like palm, it lists all the possible meanings, starting with the most likely one (Arecaceae, 65% likelihood) and followed by the others, like Palm (PDA) and the palm of the hand. Clicking on a meaning shows the words used to refer to it, e.g. redirects like Palmtree and anchors like palm wood, as well as translations (language links) and broader terms (categories).
- Compare. Here you can compute how related two terms are and find out, for example, that vodka is more related to wine than to water.
- Wikify. This feature helps to find out which concepts are discussed in any text or on any website. It is particularly practical for texts with many named entities, e.g. news articles, but not only those. Here is this blog wikified with Wikipedia Miner (the links are added in blue and at the highest possible density).
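To give a feel for what the Search and Compare demos compute, here is a toy Python sketch. It is not Wikipedia Miner's API (the library itself is written in Java), and the helper names `sense_priors` and `link_relatedness` are hypothetical. The link counts and link sets are invented for illustration only. The sketch assumes that the likelihood of a meaning can be estimated as the share of links with a given anchor text that point to each article, and that relatedness can be estimated from shared incoming links with a formula modelled on the Normalized Google Distance.

```python
import math
from collections import Counter


def sense_priors(link_targets):
    """Return (article, probability) pairs for one anchor text, most likely first.

    link_targets holds one entry per link that uses the anchor text,
    naming the article that link points to.
    """
    counts = Counter(link_targets)
    total = sum(counts.values())
    return [(article, n / total) for article, n in counts.most_common()]


def link_relatedness(inlinks_a, inlinks_b, total_articles):
    """Relatedness in [0, 1] estimated from shared incoming links.

    Returns 1.0 for identical link sets and 0.0 when nothing is shared.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    shared = a & b
    if not shared:
        return 0.0
    larger, smaller = max(len(a), len(b)), min(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of the link sets")
    distance = (math.log(larger) - math.log(len(shared))) / (
        math.log(total_articles) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)


# Invented counts: 100 links with the anchor text "palm".
palm_links = ["Arecaceae"] * 65 + ["Palm (PDA)"] * 20 + ["Hand"] * 15

# Invented sets of article ids that link to each concept.
vodka = {1, 2, 3, 4, 5, 6}
wine = {3, 4, 5, 6, 7, 8, 9}
water = {6, 10, 11, 12, 13, 14, 15, 16}

if __name__ == "__main__":
    for article, probability in sense_priors(palm_links):
        print(f"{article}: {probability:.0%}")
    print("vodka vs wine: ", round(link_relatedness(vodka, wine, 1000), 2))
    print("vodka vs water:", round(link_relatedness(vodka, water, 1000), 2))
```

With these made-up numbers, vodka shares four incoming links with wine but only one with water, so it scores as more related to wine. On real data the counts would come from a processed Wikipedia dump.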
Many other people are already using Wikipedia Miner (100 downloads in the last 6 months). It has also been referenced as a research tool in various published projects, including adding structure to search queries, finding similar videos, extending ontologies, creating language learning tools and many more.
Since Maui is language-independent, can I infer that Wikipedia Miner is also language-independent?
Theoretically, yes! I know that people are using Wikipedia Miner with other languages as well. The approach is definitely language-independent. There are some inconsistencies in how categories are represented across Wikipedia languages, but Dave Milne (the creator) has been working on these.
I am not sure how much work is involved in making it work on Wikipedia dumps in other languages, because I haven't tried it out yet. You'd need to know some Perl and hack the download scripts (I think).
Thanks for your explanation
The approach is definitely language-independent. There are some inconsistencies in how categories are represented across Wikipedia languages.
__________________
Kim
Good point, Kim. Have you done any work on this, or could you point us to work by others? Thanks, Alyona
Any chance a version of Maui will come out which supports Wikipedia-miner 1.2?
No, unfortunately this won't be possible from my side. However, Maui is an open-source project. Anybody can contribute! See maui-indexer.googlecode.com