Tuesday, July 6, 2010

Demo of term assignment & keyphrase extraction with Maui

It's been a while since my last post on this blog, but Maui hasn't been idle in the meantime.

The most important news is that there is a new demo of Maui on Google AppEngine.

The main purpose of this demo is to show how Maui assigns terms from controlled vocabularies to documents. (This task is similar to text categorization with a large number of categories.) The documents can be in text, Microsoft Word, or PDF format, and there are two vocabularies to choose from: physics or agriculture.

The demo also shows how Maui extracts keyphrases. In this case, Maui was trained on 180 Computer Science documents used in the SemEval-2010 keyphrase extraction track.

More information on this demo can be found in my recent publication, Subject Metadata Support Powered by Maui. It was co-authored with Ian H. Witten and Vye Perrone and presented at the Joint Conference on Digital Libraries in Australia last month.

Some technical notes: AppEngine has a few restrictions that prevent me from demoing Maui's full functionality. For example, very large vocabularies cannot be uploaded to appspot, although they work fine when the demo runs locally. Also, the way Wikipedia is used in Maui is not suitable for the AppEngine framework.