Friday, June 26, 2009

How to use Maui

Here are some usage instructions that are also published on Maui's wiki page.

Preparing the data

After Maui is installed, there are two ways of using it: from the command line or from Java code. Either way, the input data has to be prepared first. The data directory in Maui's download package contains some examples of input data.

1. Formatting the document files.
Each document must be stored as plain text in its own file with the extension .txt. Maui takes as input the name of the directory containing these files. If a model has to be built first, the same directory must also contain the main topics assigned manually to each document.

2. Formatting the topic files.
Each document's topic set must be stored as plain text, one topic per line, in a file with the same name as the document but with the extension .key. For example, the topics for doc1.txt go into doc1.key.

3. Maui's Output.
If Maui is used to generate main topics for new documents, it creates a .key file for each document in the input directory. If .key files already exist when topics are generated, the existing topics are used as the gold standard for evaluating the automatically extracted ones.
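
Before building a model, it can help to check that every .txt document has its matching .key file. The following is a minimal Java sketch using only the standard library, not part of Maui itself; the class name MauiDataCheck is hypothetical, and it assumes the files are UTF-8 encoded, which the instructions above do not specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Maui: checks the .txt/.key pairing convention.
class MauiDataCheck {

    // Returns the base names of .txt documents in dir that have no matching .key file.
    static List<String> documentsWithoutTopics(Path dir) throws IOException {
        List<String> missing = new ArrayList<>();
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path doc : docs) {
                String name = doc.getFileName().toString();
                String base = name.substring(0, name.length() - ".txt".length());
                if (!Files.exists(dir.resolve(base + ".key"))) {
                    missing.add(base);
                }
            }
        }
        Collections.sort(missing); // directory listing order is not guaranteed
        return missing;
    }

    // Reads a .key file: one topic per line. Assumes UTF-8; blank lines are skipped.
    static List<String> readTopics(Path keyFile) throws IOException {
        List<String> topics = new ArrayList<>();
        for (String line : Files.readAllLines(keyFile, StandardCharsets.UTF_8)) {
            String topic = line.trim();
            if (!topic.isEmpty()) {
                topics.add(topic);
            }
        }
        return topics;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/automatic_tagging/train/");
        List<String> missing = documentsWithoutTopics(dir);
        if (missing.isEmpty()) {
            System.out.println("Every .txt file has a matching .key file.");
        } else {
            System.out.println("No .key file for: " + missing);
        }
    }
}
```

A directory meant for MauiModelBuilder should report no missing files; a directory of new documents for MauiTopicExtractor may legitimately have none.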

Command line usage

Maui can be used directly from the command line. The general command is:
java maui.main.MauiModelBuilder (or maui.main.MauiTopicExtractor)
    -l directory (directory with the data)
    -m model (model file)
    -v vocabulary (vocabulary name)
    -f {skos|text} (vocabulary format)
    -w database@server (Wikipedia location)
Which class to use depends on the topic indexing task. MauiModelBuilder creates a topic indexing model from documents that already have topics assigned. MauiTopicExtractor uses an existing model to assign topics to new documents.
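
The option list maps directly onto a program's argument array. The sketch below uses a hypothetical helper class, MauiArgs, that only assembles the options listed above and picks the class according to the rule just described; it does not call Maui, and it covers only the flags shown in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Maui: builds the arguments described above.
class MauiArgs {

    // Building a model from documents with topics vs. assigning topics with a model.
    static String mainClass(boolean buildingModel) {
        return buildingModel ? "maui.main.MauiModelBuilder" : "maui.main.MauiTopicExtractor";
    }

    // Pass null for any option that is not needed; it is then left out.
    static List<String> build(String dataDir, String model, String vocabulary,
                              String vocabularyFormat, String wikipedia) {
        List<String> args = new ArrayList<>();
        addOption(args, "-l", dataDir);          // directory with the data
        addOption(args, "-m", model);            // model file
        addOption(args, "-v", vocabulary);       // vocabulary name
        addOption(args, "-f", vocabularyFormat); // skos or text
        addOption(args, "-w", wikipedia);        // database@server
        return args;
    }

    private static void addOption(List<String> args, String flag, String value) {
        if (value != null) {
            args.add(flag);
            args.add(value);
        }
    }

    public static void main(String[] unused) {
        List<String> train = build("data/term_assignment/train/", "assignment_model",
                "agrovoc", "skos", null);
        System.out.println("java " + mainClass(true) + " " + String.join(" ", train));
    }
}
```

Running main prints the term assignment training command used in the examples: `java maui.main.MauiModelBuilder -l data/term_assignment/train/ -m assignment_model -v agrovoc -f skos`.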

Examples with experimental data are supplied in the Maui package. The following commands refer to the directories with this data. They correspond to different topic indexing tasks:

1. Automatic tagging and keyphrase extraction: topics are extracted from the document text itself.
    java maui.main.MauiModelBuilder -l data/automatic_tagging/train/ -m tagging_model
    java maui.main.MauiTopicExtractor -l data/automatic_tagging/test/ -m tagging_model
2. Term assignment: topics are taken from a controlled vocabulary in SKOS format.
    java maui.main.MauiModelBuilder -l data/term_assignment/train/ -m assignment_model -v agrovoc -f skos
    java maui.main.MauiTopicExtractor -l data/term_assignment/test/ -m assignment_model -v agrovoc -f skos
3. Topic indexing with Wikipedia: topics are Wikipedia article titles. Note that in this case Wikipedia Miner must be installed and running first.
    java maui.main.MauiModelBuilder -l data/wikipedia_indexing/train/ -m indexing_model -v wikipedia -w enwiki@localhost
    java maui.main.MauiTopicExtractor -l data/wikipedia_indexing/test/ -m indexing_model -v wikipedia -w enwiki@localhost
For terminology extraction, set the command line argument -n (the number of topics to extract) to a high value, so that all possible candidate topics in each document are returned.
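
To script the train-then-extract sequence from a Java program, one option is to launch the two classes as separate JVM processes. This is a minimal sketch, assuming Maui and its libraries are reachable through the hypothetical classpath maui.jar plus lib/*; the class name MauiRunner is hypothetical, and the value 1000 for -n is an arbitrary stand-in for "a high value", not a recommended setting.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher, not part of Maui: runs Maui's command line classes as child processes.
class MauiRunner {

    // Hypothetical classpath; point it at wherever Maui and its dependencies live.
    static final String CLASSPATH = "maui.jar" + File.pathSeparator + "lib/*";

    // Builds "java -cp <classpath> <mainClass> <options...>" as a process command.
    static List<String> command(String mainClass, String... options) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-cp", CLASSPATH, mainClass));
        cmd.addAll(Arrays.asList(options));
        return cmd;
    }

    // Runs one command, forwarding Maui's console output; returns its exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Example 1 from above: build the model, then tag the test documents.
        List<String> train = command("maui.main.MauiModelBuilder",
                "-l", "data/automatic_tagging/train/", "-m", "tagging_model");
        // "-n 1000" stands in for "a high value" when extracting terminology.
        List<String> extract = command("maui.main.MauiTopicExtractor",
                "-l", "data/automatic_tagging/test/", "-m", "tagging_model", "-n", "1000");
        if (run(train) != 0) {
            System.err.println("Model building failed; skipping extraction.");
            return;
        }
        System.exit(run(extract));
    }
}
```

Extraction is skipped when model building fails, since MauiTopicExtractor needs the model file that MauiModelBuilder writes.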

12 comments:

  1. It's a nice concept. I was testing the app and am still confused by some parts (such as ModelBuilder: what type of data should I provide to it to create a model?). I am hoping for more details, documentation and examples.

  2. When you download Maui, there is a directory "data" in the main directory. It contains sample documents for different tasks that Maui performs. If you perform term assignment, i.e. assigning keywords using a pre-defined vocabulary, go into "data/term_assignment". The directory "train" contains example data required for building a model. You can test ModelBuilder on the contents of this directory.

    I hope it's clear. If you have other questions let me know.

  3. I've been testing Maui for term assignment. I have around 1000 product and service descriptions and would like Maui to assign each of them to one of a handful of terms, like beauty, health, arts, restaurants, etc.

    I do not have a vocabulary for this domain, so I'm trying to create one manually. Is there any tool to help me build the vocabulary by parsing product descriptions, looking for keywords, augmenting them with WordNet, etc.? I've tried ThManager, but I found it unusable.

    Is it possible to use Wikipedia Miner and then restrict the results to a handful of categories?

  4. Hi Adarsh,

    unfortunately, I don't know of any great tools for this. Personally, I use custom-written scripts in my consulting work.

    There is one tool for creating and maintaining vocabularies called PoolParty: http://poolparty.punkt.at/
    It has a nice visual interface, but is not free.

    If I were you, I would just use a text file and copy and paste the RDF structure from an existing vocabulary.
    You can do it very easily in just a few hours.

    Wikipedia Miner is a great alternative, but I wouldn't restrict the categories. They should be useful as they are. It does take a bit of time to set up this tool.

    Did you try to do keyphrase extraction instead?

    Have a look at the latest Maui demo:
    http://maui-indexer.appspot.com/

  5. I've tried keyphrase extraction plus tagging, but its accuracy is currently low. I wonder if I am training the model correctly. I trained it on around 150 files, each containing a ~200-word description mapped to a single category in its .key file. How can I increase accuracy? Add more .txt files? Or more tags per .key file?

  6. Hi Adarsh,

    please send me an email with a sample of your training set and describe how you run Maui (which options are on). I will try to help.
    It's my last name at gmail.

    Alyona

  7. Hi, I've tried to extract keywords and keyphrases (no vocabulary) from some documents.
    The system returned some keywords, but when I tried to follow the link to see the broader and narrower terms of a given concept, it always gave me an error.
    Could you please tell me what I'm doing wrong?
    Thank you

  8. Which links are you following?
    If no vocabulary is used, Maui doesn't have access to broader and narrower terms.

  9. Hi, I'm using the option "2. Select a vocabulary: Keywords and keyphrases - no vocabulary".
    Then I run Maui, and it returns the document's keywords. But when I click on a keyword to see its broader and narrower terms, it returns the following message:
    The requested URL /content%20management was not found on this server.

  10. Alyona,

    How can I use the Maui indexer on the command-line in the same fashion as the online demo for automatic tagging (http://maui-indexer.appspot.com/) ?

  11. Alyona, how do I use Maui the same way as the online demo? I would like to pass in free text (not prepared as a .key file) and have topics returned. Thank you

  12. Dear all, please post all your questions on Maui usage to: http://groups.google.com/group/kea-and-maui-support
    Thanks.
