Preparing the data
After Maui is installed, there are two ways of using it: from the command line and from the Java code. Either way, the input data is required first. The data directory in Maui's download package contains some examples of input data.
1. Formatting the document files.
Each document has to be stored individually in text form in a file with the extension .txt. Maui takes the name of the directory containing these files as input. If a model needs to be created first, the same directory should also contain the main topics manually assigned to each document.
2. Formatting the topic files.
The topic sets need to be stored individually in text form, one topic per line, in a file with the same name as the document text, but with the extension .key.
3. Maui's Output.
If Maui is used to generate main topics for new documents, it will create a .key file for each document in the input directory. If topics are generated but .key files already exist, the existing topics are used as the gold standard for evaluating the automatically extracted ones.
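Before building a model, it can help to confirm that every document has a matching topic file. The following is a minimal sketch of such a check, based only on the naming rule described above (same base name, .txt and .key extensions). The class name KeyFileCheck and its method are hypothetical helpers and are not part of Maui.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Maui: lists documents that lack a .key file.
public class KeyFileCheck {

    // Returns the sorted names of .txt files in dir that have no matching .key file.
    static List<String> documentsWithoutTopics(Path dir) throws IOException {
        List<String> missing = new ArrayList<>();
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path doc : docs) {
                String name = doc.getFileName().toString();
                // Strip the ".txt" suffix to get the shared base name.
                String base = name.substring(0, name.length() - ".txt".length());
                if (!Files.exists(dir.resolve(base + ".key"))) {
                    missing.add(name);
                }
            }
        }
        Collections.sort(missing);
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Directory to check; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : documentsWithoutTopics(dir)) {
            System.out.println("No .key file for " + name);
        }
    }
}
```

For a training directory the list should be empty. For a directory of new documents, the listed files are the ones for which Maui would create .key files.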
Command line usage
Maui can be used directly from the command line. The general command is:
java maui.main.MauiModelBuilder (or maui.main.MauiTopicExtractor)
-l directory (directory with the data)
-m model (model file)
-v vocabulary (vocabulary name)
-f {skos|text} (vocabulary format)
-w database@server (wikipedia location)
Which class is used depends on the current mode of topic indexing. MauiModelBuilder is used when a topic indexing model is created from documents with existing topics. MauiTopicExtractor is used once a model has been created, to assign topics to new documents.
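When Maui is driven from a script or another program, the command line above can be assembled programmatically. The sketch below builds the argument list from the options listed here and prints it. The class MauiCommand and its methods are hypothetical names for illustration, and the commented-out launch line assumes that Maui and its dependencies are already on the Java classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Maui: assembles a Maui command line.
public class MauiCommand {

    // Builds the argument list for one run; options passed as null are left out.
    static List<String> build(String mainClass, String dataDir, String model,
                              String vocabulary, String format, String wikipedia) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add(mainClass);
        addOption(cmd, "-l", dataDir);
        addOption(cmd, "-m", model);
        addOption(cmd, "-v", vocabulary);
        addOption(cmd, "-f", format);
        addOption(cmd, "-w", wikipedia);
        return cmd;
    }

    // Appends a flag and its value only when the value is present.
    private static void addOption(List<String> cmd, String flag, String value) {
        if (value != null) {
            cmd.add(flag);
            cmd.add(value);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build("maui.main.MauiModelBuilder",
                "data/term_assignment/train/", "assignment_model",
                "agrovoc", "skos", null);
        System.out.println(String.join(" ", cmd));
        // To actually launch the command (assumes Maui is on the classpath):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the argument list separate from the launch step makes it easy to reuse the same model, vocabulary and format options for MauiModelBuilder and MauiTopicExtractor, changing only the class name and the data directory.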
Examples with experimental data are supplied in the Maui package. The following commands refer to the directories with this data. They correspond to different topic indexing tasks:
1. Automatic tagging and keyphrase extraction - when topics are extracted from document text itself.
MauiModelBuilder -l data/automatic_tagging/train/
-m tagging_model
MauiTopicExtractor -l data/automatic_tagging/test/
-m tagging_model
2. Term assignment - when topics are taken from a controlled vocabulary in SKOS format.
MauiModelBuilder -l data/term_assignment/train/
-m assignment_model
-v agrovoc
-f skos
MauiTopicExtractor -l data/term_assignment/test/
-m assignment_model
-v agrovoc
-f skos
3. Topic indexing with Wikipedia - when topics are Wikipedia article titles. Note in this case WikipediaMiner needs to be installed and running first.
MauiModelBuilder -l data/wikipedia_indexing/train/
-m indexing_model
-v wikipedia
-w enwiki@localhost
MauiTopicExtractor -l data/wikipedia_indexing/test/
-m indexing_model
-v wikipedia
-w enwiki@localhost
For terminology extraction use the command line argument -n set to a high value to extract all possible candidate topics in the document.
It's a nice concept. I was testing the app and am still confused by some parts (such as ModuleBuilder: what type of data shall I provide to it to create a module?). I am expecting some more details, documentation and examples.
When you download Maui, there is a directory "data" in the main directory. It contains sample documents for different tasks that Maui performs. If you perform term assignment, i.e. assigning keywords using a pre-defined vocabulary, go into "data/term_assignment". The directory "train" contains example data required for building a model. You can test ModelBuilder on the contents of this directory.
I hope it's clear. If you have other questions let me know.
I've been testing Maui for term assignment. I have around 1000 product and service descriptions and would like Maui to assign each of them one of a handful of terms such as beauty, health, arts, restaurants, etc.
I do not have a vocabulary for this domain and I'm trying to create one manually. Is there any tool to help me build the vocabulary by parsing product descriptions, looking for keywords, augmenting them with WordNet, etc.? I've tried ThManager but I've found it to be unusable.
Is it possible to use Wikipedia Miner and then restrict the results to a handful of categories?
Hi Adarsh,
unfortunately, I don't know any great tools for this. Personally, I use custom written scripts in my consulting work.
There is one tool for creating and maintaining vocabularies called PoolParty: http://poolparty.punkt.at/
It has a nice visual interface, but is not free.
If I were you, I would just use a text file and copy and paste the RDF structure from an existing vocabulary.
You can do it very easily in just a few hours.
Wikipedia Miner is a great alternative, but I wouldn't restrict the categories. They should be useful as they are. It does take a bit of time to set up this tool.
Did you try to do keyphrase extraction instead?
Have a look at the latest Maui demo:
http://maui-indexer.appspot.com/
I've tried keyphrase extraction + tagging, but its accuracy is currently low. I wonder if I am training the model correctly. I trained it on around 150 files, each having a ~200 word description mapping to a single category in the .key file. How can I increase accuracy? Add more .txt files? Or more tags per .key file?
Hi Adarsh,
please send me an email with a sample of your training set and describe how you run Maui (which options are on). I will try to help.
It's my last name on Gmail.
Alyona
Hi, I've tried to extract keywords and keyphrases (no vocabulary) from some documents.
The system returned some keywords, but when I tried to follow the link to see the broader and narrower terms of a given concept, it always gave me an error.
Could you please tell me what I'm doing wrong?
thank you
Which links are you following?
If no vocabulary is used, Maui doesn't have access to broader and narrower terms.
Hi, I'm using the option: 2. Select a vocabulary: Keywords and keyphrases - no vocabulary.
Then I run Maui. It returns the document keywords, but when I click on a keyword to see the broader and narrower terms, it returns the following message:
The requested URL /content%20management was not found on this server.
Alyona,
How can I use the Maui indexer on the command line in the same fashion as the online demo for automatic tagging (http://maui-indexer.appspot.com/)?
Alyona, how do I use Maui the same way as the online demo? I would like to pass free text (not prepared as ".key") and have topics returned. Thank you
Dear all, please post all your questions on Maui usage to: http://groups.google.com/group/kea-and-maui-support
Thanks.