Topic indexing blog: Updated release and French term assignment example

It turns out I had two pending issues on Google Code, where I host the Maui algorithm. By default, the project owner does not get a notification!

So today I went ahead and fixed one of the requests: to have everything in a jar file.
I've also updated the example files and added Javadoc documentation. Soon I will publish detailed installation instructions (in addition to the usage instructions), but for now here is just one command-line example. It shows how to create a topic indexing model and apply it to new documents, using term assignment with French documents as the example. Download the latest release of Maui (1.1) and then try this:

java -Xmx1024m -classpath maui-1.0.jar maui.main.FrenchExample

If Maui's libraries are not already on your Java classpath, append the following to the -classpath value, directly after maui-1.0.jar:

":lib/weka.jar:lib/wikipediaminer1.1.jar:lib/trove.jar:lib/jena.jar:lib/icu4j_3_4.jar:lib/iri.jar:lib/xercesImpl.jar:lib/snowball.jar:lib/mysql-connector-java-3.1.13-bin.jar:lib/maxent-2.4.0.jar:lib/commons-logging.jar"

The output should be something like:

-- Building the model...
--- Loading the vocabulary...
--- Building the Vocabulary index from the SKOS file...
...
-- Reading the input documents...
...
--- Computing candidates...
...
--- Building classifier
...
-- Extracting keyphrases...
-- Keyphrases and feature values:
http://www.fao.org/aos/agrovoc#c_4830,'Produit laitier',0,0,0.003276,0.00335,...,False
http://www.fao.org/aos/agrovoc#c_7848,Commerce,0,0,0.003348,...,2,True
http://www.fao.org/aos/agrovoc#c_3919,'Commerce international',...,True
http://www.fao.org/aos/agrovoc#c_4826,Lait,...,False
http://www.fao.org/aos/agrovoc#c_8288,Volume,...,False
http://www.fao.org/aos/agrovoc#c_25201,Usine,...,False
http://www.fao.org/aos/agrovoc#c_714,Australie,...,False
http://www.fao.org/aos/agrovoc#c_8323,'Besoin en eau',...,False
-- 2.0 correct

-- Evaluation results based on 1 document:
Avg. number of correct keyphrases per document: 2 +/- 0
Precision: 25 +/- 0
Recall: 13.33 +/- 0
F-Measure: 17.39
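
The figures above can be checked by hand. The following is a minimal sketch, not Maui's own evaluation code: the class and method names are made up for illustration, and the count of 15 manually assigned terms is not printed anywhere in the output. It is inferred from the recall (2 / 15 = 13.33%). The other two counts come straight from the listing: 8 extracted terms, 2 of them marked True.

```java
import java.util.Locale;

public class EvaluationSketch {

    /** Percentage of extracted terms that the human indexer also assigned. */
    public static double precision(int correct, int extracted) {
        return 100.0 * correct / extracted;
    }

    /** Percentage of the human indexer's terms that were extracted. */
    public static double recall(int correct, int manuallyAssigned) {
        return 100.0 * correct / manuallyAssigned;
    }

    /** Harmonic mean of precision and recall (0 if both are 0). */
    public static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int correct = 2;            // lines ending in True
        int extracted = 8;          // terms listed in the output
        int manuallyAssigned = 15;  // ASSUMPTION: inferred from the reported recall

        double p = precision(correct, extracted);
        double r = recall(correct, manuallyAssigned);

        System.out.printf(Locale.ROOT, "Precision: %.2f%n", p);
        System.out.printf(Locale.ROOT, "Recall: %.2f%n", r);
        System.out.printf(Locale.ROOT, "F-Measure: %.2f%n", fMeasure(p, r));
    }
}
```

With these counts the three values round to 25, 13.33 and 17.39, matching the evaluation results printed by Maui.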

For each test document (in this case, just one is used), Maui outputs one line per extracted term: its Agrovoc ID, the name of the concept (e.g. Besoin en eau), some feature values, and True or False, depending on whether this term has been assigned to the document by a human indexer. The evaluation is based on these values. Because the directory already contains a .key file, Maui does not overwrite it; otherwise it would create one with the automatically generated topics.
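
If you want to post-process these lines yourself, here is a small sketch of how one could be read. It is not part of Maui; the class and method names are hypothetical, and the format (comma-separated fields, single quotes around names that contain spaces, True/False last) is simply read off the sample output above. The feature columns in between are skipped, since the post elides them. Names containing an apostrophe would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyphraseLineSketch {

    /** The three parts of an output line that the post describes. */
    public static class Entry {
        public final String id;        // Agrovoc concept URI
        public final String name;      // concept name, quotes removed
        public final boolean assigned; // True if a human indexer chose it

        Entry(String id, String name, boolean assigned) {
            this.id = id;
            this.name = name;
            this.assigned = assigned;
        }
    }

    /** Splits on commas, keeping single-quoted text together as one field. */
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes;          // toggle, and drop the quote itself
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Reads ID (first field), name (second) and True/False (last field). */
    public static Entry parse(String line) {
        List<String> fields = splitFields(line.trim());
        if (fields.size() < 3) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        String last = fields.get(fields.size() - 1);
        return new Entry(fields.get(0), fields.get(1), Boolean.parseBoolean(last));
    }

    public static void main(String[] args) {
        // Line copied from the output above, with the elided "..." left in place.
        Entry e = parse("http://www.fao.org/aos/agrovoc#c_3919,'Commerce international',...,True");
        System.out.println(e.id + " | " + e.name + " | " + e.assigned);
    }
}
```

Counting the entries with assigned set to true gives the number of correct terms that the evaluation starts from.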

This is of course just a demonstration: the model is trained on only two documents and tested on a third. But at least it shows (I hope!) how simple Maui's usage can be.

Thursday, July 16, 2009
