Here is a list of changes in Maui 1.2:
- Input files are now read using the Apache Commons IO package. This makes reading the data around 10 times faster and also saves many lines of code.
- Vocabularies are now stored in gzip format (as *.gz) and are read in using GZIPInputStream. This saves a lot of space, because the SKOS format tends to repeat the same characters over and over. In fact, the vocabularies are now so small that I could easily include them in the distribution. The SKOS files in data/vocabularies were created by removing all information that is irrelevant to Maui from the original files and then compressing them.
- Stopwords are now initialized from a supplied file, rather than from a hard-coded list. Users can thus supply their own stopwords and blacklisted terms.
- I wrote a new class, MauiWrapper, that shows how to apply Maui to a single text file or a text string. Another new class, MauiWrapperFactory, shows how to use MauiWrapper with several vocabularies at the same time. These classes make it easy to create web services that use Maui to identify the main topics in text supplied by a web client.
- Finally, I have also generated a few models in the data/models directory for those who don't have their own training data.
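
For readers curious about the gzip change above, here is a minimal sketch of reading a compressed text file line by line with GZIPInputStream from the Java standard library. It is not Maui's actual code: the class name GzipVocabularyDemo, both helper methods, the file name and the sample SKOS-like lines are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of the Maui distribution.
class GzipVocabularyDemo {

    // Writes the given text to a gzip-compressed file (UTF-8).
    static void writeGzipped(Path file, String text) {
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)),
                StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a gzip-compressed text file and returns its lines.
    static List<String> readGzippedLines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed location and made-up content, used only to make the sketch runnable.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "demo-vocabulary.rdf.gz");
        writeGzipped(file,
                "<skos:Concept rdf:about=\"#c1\"/>\n<skos:Concept rdf:about=\"#c2\"/>\n");
        for (String line : readGzippedLines(file)) {
            System.out.println(line);
        }
    }
}
```

Because the decompression happens inside the stream, the code that consumes the lines does not need to know whether the file on disk was compressed.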