Sunday, June 28, 2009

How to generate visualizations of topics with Dotty and GraphViz

In a previous post, I showed the Wikipedia article titles that Maui assigned to the introduction of my thesis. To visualize these topics in a more expressive way than a tag cloud, I used WikipediaMiner and Dotty (via GraphViz for Mac).

I have written a script that generates a .gv file with the following content:

graph G {
"Machine learning" -- "Keywords" [style=invis];
"Machine learning" -- "Natural language" [penwidth = "3"];
"Machine learning" -- "Knowledge" [penwidth = "2"];
...
}

The nodes are the topic names, and the edges between them are weighted by the semantic relatedness values computed with WikipediaMiner. Each value is converted into an edge attribute that controls the line width in the generated graph:
  • style=invis if relatedness = 0;
  • otherwise penwidth is used.
The penwidth number is the original relatedness value (between 0 and 1) multiplied by 10 and converted to an integer. For example, a relatedness of 0.3 becomes penwidth = "3".
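The conversion described above can be sketched in a few lines of Java. This is only an illustration, not the script shipped with Maui: the class and method names (RelatednessEdges, edgeLine, quote) are made up for this example, and since the post does not say whether the value is rounded or truncated, the sketch assumes rounding and also hides any edge whose width rounds to 0.

```java
public class RelatednessEdges {

    // Wrap a topic name in double quotes, escaping embedded quotes,
    // so that it can be used as a DOT identifier.
    static String quote(String name) {
        return "\"" + name.replace("\"", "\\\"") + "\"";
    }

    // Build one undirected edge line of the .gv file.
    // Assumption: relatedness is in [0, 1] and is rounded after scaling by 10.
    static String edgeLine(String a, String b, double relatedness) {
        long width = Math.round(relatedness * 10);
        String attribute = (width == 0)
                ? "[style=invis]"                      // keep the edge, but do not draw it
                : "[penwidth = \"" + width + "\"]";    // thicker line = more related
        return quote(a) + " -- " + quote(b) + " " + attribute + ";";
    }

    public static void main(String[] args) {
        System.out.println("graph G {");
        System.out.println(edgeLine("Machine learning", "Keywords", 0.0));
        System.out.println(edgeLine("Machine learning", "Natural language", 0.3));
        System.out.println(edgeLine("Machine learning", "Knowledge", 0.2));
        System.out.println("}");
    }
}
```

Invisible edges are still written to the file because they still take part in the layout, even though they are not drawn.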

Update: Additionally, the top keyphrase returned by the algorithm is defined as the root of the graph (e.g. graph [root="Index (search engine)"] for the graph below). The font size of each node also reflects the rank of the keyphrase as determined by the algorithm.
Click on the graph to see it in full resolution, or view the complete GraphViz script.
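As a rough sketch of how the root and the font sizes could be written out, the Java snippet below emits a graph-level root attribute for the top-ranked keyphrase and a fontsize attribute for every node. The class name RankedNodes, its methods, and the font size formula (32 points for the top keyphrase, 4 points less per rank, never below 10) are assumptions made for illustration; the post does not state the actual mapping from rank to font size. Note that GraphViz documents the root attribute for the twopi and circo layouts.

```java
import java.util.Arrays;
import java.util.List;

public class RankedNodes {

    // Wrap a topic name in double quotes, escaping embedded quotes.
    static String quote(String name) {
        return "\"" + name.replace("\"", "\\\"") + "\"";
    }

    // Hypothetical mapping from rank (0 = top keyphrase) to font size in points.
    static int fontSize(int rank) {
        return Math.max(10, 32 - 4 * rank);
    }

    // Emit the root declaration followed by one node line per keyphrase.
    // Assumption: the list is non-empty and ordered by rank, best first.
    static String nodeSection(List<String> rankedTopics) {
        StringBuilder sb = new StringBuilder();
        sb.append("graph [root=").append(quote(rankedTopics.get(0))).append("];\n");
        for (int rank = 0; rank < rankedTopics.size(); rank++) {
            sb.append(quote(rankedTopics.get(rank)))
              .append(" [fontsize=").append(fontSize(rank)).append("];\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("Index (search engine)", "Machine learning");
        System.out.print(nodeSection(topics));
    }
}
```

These lines would go inside the graph G { ... } block, before the edge definitions.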

The beauty of GraphViz is that the generated graph can be exported into many formats (for example PNG, SVG or PDF), and the vector formats can be scaled to any required size.

Furthermore, the visualization can be generated for any list of topics, as long as they can be mapped to titles of Wikipedia articles:

// first check if there is an article with the title "topic"
Article article = wikipedia.getArticleByTitle(topic);

// otherwise retrieve the most likely mapping, ignoring case
if (article == null) {
    CaseFolder cs = new CaseFolder();
    article = wikipedia.getMostLikelyArticle(topic, cs);
}

The script for generating such graphs is included as part of the Maui software.
