Working with Mallet – Day 1

For the uninitiated:

“MALLET” stands for MAchine Learning for LanguagE Toolkit. It is a collection of Java classes that help with natural language processing and information extraction tasks using machine learning. It can be used for experimentation and evaluation as well as for application development.

It is open source software, released under the “Common Public License”.

Home page here.

I am not a “technical blogger”. But I thought this would help someone who is new to “Mallet” and working on simpler tasks. I am not going to describe anything I “invented”, for there is nothing like that. I will just mention some links here that give a good idea of how to write code which uses Mallet’s classes for your tasks.

And yes, there is nothing here to “display”. It is just a log that I can revisit when I have to work on Mallet again.

Coming to the point: after installing, compiling and building Mallet, let us take a simple document classification example to work with.

The Task:

You have a set of documents, each tagged with a category, as your training data.
Using this data, the task is to predict the category of any new document that comes as input.

“Mallet” comes with a set of command line tools to perform these tasks. However, if we want to go beyond that, we can use Mallet’s API to achieve our goals. While browsing through their site, I came across some code samples that help in doing that.

Here are the steps:
-First, convert your labeled document collection into “mallet” format. How to arrange the documents is explained here. Which approaches you can use to train on the labeled collection is also explained on the same page.

On the command line, this can be achieved with bin/mallet import-file or bin/mallet import-dir, along with their options, depending on how your training data is organized.

This step gives us a .mallet file, which has our data in a form understandable by mallet.

More details can also be found on the “Professionalization of Mallet”, aka “pallet”, page on Google Code, here.
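As a rough illustration of the "arrange your data" part, here is a minimal sketch in Python (Mallet itself is Java; this is only a data-preparation helper, and none of it comes from the Mallet pages). It assumes the one-instance-per-line layout that import-file reads by default, where each line holds a name, a label and the text, separated by whitespace. The function names, the sample documents and the file name training.txt are all made up for the example.

```python
import re
from pathlib import Path


def to_instance_line(name, label, text):
    """Format one document as a single 'name label text' line.

    Assumption: import-file is used with its default settings, so the
    first token is the instance name, the second the label, and the
    rest of the line is the document text.
    """
    for field, value in (("name", name), ("label", label)):
        if not value or re.search(r"\s", value):
            raise ValueError(f"{field} must be non-empty and contain no whitespace")
    # Collapse newlines and repeated spaces so the document stays on one line.
    flat_text = " ".join(text.split())
    return f"{name} {label} {flat_text}"


def write_instances(docs, out_path):
    """Write (name, label, text) triples to out_path, one instance per line."""
    lines = [to_instance_line(name, label, text) for name, label, text in docs]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    # Hypothetical toy collection, only for illustration.
    sample_docs = [
        ("doc1", "sports", "The match went to\nextra time."),
        ("doc2", "politics", "Parliament debated the new bill."),
    ]
    count = write_instances(sample_docs, "training.txt")
    print(f"wrote {count} instances to training.txt")
```

The resulting file could then be imported with something like: bin/mallet import-file --input training.txt --output training.mallet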

-Next, use a classifier that will learn “how to classify a new document” from this data. On the command line, it is simple.
Run: bin/mallet train-classifier --input training.mallet --output-classifier my.classifier
(where training.mallet is the file you got from the previous step, and my.classifier is the file that will contain your trained classifier).
[Here we can choose the classifier we want to try out: NaiveBayes, MaxEnt, etc.]

We can also split the training data, train on one part and evaluate the classifier on the rest as test data. We can compare the performance of various classifiers in the same way.
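To make the split-and-compare idea concrete, here is a small sketch in Python that only assembles the train-classifier command line. It relies on the --trainer and --training-portion options of train-classifier; the helper name, its parameters and the bin/mallet path are assumptions made up for this example, and the script only prints the command instead of running it.

```python
def train_classifier_command(input_file, trainers=("NaiveBayes",),
                             training_portion=None, output_classifier=None,
                             mallet_bin="bin/mallet"):
    """Build the argument list for 'mallet train-classifier'.

    trainers:          one --trainer option is emitted per entry
    training_portion:  fraction of the data used for training; the rest
                       is held out for testing
    output_classifier: optional file to save the trained classifier to
    """
    cmd = [mallet_bin, "train-classifier", "--input", input_file]
    for trainer in trainers:
        cmd += ["--trainer", trainer]
    if training_portion is not None:
        if not 0.0 < training_portion <= 1.0:
            raise ValueError("training_portion must be in (0, 1]")
        cmd += ["--training-portion", str(training_portion)]
    if output_classifier:
        cmd += ["--output-classifier", output_classifier]
    return cmd


if __name__ == "__main__":
    # Compare two classifiers on a 90/10 train/test split.
    cmd = train_classifier_command(
        "training.mallet",
        trainers=("MaxEnt", "NaiveBayes"),
        training_portion=0.9,
    )
    print(" ".join(cmd))
```

With Mallet installed, the returned list could be passed to subprocess.run(cmd, check=True) from the Mallet directory to actually train and print the evaluation.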

If you want to write your own code instead of using the command line, here is the best place to start. This Google Code page also gives enough help.

-Step 3: How do we use this model now?
The links mentioned at the end of the previous step give a lot of information on this.
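On the command line, a saved classifier can be applied to new documents with bin/mallet classify-file (or classify-dir). Below is a minimal Python sketch for reading the predictions back. It assumes the output is tab-delimited, with the instance name followed by label and score pairs; please check this against the output of your own Mallet version. The function name and the sample line are made up for illustration.

```python
def parse_prediction_line(line):
    """Parse one line of classifier output.

    Assumed layout (tab separated): name, label1, score1, label2, score2, ...
    Returns (name, best_label, scores), where scores maps label -> float.
    """
    fields = line.rstrip().split("\t")
    name, rest = fields[0], fields[1:]
    if len(rest) < 2 or len(rest) % 2 != 0:
        raise ValueError(f"expected label/score pairs after the name: {line!r}")
    scores = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest), 2)}
    # The predicted category is the label with the highest score.
    best_label = max(scores, key=scores.get)
    return name, best_label, scores


if __name__ == "__main__":
    # Hypothetical output line, only for illustration.
    sample = "doc7\tsports\t0.8\tpolitics\t0.2\n"
    name, label, scores = parse_prediction_line(sample)
    print(name, label, scores)
```

The lines to parse could come from something like: bin/mallet classify-file --input new_docs.txt --output predictions.txt --classifier my.classifier (the file names here are placeholders).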

End user comments:
Oh, I am very satisfied with my first experience with Mallet. Strictly speaking it is not my first: I remember playing with it long ago, in 2006 I think, though only for a few minutes.

Maybe because this is a purely academic endeavour, there are few discussion boards or support groups for “mallet” online.

Anyway, maybe as I begin doing more complex tasks, I’ll find more interesting stuff in Mallet that I can blog about soon.

Published on April 29, 2010 at 5:33 pm

3 Comments

  1. Topic modeling in Mallet is also very good… do try that as well…

  2. How far have you got? 🙂

  3. thanks sowmya u gave me a better idea of mallet
