Language independence in NLP – some thoughts

In the last session of our reading group, we discussed the following article:

title: “On Achieving and Evaluating Language-Independence in NLP”
author: Emily Bender
Linguistic Issues in Language Technology, 2011.
url here
The article is an extended version of a 2009 writeup, about which I wrote here.

To summarize in a few words, the article first discusses what language independence in natural language processing system development means, in theory and in practice. Then, taking linguistic typology as a source of knowledge, it suggests some do’s and don’ts for NLP researchers working on the development of language-independent systems. I liked the idea that true language independence is possible only through the incorporation of linguistic knowledge into the system design. It took me only a few seconds to be convinced of this when I read the 2009 paper, and my opinion has not changed since. My experience working with non-English datasets in the meantime has only strengthened it.

Reading the article this time with a group of people who have a linguistic rather than an engineering background gave me a new perspective. The most important thing I noticed is this: I think it is fairly common in CS-based NLP communities to claim language independence by assuming that an approach that works on one or two languages, often closely related ones, will work on any other language. I never knew what linguists think about that. The linguists in our group first wondered how anyone can claim language independence in general, and how difficult it is to claim language independence. We even briefly went into a philosophical discussion. As someone who started with NLP in a CS department, I should confess I had never thought of it like this until now. People in so many NLP papers claim language independence in an off-hand manner, and I suddenly started seeing why it could be a myth. That was the “aha” moment of the day.

Anyway, coming back to the paper: after Section 4, which contains the do’s and don’ts, I found Section 5 incomplete. It attempts to explain how computational linguistics is useful for typology and vice versa, but I did not get a complete picture. There have been a couple of recent papers that test the applicability of their approaches on multiple languages from different language families (two examples I can think of from 2015 are Soricut and Och, 2015 and Müller and Schütze, 2015).

Nevertheless, it is a very well-written article and a must-read for anyone who has wondered whether all the claims of language independence are really true, and whether there are really no implicit considerations that favor some languages over others in the development of natural language processing systems.

Thanks to Maria, Simon, Xiaobin and Marti for a very interesting discussion!

Published on November 21, 2015 at 6:50 pm

One Comment

  1. For almost 5 months, why is the blog silent?
    What happened to the multi faceted blogs?
    Why is there abstinence on other blogs too?
