Notes from EACL 2014

(This is a note-taking post. It may not be of particular interest to anyone.)


I was at EACL 2014 this week, in Gothenburg, Sweden. I am yet to read most of the papers that interested me in detail, but I thought it's a good idea to list them down.

I attended the PITR workshop and noticed that there were more interested people, both among the authors and in the audience, compared to last year. Despite the inconclusive panel discussion, I found the whole event interesting and stimulating, primarily because of the diversity of topics presented. There seems to be an increasing interest in performing eye-tracking experiments in this area. Some papers that particularly interested me:

One Step Closer to Automatic Evaluation of Text Simplification Systems by Sanja Štajner, Ruslan Mitkov and Horacio Saggion

An eye-tracking evaluation of some parser complexity metrics – Matthew J. Green

Syntactic Sentence Simplification for French – Laetitia Brouwers, Delphine Bernhard, Anne-Laure Ligozat and Thomas Francois

An Open Corpus of Everyday Documents for Simplification Tasks – David Pellow and Maxine Eskenazi

An evaluation of syntactic simplification rules for people with autism – Richard Evans, Constantin Orasan and Iustin Dornescu

(If anyone has read this far and is interested in any of these papers, they are all open-access and can be found online by searching for their titles.)


Moving on to the main conference papers, I am listing here everything that piqued my interest, from papers I know only by title for the moment to those whose authors I heard talk about the work.

Parsing, Machine Translation, etc.:

* Is Machine Translation Getting Better over Time? – Yvette Graham; Timothy Baldwin; Alistair Moffat; Justin Zobel

* Improving Dependency Parsers using Combinatory Categorial Grammar – Bharat Ram Ambati; Tejaswini Deoskar; Mark Steedman

* Generalizing a Strongly Lexicalized Parser using Unlabeled Data – Tejaswini Deoskar; Christos Christodoulopoulos; Alexandra Birch; Mark Steedman

* Special Techniques for Constituent Parsing of Morphologically Rich Languages – Zsolt Szántó; Richárd Farkas

* The New Thot Toolkit for Fully-Automatic and Interactive Statistical Machine Translation – Daniel Ortiz-Martínez; Francisco Casacuberta

* Joint Morphological and Syntactic Analysis for Richly Inflected Languages – Bernd Bohnet, Joakim Nivre, Igor Boguslavsky, Richard Farkas, Filip Ginter and Jan Hajic

* Fast and Accurate Unlexicalized parsing via Structural Annotations – Maximilian Schlund, Michael Luttenberger and Javier Esparza

Information Retrieval, Extraction stuff:

* Temporal Text Ranking and Automatic Dating of Text – Vlad Niculae; Marcos Zampieri; Liviu Dinu; Alina Maria Ciobanu

* Easy Web Search Results Clustering: When Baselines Can Reach State-of-the-Art Algorithms – Jose G. Moreno; Gaël Dias


* Now We Stronger than Ever: African-American English Syntax in Twitter – Ian Stewart

* Chinese Native Language Identification – Shervin Malmasi and Mark Dras

* Data-driven language transfer hypotheses – Ben Swanson and Eugene Charniak

* Enhancing Authorship Attribution by utilizing syntax tree profiles – Michael Tschuggnall and Günter Specht

* Machine reading tea leaves: Automatically Evaluating Topic Coherence and Topic model quality by Jey Han Lau, David Newman and Timothy Baldwin

* Identifying fake Amazon reviews as learning from crowds – Tommaso Fornaciari and Massimo Poesio

* Using idiolects and sociolects to improve word predictions – Wessel Stoop and Antal van den Bosch

* Expanding the range of automatic emotion detection in microblogging text – Jasy Suet Yan Liew

* Answering List Questions using Web as Corpus – Patricia Gonçalves; Antonio Branco

* Modeling unexpectedness for irony detection in twitter – Francesco Barbieri and Horacio Saggion

* SPARSAR: An Expressive Poetry reader – Rodolfo Delmonte and Anton Maria Prati

* Redundancy detection in ESL writings – Huichao Xue and Rebecca Hwa

* Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules – Advaith Siddharthan and Angrosh Mandya

* Verbose, Laconic or Just Right: A Simple Computational Model of Content Appropriateness under length constraints – Annie Louis and Ani Nenkova

* Automatic Detection and Language Identification of Multilingual Document – Marco Lui, Jey Han Lau and Timothy Baldwin

Now, in the coming days, I should at least try to read the intros and conclusions of some of these papers. 🙂

Published on May 2, 2014 at 3:10 pm

“Linguistically Naive != Language Independent” and my soliloquy

This post is about a paper that I read today (which inspired me to write a real blog post after months!)

The paper: Linguistically Naive != Language Independent: Why NLP Needs Linguistic Typology
Author: Emily Bender
Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics, pages 26–32. ACL.

In short, this is a position paper that argues that incorporating linguistic knowledge is a must if we want to create truly language independent NLP systems. Now, on the surface, that looks like a contradictory statement. Well, it isn’t... and it is common sense, in... er... some sense 😉

So, time for some background: an NLP algorithm that offers a solution to some problem is called language independent if the approach can work for any language other than the one for which it was initially developed. One common example is Google Translate. It is a practical example of how an approach can work across multiple language pairs (with varying effectiveness of course, but that is a different matter). The point of these language independent approaches is that, in theory, you can just apply the algorithm to any language as long as you have the relevant data about that language. However, such approaches in contemporary research typically leave out any linguistic knowledge in their modeling and thereby make them “language” independent.

Now, what the paper argues for is clear from the title – “linguistically naive != language independent”.

I liked the point made in Section 2: in some cases, the surface appearance of language independence actually hides a language dependence. The specific example of n-grams, which work efficiently only for languages with certain kinds of properties and yet come with a claim of language independence, nailed down the point. Over a period of time, I became averse to the idea of using n-grams for each and every problem, as I thought this does not give any useful insights from either a linguistic or a computational perspective (this is my personal opinion). However, although I did think of this language dependent aspect of n-grams, I never put it clearly this way and just accepted the “language independence” claim. Now, this paper changed that acceptance. 🙂

One good thing about this paper is that it does not stop there. It also describes approaches that use language modeling but do slightly more than n-grams to accommodate various types of languages (factored language models), and also talks about how a “one size fits all” approach won’t work. There is this gem of a statement:
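To make the n-gram point a bit more concrete, here is a minimal sketch. It is my own toy illustration and not something taken from the paper: the function names and the four example sentences are assumptions made up for this post, and two sentences per language prove nothing statistically. The sketch collects word bigrams from a "training" sentence and checks what fraction of the bigrams in a "test" sentence with swapped roles were never seen. In the English pair, word order carries the roles and most bigrams recur; in the Latin pair, case endings carry the roles, so the surface forms change and none of the bigrams recur.

```python
def bigrams(tokens):
    """Return the list of adjacent word pairs in a token list."""
    return list(zip(tokens, tokens[1:]))


def unseen_bigram_rate(train_sentences, test_sentences):
    """Fraction of test bigrams that never occur in the training sentences."""
    seen = set()
    for sentence in train_sentences:
        seen.update(bigrams(sentence.split()))

    test = [bg for sentence in test_sentences for bg in bigrams(sentence.split())]
    if not test:
        return 0.0
    unseen = sum(1 for bg in test if bg not in seen)
    return unseen / len(test)


# Toy data (hypothetical, for illustration only): the same two events,
# "the dog sees the cat" and "the cat sees the dog", in English and Latin.
english_rate = unseen_bigram_rate(["the dog sees the cat"],
                                  ["the cat sees the dog"])
latin_rate = unseen_bigram_rate(["canis felem videt"],
                                ["feles canem videt"])

print(english_rate)  # 0.25: only ("cat", "sees") is new
print(latin_rate)    # 1.0: every bigram is new, since the case endings changed
```

A word-level n-gram model quietly relies on surface word forms and their order repeating, which is an assumption about language structure even though no linguistic knowledge was explicitly put in.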

“A truly language independent system works equally well across languages. When a system that is meant to be language independent does not in fact work equally well across languages, it is likely because something about the system design is making implicit assumptions about language structure. These assumptions are typically the result of “overfitting” to the original development language(s).”

Now, there is this section on language independence claims and the representation of languages belonging to various families in the papers of ACL 2008. It concludes by saying:
“Nonetheless, to the extent that language independence is an important goal, the field needs to improve both its testing of language independence and its sampling of languages to test against.”

Finally, the paper talks about one form of linguistic knowledge that can be incorporated in NLP systems, linguistic typology, and gives pointers to some useful resources and relevant research in this direction.

And I too conclude the post with the two main points that I hope people in the research community noticed:

(1) “This paper has briefly argued that the best way to create language-independent systems is to include linguistic knowledge, specifically knowledge about the ways in which languages vary in their structure. Only by doing so can we ensure that our systems are not overfitted to the development languages.”

(2) “Finally, if the field as a whole values language independence as a property of NLP systems, then we should ensure that the languages we select to use in evaluations are representative of both the language types and language families we are interested in.”

A good paper, with a considerable amount of food for thought! These are important design considerations, IMHO.

The extended epilogue:

At NAACL-2012, there was this tutorial titled “100 Things You Always Wanted to Know about Linguistics But Were Afraid to Ask“, by Emily Bender. At that time, although I could in theory have attended the conference, I could not, as I had to go to India. But this was one tutorial that caught my attention with its name and description, and I really wanted to attend it.

Thanks to a colleague who attended, I managed to see the slides of the tutorial (which I later saw on the professor’s website). Last week, during some random surfing, I realized that an elaborate version was released as a book:

Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
by Emily Bender
Pub: Synthesis Lectures on Human Language Technologies, Morgan and Claypool Publishers

I happily borrowed the book through inter-library loan, and it travelled for a few days from somewhere in Lower Saxony to reach me here in Baden-Württemberg. Just imagine, it travelled all the way just for my sake! 😉 😛

So, I started to go through the book. Even in the days when I lacked any basic knowledge of this field, I always felt that natural language processing should involve some form of linguistic modeling by default. However, most of the successful so-called “language independent” approaches (some of which also became products we use regularly, like Google Translate and Transliterate) never speak about such linguistic modeling (at least, not many that I read).

There is also this Norvig vs Chomsky debate, which I keep getting reminded of when I think of this topic. (Neither of them is wrong in my view, but that is not the point here.)

In this context, I found the paper particularly worth sharing. Anyway, I perhaps should end the post. While reading the introductory parts of Emily Bender’s book, I found a reference to the paper, and this blog post came out of that reading experience.

Published on January 23, 2014 at 5:04 pm

MLSS 2013 – Week 1 recap

I am attending this year’s Machine Learning Summer School and we just finished one week of lectures. I thought now is the moment to look back and note down my thoughts (mainly because we thankfully don’t have lectures on Sundays!). One more week to go and I am already very glad that I am here listening to all these amazing people who are undoubtedly some of the best researchers in this area. There is also a very vibrant and smart student community.

Until Saturday evening, my thoughts on the summer school focused more on the content of the sessions. They were mostly about the mathematics in the sessions, my comfort and discomfort with it, its relevance, understanding its conceptual basis, etc. I won’t make claims that I understood everything. I understood some talks better, some talks not at all. I also understood that things could have been much better for me if we had been told why we actually needed to follow all the Engineering Mathematics courses seriously during my bachelor’s ;).

However, coming to the point, as I listened to the Multilayer Nets lecture by Leon Bottou on Saturday afternoon, there was something that I found particularly striking. It looks like two things that I always thought of as possibly interesting aspects of Machine Learning are not really a part of the real machine learning community. (Okay, one summer school is not a whole community but I did meet some people who have been in that field of research for years now).

1) What exactly are you giving as input for the machine to learn? Shouldn’t we give the machine proper input for it to learn what we expect it to learn?

2) Why isn’t the interpretability of a model an issue worth researching about?

Let me elaborate on these.

Coming to the first one, this is called “Feature Engineering”. The answer that I heard from one senior researcher to this question was: “We build algorithms that will enable the machine to learn from anything. Features are not our problem. The machine will figure that out.” But won’t the machine need the right ecosystem for that? If I grow up in a Telugu speaking household and get exposed to Telugu input all the time, will I be expected to learn Telugu or Chinese? Likewise, if we want to construct a model that does a specific task, is it not our responsibility to prepare the input for that? Okay, we can build systems that figure out by themselves the features that work. But won’t that make the machine learn anything from the several possible problem subspaces, instead of the specific issue we want it to learn? Yes, there are always ways to assess if it’s learning the right thing. But that’s not the point. In a way, this connects again to the second question.

I am not knowledgeable enough in this field to come up with a well-argued response to the above comment by the senior researcher. It is also a fact that there is enough evidence that this approach does work in some scenarios. But this is a general question on the applicability of the models, issues regarding domain adaptation if any, etc. I found so little literature on theoretical aspects connecting feature engineering to algorithm design, and hence these basic doubts.

The second question is also something that I have been thinking about for a long time now. Are people really not bothered about how those who apply Machine Learning in their fields interpret their models or am I bad at searching for the right things? Why is there no talk about the interpretability of models? I did find a small amount of literature on “Human comprehensible machine learning” and related research, but not much.

I am still in the process of thinking, reading and understanding more on this topic. I will perhaps write another detailed post soon (with whatever limited awareness I have on this topic). But, in the meanwhile,

* Here is a blogpost by a grad student, that has some valid points on interpretability of models.

* “Machine Learning that matters“, ICML 2012 position paper by Kiri Wagstaff. This is something that I keep getting back to time and again, whenever I get into thinking about these topics. Not that the paper answers my questions.. it keeps me motivated to think on them.

* An older blogpost on the above paper which had some good discussion in the comments section.

With these thoughts, we march towards the second week of awesomeness at MLSS 2013 :-).

Published on September 1, 2013 at 3:31 pm

Notes from ACL

This is the kind of post that would perhaps not interest anyone except me. I was at ACL (in a very interesting city called Sofia, the capital of Bulgaria) last week and I am still in the process of making some notes on the papers that interested me, abstracts that raised my curiosity, short and long term interest topics etc. I thought it’s probably a better idea to arrange at least the titles in some subgroups and save them somewhere so that it would be easy for me to get back to them later. I did not read all of them completely. In fact, for a few of them, I did not even go beyond the abstract. So, don’t ask me questions. Anyone who is interested in any of these titles can either read them by googling for them or visit the ACL anthology page for ACL’13 and find the pdfs there.

The first two sections below are my current topics of interest. The third one is a general topic of interest. The fourth one includes everything else…that piqued my interest. The fifth section is on teaching CL/NLP…which is also a long term interest topic for me. The final section is about workshops as a whole that I have interest in.


Various dimensions of the notion of text difficulty, readability
* Automatically predicting sentence translation difficulty – Mishra and Bhattacharyya
* Automatic detection of deception in child produced speech using syntactic complexity features – Yancheva and Rudzicz
* Simple, readable sub sentences – Klerke and Sogaard
* Improving text simplification language modeling using Unsimplified text data – Kauchak
* Typesetting for improved readability using lexical and syntactic information – Salama
* What makes writing great?: First experiments on Article quality prediction in the science journalism domain – Louis and Nenkova
* Word surprisal predicts N400 amplitude during reading – Frank
* An analysis of memory based processing costs using incremental deep syntactic dependency parsing – van Schijndel

Language Learning, Assessment etc.
* Discriminative Approach to fill-in-the-blank quiz generation for language learners
* Modeling child divergences from Adult Grammar with Automatic Error Correction
* Automated collocation suggestion for Japanese second language learners
* Reconstructing an Indo-European family tree from non-native English texts
* Word association profiles and their use for automated scoring of essays – Klebanov and Flor
* Grammatical error correction using Integer Linear programming
* A learner corpus based approach to verb suggestion for ESL
* Modeling thesis clarity in student essays – Persing & Ng
* Computerized analysis of a verbal fluency test – Szumlanski
* Exploring word class n-grams to measure language development in children – Ramirez-de-la-Rosa

NLP for other languages:
* Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison – Esmaili and Salavati
* Identifying English and Hungarian light verb constructions: A contrastive approach – Vincze
* Real-world semi-supervised learning of POS taggers for low-resource languages – Garrette
* Learning to lemmatize Polish noun phrases – Radziszewski
* Sentence level dialect identification in Arabic – Elfardy and Diab

Everything else that piqued my interest:
* Exploring Word Order Universals: a probabilistic graphical model approach – Xia Lu
* An opensource toolkit for quantitative historical linguists
* SORT: An improved source rewriting tool for improved translation
* Unsupervised consonant-vowel prediction over hundreds of languages
* Linguistic models for analyzing and detecting biased language
* Earlier identification of Epilepsy surgery candidates using natural language processing – Matykiewicz
* Parallels between linguistics and biology. Chakraborti and Tendulkar
* Analysing lexical consistency in translation – Guillou
* Associative texture is lost in translation – Klebanov and Flor

Teaching CL, NLP:
* Artificial IntelliDance: Teaching Machine learning through choreography, Agarwal and Trainor
* Treebanking for data-driven research in the classroom, Lee
* Learning computational linguistics through NLP evaluation events: the experience of Russian evaluation initiative. Bonch-Osmolovskaya
* Teaching the basics of NLP and ML in an introductory course to Information Science. Agarwal.

Whole workshops and competitions:
* Shared task on quality estimation in Machine translation
* Predicting and improving textual readability for target reader populations (PITR 2013)

Published on August 14, 2013 at 9:09 am