MLSS 2013 – Week 1 recap

I am attending this year’s Machine Learning Summer School, and we have just finished one week of lectures. I thought now was the moment to look back and note down my thoughts (mainly because we thankfully don’t have lectures on Sundays!). There is one more week to go, and I am already very glad to be here listening to all these amazing people, who are undoubtedly some of the best researchers in this area. There is also a very vibrant and smart student community.

Until Saturday evening, my thoughts on the summer school focused more on the content of the sessions. They were mostly about the mathematics in the sessions, my comfort and discomfort with it, its relevance, understanding its conceptual basis, etc. I won’t claim that I understood everything. I understood some talks better, some talks not at all. I also understood that things could have been much better for me if we had been told why we should seriously follow all the Engineering Mathematics courses during our bachelor’s ;).

However, coming to the point: as I listened to the Multilayer Nets lecture by Leon Bottou on Saturday afternoon, I found something particularly striking. It looks like two things that I always thought of as possibly interesting aspects of Machine Learning are not really a concern of the real machine learning community. (Okay, one summer school is not a whole community, but I did meet some people who have been in that field of research for years now.)

1) What exactly are you giving as input for the machine to learn? Shouldn’t we give the machine proper input for it to learn what we expect it to learn?

2) Why isn’t the interpretability of a model an issue worth researching?

Let me elaborate on these.

Coming to the first one, this is called “Feature Engineering”. The answer that I heard from one senior researcher to this question was: “We build algorithms that will enable the machine to learn from anything. Features are not our problem. The machine will figure that out.” But won’t the machine need the right ecosystem for that? If I grow up in a Telugu-speaking household and get exposed to Telugu input all the time, will I be expected to learn Telugu or Chinese? Likewise, if we want to construct a model that does a specific task, is it not our responsibility to prepare the input for that? Okay, we can build systems that figure out by themselves which features work. But won’t that make the machine learn anything from the several possible problem subspaces, instead of the specific issue we want it to learn? Yes, there are always ways to assess if it’s learning the right thing. But that’s not the point. In a way, this connects again to the second question.

I am not knowledgeable enough in this field to come up with a well-argued response to the senior researcher’s comment above. The fact of the matter is also that there is enough evidence that this approach does work in some scenarios. But this is a general question about the applicability of the models, issues regarding domain adaptation (if any), etc. I found so little literature on theoretical aspects connecting feature engineering to algorithm design, hence these basic doubts.

The second question is also something that I have been thinking about for a long time now. Are people really not bothered about how those who apply Machine Learning in their fields interpret their models, or am I bad at searching for the right things? Why is there no talk about the interpretability of models? I did find a small amount of literature on “Human comprehensible machine learning” and related research, but not much.

I am still in the process of thinking, reading and understanding more on this topic. I will perhaps write another detailed post soon (with whatever limited awareness I have of this topic). But, in the meanwhile,

* Here is a blogpost by a grad student that has some valid points on the interpretability of models.

* “Machine Learning that Matters”, ICML 2012 position paper by Kiri Wagstaff. This is something that I keep getting back to time and again, whenever I get into thinking about these topics. Not that the paper answers my questions, but it keeps me motivated to think about them.

* An older blogpost on the above paper, which had some good discussion in the comments section.

With these thoughts, we march towards the second week of awesomeness at MLSS 2013 :-).

Published on September 1, 2013 at 3:31 pm. Comments (1)

One Comment

  1. There’s a separation of concerns between “Machine learning people” in computational linguistics — who try to find good representations for linguistic objects, possibly task specific, and “Machine learning people” in mathematics, who design algorithms that, given a good representation, do learning and inference.
    This separation of concerns is similar to the one between people building and annotating a corpus and those doing feature engineering and running experiments: it’s usually a win-win situation for everyone involved because you can take a bigger corpus and immediately reap the benefit, but also take a better way of representing it and benefit from it, or take superior machine learning machinery and have things improve without any further changes. On the other hand, one feels that there should be something to be gained if the three groups actually talked to each other.
    Contrast this to the old way of doing things (people may call this constraint-based or rule-based or grammar-based, to various extents) where you first collect a corpus, then build an internal model of how things work, and then operationalize both your intuitions of how things work linguistically and how you would do best to infer things into a running program. In that case, if you notice after the fact that linguistic assumption A was stupid, you don’t get anything from it, not more than if you notice that the heuristic B you used for creating rule #42 was kind of stupid and that you should have done it differently from the start.
    Interpretability of machine learning models is difficult in most cases because you’re effectively harnessing correlation effects (e.g. “in the domain that interests you, verb X mostly has this frame, hence an unknown word that follows it is more likely to be a proper name”) rather than purely building a model of how things work. And many benefits of ML come from the fact that you don’t need to be afraid of that, because you can annotate data for a different domain (or, if you’re feeling sophisticated, do domain adaptation).
