Machine Learning that Matters – Some thoughts.

It’s almost a year since Praneeth sent me this paper and I read it… and began blogging about it. I began re-reading it today, as part of my “evaluating the evaluation” readings, and thought I still have something to say (largely to myself) on some of the points made in this paper.

Machine Learning that Matters
by Kiri L. Wagstaff
Published in the Proceedings of ICML 2012.

This is how it begins:

“Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society”

-I guess the tone and intention of this paper are pretty clear from this first sentence.

I don’t have any issues with the tone as such – but there are so many real-world applications of machine learning these days! That doesn’t mean that every machine learning research problem leads to solving a real-world problem, though, which holds true for any research. So, in my view, the statement above can apply to any research in general.

I was fascinated by these statistics on the hyper-focus on benchmark datasets.

A survey of the 152 non-cross-conference papers published at ICML 2011 reveals:
148/152 (93%) include experiments of some sort
57/148 (39%) use synthetic data
55/148 (37%) use UCI data
34/148 (23%) use ONLY UCI and/or synthetic data
1/148 (1%) interpret results in domain context

-Since I am not into machine learning research but only use ML for computational linguistics problems, I found this to be very interesting… and a very valid point.

Then, the discussion moves on to evaluation metrics:

“These metrics are abstract in that they explicitly ignore or remove problem-specific details, usually so that numbers can be compared across domains. Does this seemingly obvious strategy provide us with useful information?”

-In the discussion that followed, there were some interesting points on what various evaluation metrics fail to capture. I have been reading on this topic of evaluation metrics for supervised machine learning in the recent past… and as with those readings, I am left with the same question here – what is the best evaluation, then? Of course, “real world”. But how do you quantify that? How can there be some kind of evaluation metric that is truly comparable with other peer research groups?
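To make the point concrete for myself, here is a tiny sketch (plain Python, made-up numbers – purely illustrative, not from the paper) of how a single abstract metric can look great while hiding exactly the problem-specific detail that matters:

```python
# Hypothetical imbalanced problem: 990 healthy patients, 10 sick ones.
# A "classifier" that always predicts the majority class:
y_true = [0] * 990 + [1] * 10   # 1 = the rare, important class
y_pred = [0] * 1000             # always predict "healthy"

# Abstract metric: overall accuracy.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Domain-relevant question: of the 10 sick patients, how many did we find?
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(t == 1 for t in y_true)

print(accuracy)  # 0.99 -- looks impressive in a results table
print(recall)    # 0.0  -- every sick patient was missed
```

The 99% accuracy is exactly the kind of comparable-across-domains number the paper is sceptical of: it says nothing about the 10 missed patients, which is the only number the domain cares about.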

I got my answer in the later part of the paper:

“Yet (as noted earlier) the common approach of using the same metric for all domains relies on an unstated, and usually unfounded, assumption that it is possible to equate an x% improvement in one domain with that in another. Instead, if the same method can yield profit improvements of $10,000 per year for an auto-tire business as well as the avoidance of 300 unnecessary surgical interventions per year, then it will have demonstrated a powerful, wide-ranging utility.”

The next part of the discussion is on identifying where machine learning matters:

“It is very hard to identify a problem for which machine learning may offer a solution, determine what data should be collected, select or extract relevant features, choose an appropriate learning method, select an evaluation method, interpret the results, involve domain experts, publicize the results to the relevant scientific community, persuade users to adopt the technique, and (only then) to truly have made a difference”

-Now, I like that. 🙂 🙂

I also liked this point on involving the world outside ML.

“We could also solicit short “Comment” papers, to accompany the publication of a new ML advance, that are authored by researchers with relevant domain expertise but who were uninvolved with the ML research. They could provide an independent assessment of the performance, utility, and impact of the work. As an additional benefit, this informs new communities about how, and how well, ML methods work.”

“Finally, we should consider potential impact when selecting which research problems to tackle, not merely how interesting or challenging they are from the ML perspective. How many people, species, countries, or square meters would be impacted by a solution to the problem? What level of performance would constitute a meaningful improvement over the status quo?”

-Well, I personally share the sentiments expressed here. I like, and want to work on, problems whose solutions can possibly have a real-life impact. However, I consider that my personal choice. But I don’t understand what is wrong with doing something because it’s challenging! What’s wrong with doing research purely for fact-finding? There will be practical implications to certain research problems. There might not be an immediate impact for some. There might not be a direct impact for some. There might not really be a practical impact for some. But should that be the only deciding factor? (Well, of course, when researchers are funded from public taxes, perhaps it is expected to be thus. But should it be thus, always??)

I found the six old and new Machine learning impact challenges really interesting.
Here are the new ones from the paper:

1. A law passed or legal decision made that relies on the result of an ML analysis.
2. $100M saved through improved decision making provided by an ML system.
3. A conflict between nations averted through high-quality translation provided by an ML system.
4. A 50% reduction in cybersecurity break-ins through ML defenses.
5. A human life saved through a diagnosis or intervention recommended by an ML system.
6. Improvement of 10% in one country’s Human Development Index (HDI) (Anand & Sen, 1994) attributable to an ML system.

And finally, I found the last discussion, on obstacles to ML impact, also to be very true. I don’t know why there is so little work on making machine learning output comprehensible to its users (e.g., doctors using a classifier to identify certain traits in a patient might not really want to act on a raw SVM output without understanding it!). At least, I did not find much work on human-comprehensible machine learning.
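As a small illustration of the comprehensibility gap: a raw SVM decision score (a signed distance from the separating hyperplane) means little to a doctor – is +1.7 “very likely sick”? One standard remedy is Platt scaling, which fits a sigmoid mapping scores to probabilities. The sketch below uses made-up coefficients purely for illustration; in practice they are fitted on held-out data:

```python
import math

# Hypothetical Platt parameters; in real use these are fitted on
# held-out (score, label) pairs, not chosen by hand.
A, B = -1.5, 0.0

def score_to_probability(decision_score):
    """Map a raw SVM-style decision score to an illustrative probability."""
    return 1.0 / (1.0 + math.exp(A * decision_score + B))

# A score of +1.7 becomes roughly "93% likely" -- something a user can
# actually weigh -- while a score near the boundary maps to ~50%.
print(score_to_probability(1.7))
print(score_to_probability(0.0))
```

The probability is not magically more “correct” than the raw score, but it is stated in units the user already understands, which is the point of the paper’s comprehensibility complaint.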

As I read it again and again, this paper seems to me like a theory vs. practice debate (generally speaking) and is possibly worth reading for anyone outside the machine learning community too (as it was useful for me!).

End disclaimer: All the thoughts expressed here are my individual feelings and are not related to my employer. :-)

Published on March 26, 2013 at 12:35 pm


22 Comments

  1. Nice post. The paper is worth reading.
    You might like this

  2. What is machine learning, in a nutshell?

    • Hi Seshu:
      To put it very briefly, it’s about teaching the computer to learn patterns from the data we provide so that it can predict some information on future data.
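      A minimal sketch of that one-line definition (a toy 1-nearest-neighbour classifier in plain Python, with made-up data – purely illustrative):

```python
def predict(training_data, x):
    """Return the label of the training example closest to x."""
    nearest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# "Training" here is just remembering (feature, label) examples...
training_data = [(1.0, "spam"), (1.2, "spam"), (6.0, "ham"), (6.5, "ham")]

# ...and "prediction" reuses the learned pattern on unseen inputs.
print(predict(training_data, 1.1))  # falls near the "spam" cluster
print(predict(training_data, 7.0))  # falls near the "ham" cluster
```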

    • ok… it is a search for a defined order in the randomness with the help of computer programming. That’s what I understood. Am I correct?

    • @Seshu: Yes.

    • Do you have a deep understanding of emergent properties and chaos theory?

    • No.

    • ok..

  3. Nice post. 🙂 I haven’t read the paper yet. But your blog suggests it’s like an interesting debate between ‘basic mathematics’ and ‘applied mathematics’. It’s somewhat analogous to the debate between ‘developing new machine learning techniques’ and ‘applying machine learning’.

    Many mathematical theories, like number theory, had no practical significance when they were first developed. They were not even developed with the intention of being useful for any problem!! They were developed for fun and to advance human analytical thinking. Many mathematical theories developed just for their own sake were put to use a great many years after their invention. Many theories were lost in time, and many were put into widespread usage. But no one knew which theories would be useful when they were first developed.

    I think a similar argument applies to machine learning techniques as well, even though machine learning leans more towards applied mathematics. Maybe it’s true that more usefulness is demanded of ML techniques. I am new to the machine learning field and can’t comment on this highly debatable topic. Thanks for sharing that paper and your post. 🙂

  4. A lot of people have viewed this as a “theory versus applications” paper. In fact, it’s more a critique of poorly-executed experimental work: papers that claim to have solved a problem and present a collection of results on (usually benchmark) data sets but have given little thought to the appropriate metrics needed to truly measure whether they’ve solved the problem they originally identified, in a meaningful way.

    Purely theory papers do not exhibit the drawbacks and limitations that this paper highlights.

    But look at those statistics: 93% of ICML papers in 2011 presented experimental results as part of their content. These are not purely theory papers. The majority of the ML research community evidently wants to read about and to engage in experimental work. The idea, therefore, is to nudge this experimental work in a more meaningful direction.

    • Thanks for your comment, Kiri!
      I now get a better picture @”purely theory papers do not exhibit the drawbacks..”. Thanks for pointing out the difference.

    • Thanks for your thoughtful post on this subject in general. I’m glad that the paper has inspired additional discussions!

    • I was discussing this with a friend again and a question came up: wouldn’t even purely theory papers need something to evaluate their algorithm on? They would still need to show that their algorithm actually works on some data (real-world or synthetic), wouldn’t they? After all, what does the machine learn if it does not have any data? Then how is this not a theory vs. practice paper?

      (I am not a machine learning researcher and I have not read “pure theory” machine learning papers to date, except for tutorials.)

    • Some papers make an important theoretical contribution without doing experimental work. An example would be a paper that proves an upper bound on performance achievable by someone else’s algorithm or for a particular kind of problem. Another example would be the No Free Lunch theorems that show that no single ML algorithm can ever be superior to all others on all problems — a very interesting and important result! It is also not a statement that can be proven via experiments (it’s difficult to prove the non-existence of something unless you can exhaustively test every possibility).

  5. Thanks Kiri, for your patient explanation and quick response!

  6. “For some reason, the more a project has to count as research, the less likely it is to be something that could be turned into a startup. I think the reason is that the subset of ideas that count as research is so narrow that it’s unlikely that a project that satisfied that constraint would also satisfy the orthogonal constraint of solving users’ problems.” — Paul Graham.

    Of course, it’s a broad generalization even if we consider just IT startups.

    • I am really not sure if this fits the context… but I was reading something and stopped at this sentence:
      “Knowledge without appropriate procedures for its use is dumb, and procedure without suitable knowledge is blind” – Herbert Simon

  7. Thanks for the well-thought-out analysis of the topic and the insightful discussion.. 🙂

  8. […] An older blogpost on the above paper which had some good discussion in the comments […]

  9. […] about whenever this sort of discussion comes up with fellow grad-students. (My thoughts on it here). In the past few days, there have been a lot of short online/offline discussions about how an […]

  10. If you’d like to engage in more in-depth discussion of these issues, please join us at

    • Thanks for the link!
