It’s almost a year since Praneeth sent me this paper and I read it… and began blogging about it. I began re-reading it today, as part of my “evaluating the evaluation” readings, and thought I still had something to say (largely to myself) about some of the points made in this paper.
Machine Learning that Matters
by Kiri L. Wagstaff
Published in the proceedings of ICML 2012.
This is how it begins:
“Much of current machine learning (ML) research has lost its connection to problems of import to the larger world of science and society”
-I guess the tone and intention of this paper are pretty clear from this first sentence.
I don’t have any issues with the tone as such, but I thought: there are so many real-world applications of machine learning these days! That doesn’t mean that every machine learning research problem leads to solving a real-world problem, though, and that holds for any research. So, in my view, the above statement can apply to research in general.
I was fascinated by these statistics on the hyper-focus on benchmark datasets.
A survey of the 152 non-cross-conference papers published at ICML 2011 reveals:
148/152 (93%) include experiments of some sort
57/148 (39%) use synthetic data
55/148 (37%) use UCI data
34/148 (23%) use ONLY UCI and/or synthetic data
1/148 (1%) interpret results in domain context
-Since I am not into machine learning research but only use ML for computational linguistics problems, I found this to be very interesting… and a very valid point.
Then, the discussion moves on to evaluation metrics:
“These metrics are abstract in that they explicitly ignore or remove problem-specific details, usually so that numbers can be compared across domains. Does this seemingly obvious strategy provide us with useful information?”
-In the discussion that followed, there were some interesting points on what various evaluation metrics fail to capture. I have been reading about evaluation metrics for supervised machine learning recently… and, as with those readings, I am left with the same question here: what is the best evaluation, then? Of course, the “real world”. But how do you quantify that? How can there be an evaluation metric that is truly comparable across peer research groups?
I got my answer in the later part of the paper:
“Yet (as noted earlier) the common approach of using the same metric for all domains relies on an unstated, and usually unfounded, assumption that it is possible to equate an x% improvement in one domain with that in another. Instead, if the same method can yield profit improvements of $10,000 per year for an auto-tire business as well as the avoidance of 300 unnecessary surgical interventions per year, then it will have demonstrated a powerful, wide-ranging utility.”
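-To convince myself of this point, I tried a tiny sketch of what “reporting in domain units” could look like. Everything here is my own assumption and not from the paper: the `Domain` class, the `yearly_impact` helper, the decision counts, and the value per correct decision are all made-up illustration numbers, picked only so that the same 2-point accuracy gain shows up as very different things in two domains.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """A hypothetical application domain (all fields are invented for illustration)."""
    name: str
    decisions_per_year: int   # assumed number of decisions the model makes per year
    value_per_correct: float  # assumed value of one additional correct decision
    unit: str                 # the domain unit that the value is expressed in


def yearly_impact(domain: Domain, accuracy_before: float, accuracy_after: float) -> float:
    """Translate an abstract accuracy gain into the domain's own unit."""
    extra_correct = (accuracy_after - accuracy_before) * domain.decisions_per_year
    return extra_correct * domain.value_per_correct


# Two made-up domains receiving the same abstract improvement (90% -> 92% accuracy).
domains = [
    Domain("auto-tire business", 20_000, 25.0, "dollars of profit per year"),
    Domain("surgical triage", 15_000, 1.0, "unnecessary surgeries avoided per year"),
]

for d in domains:
    impact = yearly_impact(d, 0.90, 0.92)
    print(f"{d.name}: about {impact:.0f} {d.unit}")
```

The abstract number (“+2% accuracy”) is identical in both rows, but what it means depends entirely on the domain-specific quantities, which is exactly the information an abstract metric throws away.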
The next part of the discussion is about identifying where machine learning matters:
“It is very hard to identify a problem for which machine learning may offer a solution, determine what data should be collected, select or extract relevant features, choose an appropriate learning method, select an evaluation method, interpret the results, involve domain experts, publicize the results to the relevant scientific community, persuade users to adopt the technique, and (only then) to truly have made a difference”
-Now, I like that.🙂🙂
I also liked this point on the involvement of the world outside ML.
“We could also solicit short “Comment” papers, to accompany the publication of a new ML advance, that are authored by researchers with relevant domain expertise but who were uninvolved with the ML research. They could provide an independent assessment of the performance, utility, and impact of the work. As an additional benefit, this informs new communities about how, and how well, ML methods work.”
“Finally, we should consider potential impact when selecting which research problems to tackle, not merely how interesting or challenging they are from the ML perspective. How many people, species, countries, or square meters would be impacted by a solution to the problem? What level of performance would constitute a meaningful improvement over the status quo?”
-Well, I personally share the sentiments expressed here. I like, and want to work on, problems whose solutions can possibly have a real-life impact. However, I consider that my personal choice. I don’t understand what is wrong with doing something because it’s challenging! What’s wrong with doing research simply to find things out? Some research problems will have practical implications. For some, the impact might not be immediate. For some, it might not be direct. And for some, there might not really be any practical impact. But should that be the only deciding factor? (Well, of course, when researchers are funded from public taxes, perhaps that is to be expected. But should it always be so?)
I found the six old and new Machine learning impact challenges really interesting.
Here are the new ones from the paper:
1. A law passed or legal decision made that relies on the result of an ML analysis.
2. $100M saved through improved decision making provided by an ML system.
3. A conflict between nations averted through high-quality translation provided by an ML system.
4. A 50% reduction in cybersecurity break-ins through ML defenses.
5. A human life saved through a diagnosis or intervention recommended by an ML system.
6. Improvement of 10% in one country’s Human Development Index (HDI) (Anand & Sen, 1994) attributable to an ML system.
And finally, I found the last discussion, on obstacles to ML impact, to be very true as well. I don’t know why there is so little work on making machine learning output comprehensible to its users (e.g., doctors using a classifier to identify certain traits in a patient might not really want to look at an SVM output and make a decision without understanding it!). (At least, I did not find much work on Human Comprehensible Machine Learning.)
As I read it again and again, this paper seems to me like a Theory vs Practice debate (generally speaking), and it can possibly be worth reading for anyone outside the machine learning community too (as it was for me!).
End disclaimer: All the thoughts expressed here are my own and are not related to my employer. :-)