This year’s Conference on Knowledge Discovery and Data Mining (KDD) just wrapped up. It is the top conference for applying machine learning and related techniques to large-scale information processing, so I enjoyed the opportunity to meet many of the top people in the field and take the pulse of the industry.
I was an invited speaker on the “Death of the expert? The rise of algorithms and decline of domain experts” panel. The panelists were Claudia Perlich (Media6Degrees), John Akred (Silicon Valley Data Science), Robert Munro (Idibon), and Chris Neumann (DataHero). It was moderated by Gregory Piatetsky (KDnuggets) and organized by Kaggle’s Jeremy Howard. Industry-track panels tend to be hidden away at big conferences, but this year the industry track was one of the most popular at KDD, and our panel was moved from an obscure basement room to the main stage. According to KDnuggets, our panel outline was one of the most-visited URLs for the conference. There was clearly a lot of interest in what we might say.
The domain of the experts
The panel was motivated by one observation about Kaggle competitions, in which different people build machine-learning algorithms to compete on a task. In the majority of competitions, people with no domain knowledge beat out those with expertise. For example, a group of space agencies including NASA ran a competition to see how accurately algorithms could automatically identify the influence of dark matter in images of space. The leading system for much of the competition came from a student of glaciology. Jeremy Howard has himself won more than one competition in areas where he had no particular expertise. So the question we debated was an important one: if machine-learning experts can create the best algorithms in any field, why do we need domain experts?
The consensus was that there is still an important role for domain experts in framing the problem and making the results actionable. So, for example, an economist might beat education experts in designing an algorithm to automatically grade papers, but the education experts are still required to frame the problem in terms of grading criteria and sample data, and they are also needed to interpret the output and apply the algorithm effectively in their day-to-day work.
The micro-domain expert
I made the further argument that the role of expertise is also changing: sometimes we only need people’s expertise for part of a more complex problem. In Idibon’s field of Natural Language Processing, we don’t always need an expert in the given application. For example, if we are creating speech-recognition tools for a Spanish navigation system, we can solve many problems simply by employing native Spanish speakers: they don’t need to be experts in navigation systems, just in their language.
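To make this concrete, here is a minimal sketch (with hypothetical labels and function names, not Idibon’s actual pipeline) of how judgments from native speakers, acting as micro-domain experts, might be aggregated by majority vote before feeding into training data:

```python
from collections import Counter

def majority_label(annotations):
    """Return the most common label and the share of annotators who chose it."""
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)

# Hypothetical judgments from three native Spanish speakers on whether one
# transcribed navigation utterance is correct.
judgments = ["correct", "correct", "incorrect"]
label, agreement = majority_label(judgments)
print(label, round(agreement, 2))  # correct 0.67
```

The speakers never see the navigation system itself; their language intuition is the only expertise the task requires of them.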
Many consumer products already implement this: you are an expert in many of the things you type into search engines, so search companies watch your search patterns to improve their service. You are already a micro-domain expert for many of your searches, even if you don’t realize it, and the algorithms are improving as a result. (See also my post on how language technologies already underpin much of our lives.)
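As an illustration only (a toy, not any search engine’s actual ranking system), the feedback loop can be as simple as counting clicks and promoting the results that past searchers, the micro-domain experts, preferred:

```python
from collections import defaultdict

# Each click on a result for a query is weak evidence of relevance,
# contributed by a micro-domain expert: the searcher.
click_counts = defaultdict(int)

def record_click(query, result_url):
    click_counts[(query, result_url)] += 1

def rerank(query, candidate_urls):
    # Promote results that past searchers clicked for this query.
    return sorted(candidate_urls,
                  key=lambda url: click_counts[(query, url)],
                  reverse=True)

record_click("glaciology", "https://example.org/ice-sheets")
print(rerank("glaciology", ["https://example.org/other",
                            "https://example.org/ice-sheets"]))
```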
Data janitors
To bring the discussion down to earth a little, one question from the audience stood out for me: when will a computer win a Kaggle competition? At the time, I argued that a computer could already beat the average entry in the majority of competitions. That is not as strong a claim as winning outright, but it is still something to consider: could the machine-learning experts be the ones made redundant while the domain experts remain?
The answer here is still ‘no’. Kaggle competitions follow a similar formula, one that lends itself well to a fully automated approach: you train your system on one portion of the data and evaluate it on the held-out remainder. This approach, known as supervised learning, is the most common form of machine learning, and it is well understood. There are more complicated problems that require more hands-on work.
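For readers who haven’t built one of these systems, here is a minimal sketch of that formula, using scikit-learn with synthetic data standing in for a real competition dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a competition dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Train on 80% of the data, evaluate on the held-out 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```

Every supervised-learning competition is some elaboration of this loop: fit on the training portion, score on data the model has never seen.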
When I argued that we could automate this, I was thinking about how relatively easy it would be to choose automatically between algorithms and approaches and arrive at an optimal (or likely optimal) model in a fairly straightforward way: in essence, applying machine learning to the machine learning itself, or meta-learning. One of the biggest hurdles is not complex interaction between the different machine-learning algorithms, but simply reviewing the data and understanding what is being asked. It reminded me of a quote from Josh Wills of Cloudera, describing a data scientist as a “data janitor”, referring to how much time we spend cleaning data to make it suitable for machine learning or analysis. It’s a fair comment, as I’m confident that a streamlined approach could replace data scientists for many machine-learning tasks, except for what sounds like the easiest step: eye-balling the data and making some simple assumptions about how to proceed.
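In its simplest form, that meta-learning step is just a search over candidate algorithms scored by cross-validation. A minimal sketch (assuming scikit-learn, with an arbitrary set of candidate models):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Let cross-validation choose the algorithm instead of a data scientist.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

A real system would also search over features and hyperparameters, but the principle is the same, and none of it requires a human in the loop.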
Checkers -> Chess -> Go -> Language
The reality is that ‘eye-balling the data’ to prepare it for machine learning is a highly specialized skill. The best computational linguists I know can quickly look through documents and immediately identify the right strategy for cleaning the data and selecting the right features to support machine learning or the best possible analysis.
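The cleaning steps themselves are usually mundane once chosen; the expertise is in choosing them. Here is an illustrative example of the kind of decisions involved (the exact steps always depend on the corpus, so take this as a sketch, not a recipe):

```python
import re

def clean_text(doc):
    """The kind of janitorial step a linguist chooses after eye-balling a
    corpus: strip stray markup, lowercase, drop punctuation, collapse spaces."""
    doc = re.sub(r"<[^>]+>", " ", doc)              # remove leftover HTML tags
    doc = re.sub(r"[^a-z0-9\s]", " ", doc.lower())  # keep word characters only
    return re.sub(r"\s+", " ", doc).strip()

print(clean_text("<p>Re:  DARK matter, weak lensing!!</p>"))
# -> "re dark matter weak lensing"
```

Whether these are the right choices (should case be preserved? is punctuation a useful feature?) is exactly the judgment call that still needs a human.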
This instinct is a learned one, akin to the instinct used in strategy games. Computers bested checkers players some time ago, and the game was ‘solved’ in 2007, meaning that a perfect computer can never lose: it will always win or draw. Chess is not ‘solved’, but the best machines routinely beat the strongest grandmasters. At the game of Go, the best computers are still considered ‘amateurs’. In other words, the professional instincts of Go players easily out-match computers with billions of CPU cycles at their disposal. A Go board is a 19×19 grid, which gives the game a far greater branching factor, and therefore a far larger search space, than chess. Human language is much more complicated than a 19×19 grid, taking in the full experience and expression of humanity. We are in no danger of computers surpassing human expertise in language any time soon.
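Back-of-the-envelope numbers make the gap vivid. Using widely cited ballpark figures for average branching factor and game length (exact values vary by source):

```python
import math

# (average branching factor, typical game length in moves)
games = {
    "checkers": (2.8, 70),   # low branching: captures are forced
    "chess": (35, 80),
    "go": (250, 150),
}
for game, (b, d) in games.items():
    # Game-tree size is roughly b**d; report it as a power of ten.
    print(f"{game}: ~10^{d * math.log10(b):.0f} positions in the game tree")
```

That works out to roughly 10^31 for checkers, 10^124 for chess, and 10^360 for Go, and there is no comparable way even to bound the ‘game tree’ of human language.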
But as with any data-processing problem, the skills of computational linguists are also guided by domain experts, whether they are the micro-domain experts who understand a given language or the final users of the systems that we create. Perhaps the biggest take-away is that an industry panel ended up on the main stage at all. As machine learning becomes a more widely used technology, we will be coming up against an increasing number of experts in new domains, and I am looking forward to seeing these collaborations evolve.
– Rob Munro