Big Data Wizards on Kaggle: Who Are They, and What Do They Have in Common?
October 29, 2013 by David Fried
Kaggle (founded in April 2010) is an online platform that hosts data analytics competitions. Renowned companies and researchers, including NASA, Deloitte and Allstate, post their data online and present a challenge to solve, and community members mine this data and try to produce the best models in order to win.
The nature of these competitions runs the gamut: a GE contest aiming to make flights more efficient is awarding $250,000 in total prizes, including a $100K grand prize for the winner. Some contests offer a job opportunity (e.g., data analyst at Facebook). At the other end of the spectrum, some contests offer “swag” (e.g., a T-shirt) or no prize at all (usually referred to as “kudos” or “knowledge”).
A Kaggle competition hosted by Facebook
Even competitions without monetary prizes offer winners serious bragging rights, so it’s no wonder Kaggle has attracted some 95,000 data analysts worldwide. We decided to analyze this rich talent pool and profile top Kaggle users to understand what today’s “big data wizards” look like and what traits they share.
Kaggle posts a user rankings board, which ranks community members according to the number of points they’ve accumulated. Points are assigned based on members’ current level of activity and the results they’ve produced.
The top 10 ranked Kaggle users (as of 10/25/13)
We examined the publicly available profile data of the top 100 Kaggle users (as of October 15, 2013). Here are our top findings:
- Over 80 percent of the top 100 performers whose education we could identify, including all of the 21 highest-ranked individuals, have a Master’s degree or higher. Thirty-five percent have a Ph.D.
- Analysts come from a broad variety of backgrounds. Although computer science and mathematics top the list of most-mentioned specialties, top performers come from many walks of life and include economists, cognitive scientists, MBAs and even an attorney.
- Top performers are all over the map, literally. The U.S. has the highest concentration of top performers (30 of the 96 who listed a country), but 29 countries in total are represented in this top 100 list.
- Prize wins correlate with advanced degrees and with the number of contests entered.
Ph.D.s and Master of Science Degrees Are Common
Among the top 100 Kaggle performers, we were able to find data on the educational background of 60. Of those, 21 have Ph.D.s, 21 have one or more Master of Science degrees, six have another type of Master’s degree (three MBAs, two M.A.s and a Master of Philosophy), one has a J.D., nine have a Bachelor’s degree and two simply identify themselves as a “student.”
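The headline percentages in our findings are shares of these 60 members, not of all 100. The short Python check below recomputes them from the counts above. It assumes the single J.D. is counted as "Master’s degree or higher"; without it the share would be exactly 80 percent rather than over 80 percent, but the classification is our assumption.

```python
# Degree counts for the 60 top-100 members whose education could be found
degree_counts = {
    "Ph.D.": 21,
    "M.S.": 21,
    "Other Master's": 6,  # three MBAs, two M.A.s and one Master of Philosophy
    "J.D.": 1,
    "Bachelor's": 9,
    "Student": 2,
}

known = sum(degree_counts.values())  # 60 members with known education

# Assumption: the J.D. is grouped with graduate degrees ("Master's or higher")
masters_or_higher = sum(
    degree_counts[k] for k in ("Ph.D.", "M.S.", "Other Master's", "J.D.")
)
ms_or_phd = degree_counts["Ph.D."] + degree_counts["M.S."]


def pct(n: int) -> float:
    """Share of the members with known education, in percent."""
    return 100 * n / known


print(f"Known education: {known}")
print(f"Master's or higher: {masters_or_higher} ({pct(masters_or_higher):.1f}%)")
print(f"Ph.D.: {degree_counts['Ph.D.']} ({pct(degree_counts['Ph.D.']):.1f}%)")
print(f"M.S. or Ph.D.: {ms_or_phd} ({pct(ms_or_phd):.1f}%)")
```

The same 60-member denominator gives the 70 percent figure for an M.S. or Ph.D. (42 of 60).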
The higher degrees also appear to be concentrated at the top of the list. The top 21 performers all have an M.S. or higher: nine have Ph.D.s, and several have multiple degrees (including one member who has two Ph.D.s).
Degree breakdown of the top 100 Kaggle performers
All of this points to a fairly clear conclusion: advanced education appears to be very valuable in delivering high-quality data analysis algorithms.
Areas of Study Are Varied
Given that our participants hold a variety of highly specialized degrees from all over the world, we categorized their areas of study into eleven buckets. Of the 80 total responses, computer science and mathematics are the top areas of study, with 16 and 15 mentions, respectively. These are followed by engineering and economics/econometrics.
Together, computer science and mathematics account for only 31 of the 80 responses (under 40 percent), which indicates that the skills necessary to be a data “wizard” can also be learned in other disciplines. While most of these areas of study have obvious applications in data analysis, several others came up that demonstrate a more diverse palette: philosophy, food policy, political science and law, to name a few.
Experts Are Widely Dispersed Across the Globe
Where do data analysts come from? Everywhere, is the short answer.
Of the top 100 Kaggle performers, 96 have a country listed under “location,” and 29 countries are represented within this sample. The United States has the most members in this list (30), followed by Russia (nine) and India (six). Five countries (Canada, United Kingdom, Germany, Hungary and Japan) have four members in the top 100, two countries (Spain and Australia) have three, and six (China, Ukraine, Brazil, Hong Kong, South Korea and the Netherlands) have two.
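The named countries account for most, but not all, of the 96 members. A quick tally in Python, using only the counts listed above, shows what is left over. Since every represented country has at least one member, the leftover members must be spread one per country.

```python
# Members per country, as listed above, among the 96 who gave a location
named_counts = {
    "United States": 30,
    "Russia": 9,
    "India": 6,
    "Canada": 4, "United Kingdom": 4, "Germany": 4, "Hungary": 4, "Japan": 4,
    "Spain": 3, "Australia": 3,
    "China": 2, "Ukraine": 2, "Brazil": 2, "Hong Kong": 2,
    "South Korea": 2, "Netherlands": 2,
}

TOTAL_MEMBERS = 96    # members with a country listed
TOTAL_COUNTRIES = 29  # countries represented

named_members = sum(named_counts.values())
remaining_members = TOTAL_MEMBERS - named_members
remaining_countries = TOTAL_COUNTRIES - len(named_counts)

print(f"Named: {named_members} members in {len(named_counts)} countries")
print(f"Remaining: {remaining_members} members in {remaining_countries} countries")

# Each represented country has at least one member, so equal counts
# mean every remaining country has exactly one top-100 member.
if remaining_members == remaining_countries:
    print("Each remaining country has exactly one member")
```

In other words, 13 of the 29 countries are each represented by a single top-100 member.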
Members from any one country do not seem to cluster at any particular point in the rankings; in other words, Americans, Russians and others are evenly dispersed throughout the top 100.
Prize Winners Enter More Contests
Since Kaggle’s algorithm is designed to reward activity as much as performance (e.g., participation in the forums is one way that users can earn “Kaggle points,” and the value of a contest win diminishes over time), our data was not ideal for quantitatively assessing overall performance. However, we did notice an interesting trend based on prizes won.
Of the top 100 users analyzed, 69 are prize winners. The five individuals with the most wins came from all over the world and studied a variety of technical disciplines. Here are their profiles:
| PW | T10 | T25 | CE | HLE | Area of Study | Home Country |
|---|---|---|---|---|---|---|
| 7 | 11 | 16 | 16 | Ph.D. | Elementary & particle physics | United Kingdom |
| 6 | 15 | 20 | 22 | M.S. | Actuarial science & statistics | Singapore |
| 4 | 11 | 14 | 15 | Ph.D. | Physics & mathematics | Ukraine |
PW = Prizes won, T10 = Top 10 percent finishes, T25 = Top 25 percent finishes, CE = Contests entered, HLE = Highest level of education achieved
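One way to compare these profiles is a simple prize rate: prizes won (PW) divided by contests entered (CE). The sketch below computes it for the rows shown in the table; the `Winner` class and its field names are our own illustrative choices, not anything Kaggle publishes.

```python
from dataclasses import dataclass


@dataclass
class Winner:
    """Hypothetical record for one profile row in the table above."""
    prizes_won: int        # PW
    contests_entered: int  # CE
    home_country: str

    @property
    def prize_rate(self) -> float:
        """Fraction of entered contests in which a prize was won."""
        return self.prizes_won / self.contests_entered


winners = [
    Winner(7, 16, "United Kingdom"),
    Winner(6, 22, "Singapore"),
    Winner(4, 15, "Ukraine"),
]

for w in winners:
    print(f"{w.home_country}: {w.prizes_won}/{w.contests_entered} "
          f"= {w.prize_rate:.1%}")
```

By this measure, the top profile wins a prize in 7 of 16 contests entered, while the other two win in roughly a quarter of theirs.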
In addition, there are 10 individuals who have won three prizes each. Half of them have Ph.D.s, and all but one have entered 11 or more contests. (The one exception has been a member for less than a year and has been very active in that time, entering eight contests in 10 months.)
The above data suggests that the number of prizes won is associated not just with advanced education but also with high levels of activity, as measured by the number of contests entered. Notable findings include:
- The top five prize winners entered an average of 19 contests and won prizes 28 percent of the time.
- The next 10 prize winners entered an average of 15 contests and won prizes an average of 20 percent of the time.
- Taken together, even the best performers on Kaggle can expect to win a prize in only slightly more than one-fifth of the contests they enter.
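The "slightly more than one-fifth" figure can be checked by pooling the two groups: total prizes divided by total contests entered. This is a minimal sketch that assumes the 28 percent figure for the top five is a pooled rate (total prizes over total contests) rather than an average of individual rates; the article does not say which. The next-10 prize count uses the three prizes each reported earlier.

```python
# Figures reported above
top5_n, top5_avg_contests, top5_rate = 5, 19, 0.28
next10_n, next10_avg_contests, next10_prizes_each = 10, 15, 3

top5_contests = top5_n * top5_avg_contests        # 95 contests
# Assumption: 28% is a pooled rate, so prizes = rate * contests (about 26.6)
top5_prizes = top5_rate * top5_contests

next10_contests = next10_n * next10_avg_contests  # 150 contests
next10_prizes = next10_n * next10_prizes_each     # 30 prizes
next10_rate = next10_prizes / next10_contests     # 20 percent

pooled_rate = (top5_prizes + next10_prizes) / (top5_contests + next10_contests)

print(f"Next 10 prize rate: {next10_rate:.0%}")
print(f"Pooled rate, top 15 prize winners: {pooled_rate:.1%}")
```

Under that assumption the pooled rate works out to roughly 23 percent, consistent with "slightly more than one-fifth."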
So, what have we learned? Top data analysts on Kaggle hail from all over the globe. Seventy percent of the top 100 performers for whom education data is available have a Master of Science degree or a Ph.D., and computer science and mathematics are the most common areas of study. And perhaps most interesting (albeit not surprising), the most successful performers are also those who consistently take on the most challenges.