As part of the Data Mining Research interview series, I would like to welcome Anthony Goldbloom, CEO of Kaggle. He kindly agreed to answer the following questions for the readers of Data Mining Research.
Data Mining Research: Could you introduce yourself and explain your path to Kaggle?
Anthony Goldbloom: I am Kaggle’s founder and CEO. I am an econometrician by training and previously worked on macroeconomic modeling at Australia’s Treasury and the Reserve Bank of Australia (Australia’s central bank).
Kaggle was inspired by a journalism internship I did at The Economist magazine in 2008. During this stint, I wrote an article about the use of data in modern organizations. Researching the piece, I interviewed fascinating companies including a consultancy that was identifying swing voters for Barack Obama’s presidential campaign, and dunnhumby, the number-crunching outfit that is said to have transformed Tesco from a down-market British supermarket to the world’s third largest retailer. I became excited about the power of data and eventually left my day-job as an econometrician to found Kaggle.
DMR: What is Kaggle? What are the main objectives of such a platform?
AG: Kaggle works on the premise that predictions are critical to modern life. Police predict where and when crimes are most likely to take place, banks predict which loan applicants are most likely to default and bioinformaticians predict phenotypes from gene sequences. Kaggle aims to help companies and researchers make predictions more precise by providing a platform for data prediction competitions.
Competitions turn out to be a great way to get the most out of a dataset. This is because there are infinitely many approaches to any data modeling problem. By opening up a data prediction problem to a wide audience, a competition makes it possible to get to the frontier of what is possible given a dataset’s inherent noise and richness. For example, for the HIV competition currently on the site, the best submissions had outdone the best methods in the scientific literature within a week and a half.
DMR: As a data miner, how would I benefit from Kaggle personally and/or for my company?
AG: Aside from the lure of prize money, Kaggle offers data miners access to interesting new problems and new datasets. What’s more, by the time a problem is posted, the data has been cleaned and is ready for modeling.
It also offers instant feedback via the leaderboard, giving data miners a way to benchmark their techniques against other competitors. This allows data miners to prove their mettle and enhance their reputations, while allowing researchers to demonstrate the veracity of new techniques.
For those considering posting a problem, competitions are a great way to get to the frontier of what can be done with a given dataset.
DMR: What is the future of Kaggle? What kinds of improvements could be made?
AG: We are in the process of sprucing up the site and extending its functionality. For instance, we plan to launch a league table over the next few months, which data miners can use to demonstrate their ability to potential employers and clients.
Kaggle is now looking for other interesting challenges. We are starting to approach companies and researchers who might be interested in posting their problems. If any readers are interested in posting a challenge, they should email me at email@example.com.
DMR: Thanks for your time, Anthony.
You can find more information about Kaggle on their website: www.kaggle.com.