Data Mining Research (DMR): Can you tell us who you are and how you came to the field of Data Science?
Jerome Berthier (JB): My name is Jerome Berthier, I am an engineer in Computer Science and I have an MBA in management. After 10 years working in different roles for an IT provider (developer, sales representative, managing director), I joined ELCA in 2012 to head the BI division. At that time Big Data was starting to become a mainstream concept in IT and I had the great opportunity to be in charge of developing ELCA’s expertise in this field.
We had to start from scratch, as Big Data was completely new to us. To research the subject, we engaged an expert in NLP and created our own Big Data Lab, devoted to testing and evaluating the algorithms and Big Data solutions available on the market: basically anything that could help us better understand the principles of Big Data and how they can be applied. This research was an amazing experience which achieved excellent results.
Since then, I have kept up to date on the evolution of Big Data and have been active in raising awareness of its benefits through presentations in various contexts: for ELCA customers, at IT events and in media interviews.
DMR: On your LinkedIn profile, you describe yourself as “A Voice” of Data Science. What do you mean by that?
JB: To be honest, I don’t believe that any individual data scientist can be skilled in all fields: IT, marketing, communication, maths, sales, statistics, business (banking, insurance, travel…) and so on. That would be an All-in-One data scientist, so to speak. Of course, if you know one, I would be delighted to meet them and have them in my team.
The applications of Data Science are often highly specialized and there are several types of data scientist. The key is to take advantage of these different profiles and involve them together to create a strong team of data scientists. The SAS institute defines 7 basic profiles: http://www.sas.com/content/dam/SAS/en_gb/image/other1/events/WMAGDS/DataScientist-survey-report-web%20FINAL.pdf
- The Geeks 41%: The Geeks are the largest group in our sample and have the largest female membership of all the groups at 37 per cent. They have a naturally technical bias, strong logic and analytical skills. Essentially “black and white” thinkers, they like to speak plainly and stick to the point – don’t expect them to be moved by emotionally charged arguments. With their attention to detail and fondness for the rules, the Geeks are well suited to roles such as defining systems requirements, designing processes and programming.
- The Gurus 11%: The next largest group, the Gurus, has a measure of reactive introversion, like the Geeks, which pre-disposes them to scientific and technical subjects. Yet they also display a diametrically opposite characteristic: the strong presence of proactive extroversion, including solid and often highly persuasive communications and social skills. The Gurus can play a very important role by using their enthusiasm, tact and diplomacy to promote the benefits of the data sciences to those holding the purse strings, or who have the authority to give projects the green light.
- The Drivers 11%: The Drivers are proactive introverts: highly pragmatic individuals who use their determination and focus to realise their goals. Self-confident and results-oriented, they are ideal project managers and team leaders, who excel at prioritising, monitoring and driving projects to a successful conclusion.
- The Crunchers 11%: This category is probably one of the least self-promoting groups. Strongly reactive – rather than proactive – personalities, the Crunchers like routine and constancy. They display high technical competence and consistency, making them superb in a range of technically-oriented support roles including data preparation and entry, statistical analysis, monitoring of incoming data and quality control.
- The Deliverers 7%: Like the Drivers, these individuals are proactive and well suited to project and man management. This is also the group with the largest proportion of men at 80 per cent. However, the Deliverers also have a strong pre-disposition towards acquiring and/or applying technical skills. So, while they are capable of bringing focus and momentum to ensure project success, they are also likely to understand the finer technical details and devise solutions in much greater technical depth.
- The Voices 6%: The Voices are strong communicators with less apparent detailed technical knowledge than the Gurus. The presence of this group suggests a strong demand for natural promoters who have the ability to generate enthusiasm for the potential of big data and the data sciences at a conceptual level – rather than the practical or technical level. The Voices are strongly valued for their positive outlook, and may be engaged in presenting the results of big data projects as well as supporting their implementation.
- Other Personalities 13%: A smaller number of respondents displayed a range of other traits.
  - The Ground Breakers: offer new approaches, new methods and new possibilities, drawn from a mix of inspiration and dogged logical thinking. Roles include: system design and algorithm development.
  - The Seekers: combine superb technical knowledge and understanding with inquisitiveness and a drive to find solutions. Roles include: research.
  - The Teachers: skilled at imparting knowledge and inspiring others to want to learn. Roles include: training and mentoring.
  - The Lynchpins: important team players who may not have a depth of technical knowledge but provide essential support services. Roles include: co-ordination and administration.
So I am a “Voice”. Why did I decide to use this term?
Often our customers prefer to start with a POC or POV to evaluate the potential of Big Data. But even if the results show great promise, it is still not easy to get the necessary budget from the Board (C-level) to go a step further, because past IT solutions did not necessarily live up to expectations. So it is my role to accompany the teams in their POC in order to be familiar with the company context, and to assist them in presenting the results to C-levels in such a way as to encourage continued support…
DMR: What do you see as the main skills that a Data Scientist needs in 2017?
JB: All of them, of course, but I see 3 of them as being especially important:
- Big Data has changed the relationship between infrastructure and analytics. In the past, there were 2 silos: on one hand, analytics; and on the other hand, infrastructure. Now it’s impossible to do useful analysis if you don’t understand the underlying architecture of the infrastructure; and it’s impossible to size the infrastructure accurately if you don’t know what type of analysis will be used. So knowledge of Big Data infrastructure will be as important as ever.
- Content analytics is a high priority for me because more and more projects use flow aggregation, chatbots, document analysis, email analysis…
- Finally, complex event processing is becoming increasingly relevant.
DMR: Based on your personal experience, what is the biggest challenge you face when involved in Data Science projects?
JB: For me, there are 2 key challenges:
- First of all, data is our “raw material”. Without it we are out of work. I often read and hear that the world is now submerged in data… That may be true, but how much of this data is really relevant? I have seen a number of projects in which important data is lacking (which seems incredible when we realize how much data we possess…) and/or is not accessible. Quality/governance of data is clearly a big problem everywhere.
- Another problem is habit. How can we explain to people who have spent years and years working in the same way that everything must change drastically because all of the old habits have now become obsolete? I once did a customer segmentation project which showed that the existing segmentation was irrelevant because it was not based on data analysis.
DMR: If you could give just one piece of advice to a future data scientist, what would it be?
JB: Open a book on Business Intelligence. I often receive new data scientists who know nothing about BI, ETL, SQL or even what a database is…
I consider Big Data to be an evolution of our classic BI, but the basic principles stay the same: collection of data, transformation, merging, analysis and finally decision-making based on the results.
Today unstructured data can be used, merging can be on the fly and decisions can be predictive or prescriptive, but the principles remain the same.
Furthermore, outside of academia, each time you want to work with Big Data, you will have to deal with old BI and old sources of data. So it’s necessary to understand how the old system works, too.
This interview was originally published in the Swiss Analytics Magazine.