By Gregory Piatetsky, Apr 12, 2014.
A young team of 3 UC Berkeley students (Brian Liou, Tristan Tao, and Elizabeth Lin)
has produced a Data Analytics Handbook, which includes interviews with data scientists and tech leaders and is available
as a free download.
Part 1 includes interviews with Data Scientists from
LinkedIn, Cloudera, Facebook, Yelp, HG Data, and Flurry.
Top takeaways include:
1. Communication skills are underrated
If you can't distill your analysis into digestible concepts that your CEO can
understand, your analysis is only useful to yourself.
2. The biggest challenge for a data analyst isn't modeling, it's cleaning and collecting
Data analysts spend most of their time collecting and cleaning the data required
for analysis. Answering questions like "where do you collect the data?", "how
do you collect the data?", and "how should you clean the data?" requires much
more time than the actual analysis itself.
3. A Data Scientist is better at statistics than a software engineer and better at software engineering than a statistician
The greatest difference between a data scientist and a data analyst is the
understanding of computer science and conducting analysis with data at scale.
Data scientists need only a basic competency in statistics and
computer science, and not all are PhDs. New tools are empowering more people to do data science.
Part 2 includes interviews with CEOs and Managers from
Mode Analytics, Smarter Remarketer, Cloudera, Stylitics, Flurry, Yhat, Persontyle, and BigML.
Top takeaways include:
3. Do your own projects to break into the industry.
The truth is, even in a quantitative major you are not taught what you need to
know to work in data analytics. There is a learning gap between academia and
industry that is best filled by doing projects. Find some sports statistics and do
your own analysis. Learn R so that you can complete this analysis, not just to
learn R itself. Also try Kaggle.
4. Statistics > Programming.
The development of tools and the popularity of programmers have led to widespread
black-box use of statistical analysis. Understanding selection bias vs. sampling bias
and the underlying assumptions on which statistical functions are built will
make your opinions matter and your work invaluable.
5. The most important skill is being able to ask the right questions.
The power of data analytics is in taking open-ended questions and framing
them as multiple choice. Therefore, if you have the ability to filter a million
questions into options A through D, you are a data scientist for hire.
Part 3 of the handbook features interviews with academics and research leaders, including Hal Varian (Chief Economist, Google), Tom Davenport (Professor, Babson College), and me, Gregory Piatetsky (Editor, KDnuggets).