Learn Data Science the Hard Way

So you want to be a Data Scientist? The good news is that there are tons of great resources out there to learn from. The bad news? None of them is comprehensive, and choosing the best can be completely overwhelming. I created this list to help you stay focused on learning what’s important, in the easiest way possible.

But it won’t be easy…

Data Science combines Statistics, Programming, Machine Learning, and Visualization, amongst other disciplines. Simply put, there is a lot to learn. I took every course and read every book on this list, and it took me approximately 210 hours over a few months.

Ready to dive in? Great! I would love to hear about your experience learning Data Science and to answer any questions. Tweet this post and let me know how it’s going.

Finally, good luck, and have a lot of fun. I certainly did.


1. Immerse Yourself

We start with some light reading and listening. You can’t spend all your time reading textbooks and taking courses. Get these books and podcasts, and read or listen to them throughout your studies.

12 hr | $29 Read The Signal and the Noise by Nate Silver

A fun introduction to Data Science that will teach you how to think like a data scientist.

9 hr | $17 Read Naked Statistics by Charles Wheelan

An easy introduction to statistics, without getting too deep into the maths.

free Subscribe to the Data Skeptic podcast

Features conversations with data science experts, as well as great mini-episodes that teach the basics.

free Subscribe to the Partially Derivative podcast

A weekly discussion of Data Science-related news.

free Subscribe to the Data Science Weekly newsletter

Data Science news in your inbox, weekly.


2. Learn Python

Programming is a key part of Data Science. There’s an ongoing debate about whether you should learn R or Python first. It’s better to pick one than to spend your time debating which is best. Choose Python.

6 hr | free Do the Learning Python mission at DataQuest

You’ll learn Python interactively while playing with real data.

If you’re new to programming you may need a more thorough introduction. In that case:

40 hr | $30 Read Learn Python the Hard Way

A great introduction to programming using Python.

Otherwise, you’ll pick it up quickly using:

1 hr | free Read Learn Python in Y minutes

This is a really fast way to learn Python if you’re already a programmer.


3. Learn the Big Picture

There are a lot of aspects to Data Science. In this unit you’ll focus on learning how they all fit together. Get a little breadth in your diet.

10 hr | $32 Read Data Science from Scratch

This is a fantastic book that introduces Data Science using Python.

5 hr | free Take the Data Analysis and Data Visualization missions at DataQuest

These will teach you about NumPy, pandas, and Matplotlib, three crucial tools for your toolbelt.
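To give a feel for how these three libraries divide the work, here is a minimal sketch: NumPy handles fast math on whole arrays, pandas wraps those arrays in a labeled table, and Matplotlib draws the result. The cities and temperatures are made up purely for illustration, and the plot is saved to an arbitrary file name (`temps.png`) instead of opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: vectorized math, applied element-wise with no explicit loop
celsius = np.array([12.0, 18.5, 25.0, 31.5])  # made-up example values
fahrenheit = celsius * 9 / 5 + 32

# pandas: a labeled table (DataFrame) built on top of NumPy arrays
df = pd.DataFrame({
    "city": ["Oslo", "Paris", "Rome", "Cairo"],  # hypothetical data
    "temp_c": celsius,
    "temp_f": fahrenheit,
})
warm = df[df["temp_c"] > 20]       # boolean filtering keeps only matching rows
mean_temp = df["temp_c"].mean()    # column-wise aggregate

# Matplotlib: visualize one column of the table
fig, ax = plt.subplots()
ax.bar(df["city"], df["temp_c"])
ax.set_ylabel("Temperature (C)")
fig.savefig("temps.png")
plt.close(fig)

print(warm)
print(f"Mean temperature (C): {mean_temp:.2f}")
```

A typical workflow moves in exactly this order: raw numbers, then a table you can filter and summarize, then a chart.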


4. Learn Statistics

Statistics is the foundation for much of Data Science. It is the tool we use to rigorously reason about the world using data.

7 hr | free Take Udacity’s Intro to Descriptive Statistics course

This course seems overly simplistic at times, but it’s a good refresher on descriptive statistics. Tip: Set the playback speed to 1.5x.
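Descriptive statistics summarize a dataset with a handful of numbers describing its center and its spread. Python’s standard-library `statistics` module covers the basics, so this minimal sketch needs no third-party packages. The sample values are invented for illustration.

```python
import statistics

# Hypothetical sample: minutes spent studying on ten different days
minutes = [30, 45, 45, 50, 60, 60, 60, 75, 90, 120]

mean = statistics.mean(minutes)      # arithmetic average
median = statistics.median(minutes)  # middle value, robust to outliers
mode = statistics.mode(minutes)      # most common value
stdev = statistics.stdev(minutes)    # sample standard deviation (n - 1 denominator)
value_range = max(minutes) - min(minutes)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, range={value_range}")
```

Note how the single 120-minute day pulls the mean (63.5) above the median (60): comparing the two is a quick check for skew.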

10 hr | free Take Udacity’s Intro to Inferential Statistics course

This course is also a little simple. It’s still worth going through to get a strong grip on hypothesis testing, which is critical in Data Science.
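Hypothesis testing asks whether an observed effect could plausibly be due to chance alone. One way to see the core idea is a permutation test: if the null hypothesis of “no difference between groups” holds, the group labels are interchangeable, so we shuffle them many times and count how often chance produces a difference at least as large as the one observed. This is a minimal sketch with made-up data; introductory courses often teach the same reasoning through t-tests, which rely on distributional assumptions rather than resampling.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        # small tolerance so floating-point rounding doesn't hide exact ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # +1 in numerator and denominator keeps the estimate away from exactly 0
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical A/B test: task completion times (seconds) for two page designs
design_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3, 12.0]
design_b = [10.8, 11.2, 10.5, 11.9, 10.9, 11.0, 11.5, 10.7]

diff, p = permutation_test(design_a, design_b)
print(f"observed difference = {diff:.2f} s, p = {p:.4f}")
```

A small p-value here means that random relabeling almost never reproduces a gap this large, which is evidence against the null hypothesis.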

40 hr | $79 Read All of Statistics

If you really want to master statistics, this is the book for you. Don’t get too bogged down with the details, but take a good read through it and use it as a reference for the rest of your career.


5. Learn Machine Learning

Machine Learning is a hot topic, and a big driver of the recent flood of interest in Data Science. It’s also a very deep field.

20 hr | free Take Udacity’s Introduction to Machine Learning course

This is a very practical, hands-on course. You learn how to apply machine learning algorithms using the scikit-learn (sklearn) Python package.
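Most of scikit-learn follows the same pattern: create an estimator, `fit` it on training data, then `predict` on data it hasn’t seen. As a minimal sketch, assuming scikit-learn is installed, here is a k-nearest-neighbors classifier trained on the Iris dataset bundled with the library. The model, the 25% test split, and k=5 are arbitrary choices for illustration, not necessarily what the course uses.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load features (flower measurements) and labels (species)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data to estimate performance on unseen examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)          # learn from the training set
predictions = model.predict(X_test)  # classify the held-out set

accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Because every estimator shares this interface, swapping in a different algorithm usually means changing only the line that constructs `model`.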

30 hr | free Take Coursera’s Machine Learning course

This is a more theoretically rigorous course. It is fantastically done.


6. Practice

Now use your skills and go out and do some actual data science!

8 hr | free Complete a Kaggle competition

Kaggle provides the data, you provide the science. Try some of their “Knowledge” competitions to get some practice.

12 hr | free Do your own analysis

Find a real dataset on Data.gov, perform a real analysis, and publish your findings online.
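A real analysis usually follows the same loop: load, clean, summarize, publish. The sketch below shows that loop with pandas. To keep it self-contained, a tiny inline CSV stands in for a file downloaded from Data.gov; the columns, values, and output file name `rainfall_summary.csv` are all hypothetical.

```python
import io

import pandas as pd

# Stand-in for a CSV downloaded from Data.gov (hypothetical columns and values)
raw_csv = """state,year,rainfall_mm
CA,2020,300
CA,2021,
CA,2022,410
WA,2020,950
WA,2021,1010
WA,2022,980
"""

df = pd.read_csv(io.StringIO(raw_csv))  # the empty field is read as NaN

# 1. Clean: drop rows with a missing measurement
clean = df.dropna(subset=["rainfall_mm"])

# 2. Analyze: average rainfall per state
summary = clean.groupby("state")["rainfall_mm"].mean().round(1)

# 3. Publish: write the results in a shareable format
summary.to_csv("rainfall_summary.csv")
print(summary)
```

With a real dataset, the cleaning step is usually where most of the time goes, so it is worth documenting what you dropped and why when you publish.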