Data Science Basics 101

Introduction

This introduction covers the basic knowledge you need to begin learning data science: basic statistics, programming languages, and basic data manipulation techniques. Without a minimal foundation in mathematics and programming, much of what follows in this learning path will be difficult to understand and assimilate.

Having fun with data science (motivational intro)

The following courses will give you a broad overview of what data science is and what kinds of problems you can solve with it.

| Course Link | Programming Environment | Notes |
| --- | --- | --- |
| A crash course in data science | none | Very basic. This course will help you understand what kind of tool data science is, when to use it, and why it is so useful. Appears to have a project-management slant. |
| Data Science Essentials | R, Python, Azure | Part 1 of 2. Learn to clean and explore data sets. |
| Principles of Machine Learning | R, Python, Azure | Part 2 of 2. Learn the basic techniques involved in data science. |

Modern Data Science Programming Languages: R or Python

Pick your weapon! You cannot do data science without knowing how to program, and today two primary languages are used for data science: R and Python. You will probably have to be conversant in both at some point, but for now pick one and use it as much as possible throughout this learning path. Scientists and statisticians tend to prefer R; programmers tend to prefer Python.

| Course Link | Programming Environment |
| --- | --- |
| R Programming | R |
| Introduction to R for Data Science | R |
| Intro to data science in Python | Python |

You will need to set up a programming environment that you can use throughout this learning path. Take some time to set it up correctly. The easier it is to navigate your environment, the easier and more enjoyable it will be to do data science.

| Programming Language | Environment | Notes |
| --- | --- | --- |
| Python | Jupyter notebook | Amazingly simple REPL environment. Highly recommended. You can use a free version hosted in Microsoft Azure. |
| Python | PyCharm | One of the environments used in industry. There is a free community version available. |
| R | RStudio | This is the workhorse and industry-standard environment for R. |
| R | Jupyter notebook | Jupyter notebooks were initially developed for Python, but there is now also an R kernel you can use. See: Installing the R kernel for Jupyter notebooks. |
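
If you chose Python, one quick way to confirm that your environment is usable is to check which packages can be imported. The following is a minimal sketch using only the standard library. The package list and the helper name `check_environment` are assumptions made for illustration; substitute whatever your chosen courses actually require.

```python
import importlib.util
import sys

# Hypothetical package list for illustration; adjust it to your courses.
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook"]


def check_environment(packages):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


if __name__ == "__main__":
    # Report the interpreter version, then the status of each package.
    print("Python", sys.version.split()[0])
    for name, found in check_environment(PACKAGES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it from the same interpreter that your notebook or IDE uses. Anything reported as `MISSING` needs to be installed before you start the courses that depend on it.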

Statistics

You cannot complete even the simplest data science project without a basic understanding of statistics. You will need to take at least one course in straight, boring statistics, but you will continually thank yourself for it. At the very least, make sure you understand the concepts of probability, a probability distribution (such as the normal distribution), and a test statistic (such as the F-statistic used in the F-test).
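
To make these three concepts concrete, here is a small sketch in Python using only the standard library. The sample data are made up for illustration, and `f_statistic` is a hypothetical helper that computes the one-way ANOVA F-statistic by hand. It is not taken from any of the courses listed here.

```python
from statistics import NormalDist, mean

# A probability distribution: the standard normal (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# A probability: the chance that a draw falls within one standard deviation
# of the mean, computed from the cumulative distribution function.
p_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)  # about 0.683


def f_statistic(groups):
    """One-way ANOVA F-statistic: between-group variance over within-group variance."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up measurements for three groups (illustrative only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)  # about 13

print(f"P(-1 < Z < 1) = {p_within_one_sd:.3f}")
print(f"F = {f_value:.2f}")
```

A large F value means the group means differ by more than the spread within each group would suggest. Turning it into a p-value requires the F distribution, which the standard library does not provide; the statistics courses cover that step.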

| Course Link | Programming Environment |
| --- | --- |
| Basic statistics | R |
| Essential Statistics for data analysis using Excel | Excel |
| Statistical Thinking for data science | ? |