֍ Prerequisites for starting your Machine Learning journey:
Matrices and Linear Algebra Fundamentals
Database Basics:-
Relational vs Non Relational Databases
SQL + Joins
NoSQL
Tabular Data
DataFrames and Series
Extract, Transform, Load(ETL)
Reporting vs BI vs Analytics
Data Formats:-
Regular Expressions (RegEx)
Python Basics:-
Expressions
Variables
Data Structures
Functions
Install packages (via pip, conda, etc.)
Important Libraries:-
Virtual Environments
Jupyter Notebooks
Data Mining
Web Scraping
Public Datasets
Kaggle
֍ Exploratory Data Analysis/Data Munging/Wrangling:-
Principal Component Analysis (PCA)
Dimensionality and Numerosity Reduction
Normalisation
Data Scrubbing, Handling Missing Values
Unbiased Estimators
Binning Sparse Values
Feature Extraction
Denoising
Sampling
Probability Theory
Randomness, random variables and random sample
Probability distribution
Conditional probability and Bayes' Theorem
Statistical Independence
Cumulative distribution function (cdf)
Probability density function (pdf)
Probability mass function (pmf)
Continuous Distributions (pdfs)
Normal/Gaussian
Uniform (continuous)
Beta
Dirichlet
Exponential
Chi-squared
Discrete Distributions (pmfs)
Uniform (discrete)
Binomial
Multinomial
Hypergeometric
Poisson
Geometric
Summary statistics
Expectation and mean
Variance and standard deviation
Covariance and Correlation
Median, quartile
Interquartile range
Percentile/quantile
Mode
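The summary statistics listed above can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the `data` list is a made-up sample, and the population (not sample) variance is used as an illustrative choice.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # arithmetic mean
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
pvariance = statistics.pvariance(data)  # population variance
pstdev = statistics.pstdev(data)        # population standard deviation

# Quartiles: three cut points that split the data into four groups.
# The default interpolation method of statistics.quantiles is used here.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                           # interquartile range

print(mean, median, mode, pvariance, pstdev, iqr)
```

Different libraries interpolate quartiles differently, so the interquartile range may not match what pandas or NumPy report for the same data.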
Important Laws
Law of large numbers
Central limit theorem
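Both laws can be observed in a small simulation. The sketch below assumes draws from a Uniform(0, 1) distribution; `SAMPLE_SIZE` and `NUM_SAMPLES` are arbitrary illustrative values, not recommendations.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # observations per sample (illustrative choice)
NUM_SAMPLES = 2000    # number of repeated samples (illustrative choice)

# Each sample mean averages SAMPLE_SIZE draws from Uniform(0, 1).
sample_means = [
    statistics.fmean(random.random() for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

# Theory: Uniform(0, 1) has mean 1/2 and variance 1/12, so the sample mean
# has standard deviation sqrt(1/12) / sqrt(SAMPLE_SIZE).
expected_spread = (1 / 12) ** 0.5 / SAMPLE_SIZE ** 0.5

print(f"mean of sample means: {mean_of_means:.4f} (theory 0.5)")
print(f"std of sample means:  {spread_of_means:.4f} (theory {expected_spread:.4f})")
```

The sample means cluster around 0.5 (law of large numbers), and their spread shrinks with the square root of the sample size, as the central limit theorem predicts.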
Estimation
Maximum Likelihood estimation
Kernel Density Estimation
Hypothesis Testing
p-Value
Chi-squared test
F test
t test
Confidence Interval
Monte Carlo Method
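A classic way to get a feel for the Monte Carlo method is estimating pi by random sampling. The function name `estimate_pi` and its parameters are hypothetical and used only for this sketch.

```python
import random


def estimate_pi(num_points: int, seed: int = 0) -> float:
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # local generator so results are repeatable
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:           # point falls inside the quarter circle
            inside += 1
    # The quarter circle has area pi / 4, so scale the hit fraction by 4.
    return 4.0 * inside / num_points


pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The error of the estimate shrinks roughly with the square root of the number of points, so each extra digit of accuracy costs about 100 times more samples.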
Chart suggestions thought starter
Python
Matplotlib
plotnine
Bokeh
seaborn
ipyvolume
Web
Dashboards
BI
Concepts, Inputs and Attributes
Categorical Variables
Ordinal Variables
Numerical Variables
Cost functions and gradient descent
Overfitting/Underfitting
Training, validation and test data
Precision vs Recall
Bias and Variance
Lift
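Of the concepts above, precision and recall are easy to compute by hand for a binary classifier. In this minimal sketch the labels are made up, the helper `precision_recall` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here two of the three predicted positives are correct (precision 2/3), while only two of the four actual positives are found (recall 1/2).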
Supervised Learning
Regression
Linear Regression
Poisson Regression
Classification
Classification Rate
Decision Trees
Logistic Regression
Naive Bayes Classifiers
K Nearest Neighbour
Support Vector Machines
Gaussian Mixture Models
Unsupervised Learning
Clustering
Hierarchical Clustering
K Means Clustering
DBSCAN
HDBSCAN
Fuzzy C Means
Mean Shift
Agglomerative
OPTICS
Association Rule Learning
Apriori Algorithm
ECLAT Algorithm
FP Trees
Dimensionality Reduction
Principal Component Analysis
Random Projection
NMF
t-SNE
UMAP
Ensemble Learning
Boosting
Bagging
Stacking
Reinforcement Learning
Sentiment Analysis
Collaborative Filtering
Tagging
Prediction
Read DL Papers with concepts
Read DL Papers with code
Understanding Neural Networks
Loss functions
Activation functions
Weight initialisation
Vanishing/Exploding gradient Problem
Feedforward Neural Network
Autoencoder
Convolutional Neural Network
Recurrent Neural Network
Transformer
Encoder
Decoder
Attention
Siamese Network
Generative Adversarial Network (GAN)
Evolving Architectures/NEAT
Residual Connections
Optimizers
SGD
Momentum
AdaGrad
AdaDelta
Nadam
RMSProp
Learning Rate Schedule
Batch Normalisation
Batch Size Effects
Regularisation
Early Stopping
Dropout
Parameter Penalties
Data Augmentation
Adversarial Training
Multitask Learning
Transfer Learning
Curriculum Learning
Important Libraries
Awesome Deep Learning
Hugging Face Transformers
TensorFlow
PyTorch
TensorBoard
MLflow
Distillation
Quantization
Neural Architecture Search (NAS)
Summary of Data Formats
Data Discovery
Data Source and Acquisition
Data Integration
Data Fusion
Transformation and Enrichment
Data Survey
OpenRefine
How much Data
Using ETL
Data Lake vs Data Warehouse
Dockerize your Python Application
Architectural Patterns and Best Practices
Horizontal vs Vertical Scaling
MapReduce
Data replication
Name and Data Nodes
Job and Task Tracker
Check the awesome big data list
Hadoop (large data)
HDFS
Loading data with Sqoop and Pig
Storm: Hadoop Realtime
Spark (in memory)
RAPIDS (on GPU)
Flume, Scribe: for unstructured data
Data Warehouse with Hive
Elastic (ELK) Stack
to collect data (e.g. logs), then search, analyse and visualise it in real time
Avro
Flink
Dask
Numba
ONNX
OpenVINO
MLflow
Kafka and KSQL
Databases
Scalability
Cloud Services
AWS SageMaker
Google ML Engine
Microsoft Azure Machine Learning Studio
Awesome Production ML
An Introductory Roadmap to Data Science