Machine Learning is a sub-field of artificial intelligence that uses data to train predictive models.
- Supervised learning - learns from labeled training data.
- Unsupervised learning - learns from unlabeled training data. Examples include principal component analysis (PCA) and clustering.
- Reinforcement learning - an agent interacts with an environment and learns to take actions that maximize a cumulative reward.
- Transfer learning - storing knowledge gained while solving one problem and applying it to a different but related problem.
- Semi-supervised learning - learns from data that is mostly unlabeled, with a small labeled subset.
- Self-supervised learning - a form of unsupervised learning where the model is trained on a task that uses the data itself to generate supervisory signals, rather than relying on externally provided labels. (Examples: predicting the next word, as in LLM pretraining; predicting the masked part of an image, as in MAE and iBOT; or matching representations across augmented views of an image, as in DINO.)
- Regression - predicting a continuous value. (Example: house prices)
- Classification - predicting a discrete value. (Example: pass or fail, hot dog/not hot dog)
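
To make the regression/classification distinction above concrete, here is a minimal sketch in plain Python. The data, the `fit_line` helper, the `classify_exam` helper, and the 60-point pass threshold are all made up for illustration; a real project would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Regression: the output is a continuous value (a price).
square_feet = [1000, 1500, 2000, 2500]           # hypothetical feature values
prices = [200_000, 300_000, 400_000, 500_000]    # hypothetical labels
slope, intercept = fit_line(square_feet, prices)
predicted_price = slope * 1800 + intercept       # prediction for an unseen house


# Classification: the output is one of a fixed set of discrete labels.
def classify_exam(score, threshold=60):
    """Toy rule-based classifier; the threshold is an assumption."""
    return "pass" if score >= threshold else "fail"
```

The regression model can return any number on the price scale, while the classifier can only ever return `"pass"` or `"fail"`.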
Features - are the inputs to a machine learning model. Each feature is a measurable property of the thing being observed. Examples include pixel brightness in computer vision tasks and the square footage of a house in home price prediction.
Feature selection - is the process of choosing which features to use. Effective features are discriminating and independent. For example, to predict house prices you might choose square footage and number of floors as features, whereas width, length and volume are unsuitable because they largely overlap with the information already captured by square footage.
Feature engineering - is the process of using domain knowledge to extract, transform, or create new features from raw data. Examples include feature encoding, feature scaling, and normalization. Deep learning has shifted feature engineering from hand-crafted feature extraction towards representing raw data in a form the model can interpret effectively.
Feature encoding - is converting categorical data into a numerical format, such as one-hot encoding or embeddings (for example, word embeddings for LLMs).
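
One-hot encoding can be sketched in a few lines of plain Python. The `one_hot_encode` helper and the color data are hypothetical, and sorting the categories is just one way to get a reproducible column order.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector containing a single 1."""
    categories = sorted(set(values))                      # fixed column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                             # mark this value's column
        encoded.append(row)
    return categories, encoded


colors = ["red", "green", "blue", "green"]                # hypothetical feature
categories, encoded = one_hot_encode(colors)
# categories is ['blue', 'green', 'red'], so "red" becomes [0, 0, 1]
```

Each row has exactly one non-zero entry, so the vector width grows with the number of categories. Learned embeddings avoid this by mapping each category to a short dense vector instead.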
Feature scaling - is the process of normalizing the range of numeric features. Common techniques include min-max scaling and standardization (also known as z-score normalization). Min-max scaling squeezes values into a fixed range, typically 0 to 1, and suits non-Gaussian distributions such as pixel values in image processing. Standardization suits Gaussian distributions; it centers the data on a mean of zero with a standard deviation of 1. In scikit-learn, min-max scaling is implemented by MinMaxScaler and standardization by StandardScaler.
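
The arithmetic behind both scaling techniques is short enough to sketch with the standard library. The helper names and sample values below are hypothetical. The sketch uses the population standard deviation, which, to the best of my knowledge, matches what scikit-learn's StandardScaler computes.

```python
import statistics


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so the minimum maps to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot min-max scale a constant feature")
    return [low + (v - v_min) / span * (high - low) for v in values]


def standardize(values):
    """Shift to zero mean and rescale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


pixels = [0, 64, 128, 255]                        # hypothetical 8-bit pixel values
scaled_pixels = min_max_scale(pixels)             # every value now lies in [0, 1]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]     # hypothetical, roughly bell-shaped
z_scores = standardize(heights)                   # mean 0, standard deviation 1
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training data only and then reused to transform validation and test data.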
Dimensionality reduction - is transforming data from a high-dimensional space to a lower-dimensional one while retaining its important properties. Examples include singular value decomposition (SVD), variational autoencoders, t-SNE (for visualization), and max pooling layers in CNNs.
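
Of the techniques listed, max pooling is the simplest to show by hand. The sketch below assumes a 2x2 window with stride 2 and a feature map with even height and width; the `max_pool_2x2` helper and the input values are made up for illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a 2D list with even height and width."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, 2):
        pooled_row = []
        for c in range(0, cols, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))        # keep only the strongest response
        pooled.append(pooled_row)
    return pooled


feature_map = [                                   # hypothetical 4x4 activation map
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2x2(feature_map)                # [[6, 2], [2, 9]]
```

The 4x4 input shrinks to 2x2, a fourfold reduction, while the largest activation in each region is retained.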
In supervised learning, data is typically split into training, validation, and test sets.
An example is a single instance from your dataset.
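
A train/validation/test split can be sketched as a shuffle followed by slicing. The 70/15/15 proportions, the fixed seed, and the `train_val_test_split` helper are assumptions chosen for illustration, not a recommendation from these notes.

```python
import random


def train_val_test_split(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle a copy of the examples, then slice it into three disjoint sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)         # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


examples = list(range(100))                       # stand-in for 100 dataset examples
train, val, test = train_val_test_split(examples)
```

Every example lands in exactly one set, so the test set stays unseen during training and model selection. A plain random shuffle is not appropriate for every dataset (time series, for instance, are usually split chronologically).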
Neural networks - Neural networks are a suitable model for fixed-size input features.
Transformers - Transformers are a neural network architecture designed to process sequences (text, images, audio, video) using an attention mechanism. They were introduced in the 2017 paper Attention Is All You Need and have largely replaced recurrent neural networks for sequence modeling.
Transformers are the architecture for:
- LLMs such as Llama and nanoGPT
- Multimodal foundation models (Google Gemini, OpenAI GPT-5, Anthropic Claude)
- Vision Transformers (ViT) and Swin Transformers
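
The attention mechanism at the core of these models can be sketched in plain Python. This is a simplified, single-head version of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in Attention Is All You Need. The token vectors are made up, and the learned projection matrices, multiple heads, and masking used in real Transformers are deliberately left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)                 # attention weights sum to 1
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
    return outputs


# Hypothetical sequence of 3 tokens with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: queries, keys, and values all come from the same sequence.
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each output row mixes information from every token in the sequence in a single step, which is what lets Transformers handle long-range dependencies without the step-by-step recurrence of an RNN.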
Convolutional neural networks (CNNs) are suitable models for computer vision problems. CNNs went through a hybrid period in which CNN backbones were combined with vision transformers. The trend, however, is towards transformers, although CNNs are still used for real-time inference on mobile devices.
These are common computer vision tasks and CNN-based methods for solving them:
- Image classification: ResNet, Inception v4, DenseNet
- Object detection (with bounding boxes): YOLOv4 (still used for real-time object detection)
- Instance segmentation: Mask R-CNN
- Semantic segmentation: U-Net