# Learning Algorithms
Algorithm | Learning Type | Task | Comment | Assumptions | Pre-Processing | Probabilistic | Parametric | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages | Time Complexity Training | Time Complexity Inference | Space Complexity Training | Space Complexity Inference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Least Squares | Supervised | Regression | | Mean has a linear relationship with the inputs; constant variance | | ❌ | ✅ | Global | \(k+1\) | High | Low | Good (\(n \gg k\)) | | | \(O(n_\text{train} p^2 + p^3)\) | \(O(n_\text{inf} p)\) |
Logistic Regression | Supervised | Classification (binary) | | | | ✅ | ✅ | Global | \(k+1\) | High | Low | Good (\(n \gg k\)) | | | \(O(n_\text{train} p)\) | \(O(n_\text{inf} p)\) |
Naive Bayes | Supervised | Classification | | | | | | | | | | | | | \(O(n_\text{train} p)\) | \(O(n_\text{inf} p)\) |
Ordered Regression | Supervised | Classification (ordered) |
Piecewise Constant | Supervised | Regression / Classification | | | | ❌ | ❌ | Local |
Piecewise Polynomial | Supervised | Regression / Classification | | | | ❌ | ❌ | Local |
SVM | Supervised | Regression / Classification | Margin-based | | | ❌ | ✅ | | | | | | | Computationally expensive |
Gaussian Processes | Supervised | Regression / Classification | | | | ✅ | ✅ |
K-Nearest Neighbors (KNN) | Supervised | Regression / Classification | Good baseline model. Regression can use the mean, median, or mode of the neighbors; classification can use (weighted) voting | Underlying relationship between \(x\) and \(y\) is continuous | Feature scaling; dimensionality reduction | ❌ | ❌ | | | | | | Lazy learner: no training time, and new training data can be added without re-training. Handles multi-class classification out of the box | Lazy learner: high space complexity and high inference time. Cannot use categorical features. Does not work well for large \(n\) or \(p\), since the distance from each inference record to every training record must be computed. Sensitive to noise, missing data, and feature scale. Cannot extrapolate |
Decision Tree | Supervised | Regression / Classification | Automatic piecewise constant. Essentially the opposite of OLS in its characteristics | | | ❌ | ❌ | Local | | Low | High | | Highly interpretable. Automatically detects non-linear relationships and models variable interactions. Fast evaluation: traversal only touches a subset of attributes | Poor regression performance. Unstable: the tree structure is sensitive to the training data, so changing the training data changes the tree. Requires a large number of splits even for simple relationships. Cannot extrapolate (see the extrapolation sketch after the table) |
Linear Tree | Supervised | Regression / Classification | Automatic piecewise polynomial | | | ❌ | ❌ | Local | | | | | Can extrapolate |
Linear Forest | Supervised | Regression / Classification |
Linear Boosting | Supervised | Regression / Classification |
Random Forest | Supervised | Regression / Classification | Bagged trees | | | ❌ | ❌ | Local | | | | | | Cannot extrapolate |
XGBoost | Supervised | Regression / Classification | Boosted trees | | | ❌ | ❌ | Local | | | | | | Cannot extrapolate |
CatBoost | Supervised | Regression / Classification | Boosted trees | | | ❌ | ❌ | Local | | | | | | Cannot extrapolate |
LightGBM | Supervised | Regression / Classification | Boosted trees | | | ❌ | ❌ | Local | | | | | | Cannot extrapolate |
Random Forest Proximity | Supervised | Missing-value imputation / Anomaly Detection / Clustering |
K-Means / K-Medoids | Unsupervised | Clustering | | Euclidean distance assumes spherical clusters | | ❌ | ❌ |
Gaussian Mixtures | Unsupervised | Clustering |
Hierarchical Clustering | Unsupervised | Clustering |
One-Many Clustering | Unsupervised | Clustering |
Graph Clustering | Unsupervised | Clustering |
KNN | Unsupervised | Anomaly Detection |
Kernel Density Estimation | Unsupervised | Anomaly Detection | | | | ✅ |
Isolation Forest | Unsupervised | Anomaly Detection | Since the contamination fraction is usually unknown, get the score without the contamination offset and normalize it for easier interpretation. Visualization: 1. rank records by the score, 2. plot a histogram of the scores, 3. parallel coordinates (see the scoring sketch after the table) | | | ❌ | | | | | | | | Requires a pre-specified contamination fraction |
One-Class SVM | Unsupervised | Anomaly Detection |
Autoencoders | Unsupervised | Anomaly Detection | Reconstruction loss will be high for anomalous events |
Cluster Analysis | Unsupervised | Anomaly Detection |
IDK | Unsupervised | Ranking/Scoring | 1. Robust-scale \(x_{ij}\). 2. Subtract the column minimum, \(x_{ij} - \min(x_{j}) \; \forall j\), to prevent unintended consequences in step 3. 3. Convert each record into its Euclidean distance from the origin. 4. Cluster. 5. Score: the discrete score ranks clusters by the distance of their center from the origin; a continuous score is not settled yet (see the ranking sketch after the table) |
Q-Learning | Reinforcement Learning | | | | | ❌ | ❌ |
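
The "cannot extrapolate" entries above can be seen directly in a small experiment. A minimal sketch, assuming scikit-learn and NumPy are available; the synthetic data and model settings are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 3.0 * x_train.ravel() + rng.normal(0, 1, 200)   # roughly linear signal

lin = LinearRegression().fit(x_train, y_train)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(x_train, y_train)

x_new = np.array([[20.0]])                     # far outside the training range
print("linear model:", lin.predict(x_new))     # keeps following the trend (~60)
print("decision tree:", tree.predict(x_new))   # stuck at the last leaf value (~30)
```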
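The Isolation Forest row sketches a scoring recipe for when the contamination fraction is unknown. A minimal sketch of that recipe, assuming scikit-learn and matplotlib; the synthetic data and parameter choices are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 1, size=(500, 3)),   # bulk of "normal" records
    rng.normal(6, 1, size=(10, 3)),    # a few injected anomalies
])

iso = IsolationForest(n_estimators=200, random_state=0).fit(X)

# score_samples() is the raw score, without the contamination-based offset used
# by predict() / decision_function(); lower values are more anomalous.
raw = iso.score_samples(X)

# Normalize to [0, 1] so that 1 = most anomalous, 0 = least anomalous.
score = (raw.max() - raw) / (raw.max() - raw.min())

ranked = np.argsort(-score)        # 1. rank records by the score
print(ranked[:10])                 # indices of the ten most anomalous records

plt.hist(score, bins=50)           # 2. histogram of the scores
plt.xlabel("anomaly score")
plt.ylabel("count")
plt.show()
# 3. parallel coordinates of the top-ranked records would be the next step,
#    e.g. with pandas.plotting.parallel_coordinates.
```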
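The IDK row lists a ranking/scoring recipe. One possible reading of those steps, sketched with scikit-learn; the table does not say whether step 4 clusters the 1-D distances or the shifted features, so clustering the distances here is an assumption, as are the number of clusters and the synthetic data:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                 # hypothetical feature matrix

Z = RobustScaler().fit_transform(X)           # 1. robust-scale x_ij
Z = Z - Z.min(axis=0)                         # 2. shift each column so min(x_j) = 0
d = np.linalg.norm(Z, axis=1)                 # 3. Euclidean distance from the origin

# 4. cluster the distances (assumption: the recipe does not say what is clustered)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(d.reshape(-1, 1))
centers = km.cluster_centers_.ravel()

# 5. discrete score: clusters whose center is farther from the origin rank higher
cluster_rank = np.argsort(np.argsort(centers))   # 0 = closest, 4 = farthest
score = cluster_rank[km.labels_]
print(np.bincount(score))                        # how many records fall in each rank
```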
## Curse of Dimensionality
As the number of dimensions increases, distances concentrate: the relative gap between the nearest and the farthest point tends to 0 (see the experiment after the list below).
Distance-based models are the most affected:
- KNN
- K-Means
- Tree-based classification
- SVM?
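
A small NumPy experiment, with made-up uniform data, that illustrates the claim: the relative contrast between the nearest and farthest distances shrinks as the dimensionality grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))   # 500 random points in the d-dimensional unit cube
    q = rng.uniform(size=d)          # a random query point
    dist = np.linalg.norm(X - q, axis=1)
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:5d}  relative contrast = {contrast:.3f}")
```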