Learning Algorithms¶
| Algorithm | Learning Type | Task | Comment | Assumptions | Pre-Processing | Probabilistic | Parametric | Scope | \(d_\text{VC}\) | Bias | Variance | Generalization | Advantages | Disadvantages | Time Complexity Training | Time Complexity Inference | Space Complexity Training | Space Complexity Inference | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Least Squares | Supervised | Regression | Mean has linear relationship with inputs Constant variance | ❌ | ✅ | Global | \(k+1\) | High | Low | Good \(n >> k\) | \(O(n_\text{train} p^2 + p^3)\) | \(O(n_\text{inf} p)\) | ||||||
| LogisticRegression | Supervised | Classification Binary | ✅ | ✅ | Global | \(k+1\) | High | Low | Good \(n >> k\) | \(O(n_\text{train} p)\) | \(O(n_\text{inf} p)\) | |||||||
| Naive Bayes | Supervised | \(O(n_\text{train} p)\) | \(O(n_\text{inf} p)\) | |||||||||||||||
| OrderedRegression | Supervised | Classification Ordered | ||||||||||||||||
| Piecewise Constant | Supervised | Regression/ Classification | ❌ | ❌ | Local | |||||||||||||
| Piecewise Polynomial | Supervised | Regression/ Classification | ❌ | ❌ | Local | |||||||||||||
| SVM | Supervised | Regression/ Classification | Margin-Based | ❌ | ✅ | Computationally-expensive | ||||||||||||
| Gaussian Processes | Supervised | Regression/ Classification | ✅ | ✅ | ||||||||||||||
| KNN Nearest Neighbor | Supervised | Regression/ Classification | Good baseline model Can use mean, median, mode for regression Can use weightage, voting for classification | Underlying relationship b/w \(x\) and \(y\) is continuous | - Feature scaling - Dimensionality reduction | ❌ | ❌ | + Lazy-learner: no train time, new train data can be added without re-training + Can perform multi-class out-of-the-box | Lazy-learner: High space complexity, high inference time - Cannot use categorical features - Does not work well for large \(n\) or \(p\): need to calculate the distance of inference record wrt training records - Sensitive to noise - Sensitive to missing data - Sensitive to scale - Cannot extrapolate | |||||||||
| Decision Tree | Supervised | Regression/ Classification | Automatic Piecewise Constant Exactly opposite in characteristics wrt to OLS | ❌ | ❌ | Local | Low | High | - Highly-interpretable - Auto-detect non-linear relationships - Auto-model variable interactions - Fast evaluation: Traversal only occurs on subset of attributes | - Poor regressive performance - Unstable: Tree struct sensitive to train data; changing train data changes tree - Require large no of splits for even simple relationships - Cannot extrapolate | ||||||||
| Linear Tree | Supervised | Regression/ Classification | Automatic Piecewise Polynomial | ❌ | ❌ | Local | Can extrapolate | |||||||||||
| Linear Forest | Supervised | Regression/ Classification | ||||||||||||||||
| Linear Boosting | Supervised | Regression/ Classification | ||||||||||||||||
| Random Forest | Supervised | Regression/ Classification | Bagged Trees | ❌ | ❌ | Local | - Cannot extrapolate | |||||||||||
| XGBoost | Supervised | Regression/ Classification | Boosted Trees | ❌ | ❌ | Local | - Cannot extrapolate | |||||||||||
| CatBoost | Supervised | Regression/ Classification | Boosted Trees | ❌ | ❌ | Local | - Cannot extrapolate | |||||||||||
| LightGBM | Supervised | Regression/ Classification | Boosted Trees | ❌ | ❌ | Local | - Cannot extrapolate | |||||||||||
| Random Forest Proximity | Supervised | Missing value imputation Anomaly Detection Clustering | ||||||||||||||||
| K-Means K-Medoids | Unsupervised | Clustering | Euclidean distance assumes spherical clusters | ❌ | ❌ | |||||||||||||
| Gaussian Mixtures | Unsupervised | Clustering | ||||||||||||||||
| Hierarchical Clustering | Unsupervised | Clustering | ||||||||||||||||
| One-Many Clustering | Unsupervised | Clustering | ||||||||||||||||
| Graph Clustering | Unsupervised | Clustering | ||||||||||||||||
| KNN | Unsupervised | Anomaly Detection | ||||||||||||||||
| Kernel Density Estimation | Unsupervised | Anomaly Detection | ✅ | |||||||||||||||
| Isolation Forest | Unsupervised | Anomaly Detection | As contamination will mostly be unknown Get score without contamination offset (normalize for easier interpretation) Visualization 1. rank by the score 2. plot histogram of scores 3. parallel coordinates | ❌ | Requires pre-specified contamination fraction | |||||||||||||
| One-Class SVM | Unsupervised | Anomaly Detection | ||||||||||||||||
| Autoencoders | Unsupervised | Anomaly detection via reconstruction loss | Reconstruction loss will be high for anomalous events | |||||||||||||||
| Kernel PCA, UMAP | Unsupervised | Dimensionality reduction Anomaly detection via reconstruction loss | ||||||||||||||||
| Cluster Analysis | Unsupervised | Anomaly Detection | ||||||||||||||||
| IDK | Unsupervised | Ranking/Scoring | 1. Robust Scaler for \(x_{ij}\) 2. \(x_{ij} - \min (x_{j}), \forall j\), to prevent unintended consequences in step 3 3. Convert into euclidean distance from origin 4. Clustering 5. Score - Discrete score: Larger distance of cluster center from origin - Continuous score: Not sure yet | |||||||||||||||
| Q-Learning | Re-Inforcement Learning | ❌ | ❌ | 
Curse of Dimensionality¶
As the no of dimensions increases, relative distances tend to 0
Distance-based models are the most affected
- KNN
- K-Means
- Tree-based classification
- SVM?
\[ \begin{aligned} \dfrac{d_a}{d_b} &= \dfrac{ \sqrt{\sum_j^k d^2_{a, j}} }{ \sqrt{\sum_j^k d^2_{b, j}} } \\ \text{When } d \to \infty, \dfrac{d_a}{d_b} &\approx \dfrac{ \sqrt{d^2_{a, \text{nominal}} + n^2k} }{ \sqrt{d^2_{b, \text{nominal}} + n^2k} } \\ &\approx 1 \end{aligned} \]