# Uncertainty

## Types of Uncertainty
| Our knowledge ↓ \ Others’ knowledge → | Known | Unknown |
|---|---|---|
| **Known** | Things we are certain of | We know there are things we can’t predict, e.g. random processes |
| **Unknown** | Others know but we don’t, e.g. insufficient data | Completely unexpected/unforeseeable events, e.g. unknown distribution |
| | Aleatoric | Epistemic |
|---|---|---|
| Uncertainty in | Data | Model |
| Cause | Noisy input data, measurement errors | Missing training data |
| Describes confidence in | Input data | Prediction |
| Reducible through more training data | ❌ | ✅ |
| Can be learnt by model | ✅ | ❌ |
| Solution | Better instruments/measurements | Get more data |
## Uncertainty Quantification Methods

| | Concept | Assumptions | Works for non-linear | Limitations |
|---|---|---|---|---|
| Asymptotic approach | Central limit theorem | Normal distribution of response residuals; homoscedasticity of response residuals | ❌ | Requires large sample size to satisfy the asymptotic condition; requires an appropriate formula for the standard error (not possible for complex models) |
| Bootstrapping (preferred) | Random sampling with replacement | | ✅ | Higher computation cost |
| Delta Approach | First-order Taylor expansion | | ✅ | |
| Conformal Prediction | Calibration via nonconformity scores | Exchangeability of data | ✅ | Split variant needs a held-out calibration set |
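A minimal sketch of the bootstrap approach, assuming scikit-learn's `LinearRegression` as a stand-in model (any regressor works); the function name and defaults are illustrative, not from the source:

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # placeholder model

rng = np.random.default_rng(0)

def bootstrap_interval(X, y, X_new, n_boot=1000, alpha=0.05):
    """Percentile-bootstrap interval for the model's mean prediction."""
    n = len(X)
    preds = np.empty((n_boot, len(X_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                # sample rows with replacement
        model = LinearRegression().fit(X[idx], y[idx])  # refit on the resample
        preds[b] = model.predict(X_new)
    return (np.quantile(preds, alpha / 2, axis=0),
            np.quantile(preds, 1 - alpha / 2, axis=0))
```

Note this brackets the mean prediction only; adding resampled residuals to each `preds[b]` would widen it into a prediction interval.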
## Uncertainty Intervals

| | \(\Delta y\) |
|---|---|
| Normal Assumption | \(t_{n_\text{cal}, \alpha/2} \times \text{SE}\) |
| Conformal Prediction | \(S^{-1} \left[ q_{\frac{\lceil (n+1)\alpha \rceil}{n}} \right]\) |
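A minimal split-conformal sketch using absolute residuals as the score \(S\) and a pre-fitted `model` (both my assumptions). The table writes the quantile level as \(\lceil (n+1)\alpha \rceil / n\); the sketch uses the common \(\lceil (n+1)(1-\alpha) \rceil / n\) convention, where \(\alpha\) denotes the miscoverage level:

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_new, alpha=0.1):
    """Interval y_hat +/- q with >= 1 - alpha coverage under exchangeability."""
    scores = np.abs(y_cal - model.predict(X_cal))         # nonconformity scores S
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample-corrected level
    q = np.quantile(scores, level, method="higher")
    y_hat = model.predict(X_new)
    return y_hat - q, y_hat + q
```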
### Normal Assumption

| | Coefficient Confidence Interval | Response Confidence Interval | Response Prediction Interval |
|---|---|---|---|
| Notation | \(\sigma_{\hat \beta}\) | \(\sigma \Big[ \hat \mu \vert x_{i, \text{new}} \Big]\) | \(\sigma \Big[ \hat y_{i, \text{new}} \vert x_{i, \text{new}} \Big]\) |
| The upper and lower bound for estimated __ at a given level of significance | \(\hat \beta\) | \(\hat \mu \vert x_{i, \text{new}}\) | \(\hat y \vert x_{i, \text{new}} = \hat \mu \vert x_{i, \text{new}} + \hat u \vert x_{i, \text{new}}\) |
| Univariate Linear Regression (Asymptotic Approach) | \(\left\{ \text{RMSE} \sqrt{\dfrac{1}{n_\text{cal}} + \dfrac{\bar x^2}{n_\text{cal} \sigma^2_x}} , \dfrac{\text{RMSE}}{\sqrt{n_\text{cal} \sigma^2_x}} \right\}\) | \(\text{RMSE} \times \sqrt{\dfrac{1}{n_\text{cal}} + \dfrac{(x_{i, \text{new}} - \bar x )^2}{n_\text{cal} \sigma_x^2}}\) | \(\text{RMSE} \times \sqrt{\dfrac{1}{n_\text{cal}} + \dfrac{(x_{i, \text{new}} - \bar x )^2}{n_\text{cal} \sigma_x^2} \ \textcolor{hotpink}{+ 1}}\) |
| Multivariate Linear Regression (Asymptotic Approach) | \(\text{RMSE} \times \sqrt{\text{Cov}_{jj}}\) | \(\text{RMSE} \times \sqrt{X_{i, \text{new}}^T \cdot \text{Cov} \cdot X_{i, \text{new}}}\) | \(\text{RMSE} \times \sqrt{X_{i, \text{new}}^T \cdot \text{Cov} \cdot X_{i, \text{new}} \ \textcolor{hotpink}{+ 1}}\) |
| Multivariate Non-Linear Regression (Asymptotic + Delta Approach) | \(\text{RMSE} \times \sqrt{\text{IF}_{jj}}\) | \(\text{RMSE} \times \sqrt{J_{i, \text{new}}^T \cdot \text{IF} \cdot J_{i, \text{new}}}\) | \(\text{RMSE} \times \sqrt{J_{i, \text{new}}^T \cdot \text{IF} \cdot J_{i, \text{new}} \ \textcolor{hotpink}{+ 1}}\) |
where

- \(\text{Cov}\): covariance matrix, \(\text{Cov} = (X' X)^{-1}\)
- \(J\): Jacobian matrix, \(J_{i, \text{new}} = \dfrac{\partial \hat y_{i, \text{new}}}{\partial \beta}\)
- \(H\): Hessian matrix, \(H \approx J^T J\)
- \(\text{IF}\): inverse Fisher information, \(\text{IF} = H^{-1}\)
High values of the off-diagonal elements of \(\text{Cov}_\beta\) mean that the errors of the \(\beta\) estimates are correlated with each other.
Degrees of freedom \(= n - k - 1\), where

- \(n =\) sample size
- \(k =\) number of input variables
Confidence and prediction intervals are narrowest at \(X = \bar X\), and get wider further from this point.
Under homoskedasticity,

$$
\begin{aligned}
\hat V(\hat \beta) &= (X' X)^{-1} \hat \sigma^2 \\
\hat V(\hat \beta_j) &= \dfrac{\hat \sigma^2}{\hat u_j' \hat u_j}
\end{aligned}
$$

where \(\hat u_j\) denotes the residuals from regressing \(x_j\) on the remaining regressors.
#### Note

- RMSE here refers to the RMSE on the validation data
- If the validation error distribution is not normal, or you have a lot of data, you can use the quantiles of the validation error distribution for the confidence intervals instead
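A sketch of the multivariate linear regression columns above, assuming an explicit intercept column; per the note above, validation RMSE is preferred, but the sketch computes it from the fitting residuals to stay self-contained:

```python
import numpy as np
from scipy import stats

def linear_intervals(X_cal, y_cal, x_new, alpha=0.05):
    """Asymptotic response confidence/prediction half-widths:
    t * RMSE * sqrt(x' Cov x [+ 1]), with Cov = (X'X)^{-1}."""
    X = np.column_stack([np.ones(len(X_cal)), X_cal])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y_cal, rcond=None)
    resid = y_cal - X @ beta
    dof = X.shape[0] - X.shape[1]                      # n - k - 1
    rmse = np.sqrt(resid @ resid / dof)
    cov = np.linalg.inv(X.T @ X)                       # Cov = (X'X)^{-1}
    x = np.concatenate([[1.0], np.atleast_1d(x_new)])
    t = stats.t.ppf(1 - alpha / 2, dof)
    conf = t * rmse * np.sqrt(x @ cov @ x)             # response confidence interval
    pred = t * rmse * np.sqrt(x @ cov @ x + 1.0)       # response prediction interval
    return x @ beta, conf, pred
```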
## Intervals using Models’ Predictions

For each data point, take the __ of multiple models’ predictions (see the sketch after this list):
- average
- 5th quantile
- 95th quantile
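A minimal sketch, assuming `preds` holds one row of predictions per model:

```python
import numpy as np

def ensemble_interval(preds):
    """Point estimate and 90% band from an (n_models, n_points) prediction array."""
    return (preds.mean(axis=0),                # average -> point prediction
            np.quantile(preds, 0.05, axis=0),  # 5th quantile -> lower bound
            np.quantile(preds, 0.95, axis=0))  # 95th quantile -> upper bound
```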
## Predictive Density

Describes the full predictive probability distribution of the response, \(\forall x\).
## Trajectories/Scenarios
Equally-likely samples of multivariate predictive densities
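A minimal sketch of drawing equally-likely trajectories, assuming a Gaussian predictive density; the horizon length, `mu`, and `Sigma` below are made-up placeholders, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multivariate predictive density over a 24-step horizon
mu = np.zeros(24)
lags = np.abs(np.subtract.outer(np.arange(24), np.arange(24)))
Sigma = np.exp(-lags / 5.0)  # smooth, positive-definite covariance

# Each row is one equally-likely trajectory/scenario
trajectories = rng.multivariate_normal(mu, Sigma, size=100)
```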
## Uncertainty Propagation

| Function \(f\) | Variance \(\sigma_f^2\) |
|---|---|
| \(aA\) | \(= a^2\sigma_A^2\) |
| \(aA + bB\) | \(= a^2\sigma_A^2 + b^2\sigma_B^2 + 2ab\,\text{Cov}(A, B)\) |
| \(aA - bB\) | \(= a^2\sigma_A^2 + b^2\sigma_B^2 - 2ab\,\text{Cov}(A, B)\) |
| \(AB\) | \(\approx f^2 \left[\left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 + 2\frac{\text{Cov}(A, B)}{AB} \right]\) |
| \(\frac{A}{B}\) | \(\approx f^2 \left[\left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 - 2\frac{\text{Cov}(A, B)}{AB} \right]\) |
| \(\frac{A}{A+B}\) | \(\approx \frac{f^2}{\left(A+B\right)^2} \left(\frac{B^2}{A^2}\sigma_A^2 + \sigma_B^2 - 2\frac{B}{A} \text{Cov}(A, B) \right)\) |
| \(a A^{b}\) | \(\approx \left( a b A^{b-1} \sigma_A \right)^2 = \left( \frac{f b \sigma_A}{A} \right)^2\) |
| \(a \ln(bA)\) | \(\approx \left(a \frac{\sigma_A}{A} \right)^2\)[^4] |
| \(a \log_{10}(bA)\) | \(\approx \left(a \frac{\sigma_A}{A \ln(10)} \right)^2\)[^5] |
| \(a e^{bA}\) | \(\approx f^2 \left( b\sigma_A \right)^2\)[^6] |
| \(a^{bA}\) | \(\approx f^2 (b \ln(a) \sigma_A)^2\) |
| \(a \sin(bA)\) | \(\approx \left[ a b \cos(b A) \sigma_A \right]^2\) |
| \(a \cos(bA)\) | \(\approx \left[ a b \sin(b A) \sigma_A \right]^2\) |
| \(a \tan(bA)\) | \(\approx \left[ a b \sec^2(b A) \sigma_A \right]^2\) |
| \(A^B\) | \(\approx f^2 \left[ \left( \frac{B}{A}\sigma_A \right)^2 + \left( \ln(A)\sigma_B \right)^2 + 2 \frac{B \ln(A)}{A} \text{Cov}(A, B) \right]\) |
| \(\sqrt{aA^2 \pm bB^2}\) | \(\approx \left(\frac{A}{f}\right)^2 a^2\sigma_A^2 + \left(\frac{B}{f}\right)^2 b^2\sigma_B^2 \pm 2ab\frac{AB}{f^2}\,\text{Cov}(A, B)\) |
For uncorrelated variables (\(\rho_{AB} = 0\), \(\text{Cov}(A, B) = 0\)), expressions for more complicated functions can be derived by combining simpler functions. For example, repeated multiplication, assuming no correlation, gives

$$f = ABC; \qquad \left(\frac{\sigma_f}{f}\right)^2 \approx \left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 + \left(\frac{\sigma_C}{C}\right)^2.$$
For the case \(f = AB\), we also have Goodman's expression[^7] for the exact variance: in the uncorrelated case it is

$$V(XY) = E(X)^2 V(Y) + E(Y)^2 V(X) + E\left((X - E(X))^2 (Y - E(Y))^2\right),$$

and therefore

$$\sigma_f^2 = A^2\sigma_B^2 + B^2\sigma_A^2 + \sigma_A^2\sigma_B^2.$$
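A quick Monte Carlo sanity check of the \(f = AB\) case, comparing the first-order approximation against Goodman's exact expression and the empirical variance (the input numbers below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Uncorrelated A and B with known means and standard deviations
A_mean, B_mean, sA, sB = 10.0, 4.0, 0.5, 0.3
A = rng.normal(A_mean, sA, 1_000_000)
B = rng.normal(B_mean, sB, 1_000_000)

f = A_mean * B_mean
approx = f**2 * ((sA / A_mean) ** 2 + (sB / B_mean) ** 2)        # first-order formula
goodman = A_mean**2 * sB**2 + B_mean**2 * sA**2 + sA**2 * sB**2  # exact (Goodman)
empirical = (A * B).var()

print(approx, goodman, empirical)  # goodman and empirical should agree closely
```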
### Effect of correlation on differences

If \(A\) and \(B\) are uncorrelated, their difference \(A - B\) will have more variance than either of them. An increasing positive correlation (\(\rho_{AB} \to 1\)) decreases the variance of the difference, converging to zero variance for perfectly correlated variables with the same variance. Conversely, a negative correlation (\(\rho_{AB} \to -1\)) further increases the variance of the difference, compared to the uncorrelated case.

For example, the self-subtraction \(f = A - A\) has zero variance \(\sigma_f^2 = 0\) only if the variate is perfectly autocorrelated (\(\rho_A = 1\)). If \(A\) is uncorrelated, \(\rho_A = 0\), then the output variance is twice the input variance, \(\sigma_f^2 = 2\sigma^2_A\). And if \(A\) is perfectly anticorrelated, \(\rho_A = -1\), then the input variance is quadrupled in the output, \(\sigma_f^2 = 4\sigma^2_A\) (notice \(1 - \rho_A = 2\) for \(f = aA - aA\) in the table above).
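Concretely, applying the \(aA - bB\) row with \(a = b = 1\) and equal variances \(\sigma_A = \sigma_B = \sigma\):

$$
\sigma_{A-B}^2 = \sigma_A^2 + \sigma_B^2 - 2\rho_{AB}\,\sigma_A \sigma_B
= 2\sigma^2 (1 - \rho_{AB}) =
\begin{cases}
0, & \rho_{AB} = 1 \\
2\sigma^2, & \rho_{AB} = 0 \\
4\sigma^2, & \rho_{AB} = -1
\end{cases}
$$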
## Value at Risk Models

Purpose:

- Derive the risk profile of the firm
- Protect the firm against unacceptably large concentrations
- Quantify potential losses

Steps (see the sketch after this list):

- Collect data
- Graph the data to inspect data quality
- Transform price data into returns (percentage difference of prices)
- Look at the frequency distribution
- Obtain the standard deviation (volatility)
- Multiply the volatility by the one-sided critical value \(Z\) (\(\approx 2.33\) at 99%) to estimate the 99% worst-case loss
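A minimal sketch of these steps as a parametric VaR estimate; the normality assumption for returns and the function name are mine, not from the source:

```python
import numpy as np
from scipy import stats

def parametric_var(prices, confidence=0.99):
    """Worst-case loss (as a return) at the given one-sided confidence level."""
    returns = np.diff(prices) / prices[:-1]  # prices -> percentage returns
    volatility = returns.std(ddof=1)         # standard deviation of returns
    z = stats.norm.ppf(confidence)           # one-sided z, ~2.33 at 99%
    return z * volatility
```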