Of course there was a time (or multiple times) when I learnt Support Vector Machines, especially during the time of kover (oh my, it has v2.0.0 now!). But apparently I’ve forgotten all the details, so I had to write it down this time.
References first:
- StatQuest: Machine Learning Fundamentals: Bias and Variance
- StatQuest: Support Vector Machines, Parts 1-3
Some concepts:
- Bias: the inability of a method (like linear regression) to capture the true relationship (such as a curve).
- Variance (in machine learning): the differences in fits between datasets (such as training vs. testing).
- Bias-variance trade-off: you can’t have it all. Sometimes you have to allow misclassification to set more appropriate thresholds.
- Overfitting: low bias (the fit hugs the training set), usually high variance. A sketch follows this list.
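To make bias, variance, and overfitting concrete, here is a minimal sketch assuming scikit-learn and NumPy (the curved toy data and the degree choices are made up for illustration). A straight line misses the curve on both sets (high bias), while a degree-15 polynomial nails the training set but swings on the test set (low bias, high variance):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=80)  # a curved truth plus noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # simple (high bias) vs. flexible (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The gap between training and test error for the flexible model is the variance half of the trade-off showing up in numbers.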
A good model has both low bias and low variability (not variance). Therefore you need to find a sweet spot (Joe and his racquetball lessons!) between a simple model (high bias, low variability) and a complex/overfitting model (low bias, high variability). There are some common ways to find it:
- regularization
- boosting
- bagging (such as in random forests)
- cross validation: every observation gets to be training or testing at some point and the results are summarized. If you divide the data into N blocks, it’s called an N-Fold Cross Validation; if each block contains only one observation, it becomes Leave One Out Cross Validation. N usually equals 10. (See the sketch after this list.)
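A minimal sketch of N-Fold Cross Validation with N = 10, assuming scikit-learn (the breast-cancer toy dataset and the scaled linear SVC are arbitrary stand-ins):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# 10-Fold CV: every observation is held out for testing exactly once,
# and the 10 held-out scores are summarized.
scores = cross_val_score(clf, X, y, cv=10)
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Leave One Out is the limiting case: each "block" is a single observation,
# so the model is refit once per data point (slow on large datasets).
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOOCV accuracy: {loo_scores.mean():.3f}")
```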
- Margin: the SHORTEST distance between the observations and the threshold. It is as large as it can be when the threshold is halfway between the two closest observations of different classes (because we only consider the shorter distance). A Maximal Margin Classifier sets the threshold between observations by maximizing the margin. It is super sensitive to outliers in the training data; see the sketch below.
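A sketch of the margin on a tiny 1-D example, assuming scikit-learn; a Maximal Margin Classifier (hard margin) is approximated here by a linear SVC with a huge C, which forbids misclassification and is exactly why it stays outlier-sensitive. The numbers are made up:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable 1-D groups (made-up numbers).
X = np.array([[1.0], [1.5], [2.0], [6.0], [6.5], [7.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard (maximal) margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0][0], clf.intercept_[0]
threshold = -b / w       # decision boundary: halfway between 2.0 and 6.0
margin = 1.0 / abs(w)    # distance from the threshold to the closest observation
print(threshold, margin) # ~4.0 and ~2.0
```

Lowering C allows some misclassification (a soft margin), which is the bias-variance trade-off from above reappearing inside the SVM itself.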
What kind of problem are we facing? Is it