The green curve is a sinusoidal curve, from which we draw the following:
- Some data points at specific intervals,
- And we add some noise to this dataset, which gives the blue dots or blue circles (as sketched below).
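A minimal sketch of this setup in Python (assuming, purely for illustration, the sinusoid $\sin(2\pi x)$, ten sample points, and Gaussian noise, none of which are specified above):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10                                    # number of blue circles (assumed)
x = np.linspace(0.0, 1.0, N)              # data points at specific intervals
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)   # sinusoid plus noise
```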
So the idea here is, we want to fit a polynomial to this data, vary the degree of the polynomial, and see how the fit looks. We fit it by minimizing an error term, that is, we perform the regression with least-squared error.
So t is the ground truth or the correct answer, and whatever your polynomial outputs is given by y, so the error between y and t is basically what you are going to minimize.
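Concretely, the least-squared error referred to here is usually written in the sum-of-squares form, where $y(x_n, \mathbf{w})$ is the polynomial output for input $x_n$ and $t_n$ is the corresponding ground truth (the factor $\tfrac{1}{2}$ is a common convention):

$$
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big( y(x_n, \mathbf{w}) - t_n \big)^2
$$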
So, at the extreme left we have a zero-degree polynomial, which is nothing but a constant term. As you can see, there is a red line, which is the fitted curve, and of course it does not match the data that we have used.
In the middle we use a first-degree polynomial, which is nothing but a linear fit, and it also does not fit the data.
At the right we have a third-degree polynomial.
So ideally, when you are done with the fit, you would expect the red curve to lie close to the green curve, but in this case it is actually off in between the samples.
So even though with a higher-degree polynomial we are able to fit every point exactly, so that our fitting error is very small, we see that at points other than the blue circles the fit is actually quite far from the ground truth.
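A hedged sketch of this experiment, fitting polynomials of a few degrees by least squares with `np.polyfit` (including a high degree, taken here as 9, to mimic the exact-interpolation case) and checking how far the fitted curve drifts from the assumed true sinusoid between the samples:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10

# Assumed data, as in the sketch above: noisy samples of sin(2*pi*x).
x_train = np.linspace(0.0, 1.0, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=N)

x_grid = np.linspace(0.0, 1.0, 200)        # dense grid between the samples
for M in (0, 1, 3, 9):
    coeffs = np.polyfit(x_train, t_train, deg=M)   # least-squares fit of degree M
    y_grid = np.polyval(coeffs, x_grid)            # the fitted "red curve"
    gap = np.max(np.abs(y_grid - np.sin(2 * np.pi * x_grid)))
    print(f"M={M}: max distance from the true curve = {gap:.2f}")
```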
Graphs of the root-mean-square error, evaluated on the training set and on an independent test set, for various values of M.
So, if we actually plot the error as we defined it previously for different values of M, that is, the degree of the polynomial, we see that as we go to the higher-degree polynomials, the error on the training data is very small, which is what the blue circles indicate. However, the test data error starts to diverge. This phenomenon is referred to as overfitting. Similarly, coming back to the other end, we see that the error is again quite high when we are using a polynomial of degree zero, that is, when we are just fitting a constant function. So in both cases we have a fairly large error: at this end of the spectrum we call it underfitting, and at the other end of the spectrum we call it overfitting.
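A sketch of how these two curves could be reproduced, computing the root-mean-square error on the training set and on an independent test set for each degree M (the data-generation details are assumptions, as before):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sinusoid(x, noise=0.3):
    # Assumed ground truth: sin(2*pi*x) plus Gaussian noise.
    return np.sin(2 * np.pi * x) + rng.normal(scale=noise, size=x.shape)

x_train = np.linspace(0.0, 1.0, 10)
t_train = noisy_sinusoid(x_train)
x_test = np.linspace(0.0, 1.0, 100)      # independent test set
t_test = noisy_sinusoid(x_test)

def rms_error(coeffs, x, t):
    # Root-mean-square error of the fitted polynomial on (x, t).
    return np.sqrt(np.mean((np.polyval(coeffs, x) - t) ** 2))

for M in range(10):
    w = np.polyfit(x_train, t_train, deg=M)
    print(f"M={M}: train={rms_error(w, x_train, t_train):.3f}, "
          f"test={rms_error(w, x_test, t_test):.3f}")
```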
- The bias-variance trade-off relates to model complexity, which is nothing but the number of parameters and the basis functions used in the model.
- $\hat{Y}$ is what is called a statistical estimate of the true model $Y$.
- The bias is the expectation value of the difference between the model prediction and the correct value; here the expectation is over the inputs $X$ and over different data sets.
- The variance is the variance of the predictions of the model trained using different data sets (made precise in the decomposition after this list).
- Complex models with many parameters might have a small bias but large variance.
- Simple models with few parameters might have a large bias but small variance.
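One way to make the trade-off precise is the standard decomposition of the expected squared error of the estimate $\hat{Y}$, with the expectation $\mathbb{E}$ taken over different data sets (any irreducible noise term is omitted here for brevity):

$$
\mathbb{E}\big[(\hat{Y} - Y)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{Y}] - Y\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{Y} - \mathbb{E}[\hat{Y}])^2\big]}_{\text{variance}}
$$

A complex model tends to drive the first term down while inflating the second, and a simple model does the opposite, which is exactly the trade-off stated in the bullets above.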