Here I implement an SVM from scratch (only numpy is used) and train it on different data with different hyperparameters. The optimization algorithm used here is SMO (Sequential Minimal Optimization), which is commonly used with Support Vector Machines, although my implementation is a simplified version of the algorithm.
P.S. The testing_smo.ipynb file was used to try out different implementations of the SVM, as well as different optimization techniques. For example, I started with SGD optimization, so in the file you may still see a learning rate variable (just a leftover). By now, though, it simply repeats my main SMO script.
Some advantages of SMO:
- Scalability: SMO is designed to work well with large datasets.
- Convergence: SMO has a proven convergence guarantee, which means that it will converge to the optimal solution given enough iterations.
- Flexibility: SMO can handle a wide range of kernel functions, including linear, polynomial, and RBF kernels.
- Memory efficiency: SMO only needs to store a small number of variables in memory, which makes it memory-efficient.
I used this paper to implement the algorithm; there you can learn how it works from a mathematical point of view. And if you happen to be interested in the full version of SMO, here you can read a paper about it.
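To make the idea more concrete, here is a minimal sketch of a single simplified-SMO pass over the alphas, in the spirit of that paper. All names are hypothetical, and this is a sketch rather than the exact code from this repository:

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return x @ z

def smo_pass(X, y, alphas, b, C, tol, kernel=linear_kernel):
    """One simplified-SMO sweep over all alphas (hypothetical names)."""
    n = X.shape[0]

    def f(x):
        # decision function: f(x) = sum_k alpha_k * y_k * K(x_k, x) + b
        return sum(alphas[k] * y[k] * kernel(X[k], x) for k in range(n)) + b

    for i in range(n):
        E_i = f(X[i]) - y[i]
        # only touch alpha_i if it violates the KKT conditions within tol
        if (y[i] * E_i < -tol and alphas[i] < C) or (y[i] * E_i > tol and alphas[i] > 0):
            # the "simplified" part: pick the second index j at random
            j = np.random.choice([k for k in range(n) if k != i])
            E_j = f(X[j]) - y[j]
            a_i_old, a_j_old = alphas[i], alphas[j]
            # box bounds that keep 0 <= alpha <= C and the equality constraint intact
            if y[i] != y[j]:
                L, H = max(0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
            else:
                L, H = max(0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
            if L == H:
                continue
            # second derivative of the objective along the constraint line
            eta = 2 * kernel(X[i], X[j]) - kernel(X[i], X[i]) - kernel(X[j], X[j])
            if eta >= 0:
                continue
            # update alpha_j, then clip it into [L, H]
            alphas[j] = np.clip(a_j_old - y[j] * (E_i - E_j) / eta, L, H)
            if abs(alphas[j] - a_j_old) < 1e-5:
                continue
            # move alpha_i by the same amount in the opposite direction
            alphas[i] += y[i] * y[j] * (a_j_old - alphas[j])
            # recompute the threshold b so the KKT conditions hold for i and j
            b1 = b - E_i - y[i] * (alphas[i] - a_i_old) * kernel(X[i], X[i]) \
                 - y[j] * (alphas[j] - a_j_old) * kernel(X[i], X[j])
            b2 = b - E_j - y[i] * (alphas[i] - a_i_old) * kernel(X[i], X[j]) \
                 - y[j] * (alphas[j] - a_j_old) * kernel(X[j], X[j])
            if 0 < alphas[i] < C:
                b = b1
            elif 0 < alphas[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2
    return alphas, b
```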
In this section I will quickly explain how to work with my SVM implementation.
`def __init__(self, X: ndarray, y: ndarray, kernel: str, c: float = 1, tol: float = 10e-4, sigma: float = 1)`
- X - a 2D numpy array; the first dimension contains the data points, the second one contains the features.
- y - a 1D numpy array; contains the label for every data point in X; the labels must be either 1 or -1 (only binary classification is supported).
- kernel - a string; either "linear" or "rbf" (i.e. the names of the kernel functions).
- c - the regularization parameter (see the paper for details).
- tol - the numerical tolerance (see the paper for details).
- sigma - a hyperparameter for the RBF kernel; not needed when using the linear kernel function (see the kernel sketch below).
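For reference, here is a minimal sketch of what the two kernel functions typically look like under common definitions; the exact parametrization in this repository may differ (note that the results table below reports Gamma, which under one common convention equals 1 / (2 * sigma^2)):

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return np.dot(x, z)

def rbf_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))
```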
`def predict(self, features: ndarray, labels: ndarray) -> tuple[ndarray, float]`
- features - a 2D numpy array; the first dimension contains the data points, the second one contains the features.
- labels - a 1D numpy array; contains the label for every data point in features; the labels must be either 1 or -1 (only binary classification is supported).
- RETURNS - a tuple with the predictions and the accuracy, respectively.
`def fit(self) -> tuple[ndarray, float]`
- BEHAVIOR - performs a single "tuning step" through all the alphas; should be executed multiple times (see the usage example below).
- RETURNS - a tuple with the predictions and the accuracy, respectively.
`def get_support_vectors(self, zero: float = 10e-5) -> ndarray`
- zero - a threshold that is compared against all the alphas, since the alphas that don't correspond to support vectors may not converge to exactly zero.
- RETURNS - a numpy array of the support vectors from the training dataset.
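Putting it all together, here is a hedged usage sketch. It assumes the class is named SVM and lives in svm.py; adjust the import to the actual module and class names in this repository:

```python
import numpy as np
from svm import SVM  # hypothetical module and class names

rng = np.random.default_rng(42)

# toy data: two Gaussian blobs with labels in {-1, 1}
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

model = SVM(X, y, kernel="rbf", c=1.0, tol=10e-4, sigma=1.0)

# fit() performs only one pass over the alphas, so call it in a loop
for _ in range(30):
    predictions, accuracy = model.fit()
print("training accuracy:", accuracy)

# evaluate on (here, the same) data and extract the support vectors
predictions, accuracy = model.predict(X, y)
support_vectors = model.get_support_vectors()
print("number of support vectors:", len(support_vectors))
```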
For all the experiments below, the number of iterations is 30 and the numerical tolerance is 10e-4. The random seed for numpy (used to generate the initial alphas) is 42. You can find the dataset on Kaggle.
№ | Kernel | C | Gamma | Training Accuracy | Testing Accuracy |
---|---|---|---|---|---|
1 | Linear | 0.01 | - | 72.13% | 61.03% |
2 | Linear | 0.1 | - | 69.47% | 56.19% |
3 | Linear | 1 | - | 66.81% | 56.49% |
4 | Linear | 10 | - | 64.99% | 56.19% |
5 | Linear | 100 | - | 65.12% | 56.19% |
6 | RBF | 0.01 | 0.1 | 59.38% | 61.63% |
7 | RBF | 0.1 | 0.1 | 81.23% | 60.72% |
8 | RBF | 1 | 0.1 | 86.41% | 61.33% |
9 | RBF | 10 | 0.1 | 94.82% | 67.37% |
10 | RBF | 100 | 0.1 | 94.54% | 60.73% |
11 | RBF | 10 | 1 | 97.20% | 53.78% |
12 | RBF | 10 | 0.01 | 81.79% | 77.34% |
In this subsection I visualize some data and the decision boundaries my model produces for it. For this purpose I use two auto-generated datasets from the scikit-learn library: blobs and circles. Since it's better to always work with the same data, I saved both datasets as .csv files. P.S. All the data points within the blue area (the gutter) are the support vectors.
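For reference, here is a sketch of how such datasets could be generated and saved; the file names and generator parameters are my assumptions, not necessarily the ones used for the plots below:

```python
import numpy as np
from sklearn.datasets import make_blobs, make_circles

# blobs: two roughly linearly separable clusters
X_blobs, y_blobs = make_blobs(n_samples=200, centers=2, random_state=42)
# circles: a ring around a core, separable only with a non-linear kernel
X_circles, y_circles = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=42)

for X, y, name in [(X_blobs, y_blobs, "blobs.csv"), (X_circles, y_circles, "circles.csv")]:
    y = 2 * y - 1  # map labels from {0, 1} to {-1, 1}, as the model expects
    np.savetxt(name, np.column_stack([X, y]), delimiter=",")
```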
Used params:
- Numerical Tolerance = 0.001
- Iterations = 150
- Random Seed (for generating alphas) = 42
- Kernel = "Linear"
Plots | C |
---|---|
![]() | 0.01 |
![]() | 1 |
![]() | 100 |
Used params:
- Numerical Tolerance = 0.001
- Iterations = 25
- Random Seed (for generating alphas) = 42
- Kernel = "RBF"
Plots | C | Gamma |
---|---|---|
![]() | 0.01 | 0.1 |
![]() | 0.1 | 1 |
![]() | 100 | 0.1 |
![]() | 2 | 35 |