This chapter deals with a machine learning method termed as Support Vector Machines (SVMs).
Introduction
Support vector machines (SVMs) are powerful yet flexible supervised machine learning methods used for classification, regression, and, outliers’ detection. SVMs are very efficient in high dimensional spaces and generally are used in classification problems. SVMs are popular and memory efficient because they use a subset of training points in the decision function.
The main goal of SVMs is to divide the datasets into number of classes in order to find a maximum marginal hyperplane (MMH) which can be done in the following two steps −
- Support Vector Machines will first generate hyperplanes iteratively that separates the classes in the best way.
- After that it will choose the hyperplane that segregate the classes correctly.
Some important concepts in SVM are as follows −
- Support Vectors − They may be defined as the datapoints which are closest to the hyperplane. Support vectors help in deciding the separating line.
- Hyperplane − The decision plane or space that divides set of objects having different classes.
- Margin − The gap between two lines on the closet data points of different classes is called margin.
Following diagrams will give you an insight about these SVM concepts −
SVM in Scikit-learn supports both sparse and dense sample vectors as input.
Classification of SVM
Scikit-learn provides three classes namely SVC, NuSVC and LinearSVC which can perform multiclass-class classification.
SVC
It is C-support vector classification whose implementation is based on libsvm. The module used by scikit-learn is sklearn.svm.SVC. This class handles the multiclass support according to one-vs-one scheme.
Parameters
Followings table consist the parameters used by sklearn.svm.SVC class −
Sr.No | Parameter & Description |
---|---|
1 | C − float, optional, default = 1.0It is the penalty parameter of the error term. |
2 | kernel − string, optional, default = ‘rbf’This parameter specifies the type of kernel to be used in the algorithm. we can choose any one among, ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. The default value of kernel would be ‘rbf’. |
3 | degree − int, optional, default = 3It represents the degree of the ‘poly’ kernel function and will be ignored by all other kernels. |
4 | gamma − {‘scale’, ‘auto’} or float,It is the kernel coefficient for kernels ‘rbf’, ‘poly’ and ‘sigmoid’. |
5 | optinal default − = ‘scale’If you choose default i.e. gamma = ‘scale’ then the value of gamma to be used by SVC is 1/(𝑛_𝑓𝑒𝑎𝑡𝑢𝑟𝑒𝑠∗𝑋.𝑣𝑎𝑟()).On the other hand, if gamma= ‘auto’, it uses 1/𝑛_𝑓𝑒𝑎𝑡𝑢𝑟𝑒𝑠. |
6 | coef0 − float, optional, Default=0.0An independent term in kernel function which is only significant in ‘poly’ and ‘sigmoid’. |
7 | tol − float, optional, default = 1.e-3This parameter represents the stopping criterion for iterations. |
8 | shrinking − Boolean, optional, default = TrueThis parameter represents that whether we want to use shrinking heuristic or not. |
9 | verbose − Boolean, default: falseIt enables or disable verbose output. Its default value is false. |
10 | probability − boolean, optional, default = trueThis parameter enables or disables probability estimates. The default value is false, but it must be enabled before we call fit. |
11 | max_iter − int, optional, default = -1As name suggest, it represents the maximum number of iterations within the solver. Value -1 means there is no limit on the number of iterations. |
12 | cache_size − float, optionalThis parameter will specify the size of the kernel cache. The value will be in MB(MegaBytes). |
13 | random_state − int, RandomState instance or None, optional, default = noneThis parameter represents the seed of the pseudo random number generated which is used while shuffling the data. Followings are the options −int − In this case, random_state is the seed used by random number generator.RandomState instance − In this case, random_state is the random number generator.None − In this case, the random number generator is the RandonState instance used by np.random. |
14 | class_weight − {dict, ‘balanced’}, optionalThis parameter will set the parameter C of class j to 𝑐𝑙𝑎𝑠𝑠_𝑤𝑒𝑖𝑔ℎ𝑡[𝑗]∗𝐶 for SVC. If we use the default option, it means all the classes are supposed to have weight one. On the other hand, if you choose class_weight:balanced, it will use the values of y to automatically adjust weights. |
15 | decision_function_shape − ovo’, ‘ovr’, default = ‘ovr’This parameter will decide whether the algorithm will return ‘ovr’ (one-vs-rest) decision function of shape as all other classifiers, or the original ovo(one-vs-one) decision function of libsvm. |
16 | break_ties − boolean, optional, default = falseTrue − The predict will break ties according to the confidence values of decision_functionFalse − The predict will return the first class among the tied classes. |
Attributes
Followings table consist the attributes used by sklearn.svm.SVC class −
Sr.No | Attributes & Description |
---|---|
1 | support_ − array-like, shape = [n_SV]It returns the indices of support vectors. |
2 | support_vectors_ − array-like, shape = [n_SV, n_features]It returns the support vectors. |
3 | n_support_ − array-like, dtype=int32, shape = [n_class]It represents the number of support vectors for each class. |
4 | dual_coef_ − array, shape = [n_class-1,n_SV]These are the coefficient of the support vectors in the decision function. |
5 | coef_ − array, shape = [n_class * (n_class-1)/2, n_features]This attribute, only available in case of linear kernel, provides the weight assigned to the features. |
6 | intercept_ − array, shape = [n_class * (n_class-1)/2]It represents the independent term (constant) in decision function. |
7 | fit_status_ − intThe output would be 0 if it is correctly fitted. The output would be 1 if it is incorrectly fitted. |
8 | classes_ − array of shape = [n_classes]It gives the labels of the classes. |
Implementation Example
Like other classifiers, SVC also has to be fitted with following two arrays −
- An array X holding the training samples. It is of size [n_samples, n_features].
- An array Y holding the target values i.e. class labels for the training samples. It is of size [n_samples].
Following Python script uses sklearn.svm.SVC class −
import numpy as np X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]]) y = np.array([1, 1, 2, 2]) from sklearn.svm import SVC SVCClf = SVC(kernel = 'linear',gamma = 'scale', shrinking = False,) SVCClf.fit(X, y)
Output
SVC(C = 1.0, cache_size = 200, class_weight = None, coef0 = 0.0, decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear', max_iter = -1, probability = False, random_state = None, shrinking = False, tol = 0.001, verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of following python script −
SVCClf.coef_
Output
array([[0.5, 0.5]])
Example
Similarly, we can get the value of other attributes as follows −
SVCClf.predict([[-0.5,-0.8]])
Output
array([1])
Example
SVCClf.n_support_
Output
array([1, 1])
Example
SVCClf.support_vectors_
Output
array( [ [-1., -1.], [ 1., 1.] ] )
Example
SVCClf.support_
Output
array([0, 2])
Example
SVCClf.intercept_
Output
array([-0.])
Example
SVCClf.fit_status_
Output
0
NuSVC
NuSVC is Nu Support Vector Classification. It is another class provided by scikit-learn which can perform multi-class classification. It is like SVC but NuSVC accepts slightly different sets of parameters. The parameter which is different from SVC is as follows −
- nu − float, optional, default = 0.5
It represents an upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Its value should be in the interval of (o,1].
Rest of the parameters and attributes are same as of SVC.
Implementation Example
We can implement the same example using sklearn.svm.NuSVC class also.
import numpy as np X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]]) y = np.array([1, 1, 2, 2]) from sklearn.svm import NuSVC NuSVCClf = NuSVC(kernel = 'linear',gamma = 'scale', shrinking = False,) NuSVCClf.fit(X, y)
Output
NuSVC(cache_size = 200, class_weight = None, coef0 = 0.0, decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear', max_iter = -1, nu = 0.5, probability = False, random_state = None, shrinking = False, tol = 0.001, verbose = False)
We can get the outputs of rest of the attributes as did in the case of SVC.
LinearSVC
It is Linear Support Vector Classification. It is similar to SVC having kernel = ‘linear’. The difference between them is that LinearSVC implemented in terms of liblinear while SVC is implemented in libsvm. That’s the reason LinearSVC has more flexibility in the choice of penalties and loss functions. It also scales better to large number of samples.
If we talk about its parameters and attributes then it does not support ‘kernel’ because it is assumed to be linear and it also lacks some of the attributes like support_, support_vectors_, n_support_, fit_status_ and, dual_coef_.
However, it supports penalty and loss parameters as follows −
- penalty − string, L1 or L2(default = ‘L2’)This parameter is used to specify the norm (L1 or L2) used in penalization (regularization).
- loss − string, hinge, squared_hinge (default = squared_hinge)It represents the loss function where ‘hinge’ is the standard SVM loss and ‘squared_hinge’ is the square of hinge loss.
Implementation Example
Following Python script uses sklearn.svm.LinearSVC class −
from sklearn.svm import LinearSVC from sklearn.datasets import make_classification X, y = make_classification(n_features = 4, random_state = 0) LSVCClf = LinearSVC(dual = False, random_state = 0, penalty = 'l1',tol = 1e-5) LSVCClf.fit(X, y)
Output
LinearSVC(C = 1.0, class_weight = None, dual = False, fit_intercept = True, intercept_scaling = 1, loss = 'squared_hinge', max_iter = 1000, multi_class = 'ovr', penalty = 'l1', random_state = 0, tol = 1e-05, verbose = 0)
Example
Now, once fitted, the model can predict new values as follows −
LSVCClf.predict([[0,0,0,0]])
Output
[1]
Example
For the above example, we can get the weight vector with the help of following python script −
LSVCClf.coef_
Output
[[0. 0. 0.91214955 0.22630686]]
Example
Similarly, we can get the value of intercept with the help of following python script −
LSVCClf.intercept_
Output
[0.26860518]
Regression with SVM
As discussed earlier, SVM is used for both classification and regression problems. Scikit-learn’s method of Support Vector Classification (SVC) can be extended to solve regression problems as well. That extended method is called Support Vector Regression (SVR).
Basic similarity between SVM and SVR
The model created by SVC depends only on a subset of training data. Why? Because the cost function for building the model doesn’t care about training data points that lie outside the margin.
Whereas, the model produced by SVR (Support Vector Regression) also only depends on a subset of the training data. Why? Because the cost function for building the model ignores any training data points close to the model prediction.
Scikit-learn provides three classes namely SVR, NuSVR and LinearSVR as three different implementations of SVR.
SVR
It is Epsilon-support vector regression whose implementation is based on libsvm. As opposite to SVC There are two free parameters in the model namely ‘C’ and ‘epsilon’.
- epsilon − float, optional, default = 0.1
It represents the epsilon in the epsilon-SVR model, and specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
Rest of the parameters and attributes are similar as we used in SVC.
Implementation Example
Following Python script uses sklearn.svm.SVR class −
from sklearn import svm X = [[1, 1], [2, 2]] y = [1, 2] SVRReg = svm.SVR(kernel = ’linear’, gamma = ’auto’) SVRReg.fit(X, y)
Output
SVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, epsilon = 0.1, gamma = 'auto', kernel = 'linear', max_iter = -1, shrinking = True, tol = 0.001, verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of following python script −
SVRReg.coef_
Output
array([[0.4, 0.4]])
Example
Similarly, we can get the value of other attributes as follows −
SVRReg.predict([[1,1]])
Output
array([1.1])
Similarly, we can get the values of other attributes as well.
NuSVR
NuSVR is Nu Support Vector Regression. It is like NuSVC, but NuSVR uses a parameter nu to control the number of support vectors. And moreover, unlike NuSVC where nu replaced C parameter, here it replaces epsilon.
Implementation Example
Following Python script uses sklearn.svm.SVR class −
from sklearn.svm import NuSVR import numpy as np n_samples, n_features = 20, 15 np.random.seed(0) y = np.random.randn(n_samples) X = np.random.randn(n_samples, n_features) NuSVRReg = NuSVR(kernel = 'linear', gamma = 'auto',C = 1.0, nu = 0.1)^M NuSVRReg.fit(X, y)
Output
NuSVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, gamma = 'auto', kernel = 'linear', max_iter = -1, nu = 0.1, shrinking = True, tol = 0.001, verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of following python script −
NuSVRReg.coef_
Output
array( [ [-0.14904483, 0.04596145, 0.22605216, -0.08125403, 0.06564533, 0.01104285, 0.04068767, 0.2918337 , -0.13473211, 0.36006765, -0.2185713 , -0.31836476, -0.03048429, 0.16102126, -0.29317051] ] )
Similarly, we can get the value of other attributes as well.
LinearSVR
It is Linear Support Vector Regression. It is similar to SVR having kernel = ‘linear’. The difference between them is that LinearSVR implemented in terms of liblinear, while SVC implemented in libsvm. That’s the reason LinearSVR has more flexibility in the choice of penalties and loss functions. It also scales better to large number of samples.
If we talk about its parameters and attributes then it does not support ‘kernel’ because it is assumed to be linear and it also lacks some of the attributes like support_, support_vectors_, n_support_, fit_status_ and, dual_coef_.
However, it supports ‘loss’ parameters as follows −
- loss − string, optional, default = ‘epsilon_insensitive’
It represents the loss function where epsilon_insensitive loss is the L1 loss and the squared epsilon-insensitive loss is the L2 loss.
Implementation Example
Following Python script uses sklearn.svm.LinearSVR class −
from sklearn.svm import LinearSVR from sklearn.datasets import make_regression X, y = make_regression(n_features = 4, random_state = 0) LSVRReg = LinearSVR(dual = False, random_state = 0, loss = 'squared_epsilon_insensitive',tol = 1e-5) LSVRReg.fit(X, y)
Output
LinearSVR( C=1.0, dual=False, epsilon=0.0, fit_intercept=True, intercept_scaling=1.0, loss='squared_epsilon_insensitive', max_iter=1000, random_state=0, tol=1e-05, verbose=0 )
Example
Now, once fitted, the model can predict new values as follows −
LSRReg.predict([[0,0,0,0]])
Output
array([-0.01041416])
Example
For the above example, we can get the weight vector with the help of following python script −
LSRReg.coef_
Output
array([20.47354746, 34.08619401, 67.23189022, 87.47017787])
Example
Similarly, we can get the value of intercept with the help of following python script −
LSRReg.intercept_
Output
array([-0.01041416])
Next Topic : Click Here
Pingback: Scikit-Learn : Stochastic Gradient Descent | Adglob Infosystem Pvt Ltd
Fantastic blog article. Fantastic.