Scikit-Learn : Support Vector Machines

This chapter deals with a machine learning method termed Support Vector Machines (SVMs).

Introduction

Support vector machines (SVMs) are powerful yet flexible supervised machine learning methods used for classification, regression, and outlier detection. SVMs are very efficient in high-dimensional spaces and are generally used for classification problems. SVMs are popular and memory efficient because they use only a subset of the training points in the decision function.

The main goal of SVMs is to divide a dataset into classes by finding a maximum marginal hyperplane (MMH), which is done in the following two steps −

  • Support Vector Machines will first generate hyperplanes iteratively that separate the classes in the best way.
  • It will then choose the hyperplane that separates the classes correctly.

Some important concepts in SVM are as follows −

  • Support Vectors − They may be defined as the datapoints which are closest to the hyperplane. Support vectors help in deciding the separating line.
  • Hyperplane − The decision plane or space that divides a set of objects having different classes.
  • Margin − The gap between the two lines through the closest data points of different classes is called the margin; a short sketch computing it follows the diagram below.

The following diagram will give you an insight into these SVM concepts −

[Figure: maximum marginal hyperplane]
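
As a quick numeric illustration of the margin, the following minimal sketch fits a linear SVC on the same toy data used later in this chapter and computes the margin width as 2/||w||, where w is the learned weight vector −

import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

clf = SVC(kernel='linear')
clf.fit(X, y)

w = clf.coef_[0]                       # normal vector of the separating hyperplane
margin_width = 2 / np.linalg.norm(w)   # distance between the two margin lines
print(margin_width)                    # ~2.83 for this toy data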

SVM in Scikit-learn supports both sparse and dense sample vectors as input.
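
For instance, here is a minimal sketch of fitting the same toy data supplied as a SciPy CSR sparse matrix −

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

X_dense = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
X_sparse = csr_matrix(X_dense)    # sparse CSR representation of the same data
y = np.array([1, 1, 2, 2])

clf = SVC(kernel='linear')
clf.fit(X_sparse, y)              # sparse input is accepted directly
print(clf.predict(csr_matrix([[-0.5, -0.8]])))   # array([1])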

Classification with SVM

Scikit-learn provides three classes, namely SVC, NuSVC and LinearSVC, which can perform multiclass classification.

SVC

It is C-Support Vector Classification whose implementation is based on libsvm. The class used by scikit-learn is sklearn.svm.SVC. This class handles multiclass support according to a one-vs-one scheme.
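
With n_classes classes, the one-vs-one scheme trains n_classes * (n_classes - 1) / 2 binary classifiers. The following minimal sketch (on hypothetical three-class toy data) exposes this via decision_function_shape = 'ovo' −

import numpy as np
from sklearn.svm import SVC

# Three classes, so 3 * (3 - 1) / 2 = 3 pairwise (one-vs-one) classifiers
X = np.array([[0, 0], [1, 1], [2, 2], [0, 1], [1, 2], [2, 0]])
y = np.array([0, 0, 1, 1, 2, 2])

clf = SVC(kernel='linear', decision_function_shape='ovo')
clf.fit(X, y)
print(clf.decision_function(X).shape)   # (6, 3): one column per pair of classes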

Parameters

The following table lists the parameters used by the sklearn.svm.SVC class −

Sr.No  Parameter & Description

1. C − float, optional, default = 1.0
It is the penalty parameter of the error term.

2. kernel − string, optional, default = 'rbf'
This parameter specifies the type of kernel to be used in the algorithm. We can choose any one among 'linear', 'poly', 'rbf', 'sigmoid' and 'precomputed'.

3. degree − int, optional, default = 3
It represents the degree of the 'poly' kernel function and is ignored by all other kernels.

4. gamma − {'scale', 'auto'} or float, optional, default = 'scale'
It is the kernel coefficient for the kernels 'rbf', 'poly' and 'sigmoid'. If you choose the default, i.e. gamma = 'scale', then the value of gamma used by SVC is 1/(n_features * X.var()). On the other hand, if gamma = 'auto', it uses 1/n_features.

5. coef0 − float, optional, default = 0.0
An independent term in the kernel function which is only significant in 'poly' and 'sigmoid'.

6. tol − float, optional, default = 1e-3
This parameter represents the stopping criterion for iterations.

7. shrinking − boolean, optional, default = True
This parameter indicates whether we want to use the shrinking heuristic or not.

8. verbose − boolean, optional, default = False
It enables or disables verbose output.

9. probability − boolean, optional, default = False
This parameter enables or disables probability estimates. It is disabled by default and must be enabled before we call fit.

10. max_iter − int, optional, default = -1
As the name suggests, it represents the maximum number of iterations within the solver. The value -1 means there is no limit on the number of iterations.

11. cache_size − float, optional
This parameter specifies the size of the kernel cache in MB (megabytes).

12. random_state − int, RandomState instance or None, optional, default = None
This parameter represents the seed of the pseudo-random number generator used while shuffling the data. The options are −
int − random_state is the seed used by the random number generator.
RandomState instance − random_state is the random number generator.
None − the random number generator is the RandomState instance used by np.random.

13. class_weight − {dict, 'balanced'}, optional
This parameter sets the parameter C of class j to class_weight[j]*C for SVC. If we use the default option, all the classes are supposed to have weight one. On the other hand, if you choose class_weight = 'balanced', it uses the values of y to automatically adjust the weights.

14. decision_function_shape − {'ovo', 'ovr'}, default = 'ovr'
This parameter decides whether the algorithm will return an 'ovr' (one-vs-rest) decision function of shape like all other classifiers, or the original 'ovo' (one-vs-one) decision function of libsvm.

15. break_ties − boolean, optional, default = False
True − predict will break ties according to the confidence values of decision_function.
False − predict will return the first class among the tied classes.
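
To tie a few of these parameters together, here is a minimal sketch (on hypothetical data from make_classification) that enables probability estimates, balances class weights and fixes the random seed −

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# A small, imbalanced two-class problem (hypothetical data for illustration)
X, y = make_classification(n_samples=100, n_features=4,
                           weights=[0.9, 0.1], random_state=0)

clf = SVC(kernel='rbf', C=1.0, gamma='scale',
          class_weight='balanced',   # scale C per class by inverse class frequency
          probability=True,          # must be set before calling fit
          random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X[:2]))      # per-class probability estimates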

Attributes

The following table lists the attributes of the sklearn.svm.SVC class −

Sr.No  Attributes & Description

1. support_ − array-like, shape = [n_SV]
It returns the indices of the support vectors.

2. support_vectors_ − array-like, shape = [n_SV, n_features]
It returns the support vectors.

3. n_support_ − array-like, dtype = int32, shape = [n_class]
It represents the number of support vectors for each class.

4. dual_coef_ − array, shape = [n_class-1, n_SV]
These are the coefficients of the support vectors in the decision function.

5. coef_ − array, shape = [n_class * (n_class-1)/2, n_features]
This attribute, only available in case of a linear kernel, provides the weights assigned to the features.

6. intercept_ − array, shape = [n_class * (n_class-1)/2]
It represents the independent term (constant) in the decision function.

7. fit_status_ − int
The output is 0 if the model is correctly fitted and 1 if it is incorrectly fitted.

8. classes_ − array of shape = [n_classes]
It gives the labels of the classes.

Implementation Example

Like other classifiers, SVC has to be fitted with the following two arrays −

  • An array X holding the training samples. It is of size [n_samples, n_features].
  • An array Y holding the target values, i.e. class labels for the training samples. It is of size [n_samples].

Following Python script uses sklearn.svm.SVC class โˆ’

import numpy as np
from sklearn.svm import SVC

# Training samples and their class labels
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

# Linear kernel with the shrinking heuristic disabled
SVCClf = SVC(kernel='linear', gamma='scale', shrinking=False)
SVCClf.fit(X, y)

Output

SVC(C = 1.0, cache_size = 200, class_weight = None, coef0 = 0.0,
   decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear',
   max_iter = -1, probability = False, random_state = None, shrinking = False,
   tol = 0.001, verbose = False)

Example

Now, once fitted, we can get the weight vector with the help of the following Python script −

SVCClf.coef_

Output

array([[0.5, 0.5]])

Example

Similarly, we can get the values of other attributes as follows −

SVCClf.predict([[-0.5,-0.8]])

Output

array([1])

Example

SVCClf.n_support_

Output

array([1, 1])

Example

SVCClf.support_vectors_

Output

array(
   [
      [-1., -1.],
      [ 1., 1.]
   ]
)

Example

SVCClf.support_

Output

array([0, 2])

Example

SVCClf.intercept_

Output

array([-0.])

Example

SVCClf.fit_status_

Output

0

NuSVC

NuSVC is Nu-Support Vector Classification. It is another class provided by scikit-learn which can perform multiclass classification. It is like SVC, but NuSVC accepts a slightly different set of parameters. The parameter which differs from SVC is as follows −

  • nu − float, optional, default = 0.5

It represents an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Its value should be in the interval (0, 1].

The rest of the parameters and attributes are the same as for SVC.
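
Because nu lower-bounds the fraction of support vectors, a rough sketch of its effect (on hypothetical random data) is to fit NuSVC with increasing nu and compare the resulting support-vector fractions −

import numpy as np
from sklearn.svm import NuSVC

# Hypothetical data: two roughly balanced classes
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel='rbf', gamma='scale')
    clf.fit(X, y)
    frac = len(clf.support_) / len(X)
    print(nu, frac)   # the fraction of support vectors is at least roughly nu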

Implementation Example

We can implement the same example using the sklearn.svm.NuSVC class as well.

import numpy as np
from sklearn.svm import NuSVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

NuSVCClf = NuSVC(kernel='linear', gamma='scale', shrinking=False)
NuSVCClf.fit(X, y)

Output

NuSVC(cache_size = 200, class_weight = None, coef0 = 0.0,
   decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear',
   max_iter = -1, nu = 0.5, probability = False, random_state = None,
   shrinking = False, tol = 0.001, verbose = False)

We can get the outputs of the rest of the attributes as we did in the case of SVC.

LinearSVC

It is Linear Support Vector Classification. It is similar to SVC with kernel = 'linear'. The difference between them is that LinearSVC is implemented in terms of liblinear while SVC is implemented in terms of libsvm. That is why LinearSVC has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples.

As for its parameters and attributes, it does not support 'kernel' because it is assumed to be linear, and it also lacks some of the attributes, such as support_, support_vectors_, n_support_, fit_status_ and dual_coef_.

However, it supports penalty and loss parameters as follows −

  • penalty − string, 'l1' or 'l2' (default = 'l2'). This parameter is used to specify the norm (L1 or L2) used in penalization (regularization).
  • loss − string, 'hinge' or 'squared_hinge' (default = 'squared_hinge'). It represents the loss function, where 'hinge' is the standard SVM loss and 'squared_hinge' is the square of the hinge loss, as the sketch after this list illustrates.
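
As a quick numeric illustration of these two losses (a plain-NumPy sketch of the formulas, not of scikit-learn internals), the hinge loss is max(0, 1 - y*f(x)) for labels y in {-1, +1}, and the squared hinge is its square −

import numpy as np

def hinge(y, decision):
    # standard SVM hinge loss; labels y must be -1 or +1
    return np.maximum(0, 1 - y * decision)

def squared_hinge(y, decision):
    # square of the hinge loss, the LinearSVC default
    return hinge(y, decision) ** 2

y = np.array([1, 1, -1])
decision = np.array([2.0, 0.5, -0.3])   # hypothetical decision-function values
print(hinge(y, decision))               # [0.   0.5  0.7 ]
print(squared_hinge(y, decision))       # [0.   0.25 0.49]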

Implementation Example

Following Python script uses sklearn.svm.LinearSVC class −

from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_features=4, random_state=0)

LSVCClf = LinearSVC(dual=False, random_state=0, penalty='l1', tol=1e-5)
LSVCClf.fit(X, y)

Output

LinearSVC(C = 1.0, class_weight = None, dual = False, fit_intercept = True,
   intercept_scaling = 1, loss = 'squared_hinge', max_iter = 1000,
   multi_class = 'ovr', penalty = 'l1', random_state = 0, tol = 1e-05, verbose = 0)

Example

Now, once fitted, the model can predict new values as follows −

LSVCClf.predict([[0,0,0,0]])

Output

[1]

Example

For the above example, we can get the weight vector with the help of the following Python script −

LSVCClf.coef_

Output

[[0. 0. 0.91214955 0.22630686]]

Example

Similarly, we can get the value of the intercept with the help of the following Python script −

LSVCClf.intercept_

Output

[0.26860518]

Regression with SVM

As discussed earlier, SVM is used for both classification and regression problems. Scikit-learn's method of Support Vector Classification (SVC) can be extended to solve regression problems as well. That extended method is called Support Vector Regression (SVR).

Basic similarity between SVM and SVR

The model created by SVC depends only on a subset of the training data. Why? Because the cost function for building the model does not care about training data points that lie beyond the margin.

Likewise, the model produced by SVR (Support Vector Regression) depends only on a subset of the training data. Why? Because the cost function for building the model ignores any training data points whose predictions are within a distance epsilon of their targets.

Scikit-learn provides three classes, namely SVR, NuSVR and LinearSVR, as three different implementations of SVR.

SVR

It is Epsilon-Support Vector Regression whose implementation is based on libsvm. In contrast to SVC, there are two free parameters in the model, namely 'C' and 'epsilon'.

  • epsilon − float, optional, default = 0.1

It represents the epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
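
A plain-NumPy sketch of this epsilon-insensitive loss, max(0, |y - f(x)| - epsilon), makes the tube explicit (the sample values are hypothetical) −

import numpy as np

def epsilon_insensitive(y_true, y_pred, epsilon=0.1):
    # errors smaller than epsilon fall inside the tube and cost nothing
    return np.maximum(0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])          # hypothetical predictions
print(epsilon_insensitive(y_true, y_pred))   # [0.   0.4  0.9 ]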

The rest of the parameters and attributes are similar to those we used in SVC.

Implementation Example

Following Python script uses sklearn.svm.SVR class −

from sklearn import svm

X = [[1, 1], [2, 2]]
y = [1, 2]

SVRReg = svm.SVR(kernel='linear', gamma='auto')
SVRReg.fit(X, y)

Output

SVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, epsilon = 0.1, gamma = 'auto',
   kernel = 'linear', max_iter = -1, shrinking = True, tol = 0.001, verbose = False)

Example

Now, once fitted, we can get the weight vector with the help of the following Python script −

SVRReg.coef_

Output

array([[0.4, 0.4]])

Example

Similarly, we can get the values of other attributes as follows −

SVRReg.predict([[1,1]])

Output

array([1.1])

Similarly, we can get the values of other attributes as well.

NuSVR

NuSVR is Nu-Support Vector Regression. It is like NuSVC, but NuSVR uses the parameter nu to control the number of support vectors. Moreover, unlike NuSVC, where nu replaced the parameter C, here it replaces epsilon.
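
A minimal sketch of this difference in signatures, fitting both regressors on the same hypothetical random data −

import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.randn(20)

# Epsilon-SVR: the free parameters are C and epsilon
svr = SVR(kernel='linear', C=1.0, epsilon=0.2).fit(X, y)

# Nu-SVR: C stays, while nu takes the place of epsilon
nusvr = NuSVR(kernel='linear', C=1.0, nu=0.2).fit(X, y)

print(len(svr.support_), len(nusvr.support_))   # the support sets differ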

Implementation Example

Following Python script uses sklearn.svm.NuSVR class −

from sklearn.svm import NuSVR
import numpy as np

n_samples, n_features = 20, 15
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)

NuSVRReg = NuSVR(kernel='linear', gamma='auto', C=1.0, nu=0.1)
NuSVRReg.fit(X, y)

Output

NuSVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, gamma = 'auto',
   kernel = 'linear', max_iter = -1, nu = 0.1, shrinking = True, tol = 0.001,
   verbose = False)

Example

Now, once fitted, we can get the weight vector with the help of the following Python script −

NuSVRReg.coef_

Output

array(
   [
      [-0.14904483, 0.04596145, 0.22605216, -0.08125403, 0.06564533,
      0.01104285, 0.04068767, 0.2918337 , -0.13473211, 0.36006765,
      -0.2185713 , -0.31836476, -0.03048429, 0.16102126, -0.29317051]
   ]
)

Similarly, we can get the value of other attributes as well.

LinearSVR

It is Linear Support Vector Regression. It is similar to SVR with kernel = 'linear'. The difference between them is that LinearSVR is implemented in terms of liblinear, while SVR is implemented in terms of libsvm. That is why LinearSVR has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples.

As for its parameters and attributes, it does not support 'kernel' because it is assumed to be linear, and it also lacks some of the attributes, such as support_, support_vectors_, n_support_, fit_status_ and dual_coef_.

However, it supports the 'loss' parameter as follows −

  • loss − string, optional, default = 'epsilon_insensitive'

It represents the loss function, where the 'epsilon_insensitive' loss is the L1 loss and the 'squared_epsilon_insensitive' loss is the L2 loss.

Implementation Example

Following Python script uses sklearn.svm.LinearSVR class −

from sklearn.svm import LinearSVR
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, random_state=0)

LSVRReg = LinearSVR(dual=False, random_state=0,
                    loss='squared_epsilon_insensitive', tol=1e-5)
LSVRReg.fit(X, y)

Output

LinearSVR(
   C=1.0, dual=False, epsilon=0.0, fit_intercept=True,
   intercept_scaling=1.0, loss='squared_epsilon_insensitive',
   max_iter=1000, random_state=0, tol=1e-05, verbose=0
)

Example

Now, once fitted, the model can predict new values as follows โˆ’

LSVRReg.predict([[0,0,0,0]])

Output

array([-0.01041416])

Example

For the above example, we can get the weight vector with the help of the following Python script −

LSVRReg.coef_

Output

array([20.47354746, 34.08619401, 67.23189022, 87.47017787])

Example

Similarly, we can get the value of the intercept with the help of the following Python script −

LSVRReg.intercept_

Output

array([-0.01041416])
