In this chapter, we will discuss Extended Linear Modeling, focusing on the polynomial features and pipelining tools in Sklearn.
Introduction to Polynomial Features
Linear models trained on non-linear functions of the data generally maintain the fast performance of linear methods, while being able to fit a much wider range of data. That is why such linear models, trained on non-linear functions, are used in machine learning.
One such example is that simple linear regression can be extended by constructing polynomial features from the data.
Mathematically, suppose we have a standard linear regression model; for 2-D data it would look like this −

Y = W0 + W1X1 + W2X2
Now, we can combine the features in second-order polynomials and our model will look as follows −

Y = W0 + W1X1 + W2X2 + W3X1X2 + W4X1² + W5X2²
The above is still a linear model, because it is linear in the coefficients W. Hence the resulting polynomial regression is in the same class of linear models and can be solved by the same techniques.
To do so, scikit-learn provides a module named PolynomialFeatures. This module transforms an input data matrix into a new data matrix containing polynomial features of a given degree.
Parameters
The following table lists the parameters used by the PolynomialFeatures module −
Sr.No | Parameter & Description |
---|---|
1 | degree − integer, default = 2. It represents the degree of the polynomial features. |
2 | interaction_only − Boolean, default = False. If set to True, only interaction features are produced, i.e. features that are products of at most degree distinct input features. |
3 | include_bias − Boolean, default = True. It includes a bias column, i.e. the feature in which all polynomial powers are zero. |
4 | order − str in {'C', 'F'}, default = 'C'. This parameter represents the order of the output array in the dense case. 'F' order is faster to compute but may slow down subsequent estimators. |
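As a small sketch of how these parameters affect the output, the snippet below transforms the same 3×2 matrix with the default settings, with interaction_only=True, and with include_bias=False, and compares the resulting feature counts:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)

# Default: degree=2, include_bias=True -> columns [1, x1, x2, x1^2, x1*x2, x2^2]
full = PolynomialFeatures(degree=2).fit_transform(X)

# interaction_only=True keeps only products of distinct features -> [1, x1, x2, x1*x2]
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)

# include_bias=False drops the constant column of ones -> 5 columns instead of 6
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(full.shape, inter.shape, no_bias.shape)   # (3, 6) (3, 4) (3, 5)
```

Note how interaction_only removes the pure-power columns (x1², x2²) while include_bias only removes the leading column of ones.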
Attributes
The following table lists the attributes used by the PolynomialFeatures module −
Sr.No | Attributes & Description |
---|---|
1 | powers_ − array, shape (n_output_features, n_input_features). powers_[i, j] is the exponent of the jth input feature in the ith output feature. |
2 | n_input_features_ − int. As the name suggests, it gives the total number of input features. |
3 | n_output_features_ − int. As the name suggests, it gives the total number of polynomial output features. |
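To see what powers_ encodes, the snippet below fits a degree-2 transformer on 2-D data and prints the exponent table; each row gives the exponents of the two input features for one output feature. (In recent scikit-learn versions the input-feature count is exposed as n_features_in_ rather than n_input_features_.)

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2).fit(np.arange(4).reshape(2, 2))

# Rows correspond to output features [1, x1, x2, x1^2, x1*x2, x2^2];
# e.g. the row [1, 1] is the interaction term x1 * x2.
print(poly.powers_)
print(poly.n_output_features_)   # 6
```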
Implementation Example
The following Python script uses the PolynomialFeatures transformer on an array of the numbers 0 to 7, reshaped to (4, 2) −
```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

Y = np.arange(8).reshape(4, 2)
poly = PolynomialFeatures(degree=2)
poly.fit_transform(Y)
```
Output
```
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.],
       [ 1.,  6.,  7., 36., 42., 49.]])
```
Streamlining using Pipeline tools
The above sort of preprocessing, i.e. transforming an input data matrix into a new data matrix of a given degree, can be streamlined with the Pipeline tools, which are basically used to chain multiple estimators into one.
Example
The Python script below uses Scikit-learn's Pipeline tools to streamline the preprocessing (it fits a model to order-3 polynomial data).
```python
# First, import the necessary packages.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

# Next, create an object of the Pipeline tool.
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                         ('linear', LinearRegression(fit_intercept=False))])

# Generate order-3 polynomial data and fit the model.
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
Stream_model = Stream_model.fit(x[:, np.newaxis], y)

# Inspect the recovered polynomial coefficients.
Stream_model.named_steps['linear'].coef_
```
Output
array([ 3., -2., 1., -1.])
The above output shows that the linear model trained on polynomial features is able to recover the exact input polynomial coefficients.
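A fitted pipeline can also be used end-to-end: calling predict applies the polynomial transformation and the linear model in one step. The sketch below repeats the fit above and then evaluates the pipeline at two new points (the variable names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])

x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
model.fit(x[:, np.newaxis], y)

# The pipeline transforms and predicts in one call; the results match
# evaluating 3 - 2x + x^2 - x^3 directly at x = 5 and x = 6.
x_new = np.array([[5.0], [6.0]])
print(model.predict(x_new))   # approximately [-107., -189.]
```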