Scikit-Learn Tutorial

This tutorial covers scikit-learn (sklearn), one of the most widely used libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, through a consistent interface. The library is largely written in Python and is built upon NumPy and SciPy, with Matplotlib used for plotting.

Audience

This tutorial will be useful for graduates, postgraduates and research students who either have an interest in machine learning or study it as part of their curriculum. The reader can be a beginner or an advanced learner.

Prerequisites

The reader should have basic knowledge of machine learning and be familiar with Python, NumPy, SciPy and Matplotlib. If you are new to any of these, we recommend working through a tutorial on them before going further into this one.

Introduction

In this chapter, we look at what scikit-learn (sklearn) is, where it came from, and related topics such as the community and contributors responsible for its development and maintenance, its prerequisites, its installation and its features.

What is Scikit-Learn (Sklearn)

Scikit-learn (sklearn) is a widely used and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, through a consistent interface. The library is largely written in Python and is built upon NumPy and SciPy, with Matplotlib used for plotting.

Origin of Scikit-Learn

It was originally called scikits.learn and was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later, in 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel, from Inria (the French Institute for Research in Computer Science and Automation), took over leadership of the project and made the first public release (v0.1 beta) on 1 February 2010.

Let’s have a look at its version history:

  • May 2019: scikit-learn 0.21.0
  • March 2019: scikit-learn 0.20.3
  • December 2018: scikit-learn 0.20.2
  • November 2018: scikit-learn 0.20.1
  • September 2018: scikit-learn 0.20.0
  • July 2018: scikit-learn 0.19.2
  • July 2017: scikit-learn 0.19.0
  • September 2016: scikit-learn 0.18.0
  • November 2015: scikit-learn 0.17.0
  • March 2015: scikit-learn 0.16.0
  • July 2014: scikit-learn 0.15.0
  • August 2013: scikit-learn 0.14

Community & contributors

Scikit-learn is a community effort and anyone can contribute to it. The project is hosted at https://github.com/scikit-learn/scikit-learn. At the time of writing, the following people are the core contributors to sklearn’s development and maintenance:

  • Joris Van den Bossche (Data Scientist)
  • Thomas J Fan (Software Developer)
  • Alexandre Gramfort (Machine Learning Researcher)
  • Olivier Grisel (Machine Learning Expert)
  • Nicolas Hug (Associate Research Scientist)
  • Andreas Mueller (Machine Learning Scientist)
  • Hanmin Qin (Software Engineer)
  • Adrin Jalali (Open Source Developer)
  • Nelle Varoquaux (Data Science Researcher)
  • Roman Yurchak (Data Scientist)

Various organisations like Booking.com, JP Morgan, Evernote, Inria, AWeber, Spotify and many more are using Sklearn.

Prerequisites

Before we start using scikit-learn, we require the following (the minimum versions shown correspond to the 0.21 release listed above):

  • Python (>= 3.5)
  • NumPy (>= 1.11.0)
  • SciPy (>= 0.17.0)
  • Joblib (>= 0.11)
  • Matplotlib (>= 1.5.1) is required for Sklearn plotting capabilities.
  • Pandas (>= 0.18.0) is required for some of the scikit-learn examples that use its data structures and analysis tools.

Installation

If you have already installed NumPy and SciPy, the following are the two easiest ways to install scikit-learn:

Using pip

The following command installs scikit-learn via pip:

pip install -U scikit-learn

Using conda

The following command installs scikit-learn via conda:

conda install scikit-learn

On the other hand, if NumPy and SciPy are not yet installed on your Python workstation, you can install them using either pip or conda.

Another option is to use a Python distribution such as Canopy or Anaconda, both of which ship with a recent version of scikit-learn.
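Whichever installation route is used, a quick check from Python can confirm that the package is importable. The snippet below is a minimal sketch; the variable name installed_version is only illustrative.

```python
# Confirm that scikit-learn can be imported and report which version is installed.
import sklearn

installed_version = sklearn.__version__
print("scikit-learn version:", installed_version)
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.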

Features

Rather than focusing on loading, manipulating and summarising data, the scikit-learn library is focused on modeling the data. Some of the most popular groups of models provided by sklearn are as follows:

Supervised Learning algorithms: Almost all the popular supervised learning algorithms, such as Linear Regression, Support Vector Machine (SVM) and Decision Tree, are part of scikit-learn.
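As a rough illustration of how supervised models share one interface, the sketch below trains a decision tree and an SVM with the same fit and score calls. The iris dataset, the 25% test split and the random_state values are assumptions chosen for the example, not something this tutorial prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset (assumed here for illustration) and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every supervised estimator exposes the same fit / score methods.
scores = {}
for model in (DecisionTreeClassifier(random_state=0), SVC()):
    model.fit(X_train, y_train)
    scores[type(model).__name__] = model.score(X_test, y_test)

print(scores)
```

Swapping in another classifier usually only means changing the constructor; the surrounding code stays the same.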

Unsupervised Learning algorithms: It also covers the popular unsupervised learning algorithms, from clustering, factor analysis and PCA (Principal Component Analysis) to unsupervised neural networks.

Clustering: This group of models is used for grouping unlabeled data.
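A minimal sketch of clustering unlabeled data is shown here, assuming synthetic points from make_blobs and the KMeans algorithm with three clusters; both choices are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 150 unlabeled 2-D points around 3 centers (synthetic data, assumed for the example).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Group the points into 3 clusters; fit_predict returns one cluster index per point.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster centers:")
print(kmeans.cluster_centers_)
```

No target values are passed to the model; the grouping is derived from the feature values alone.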

Cross Validation: It is used to estimate the accuracy of supervised models on unseen data.
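One way to sketch this is with cross_val_score, which repeatedly trains on part of the data and scores on the held-out part. The dataset, the decision tree model and the choice of 5 folds are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Split the data into 5 folds; each fold is held out once while the model
# is trained on the other 4, giving one accuracy score per fold.
cv_scores = cross_val_score(model, X, y, cv=5)

print("Per-fold accuracy:", cv_scores)
print("Mean accuracy:", cv_scores.mean())
```

The mean of the per-fold scores is a more stable estimate than the score from a single train/test split.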

Dimensionality Reduction: It is used for reducing the number of attributes in data, which can then be used for summarisation, visualisation and feature selection.
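As an illustration, the sketch below uses PCA to project a four-attribute dataset onto two components. The iris dataset and n_components=2 are assumptions chosen so the result could be plotted in two dimensions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# The iris data has 4 attributes per sample (dataset assumed for illustration).
X, _ = load_iris(return_X_y=True)

# Project the data onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
print("Variance explained per component:", pca.explained_variance_ratio_)
```

The explained_variance_ratio_ attribute indicates how much of the original variation each retained component keeps.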

Ensemble methods: As the name suggests, these are used for combining the predictions of multiple supervised models.
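A minimal sketch of combining several models is given here using VotingClassifier with majority ("hard") voting. The three base models, the dataset and the split are assumptions picked for the example; scikit-learn offers other ensemble estimators as well.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each base model predicts a class; the ensemble returns the majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC()),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
ensemble_accuracy = ensemble.score(X_test, y_test)

print("Ensemble accuracy:", ensemble_accuracy)
```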

Feature extraction: It is used to extract features from data, defining the attributes in image and text data.
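For text data, one common approach is a bag-of-words count, sketched below with CountVectorizer. The three sample sentences are made up for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three short documents (invented for illustration).
documents = ["the cat sat", "the dog sat", "the cat ran"]

# Learn the vocabulary and count how often each word occurs in each document.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(documents)

# vocabulary_ maps each word to its column index; sorting the words
# alphabetically matches the column order.
vocabulary = sorted(vectorizer.vocabulary_)

print(vocabulary)
print(counts.toarray())
```

Each row of the resulting matrix is one document and each column is one word, so the text becomes numeric attributes that a model can consume.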

Feature selection: It is used to identify useful attributes to create supervised models.
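A small sketch of feature selection is shown here, assuming a univariate approach: SelectKBest keeps the k attributes with the highest ANOVA F-score against the target. The dataset and k=2 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each attribute against the class labels and keep the 2 best.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Column indices of the attributes that were kept.
selected_columns = selector.get_support(indices=True)

print("Kept columns:", selected_columns)
print("New shape:", X_selected.shape)
```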

Open Source: It is an open source library and is also commercially usable under the BSD license.
