What is feature scaling, and why do we need to perform it before training and testing?

Gautam Kumar
5 min readNov 26, 2020


Feature scaling — a technique to standardize the independent features of a dataset to a fixed range. It handles highly varying magnitudes, values, or units by bringing them into a single range, and it can speed up the calculations in an algorithm.

Why feature scaling? It is a data pre-processing step applied to the independent variables (features) of the data, and it basically helps to normalize the data within a particular range. Sometimes, it also helps in speeding up the calculations in an algorithm.

A few feature scaling techniques are listed below:

1. MinMax normalization
2. Standardization
3. Max Abs scaling
4. Robust scaling
5. Quantile transformer scaling
6. Power transformer scaling
7. Unit vector scaling

We will go through each technique one by one.

The MinMax normalization technique re-scales a feature so that its values fall between 0 and 1, or within another given range. If there are negative values, MinMax can shrink the data into the range [-1, 1]; in general, the range can be set to [0, 1], [0, 5], [-1, 1], and so on. This technique responds well when the standard deviation is small and the distribution is not Gaussian.
We can apply MinMax normalization with the sklearn library:
sklearn.preprocessing.MinMaxScaler

Below is the formula and Python code to calculate MinMax normalization:
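The formula is x' = (x - x_min) / (x_max - x_min), computed per feature. A minimal sketch with made-up toy data (the array values below are just for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy data: two features with different ranges
X = np.array([[1.0, -1.0],
              [2.0,  0.0],
              [3.0,  1.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)
# each column is now in [0, 1]: x' = (x - x_min) / (x_max - x_min)
print(X_scaled)
```

Here column 0 is shifted and divided by its range (3 - 1), so [1, 2, 3] becomes [0, 0.5, 1]; passing `feature_range=(-1, 1)` would rescale to [-1, 1] instead.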

The standardization technique re-scales a feature so that it has a distribution with mean 0 and variance equal to 1. Scaling happens independently on each feature by computing the relevant statistics on the samples in the training set.

If data is not normally distributed, this is not the best Scaler to use.

We can apply standardization with the sklearn library:

sklearn.preprocessing.StandardScaler

Below is the formula and Python code to calculate standardization:
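The formula is z = (x - mean) / std, again per feature. A minimal sketch with toy data (the values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy data: two features on very different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)
# each column now has mean 0 and unit variance
print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # approximately [1, 1]
```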

The Max Abs scaling technique scales each feature individually so that the maximal absolute value of each feature in the training set is 1.0: each feature is divided by its maximum absolute value, and the data is not shifted or centered. On positive-only data, this scaler behaves similarly to the MinMax scaler.

We can apply Max Abs scaling with the sklearn library: sklearn.preprocessing.MaxAbsScaler

Below is the formula and Python code to calculate Max Abs scaling:
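The formula is x' = x / max(|x|), per feature. A minimal sketch with toy data containing negative values (the array is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# toy data: column 0 has max |x| = 5, column 1 has max |x| = 4
X = np.array([[ 1.0, -4.0],
              [ 2.0,  2.0],
              [-5.0,  1.0]])

scaler = MaxAbsScaler()
X_maxabs = scaler.fit_transform(X)
# column 0 is divided by 5, column 1 by 4; results lie in [-1, 1]
print(X_maxabs)
```

Because there is no shifting, sparsity is preserved, which is why this scaler is often recommended for sparse data.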

The Robust scaling technique is robust to outliers. If our data contains many outliers, scaling with the mean and standard deviation of the data won't work well; this technique instead removes the median and scales the data according to a quantile range (by default the IQR: interquartile range).

We can apply Robust scaling with the sklearn library: sklearn.preprocessing.RobustScaler (or the sklearn.preprocessing.robust_scale function)

Below is the formula and Python code to calculate Robust scaling:
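The formula is x' = (x - median) / IQR, where IQR is the distance between the 25th and 75th percentiles. A minimal sketch with a toy column that includes one large outlier (the values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# toy column with an outlier (100.0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # centers on the median, scales by the IQR
X_robust = scaler.fit_transform(X)
# median = 3, IQR = 4 - 2 = 2, so the inliers land near [-1, 1]
# while the outlier stays far away instead of squashing them together
print(X_robust.ravel())
```

Note how the outlier does not affect where the bulk of the data lands, unlike with StandardScaler, whose mean and standard deviation it would dominate.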

The Quantile Transformer scaling technique transforms features to follow a uniform or a normal distribution: a quantile transform maps a variable's probability distribution to another probability distribution, and the transformation is applied to each feature independently. First, an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution; the obtained values are then mapped to the desired output distribution using the associated quantile function.
A Quantile Transformer can also be used to map the data distribution to a Gaussian and standardize the result, centering the values on a mean of 0 with a standard deviation of 1.0.

We can apply Quantile Transformer scaling with the sklearn library: sklearn.preprocessing.QuantileTransformer (or the sklearn.preprocessing.quantile_transform function)
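A minimal sketch of both output distributions, using randomly generated skewed toy data (the exponential sample and n_quantiles value are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
X = rng.exponential(size=(1000, 1))  # right-skewed toy data

# map to a uniform [0, 1] distribution via the empirical CDF
qt_uniform = QuantileTransformer(output_distribution='uniform', n_quantiles=100)
X_uniform = qt_uniform.fit_transform(X)

# map through the normal quantile function for a Gaussian-like result
qt_normal = QuantileTransformer(output_distribution='normal', n_quantiles=100)
X_normal = qt_normal.fit_transform(X)
```

Because the transform is based on ranks, it is non-linear and can distort linear correlations between features, but it is very robust to outliers.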

The Power Transformer scaling technique is a family of parametric, monotonic transformations applied to make data more Gaussian-like. This is useful for modeling issues related to variance that is unequal across the range of a variable (heteroscedasticity). A power transform finds the optimal scaling factor for stabilizing variance and minimizing skewness through maximum likelihood estimation.

We can apply Power Transformer scaling with the sklearn library: sklearn.preprocessing.PowerTransformer (or the sklearn.preprocessing.power_transform function)
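A minimal sketch using the Box-Cox method on strictly positive, skewed toy data (the lognormal sample is an assumption for illustration; the default Yeo-Johnson method also handles zero and negative values):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 1))  # strictly positive, right-skewed toy data

# Box-Cox requires positive data; lambda is fitted by maximum likelihood
pt = PowerTransformer(method='box-cox')
X_gauss = pt.fit_transform(X)
# by default the output is also standardized to mean 0, unit variance
print(pt.lambdas_)  # the fitted power parameter per feature
```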

The Unit Vector scaling technique treats the whole feature vector of a sample as having unit length: each component is divided by the Euclidean length of the vector (the L2 norm). On non-negative data, this produces values in the range [0, 1]. It is quite useful when dealing with features with hard boundaries, e.g. image data, where color values can range only from 0 to 255.

We can apply Unit Vector scaling with the sklearn library: sklearn.preprocessing.Normalizer (or the sklearn.preprocessing.normalize function)
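A minimal sketch with toy data (the array is made up for illustration). Note that, unlike the previous scalers, Normalizer works row-wise on each sample rather than column-wise on each feature:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

normalizer = Normalizer(norm='l2')  # divide each ROW by its Euclidean length
X_unit = normalizer.fit_transform(X)
# the row [3, 4] has length 5, so it becomes [0.6, 0.8];
# every row now has unit L2 norm
print(X_unit)
```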

  1. See that 👏 icon? Send my article some claps
  2. Connect with me via linkedin, github and on medium👈 and Buy me a coffee if you like this blog.
  3. Source code you can find it here 👈
