Parametric vs Non-parametric algorithms

When browsing tutorials and reading books, we sometimes see the terms ‘Parametric’ and ‘Non-parametric’ algorithms. The authors suggest when to use one type of algorithm and when to use the other, but how do we tell the two apart? This blog post will show us how!

A small note before we begin: hopefully this will not dampen your mood, but the line separating parametric from non-parametric algorithms is somewhat vague, meaning the distinction is a bit ambiguous. Nevertheless, it is just “a bit ambiguous”, not “very ambiguous”, so don’t be depressed, ok?

Properties

$\blacktriangleright$ The most important factor in deciding whether an algorithm is parametric is the set of assumptions it makes: a parametric algorithm assumes a particular form for the function it estimates, while a non-parametric algorithm makes no such assumption (or only very weak ones).

E.g.

Parametric	Non-parametric
Linear Regression assumes that the predictor variables have a linear and additive relationship with the target variable.	K-Nearest Neighbors makes no assumption about the functional form of the data; its classification of a data point depends purely on the known points around it.
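
To make the contrast concrete, here is a minimal sketch in plain Python (no libraries). The toy data, the helper names `fit_line` and `knn_predict`, and the choice of k = 3 are assumptions made for illustration, and K-Nearest Neighbors is used here for regression (averaging the neighbors' targets) rather than classification. The targets follow y = x², which deliberately violates the linearity assumption.

```python
# Toy data from a clearly non-linear function: y = x^2 on [-3, 3].
xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def knn_predict(xs, ys, query, k=3):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k


w, b = fit_line(xs, ys)
query = 2.0  # the true value at this point is 4.0
linear_pred = w * query + b
knn_pred = knn_predict(xs, ys, query, k=3)

print("true value:        ", query * query)
print("linear regression: ", linear_pred)
print("3-nearest neighbors:", knn_pred)
```

Because the straight line cannot bend, its prediction at x = 2 stays far from 4, while the neighbor average lands close to it. On data that really is linear, the line would do just as well with far fewer stored values.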

$\blacktriangleright$ The second trait is: for parametric algorithms, the number of parameters is usually fixed, while for non-parametric algorithms, it can grow without bound as the training data grows.

E.g.

Parametric	Non-parametric
Linear Regression has a fixed number of weights (one per predictor, plus an intercept), which is determined before we train the model.	For a Decision Tree, the number of nodes is not constant. Depending on the size and the number of features of the input data, the number of nodes can grow indefinitely.
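
The difference in parameter counts can be sketched as follows. This is a toy illustration under stated assumptions, not a real decision tree learner: the hypothetical `count_tree_nodes` splits at the median position until every leaf holds a single label, instead of choosing splits by impurity as practical implementations do, and the alternating labels are a deliberately hard case that forces one leaf per sample.

```python
def linear_regression_param_count(n_features):
    """One weight per predictor plus one intercept, whatever the sample size."""
    return n_features + 1


def count_tree_nodes(points):
    """Count the nodes of a toy 1-D tree grown until every leaf is pure.

    points: list of (x, label) pairs sorted by x.
    The split is always at the median position (an assumption for brevity).
    """
    labels = {label for _, label in points}
    if len(labels) <= 1:
        return 1  # pure leaf: nothing left to split
    mid = len(points) // 2
    return 1 + count_tree_nodes(points[:mid]) + count_tree_nodes(points[mid:])


for n_samples in (8, 64, 512):
    # Labels alternate 0, 1, 0, 1, ... along x, so no two neighbors agree.
    data = [(x, x % 2) for x in range(n_samples)]
    print(
        n_samples,
        "samples -> linear regression parameters:",
        linear_regression_param_count(1),
        "| tree nodes:",
        count_tree_nodes(data),
    )
```

With n alternating samples this toy tree needs n leaves and n - 1 internal nodes, so 2n - 1 nodes in total, while the linear model keeps its 2 parameters no matter how many samples arrive.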

$\blacktriangleright$ The third and last property is: the parameters of parametric models usually have a meaning (i.e. a role) in explaining how the model works, while this is usually not true for non-parametric models.

E.g.

Parametric	Non-parametric
Each weight of a Linear Regression model represents how a predictor variable affects the target variable. A weight with a large absolute value means the predictor has a large influence on the target (assuming the predictors are on comparable scales), while a weight close to zero suggests that the predictor has little influence.	For deep neural networks, the individual values of the connection weights and bias terms do not tell us anything specific.
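
As a small sketch of reading meaning off the weights, the plain-Python example below fits a two-predictor linear regression through the normal equations. Everything here is an assumption for illustration: the helper names `solve` and `fit_linear`, the grid of inputs, and the generating rule y = 1 + 3·x1 + 0.01·x2. The data is noiseless, so the fit should recover the generating coefficients almost exactly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(rows, targets):
    """Least squares with an intercept; returns [intercept, w1, w2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


# Both predictors share the same scale (0..4), so their weights are comparable.
rows = [(x1, x2) for x1 in range(5) for x2 in range(5)]
targets = [1.0 + 3.0 * x1 + 0.01 * x2 for x1, x2 in rows]

intercept, w1, w2 = fit_linear(rows, targets)
print("intercept:", intercept)
print("weight of x1:", w1)  # large: x1 drives the target
print("weight of x2:", w2)  # near zero: x2 barely matters
```

Reading the result is direct: each unit of x1 moves the prediction by about 3, while x2 hardly changes it. No comparable per-parameter reading exists for the stored points of K-Nearest Neighbors or the weights of a deep network.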

Examples

Some parametric algorithms are:

Linear regression
Logistic regression
Gaussian Naive Bayes

Some non-parametric algorithms are:

K-nearest neighbors
Decision tree
Deep neural networks
Support vector machine
Naive Bayes with Density estimation

Advantages

Each type of algorithm has its own advantages and disadvantages. Since there are only 2 types when we look at them from this angle, the advantages of one type are the disadvantages of the other, and vice versa.

Parametric algorithms usually have the following strengths:

Simpler and more intuitive.
Faster to train and to make predictions.
Require less data.
Interpretable.

In contrast, non-parametric algorithms have these strengths:

More flexible and versatile, as they make few assumptions about the data.
Stronger performance when given enough data.
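
The trade-off between the two lists can be illustrated with a rough sketch. The setup is assumed for illustration only: noiseless targets y = sin(x) on [0, 2π], a straight-line fit as the parametric model, a 1-nearest-neighbor lookup as the non-parametric one, and the hypothetical helper names below. Real data is noisy, so actual gains will be less clean than this.

```python
import math


def make_data(n):
    """n evenly spaced points on [0, 2*pi] with targets sin(x)."""
    xs = [2 * math.pi * i / (n - 1) for i in range(n)]
    return xs, [math.sin(x) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


def nearest_neighbor_predict(xs, ys, query):
    """Return the target of the single closest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - query))
    return ys[i]


def mean_abs_error(predict, test_xs):
    """Average absolute gap between predictions and the true sin(x)."""
    return sum(abs(predict(x) - math.sin(x)) for x in test_xs) / len(test_xs)


# Fixed test points that never coincide exactly with the edges of the range.
test_xs = [2 * math.pi * (i + 0.5) / 50 for i in range(50)]


def errors_for(n):
    """Return (line error, nearest-neighbor error) for n training points."""
    xs, ys = make_data(n)
    w, b = fit_line(xs, ys)
    line_err = mean_abs_error(lambda x: w * x + b, test_xs)
    nn_err = mean_abs_error(lambda x: nearest_neighbor_predict(xs, ys, x), test_xs)
    return line_err, nn_err


results = {n: errors_for(n) for n in (5, 50, 200)}
for n, (line_err, nn_err) in results.items():
    print(f"n={n:3d}  line error={line_err:.3f}  nearest-neighbor error={nn_err:.3f}")
```

In this setup the line's error barely changes as more points arrive, because its assumed shape is the bottleneck, while the nearest-neighbor error keeps shrinking. The price is that the non-parametric model has to store and search every training point.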

To sum up

In this blog post, we discussed the differences between parametric and non-parametric algorithms and how to tell them apart. In short, parametric algorithms (represented by Linear regression) make strong assumptions about the data, so they require less data to train and less time to run. On the other hand, non-parametric algorithms (e.g. Deep neural networks) do not assume prior knowledge of the data, which makes them more versatile.

Test your understanding

Parametric vs Non-parametric algorithms - Quiz

1 / 7

What are the usual traits of a parametric method? Choose all that apply.

Its number of parameters is only known after the training terminates.

Each of its parameters alone does not have any specific meaning.

The speed of training is slow.

The speed of training is fast.

It does not assume any properties of the data.

Each of its parameters has its own specific meaning.

Its number of parameters is fixed before the training begins.

It assumes some properties of the data.

2 / 7

What are some examples of non-parametric methods? Choose all that apply.

Support vector machine.

Logistic regression.

Decision tree.

Gaussian Naive Bayes.

3 / 7

In general, what are some advantages of non-parametric models? Choose all that apply.

More interpretable.

Easier to implement.

Faster.

Have potential to give better predictions.

Require less data.

More simple and intuitive.

More versatile and flexible.

4 / 7

In general, what are some advantages of parametric models? Choose all that apply.

More versatile and flexible.

Easier to implement.

More simple and intuitive.

Require less data.

More interpretable.

Faster.

Have potential to give better predictions.

5 / 7

What are the usual traits of a non-parametric method? Choose all that apply.

Its number of parameters is fixed before the training begins.

The speed of training is slow.

Its number of parameters is only known after the training terminates.

Each of its parameters alone does not have any specific meaning.

Each of its parameters has its own specific meaning.

It assumes some properties of the data.

It does not assume any properties of the data.

The speed of training is fast.

6 / 7

What are some examples of parametric methods? Choose all that apply.

Deep neural networks.

Naive Bayes with density estimation.

K-nearest neighbors.

Linear regression.

Gaussian Naive Bayes.

7 / 7

What is the most prominent factor for discriminating parametric and non-parametric algorithms?

The factor of whether each parameter of the model has a specific meaning or not.

The factor of whether the number of parameters is fixed or not.

The factor of whether the algorithm has assumptions about the data or not.



Tung M Phung's Blog
