
The Transformer neural network architecture
The Transformer neural network, explained in detail. Continue reading The Transformer neural network architecture
Introduction to and comparison of Batch Norm, Weight Norm, Layer Norm, Instance Norm, and Group Norm. Continue reading Deep Learning normalization methods
Get your first intuition about Attention right with a minimal code example. Continue reading Attention in Deep Learning, your starting point (with code)
Where and how should you start with LSTM? Continue reading LSTM: where to start?
In this article, various types of sampling techniques for imbalanced datasets are discussed in depth with examples and analysis. Continue reading Imbalanced Learning: sampling techniques
An ensemble of trees (in the form of bagging, random forest, or boosting) is usually preferred over one decision tree alone. Continue reading Ensemble: Bagging, Random Forest, Boosting and Stacking
Google Play Store dataset with 53k apps and 1.4m comments, scraped in April 2020. Continue reading Data Scraping: Android App Dataset from Google Play Store
This dataset attempts to encapsulate the statistics of KDnuggets.com’s posts, from post name, author, and published date to the number of shares and comments, etc. Continue reading Data scraping: KDnuggets.com’s post statistics
A dataset scraped from Versus.com, containing various phone types’ specifications, e.g. screen size, battery power, pixel density. Continue reading Data scraping: Phone dataset from Versus.com
This City dataset was obtained by scraping Versus.com. Continue reading Data scraping: City dataset from Versus.com
This post attempts to give readers a practical example of how to clean a dataset (in particular, the Google Play Store Apps dataset). Continue reading Data Cleaning case study: Google Play Store Dataset
This article attempts to give a comprehensive view of Index in Postgresql. Continue reading SQL – Postgresql Indexing
The new Coronavirus is spreading fiercely. This article describes how to make a geo-map to showcase the status of this disease around the world using Python and Plotly. Continue reading Coronavirus cases: an interactive geo-map with Plotly
This article attempts to make PCA crystal clear to anyone who wishes to understand it thoroughly, step-by-step, in both high and low-level concepts. Continue reading Principal Component Analysis fully explained
In this blog post, we introduce the creation and alteration of tables along with auxiliary information like data types and constraints. Continue reading SQL – create and alter tables
A window function performs a calculation for each row over the set of rows (a group / a partition / a window of rows) that the row belongs to. Continue reading SQL – window functions
This article summarizes all functions in Postgresql, from numeric, string, and datetime to other functions. Continue reading SQL – list of Postgresql functions
In this blog post, we go through the 2 means of combining data: row-wise (UNION, INTERSECT, etc.) and column-wise (JOIN, etc.). Continue reading SQL – combining data (UNION, JOIN, etc.)
SQL aggregation functions in Postgresql, including general-purpose, concatenative, statistics, ordered-sets, and ranking aggregation. Continue reading SQL – aggregate functions
An introduction to SQL queries, including SELECT, FROM, WHERE, ORDER BY, LIMIT, alias (AS), DISTINCT, equal (=), and not equal (!= or <>). Continue reading SQL – an introduction to basic SELECT queries
In statistics and data mining, we often encounter the word ‘control’, mostly in terms like control variables and control groups. In fact, a control variable has slightly different meanings in different fields. Continue reading Control Variable explained
Through various experiments, ELU is accepted by many researchers as a good successor of the original version (ReLU). Continue reading ELU activation: A comprehensive analysis
What is an Unpaired 2-sample T-test? Let’s analyze this definition from scratch. Continue reading Unpaired Two-sample T-test (Independent T-test)
What is a Paired 2-sample T-test? Let’s analyze this definition from scratch. Continue reading Paired Two-sample T-test (Dependent T-test)
This article attempts to summarize the popular evaluation metrics for binary classification problems. Continue reading Binary Classification Evaluation Summary
We introduce an alternative for the ROC: the Precision-Recall curve (PR-curve), which is a more reliable measurement for the cases when Positive samples are rare. Continue reading Precision-Recall curve: an overview
The well-known ROC curve plot, the Area Under the ROC Curve (AUC), and its variants. Continue reading ROC curve and AUC: a comprehensive overview
The Confusion Matrix is a square table representing the predictions of a classification model. Continue reading How to read the Confusion Matrix
Batch Normalization (BatchNorm) is a very frequently used technique in Deep Learning; however, the reason why it works is often interpreted ambiguously. Continue reading Batch Normalization and why it works
Information Gain, Gain Ratio and Gini Index are the three fundamental criteria to measure the quality of a split in a Decision Tree. Continue reading Information Gain, Gain Ratio and Gini Index
In the previous blogs, we have discussed Logistic Regression and its assumptions. Today, the main topic is the theoretical and empirical strengths and weaknesses of this model. Continue reading Logistic Regression: Advantages and Disadvantages
Despite its simplicity, ReLU previously achieved the top performance over various tasks of modern Machine Learning. Continue reading Rectifier Linear Unit (ReLU)
The sigmoid and tanh activation functions were very frequently used in the past but have been losing popularity in the era of Deep learning. Continue reading Sigmoid, tanh activations and their loss of popularity
In this blog post, we show and explain the Bayes formula, how to build a Naive Bayes classifier, its assumptions, strengths, and weaknesses. Continue reading Naive Bayes classifier: a comprehensive guide
When these requirements, or assumptions, hold true, we know that our Logistic model has achieved the best performance it can. Continue reading Assumptions of Logistic Regression
Applications, such as using Machine Learning to boost the recruitment process, may bring more harm than good. Continue reading How Intelligent systems damage the data
I have seen enough threads saying that Correlation does NOT imply Causality. Yes, that is true, but how about the other way around? Continue reading Does Causality imply Correlation?
A dummy variable is a variable (or feature, predictor, column) whose values can be either 0 or 1. Continue reading When to add a dummy variable?
In Machine Learning, while some predictive models allow categorical variables in the data, most require all predictor variables to be continuous. Continue reading How to convert Categorical Variables to Numerical Variables
Following the previous overview, this article attempts to delve deeper into Logistic Regression. Continue reading Logistic Regression tutorial
The first blog in a series about Logistic Regression. Continue reading An overview of Logistic Regression
Train and cross-validate your Linear regression in Python with pre-defined or customized evaluation functions. Continue reading Linear Regression in Python
Many people have a tendency to always do feature centering, scaling or normalizing right before applying predictive models to the data… Continue reading When to do feature centering, scaling and normalization?
How do we distinguish Parametric and Non-parametric algorithms? By reading this article. Continue reading Parametric vs Non-parametric algorithms
This article presents the formulas for coming up with the best-fitted linear regression line. Continue reading How to make a Linear Regressor? (theory)
In a machine learning project, after crawling or collecting data, we have to split it into at least 2 parts: training and validation data. Continue reading Splitting data into a Training set and a Validation set
T-score, t-statistic, t-distribution and t-test belong to the T-family, which is very closely related to the Z-family. Continue reading T-statistic, T-test and the T family
Not only does Linear regression give us a model for prediction, but it also tells us how accurate the model is, by means of Confidence Intervals. Continue reading Confidence Intervals for Linear Regression Coefficients
An introduction to Python, the most common programming language for practicing Machine Learning – Data Mining (ML and DM) today, and Jupyter, a convenient environment for writing Python code. Continue reading Introduction to Python and Jupyter
Hypothesis Testing is the process of verifying whether a hypothesis is viable or not, i.e. whether we should reject one hypothesis in favor of the other. Continue reading Hypothesis Testing
We take a set of samples from a given Normal distribution. How extreme is this set? Continue reading Z-score on a sample set
Z-score (together with Z-test, Z-distribution, Z-statistic, etc.) is a very frequently used term from statistics being applied in Machine Learning. Continue reading Z-score, Z-statistic, Z-test, Z-distribution
If you are considering using Linear regression for your production pipeline, you should be aware of its 4 drawbacks. Continue reading Disadvantages of Linear Regression
Linear regression is frequently used in practice because of these 7 reasons. Continue reading Advantages of Linear Regression
In which cases does Linear Regression perform the best? In which cases should we use other algorithms? Continue reading Assumptions of Linear Regression
Let’s examine everything we need to know about over-fitting and under-fitting. Continue reading Over-fitting and Under-fitting
We tackle Regularization for Linear Regression by answering 5 questions: What, When, Where, How, and Why? Continue reading Regularization for Linear regression
This article differentiates objective functions from evaluation functions and elaborates on some examples of them. Continue reading Regression Objective and Evaluation Functions
A number of different types of Linear regression, based on various points of view. Continue reading Types of Linear regression
Linear regression is arguably the most popular Machine learning model out there. Continue reading Introduction to Linear Regression
This is my 5th blog in a series on data visualization with charts for specific purposes. Continue reading Charts to compare different objects
This blog discusses the QQ-plot, PP-plot, and Probability plot, the relationship and the confusion amongst these three. Continue reading QQ-plot versus PP-plot versus Probability plot
The top ML applications that are around to enhance our lives. Continue reading Machine Learning Applications
Mining from data is not a simple task, and the help of libraries makes the process more enjoyable. Continue reading Numpy, Pandas, Scikit-learn and Matplotlib
Needleman Wunsch Algorithm utilizes Dynamic programming to align 2 sequences in the optimal way. Continue reading Optimal alignment – Needleman Wunsch Algorithm
The Regular expression (regex, or just re) is a means of representation, used for string matching and searching. Continue reading Basic Regular Expression
Human learning and machine learning are very similar. They are similar in the sense that both involve learning, the process of understanding and creating new knowledge. Continue reading Different types of Machine Learning
This is my 4th blog in a series on data visualization with charts for specific purposes. Continue reading Charts to show relationships between (or among) variables
This is my 3rd blog in a series on data visualization with charts for specific purposes. Continue reading Charts to show trends
This is my 2nd blog in a series on data visualization with charts for specific purposes. Continue reading Charts to show the distribution
This is my first blog in a series on data visualization with charts for specific purposes. Continue reading Charts to show the proportion
Matplotlib is undeniably the most prevalent name in the family of visualization libraries in Python. Continue reading Basic plots with Matplotlib
Feature selection is hard but very important. Continue reading Feature selection with sklearn
This blog post attempts to address why NaNs are bad and how we can fix them. Continue reading How to deal with missing values (NaNs)
Getting started with Machine Learning What are Computer Science, Artificial Intelligence and Machine Learning? Different types of Machine Learning Machine Learning Applications Introduction to Python and Jupyter Numpy, Pandas, Scikit-learn and Matplotlib Data Scraping Data scraping: City dataset from Versus.com Data scraping: Phone dataset from Versus.com Data scraping: KDnuggets.com’s post statistics Data Scraping: Android App Dataset from Google Play Store Data Cleaning Data Cleaning case study: Google Play Store Dataset Preparatory Phase Control Variable Splitting data into a Training set and a Validation set Imbalanced Learning: sampling techniques Exploratory Data Analysis QQ plot versus PP plot versus Probability plot Multicollinearity … Continue reading Data Mining – Machine Learning
There are many people who get confused when hearing about Computer Science (CS), Artificial Intelligence (AI), and Machine Learning (ML). Continue reading What are Computer Science, Artificial Intelligence and Machine Learning?
To better understand the definition of collinearity, let’s start with an example… Continue reading What is Multicollinearity (or Collinearity)?