Machine Learning - Data MiningLeave a comment

Is ChatGPT as good as humans? A study in the field of programming education

July 20, 2023December 28, 2023 Tung.M.Phung

How far is ChatGPT from Artificial General Intelligence? A recent study compares ChatGPT’s performance to that of humans in programming education. Continue reading Is ChatGPT as good as humans? A study in the field of programming education

Deep Learning, Machine Learning - Data MiningLeave a comment

Advanced (yet simple) Prompt Engineering techniques for Large Language Models

April 1, 2023April 1, 2023 Tung.M.Phung

Advanced (yet simple) prompt engineering methods that are proven to work well in various domains. Continue reading Advanced (yet simple) Prompt Engineering techniques for Large Language Models

Deep Learning, Machine Learning - Data MiningLeave a comment

Conversational AI: ChatGPT alternatives and their advantages

March 18, 2023March 18, 2023 Tung.M.Phung

ChatGPT is not the only AI that can converse naturally with humans. In this article, we introduce alternatives to ChatGPT and their unique advantages. Continue reading Conversational AI: ChatGPT alternatives and their advantages

Deep Learning, Machine Learning - Data MiningLeave a comment

Yes, we can now make ChatGPT as good as human tutors for programming education

January 1, 2023December 28, 2023 Tung.M.Phung

A straightforward technique that significantly enhances ChatGPT’s hint generation capabilities, placing it on par with expert human tutors for programming education. Continue reading Yes, we can now make ChatGPT as good as human tutors for programming education

Classification Models, Deep Learning, Exploratory Data Analysis, Feature Engineering, Machine Learning - Data Mining1 Comment

Case study: Machine Learning and Deep Learning for Knowledge Tracing in Programming Education

May 8, 2022May 14, 2022 Tung.M.Phung

Applying Machine Learning and Deep Learning to solve the Knowledge Tracing problem in the context of Programming classrooms. Continue reading Case study: Machine Learning and Deep Learning for Knowledge Tracing in Programming Education

Deep Learning, Feature Engineering, Machine Learning - Data MiningLeave a comment

Transforming everything to vectors with Deep Learning: from Word2Vec, Node2Vec, to Code2Vec and Data2Vec

April 17, 2022March 26, 2023 Tung.M.Phung

Let us discuss the state-of-the-art methods for transforming every kind of input data into fixed-length vectors of continuous values, including Word2Vec, Doc2Vec, Image2Vec, Node2Vec, Edge2Vec, Code2Vec, and Data2Vec. Continue reading Transforming everything to vectors with Deep Learning: from Word2Vec, Node2Vec, to Code2Vec and Data2Vec

Machine Learning - Data MiningLeave a comment

Reinforcement Learning approaches for the Join Optimization problem in Database: DQ, ReJoin, Neo, RTOS, and Bao

March 15, 2022March 31, 2022 Tung.M.Phung

How can we use Reinforcement Learning for the problem of Join optimization in Database? Let us take a look at the 5 recent and outstanding approaches from top researchers in the world: DQ, ReJoin, Neo, RTOS, and Bao. Continue reading Reinforcement Learning approaches for the Join Optimization problem in Database: DQ, ReJoin, Neo, RTOS, and Bao

Deep Learning, Machine Learning - Data Mining, Text Mining1 Comment

A review of pre-trained language models: from BERT, RoBERTa, to ELECTRA, DeBERTa, BigBird, and more

December 10, 2021March 26, 2023 Tung.M.Phung

Let us review a list of pretrained language models, including BERT, Transformer-XL, XLNet, RoBERTa, DistilBERT, ALBERT, BART, ELECTRA, ConvBERT, DeBERTa, and BigBird. Continue reading A review of pre-trained language models: from BERT, RoBERTa, to ELECTRA, DeBERTa, BigBird, and more

Miscellaneous, Software Development1 Comment

A summary of modern C++ features

October 3, 2021February 15, 2022 Tung.M.Phung

This blog post summarizes the main features of modern C++ from the developer’s point of view. A must-read if you are preparing for an interview. Continue reading A summary of modern C++ features

Miscellaneous, Software DevelopmentLeave a comment

Pythonic Python

September 28, 2021January 24, 2023 Tung.M.Phung

A pool of unique features in the Python programming language. A must-read if you are preparing for an interview. Continue reading Pythonic Python

Deep Learning, Machine Learning - Data Mining, Reinforcement LearningLeave a comment

Reinforcement learning: Q Learning, Deep Q Learning introduction with Tensorflow

September 7, 2021November 11, 2021 Tung.M.Phung

Q Learning, Deep Q Learning, Double Q Learning, and Dueling Q Learning explained. TensorFlow code provided. Continue reading Reinforcement learning: Q Learning, Deep Q Learning introduction with Tensorflow

Exploratory Data Analysis, Machine Learning - Data MiningLeave a comment

On ensuring fairness: Statistical parity vs Causal graphs

July 31, 2021July 31, 2021 Tung.M.Phung

We tackle the problem of ensuring fairness in machine learning, from using the traditional statistical parity to exploiting a causal network. Continue reading On ensuring fairness: Statistical parity vs Causal graphs

Machine Learning - Data Mining, Subgroup DiscoveryLeave a comment

Subgroup Discovery: Beyond coverage and mean-shift

June 26, 2021May 17, 2021 Tung.M.Phung

An outstanding subgroup might need more than just a big size and a different average value from the population. Continue reading Subgroup Discovery: Beyond coverage and mean-shift

Deep Learning, Machine Learning - Data MiningLeave a comment

Some thoughts regarding Deep Learning’s achievements, hype, and challenges

May 29, 2021May 22, 2022 Tung.M.Phung

A critical view of Deep Learning. Continue reading Some thoughts regarding Deep Learning’s achievements, hype, and challenges

Exploratory Data Analysis, Feature Engineering, Machine Learning - Data Mining2 Comments

A survey of correlation analysis methods

May 24, 2021May 24, 2021 Tung.M.Phung

A summary of popular methods to analyze the dependency between variables. Continue reading A survey of correlation analysis methods

Machine Learning - Data Mining, MiscellaneousLeave a comment

Information Theory concepts: Entropy, Mutual Information, KL-Divergence, and more

May 20, 2021June 25, 2021 Tung.M.Phung

An intuitive introduction to Information Content, Encoding, Entropy, Joint Entropy, Cross Entropy, KL Divergence, Conditional Entropy, and Mutual Information… Continue reading Information Theory concepts: Entropy, Mutual Information, KL-Divergence, and more

Machine Learning - Data Mining, Text MiningLeave a comment

Bit-parallel algorithms for generalized string matching

April 17, 2021July 14, 2021 Tung.M.Phung

The extended Shift-AND algorithm for matching generalized string patterns. Continue reading Bit-parallel algorithms for generalized string matching

Machine Learning - Data Mining, String, Text Mining1 Comment

Shift-AND algorithm for exact pattern matching

April 15, 2021April 17, 2021 Tung.M.Phung

A simple introduction to the Shift-AND algorithm for string matching, with an example. Continue reading Shift-AND algorithm for exact pattern matching

Deep Learning, Machine Learning - Data Mining3 Comments

The Transformer neural network architecture

January 11, 2021January 23, 2024 Tung.M.Phung

The Transformer neural network, explained in detail. Continue reading The Transformer neural network architecture

Deep Learning, Machine Learning - Data MiningLeave a comment

Deep Learning normalization methods

January 6, 2021January 6, 2021 Tung.M.Phung

Introduction to and comparison of Batch Norm, Weight Norm, Layer Norm, Instance Norm, and Group Norm. Continue reading Deep Learning normalization methods

Deep Learning, Machine Learning - Data MiningLeave a comment

Attention in Deep Learning, your starting point (with code)

December 30, 2020December 31, 2020 Tung.M.Phung

Build the right first intuition for Attention with a minimal code example. Continue reading Attention in Deep Learning, your starting point (with code)

Deep Learning, Machine Learning - Data MiningLeave a comment

LSTM: where to start?

December 27, 2020April 3, 2021 Tung.M.Phung

Where and how should you start with LSTM? Continue reading LSTM: where to start?

Machine Learning - Data Mining, Preparatory PhaseLeave a comment

Imbalanced Learning: sampling techniques

May 24, 2020April 10, 2022 Tung.M.Phung

Various types of sampling techniques for imbalanced datasets are discussed in depth with examples and analysis. Familiarize yourself with over-/under-sampling, SMOTE, ADASYN, sampling with cleaning, boosting, clustering, and more. Continue reading Imbalanced Learning: sampling techniques

Classification Models, Machine Learning - Data Mining, Regression ModelsLeave a comment

Ensemble: Bagging, Random Forest, Boosting and Stacking

May 4, 2020July 31, 2020 Tung.M.Phung

An ensemble of trees (in the form of bagging, random forest, or boosting) is usually preferred over one decision tree alone. Continue reading Ensemble: Bagging, Random Forest, Boosting and Stacking

Data Crawling and Scraping, Machine Learning - Data Mining4 Comments

Data Scraping: Android App Dataset from Google Play Store

April 30, 2020July 31, 2020 Tung.M.Phung

Google Play Store dataset with 53k apps and 1.4 million comments, scraped in April 2020. Continue reading Data Scraping: Android App Dataset from Google Play Store

Data Crawling and Scraping, Machine Learning - Data MiningLeave a comment

Data scraping: KDnuggets.com’s post statistics

April 28, 2020July 31, 2020 Tung.M.Phung

This dataset captures statistics about KDnuggets.com’s posts, from post name, author, and published date to the number of shares and comments. Continue reading Data scraping: KDnuggets.com’s post statistics

Data Crawling and Scraping, Machine Learning - Data MiningLeave a comment

Data scraping: Phone dataset from Versus.com

April 27, 2020December 21, 2021 Tung.M.Phung

A dataset scraped from Versus.com, containing various phone types’ specifications, e.g. screen size, battery power, pixel density. Continue reading Data scraping: Phone dataset from Versus.com

Data Crawling and Scraping, Machine Learning - Data MiningLeave a comment

Data scraping: City dataset from Versus.com

April 26, 2020December 21, 2021 Tung.M.Phung

This City dataset is obtained by scraping Versus.com. Continue reading Data scraping: City dataset from Versus.com

Data Cleaning, Machine Learning - Data MiningLeave a comment

Data Cleaning case study: Google Play Store Dataset

April 17, 2020July 31, 2020 Tung.M.Phung

This post attempts to give readers a practical example of how to clean a dataset (in particular, the Google Play Store Apps dataset). Continue reading Data Cleaning case study: Google Play Store Dataset

Database, Machine Learning - Data MiningLeave a comment

SQL – Postgresql Indexing

April 5, 2020July 31, 2020 Tung.M.Phung

This article attempts to give a comprehensive view of indexes in Postgresql. Continue reading SQL – Postgresql Indexing

DataVisualization, Machine Learning - Data MiningLeave a comment

Coronavirus cases: an interactive geo-map with Plotly

April 1, 2020July 31, 2020 Tung.M.Phung

The new Coronavirus is spreading fiercely. This article describes how to make a geo-map to showcase the status of this disease around the world using Python and Plotly. Continue reading Coronavirus cases: an interactive geo-map with Plotly

Feature Engineering, Machine Learning - Data MiningLeave a comment

Principal Component Analysis fully explained

March 13, 2020December 30, 2020 Tung.M.Phung

This article attempts to make PCA crystal clear to anyone who wishes to understand it thoroughly, step-by-step, in both high and low-level concepts. Continue reading Principal Component Analysis fully explained

Database, Machine Learning - Data MiningLeave a comment

SQL – create and alter tables

March 7, 2020July 31, 2020 Tung.M.Phung

In this blog post, we introduce the creation and alteration of tables along with auxiliary information like data types and constraints. Continue reading SQL – create and alter tables

Database, Machine Learning - Data MiningLeave a comment

SQL – window functions

March 4, 2020July 31, 2020 Tung.M.Phung

A window function performs a calculation for each row over the set of rows (a group / a partition / a window of rows) that the row belongs to. Continue reading SQL – window functions

Database, Machine Learning - Data MiningLeave a comment

SQL – list of Postgresql functions

March 3, 2020July 31, 2020 Tung.M.Phung

This article summarizes all functions in Postgresql, from numeric, string, and datetime to other functions. Continue reading SQL – list of Postgresql functions

Database, Machine Learning - Data MiningLeave a comment

SQL – combining data (UNION, JOIN, etc.)

March 1, 2020July 31, 2020 Tung.M.Phung

In this blog post, we walk through the two ways of combining data: row-wise (UNION, INTERSECT, etc.) and column-wise (JOIN, etc.). Continue reading SQL – combining data (UNION, JOIN, etc.)

Database, Machine Learning - Data MiningLeave a comment

SQL – aggregate functions

February 26, 2020July 31, 2020 Tung.M.Phung

SQL aggregation functions in Postgresql, including general-purpose, concatenative, statistics, ordered-sets, and ranking aggregation. Continue reading SQL – aggregate functions

Database, Machine Learning - Data MiningLeave a comment

SQL – an introduction to basic SELECT queries

February 24, 2020July 31, 2020 Tung.M.Phung

An introduction to SQL queries, including SELECT, FROM, WHERE, ORDER BY, LIMIT, alias (AS), DISTINCT, equal (=), and not equal (!= or <>). Continue reading SQL – an introduction to basic SELECT queries

Machine Learning - Data Mining, Preparatory PhaseLeave a comment

Control Variable explained

February 17, 2020July 31, 2020 Tung.M.Phung

In statistics and data mining, we often encounter the word ‘control’, mostly in terms like control variables and control groups. In fact, a control variable has slightly different meanings in different fields. Continue reading Control Variable explained

Deep Learning, Machine Learning - Data MiningLeave a comment

ELU activation: A comprehensive analysis

February 16, 2020July 31, 2020 Tung.M.Phung

Through various experiments, ELU is accepted by many researchers as a good successor of the original version (ReLU). Continue reading ELU activation: A comprehensive analysis

Machine Learning - Data Mining, StatisticsLeave a comment

Unpaired Two-sample T-test (Independent T-test)

January 21, 2020July 31, 2020 Tung.M.Phung

What is an Unpaired 2-sample T-test? Let’s analyze this definition from scratch. Continue reading Unpaired Two-sample T-test (Independent T-test)

Machine Learning - Data Mining, StatisticsLeave a comment

Paired Two-sample T-test (Dependent T-test)

January 18, 2020July 31, 2020 Tung.M.Phung

What is a Paired 2-sample T-test? Let’s analyze this definition from scratch. Continue reading Paired Two-sample T-test (Dependent T-test)

Classification Models, Machine Learning - Data MiningLeave a comment

Binary Classification Evaluation Summary

January 15, 2020July 31, 2020 Tung.M.Phung

This article attempts to summarize the popular evaluation metrics for binary classification problems. Continue reading Binary Classification Evaluation Summary

Classification Models, Machine Learning - Data MiningLeave a comment

Precision-Recall curve: an overview

January 12, 2020July 31, 2020 Tung.M.Phung

We introduce an alternative for the ROC: the Precision-Recall curve (PR-curve), which is a more reliable measurement for the cases when Positive samples are rare. Continue reading Precision-Recall curve: an overview

Classification Models, Machine Learning - Data MiningLeave a comment

ROC curve and AUC: a comprehensive overview

January 10, 2020July 31, 2020 Tung.M.Phung

The well-known ROC curve plot, the Area Under the ROC Curve (AUC), and its variants. Continue reading ROC curve and AUC: a comprehensive overview

Classification Models, Machine Learning - Data MiningLeave a comment

How to read the Confusion Matrix

January 8, 2020July 31, 2020 Tung.M.Phung

The Confusion Matrix is a square table representing the predictions of a classification model. Continue reading How to read the Confusion Matrix

Deep Learning, Machine Learning - Data MiningLeave a comment

Batch Normalization and why it works

January 6, 2020January 4, 2021 Tung.M.Phung

Batch Normalization (BatchNorm) is a very frequently used technique in Deep Learning; however, the reason why it works is often interpreted ambiguously. Continue reading Batch Normalization and why it works

Classification Models, Machine Learning - Data Mining6 Comments

Information Gain, Gain Ratio and Gini Index

January 4, 2020July 31, 2020 Tung.M.Phung

Information Gain, Gain Ratio and Gini Index are the three fundamental criteria to measure the quality of a split in a Decision Tree. Continue reading Information Gain, Gain Ratio and Gini Index

Classification Models, Machine Learning - Data Mining2 Comments

Logistic Regression: Advantages and Disadvantages

December 28, 2019July 31, 2020 Tung.M.Phung

In the previous posts, we discussed Logistic Regression and its assumptions. Today, the main topic is the theoretical and empirical pros and cons of this model. Continue reading Logistic Regression: Advantages and Disadvantages

Deep Learning, Machine Learning - Data Mining1 Comment

Rectified Linear Unit (ReLU)

December 28, 2019July 31, 2020 Tung.M.Phung

Despite its simplicity, ReLU previously achieved the top performance over various tasks of modern Machine Learning. Continue reading Rectified Linear Unit (ReLU)

Deep Learning, Machine Learning - Data MiningLeave a comment

Sigmoid, tanh activations and their loss of popularity

December 27, 2019July 31, 2020 Tung.M.Phung

The sigmoid and tanh activation functions were very frequently used in the past but have been losing popularity in the era of Deep learning. Continue reading Sigmoid, tanh activations and their loss of popularity

Classification Models, Machine Learning - Data MiningLeave a comment

Naive Bayes classifier: a comprehensive guide

December 25, 2019July 31, 2020 Tung.M.Phung

In this blog post, we show and explain the Bayes formula, how to build a Naive Bayes classifier, and its assumptions, strengths, and weaknesses. Continue reading Naive Bayes classifier: a comprehensive guide

Classification Models, Machine Learning - Data MiningLeave a comment

Assumptions of Logistic Regression

December 22, 2019July 31, 2020 Tung.M.Phung

When these requirements, or assumptions, hold true, we know that our Logistic model performs as well as it can. Continue reading Assumptions of Logistic Regression

Machine Learning - Data Mining, MiscellaneousLeave a comment

How Intelligent systems damage the data

December 21, 2019July 31, 2020 Tung.M.Phung

Applications, such as using Machine Learning to boost the recruitment process, may bring more harm than good. Continue reading How Intelligent systems damage the data

Exploratory Data Analysis, Machine Learning - Data MiningLeave a comment

Does Causality imply Correlation?

December 21, 2019July 31, 2020 Tung.M.Phung

I have seen enough threads saying that Correlation does NOT imply Causality. Yes, that is true, but how about the other way around? Continue reading Does Causality imply Correlation?

Feature Engineering, Machine Learning - Data MiningLeave a comment

When to add a dummy variable?

December 19, 2019July 31, 2020 Tung.M.Phung

A dummy variable is a variable (or feature, predictor, column) whose values can be either 0 or 1. Continue reading When to add a dummy variable?

Feature Engineering, Machine Learning - Data Mining3 Comments

How to convert Categorical Variables to Numerical Variables

December 18, 2019July 31, 2020 Tung.M.Phung

In Machine Learning, while some predictive models allow categorical variables in the data, most require all predictor variables to be continuous. Continue reading How to convert Categorical Variables to Numerical Variables

Classification Models, Machine Learning - Data MiningLeave a comment

Logistic Regression tutorial

December 13, 2019December 3, 2020 Tung.M.Phung

Following the previous overview, this article attempts to delve deeper into Logistic Regression. Continue reading Logistic Regression tutorial

Classification Models, Machine Learning - Data MiningLeave a comment

An overview of Logistic Regression

December 7, 2019July 31, 2020 Tung.M.Phung

The first post in a series about Logistic Regression. Continue reading An overview of Logistic Regression

Machine Learning - Data Mining, Regression ModelsLeave a comment

Linear Regression in Python

December 6, 2019July 31, 2020 Tung.M.Phung

Train and cross-validate your Linear regression in Python with pre-defined or customized evaluation functions. Continue reading Linear Regression in Python

Feature Engineering, Machine Learning - Data MiningLeave a comment

When to do feature centering, scaling and normalization?

December 5, 2019July 31, 2020 Tung.M.Phung

Many people have a tendency to always do feature centering, scaling or normalizing right before applying predictive models to the data… Continue reading When to do feature centering, scaling and normalization?

Machine Learning - Data Mining, MiscellaneousLeave a comment

Parametric vs Non-parametric algorithms

November 28, 2019July 31, 2020 Tung.M.Phung

How do we distinguish Parametric and Non-parametric algorithms? By reading this article. Continue reading Parametric vs Non-parametric algorithms

Machine Learning - Data Mining, Regression ModelsLeave a comment

How to make a Linear Regressor? (theory)

November 24, 2019July 31, 2020 Tung.M.Phung

This article presents the formulas for coming up with the best-fitted linear regression line. Continue reading How to make a Linear Regressor? (theory)

Machine Learning - Data Mining, Preparatory PhaseLeave a comment

Splitting data into a Training set and a Validation set

November 23, 2019July 31, 2020 Tung.M.Phung

In a machine learning project, after crawling or collecting data, we have to split it into at least 2 parts: training and validation data. Continue reading Splitting data into a Training set and a Validation set

Machine Learning - Data Mining, StatisticsLeave a comment

T-statistic, T-test and the T family

November 12, 2019July 31, 2020 Tung.M.Phung

T-score, t-statistic, t-distribution and t-test belong to the T-family, which is very closely related to the Z-family. Continue reading T-statistic, T-test and the T family

Machine Learning - Data Mining, Regression ModelsLeave a comment

Confidence Intervals for Linear Regression Coefficients

November 9, 2019November 15, 2020 Tung.M.Phung

Not only does Linear regression give us a model for prediction, but it also tells us how accurate the model is, by means of Confidence Intervals. Continue reading Confidence Intervals for Linear Regression Coefficients

Intro to ML, Machine Learning - Data MiningLeave a comment

Introduction to Python and Jupyter

October 27, 2019July 31, 2020 Tung.M.Phung

Python, the most common programming language for practicing Machine Learning – Data Mining (ML and DM) today, and Jupyter, a convenient environment for writing Python code. Continue reading Introduction to Python and Jupyter

Machine Learning - Data Mining, Statistics2 Comments

Hypothesis Testing

October 18, 2019July 31, 2020 Tung.M.Phung

Hypothesis Testing is the process of verifying whether a hypothesis is viable, i.e., whether we should reject one hypothesis in favor of the other. Continue reading Hypothesis Testing

Machine Learning - Data Mining, Statistics4 Comments

Z-score on a sample set

October 13, 2019July 31, 2020 Tung.M.Phung

We take a set of samples from a given Normal distribution. How extreme is this set? Continue reading Z-score on a sample set

Machine Learning - Data Mining, Statistics2 Comments

Z-score, Z-statistic, Z-test, Z-distribution

October 12, 2019July 31, 2020 Tung.M.Phung

Z-score (together with Z-test, Z-distribution, Z-statistic, etc.) is a very frequently used term from statistics being applied in Machine Learning. Continue reading Z-score, Z-statistic, Z-test, Z-distribution

Machine Learning - Data Mining, Regression ModelsLeave a comment

Disadvantages of Linear Regression

October 12, 2019July 31, 2020 Tung.M.Phung

If you are considering using Linear regression for your production pipeline, you should be aware of its 4 drawbacks. Continue reading Disadvantages of Linear Regression

Machine Learning - Data Mining, Regression ModelsLeave a comment

Advantages of Linear Regression

October 8, 2019July 31, 2020 Tung.M.Phung

Linear regression is frequently used in practice because of these 7 reasons. Continue reading Advantages of Linear Regression

Machine Learning - Data Mining, Regression ModelsLeave a comment

Assumptions of Linear Regression

October 6, 2019November 15, 2020 Tung.M.Phung

In which cases does Linear Regression perform the best? In which cases should we use other algorithms? Continue reading Assumptions of Linear Regression

Machine Learning - Data Mining, MiscellaneousLeave a comment

Over-fitting and Under-fitting

October 5, 2019July 31, 2020 Tung.M.Phung

Let’s examine everything we need to know about over-fitting and under-fitting. Continue reading Over-fitting and Under-fitting

Machine Learning - Data Mining, Regression ModelsLeave a comment

Regularization for Linear regression

October 5, 2019July 31, 2020 Tung.M.Phung

We tackle Regularization for Linear Regression by answering 5 questions: What, When, Where, How, and Why? Continue reading Regularization for Linear regression

Machine Learning - Data Mining, Regression ModelsLeave a comment

Regression Objective and Evaluation Functions

October 4, 2019November 15, 2020 Tung.M.Phung

This article differentiates objective functions from evaluation functions and elaborates on some examples of them. Continue reading Regression Objective and Evaluation Functions

Machine Learning - Data Mining, Regression ModelsLeave a comment

Types of Linear regression

September 30, 2019July 31, 2020 Tung.M.Phung

A number of different types of Linear regression, based on various points of view. Continue reading Types of Linear regression

Machine Learning - Data Mining, Regression ModelsLeave a comment

Introduction to Linear Regression

September 29, 2019July 31, 2020 Tung.M.Phung

Linear regression is arguably the most popular Machine learning model out there. Continue reading Introduction to Linear Regression

DataVisualization, Machine Learning - Data MiningLeave a comment

Charts to compare different objects

September 28, 2019July 31, 2020 Tung.M.Phung

This is my 5th post in a series on data visualization with charts for specific purposes. Continue reading Charts to compare different objects

Exploratory Data Analysis, Machine Learning - Data Mining2 Comments

QQ-plot versus PP-plot versus Probability plot

September 25, 2019July 31, 2020 Tung.M.Phung

This blog discusses the QQ-plot, PP-plot, and Probability plot, along with the relationships and the common confusion among the three. Continue reading QQ-plot versus PP-plot versus Probability plot

Intro to ML, Machine Learning - Data MiningLeave a comment

Machine Learning Applications

September 21, 2019July 31, 2020 Tung.M.Phung

The top ML applications that are around to improve our lives. Continue reading Machine Learning Applications

Intro to ML, Machine Learning - Data MiningLeave a comment

Numpy, Pandas, Scikit-learn and Matplotlib

September 20, 2019July 31, 2020 Tung.M.Phung

Mining from data is not a simple task, and the help of libraries makes the process more enjoyable. Continue reading Numpy, Pandas, Scikit-learn and Matplotlib

Machine Learning - Data Mining, Text MiningLeave a comment

Optimal alignment – Needleman Wunsch Algorithm

September 19, 2019July 31, 2020 Tung.M.Phung

The Needleman Wunsch Algorithm uses Dynamic programming to align 2 sequences in the optimal way. Continue reading Optimal alignment – Needleman Wunsch Algorithm

Machine Learning - Data Mining, Text MiningLeave a comment

Basic Regular Expression

September 18, 2019July 31, 2020 Tung.M.Phung

The Regular expression (regex, or just re) is a means of representation, used for string matching and searching. Continue reading Basic Regular Expression

Intro to ML, Machine Learning - Data MiningLeave a comment

Different types of Machine Learning

September 16, 2019July 31, 2020 Tung.M.Phung

Human learning and machine learning are very similar, in the sense that both involve learning, the process of understanding and creating new knowledge. Continue reading Different types of Machine Learning

DataVisualization, Machine Learning - Data MiningLeave a comment

Charts to show relationships between (or among) variables

September 14, 2019July 31, 2020 Tung.M.Phung

This is my 4th post in a series on data visualization with charts for specific purposes. Continue reading Charts to show relationships between (or among) variables

DataVisualization, Machine Learning - Data MiningLeave a comment

Charts to show trends

September 14, 2019July 31, 2020 Tung.M.Phung

This is my 3rd post in a series on data visualization with charts for specific purposes. Continue reading Charts to show trends

DataVisualization, Machine Learning - Data MiningLeave a comment

Charts to show the distribution

September 12, 2019July 31, 2020 Tung.M.Phung

This is my 2nd post in a series on data visualization with charts for specific purposes. Continue reading Charts to show the distribution

DataVisualization, Machine Learning - Data MiningLeave a comment

Charts to show the proportion

September 11, 2019July 31, 2020 Tung.M.Phung

This is my first post in a series on data visualization with charts for specific purposes. Continue reading Charts to show the proportion

DataVisualization, Machine Learning - Data MiningLeave a comment

Basic plots with Matplotlib

September 7, 2019July 31, 2020 Tung.M.Phung

Matplotlib is undeniably the most prevalent name in the family of visualization libraries in Python. Continue reading Basic plots with Matplotlib

Feature Engineering, Machine Learning - Data MiningLeave a comment

Feature selection with sklearn

September 4, 2019July 31, 2020 Tung.M.Phung

Feature selection is hard but very important. Continue reading Feature selection with sklearn

Feature Engineering, Machine Learning - Data MiningLeave a comment

How to deal with missing values (NaNs)

August 31, 2019July 31, 2020 Tung.M.Phung

This blog post attempts to address why NaNs are bad and how we can fix them. Continue reading How to deal with missing values (NaNs)

Machine Learning - Data MiningLeave a comment

Data Mining – Machine Learning

August 30, 2019December 28, 2023 Tung.M.Phung

Getting started with Machine Learning
  What are Computer Science, Artificial Intelligence and Machine Learning?
  Different types of Machine Learning
  Machine Learning Applications
  Introduction to Python and Jupyter
  Numpy, Pandas, Scikit-learn and Matplotlib
Data Scraping
  Data scraping: City dataset from Versus.com
  Data scraping: Phone dataset from Versus.com
  Data scraping: KDnuggets.com’s post statistics
  Data Scraping: Android App Dataset from Google Play Store
Data Cleaning
  Data Cleaning case study: Google Play Store Dataset
Preparatory Phase
  Control Variable
  Splitting data into a Training set and a Validation set
  Imbalanced Learning: sampling techniques
Exploratory Data Analysis
  QQ plot versus PP plot versus Probability plot
  Multicollinearity …
Continue reading Data Mining – Machine Learning

Intro to ML, Machine Learning - Data MiningLeave a comment

What are Computer Science, Artificial Intelligence and Machine Learning?

August 30, 2019July 31, 2020 Tung.M.Phung

Many people get confused when hearing about Computer Science (CS), Artificial Intelligence (AI), and Machine Learning (ML). Continue reading What are Computer Science, Artificial Intelligence and Machine Learning?

Exploratory Data Analysis, Machine Learning - Data MiningLeave a comment

What is Multicollinearity (or Collinearity)?

August 28, 2019July 31, 2020 Tung.M.Phung

To better understand the definition of collinearity, let’s start with an example… Continue reading What is Multicollinearity (or Collinearity)?