Data Mining – Machine Learning

Data Mining and Machine Learning

Getting started with Machine Learning

What are Computer Science, Artificial Intelligence and Machine Learning?

Different types of Machine Learning

Machine Learning Applications

Introduction to Python and Jupyter

NumPy, Pandas, Scikit-learn and Matplotlib

Data Scraping

Data scraping: City dataset from Versus.com

Data scraping: Phone dataset from Versus.com

Data scraping: KDnuggets.com’s post statistics

Data scraping: Android App dataset from Google Play Store

Data Cleaning

Data Cleaning case study: Google Play Store Dataset

Preparatory Phase

Control Variable

Splitting data into a Training set and a Validation set

Imbalanced Learning: sampling techniques

Exploratory Data Analysis

QQ plot versus PP plot versus Probability plot

Multicollinearity (or Collinearity)

Does Causality imply Correlation?

Feature Engineering

How to deal with missing values (NaNs)

Feature Selection with sklearn

When to do feature centering, scaling and normalization?

How to convert Categorical Variables to Numerical Variables?

When to add a dummy variable?

Principal Component Analysis

Regression Models

Linear Regression

Introduction to Linear Regression

Types of Linear Regression

How to make a Linear Regressor? (theory)

Regularization

Linear Regression in Python

Assumptions

Confidence Intervals for Coefficients

Advantages

Disadvantages

Evaluation

Regression Objective and Evaluation Functions

Classification Models

Logistic Regression

An overview of Logistic Regression

Logistic Regression tutorial

Assumptions

Advantages and Disadvantages

Other algorithms

Naive Bayes classifier: a comprehensive guide

Information Gain, Gain Ratio and Gini Index

Ensemble: Bagging, Random Forest, Boosting and Stacking

Evaluation

How to read the Confusion Matrix

ROC curve and AUC: a comprehensive overview

Precision-Recall curve: an overview

Binary Classification Evaluation Summary

Deep Learning

Sigmoid, tanh activations and their loss of popularity

Rectified Linear Unit (ReLU)

ELU activation: A comprehensive analysis

Batch Normalization and why it works

LSTM: where to start?

Attention in Deep Learning, your starting point (with code)

The Transformer neural network architecture

Deep Learning normalization methods

Data Visualization

Basic plots with Matplotlib

Charts to show proportions

Charts to show distributions

Charts to show trends

Charts to show relationships between variables

Charts to compare different samples

Coronavirus cases: an interactive geo-map with Plotly

Storytelling

Text Mining

Basic regular expression

Optimal alignment – Needleman-Wunsch algorithm

Shift-AND algorithm for exact pattern matching

Bit-parallel algorithms for generalized string matching

Statistics

Hypothesis testing

Z-score, Z-statistic, Z-test, Z-distribution

Z-score on a sample set

T-statistic, T-test and the T-family

Paired Two-sample T-test (Dependent T-test)

Unpaired Two-sample T-test (Independent T-test)

Miscellaneous

Over-fitting and Under-fitting

Parametric vs Non-parametric algorithms

How Intelligent systems damage the data

Database

SQL – an introduction to basic SELECT queries

SQL – aggregate functions

SQL – combining data (UNION, JOIN, etc.)

SQL – list of PostgreSQL functions

SQL – window functions

SQL – create and alter tables

SQL – PostgreSQL indexing
