Causal Inference 2: Propensity Score and Matching

2021-05-05

In Part 1 we covered the basic concepts of causal effects and confounding. In this post we will proceed to discuss how to control for confounders with matching.

Read more

Causal Inference 1: Causal Effects and Confounding

2021-04-15

Causal inference is an active field in statistics, with important applications to observational data. In this post I will share some key concepts of causal inference:

  • The confusion over causal inference
  • The important causal assumptions
  • The concept of causal effects
  • Confounding and Directed Acyclic Graphs
Read more

Prediction of Children Anaemia Rate by LASSO

2021-03-06

This post investigates five factors related to anaemia in children, using data collected from the World Health Organization. The method we will use is LASSO, a classic penalized regression. We will see how LASSO filters out variables for us and how its prediction performance compares with that of our baseline model, linear regression.
To implement LASSO in R, I used the package "glmnet".

Read more

Classification of Image Data by Simple NN

2021-02-07

Intuitively, a simple neural network is a composition of many (linear) transformations, which is similar to a mixture model in some ways. It transforms the input data in a more sophisticated way than a single linear model could achieve. The simple neural network is the foundation for many more advanced neural network models, e.g., the Recurrent Neural Network and Long Short-Term Memory (LSTM). By the way, I posted an LSTM project here; please feel free to check it out if you are interested.

The content of this post includes:

  1. The basics of the feedforward neural network.
  2. Its application with the help of TensorFlow and Keras.
  3. Several useful parameter-tuning techniques.

The main reference is Deep Learning, Goodfellow et al., Chapter 6.
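
To make the point about stacked transformations concrete, here is a minimal sketch in plain Python (not the TensorFlow/Keras code from the post). The weights are hand-picked for illustration rather than learned, and the helper names `linear`, `relu` and `forward` are my own. It shows a two-layer network with a ReLU in between computing XOR, something a single linear model cannot do.

```python
def linear(weights, bias, inputs):
    """Apply one affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


# Hand-picked (not trained) parameters, assumed purely for illustration.
W1 = [[1.0, 1.0], [1.0, 1.0]]   # hidden layer weights (2 units)
b1 = [0.0, -1.0]                # hidden layer biases
W2 = [[1.0, -2.0]]              # output layer weights (1 unit)
b2 = [0.0]                      # output layer bias


def forward(x):
    """Forward pass: affine -> ReLU -> affine."""
    hidden = relu(linear(W1, b1, x))
    return linear(W2, b2, hidden)[0]


for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, forward(x))
```

Without the ReLU between the two layers, the network would collapse into a single linear map, which is why the nonlinearity matters.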

Read more

Regression Tree, Random Forest and XGBoost Algorithm

2021-02-06

Tree-based methods are conceptually easy to comprehend, and they offer advantages such as easy visualization and simple data preprocessing. They are a powerful tool for both numeric and categorical prediction. In this post I will show how to predict baseball player salaries with a Decision Tree and a Random Forest, from coding the algorithm to using packages.

Read more

The EM Algorithm from Scratch

2021-01-30

The expectation-maximization (EM) algorithm is a powerful unsupervised
machine learning tool. Conceptually, it is quite similar to the k-means
algorithm, which I shared in this post.
However, instead of clustering through estimated means, it clusters
by estimating the parameters of the distributions and then evaluating how
likely each observation is to belong to each distribution. Another difference
is that EM uses soft assignment while k-means uses hard assignment.
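
The difference between soft and hard assignment can be sketched in a few lines of plain Python. This is only an illustration under assumed values: a one-dimensional mixture of two Gaussians whose weights, means and standard deviations are made up, and the function names `soft_assign` and `hard_assign` are my own, not taken from the post.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_assign(x, weights, means, stds):
    """EM-style assignment: posterior probability of each component given x."""
    joint = [w * gaussian_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assign(x, means):
    """k-means-style assignment: index of the nearest mean."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


# Made-up mixture parameters, assumed purely for illustration.
weights = [0.5, 0.5]
means = [0.0, 4.0]
stds = [1.0, 1.0]

for x in (0.5, 2.0, 3.5):
    print(x, soft_assign(x, weights, means, stds), hard_assign(x, means))
```

A point halfway between the two means gets equal probability for both components under soft assignment, whereas hard assignment has to commit it to a single cluster.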

Read more

Data Visualization with ggplot2

2021-01-28

In this post I will share some frequently used ggplot2 commands for
making data visualizations.

Read more

K-means Clustering Algorithm from Scratch

2021-01-23
The k-means algorithm is a well-known unsupervised machine learning algorithm. From The elements of Statistical Learnin...
Read more

Modelling Daily Dow Jones Industrial Average by GARCH

2021-01-14

GARCH is a well-known model for capturing volatility in data. It can be useful for financial data and other time series. This post explains the model structure, intuition, application and evaluation.

Read more