2021-05-05

In Part 1 we talked about the basic concepts of causal effects and confounding. In this post we will proceed to discuss how to control for confounders with matching.

2021-04-15

Causal inference has become an active field in statistics, with great applications for observational data. In this post I will share some key concepts of causal inference:

- The confusion over causal inference
- The important causal assumptions
- The concept of causal effects
- Confounding and Directed Acyclic Graphs

2021-03-06

This post investigates five factors related to anaemia in children, using data collected by the World Health Organization. The method we will use is LASSO, a classic penalized regression. We will see how LASSO filters out variables for us, and how its prediction performance compares with our baseline model, linear regression.

To implement LASSO in R, I used the "glmnet" package.
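The post itself uses glmnet in R; as an illustrative sketch of the same idea, here is scikit-learn's `Lasso` in Python on synthetic stand-in data (the real post uses the WHO anaemia data, not included here):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data: only the first two of ten predictors truly matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# The L1 penalty shrinks irrelevant coefficients exactly to zero,
# which is how LASSO "filters out" variables for us.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)  # indices of the variables LASSO kept
```

With a suitable penalty strength (`alpha` here, `lambda` in glmnet), the noise variables drop out and only the true predictors survive.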

2021-02-07

Intuitively, a simple neural network is a combination of many (linear) transformations, which is similar to a mixture model in some ways. It can transform the input data in more sophisticated ways than a single linear model could achieve. The simple neural network is also the foundation for many more advanced neural network models, e.g., Recurrent Neural Networks and Long Short-Term Memory (LSTM). By the way, I posted an LSTM project here; please feel free to check it out if you are interested.

The content of this post includes:

- The basics of **feedforward neural networks**.
- Their application with the help of TensorFlow and Keras.
- Several useful parameter tunings.

The main reference is *Deep Learning* (Goodfellow et al.), Chapter 6.
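The "combination of transformations" intuition can be sketched in a few lines of NumPy (an illustrative forward pass with random placeholder weights, not the TensorFlow/Keras code the post develops):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Two affine transformations with a nonlinearity in between.
# Weights here are random placeholders; in practice they are learned.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 3 -> 4
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 4 -> 1

x = np.array([0.5, -1.0, 2.0])
hidden = relu(W1 @ x + b1)   # without the nonlinearity, the two maps
y_hat = W2 @ hidden + b2     # would collapse into one linear model
print(y_hat.shape)
```

The nonlinearity is what gives the network more expressive power than a single linear model: stacking purely linear maps would just produce another linear map.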

2021-02-06

Tree-based methods are conceptually easy to comprehend, and they offer advantages like easy visualization and minimal data preprocessing. They are powerful tools for both numeric and categorical prediction. In this post I will show how to predict baseball player salaries with Decision Trees and Random Forests, from coding the algorithm to using packages.
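As a minimal sketch of the package-usage side (using scikit-learn on synthetic stand-in data, since the post's baseball salary dataset is not reproduced here):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the salary data: two numeric features
# driving a continuous response, plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 2))
y = 100 * X[:, 0] + 50 * X[:, 1] ** 2 + rng.normal(scale=5, size=300)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# A random forest averages many trees, which typically fits
# better than a single shallow tree.
print(tree.score(X, y), forest.score(X, y))
```

The single tree is easy to visualize and interpret; the forest trades that interpretability for accuracy.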

2021-01-30

The expectation-maximization (EM) algorithm is a powerful unsupervised machine learning tool. Conceptually, it is quite similar to the k-means algorithm, which I shared in this post. However, instead of clustering through estimated means, it clusters by estimating distribution parameters and then evaluating how likely each observation is to belong to each distribution. Another difference is that EM uses soft assignment while k-means uses hard assignment.
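The soft-assignment idea can be shown with a deliberately simplified 1-D example (a two-component Gaussian mixture with fixed unit variances and equal weights, so only the means are estimated):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3, 1, 100), rng.normal(3, 1, 100)])

mu = np.array([-1.0, 1.0])  # initial guesses for the two means
for _ in range(30):
    # E-step: soft assignment -- each point gets a *probability* of
    # belonging to each component (k-means would pick just one).
    dens = np.exp(-0.5 * (data[:, None] - mu[None, :]) ** 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update each mean as a responsibility-weighted average.
    mu = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu)  # close to the true means (-3, 3)
```

Replacing the soft responsibilities with a hard 0/1 assignment to the nearest mean would turn this loop into k-means.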

2021-01-28

In this post I will share some frequently used ggplot2 commands for making data visualizations.

2021-01-23

The k-means algorithm is a well-known unsupervised machine learning algorithm. From *The Elements of Statistical Learning*...
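A bare-bones sketch of the standard Lloyd's iteration on 2-D toy data (illustrative only, not the full treatment in the post):

```python
import numpy as np

# Two well-separated toy clusters.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

centers = data[rng.choice(len(data), 2, replace=False)]
for _ in range(20):
    # Hard assignment: each point goes to its single nearest center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update each center to the mean of its assigned points.
    centers = np.array([data[labels == k].mean(axis=0) for k in range(2)])

print(centers)  # roughly the two cluster centers, (0, 0) and (5, 5)
```

Each iteration alternates between assigning points and recomputing means, decreasing the within-cluster sum of squares until it converges.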

2021-01-14

**GARCH** is a well-known model for capturing volatility in data, and it is useful for financial and time series data. This post will explain the model's structure, intuition, application, and evaluation.
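The core structure of GARCH(1,1) is a variance recursion: today's conditional variance depends on yesterday's squared shock and yesterday's variance, sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]. A small simulation sketch (with assumed toy parameters) shows the resulting volatility clustering:

```python
import numpy as np

omega, alpha, beta = 0.1, 0.1, 0.8  # assumed toy parameters (alpha + beta < 1)
rng = np.random.default_rng(0)

n = 1000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(1, n):
    # Variance recursion: large shocks yesterday raise variance today,
    # which is what produces volatility clustering.
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

print(eps.var())  # roughly the unconditional variance omega / (1 - alpha - beta)
```

Because alpha + beta is close to 1 here, high-volatility periods persist for a long time, which is the pattern GARCH is designed to capture in financial returns.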