In Part 1 we covered the basic concepts of causal effects and confounding. In this post we move on to how to control for confounders with matching.
Causal inference is an active field in statistics, with important applications to observational data. In this post I will share some key concepts of causal inference:
This post investigates five factors related to anaemia in children, using data collected from the World Health Organization. The method we will use is LASSO, a classic penalized regression. We will see how LASSO filters out variables for us and how its prediction performance compares with our baseline model, linear regression.
To implement LASSO in R, I used the "glmnet" package.
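One way to see how LASSO can filter out variables is the soft-thresholding operator, which gives the LASSO solution in the special case of an orthonormal design. The sketch below is a minimal illustration in plain Python, not the glmnet workflow used in the post; the function name `soft_threshold`, the coefficient values, and the penalty `lam` are all made up for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients for five predictors (illustrative only).
ols_coefs = [2.5, -0.3, 0.8, -1.9, 0.1]
lam = 0.5  # hypothetical penalty strength

# Under an orthonormal design, each LASSO coefficient is the
# soft-thresholded least-squares coefficient.
lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]

# Predictors whose coefficient survived the penalty.
selected = [j for j, b in enumerate(lasso_coefs) if b != 0.0]

print(lasso_coefs)
print(selected)
```

Coefficients smaller in magnitude than the penalty are set exactly to zero, which is why LASSO acts as a variable selector while ordinary linear regression keeps every predictor.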
Intuitively, a simple neural network is a combination of many (linear) transformations, which is in some ways similar to a mixture model. It can transform the input data in more sophisticated ways than a single linear model can. The simple neural network is the foundation for many more advanced neural network models, e.g., the Recurrent Neural Network and Long Short-Term Memory (LSTM). By the way, I posted an LSTM project here; please feel free to check it out if you are interested.
The content of this post includes:
The main reference is Deep Learning (Goodfellow et al.), Chapter 6.
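To make the idea of stacking transformations concrete, here is a minimal sketch of a forward pass through a two-layer network in plain Python. The weights are picked by hand (they are a well-known solution to the XOR problem, of the kind worked through in that chapter), not learned, and the helper names `affine`, `relu`, and `forward` are assumptions for this example. Note that a nonlinearity (here ReLU) sits between the two linear maps; without it, the stack would collapse into a single linear model.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def affine(x, weights, bias):
    """Compute x @ weights + bias, where weights[i][j] links input i to output j."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


# Hand-picked parameters (illustrative, not trained).
W1 = [[1.0, 1.0], [1.0, 1.0]]  # input -> hidden
b1 = [0.0, -1.0]
W2 = [[1.0], [-2.0]]           # hidden -> output
b2 = [0.0]


def forward(x):
    """Linear map, ReLU, then a second linear map."""
    hidden = relu(affine(x, W1, b1))
    return affine(hidden, W2, b2)[0]


inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
outputs = [forward(x) for x in inputs]
print(outputs)
```

The outputs reproduce XOR, a function that no single linear model of the two inputs can represent.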
Tree-based methods are conceptually easy to understand and offer advantages such as easy visualization and simple data preprocessing. They are powerful tools for both numeric and categorical prediction. In this post I will show how to predict baseball player salaries with a Decision Tree and a Random Forest, from coding the algorithm to using packages.
The expectation-maximization (EM) algorithm is a powerful unsupervised
machine learning tool. Conceptually, it is quite similar to the k-means
algorithm, which I shared in this post.
However, instead of clustering through estimated means, it clusters
by estimating the parameters of the distributions and then evaluating how
likely each observation is to belong to each distribution. Another difference
is that EM uses soft assignment while k-means uses hard assignment.
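The difference between soft and hard assignment can be sketched in a few lines of plain Python. This shows only the assignment step for a single observation under a two-component, one-dimensional Gaussian mixture, not a full EM loop; the mixture weights, means, standard deviations, and the function names are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def soft_assign(x, weights, means, sds):
    """EM-style E-step: posterior probability that x came from each component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assign(x, means):
    """k-means-style assignment: index of the nearest mean."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


# Hypothetical mixture parameters (illustrative only).
weights = [0.5, 0.5]
means = [0.0, 4.0]
sds = [1.0, 1.0]

x = 2.5
responsibilities = soft_assign(x, weights, means, sds)
label = hard_assign(x, means)

print(responsibilities)
print(label)
```

The hard assignment commits the observation entirely to the second cluster, while the soft assignment keeps a share of roughly 12% for the first component. In a full EM run, those shares would weight the parameter updates in the M-step.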
In this post I will share some ggplot2 commands that are frequently used
when making data visualizations.
GARCH is a well-known model for capturing volatility in data. It is useful for financial and other time series data. This post explains the model structure, intuition, application, and evaluation.