R

Survival Analysis 4: Cox proportional hazards model

2023-08-30

This post briefly covers the derivation, estimation, assumptions and application of the Cox proportional hazards (PH) model. It also touches on using ANOVA to test two nested models.

Read more

Survival Analysis 3: Non-Parametric Comparison of Survival Functions

2023-07-27

This post shares two common non-parametric tests for comparing survival functions, the Log-Rank Test and the Generalized Wilcoxon Test, along with their detailed step-by-step calculations.

Read more

Survival Analysis 2: Non-Parametric Estimation of Survival Functions

2023-07-21

Concepts of survival function estimation, with the corresponding calculations done both manually and in R.

Read more

Causal Inference 4: Instrumental variable

2021-06-03

The instrumental variable (IV) method is an alternative causal inference approach that does not rely on the ignorability assumption.

Read more

Causal Inference 3: Inverse probability of treatment weighting, IPTW

2021-05-28

In this post we continue discussing the estimation of causal effects. We will talk about the intuition behind IPTW and some key concepts such as weighting and marginal structural models, and end with a data example in R.

Read more

Causal Inference 2: Propensity Score and Matching

2021-05-05

In Part 1 we talked about the basic concepts of causal effects and confounding. In this post we will proceed to discuss how to control for confounders with matching.

Read more

Causal Inference 1: Causal Effects and Confounding

2021-04-15

Causal inference is a hot topic in statistics, and it is especially useful for observational data. In this post I will share some key concepts of causal inference:

  • The confusion over causal inference
  • The important causal assumptions
  • The concept of causal effects
  • Confounding and Directed Acyclic Graphs
Read more

Prediction of Children Anaemia Rate by LASSO

2021-03-06

This post investigates five factors related to anaemia in children, using data collected from the World Health Organization. The method we will use is LASSO, a classic penalized regression. We will see how LASSO filters out variables for us and how its prediction performance compares with our baseline model, linear regression.
To implement LASSO in R, I used the "glmnet" package.

Read more

Regression Tree, Random Forest and XGBoost Algorithm

2021-02-06

Tree-based methods are conceptually easy to comprehend and offer advantages such as easy visualization and data preprocessing. They are powerful tools for both numeric and categorical prediction. In this post I will show how to predict baseball player salaries with a Decision Tree and a Random Forest, from coding the algorithm to using packages.

Read more

The EM Algorithm from Scratch

2021-01-30

The expectation-maximization (EM) algorithm is a powerful unsupervised
machine learning tool. Conceptually, it is quite similar to the k-means
algorithm, which I shared in this post.
However, instead of clustering through estimated means, it clusters
by estimating the parameters of the distributions and then evaluating how
likely each observation is to belong to each distribution. Another difference
is that EM uses soft assignment while k-means uses hard assignment.

Read more

Data Visualization with ggplot2

2021-01-28

In this post I will share some frequently used ggplot2 commands for
making data visualizations.

Read more

K-means Clustering Algorithm from Scratch

2021-01-23
The k-means algorithm is a well-known unsupervised machine learning algorithm. From The elements of Statistical Learnin...
Read more

Modelling Daily Dow Jones Industrial Average by GARCH

2021-01-14

GARCH is a well-known model for capturing volatility in data. It can be useful for financial or other time series data. This blog will explain the model structure, intuition, application and evaluation.

Read more

Evaluate Wine by LSTM and Simple NN

2021-01-10

This project focuses on one question: is it possible to let a machine evaluate a wine like a sommelier?
The answer is yes! With the help of a simple neural network and long short-term memory (LSTM), we can make it possible.

Read more