This post briefly covers the derivation, estimation, assumptions, and application of the Cox proportional hazards (PH) model. It also mentions using ANOVA to compare two nested models.
This post shares two common non-parametric tests for comparing survival functions, the log-rank test and the generalized Wilcoxon test, and walks through their calculations in detail.
Concepts of survival function estimation and the corresponding calculations, both by hand and in R.
Instrumental variables (IV) is an alternative causal inference method that does not rely on the ignorability assumption.
In this post we continue discussing the estimation of causal effects. We will talk about the intuition behind IPTW (inverse probability of treatment weighting) and some key definitions such as weighting and marginal structural models. At the end we will show a data example in R.
In Part 1 we talked about the basic concepts of causal effects and confounding. In this post we proceed to discuss how to control for confounders with matching.
Causal inference is an active field in statistics with important applications to observational data. In this post I will share some key concepts of causal inference:
This post investigates five factors related to anaemia in children, using data collected from the World Health Organization. The method we will use is LASSO, a classic penalized regression. In this post we will see how LASSO filters out variables for us and how its prediction performance compares with our baseline model, linear regression.
To implement LASSO in R, I used the "glmnet" package.
Tree-based methods are conceptually easy to understand and offer advantages such as easy visualization and simple data preprocessing. They are a powerful tool for both numeric and categorical prediction. In this post I will show how to predict baseball player salaries with a decision tree and a random forest, from coding the algorithm to using packages.
The expectation-maximization (EM) algorithm is a powerful unsupervised
machine learning tool. Conceptually, it is quite similar to the k-means
algorithm, which I shared in this post.
However, instead of clustering through estimated means, it clusters
by estimating the parameters of the distributions and then evaluating how
likely each observation is to belong to each distribution. Another difference
is that EM uses soft assignment while k-means uses hard assignment.
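To make the soft versus hard assignment contrast concrete, here is a minimal Python sketch. It assumes a one-dimensional mixture of two Gaussians with made-up parameters (the means, standard deviations, and mixing weights below are illustrative only), and the helper names `soft_assign` and `hard_assign` are hypothetical. It shows only the assignment step, not a full EM or k-means fit.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_assign(x, means, stds, weights):
    """EM-style soft assignment: probability that x belongs to each component."""
    dens = [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, stds, weights)]
    total = sum(dens)
    return [d / total for d in dens]


def hard_assign(x, means):
    """k-means-style hard assignment: index of the nearest mean."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


# Illustrative (assumed) parameters for two clusters.
means = [0.0, 4.0]
stds = [1.0, 1.0]
weights = [0.5, 0.5]

x = 1.5
resp = soft_assign(x, means, stds, weights)  # one probability per cluster
label = hard_assign(x, means)                # a single cluster index

print("soft assignment:", [round(r, 4) for r in resp])
print("hard assignment:", label)
```

For the point `x = 1.5`, the hard assignment commits fully to the first cluster, while the soft assignment keeps a share of probability on the second cluster as well. In a full EM fit these probabilities would then weight the parameter updates in the M-step.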
In this post I will share some frequently used ggplot2 commands for
making data visualizations.
GARCH is a well-known model for capturing volatility in data. It can be useful for financial or other time series data. This blog will explain the model structure, intuition, application, and evaluation.
This project focuses on one question: is it possible to let a machine evaluate a wine like a sommelier?
The answer is yes! With the help of a simple neural network and long short-term memory (LSTM), we can make it possible.