Introduction - basic fitting and plotting of parametric survival models with the survival package
survival is the most commonly used package for survival analysis in R. Using the built-in
lung dataset we can get started with Survival Analysis by fitting a regression model with the
survreg() function, creating a curve with
survfit(), and plotting predicted survival curves by calling the
predict method for this package with new data.
In the example below we plot 2 predicted curves and vary
sex between the 2 sets of new data, to visualize its effect:
require(survival) s <- with(lung,Surv(time,status)) sWei <- survreg(s ~ as.factor(sex)+age+ph.ecog+wt.loss+ph.karno,dist='weibull',data=lung) fitKM <- survfit(s ~ sex,data=lung) plot(fitKM) lines(predict(sWei, newdata = list(sex = 1, age = 1, ph.ecog = 1, ph.karno = 90, wt.loss = 2), type = "quantile", p = seq(.01, .99, by = .01)), seq(.99, .01, by =-.01), col = "blue") lines(predict(sWei, newdata = list(sex = 2, age = 1, ph.ecog = 1, ph.karno = 90, wt.loss = 2), type = "quantile", p = seq(.01, .99, by = .01)), seq(.99, .01, by =-.01), col = "red")
Kaplan Meier estimates of survival curves and risk set tables with survminer
install.packages('survminer') source("https://bioconductor.org/biocLite.R") biocLite("RTCGA.clinical") # data for examples library(RTCGA.clinical) survivalTCGA(BRCA.clinical, OV.clinical, extract.cols = "admin.disease_code") -> BRCAOV.survInfo library(survival) fit <- survfit(Surv(times, patient.vital_status) ~ admin.disease_code, data = BRCAOV.survInfo) library(survminer) ggsurvplot(fit, risk.table = TRUE)
ggsurvplot( fit, # survfit object with calculated statistics. risk.table = TRUE, # show risk table. pval = TRUE, # show p-value of log-rank test. conf.int = TRUE, # show confidence intervals for # point estimaes of survival curves. xlim = c(0,2000), # present narrower X axis, but not affect # survival estimates. break.time.by = 500, # break X axis in time intervals by 500. ggtheme = theme_RTCGA(), # customize plot and risk table with a theme. risk.table.y.text.col = T, # colour risk table text annotations. risk.table.y.text = FALSE # show bars instead of names in text annotations # in legend of risk table )
Random Forest Survival Analysis with randomForestSRC
Just as the random forest algorithm may be applied to regression and classification tasks, it can also be extended to survival analysis.
In the example below a survival model is fit and used for prediction, scoring, and performance analysis using the package
randomForestSRC from CRAN.
require(randomForestSRC) set.seed(130948) #Other seeds give similar comparative results x1 <- runif(1000) y <- rnorm(1000, mean = x1, sd = .3) data <- data.frame(x1 = x1, y = y) head(data)
x1 y 1 0.9604353 1.3549648 2 0.3771234 0.2961592 3 0.7844242 0.6942191 4 0.9860443 1.5348900 5 0.1942237 0.4629535 6 0.7442532 -0.0672639
(modRFSRC <- rfsrc(y ~ x1, data = data, ntree=500, nodesize = 5))
Sample size: 1000 Number of trees: 500 Minimum terminal node size: 5 Average no. of terminal nodes: 208.258 No. of variables tried at each split: 1 Total no. of variables: 1 Analysis: RF-R Family: regr Splitting rule: mse % variance explained: 32.08 Error rate: 0.11
x1new <- runif(10000) ynew <- rnorm(10000, mean = x1new, sd = .3) newdata <- data.frame(x1 = x1new, y = ynew) survival.results <- predict(modRFSRC, newdata = newdata) survival.results
Sample size of test (predict) data: 10000 Number of grow trees: 500 Average no. of grow terminal nodes: 208.258 Total no. of grow variables: 1 Analysis: RF-R Family: regr % variance explained: 34.97 Test set error rate: 0.11