I want to do a linear regression in R using the
lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm responses. I can imagine doing for loop for each state then doing the regression inside the loop and adding the results of each regression to a vector. That does not seem very R-like, however. In SAS I would do a 'by' statement and in SQL I would do a 'group by'. What's the R way of doing this?
Here's one way using the
library(lme4) d <- data.frame(state=rep(c('NY', 'CA'), c(10, 10)), year=rep(1:10, 2), response=c(rnorm(10), rnorm(10))) xyplot(response ~ year, groups=state, data=d, type='l') fits <- lmList(response ~ year | state, data=d) fits #------------ Call: lmList(formula = response ~ year | state, data = d) Coefficients: (Intercept) year CA -1.34420990 0.17139963 NY 0.00196176 -0.01852429 Degrees of freedom: 20 total; 16 residual Residual standard error: 0.8201316
Here's an approach using the plyr package:
d <- data.frame( state = rep(c('NY', 'CA'), 10), year = rep(1:10, 2), response= rnorm(20) ) library(plyr) # Break up d by state, then fit the specified model to each piece and # return a list models <- dlply(d, "state", function(df) lm(response ~ year, data = df)) # Apply coef to each model and return a data frame ldply(models, coef) # Print the summary of each model l_ply(models, summary, .print = TRUE)