How to read data when some numbers contain commas as thousand separator?


I have a csv file where some of the numerical values are expressed as strings with commas as thousand separator, e.g. "1,513" instead of 1513. What is the simplest way to read the data into R?

I can use read.csv(..., colClasses="character"), but then I have to strip out the commas from the relevant elements before converting those columns to numeric, and I can't find a neat way to do that.

3/19/2019 2:46:44 AM

Accepted Answer

I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:

x <- read.csv("file.csv",header=TRUE,colClasses="character")
col2cvt <- 15:41
x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})
10/6/2009 3:15:28 AM

Not sure about how to have read.csv interpret it properly, but you can use gsub to replace "," with "", and then convert the string to numeric using as.numeric:

y <- c("1,200","20,000","100","12,111")
as.numeric(gsub(",", "", y))
# [1]  1200 20000 100 12111

This was also answered previously on R-Help (and in Q2 here).

Alternatively, you can pre-process the file, for instance with sed in unix.

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow