Merging columns within a dataframe for a variable number of columns in R
I have the data frame below. I want to merge columns v2 onwards (preferably with a comma between the values) and exclude NAs from the merged column. There is a variable number of NAs in each row.

v1                        v2    v3      v4        v5       v6       v7
chr11:69464719-69502928   ccnd1 oraov1  NA        NA       NA       NA
chr7:55075808-55093954    egfr  NA      NA        NA       NA       NA
chr3:169389459-169490555  terc  arpm1   NA        NA       NA       NA
chr1:150496857-150678056  ensa  mcl1    adamtsl4  golph3l  hormad1  mir4257

The desired result is:

v1                        v2
chr11:69464719-69502928   ccnd1,oraov1
chr7:55075808-55093954    egfr
chr3:169389459-169490555  terc,arpm1
chr1:150496857-150678056  ensa,mcl1,adamtsl4,golph3l,hormad1,mir4257

I know how to concatenate a fixed number of columns, but the variable number of columns and the exclusion of NA has thrown me.
We can loop over the rows using apply with MARGIN=1 (excluding the 1st column) and paste the non-NA elements together (toString is a wrapper for paste(., collapse=', ')).

    # For each row (MARGIN = 1) of all columns except the first,
    # drop the NAs and join what is left with ", "
    v2 <- apply(df1[-1], 1, function(x) toString(x[!is.na(x)]))
    res <- data.frame(v1=df1[,1], v2, stringsAsFactors=FALSE)
    res
    #                        v1                                                 v2
    #1  chr11:69464719-69502928                                      ccnd1, oraov1
    #2   chr7:55075808-55093954                                               egfr
    #3 chr3:169389459-169490555                                        terc, arpm1
    #4 chr1:150496857-150678056 ensa, mcl1, adamtsl4, golph3l, hormad1, mir4257

Or, using melt from data.table, convert the dataset to long form, group by 'v1', and paste the elements of the 'value' column. Initially, convert the 'data.frame' to a 'data.table' using setDT.

    library(data.table)
    # Long form with the NA rows dropped, then one comma-separated string per v1
    melt(setDT(df1), id.var='v1', na.rm=TRUE)[, list(v2=toString(value)), v1]
    #                         v1                                                 v2
    #1:  chr11:69464719-69502928                                      ccnd1, oraov1
    #2:   chr7:55075808-55093954                                               egfr
    #3: chr3:169389459-169490555                                        terc, arpm1
    #4: chr1:150496857-150678056 ensa, mcl1, adamtsl4, golph3l, hormad1, mir4257

data

    df1 <- structure(list(
      v1 = c("chr11:69464719-69502928", "chr7:55075808-55093954",
             "chr3:169389459-169490555", "chr1:150496857-150678056"),
      v2 = c("ccnd1", "egfr", "terc", "ensa"),
      v3 = c("oraov1", NA, "arpm1", "mcl1"),
      v4 = c(NA, NA, NA, "adamtsl4"),
      v5 = c(NA, NA, NA, "golph3l"),
      v6 = c(NA, NA, NA, "hormad1"),
      v7 = c(NA, NA, NA, "mir4257")),
      .names = c("v1", "v2", "v3", "v4", "v5", "v6", "v7"),
      class = "data.frame", row.names = c(NA, -4L))
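
Note that toString separates values with ", " (comma plus space), whereas the desired result in the question has a bare comma. If that exact format matters, one possible variation is to call paste with collapse="," directly. The following is only a sketch in base R: the helper name collapse_non_na and its sep argument are made up for illustration and are not part of the answer above; the data are the same as df1.

```r
# Same example data as df1 above, rebuilt here so the snippet runs on its own.
df1 <- data.frame(
  v1 = c("chr11:69464719-69502928", "chr7:55075808-55093954",
         "chr3:169389459-169490555", "chr1:150496857-150678056"),
  v2 = c("ccnd1", "egfr", "terc", "ensa"),
  v3 = c("oraov1", NA, "arpm1", "mcl1"),
  v4 = c(NA, NA, NA, "adamtsl4"),
  v5 = c(NA, NA, NA, "golph3l"),
  v6 = c(NA, NA, NA, "hormad1"),
  v7 = c(NA, NA, NA, "mir4257"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the answer):
# drop the NAs from one row and join the remaining values with `sep`.
collapse_non_na <- function(x, sep = ",") {
  paste(x[!is.na(x)], collapse = sep)
}

# Apply the helper to every row of the columns after the first.
merged <- apply(df1[-1], 1, collapse_non_na)

# unname() so that any names carried over by apply() do not leak into the result.
res <- data.frame(v1 = df1$v1, v2 = unname(merged), stringsAsFactors = FALSE)
res$v2
```

With this data, res$v2 holds "ccnd1,oraov1", "egfr", "terc,arpm1" and "ensa,mcl1,adamtsl4,golph3l,hormad1,mir4257". A row in which every column from v2 onwards is NA would come out as an empty string rather than NA, which may or may not be what you want.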