Merging columns within a data frame for a variable number of columns in R


I have the data frame below. I would like to merge the columns from v2 onwards (preferably with a comma between the values) and exclude the NAs in the merged column. There is a variable number of columns with NAs in each row.

    v1                         v2     v3       v4         v5        v6        v7
    chr11:69464719-69502928    ccnd1  oraov1   NA         NA        NA        NA
    chr7:55075808-55093954     egfr   NA       NA         NA        NA        NA
    chr3:169389459-169490555   terc   arpm1    NA         NA        NA        NA
    chr1:150496857-150678056   ensa   mcl1     adamtsl4   golph3l   hormad1   mir4257

The desired result is:

    v1                         v2
    chr11:69464719-69502928    ccnd1,oraov1
    chr7:55075808-55093954     egfr
    chr3:169389459-169490555   terc,arpm1
    chr1:150496857-150678056   ensa,mcl1,adamtsl4,golph3l,hormad1,mir4257

I know how to concatenate a fixed number of columns, but the variable number of columns and the exclusion of NAs have thrown me.

We can loop over the rows using apply with MARGIN=1 (excluding the 1st column) and paste the non-NA elements together (toString is a wrapper for paste(., collapse=', ')).

    # For each row (all columns except the first), keep the non-NA values
    # and join them with ", "
    v2 <- apply(df1[-1], 1, function(x) toString(x[!is.na(x)]))
    res <- data.frame(v1=df1[,1], v2, stringsAsFactors=FALSE)
    res
    #                        v1                                              v2
    #1  chr11:69464719-69502928                                   ccnd1, oraov1
    #2   chr7:55075808-55093954                                            egfr
    #3 chr3:169389459-169490555                                     terc, arpm1
    #4 chr1:150496857-150678056 ensa, mcl1, adamtsl4, golph3l, hormad1, mir4257
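
The desired output in the question joins values with a bare comma, whereas toString inserts a comma followed by a space. A minimal sketch of the same row-wise idea using paste(collapse=",") directly is given here; the small data frame df and the helper name collapse_row are made up for illustration and are not part of the original answer:

```r
# Hypothetical stand-in data with the same shape as df1:
# an id column followed by a variable number of non-NA values per row
df <- data.frame(
  v1 = c("a", "b"),
  v2 = c("x", "y"),
  v3 = c("z", NA),
  stringsAsFactors = FALSE
)

# Drop the NAs from one row, then join what is left with "," (no space)
collapse_row <- function(x) paste(x[!is.na(x)], collapse = ",")

# Apply row-wise to every column except the first
merged <- unname(apply(df[-1], 1, collapse_row))
res <- data.frame(v1 = df$v1, v2 = merged, stringsAsFactors = FALSE)
print(res)
```

A row in which every value is NA would come out as an empty string with this approach.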

Or, using melt from data.table, convert the dataset to long form, group by 'v1', and paste the elements of the 'value' column. Initially, convert the 'data.frame' to a 'data.table' using setDT.

    library(data.table)
    # Reshape to long form (dropping NAs), then join the values within each 'v1' group
    melt(setDT(df1), id.vars='v1', na.rm=TRUE)[, list(v2=toString(value)), v1]
    #                         v1                                              v2
    #1:  chr11:69464719-69502928                                   ccnd1, oraov1
    #2:   chr7:55075808-55093954                                            egfr
    #3: chr3:169389459-169490555                                     terc, arpm1
    #4: chr1:150496857-150678056 ensa, mcl1, adamtsl4, golph3l, hormad1, mir4257
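
To make the intermediate long form easier to see, the one-liner can be split into two steps. This is only a sketch on a small made-up table (dt, long and merged are illustrative names, not from the original answer), and it assumes the data.table package is installed:

```r
library(data.table)

# Hypothetical stand-in data: an id column plus values padded with NA
dt <- data.table(
  v1 = c("a", "b"),
  v2 = c("x", "y"),
  v3 = c("z", NA)
)

# Step 1: long form, one row per non-NA cell
# (columns: v1, variable, value)
long <- melt(dt, id.vars = "v1", na.rm = TRUE)
print(long)

# Step 2: collapse the 'value' column within each 'v1' group
merged <- long[, list(v2 = toString(value)), by = v1]
print(merged)
```

Because na.rm=TRUE removes NA cells before grouping, a row whose values are all NA would have no rows left in the long form and so would be absent from the result, rather than appearing with an empty string.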

Data

    df1 <- structure(list(
        v1 = c("chr11:69464719-69502928", "chr7:55075808-55093954",
               "chr3:169389459-169490555", "chr1:150496857-150678056"),
        v2 = c("ccnd1", "egfr", "terc", "ensa"),
        v3 = c("oraov1", NA, "arpm1", "mcl1"),
        v4 = c(NA, NA, NA, "adamtsl4"),
        v5 = c(NA, NA, NA, "golph3l"),
        v6 = c(NA, NA, NA, "hormad1"),
        v7 = c(NA, NA, NA, "mir4257")),
        .Names = c("v1", "v2", "v3", "v4", "v5", "v6", "v7"),
        class = "data.frame", row.names = c(NA, -4L))
