r - Fastest way to map a new data frame column based on two other columns -
i have data frame looks this:
id|value 01| 100 01| 101 01| 300 #edited case left out 02| 300 03| 100 03| 101 04| 100
and add new column looks @ both id , values assigned each id.
for example: if id has both value 100 , 101 add category a. if id has value of 300 add category b. if id has 1 value (either 100 or 101, not both) assign category c.
result:
id|value|category 01| 100 | 01| 101 | 01| 300 | b #edited case left out 02| 300 | b 03| 100 | 03| 101 | 04| 100 | c
i understand can loop through , assign category, question whether there faster vectorized way?
a couple of options data.table
we number of elements per 'id' '100', '101' , add them together. output 0, 1, or 2 corresponding none, single element, or both present. can converted factor
, change labels
'a' '2', 'b' '0' , 'c' '1'.
library(data.table) setdt(df2)[, indx:=sum(unique(value)==100)+sum(unique(value)==101), id][, category:=factor(indx, levels=c(2,0,1), labels=letters[1:3]) ][, indx:=null][] # id value category #1: 1 100 #2: 1 101 #3: 2 300 b #4: 3 100 #5: 3 101 #6: 4 100 c
or create named vector ('v1') , use index map character elements (tostring(...)
) grouped 'id'.
v1 <- c('100, 101' = 'a', '300'='b', '100'= 'c', '101'='c') setdt(df2)[, category := v1[tostring(sort(unique(value)))], by=id][] # id value category #1: 1 100 #2: 1 101 #3: 2 300 b #4: 3 100 #5: 3 101 #6: 4 100 c
update
based on new dataset , new condition, can modify first solution
setdt(df3)[, indx:= sum(unique(value)==100) + sum(unique(value)==101), id][, category:= factor(indx, levels=c(2,0,1), labels=letters[1:3])][ value==300, category:='b'][, indx:=null][] # id value category #1: 1 100 #2: 1 101 #3: 1 300 b #4: 2 300 b #5: 3 100 #6: 3 101 #7: 4 100 c
or using second option
v1 <- c('100, 101' = 'a', '100, 101, 300' = 'a', '300'='b', '100'= 'c', '101'='c') setdt(df3)[, category := v1[tostring(sort(unique(value)))], by=id][value==300, category := 'b'][] # id value category #1: 1 100 #2: 1 101 #3: 1 300 b #4: 2 300 b #5: 3 100 #6: 3 101 #7: 4 100 c
data
df2 <- structure(list(id = c(1l, 1l, 2l, 3l, 3l, 4l), value = c(100l, 101l, 300l, 100l, 101l, 100l)), .names = c("id", "value"), row.names = c(na, -6l), class = "data.frame") df3 <- structure(list(id = c(1l, 1l, 1l, 2l, 3l, 3l, 4l), value = c(100l, 101l, 300l, 300l, 100l, 101l, 100l)), .names = c("id", "value"), class = "data.frame", row.names = c(na, -7l))
Comments
Post a Comment