r - How do I apply a function to row subsets of a data.table where each call returns a data.table


Here's a data.table:

    library(data.table)

    dt <- data.table(group = c("a","a","a","b","b","b"),
                     x = c(1,3,5,1,3,5),
                     y = c(3,5,8,2,8,9))
    dt
       group x y
    1:     a 1 3
    2:     a 3 5
    3:     a 5 8
    4:     b 1 2
    5:     b 3 8
    6:     b 5 9

And here's a function that operates on a data.table and returns a data.table:

    myfunc <- function(dt){
      # hyman spline interpolation (which preserves monotonicity)
      newdt <- data.table(x = seq(min(dt$x), max(dt$x)))
      newdt$y <- spline(x = dt$x, y = dt$y, xout = newdt$x, method = "hyman")$y
      return(newdt)
    }

How do I apply myfunc to each subset of dt defined by the "group" column? In other words, I want an efficient, generalized way to do this:

    result <- rbind(myfunc(dt[group=="a"]), myfunc(dt[group=="b"]))
    result
        x     y
     1: 1 3.000
     2: 2 3.875
     3: 3 5.000
     4: 4 6.375
     5: 5 8.000
     6: 1 2.000
     7: 2 5.688
     8: 3 8.000
     9: 4 8.875
    10: 5 9.000

Edit: I've updated the sample dataset and myfunc because I think the earlier versions were too simplistic and invited work-arounds to the actual problem I'm trying to solve.

The whole idea of data.table is to be both memory efficient and fast. You should almost never use $ within the data.table scope (only in rare situations), and you shouldn't create data.table objects within the data.table's environment (currently, .SD has an overhead).
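For comparison, the question's original function (data.table in, data.table out) can still be applied per group by handing it `.SD`, at the cost of the overhead just mentioned. This is only a minimal sketch: the name `myfunc_dt` is a hypothetical rename of the question's `myfunc`, used here to keep it apart from the vector-based version.

```r
library(data.table)

dt <- data.table(group = c("a", "a", "a", "b", "b", "b"),
                 x = c(1, 3, 5, 1, 3, 5),
                 y = c(3, 5, 8, 2, 8, 9))

# The question's function under a hypothetical name (myfunc_dt):
# takes a data.table, returns a new data.table
myfunc_dt <- function(d) {
  newdt <- data.table(x = seq(min(d$x), max(d$x)))
  newdt$y <- spline(x = d$x, y = d$y, xout = newdt$x, method = "hyman")$y
  newdt
}

# .SD holds the current group's rows (every column except `group`)
res_sd <- dt[, myfunc_dt(.SD), by = group]
res_sd
```

This keeps the function unchanged, but it builds a data.table inside every group, which is the pattern this answer recommends avoiding when speed matters.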

In this case you can take advantage of data.table's non-standard evaluation capabilities and define your function as follows:

    myfunc <- function(x, y){
      temp = seq(min(x), max(x))
      y = spline(x = x, y = y, xout = temp, method = "hyman")$y
      list(x = temp, y = y)
    }
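Because this version takes plain vectors and returns a named list, it can be tried outside any data.table. A small sketch, using the group "a" values from the sample data as input:

```r
# Vector-in, list-out version of the function
myfunc <- function(x, y) {
  temp <- seq(min(x), max(x))
  y <- spline(x = x, y = y, xout = temp, method = "hyman")$y
  list(x = temp, y = y)
}

# Call it directly on the group "a" values
out <- myfunc(x = c(1, 3, 5), y = c(3, 5, 8))
str(out)
```

Inside `dt[...]`, each element of the returned list becomes a column of the result, so nothing needs to be wrapped in a data.table.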

Then the implementation within the dt scope is straightforward:

    dt[, myfunc(x, y), by = group]
    #     group x      y
    #  1:     a 1 3.0000
    #  2:     a 2 3.8750
    #  3:     a 3 5.0000
    #  4:     a 4 6.3750
    #  5:     a 5 8.0000
    #  6:     b 1 2.0000
    #  7:     b 2 5.6875
    #  8:     b 3 8.0000
    #  9:     b 4 8.8750
    # 10:     b 5 9.0000
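One way to check that the grouped call matches the manual per-group `rbind` from the question is to build both and compare them. This is a sketch under the sample data above; `grouped` and `manual` are illustrative names, not part of the original post.

```r
library(data.table)

dt <- data.table(group = c("a", "a", "a", "b", "b", "b"),
                 x = c(1, 3, 5, 1, 3, 5),
                 y = c(3, 5, 8, 2, 8, 9))

# Vector-in, list-out version of the function
myfunc <- function(x, y) {
  temp <- seq(min(x), max(x))
  y <- spline(x = x, y = y, xout = temp, method = "hyman")$y
  list(x = temp, y = y)
}

# One grouped call
grouped <- dt[, myfunc(x, y), by = group]

# One call per group, stacked by hand
manual <- rbind(dt[group == "a", myfunc(x, y)],
                dt[group == "b", myfunc(x, y)])

# Compare the x and y columns of both results
all.equal(grouped[, .(x, y)], manual)
```

The grouped form scales to any number of groups without listing them by name, which is the generalization the question asks for.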
