haskell - What are the space complexities of inits and tails?
tl;dr

After reading the passage on persistence in Okasaki's Purely Functional Data Structures, and going over the illustrative examples about singly linked lists (which is how Haskell's lists are implemented), I was left wondering about the space complexities of Data.List's inits and tails...
It seems to me that

- the space complexity of tails is linear in the length of its argument, and
- the space complexity of inits is quadratic in the length of its argument,

but a simple benchmark indicates otherwise.
Rationale
With tails, the original list can be shared. Computing tails xs consists in walking along the list xs and creating a new pointer to each element of the list; there is no need to recreate any part of xs in memory.
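A minimal sketch of a tails-like function that makes this sharing visible (it mirrors the standard definition, though the library's exact code may differ): each suffix in the result is literally a node of the original list, so no cons cells are copied.

    -- Each suffix returned is a pointer into the original list;
    -- no cons cell of xs is rebuilt.
    tails' :: [a] -> [[a]]
    tails' xs = xs : case xs of
        []      -> []
        _ : xs' -> tails' xs'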
In contrast, because each element of inits xs "ends in a different way", there can be no such sharing, and all the possible prefixes of xs must be recreated from scratch in memory.
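To make the copying explicit, here is a naive, hypothetical inits-like function (not Data.List's actual code): every prefix is rebuilt with fresh conses by take.

    -- Hypothetical illustration: each prefix is built by take, which
    -- allocates new cons cells for every element it returns.
    inits' :: [a] -> [[a]]
    inits' xs = [ take n xs | n <- [0 .. length xs] ]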
Benchmark
The simple benchmark below shows that there isn't much of a difference in memory allocation between the two functions:
    -- Main.hs
    import Data.List (inits, tails)

    main :: IO ()
    main = do
        let intRange = [1 .. 10 ^ 4] :: [Int]
        print $ sum intRange
        print $ fInits intRange
        print $ fTails intRange

    fInits :: [Int] -> Int
    fInits = sum . map sum . inits

    fTails :: [Int] -> Int
    fTails = sum . map sum . tails
After compiling the Main.hs file with

    ghc -prof -fprof-auto -O2 -rtsopts Main.hs
and running

    ./Main +RTS -p

the Main.prof file reports the following:

    COST CENTRE MODULE  %time %alloc
    fInits      Main     60.1   64.9
    fTails      Main     39.9   35.0
The memory allocated by fInits and that allocated by fTails have the same order of magnitude... Hmm...

What is going on?
- Are my conclusions about the space complexities of tails (linear) and inits (quadratic) correct?
- If so, why does GHC allocate memory of the same order of magnitude for fInits and for fTails? Does list fusion have anything to do with this?
- Or is the benchmark flawed?
Answer

The implementation of inits in the Haskell Report, which is identical or nearly identical to the implementations used up to base 4.7.0.1 (GHC 7.8.3), is horribly slow. In particular, the fmap applications stack up recursively, so forcing successive elements of the result gets slower and slower.
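For reference, a sketch of that Report-style definition (the Report writes it with map; on lists fmap and map coincide, matching the expansion below):

    inits :: [a] -> [[a]]
    inits []     = [[]]
    inits (x:xs) = [] : fmap (x:) (inits xs)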
    inits [1,2,3,4] = [] : fmap (1:) (inits [2,3,4])
      = [] : fmap (1:) ([] : fmap (2:) (inits [3,4]))
      = [] : [1] : fmap (1:) (fmap (2:) ([] : fmap (3:) (inits [4])))
      ....
The simplest asymptotically optimal implementation, explored by Bertram Felgenhauer, is based on applying take to successively larger arguments:

    {-# LANGUAGE BangPatterns #-}
    inits xs = [] : go (1 :: Int) xs
      where
        go !l (_:ls) = take l xs : go (l + 1) ls
        go _  []     = []
Felgenhauer was able to eke some extra performance out of this by using a private, non-fusing version of take, but it was still not as fast as it could be.
The following very simple implementation is significantly faster in most cases:

    inits = map reverse . scanl (flip (:)) []
In some weird corner cases (like map head . inits), that simple implementation is asymptotically non-optimal, because extracting even the first element of each prefix forces a reversal that is linear in the prefix's length. I therefore wrote a version using the same technique, but based on Chris Okasaki's banker's queues, that is both asymptotically optimal and nearly as fast. Joachim Breitner optimized it further, using a strict scanl' rather than the usual scanl, and this implementation got into GHC 7.8.4.
inits can now produce the spine of the result in O(n) time; forcing the entire result still requires O(n^2) time, because none of the conses can be shared among the different initial segments. If you want absurdly fast inits and tails, your best bet is to use Data.Sequence; Louis Wasserman's implementation is magical. Another possibility would be to use Data.Vector, which presumably uses slicing for such things.
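A hedged sketch of the Data.Sequence route (assuming the containers package is available; Seq.inits and Seq.tails return a Seq of Seqs that share structure internally):

    import qualified Data.Sequence as Seq
    import Data.Foldable (toList)

    -- All prefixes of a list, via Data.Sequence's structure-sharing inits.
    initsViaSeq :: [a] -> [[a]]
    initsViaSeq = map toList . toList . Seq.inits . Seq.fromList

    -- All suffixes of a list, via Data.Sequence's tails.
    tailsViaSeq :: [a] -> [[a]]
    tailsViaSeq = map toList . toList . Seq.tails . Seq.fromList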