haskell - How to parse into records?


I asked a question before and learned from it. I then discovered that my implementation resulted in a list of strings rather than a list of records. The file I am parsing has records that look like this:

sp|p30375|1a01_gorgo class histocompatibility antigen gogo-a*0101 alpha chain os=gorilla gorilla gorilla pe=2 sv=1 mavmaprtlvlllsgalaltqtwagshsmryfstsvsrpgrgeprfiavgyvddtqfvrf dsdaasqrmeprapwieqegpeywdrntrnvkahsqtdrvdlgtlrgyynqsedgshtiq rmygcdvgsdgrflrgyqqdaydgkdyialnedlrswtaadmaaeitkrkweaahfaeql raylegtcvewlrrhlengketlqrtdapkthmthhavsdheailrcwalsfypaeitlt wqrdgedqtqdtelvetrpagdgtfqkwaavvvpsgqeqrytchvqheglpepltlrwep ssqptipivgiiaglvlfgaviagavvaavrwrrkssdrkggsysqaassdsaqgsdvsl tackv sp|p30443|1a01_human hla class histocompatibility antigen a-1 alpha chain os=homo sapiens gn=hla-a pe=1 sv=1 mavmaprtlllllsgalaltqtwagshsmryfftsvsrpgrgeprfiavgyvddtqfvrf dsdaasqkmeprapwieqegpeywdqetrnmkahsqtdranlgtlrgyynqsedgshtiq imygcdvgpdgrflrgyrqdaydgkdyialnedlrswtaadmaaqitkrkweavhaaeqr rvylegrcvdglrrylengketlqrtdppkthmthhpisdheatlrcwalgfypaeitlt wqrdgedqtqdtelvetrpagdgtfqkwaavvvpsgeeqrytchvqheglpkpltlrwel ssqptipivgiiaglvllgavitgavvaavmwrrkssdrkggsytqaassdsaqgsdvsl tackv

Just before each "sp" there is a ">", which I planned to use as the point at which to divide the records. So, how can I end up with:

[[>sp|p30375|1a01_gorgo class histocompatibility antigen gogo-a*0101 alpha chain os=gorilla gorilla gorilla pe=2 sv=1 mavmaprtlvlllsgalaltqtwagshsmryfstsvsrpgrgeprfiavgyvddtqfvrf dsdaasqrmeprapwieqegpeywdrntrnvkahsqtdrvdlgtlrgyynqsedgshtiq rmygcdvgsdgrflrgyqqdaydgkdyialnedlrswtaadmaaeitkrkweaahfaeql raylegtcvewlrrhlengketlqrtdapkthmthhavsdheailrcwalsfypaeitlt wqrdgedqtqdtelvetrpagdgtfqkwaavvvpsgqeqrytchvqheglpepltlrwep ssqptipivgiiaglvlfgaviagavvaavrwrrkssdrkggsysqaassdsaqgsdvsl tackv] [>sp|p30443|1a01_human hla class histocompatibility antigen a-1 alpha chain os=homo sapiens gn=hla-a pe=1 sv=1 mavmaprtlllllsgalaltqtwagshsmryfftsvsrpgrgeprfiavgyvddtqfvrf dsdaasqkmeprapwieqegpeywdqetrnmkahsqtdranlgtlrgyynqsedgshtiq imygcdvgpdgrflrgyrqdaydgkdyialnedlrswtaadmaaqitkrkweavhaaeqr rvylegrcvdglrrylengketlqrtdppkthmthhpisdheatlrcwalgfypaeitlt wqrdgedqtqdtelvetrpagdgtfqkwaavvvpsgeeqrytchvqheglpkpltlrwel ssqptipivgiiaglvllgavitgavvaavmwrrkssdrkggsytqaassdsaqgsdvsl tackv]] 

using Parsec? The code I started out with is from: "How to parse a UniProt file with Parsec?"

As far as I understand the problem, you need to parse records separated by '>'. A record is a string containing any characters other than '>', so you are looking for something like this:

    import Control.Applicative ((*>))
    import Text.Parsec
    import Text.Parsec.ByteString (Parser, parseFromFile)

    type Record = String

    -- Parse the whole file; a parse failure is reported as a runtime error.
    parserFile :: FilePath -> IO [Record]
    parserFile fileName = do
        r <- parseFromFile parseRecords fileName
        case r of
            Left msg -> error . show $ msg
            Right xs -> return xs

    -- One or more records: a '>' followed by every character up to the next '>'.
    parseRecords :: Parser [Record]
    parseRecords = many1 $ char '>' *> many1 (noneOf ['>'])

The "parseFromFile" function reads the data using an efficient binary representation (a ByteString) and takes as its argument the parser that analyzes the stream of bytes resulting from reading the file.
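To make that concrete, here is a rough sketch of what such a function does: read the file strictly as a ByteString, then run the parser over it. The name parseFromFile' is hypothetical and exists only for illustration; in real code you would use the library's own parseFromFile. The records parser is a copy of the one above, included so the block compiles on its own.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Parsec (ParseError, char, many1, noneOf, parse)
import Text.Parsec.ByteString (Parser)

-- Same shape as the records parser above:
-- a '>' followed by every character up to the next '>'.
records :: Parser [String]
records = many1 (char '>' *> many1 (noneOf ">"))

-- Hypothetical stand-in for parseFromFile (illustration only):
-- read the file strictly, then run the parser on the bytes.
-- The file path doubles as the source name used in error messages.
parseFromFile' :: Parser a -> FilePath -> IO (Either ParseError a)
parseFromFile' p path = do
    bytes <- B.readFile path
    return (parse p path bytes)
```

Calling parseFromFile' records "some-file.fasta" (the file name is a placeholder) gives back either a ParseError or the list of raw record strings.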

Now, records begin with the '>' symbol, so you need a parser that matches the '>' symbol at the beginning and stores the rest of the symbols in a list until the next '>' symbol.
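If you want actual records rather than plain strings, the same idea can be taken one step further. The following is only a sketch under a few assumptions: the FastaRecord type and its field names are made up for illustration, the header is taken to be the first line after '>', line endings are assumed to be Unix style, and the parser works on a String (Text.Parsec.String) so it is easy to try in GHCi.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical record type: the header line and the sequence
-- with all whitespace (including line breaks) removed.
data FastaRecord = FastaRecord
    { header   :: String
    , residues :: String
    } deriving (Show, Eq)

-- One record: '>' and the rest of that line form the header;
-- everything up to the next '>' is the sequence.
fastaRecord :: Parser FastaRecord
fastaRecord = do
    _    <- char '>'
    hdr  <- many (noneOf "\n")
    _    <- optional newline
    body <- many (noneOf ">")
    return (FastaRecord hdr (filter (not . isSpace) body))

-- One or more records, and nothing else, until the end of input.
fastaRecords :: Parser [FastaRecord]
fastaRecords = many1 fastaRecord <* eof
```

For example, parse fastaRecords "example" ">id one\nAAA\nCCC\n>id two\nGGG\n" should yield two FastaRecord values, each with its own header and a sequence joined across lines.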

