Single pass EDI parsing without an XML schema - Possible?


Before you sigh and hold your head in your hands, please understand that I'm working with a pretty old system on a rather tight timeline.

We have a single-pass EDI parser written in a business language. Currently, the data definitions, including the loop level, the area, and the name of each segment, are stored in a database table. The table assigns each segment within an area an incremental sequence number, e.g., for the 004010 810 header area:

  Segment: sequence
  • BIG: 5
  • NTE: 10
  • CUR: 15
  • REF: 20
  • YNQ: 25
  • PER: 30
  • N1 (start of loop): 35
  • N2: 40

etc. etc.

So, if we read the segments in the order they appear in the standard, each one can be assigned a sequential number, a "depth" (how many loops "down" it appears), and a name (2-3 characters).
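A minimal Python sketch of how such rows could be generated, assuming the standard is written down as nested lists where a nested list is a loop whose first item is its header. Walking that structure in standard order yields the sequence number, depth, and name described above. `SegmentDef`, `flatten`, and the nested-list format are hypothetical, and the area column is left out for brevity.

```python
from typing import NamedTuple


class SegmentDef(NamedTuple):
    """One row of a (hypothetical) segment-definition table."""
    sequence: int     # incremental number in standard order
    depth: int        # how many loops "down" the segment appears
    name: str         # 2-3 character segment ID
    loop_start: bool  # True if the segment is the header of its loop


def flatten(spec, step=5):
    """Number segments in standard order.

    A nested list is a loop; its first item is assumed to be the loop
    header (a plain segment name, not another list).
    """
    rows = []

    def walk(items, depth, in_loop):
        for i, item in enumerate(items):
            if isinstance(item, list):
                walk(item, depth + 1, True)
            else:
                rows.append(SegmentDef((len(rows) + 1) * step, depth, item,
                                       in_loop and i == 0))

    walk(spec, 0, False)
    return rows


# The 004010 810 header excerpt from the question: N1 opens a loop containing N2.
HEADER_810 = ["BIG", "NTE", "CUR", "REF", "YNQ", "PER", ["N1", "N2"]]

for row in flatten(HEADER_810):
    print(row.sequence, row.depth, row.name, row.loop_start)
```

This reproduces the sequence numbers 5 through 40 from the question, with N1 and N2 one loop down (depth 1) and only N1 flagged as a loop header.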

The algorithm the parser follows at present is:

    reset currentArea to 1
    for each segment in the document {
       search for the segment's name in the table, restricting to area >= currentArea
       if not found, we have an error
       else {
          if the area changed {
             empty the temporary "search bounds" table
             create a single record with upper bound = max(sequence in current area)
                and lower bound = min(sequence in current area)
          }
          if the area did not change {
             search for the next matching segment within the bounds of the last "bounds" record created
             if the segment is found and the loop level changed as a result {
                create a new bounds record with lower bound = min(sequence in current loop)
                   and upper bound = max(sequence in current loop)
             }
             if the segment is not found within the searched bounds {
                "pop" a bounds record off the table to widen the search, and repeat
                recursively until a segment with the same name is found
             }
          }
       }
    }
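A rough Python rendering of the algorithm above for a single area, useful for reproducing the failure described further down. The detail table is a hypothetical, heavily simplified 945 detail area (invented sequence numbers, only the segments from the excerpt), and the tie-breaking rule (prefer the next match at or after the current position, otherwise assume a loop within the bounds repeats) is an interpretation of "search for the next matching segment", not necessarily what the real parser does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDef:
    sequence: int
    depth: int
    name: str
    loop_start: bool = False


# Hypothetical, simplified 945 detail area (sequence numbers are invented).
DETAIL_945 = [
    SegmentDef(10, 1, "LX", True),
    SegmentDef(20, 1, "MAN"),
    SegmentDef(30, 1, "PAL"),
    SegmentDef(40, 1, "N9"),
    SegmentDef(50, 2, "W12", True),
    SegmentDef(60, 2, "G69"),
    SegmentDef(70, 2, "N9"),
    SegmentDef(80, 2, "LS"),
    SegmentDef(90, 3, "LX", True),
    SegmentDef(100, 3, "N9"),
    SegmentDef(110, 2, "LE"),
]


def loop_bounds(defs, start):
    """(min, max) sequence of the loop whose header is defs[start]."""
    end = start
    for j in range(start + 1, len(defs)):
        d = defs[j]
        # The loop ends at a shallower segment or at a sibling loop header.
        if d.depth < defs[start].depth or (d.depth == defs[start].depth and d.loop_start):
            break
        end = j
    return defs[start].sequence, defs[end].sequence


def parse_bottom_up(defs, names):
    """Resolve each segment name to a sequence number using a stack of bounds."""
    bounds = [(defs[0].sequence, defs[-1].sequence)]  # the whole area
    position, depth = defs[0].sequence, 0
    resolved = []
    for name in names:
        while True:
            if not bounds:
                raise ValueError(f"segment {name} not found")
            lo, hi = bounds[-1]
            in_bounds = [i for i, d in enumerate(defs)
                         if d.name == name and lo <= d.sequence <= hi]
            # Prefer the next match at or after the current position;
            # otherwise treat it as a repeat of a loop within the bounds.
            later = [i for i in in_bounds if defs[i].sequence >= position]
            candidates = later or in_bounds
            if candidates:
                break
            bounds.pop()  # not found: widen the search and retry
        d = defs[candidates[0]]
        if d.loop_start and d.depth > depth:
            bounds.append(loop_bounds(defs, candidates[0]))  # entered a deeper loop
        position, depth = d.sequence, d.depth
        resolved.append((name, d.sequence))
    return resolved


# Segment IDs from the raw 945 data in the question.
print(parse_bottom_up(DETAIL_945, ["LX", "MAN", "N9", "W12", "N9", "LX", "MAN", "N9"]))
```

With this data the second LX resolves to sequence 90 (the LX inside W12) instead of 10, and the following MAN only matches after the bounds stack has been popped back out, so the parse continues with the wrong structure.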

Unfortunately, I'm not sure I have the time or the means to implement an XML-based solution using the actual document schema. I've researched several such parsers, and they seem able to magically arrange the EDI according to the schema, no matter how it looks.

The problem I'm facing is this:

In the 945 document, the detail area looks like this (excerpt):

<detail>
   <LX>
   <MAN>
   <PAL>
   <N9>
      <W12 (loop header)>
      <G69>
      ...
      <miscellaneous other segments>
      ...
      <LS>
         <LX (loop header)>
         ...
         <miscellaneous other segments in LX loop>
         ...
      <LE>
      ...
</detail>

In the raw data, we have:

LX*1~
MAN*GM*0000803225000421444452~
N9*2I*12150-1~
W12*CC*2*2*0*EA*101199007289*VN*10007~
N9*LI*1~
LX*5~
MAN*GM*0000803225000421444453~
N9*2I*12150-2~
... (other segments)
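For experimenting with the parsing logic, the raw data above can be split into segments with a few lines of Python. This assumes `~` as the segment terminator and `*` as the element separator, as in the sample; in a real X12 interchange those delimiters are declared in the ISA segment and should be read from there. `split_segments` is a hypothetical helper.

```python
def split_segments(raw, segment_terminator="~", element_separator="*"):
    """Split raw X12 text into lists of elements; element 0 is the segment ID."""
    segments = []
    for chunk in raw.split(segment_terminator):
        chunk = chunk.strip()  # drop line breaks/spaces between segments
        if chunk:
            segments.append(chunk.split(element_separator))
    return segments


RAW = ("LX*1~ MAN*GM*0000803225000421444452~ N9*2I*12150-1~ "
       "W12*CC*2*2*0*EA*101199007289*VN*10007~ N9*LI*1~ "
       "LX*5~ MAN*GM*0000803225000421444453~ N9*2I*12150-2~")

for elements in split_segments(RAW):
    print(elements[0], elements[1:])
```

Only the segment IDs (element 0) matter for deciding which loop a segment belongs to; the remaining elements are carried along for later processing.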

Based on the algorithm above, when the second LX segment is hit, there is a "loop bounding" record from the first segment in W12 (W12) to the last possible segment within W12 (FA.FA2). Thus, when the search is performed on the document's standard table, the next LX found in the definition is the LX that opens its own loop within W12. This is wrong: the detail area is resetting here, and this LX is the first segment in the area, not the start of the W12.LX loop. Due to the naive nature of the parser, it cannot distinguish between the two, since the bottom-up search on the standards table is based on loops.

Changing the parser to search from the start of the area (top-down) rather than the current model creates the opposite problem: if the trading partner intended to open the inner W12.LX loop, the parser will interpret the LX as the start of a new detail area.
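The two failure modes can be put side by side in a small Python sketch. `DETAIL_945` is a hypothetical, simplified 945 detail table (invented sequence numbers), and the two functions are deliberately naive stand-ins for the bottom-up and top-down strategies, not the actual parser.

```python
# Hypothetical, simplified 945 detail area: (sequence, depth, segment ID).
DETAIL_945 = [
    (10, 1, "LX"), (20, 1, "MAN"), (30, 1, "PAL"), (40, 1, "N9"),
    (50, 2, "W12"), (60, 2, "G69"), (70, 2, "N9"), (80, 2, "LS"),
    (90, 3, "LX"), (100, 3, "N9"), (110, 2, "LE"),
]


def resolve_bottom_up(name, position):
    """Nearest definition at or after the current position (the current model)."""
    later = [seq for seq, _, seg in DETAIL_945 if seg == name and seq >= position]
    return later[0] if later else None


def resolve_top_down(name):
    """First definition from the top of the area (assume the area restarts)."""
    matches = [seq for seq, _, seg in DETAIL_945 if seg == name]
    return matches[0] if matches else None


# Case from the question: LX read right after N9 (sequence 70) inside W12.
# The partner meant a new detail LX (10), but bottom-up picks the inner LX.
print(resolve_bottom_up("LX", 70))
# Opposite case: LX read right after LS (80); the partner meant the inner LX (90),
# but top-down restarts the detail area.
print(resolve_top_down("LX"))
```

Neither rule looks at what has actually been read so far. In this data, the one thing that separates the two cases is whether an LS segment is open when the LX arrives.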

Is solving this case possible with a single-pass parser that uses standards defined in a table like the one I've described? Or is finding a way to hack an XML solution into our rather old system the only approach here? Since EDI does not have "end tags", the only way I can be sure of the loop in some scenarios is by "looking ahead" in the document, e.g., a MAN segment appearing after what looks like an inner W12.LX (since the detail area must have reset for the MAN segment to be used again).

I'm at the end of my rope, and any ideas are welcome.

Yes, it can be done in a single-pass parser (I did that).
As you indicate, the correct loop level should be indicated in the table.
The trick is to keep track of where you are in the table.
When you read a new segment, look it up in the table.
Start the lookup at the last segment in the table that was read.
The lookup in the table is complicated:
1. The new record can be in the same loop level.
2. It can be a repeat of the same loop.
3. The loop might have ended, and you need to look further on in the message.
If it is not found, give an error.
If it is found, go on to the next segment in the incoming file, etc.

AFAIK it is absolutely necessary to have in the table whether segments/loops are mandatory or conditional, if you want a general tool that can parse any message/transaction type.
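A sketch of that lookup in Python, assuming a flat table with a depth, a loop-header flag, and a mandatory flag per row. The table is a hypothetical, simplified 945 detail area (invented sequence numbers; the mandatory flags on LX and W12 are illustrative, not taken from the standard), and `next_definition` and `walk` are hypothetical names. It implements the three cases in the list above, innermost loop first; on its own it still resolves an LX read after the W12-level N9 to the inner LX, which is where the LS/LE point comes in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDef:
    sequence: int
    depth: int
    name: str
    loop_start: bool = False
    mandatory: bool = False


# Hypothetical, simplified 945 detail area; mandatory flags are illustrative only.
DETAIL_945 = [
    SegmentDef(10, 1, "LX", loop_start=True, mandatory=True),
    SegmentDef(20, 1, "MAN"),
    SegmentDef(30, 1, "PAL"),
    SegmentDef(40, 1, "N9"),
    SegmentDef(50, 2, "W12", loop_start=True, mandatory=True),
    SegmentDef(60, 2, "G69"),
    SegmentDef(70, 2, "N9"),
    SegmentDef(80, 2, "LS"),
    SegmentDef(90, 3, "LX", loop_start=True),
    SegmentDef(100, 3, "N9"),
    SegmentDef(110, 2, "LE"),
]


def loop_end(defs, start):
    """Index of the last row in the loop whose header is defs[start]."""
    end = start
    for j in range(start + 1, len(defs)):
        d = defs[j]
        if d.depth < defs[start].depth or (d.depth == defs[start].depth and d.loop_start):
            break
        end = j
    return end


def next_definition(defs, loops, pos, name):
    """Look `name` up starting from the last row read (`pos`).

    `loops` holds the header indices of the open loops, innermost last.
    Returns (row index, open loops) or raises ValueError.
    """
    loops = list(loops)
    while True:
        header = loops[-1] if loops else None
        end = loop_end(defs, header) if header is not None else len(defs) - 1
        # Case 1: a later row in the same loop (optional rows may be skipped).
        j = pos + 1
        while j <= end:
            d = defs[j]
            if d.name == name:
                return j, (loops + [j] if d.loop_start else loops)
            if d.mandatory:
                raise ValueError(f"{name}: mandatory {d.name} is missing")
            # A child loop can only be entered through its header: skip it whole.
            j = loop_end(defs, j) + 1 if d.loop_start else j + 1
        if header is None:
            raise ValueError(f"{name}: not expected here")
        # Case 2: a repeat of the same loop.
        if defs[header].name == name:
            return header, loops
        # Case 3: the loop has ended; continue in the enclosing loop.
        loops.pop()
        pos = end


def walk(names, defs=DETAIL_945):
    """Resolve a whole segment stream to sequence numbers."""
    loops, pos, out = [], -1, []
    for name in names:
        pos, loops = next_definition(defs, loops, pos, name)
        out.append(defs[pos].sequence)
    return out


print(walk(["LX", "W12", "LS", "LX", "N9", "LE", "LX"]))
```

The mandatory flags are what make errors detectable: for example, G69 directly after LX is rejected because the mandatory W12 would have to be skipped.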

The problem you run into is the LS/LE loop: the 2nd LX loop is embedded in the LS/LE segments. LS/LE were invented to solve this 'collision' problem. If you are in an LS loop, it should be terminated by an LE segment.
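A minimal Python sketch of that rule, assuming (as in the excerpt) that LS/LE only bracket the inner W12.LX loop: an LX read while an LS is open belongs to the inner loop, any other LX starts a new detail loop, and an unmatched LE or an unterminated LS is an error. `classify_lx` and the `OUTSIDE_LS_ONLY` set are hypothetical; a full parser would fold this check into the table lookup rather than run it as a separate pass.

```python
# Hypothetical: segments that, in the simplified 945 detail area used here,
# can only appear outside an LS/LE bracket.
OUTSIDE_LS_ONLY = {"MAN", "PAL", "W12", "G69"}


def classify_lx(segment_ids):
    """Label every LX as the inner W12.LX loop or a new detail LX loop."""
    ls_open = False
    labels = []
    for seg in segment_ids:
        if seg == "LS":
            if ls_open:
                raise ValueError("LS inside LS is not handled in this sketch")
            ls_open = True
        elif seg == "LE":
            if not ls_open:
                raise ValueError("LE without a matching LS")
            ls_open = False
        elif seg == "LX":
            labels.append("W12.LX" if ls_open else "detail LX")
        elif ls_open and seg in OUTSIDE_LS_ONLY:
            raise ValueError(f"{seg} inside LS/LE: LS loop not terminated by LE")
    if ls_open:
        raise ValueError("LS loop not terminated by LE")
    return labels


# The raw data from the question: no LS, so both LX segments start detail loops.
print(classify_lx(["LX", "MAN", "N9", "W12", "N9", "LX", "MAN", "N9"]))
# With LS/LE, the LX in between belongs to the inner W12.LX loop.
print(classify_lx(["LX", "MAN", "W12", "LS", "LX", "N9", "LE", "LX"]))
```

Because the LS has already been read (or not) when the LX arrives, the decision needs no lookahead, and a MAN after an inner LX shows up as a missing LE rather than as an ambiguous reset.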

