arrays - Count of column elements using linux/bash


I have many tab-delimited files that have data like this:

header1            header2    .......    headern
cat bat            mat pat
hat                rat
rat                [not applicable]
[not available]    [not applicable]

I need a count of the number of valid rows for every header. Invalid entries are [not available], [not applicable], etc. I am trying to store the header elements in an array, and everything is fine up to that point. However, I am having difficulty getting the count for every header. I am using an array to store the row values for a header. The problem is that the array stores [not as one element and available] as another element. Also, for the first header, 'cat bat' should be 1 entry, but the array stores it as 2 entries as well.

Let's start with this tab-separated file:

$ cat file
header1             header2
cat bat             mat pat
hat                 rat
rat                 [not applicable]
[not available]     [not applicable]
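To reproduce this, the file has to contain real tab characters, which are easy to lose when copying from a web page. A minimal sketch, assuming the row layout shown above (the same file name, file, is reused), builds it with printf and then checks that every line has exactly two tab-separated fields:

```shell
# Recreate the sample file with literal tabs (printf expands \t and \n).
printf 'header1\theader2\n'                  >  file
printf 'cat bat\tmat pat\n'                  >> file
printf 'hat\trat\n'                          >> file
printf 'rat\t[not applicable]\n'             >> file
printf '[not available]\t[not applicable]\n' >> file

# Sanity check: print the number of tab-separated fields on each line.
fields=$(awk -F'\t' '{print NF}' file)
printf '%s\n' "$fields"
```

If any line reports a field count other than 2, the separators in that line are probably spaces rather than tabs.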

For each column, the following counts the entries that do not start with [not a:

$ awk -F'\t' 'NR==1{for (i=1;i<=NF;i++)h[i]=$i;next} {for (i=1;i<=NF;i++)c[i]+=($i !~ /[[]not a/)} END{for (i=1;i<=NF;i++)print h[i],c[i]}' file
header1 3
header2 2

How it works

  • -F'\t'

    This sets the field separator to a tab.

  • NR==1{for (i=1;i<=NF;i++)h[i]=$i;next}

    For the first row, this saves the headers in array h, skips the rest of the commands, and jumps to the next line.

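  • {for (i=1;i<=NF;i++)c[i]+=($i !~ /[[]not a/)}

    For all lines after the first, this goes through each column and increments c[i] if the value of column i does not match [not a. The comparison evaluates to 1 or 0, so adding it to c[i] counts only the valid entries.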

  • END{for (i=1;i<=NF;i++)print h[i],c[i]}

    After the last line has been read, this prints out the results.
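The steps above can also be laid out as a multi-line script, which may be easier to read and adapt. This is only a sketch of the same logic: the longer names header and count, and the ncols variable that remembers the column count from the header line, are choices made here for illustration. Adding 0 when printing makes a column with no valid entries show 0 instead of an empty string.

```shell
# Sample data, rebuilt here so the sketch is self-contained.
printf 'header1\theader2\ncat bat\tmat pat\nhat\trat\nrat\t[not applicable]\n[not available]\t[not applicable]\n' > file

counts=$(awk -F'\t' '
    # First line: remember each header name and the number of columns.
    NR == 1 {
        ncols = NF
        for (i = 1; i <= NF; i++) header[i] = $i
        next
    }
    # Data lines: the match test yields 1 (valid) or 0 (placeholder).
    {
        for (i = 1; i <= NF; i++) count[i] += ($i !~ /[[]not a/)
    }
    # After the last line: print one header and its count per column.
    END {
        for (i = 1; i <= ncols; i++) print header[i], count[i] + 0
    }
' file)
printf '%s\n' "$counts"
```

Because the field separator is a tab, an entry such as cat bat stays a single field, which is the behavior the question asks for.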

Update

Suppose that, in addition to [not applicable] and [not available], we want to ignore [unavailable] (note: lower case). In that case, make a slight change to the regex:

awk -F'\t' 'NR==1{for (i=1;i<=NF;i++)h[i]=$i;next} {for (i=1;i<=NF;i++)c[i]+=($i !~ /[[](not a|unavailable)/)} END{for (i=1;i<=NF;i++)print h[i],c[i]}' file
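As a quick check of the extended pattern, the sketch below runs it against a hypothetical second file, file2, whose contents are made up here for illustration and include one [unavailable] entry in the first column:

```shell
# Hypothetical test data: row 3 has [unavailable] in the first column.
printf 'header1\theader2\ncat bat\tmat pat\nhat\trat\n[unavailable]\tpat\n[not available]\t[not applicable]\n' > file2

# Same command as above, pointed at file2.
updated=$(awk -F'\t' 'NR==1{for (i=1;i<=NF;i++)h[i]=$i;next} {for (i=1;i<=NF;i++)c[i]+=($i !~ /[[](not a|unavailable)/)} END{for (i=1;i<=NF;i++)print h[i],c[i]}' file2)
printf '%s\n' "$updated"
```

With this data the command should report 2 valid entries for header1 and 3 for header2. The earlier pattern, /[[]not a/, would count the [unavailable] entry as valid and report 3 for header1.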
