arrays - Count of column elements using linux/bash
I have many tab-delimited files that have data like this:
header1	header2	.......	headern
cat bat	mat pat
hat	rat
rat	[not applicable]
[not available]	[not applicable]
I need the count of valid rows for every header. Invalid entries are [not available], [not applicable], etc. I am trying to store the header elements in an array, and everything is fine up to here. However, I am having difficulty getting the count for every header. I am using an array to store the row values for each header. The problem is that the array stores [not as one element and available] as another element. Also, for the first header, 'cat bat' should be one entry, but the array stores it as two entries as well.
Let's start with this tab-separated file:
$ cat file
header1	header2
cat bat	mat pat
hat	rat
rat	[not applicable]
[not available]	[not applicable]
For each column, the following counts the entries that do not start with [not a:
$ awk -F'\t' 'NR==1{for (i=1;i<=NF;i++)h[i]=$i;next} {for (i=1;i<=NF;i++)c[i]+=($i !~ /[[]not a/)} END{for (i=1;i<=NF;i++)print h[i],c[i]}' file
header1 3
header2 2
How it works
-F'\t'
This sets the field separator to a tab.
NR==1{for (i=1;i<=NF;i++)h[i]=$i;next}
For the first row, this saves the headers in the array h, skips the rest of the commands, and jumps to the next line.
{for (i=1;i<=NF;i++)c[i]+=($i !~ /[[]not a/)}
For the lines after the first, this goes through each column and increments c[i] if the value of column i does not start with [not a.
END{for (i=1;i<=NF;i++)print h[i],c[i]}
After the last line has been read, this prints out the results.
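The one-liner can also be spread over several lines, which makes it easier to read and to change. What follows is only a minimal sketch of the same logic, not a replacement for the command above. The file name sample.tsv and the variable ncols are made up for illustration, and the sample rows are an assumed layout chosen to be consistent with the output shown earlier. Two small differences from the one-liner: the number of columns is taken from the header row instead of from NF of the last line, and c[i] + 0 is printed so that a column with no valid entries shows 0 rather than an empty string.

```shell
# Build a small tab-separated sample (hypothetical file name and layout).
printf 'header1\theader2\n'                  >  sample.tsv
printf 'cat bat\tmat pat\n'                  >> sample.tsv
printf 'hat\trat\n'                          >> sample.tsv
printf 'rat\t[not applicable]\n'             >> sample.tsv
printf '[not available]\t[not applicable]\n' >> sample.tsv

counts=$(awk -F'\t' '
NR == 1 {                  # header row: remember names and column count
    ncols = NF
    for (i = 1; i <= NF; i++) h[i] = $i
    next
}
{                          # data rows: count fields that do not match the marker
    for (i = 1; i <= NF; i++)
        c[i] += ($i !~ /[[]not a/)
}
END {                      # adding 0 forces a numeric 0 for untouched counters
    for (i = 1; i <= ncols; i++) print h[i], c[i] + 0
}' sample.tsv)

printf '%s\n' "$counts"
```

Because the field separator is a tab, 'cat bat' and '[not applicable]' each stay a single field, which is exactly the splitting problem described in the question.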
Update
Suppose that, in addition to [not applicable] and [not available], we also want to ignore [unavailable] (note: lower case). In that case, we make a slight change to the regex:
awk -F'\t' 'NR==1{for (i=1;i<=NF;i++)h[i]=$i;next} {for (i=1;i<=NF;i++)c[i]+=($i !~ /[[](not a|unavailable)/)} END{for (i=1;i<=NF;i++)print h[i],c[i]}' file
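If the list of markers to ignore keeps growing, one option is to pass the pattern in as an awk variable instead of editing the script each time. This is a sketch under that assumption: the function name count_valid, the awk variable skip, and the file sample2.tsv are all hypothetical. The pattern here is additionally anchored with ^ so that it only matches at the start of a field, and it uses [[] rather than \[ because awk interprets backslash escapes in -v values.

```shell
# count_valid FILE PATTERN: per-column count of fields that do not match PATTERN.
# The function name and the awk variable "skip" are illustrative only.
count_valid() {
    awk -F'\t' -v skip="$2" '
    NR == 1 { ncols = NF; for (i = 1; i <= NF; i++) h[i] = $i; next }
    { for (i = 1; i <= NF; i++) c[i] += ($i !~ skip) }
    END { for (i = 1; i <= ncols; i++) print h[i], c[i] + 0 }
    ' "$1"
}

# Tiny made-up sample: one valid and one ignored entry in each column.
printf 'header1\theader2\ncat bat\t[unavailable]\n[not available]\trat\n' > sample2.tsv

count_valid sample2.tsv '^[[](not a|unavailable)'
```

For this sample, each column has exactly one entry that is not ignored, so the call should report 1 for both header1 and header2.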