How can I swap numbers inside a data block of repeating format using Linux commands?
I have a huge data file, and I hope to swap the numbers of the 2nd column only, in a file of the following format. The file has 25,000,000 datasets, with 8768 lines each.
%% Edited: here is a shorter 10-line example. Sorry for the inconvenience. This is a typical single data block.
    # dataset 1
    #
    # number of lines 10
    #
    # header lines
     5 11 3 10 120 90 0 0.952 0.881 0.898 2.744 0.034 0.030
     10 12 3 5 125 112 0 0.952 0.897 0.905 2.775 0.026 0.030
     50 10 3 48 129 120 0 1.061 0.977 0.965 3.063 0.001 0.026
     120 2 4 5 50 186 193 0 0.881 0.965 0.899 0.917 3.669 0.000 -0.005
     125 3 4 10 43 186 183 0 0.897 0.945 0.910 0.883 3.641 0.000 0.003
     186 5 4 120 125 249 280 0 0.899 0.910 0.931 0.961 3.727 0.000 -0.001
     193 6 4 120 275 118 268 0 0.917 0.895 0.897 0.937 3.799 0.000 0.023
     201 8 4 278 129 131 280 0 0.921 0.837 0.870 0.934 3.572 0.000 0.008
     249 9 4 186 355 179 317 0 0.931 0.844 0.907 0.928 3.615 0.000 0.008
     280 10 4 186 201 340 359 0 0.961 0.934 0.904 0.898 3.700 0.000 0.033
    #
    # dataset 1
    #
    # number of lines 10
    ...

As you can see, there are 7 repeating header lines at the head, and 1 trailing line at the end of each dataset. The header and trailing lines all begin with #. As a result, the data has 7 header lines, 8768 data lines, and 1 trailing line, for a total of 8776 lines per data block. The trailing line contains only a single '#'.
I want to swap the numbers in the 2nd column only. First, I want to replace

    1, 9, 10, 11 => 666
    2, 6, 7, 8 => 333
    3, 4, 5 => 222

in the 2nd column, and then

    666 => 6
    333 => 3
    222 => 2

in the 2nd column. I hope to apply this replacement to every repeating dataset.
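The 666/333/222 placeholders presumably exist so that a freshly written value (say a new 6) is not picked up again by a later rule (6 => 3). If each value is looked up exactly once, the two passes collapse into a single table. A minimal sketch in Python, where `GROUPS`, `DIRECT` and `swap` are illustrative names and not part of any existing script:

```python
# Two-step mapping from the question, written as one direct lookup table.
# GROUPS, DIRECT and swap are illustrative names chosen for this sketch.
GROUPS = {
    6: (1, 9, 10, 11),   # 1, 9, 10, 11 => 666 => 6
    3: (2, 6, 7, 8),     # 2, 6, 7, 8   => 333 => 3
    2: (3, 4, 5),        # 3, 4, 5      => 222 => 2
}

# Invert the groups: old value -> new value.
DIRECT = {old: new for new, olds in GROUPS.items() for old in olds}


def swap(value):
    """Return the replacement for a 2nd-column value, or the value itself."""
    return DIRECT.get(value, value)


print([swap(v) for v in range(1, 13)])
# prints [6, 3, 2, 2, 2, 3, 3, 3, 6, 6, 6, 12]
```

Because every input value is translated once from its original state, no intermediate placeholder is needed.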
I tried Python, but the data is too big and it raises a memory error. How can I perform this swapping with Linux commands such as sed, awk, or cat?
Thanks.
This might work for you, but you'd have to use GNU awk, since it relies on the `gensub` command and `$0` reassignment.
Put the following into an executable awk file (say `script.awk`):
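    #!/usr/bin/awk -f

    # Map old 2nd-column values to their replacements.
    BEGIN {
        a[1] = a[9] = a[10] = a[11] = 6
        a[2] = a[6] = a[7] = a[8] = 3
        a[3] = a[4] = a[5] = 2
    }

    # Return the mapped value for c2, or c2 itself when it has no mapping.
    # val is a local variable (declared as an extra, unused parameter).
    function swap( c2, val ) {
        val = a[c2]
        return( val=="" ? c2 : val )
    }

    # Data lines: keep column 1 as-is and rewrite column 2.
    /^( [0-9]+ )/ { $0 = gensub( /^( [0-9]+)( [0-9]+)/, "\\1 " swap($2), 1 ) }

    47 # print the line

Here's the breakdown: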
- `BEGIN` - set up the array `a` with the mappings to the new values.
- Create a user defined function `swap` to provide values for the 2nd column, taken either from the `a` array or from the value itself. The `c2` element is passed in, while the `val` element is a local variable (because no 2nd argument is passed in).
- When a line starts with a space followed by a number and a space (the pattern), use `gensub` to replace the first occurrence of the first number pattern, concatenated with a space and the return of `swap` (the action). In this case, I'm using gensub's replacement text to preserve the first column data. The second column is passed to `swap` using the field data identifier `$2`. Using `gensub` should preserve the formatting of the data lines.
- `47` - an expression that evaluates to true provides the default action of printing `$0`, which for data lines might have been modified. Any line that wasn't "data" is printed out here w/o modifications.
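Since the question mentions a memory error in Python, the same per-line logic can also be sketched as a streaming Python filter. This is not the awk script itself, only an approximation of it under stated assumptions: data lines start with at least one blank, and the names `MAPPING`, `DATA_LINE`, `rewrite_line` and `rewrite_stream` are made up for this example.

```python
import io
import re

# Old 2nd-column value -> new value (same mapping as the awk array `a`).
MAPPING = {1: 6, 9: 6, 10: 6, 11: 6,
           2: 3, 6: 3, 7: 3, 8: 3,
           3: 2, 4: 2, 5: 2}

# Leading blanks, first number, blanks, then the second (integer) column.
# Assumption: data lines begin with at least one blank, as in the awk pattern.
DATA_LINE = re.compile(r"^([ \t]+\d+[ \t]+)(\d+)(?=\s|$)")


def rewrite_line(line):
    """Swap the 2nd column of a data line; return any other line unchanged."""
    def repl(match):
        new = MAPPING.get(int(match.group(2)))
        if new is None:
            return match.group(0)             # no mapping: keep the text as-is
        return match.group(1) + str(new)      # keep column 1 and its spacing
    return DATA_LINE.sub(repl, line, count=1)


def rewrite_stream(src, dst):
    """Copy src to dst one line at a time, rewriting data lines on the way."""
    for line in src:                          # never holds the whole file
        dst.write(rewrite_line(line))


# Small in-memory demonstration.
sample = io.StringIO("# header\n 5 11 3 10\n 12 12 3 5\n#\n")
result = io.StringIO()
rewrite_stream(sample, result)
print(result.getvalue(), end="")
```

For a real file, `rewrite_stream(sys.stdin, sys.stdout)` (after `import sys`) would let it sit in a shell pipeline the same way the awk script does. Iterating over the file object reads one line at a time, so memory use should stay flat regardless of the number of datasets.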
The provided data doesn't show all the cases, so I made my own test file:
    # 2 skip me
    9 2 not going process me
     1 1 don't change matting
     2 2 4 23242.223 data
     3 3 data that's formatted
     4 4 7 that's formatted
     5 5 data that's formatted
     6 6 data that's formatted
     7 7 data that's formatted
     8 8 data that's formatted
     9 9 data that's formatted
     10 10 data that's formatted
     11 11 data that's formatted
     12 12 data that's formatted
     13 13 data that's formatted
     14 s data that's formatted
    # other data

Running the executable awk script (like `./script.awk data`) gives the following output:
    # 2 skip me
    9 2 not going process me
     1 6 don't change matting
     2 3 4 23242.223 data
     3 2 data that's formatted
     4 2 7 that's formatted
     5 2 data that's formatted
     6 3 data that's formatted
     7 3 data that's formatted
     8 3 data that's formatted
     9 6 data that's formatted
     10 6 data that's formatted
     11 6 data that's formatted
     12 12 data that's formatted
     13 13 data that's formatted
     14 s data that's formatted
    # other data

which looks alright to me, but I'm not the one with 25 million datasets.
You'd want to try this on a smaller sample of your data first (the first few datasets?) and redirect stdout to a temp file, perhaps like:
    head -n 26328 data | ./script.awk - > tempfile

You can learn more about the elements used in this script (`gensub`, user defined functions, and `$0` reassignment) in the GNU awk manual.
And of course, you should spend some quality time reviewing awk related questions and answers on Stack Overflow ;)
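The 26328 passed to `head` corresponds to three complete blocks of 8776 lines, using the counts given in the question. A short Python sketch of that arithmetic, with a helper that takes the first few blocks from any line iterator; `first_blocks` and the constant names are hypothetical, introduced only for this example:

```python
import io
from itertools import islice

HEADER_LINES = 7      # header lines per block, as stated in the question
DATA_LINES = 8768     # data lines per block
TRAILER_LINES = 1     # the single '#' line that closes a block

BLOCK_LINES = HEADER_LINES + DATA_LINES + TRAILER_LINES


def first_blocks(lines, n_blocks, block_lines=BLOCK_LINES):
    """Return an iterator over the lines of the first n_blocks blocks only."""
    return islice(lines, n_blocks * block_lines)


print(BLOCK_LINES, 3 * BLOCK_LINES)   # prints 8776 26328

# Toy demonstration with 3-line "blocks" instead of 8776-line ones.
toy = io.StringIO("".join(f"line {i}\n" for i in range(10)))
print(len(list(first_blocks(toy, 2, block_lines=3))))   # prints 6
```

Cutting the sample at a multiple of the block length keeps the test input made of whole datasets, which makes the output easier to compare against the original.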