Removing duplicates from a CSV file using Ruby
I have a CSV file with the following data:

```
sno  scenario  result  description
1    sce_1     pass    pass
2    sce_2     pass    pass
1    sce_1     fail    failed
```

Two rows share the same serial number (`sno == "1"`). I want to keep the row whose result is "pass" and remove the rest of the duplicate rows.
I have tried the following, but it still does not work:
```ruby
CSV.open('new.csv', 'w') do |csv|
  CSV.read('merged_files.csv').uniq! { |x| x[1] }.each do |row|
    csv << row
  end
end
```
Can someone help me with the logic?
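One note on the attempt: `Array#uniq!` returns `nil` when it removes nothing, so chaining `.each` onto it can raise `NoMethodError`; the non-destructive `uniq` avoids that. Also, `uniq` keeps only the first occurrence per key regardless of the result column. A minimal corrected sketch (the partition-then-uniq approach and the `dedupe` name are my own; `CSV.parse` on an inline string stands in for reading the file):

```ruby
require 'csv'

# Keep, for each sno (column 0), a "pass" row when one exists, otherwise
# the first row seen. partition is stable, so "pass" rows are examined
# first and Array#uniq keeps them in preference to "fail" duplicates.
def dedupe(rows)
  pass_rows, other_rows = rows.partition { |r| r[2] == "pass" }
  (pass_rows + other_rows).uniq { |r| r[0] }
end

data = CSV.parse(<<~ROWS)
  1,sce_1,pass,pass
  2,sce_2,pass,pass
  1,sce_1,fail,failed
ROWS

dedupe(data)
```

Note that the surviving rows come out pass-first rather than in original file order; to produce `new.csv`, wrap the result in `CSV.open('new.csv', 'w')` and shovel each row in.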
For purposes of illustration, I've added a fourth data row to the table:
```ruby
require 'csv'

arr = CSV.read("x.csv")
  #=> [["sno", "scenario", "result", "description"],
  #    ["1", "sce_1", "pass", "pass"],
  #    ["2", "sce_2", "pass", "pass"],
  #    ["1", "sec_1", "fail", "pass"],
  #    ["3", "sec_3", "fail", "pass"]]
```
You can remove the unwanted elements as follows:
```ruby
arr[1..-1].group_by(&:first).map { |_,a| (a.size > 1) ? a.reject { |e| e[2]=="fail" } : a }
  #=> [[["1", "sce_1", "pass", "pass"]],
  #    [["2", "sce_2", "pass", "pass"]],
  #    [["3", "sec_3", "fail", "pass"]]]
```
The steps:

```ruby
h = arr[1..-1].group_by(&:first)
  #=> {"1"=>[["1", "sce_1", "pass", "pass"],
  #          ["1", "sec_1", "fail", "pass"]],
  #    "2"=>[["2", "sce_2", "pass", "pass"]],
  #    "3"=>[["3", "sec_3", "fail", "pass"]]}

h.map { |_,a| (a.size > 1) ? a.reject { |e| e[2]=="fail" } : a }
  #=> [[["1", "sce_1", "pass", "pass"]],
  #    [["2", "sce_2", "pass", "pass"]],
  #    [["3", "sec_3", "fail", "pass"]]]
```
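The steps above can be run as a self-contained sketch; `CSV.parse` on an inline string stands in for reading `x.csv`:

```ruby
require 'csv'

arr = CSV.parse(<<~ROWS)
  sno,scenario,result,description
  1,sce_1,pass,pass
  2,sce_2,pass,pass
  1,sec_1,fail,pass
  3,sec_3,fail,pass
ROWS

# Step 1: group the data rows (header excluded) by sno, the first column.
h = arr[1..-1].group_by(&:first)

# Step 2: within groups that contain duplicates, drop the "fail" rows.
kept = h.map { |_, a| a.size > 1 ? a.reject { |e| e[2] == "fail" } : a }
```

Ruby hashes preserve insertion order, so the groups (and therefore the kept rows) come out in the order the sno values first appear.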
If, for a given sno/scenario, there is at most one "pass" row, you can use Enumerable#flat_map instead:
```ruby
a = h.flat_map { |_,a| (a.size > 1) ? a.reject { |e| e[2]=="fail" } : a }
  #=> [["1", "sce_1", "pass", "pass"],
  #    ["2", "sce_2", "pass", "pass"],
  #    ["3", "sec_3", "fail", "pass"]]
```
If you wish to add back the header row:
```ruby
a.unshift(arr.first)
  #=> [["sno", "scenario", "result", "description"],
  #    ["1", "sce_1", "pass", "pass"],
  #    ["2", "sce_2", "pass", "pass"],
  #    ["3", "sec_3", "fail", "pass"]]
```
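To finish the job as the question intends, the headed array can be written back out with `CSV.open` (a sketch; `new.csv` is the output name from the question, and `rows` mirrors the result above):

```ruby
require 'csv'

rows = [["sno", "scenario", "result", "description"],
        ["1", "sce_1", "pass", "pass"],
        ["2", "sce_2", "pass", "pass"],
        ["3", "sec_3", "fail", "pass"]]

# CSV.open with "w" truncates/creates the file; << writes one CSV row.
CSV.open("new.csv", "w") do |csv|
  rows.each { |row| csv << row }
end
```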
If you want to exclude "fail" rows even when there is no corresponding "pass" row (as with sno == "3"), you can do this:
```ruby
h.flat_map { |_,a| a.reject { |e| e[2]=="fail" } }
  #=> [["1", "sce_1", "pass", "pass"],
  #    [["2", "sce_2", "pass", "pass"]].first]
```
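As an alternative to grouping, a single pass with `each_with_object` can pick the preferred row per sno directly, letting a "pass" row win over a "fail" row; a sketch of my own using the same sample data:

```ruby
arr = [["sno", "scenario", "result", "description"],
       ["1", "sce_1", "pass", "pass"],
       ["2", "sce_2", "pass", "pass"],
       ["1", "sec_1", "fail", "pass"],
       ["3", "sec_3", "fail", "pass"]]

# Keep one row per sno: a later "pass" row replaces a stored "fail" row,
# but nothing replaces a stored "pass" row.
best = arr[1..-1].each_with_object({}) do |row, h|
  h[row.first] = row unless h[row.first] && h[row.first][2] == "pass"
end

best.values
#=> [["1", "sce_1", "pass", "pass"],
#    ["2", "sce_2", "pass", "pass"],
#    ["3", "sec_3", "fail", "pass"]]
```

This keeps the "fail" row for sno "3" (no "pass" exists for it); add a `values.reject { |r| r[2] == "fail" }` step if those should be dropped as well.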