sas - Variable check and summary out -
problem/question
i'm trying simple check on on list of variables in data set (revenue, costs, profits, , vcosts) grabs largest , second largest each variable, checks if total greater 90% of sum of variable, , if so, flags variable. want check largest variable not larger 60% of total sum.
i got bit of macro outputs table testing results of sas tablemacro outputs table testing results of sas table i'm trying answer more basic question. doesn't seem hard, can't figure out how setup basic table @ end.
i know variable names.
here sample dataset i've created : https://www.dropbox.com/s/x575w5d551uu47p/dataset%20%281%29.csv?dl=0
desired output
i turn basic table :
into table :
reproducible example
/* create dummy data 3 variables assess */ data have; firm = 1 3; revenue = rand("uniform"); costs = rand("uniform"); profits = rand("uniform"); vcost = rand("uniform"); output; end; run;
based on comment on previous answer. looks top_2_total sum of 2 maximum total values. purpose need code step. i'm using proc transpose , datastep acheieved in previous answer. have coded proc summary top 2 maximum total values , reusing dataset create final answer. let me know if helps.
data have; firm = 1 3; revenue = rand("uniform"); costs = rand("uniform"); profits = rand("uniform"); vcost = rand("uniform"); output; end; run; proc transpose data=have out=want prefix=top_; var revenue--vcost; run; data want; set want end=eof; array top(*) top_3-top_1; call sortn(of top[*]); total=sum(of top[*]); run; /* getting maximum 2 total values using proc summary*/ proc summary data=want nway; output out=total_top_2_rec(drop=_:) idgroup(max(total) out[2](total)=); run; data want; /* loop values previous step , generate top_2_total variable */ if _n_=1 set total_top_2_rec; top_2_total=sum(total_1,total_2); set want; if sum(top_1,top_2) > 0.9 * top_2_total flag90=1; else flag90=0; if top_1 > top_2_total * 0.6 flag60=1; else flag60=0; drop total_1 total_2; run; proc print data=want;run;
edit : have added logic before proc transpose can add variables consider calculation , rest done code. no manual changes required done code executor after that. variables should entered space delimited list.
data have; infile 'c:\dataset (1).csv' missover dsd dlm=',' firstobs=2; input firm v1 v2 v3; run; /* add/remove columns here consider variable */ %let variable_to_consider=v1 v2 v3 ; %let variable_to_consider=%cmpres(&variable_to_consider); proc sql noprint; select count(*) : obs_count have; quit; %let obs_count=&obs_count; proc transpose data=have out=want prefix=top_; var &variable_to_consider; run; data want; set want end=eof; array top(*) top_&obs_count.-top_1; x=dim(top); call sortn(of top[*]); total=sum(of top[*]); keep total top_1 top_2 _name_; run; /* getting maximum 2 total values using proc summary*/ proc summary data=want nway; output out=total_top_2_rec(drop=_:) idgroup(max(total) out[2](total)=); run; data want; /* loop values previous step , generate top_2_total variable */ if _n_=1 set total_top_2_rec; top_2_total=sum(total_1,total_2); set want; if sum(top_1,top_2) > 0.9 * top_2_total flag90=1; else flag90=0; if top_1 > top_2_total * 0.6 flag60=1; else flag60=0; drop total_1 total_2; run; proc print data=want;run;
edit 2014-04-05 : discussed, have updated logic , fixed issues. below updated code.
data have1; firm = 1 3; revenue = rand("uniform"); costs = rand("uniform"); profits = rand("uniform"); vcost = rand("uniform"); output; end; run; data have2; infile 'dataset (1).csv' missover dsd dlm=',' firstobs=2; input firm v1 v2 v3; run; /* add/remove columns here consider variable */ %macro mymacro(input_dataset= ,output_dataset=, variable_to_consider=); %let variable_to_consider=%cmpres(&variable_to_consider); proc sql noprint; select count(*) : obs_count &input_dataset; quit; %let obs_count=&obs_count; proc transpose data=&input_dataset out=&output_dataset prefix=top_; var &variable_to_consider; run; data &output_dataset; set &output_dataset end=eof; array top(*) top_&obs_count.-top_1; x=dim(top); call sortn(of top[*]); total=sum(of top[*]); top_2_total=sum(top_1, top_2); if sum(top_1,top_2) > 0.9 * total flag90=1; else flag90=0; if top_1 > total * 0.6 flag60=1; else flag60=0; keep total top_1 top_2 _name_ top_2_total total flag60 flag90; run; %mend mymacro; %mymacro(input_dataset=have1, output_dataset=want1 ,variable_to_consider=revenue costs profits vcost) %mymacro(input_dataset=have2, output_dataset=want2 ,variable_to_consider=v1 v2 v3 ) proc print data=want1;run; proc print data=want2;run;
Comments
Post a Comment