hadoop - Hive query performance tuning
I'm a newbie to Hadoop and Hive. Can anyone please suggest performance tuning steps for Apache Hive running on Cloudera 5.2.1?
What are the tuning parameters to improve Hive query performance?
Hive version: Hive 0.13.1-cdh5.2.1
Hive query:
select distinct a1.chain_number chain_number, a1.chain_description chain_description from staff.organization_hierarchy a1;
The Hive table was created with the external option and is stored in text format.
After changing the Hive setting below, I have seen a 10-second improvement:
set hive.exec.parallel=true;
Can anyone please suggest other settings, apart from the one above, to improve Hive query performance for the type of query I am using?
You can use GROUP BY to replace DISTINCT, because there is only 1 reduce job for a DISTINCT job.
Try this:
select chain_number, chain_description from staff.organization_hierarchy group by chain_number, chain_description;
If the number of reduce jobs is still small, you can specify it using the mapred.reduce.tasks configuration.
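A minimal sketch of that suggestion, assuming the standard Hadoop property name `mapred.reduce.tasks`. The value 8 is only an illustrative placeholder, not a recommendation, and should be adjusted to the cluster and data size:

```sql
-- Assumption: 8 reducers is a placeholder value; tune it for your cluster.
set mapred.reduce.tasks=8;

-- Run the GROUP BY form of the query in the same session.
select chain_number, chain_description
from staff.organization_hierarchy
group by chain_number, chain_description;

-- Setting the property back to -1 (the default) lets Hive choose the reducer count again.
set mapred.reduce.tasks=-1;
```

The setting applies only to the current Hive session, so it can be tried on one query without affecting other jobs.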