hadoop - Hive Query performance tuning -


I'm a newbie to Hadoop and Hive. Can anyone please suggest performance tuning steps for Apache Hive running on Cloudera 5.2.1?

What are the tuning parameters I can use to improve Hive query performance?

Hive version: Hive 0.13.1-cdh5.2.1

Hive query:

select distinct a1.chain_number chain_number, a1.chain_description chain_description from staff.organization_hierarchy a1;

The Hive table was created with the external option and "stored as text format"; the table properties are below:

After changing the Hive setting below, I have seen a 10-second improvement:

set hive.exec.parallel=true;

Can anyone please suggest other settings, apart from the one above, to improve Hive query performance for the type of query I am using?

You can use GROUP BY to replace DISTINCT, because there is only 1 reduce job for a DISTINCT job.

Try this:

select chain_number, chain_description from staff.organization_hierarchy group by chain_number, chain_description;
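To see whether the two forms are actually planned differently on your Hive version, you could compare their plans with EXPLAIN before running them. This is only a sketch for inspection; it reuses the table and columns from the question and does not execute the queries.

```sql
-- Plan for the original DISTINCT form
explain
select distinct a1.chain_number chain_number, a1.chain_description chain_description
from staff.organization_hierarchy a1;

-- Plan for the GROUP BY form
explain
select chain_number, chain_description
from staff.organization_hierarchy
group by chain_number, chain_description;
```

If both plans show the same stages, the rewrite alone is unlikely to change the run time much, and the reducer settings are the more useful thing to look at.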

If the number of reduce tasks is still too small, you can set it explicitly with the mapred.reduce.tasks configuration property.
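As a rough sketch of how the reducer count could be set in a Hive session: the values below (8 reducers, 64 MB per reducer, a cap of 32) are illustrative assumptions, not recommendations, and should be adjusted to your data size and cluster. The two options are alternatives; use one or the other.

```sql
-- Option A: force a fixed number of reduce tasks (8 is an assumed example value)
set mapred.reduce.tasks=8;

-- Option B: let Hive estimate the reducer count from the input size instead.
-- -1 tells Hive to compute the number of reducers itself.
-- set mapred.reduce.tasks=-1;
-- set hive.exec.reducers.bytes.per.reducer=67108864;  -- about 64 MB of input per reducer (assumed value)
-- set hive.exec.reducers.max=32;                      -- upper bound on reducers (assumed value)

select chain_number, chain_description
from staff.organization_hierarchy
group by chain_number, chain_description;
```

When the job starts, the Hive console output reports how many reduce tasks were chosen, so you can confirm that the setting took effect.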

