Apache Pig - store files based on date column
Please help me. I have the scenario below. Input file:
id name time-stamp
1234 kiran 18-mar-2015 01:02:31
1234 kiran 18-mar-2015 01:02:31
1234 kiran 19-mar-2015 01:02:31
1234 kiran 18-mar-2015 11:02:31
1234 kiran 20-mar-2015 01:02:00
1234 kiran 11-mar-2015 21:12:31
1234 kiran 18-mar-2015 01:02:31
1234 kiran 30-mar-2015 01:02:31
1234 kiran 22-mar-2015 01:11:00
1234 kiran 30-mar-2015 01:02:31
1234 kiran 19-mar-2015 01:02:00
Now I need to write output files based on the dates in the time-stamp column. The output should be:
user/username/date/part-m-000000
-- here date is a variable, so the folder name should be, for example:
user/username/18-mar-2015/part-m-000000
The above file should contain only the values for that single date:
1234 kiran 18-mar-2015 01:02:31
1234 kiran 18-mar-2015 01:02:31
1234 kiran 18-mar-2015 11:02:31
1234 kiran 18-mar-2015 01:02:31
Another folder name should be:
user/username/19-mar-2015/part-m-000000
The above file should contain only the values for that single date:
1234 kiran 19-mar-2015 01:02:31
1234 kiran 19-mar-2015 01:02:00
Another folder name should be:
user/username/20-mar-2015/part-m-000000
The above file should contain only the values for that single date:
1234 kiran 20-mar-2015 01:02:00
Another folder name should be:
user/username/22-mar-2015/part-m-000000
The above file should contain only the values for that single date:
1234 kiran 22-mar-2015 01:11:00
Another folder name should be:
user/username/30-mar-2015/part-m-000000
The above file should contain only the values for that single date:
1234 kiran 30-mar-2015 01:02:31
1234 kiran 30-mar-2015 01:02:31
Please help me.
Thank you, Sree
The steps below should work:
1. Use date functions to convert the time-stamp to the required format.
2. Group by date.
3. Flatten the grouped records.
4. Save the result of step 3 using org.apache.pig.piggybank.storage.MultiStorage.
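The steps above can be sketched roughly as follows. This is only an illustration under some assumptions that are not in the question: the input is tab-delimited with three fields and no header row, the input path 'input' and the piggybank.jar location are placeholders, and the relation names are made up. Since the folder name wanted here is just the first 11 characters of the time-stamp, the sketch takes it with SUBSTRING rather than with the date functions.

```pig
-- Placeholder path: point this at wherever piggybank.jar lives on your cluster.
REGISTER /path/to/piggybank.jar;

-- Assumption: tab-delimited input, three fields, no header row.
raw = LOAD 'input' USING PigStorage('\t')
      AS (id:chararray, name:chararray, ts:chararray);

-- Step 1: derive the date part, e.g. '18-mar-2015' from '18-mar-2015 01:02:31'.
with_date = FOREACH raw GENERATE id, name, ts, SUBSTRING(ts, 0, 11) AS dt;

-- Step 2: group by date.
by_date = GROUP with_date BY dt;

-- Step 3: flatten the grouped records back into rows (id, name, ts, dt).
flat = FOREACH by_date GENERATE FLATTEN(with_date);

-- Step 4: write one sub-directory per value of field index 3 (dt).
STORE flat INTO 'user/username'
      USING org.apache.pig.piggybank.storage.MultiStorage('user/username', '3', 'none', '\t');
```

Two things to check against your Pig version: the stored rows will also carry the extra dt column, and as far as I recall MultiStorage names the files inside each date folder after the key value rather than part-m-000000.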