hadoop - HDFS has the file but java.io.FileNotFoundException happens
I am running a MapReduce program on Hadoop. The InputFormat passes each file path to the mapper. I can check the file from the command line like this:
$ hadoop fs -ls hdfs://slave1.kdars.com:8020/user/hadoop/num_5/13.pdf
Found 1 items
-rwxrwxrwx   3 hdfs hdfs     184269 2015-03-31 22:50 hdfs://slave1.kdars.com:8020/user/hadoop/num_5/13.pdf
However, when I try to open the file on the mapper side, it does not work:
15/04/01 06:13:04 INFO mapreduce.Job: Task Id : attempt_1427882384950_0025_m_000002_2, Status : FAILED
Error: java.io.FileNotFoundException: hdfs:/slave1.kdars.com:8020/user/hadoop/num_5/13.pdf (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1111)
I checked that the InputFormat works fine and the mapper gets the right file path. The mapper code is this:
@Override
public void map(Text title, Text file, Context context) throws IOException, InterruptedException {
    long time = System.currentTimeMillis();
    SimpleDateFormat dayTime = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    String str = dayTime.format(new Date(time));
    File temp = new File(file.toString());
    if (temp.exists()) {
        DBManager.getInstance().insertSQL("INSERT `plagiarismdb`.`workflow` (`type`) VALUE ('" + temp + " exists')");
    } else {
        DBManager.getInstance().insertSQL("INSERT `plagiarismdb`.`workflow` (`type`) VALUE ('" + temp + " not exists')");
    }
}
Help me, please.
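The stack trace already points at the cause: the path is being opened with java.io.FileInputStream (via PDDocument.load on a File), which reads the local filesystem of the node running the task, not HDFS. That is why an hdfs:// path is reported as "No such file or directory" even though hadoop fs -ls can see it. The file has to be opened through the Hadoop FileSystem API instead. Below is a minimal sketch of that approach inside map(), assuming PDFBox 1.8.x (where PDDocument.load(InputStream) is available) and reusing the file and context parameters from the question; the names pdfPath, fs and in are just illustrative:

import java.io.InputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pdfbox.pdmodel.PDDocument;

// Inside map(): resolve the path against HDFS, not the local disk.
Path pdfPath = new Path(file.toString());
FileSystem fs = pdfPath.getFileSystem(context.getConfiguration());
InputStream in = fs.open(pdfPath);        // FSDataInputStream backed by HDFS
try {
    PDDocument doc = PDDocument.load(in); // PDFBox can load from a stream
    // ... process the document ...
    doc.close();
} finally {
    in.close();
}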
To first confirm from inside the mapper that the path really exists in HDFS, try this in your map method. First, import these (org.apache.hadoop.conf.Configuration is also needed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Then use this in the mapper method:
FileSystem fs = FileSystem.get(new Configuration());
Path path = new Path(value.toString());
System.out.println(path);
if (fs.exists(path))
    context.write(value, one);
else
    context.write(value, zero);
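Note that fs.exists(path) only confirms that the path is present in HDFS; to actually read the PDF it still has to be opened through fs.open(...) as sketched above, not through java.io.File. Inside a mapper it is also generally preferable to build the FileSystem from the job configuration, e.g. FileSystem.get(context.getConfiguration()), so the task picks up the same fs.defaultFS as the rest of the job; value, one and zero here are assumed to be fields of the surrounding mapper class.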