perl - How do I extract Amazon reviews from HTML?


I have been trying to write a Perl script to scrape Amazon and download product reviews, but I have been unable to do so. I have been using the Perl modules LWP::Simple and HTML::TreeBuilder::XPath to achieve this.

For this HTML:

<div id="revdata-dpreviewsmosthelpfulaui-r1gqhd9gmgbdxp" class="a-row a-spacing-small">
  <span class="a-size-mini a-color-state a-text-bold">
    verified purchase
  </span>
  <div class="a-section">
    bought replace earlier model got lost in transit when moved. real handy helper have when making tortillas. follow recipe flour tortillas in little recipe book comes it. make few changes
  </div>
</div>
</div>
</div>

I wanted to extract the product review. For this I wrote:

use LWP::Simple;
#use HTML::TreeBuilder;
use HTML::TreeBuilder::XPath;

# Take the ASIN from the command line.
$asin = shift @ARGV or die "Usage: perl get_reviews.pl <asin>\n";

# Assemble the URL from the passed ASIN.
$url = "http://amazon.com/o/tg/detail/-/$asin/?vi=customer-reviews";

# Set up unescape-HTML rules. Quicker than URI::Escape.
%unescape = ('&quot;'=>'"', '&amp;'=>'&', '&nbsp;'=>' ');
$unescape_re = join '|' => keys %unescape;

# Request the URL.
$content = get($url);
die "Could not retrieve $url" unless $content;

$tree = HTML::TreeBuilder::XPath->new_from_content($content);
@data = $tree->findvalues('div[@class ="a-section"]');

foreach (@data) {
    print "$_\n";
}

But I am not getting any output. Can you please point out my mistake?

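I think the XPath should be '//div[@class ="a-section"]' (with an extra // at the beginning of the expression, to find the div anywhere in the HTML). Without the leading //, the expression is relative to the node it is evaluated on, so it only looks for div elements that are direct children of the root html element, and there are none.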
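To see the difference without hitting Amazon, here is a minimal sketch that runs both expressions against a small inline document. The HTML string, the review-1/review-2 ids, and the extract_values helper are made up for illustration and only mimic the shape of the snippet in the question. The real page markup may differ.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for the downloaded page (not real Amazon markup).
my $html = <<'HTML';
<html><body>
<div id="review-1" class="a-row a-spacing-small">
  <span class="a-size-mini a-color-state a-text-bold">Verified Purchase</span>
  <div class="a-section">First review text.</div>
</div>
<div id="review-2" class="a-row a-spacing-small">
  <div class="a-section">Second review text.</div>
</div>
</body></html>
HTML

# Illustrative helper: return the text value of every node matching $xpath.
sub extract_values {
    my ($source, $xpath) = @_;
    my $tree   = HTML::TreeBuilder::XPath->new_from_content($source);
    my @values = $tree->findvalues($xpath);
    $tree->delete;    # release the parse tree
    return @values;
}

# Relative path: only considers div children of the root html element.
my @relative = extract_values($html, 'div[@class="a-section"]');

# Leading //: considers div elements at any depth in the document.
my @anywhere = extract_values($html, '//div[@class="a-section"]');

printf "relative path matched %d node(s)\n", scalar @relative;
printf "// path matched %d node(s)\n",       scalar @anywhere;
print "$_\n" for @anywhere;
```

Note that @class="a-section" only matches when the class attribute is exactly that string. If the real page puts several classes on the review div, something like '//div[contains(@class, "a-section")]' may be needed instead.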

