perl - How do I extract Amazon reviews from HTML?
I have been trying to write a Perl script to scrape Amazon and download product reviews, but I have been unable to do so. I have been using the Perl modules LWP::Simple and HTML::TreeBuilder::XPath to achieve this.
For this HTML:
<div id="revdata-dpreviewsmosthelpfulaui-r1gqhd9gmgbdxp" class="a-row a-spacing-small"> <span class="a-size-mini a-color-state a-text-bold"> verified purchase </span> <div class="a-section"> bought replace earlier model got lost in transit when moved. real handy helper have when making tortillas. follow recipe flour tortillas in little recipe book comes it. make few changes </div> </div> </div> </div>
I wanted to extract the product review, so I wrote:
use LWP::Simple;
#use HTML::TreeBuilder;
use HTML::TreeBuilder::XPath;

# Take the ASIN from the command line.
$asin = shift @ARGV or die "Usage: perl get_reviews.pl <asin>\n";

# Assemble the URL from the passed ASIN.
$url = "http://amazon.com/o/tg/detail/-/$asin/?vi=customer-reviews";

# Set up unescape-HTML rules. Quicker than URI::Escape.
%unescape = ('&quot;'=>'"', '&amp;'=>'&', '&nbsp;'=>' ');
$unescape_re = join '|' => keys %unescape;

# Request the URL.
$content = get($url);
die "Could not retrieve $url" unless $content;

$tree = HTML::TreeBuilder::XPath->new_from_content($content);
@data = $tree->findvalues('div[@class ="a-section"]');
foreach (@data) {
    print "$_\n";
}
But I am not getting any output. Can someone please point out my mistake?
I think the XPath should be '//div[@class ="a-section"]'
(the extra // at the beginning of the expression finds the div
anywhere in the HTML, not only directly under the context node).
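To see the difference between the two expressions, here is a minimal sketch that runs both against a cut-down copy of the markup instead of a live Amazon page. The review text and the variable names ($html, @relative, @anywhere) are made up for illustration, and I am assuming HTML::TreeBuilder::XPath is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Cut-down stand-in for the review markup in the question (illustrative text only).
my $html = <<'HTML';
<html><body>
<div class="a-row a-spacing-small">
  <span class="a-size-mini a-color-state a-text-bold">Verified Purchase</span>
  <div class="a-section">A real handy helper.</div>
</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Relative path: only div elements that are direct children of the
# context node are considered. The tree root has head and body as
# children, so the nested review div is not reached.
my @relative = $tree->findvalues('div[@class="a-section"]');

# Leading //: every div in the document is considered, at any depth.
my @anywhere = $tree->findvalues('//div[@class="a-section"]');

printf "relative path:   %d match(es)\n", scalar @relative;
printf "with leading //: %d match(es)\n", scalar @anywhere;
print "$_\n" for @anywhere;

# Free the tree explicitly, as recommended for HTML::TreeBuilder objects.
$tree->delete;
```

Note that @class="a-section" compares against the whole class attribute, so it will not match a div whose class is, say, "a-section a-spacing-none". If the real page uses several classes on that element, something like contains(@class, "a-section") may be needed instead.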