python - Scrapy times out when doing a broad search
I'm trying to get music file types from the last 3 years out of the copyright.gov website using Scrapy, but I keep getting this error:

    User timeout caused connection failure: Getting http://cocatalog.loc.gov/cgi-bin/pwebrecon.cgi?pid=jadxim18gk9yx6t-bsyc9oabskwhr&seq=20150331032850&cnt=25&hist=1&search_arg=pau003%3f&search_code=ft%2a took longer than 180 seconds.

I know there is a restriction on the site (even a manual search makes the website time out). Here is my spider:

    import scrapy
    from scrapy.http import FormRequest
    from scrapy.utils.response import open_in_browser


    class CopyrightSpider(scrapy.Spider):
        name = "copyright_records"
        start_urls = ["http://cocatalog.loc.gov/cgi-bin/pwebrecon.cgi?db=local&page=first"]

        def parse(self, response):
            # Fill in and submit the catalog's search form.
            yield FormRequest.from_response(
                response,
                formname='querybox',
                formdata={'search_arg': 'music?', 'search_code': 'ft*'},
                cookies={'s_sess': '%20s_cc%3dtrue%3b%20s_sq%3d%3b',
                         's_vi': '[cs]v1|2a8cd884851d46db-400019054027b53d[ce]'},
                callback=self.parse1)

        def parse1(self, response):
            # Open the search results in a browser for inspection.
            open_in_browser(response)

Is there a way around the timeout problem?
Set DOWNLOAD_TIMEOUT in settings.py (the default value is 180 seconds):

    DOWNLOAD_TIMEOUT = 360  # 6 minutes