Extracting text nested within several tags with Beautiful Soup (Python)
I want to extract the text "12:25 - 30 mar 2015" with Beautiful Soup from the HTML below. This is how the HTML looks after being processed by BS:

<span class="u-floatleft"> · </span>
<span class="u-floatleft">
<a class="profiletweet-timestamp js-permalink js-nav js-tooltip" href="/tbantl/status/582333634931126272" title="5:08 pm - 29 mar 2015">
<span class="js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1427674132"> mar 29 </span>

I have this code, but it doesn't work:
date = soup.find("a",attrs={"class":"profiletweet-timestamp js-permalink js-nav js-tooltip"})["title"]
This works for me:

from bs4 import BeautifulSoup

html = """<span class="u-floatleft"> · </span>
<span class="u-floatleft">
<a class="profiletweet-timestamp js-permalink js-nav js-tooltip" href="/indoz1/status/582443448927543296" title="12:25 - 30 mar 2015">
<span class="js-short-timestamp " data-aria-label-part="last" data-time="1427700314" data-long-form="true">
"""

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")
date = soup.find("a", attrs={"class": "profiletweet-timestamp js-permalink js-nav js-tooltip"})["title"]

>>> print(date)
12:25 - 30 mar 2015

Without more information, I suspect you didn't transform the HTML snippet into a BeautifulSoup object. In that case, you are calling find() on a plain string, and you'd get TypeError: find() takes no keyword arguments.
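Or, as alexce points out in the comments above, the item you're looking for may not be present in the HTML you're parsing. In that case, find() returns None, so there is no date to read and the ["title"] lookup raises a TypeError of its own.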
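To tell the two failure cases apart, it can help to check the result of find() before indexing it. The following is only a sketch: the helper name extract_title is made up for illustration, and it matches on the single class profiletweet-timestamp rather than the full class string, which is an assumption about what identifies the link.

```python
from bs4 import BeautifulSoup

# Shortened version of the snippet from the answer above
html = (
    '<span class="u-floatleft">'
    '<a class="profiletweet-timestamp js-permalink js-nav js-tooltip" '
    'title="12:25 - 30 mar 2015"></a></span>'
)


def extract_title(markup):
    """Return the link's title attribute, or None if the link is missing.

    extract_title is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # class_ matches any <a> whose class list contains this one class
    link = soup.find("a", class_="profiletweet-timestamp")
    if link is None:
        return None
    # .get() returns None instead of raising if the attribute is absent
    return link.get("title")


print(extract_title(html))
print(extract_title("<p>no timestamp link here</p>"))
```

If the second call prints None for your real page, the element is not in the HTML you downloaded, and the problem is upstream of Beautiful Soup.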
Finally, unrelated to the issues you're having above: if you're going to parse the date into a datetime object, there's an easier way to do it. Grab the "data-time" field from <span class="js-short-timestamp " ... > and parse it using datetime.datetime.fromtimestamp:
from datetime import datetime as dt

# timestamp is the "data-time" field value, as a string
data_time = dt.fromtimestamp(int(timestamp))

>>> data_time  # local time, so the exact value depends on your time zone
datetime.datetime(2015, 3, 30, 3, 25, 14)
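Because fromtimestamp without a time zone returns local time, the same "data-time" value gives different results on different machines. A small sketch of a time-zone-independent variant, assuming timestamp already holds the attribute value from the snippet above as a string:

```python
from datetime import datetime, timezone

# Assumed input: the "data-time" attribute value, read as a string
timestamp = "1427700314"

# Passing tz makes the result timezone-aware and identical on every machine
utc_time = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)

print(utc_time.isoformat())  # 2015-03-30T07:25:14+00:00
```

From there, astimezone() can convert to whatever zone you want to display.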