html - Substring in Python 3 given line number and offset


I'm trying to parse an HTML page with the HTMLParser library in Python 3. The function HTMLParser.getpos() returns the line number and offset of the last tag parsed.
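To make the positions concrete, here is a minimal sketch of recording what getpos() reports while parsing. The class name PositionRecorder and the sample markup are hypothetical, chosen only for illustration. It assumes the standard html.parser module, where line numbers start at 1 and offsets start at 0.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper that remembers where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []  # list of (tag, (line, offset)) pairs

    def handle_starttag(self, tag, attrs):
        # Called from inside feed(); getpos() returns (line, offset)
        # for the tag currently being handled.
        self.positions.append((tag, self.getpos()))


recorder = PositionRecorder()
recorder.feed("<html>\n  <body>\n    <p>hi</p>\n  </body>\n</html>")
print(recorder.positions)
```

Each recorded pair is a (line, offset) tuple, which is why it cannot be used directly as a slice index into the whole document.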

For example, if I know that the string I want starts at line number 10, offset 5, and ends at line number 30, offset 10, how can I get the substring from line 10 offset 5 to line 30 offset 10?

Thanks.

    from html.parser import HTMLParser

    html = 'this holds the entire html code'
    myparser = HTMLParser()
    myparser.feed(html)  # now the parser works its magic
    start = (10, 5)      # returned by HTMLParser.getpos(): 10 is the line number, 5 is the offset within that line
    end = (30, 10)       # same here
    # What I want (I know this is invalid Python code):
    substring = html.substring(start, end)  # return the html code as a string from line 10 offset 5 to line 30 offset 10

Better explanation:

I'm trying to take a substring of a string.

I understand that in Python 3 this is called a slice: string[a:b]. If I wanted the substring 'jonny' from the string 'hello jonny smith', I would do this:

    substring = 'hello jonny smith'[6:11]

The problem is that HTMLParser.getpos() returns a tuple (line number, offset within the line), so I can't do:

    substring = multy_line_string[line number:offset]

Assuming you are interested in HTML parsing in general, try lxml: http://docs.python-guide.org/en/latest/scenarios/scrape/
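If you would rather keep working with the positions from getpos(), one possible approach is to convert each (line, offset) tuple into an absolute index and then use an ordinary slice. The following is only a sketch: the function names pos_to_index and slice_between are hypothetical, and it assumes line numbers are 1-based, offsets are 0-based, and lines are separated by "\n".

```python
def pos_to_index(text, pos):
    """Convert a (line, offset) tuple into an absolute index into text.

    Assumes 1-based line numbers and 0-based offsets, with lines
    separated by a newline character.
    """
    line, offset = pos
    index = 0
    # Skip past (line - 1) newline characters to reach the start of the line.
    for _ in range(line - 1):
        index = text.index("\n", index) + 1
    return index + offset


def slice_between(text, start, end):
    """Return text from the start position up to (not including) the end position."""
    return text[pos_to_index(text, start):pos_to_index(text, end)]


sample = "first line\nsecond line\nthird line"
print(repr(slice_between(sample, (1, 6), (3, 5))))  # 'line\nsecond line\nthird'
```

With the values from the question this would be called as slice_between(html, (10, 5), (30, 10)). str.index raises ValueError if the text has fewer lines than the position asks for, which may or may not be the behavior you want.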

