html - Python Find & Replace Beautiful Soup
I am using Beautiful Soup to replace occurrences of a pattern with an href link within an HTML file.
I am facing the problem described below.
    modified_contents = re.sub("([^http://*/s]app[a-z]{2}[0-9]{2})", "<a href=\"http://stack.com=\\1\">\\1</a>", str(soup))

Sample input 1:

Input file contains:
    appdd34
Output file contains:
    <a href="http://stack.com=appdd34"> appdd34</a>

Sample input 2:

Input file contains:
    <a href="http://stack.com=appdd34"> appdd34</a>
Output file contains:
    <a href="http://stack.com=<a href="http://stack.com=appdd34"> appdd34</a>"> <a href="http://stack.com=appdd34"> appdd34</a></a>

The desired output for sample 2 is the same as sample input file 2, i.e. text that is already linked should be left unchanged.
How can I rectify this problem?
This may not exactly answer your problem, because I don't know what the entire input file looks like, but I hope it gives you a direction you can take.
    # BeautifulSoup 3 import style (Python 2)
    from BeautifulSoup import BeautifulSoup, Tag

    # Case 1: the bare text, not yet wrapped in a link
    text = """appdd34"""
    soup = BeautifulSoup(text)
    var1 = soup.text

    # Case 2: the text is already inside an <a> tag, so read it from the tag
    text = """<a href="http://stack.com=appdd34"> appdd34</a>"""
    soup = BeautifulSoup(text)
    var2 = soup.find('a').text

    # Build fresh <a> tags from the extracted text
    soup = BeautifulSoup("<p>some new html</p>")
    tag1 = Tag(soup, "a", {'href': 'http://stack.com=' + var1})
    tag1.insert(0, var1)  # insert text
    tag2 = Tag(soup, "a", {'href': 'http://stack.com=' + var2})
    tag2.insert(0, var2)
    soup.insert(0, tag1)
    soup.insert(3, tag2)
    print soup.prettify()

So basically, use BeautifulSoup to extract the text, and then you can build the tags from there.
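For readers on the newer bs4 package, the same idea (work on the extracted text rather than on str(soup)) might be sketched as below. This is only a sketch under some assumptions: it uses bs4 and Python 3 rather than the BeautifulSoup 3 import shown above, the names PATTERN and linkify are made up for illustration, and the regex is a simplified guess at the "app + two letters + two digits" pattern from the question. It only touches text nodes that are not already inside an <a> tag, so running it on sample input 2 should leave the file unchanged.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical pattern: "app", two lowercase letters, two digits (as in "appdd34").
PATTERN = re.compile(r"\bapp[a-z]{2}[0-9]{2}\b")


def linkify(html):
    """Wrap each PATTERN match in a link, skipping text already inside <a>."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(string=...) returns only text nodes, never attribute values,
    # so an existing href="...appdd34" is not rewritten.
    for node in list(soup.find_all(string=PATTERN)):
        if node.find_parent("a") is not None:
            continue  # already linked, leave it alone
        text = str(node)
        pos = 0
        for match in PATTERN.finditer(text):
            if match.start() > pos:
                # plain text before the match stays plain text
                node.insert_before(text[pos:match.start()])
            link = soup.new_tag("a", href="http://stack.com=" + match.group(0))
            link.string = match.group(0)
            node.insert_before(link)
            pos = match.end()
        if pos < len(text):
            node.insert_before(text[pos:])
        node.extract()  # drop the original, now fully replaced, text node
    return str(soup)


if __name__ == "__main__":
    print(linkify("appdd34"))
    print(linkify('<a href="http://stack.com=appdd34"> appdd34</a>'))
```

Because already-linked text is skipped, the function can be run repeatedly over the same file without nesting links inside links, which is the failure shown in sample input 2.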
python html find beautifulsoup