html - Python Find & Replace Beautiful Soup -



html - Python Find & Replace Beautiful Soup -

i using beautiful soup replace occurrences of pattern href link within html file

i facing problem described below

modified_contents = re.sub("([^http://*/s]app[a-z]{2}[0-9]{2})", "<a href=\"http://stack.com=\\1\">\\1</a>", str(soup))

sample input 1:

input file contains appdd34 output file contains <a href="http://stack.com=appdd34"> appdd34</a>

sample input 2:

input file contains <a href="http://stack.com=appdd34"> appdd34</a> output file contains <a href="http://stack.com=<a href="http://stack.com=appdd34"> appdd34</a>"> <a href="http://stack.com=appdd34"> appdd34</a></a>

desired output file 2 same sample input file 2.

how can rectify problem?

this may not exclusively reply problem because don't know entire input file like, hope direction can take.

from beautifulsoup import beautifulsoup, tag text = """appdd34""" soup = beautifulsoup(text) var1 = soup.text text = """&lt;a href="http://stack.com=appdd34"&gt; appdd34&lt;/a&gt;""" soup = beautifulsoup(text) var2 = soup.find('a').text soup = beautifulsoup("&lt;p>some new html&lt;/p&gt;") tag1 = tag(soup, "a",{'href':'http://stack.com='+var1,}) tag1.insert(0,var1) # insert text tag2 = tag(soup, "a",{'href':'http://stack.com='+var2,}) tag2.insert(0,var2) soup.insert(0,tag1) soup.insert(3,tag2) print soup.prettify()

so basically, utilize beautifulsoup extract text , can build tags there.

python html find beautifulsoup

Comments

Popular posts from this blog

iphone - Dismissing a UIAlertView -

c# - Can ProtoBuf-Net deserialize to a flat class? -

javascript - Change element in each JQuery tab to dynamically generated colors -