html - Python Find & Replace Beautiful Soup -

February 15, 2013

I am using Beautiful Soup to replace occurrences of a pattern with an href link within an HTML file.

I am facing the problem described below.

modified_contents = re.sub("([^http://*/s]app[a-z]{2}[0-9]{2})", "<a href=\"http://stack.com=\\1\">\\1</a>", str(soup))

Sample input 1:

The input file contains appdd34; the output file contains <a href="http://stack.com=appdd34"> appdd34</a>

Sample input 2:

The input file contains <a href="http://stack.com=appdd34"> appdd34</a>; the output file contains <a href="http://stack.com=<a href="http://stack.com=appdd34"> appdd34</a>"> <a href="http://stack.com=appdd34"> appdd34</a></a>

The desired output for sample 2 is the same as sample input 2.

How can I rectify this problem?

This may not entirely answer your problem because I don't know what the entire input file looks like, but I hope it is a direction you can take.

# Written for BeautifulSoup 3 on Python 2
from BeautifulSoup import BeautifulSoup, Tag

# Case 1: the pattern appears as plain text
text = """appdd34"""
soup = BeautifulSoup(text)
var1 = soup.text

# Case 2: the pattern is already wrapped in a link, so take the link text
text = """<a href="http://stack.com=appdd34"> appdd34</a>"""
soup = BeautifulSoup(text)
var2 = soup.find('a').text

# Build fresh <a> tags from the extracted text
soup = BeautifulSoup("<p>some new html</p>")
tag1 = Tag(soup, "a", {'href': 'http://stack.com=' + var1})
tag1.insert(0, var1)  # insert text
tag2 = Tag(soup, "a", {'href': 'http://stack.com=' + var2})
tag2.insert(0, var2)
soup.insert(0, tag1)
soup.insert(3, tag2)
print soup.prettify()

So basically, use BeautifulSoup to extract the text, and then you can build the tags from there.
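The same idea can be sketched with the newer bs4 package on Python 3. This is only an illustration under some assumptions: the pattern is taken to be `app` followed by two lowercase letters and two digits (from the question's regex), the link format `http://stack.com=` is copied from the samples, and `linkify` is a hypothetical helper name. Instead of running `re.sub` over `str(soup)`, it visits only text nodes and skips any that already sit inside an `<a>` tag, so existing links should be left untouched.

```python
import re
from bs4 import BeautifulSoup

# Assumed pattern, taken from the question's regex: "app" + 2 letters + 2 digits
PATTERN = re.compile(r"app[a-z]{2}[0-9]{2}")


def linkify(html):
    """Wrap pattern matches in <a> tags unless they are already inside a link."""
    soup = BeautifulSoup(html, "html.parser")

    # Snapshot the matching text nodes first, since the tree is modified below.
    for node in list(soup.find_all(string=PATTERN)):
        # Text that already lives inside an <a> tag is left alone.
        if node.find_parent("a") is not None:
            continue

        text = str(node)
        pos = 0
        for match in PATTERN.finditer(text):
            # Plain text before the match stays plain text.
            if match.start() > pos:
                node.insert_before(text[pos:match.start()])
            # Build the link around the matched token.
            link = soup.new_tag("a", href="http://stack.com=" + match.group(0))
            link.string = match.group(0)
            node.insert_before(link)
            pos = match.end()
        # Plain text after the last match.
        if pos < len(text):
            node.insert_before(text[pos:])
        # Drop the original text node now that its pieces are in place.
        node.extract()

    return str(soup)


if __name__ == "__main__":
    print(linkify("appdd34"))
    print(linkify('<a href="http://stack.com=appdd34"> appdd34</a>'))
```

Because attribute values such as the `href` are not text nodes, they are never searched, which is what avoids the nested-link output from sample 2. Running `linkify` a second time on its own output should leave it unchanged.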

python html find beautifulsoup
