php - Convert spaces to &nbsp; between PRE tags, via DOM parser
Regex was my original thought for a solution, although it became apparent that a DOM parser would be more appropriate. I'd like to convert spaces to &nbsp; between pre tags within a string of HTML text. For example:
<table atrr="zxzx"><tr> <td>adfa adfadfaf></td><td><br /> dfa dfa</td> </tr></table> <pre class="abc" id="abc"> abc 123 <span class="abc">abc 123</span> </pre> <pre>123 123</pre>
into (note that the space in the span tag's attribute is preserved):
<table atrr="zxzx"><tr> <td>adfa adfadfaf></td><td><br /> dfa dfa</td> </tr></table> <pre class="abc" id="abc"> abc&nbsp;123 <span class="abc">abc&nbsp;123</span> </pre> <pre>123&nbsp;123</pre>
The result needs to be in serialised string format, so that it can be used elsewhere.
This is somewhat tricky when you want to insert &nbsp; entities without DOM converting the ampersand to &amp; entities, because entities are nodes and spaces are just character data. Here is how to do it:
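    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);
    // Select every text node that has a pre element among its ancestors.
    foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
        $remaining = $textNode;
        while (($nextSpace = strpos($remaining->wholeText, ' ')) !== false) {
            // Split at the space; the new node starts with that space.
            $remaining = $remaining->splitText($nextSpace);
            // Drop the leading space from the new node.
            $remaining->nodeValue = substr($remaining->nodeValue, 1);
            // Put an nbsp entity reference where the space used to be.
            $remaining->parentNode->insertBefore(
                $dom->createEntityReference('nbsp'),
                $remaining
            );
        }
    }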
Fetching all pre elements and working with their nodeValues doesn't work here, because the nodeValue attribute would contain the combined DOMText values of all the children, e.g. it would include the nodeValue of the span children. Setting the nodeValue on the pre element would delete those.
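To see why, here is a minimal sketch with a made-up one-line input (the variable names are only illustrative). It reads the nodeValue of a pre element and then assigns to it:

```php
<?php
// Illustrative input only; not the snippet from the question.
$dom = new DOMDocument;
$dom->loadHTML('<pre>abc 123 <span class="abc">abc 123</span></pre>');
$pre = $dom->getElementsByTagName('pre')->item(0);

// nodeValue concatenates the text of all descendants, the span included.
$combined = $pre->nodeValue;

// Assigning to nodeValue replaces all children with a single text node.
$pre->nodeValue = 'replaced';
$spansLeft = $pre->getElementsByTagName('span')->length;
```

After the assignment the span element is gone, which is exactly what we want to avoid.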
So instead of fetching the pre nodes, we fetch the DOMText nodes that have a pre element as a parent somewhere up their ancestor axis:
    DOMElement pre
      DOMText "abc 123"      <-- picking this
      DOMElement span
        DOMText "abc 123"    <-- and this one
    DOMElement pre
      DOMText "123 123"      <-- and this one
We then go through each of the DOMText nodes and split them into separate DOMText nodes at each space. We remove the space and insert an nbsp entity node before the split node, so in the end you get a tree like
    DOMElement pre
      DOMText "abc"
      DOMEntity nbsp
      DOMText "123"
      DOMElement span
        DOMText "abc"
        DOMEntity nbsp
        DOMText "123"
    DOMElement pre
      DOMText "123"
      DOMEntity nbsp
      DOMText "123"
Because we only worked with the DOMText nodes, any DOMElements are left untouched, so the span elements within the pre element are preserved.
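Since the question asks for a serialised string, the whole approach could be wrapped up roughly as follows. This is a sketch, not a tested drop-in: the function name nbspInPre is made up, it uses nodeValue instead of wholeText for the search, and it assumes the text inside pre is plain ASCII (strpos counts bytes while splitText counts characters, so the two offsets may disagree on multibyte text).

```php
<?php
// Hypothetical helper; the loop follows the split-and-insert idea described above.
function nbspInPre(string $html): string
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);

    // The XPath result is a static list, so modifying the tree while looping is safe.
    foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
        $remaining = $textNode;
        // Assumes ASCII text: byte offset and character offset are then identical.
        while (($pos = strpos($remaining->nodeValue, ' ')) !== false) {
            $remaining = $remaining->splitText($pos);
            $remaining->nodeValue = (string) substr($remaining->nodeValue, 1);
            $remaining->parentNode->insertBefore(
                $dom->createEntityReference('nbsp'),
                $remaining
            );
        }
    }

    // Serialise the whole document, wrapper elements included.
    return $dom->saveHTML();
}

$out = nbspInPre('<pre class="abc">abc 123 <span class="abc">abc 123</span></pre>');
```

The span keeps its class attribute untouched, and the space inside it comes back as an &nbsp; entity in the returned string.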
Caveat:
Your snippet is not valid because it doesn't have a root element. When using loadHTML, libxml will add any missing structure to the DOM, which means you will get your snippet back including a DOCTYPE, html and body tag.
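A quick way to observe this, using a made-up fragment rather than the snippet from the question:

```php
<?php
// Load a bare fragment without any root element.
$dom = new DOMDocument;
$dom->loadHTML('<pre>123 123</pre>');

// Serialise the full document; libxml has added the missing wrapper.
$serialised = $dom->saveHTML();
```

The string in $serialised should now contain a DOCTYPE declaration plus html and body elements around the pre.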
If you want the original snippet back, you'd have to use getElementsByTagName to get the body node and fetch all of its children to get the innerHTML. Unfortunately, there is no innerHTML function or property in PHP's DOM implementation, so we have to do that manually:
    $innerHtml = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        // Copy each child of body into its own temporary document and serialise it.
        $tmp_doc = new DOMDocument();
        $tmp_doc->appendChild($tmp_doc->importNode($child, true));
        $innerHtml .= $tmp_doc->saveHTML();
    }
    echo $innerHtml;
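As a possible alternative, assuming PHP 5.3.6 or later, saveHTML accepts an optional node argument, so the temporary document may not be needed. A sketch with an illustrative two-element input:

```php
<?php
// Illustrative input; any fragment loaded via loadHTML would do.
$dom = new DOMDocument;
$dom->loadHTML('<pre>123 123</pre><pre>abc</pre>');
$body = $dom->getElementsByTagName('body')->item(0);

$innerHtml = '';
foreach ($body->childNodes as $child) {
    // Serialise only this child, without DOCTYPE, html or body.
    $innerHtml .= $dom->saveHTML($child);
}
```

libxml may add line breaks between the serialised siblings, so compare the result loosely rather than byte for byte.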
Also see:
- innerHTML in PHP's DomDocument?
- Noob question about DOMDocument in PHP
- http://stackoverflow.com/search?q=user%3a208809+dom