Ruby - sanitizing content from open(url).read
I am using Ruby to open a URL and read its content. The content type of the file I am reading is 'text/plain'.
The issue is that it contains characters that I want to escape. For example, one of the characters coming in the plain text is "\240" for an ASCII hyphen.
I am curious how this is being generated, because I don't see a hyphen anywhere in the text. Yet it exists invisibly, and "\240" shows up when I use puts
to print the text in the console.
Second, how do I escape such instances of weird characters? Ideally, I want to escape all characters of the form "\[some number]". I am using
"\240".gsub(Regexp.new("\\\d+"),"")
but that doesn't seem to work.
Are there more traditional ways of sanitizing plain text content read by opening a URL?
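A likely explanation for both symptoms: "\240" is how Ruby prints a single byte using octal notation (octal 240 is decimal 160, or 0xA0), not a backslash followed by three digits. If the page is served as ISO-8859-1 (Latin-1), which is an assumption here and not stated in the question, 0xA0 is a non-breaking space, which would explain why nothing is visible in the text. The sketch below shows why a digit-based pattern never matches such a byte.

```ruby
# "\240" is an octal escape: one byte with value 160 (0xA0).
# .b gives a binary (ASCII-8BIT) copy so we can inspect the raw byte safely.
raw = "\240".b

byte_count  = raw.bytesize   # one byte, not four characters
byte_values = raw.bytes      # the numeric value of that byte

# The pattern from the question looks for literal digit characters,
# but the string contains none, so nothing is removed.
attempt = raw.gsub(Regexp.new("\\\d+"), "")
unchanged = (attempt == raw)

# ASSUMPTION: the source text is ISO-8859-1. Under that encoding,
# byte 0xA0 decodes to U+00A0 (NO-BREAK SPACE).
decoded = raw.force_encoding("ISO-8859-1").encode("UTF-8")

p byte_count, byte_values, unchanged, decoded == "\u00A0"
```

If the server actually sends a different encoding, the decoded character will differ, so it is worth checking the charset in the response headers before relying on this.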
After having a play with this, I found that the following regular expression does the trick for me:
str.gsub(/[^\x00-\x7f]/,'')
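As a rough sketch of how that expression could be wrapped for reuse, the two helpers below are hypothetical names, not part of any library. The first drops every byte outside the ASCII range, working on a binary copy so that invalid byte sequences do not trip up the regular expression engine. The second is an alternative that assumes the input is ISO-8859-1 and converts it to UTF-8 instead of discarding characters, replacing non-breaking spaces with ordinary ones.

```ruby
# Hypothetical helper: remove every byte outside 0x00-0x7F.
# Working on a binary copy treats each byte as one character,
# regardless of the encoding the string was tagged with.
def strip_non_ascii(text)
  text.b.gsub(/[^\x00-\x7f]/, "").force_encoding("UTF-8")
end

# Hypothetical helper: keep the characters instead of dropping them.
# ASSUMPTION: the input bytes are ISO-8859-1 (Latin-1).
def latin1_to_utf8(text)
  utf8 = text.b.force_encoding("ISO-8859-1").encode("UTF-8")
  utf8.tr("\u00A0", " ")  # turn non-breaking spaces into plain spaces
end

sample   = "price\240 10"          # contains one 0xA0 byte after "price"
stripped = strip_non_ascii(sample)
spaced   = latin1_to_utf8(sample)

puts stripped
puts spaced
```

Stripping is the simplest option when only ASCII is expected, but it silently discards accented letters and similar characters; transcoding preserves them at the cost of needing to know the source encoding.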
ruby text sanitization