xidel 'http://example.com' --extract '//title'
ORIGINAL:
curl -L example.com | \
  tidy -asxml -numeric -utf8 | \
  sed -e 's/ xmlns.*=".*"//g' | \
  xml select -t -v "//title" -n
- Get the HTML content from http://example.com using curl
- Use HTML Tidy to clean it up, convert it to XHTML, change &entities; to numeric ones, and set the encoding to UTF-8
- Use sed to remove the XML namespace declaration, so the XPath expressions need no namespace prefix
- Use XMLStarlet to select by XPath
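The sed step is the least obvious one, so here is a minimal sketch of what it does in isolation. The `xhtml` string is a made-up stand-in for Tidy's output, not real output from example.com, and the variable names are only for illustration. XPath 1.0 matches an unprefixed name like `title` only against elements in no namespace, so while the `xmlns` declaration is present `//title` finds nothing.

```shell
# Hypothetical one-line XHTML document standing in for tidy's output.
xhtml='<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Example Domain</title></head></html>'

# Same sed expression as in the pipeline: drop the xmlns="..." declaration.
stripped=$(printf '%s\n' "$xhtml" | sed -e 's/ xmlns.*=".*"//g')

printf '%s\n' "$stripped"
# prints: <html><head><title>Example Domain</title></head></html>
```

Note that the pattern is greedy: it removes everything from ` xmlns` up to the last double quote on the same line, so any attributes that follow the declaration on that line are dropped too. That is harmless here, since only the element text is extracted. If you would rather keep the namespace, XMLStarlet's `-N` option can bind a prefix to it instead (for example `-N x=http://www.w3.org/1999/xhtml` with `//x:title`).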
You can output multiple columns like so:
curl -L example.com | \
  tidy -asxml -numeric -utf8 | \
  sed -e 's/ xmlns.*=".*"//g' | \
  xml select -t -v "//title" -o ',' -v "//another" -n