Wednesday, July 27, 2016

Crush Tools convdate

Convert dates on the command line
$ echo '7-25-2015-12:12:12' | convdate --verbose
2016-07-25-12:12:12
$ echo '7/25/16' | convdate --verbose -i '%m/%d/%y' -o '%Y-%m-%d'
2016-07-25

where
i is the input format, as per http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html 
o is the output format, as per http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html

Crush Tools subtotal

Given foo.csv:
group1,1,2
group1,1,3
group1,1,4
group2,1,5
group2,1,6

We want to subtotal columns 2 and 3 (individually) by group in column1:
subtotal --key 1 --sum 2,3 --delim , -i foo.csv
group1,1,2
group1,1,3
group1,1,4
,3,9

group2,1,5
group2,1,6
,2,11

Easy XPath against HTML

UPDATE. Even easier with xidel:

xidel 'http://example.com' --extract '//title'


ORIGINAL:

curl -L example.com | \
  tidy -asxml -numeric -utf8 | \
  sed -e 's/ xmlns.*=".*"//g' | \
  xml select -t -v "//title" -n

  1. Get the HTML content from http://example.com using curl
  2. Use HTML Tidy to tidy it up, covert it to XHTML, change &entities; to numeric ones, and set the encoding as UTF-8
  3. Use sed to remove the XML namespace, for simpler XPaths
  4. Use XML Starlet to select by XPath

You can output multiple columns like so:
curl -L example.com | \
  tidy -asxml -numeric -utf8 | \
  sed -e 's/ xmlns.*=".*"//g' | \
  xml select -t -v "//title" -o ','-v "//another" -n

Monday, July 25, 2016

Crush Tools dbstream with PostgreSQL

There's a command line library from Google called Crush Tools. It hasn't been updated for a while, but a few commands look useful, like pivot and aggregate.

There's also one called dbstream, which is a perl script to stream to/from a database.

Here's how to run it (on OS X anyways):

Install it (crush-tools, Perl package manager, Perl PostgreSQL driver):
$ brew install crush-tools
$ curl -L https://cpanmin.us | perl - --sudo App::cpanminus
$ sudo cpanm DBD::Pg

And run it:
$ dbstream -de , -ds dbi:Pg:dbname=foo -u alice -p secret -s "select 1,2;"

or insert each record from a file:
$ cut -d, -f1-2 input.csv | dbstream ... -s "insert into tbl values (?,?);"

or look up each record from a file:
$ cut -d, -f1 input.csv | dbstream ... -s "select name from tbl where code=?;"

or update up each record from a file:
$ awk -F, '{print $2","$1}' input.csv | dbstream ... -s "update tbl set name=? where id=?;"

where
-de is the input & output field separator
-ds is a Perl DBI datasource
-u username
-p password
-s SQL

It's here if it's not in your package manager: https://github.com/google/crush-tools