Friday, December 19, 2014

Sharing records with PostgreSQL Row Security

We can also share protected rows. We can borrow PostgreSQL's built-in aclitem (Access Control List Item) type and aclexplode() function.

This is in no way optimized for performance.

aclitem is "internal", so may change at any time.

create table protected.shareable_docs (
  id int primary key,
  name text not null,
  owner name not null default current_user,
  row_acl aclitem[]
);

/* No joins in updateable views */
create view shareable_docs with ( security_barrier ) as 
  select
    *
  from protected.shareable_docs
  where
    owner = current_user
    or exists (select 1 from aclexplode(row_acl) where privilege_type = 'SELECT' and pg_has_role(grantee, 'member') )
with check option;

grant delete, insert, select, update on shareable_docs to alice, bob;

set role alice;

/* alice creates a doc and shares it with bob */
insert into shareable_docs (id, name, row_acl) values (1, 'shared doc 1', '{bob=r/alice}');

set role bob;

select count(*) from shareable_docs; --result = 1

You'd need to write triggers to prevent users from deleting shared items though.

An ACL Item is structured like this: grantee=privileges/grantor . Grantee is the role (a user or group) that gets the privileges, and grantor is the role that gives the privileges. Here are some of the privilege codes:
r = SELECT (read)
w = UPDATE (write)
d = DELETE
* = WITH GRANT OPTION
So if alice wanted to grant SELECT (WITH GRANT OPTION), UPDATE, and DELETE privileges to role editors, your aclitem would look like this: editors=r*wd/alice

Thursday, December 18, 2014

Simple Row Security with PostgreSQL 9.4

PostgreSQL 9.4 makes row security a whole lot easier:
  • security_barrier views are update-able
  • WITH CHECK OPTION prevents users from inserting, updating, or deleting rows that they can't / won't be able to see
create role alice;

create role bob;

create schema protected;

create table protected.bank_accounts (
 id int primary key,
 name text not null,
 owner name not null default current_user,
 balance decimal(19,2) not null
);

create view bank_accounts with ( security_barrier ) as 
 select
  *
 from protected.bank_accounts
 where
  owner = current_user
with check option;

grant delete, insert, select, update on bank_accounts to alice, bob;

Users can't set owner to a role in which they don't have membership:
set role alice;

insert into bank_accounts values (1, 'chequeing', 'bob', 500);

ERROR:  new row violates WITH CHECK OPTION for view "bank_accounts"
Users can only see their own stuff:
set role alice;

insert into bank_accounts values (1, 'chequeing', 'alice', 500);

select count(*) from bank_accounts; --result = 1

set role bob;

select count(*) from bank_accounts; --result = 0

delete from bank_accounts; --0 rows affected

Monday, December 8, 2014

MS SQL Server to PostgreSQL Quick Copy

You can quickly copy data from an MS SQL Server database to PostgreSQL using the bcp command, iconv, and psql's \copy command.

I copied 100,000 records from one server, to my laptop, to another server in 1.5 seconds.

Sadly, bcp must write to a file first, so you can't pipe it directly. Delete the file when you're done.

In Linux, you could create a named pipe and pretend it's a file, using the mkfifo command.

bcp can only output Unicode in UTF-16, so we must use iconv to convert the output to UTF-8

I assume that you have psql installed, and that it's configured to connect to your PostgreSQL server.

Windows
  1. If bcp is not installed (run bcp.exe -v) then install the Microsoft Command Line Utilities 11 for SQL Server
  2. If win_iconv is not installed, download it, rename it to iconv.exe and put it in your PATH
Linux
  1. If you have not installed the Microsoft ODBC Driver for SQL Server on Linux, you can get it here
In your shell run (this assumes you are using ActiveDirectory/Kerberos auth. Use -U, -P instead of -T if not):
bcp "select * from some_table" queryout results.tsv -T -S serverHostName -w

then:
iconv -f UTF16 -t UTF8 results.tsv | psql -c "\copy some_table from STDIN"
Delete results.tsv

Saturday, December 6, 2014

Fuzzy Record Matching in SQL, Part 1

Let's say you have some existing customer records like this:

Haystack
idNamePhone
1Neil6045551212
2Neil(null)
3(null)6045551212
4(null)(null)

And you have some new customer records like this (say from another company you bought):

Needles
idNamePhone
1Neil6045551212

And you want to find possible matches. Now obviously Haystack1 and Needles1 match, but Haystack2 and Haystack3 are possible matches too.

Here's a basic way to find relevant matches, ranked:

select
 n.id as needle_id,
 h.id as haystack_id,
 case when n.name = h.name then 1 else 0 end 
 + case when n.phone = h.phone then 1 else 0 end as relevance
from 
 needles n
join 
 haystack h 
on 
 n.name = h.name 
 or n.phone = h.phone
order by 
 relevance desc;


This gives us:

Results
needle_idhaystack_idrelevance
112
121
131