Monday, July 28, 2014

Searching multiple columns using PostgreSQL text search

You can do multiple column text search out-of-the-box with PostgreSQL. Well, you can do it on up to four columns.

The trick is to use the tsvector "weight" label as a label representing the column.


Our schema:

create table multi_fields (
  persons_initials text not null,
  phone text not null,
  province char(2) not null
);

insert into multi_fields values
('ON','6045551212','BC'),
('BC','4165551212','ON'),
('ON','2505551212','BC');


Now we create a function to label our columns, and put an index on our table using that function. This example bypasses the text search parser. You'll want probably want to use setweight() and to_tsvector() || to_tsvector() ... in production.

create or replace function three_column_ts_vector(text, text, text) returns tsvector strict immutable language sql as '
  select ( $1 || '':1A '' || $2 || '':1B '' || $3 || '':1C'' )::tsvector;
';

create index on multi_fields using gin( three_column_ts_vector(persons_initials, phone, province) );


Some sample queries:

--where any column contains BC (3 results):
select
 *
from multi_fields
where 
 three_column_ts_vector(persons_initials, phone, province) @@ 'BC'::tsquery;
--where only the province column (label "C") contains BC (2 results):
select
 *
from multi_fields
where 
 three_column_ts_vector(persons_initials, phone, province) @@ 'BC:C'::tsquery;
--where only the persons_initials column (label "A") contains BC (1 result):
select
 *
from multi_fields
where 
 three_column_ts_vector(persons_initials, phone, province) @@ 'BC:A'::tsquery;
--where only the phone column (label "B") contains BC (0 results):
select
 *
from multi_fields
where 
 three_column_ts_vector(persons_initials, phone, province) @@ 'BC:B'::tsquery;
--where province is BC and the name is ON (2 results):
select
 *
from multi_fields
where 
 three_column_ts_vector(persons_initials, phone, province) @@ 'BC:C & ON:A'::tsquery;

No comments: