Friday, February 14, 2014

Trees and Paths using WITH RECURSIVE in PostgreSQL

Let's say we had some data like this - from the United Nations geoscheme for world regions:

create table subregions (
  id smallint primary key,
  name text not null,
  parent_id smallint null references subregions(id)
);

insert into subregions values
(1,'World',null),
(2,'Africa',1),
(5,'South America',419),
(9,'Oceania',1),
(11,'Western Africa',2),
(13,'Central America',419),
(14,'Eastern Africa',2),
(15,'Northern Africa',2),
(17,'Middle Africa',2),
(18,'Southern Africa',2),
(19,'Americas',1),
(21,'Northern America',19),
(29,'Caribbean',419),
(30,'Eastern Asia',142),
(34,'Southern Asia',142),
(35,'South-Eastern Asia',142),
(39,'Southern Europe',150),
(53,'Australia and New Zealand',9),
(54,'Melanesia',9),
(57,'Micronesia',9),
(61,'Polynesia',9),
(142,'Asia',1),
(143,'Central Asia',142),
(145,'Western Asia',142),
(150,'Europe',1),
(151,'Eastern Europe',150),
(154,'Northern Europe',150),
(155,'Western Europe',150),
(419,'Latin America and the Caribbean',19);
And you wanted to make a pretty tree like this:

World
   Africa
      Eastern Africa
      Middle Africa
      Northern Africa
      Southern Africa
      Western Africa
   Americas
      Latin America and the Caribbean
         Caribbean
         Central America
         South America
      Northern America
   Asia
      Central Asia
      Eastern Asia
      South-Eastern Asia
      Southern Asia
      Western Asia
   Europe
      Eastern Europe
      Northern Europe
      Southern Europe
      Western Europe
   Oceania
      Australia and New Zealand
      Melanesia
      Micronesia
      Polynesia

Here's how you'd do it:

with recursive my_expression as (
  
  --start with the "anchor", i.e. all of the nodes whose parent_id is null:
  select
    id, 
    name as path,
    name as tree,
    0 as level 
  from subregions
  where 
    parent_id is null

  union all

  --then the recursive part:
  select
    current.id as id,
    previous.path || ' > ' || current.name as path,
    repeat('   ', previous.level + 1) || current.name as tree,
    previous.level + 1 as level
  from subregions current 
  join my_expression as previous on current.parent_id = previous.id
)
select
  tree
from my_expression
order by 
  path

You can think of WITH RECURSIVE as a chain of UNION statements. A good explanation here: How does a Recursive CTE run, line by line?

You can also show paths like this:

select
  path 
from my_expression
order by
  path

World
World > Africa
World > Africa > Eastern Africa
World > Africa > Middle Africa
World > Africa > Northern Africa
World > Africa > Southern Africa
World > Africa > Western Africa
World > Americas
World > Americas > Latin America and the Caribbean
World > Americas > Latin America and the Caribbean > Caribbean
World > Americas > Latin America and the Caribbean > Central America
World > Americas > Latin America and the Caribbean > South America
World > Americas > Northern America
World > Asia
World > Asia > Central Asia
World > Asia > Eastern Asia
World > Asia > South-Eastern Asia
World > Asia > Southern Asia
World > Asia > Western Asia
World > Europe
World > Europe > Eastern Europe
World > Europe > Northern Europe
World > Europe > Southern Europe
World > Europe > Western Europe
World > Oceania
World > Oceania > Australia and New Zealand
World > Oceania > Melanesia
World > Oceania > Micronesia
World > Oceania > Polynesia
Fiddle with it.

2 comments:

Paul D said...

Wow... this will be awesome when I understand what is going on with this.

Anonymous said...

How performant is this?