GitHub - ISPT/prefix: Prefix Range module for PostgreSQL

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 84 Commits
debian		debian
.gitignore		.gitignore
Makefile		Makefile
README.txt		README.txt
TESTS.txt		TESTS.txt
prefix.c		prefix.c
prefix.spec		prefix.spec
prefix.sql.in		prefix.sql.in
prefixes.fr.csv		prefixes.fr.csv
prefixes.sql		prefixes.sql
queries.sql		queries.sql
ranges.sql		ranges.sql

Repository files navigation

= Prefix Opclass

This module is written by Dimitri Fontaine with a great amount of help
from RhodiumToad (formely known as AndrewSN), who was the one advising
for a 'GiST opclass' to solve the prefix matching problem.

== Presentation

Prefix matching is both very common and important in telephony
applications, where call routing and costs depend on matching
caller/callee phone number to an operator prefix.

Let's say the prefixes table is called +prefixes+, a typical query
will try to match a phone number to the longest prefix in the table:

    SELECT * 
      FROM prefixes
     WHERE prefix @> '0123456789'
  ORDER BY length(prefix) DESC
     LIMIT 1;

== Installation

Check +$PATH+, then

  make
  make install
  psql <connection string> -f prefix.sql <database>

The +make install+ step might have to be done as +root+, and the
psql one has to be done as a PostgreSQL 'superuser'.

== Uninstall

It's as easy as:

  DROP TYPE prefix_range CASCADE;

== Usage

=== Table and index creation

  create table prefixes (
         prefix    prefix_range primary key,
         name      text not null,
         shortname text,
         status    char default 'S',

         check( status in ('S', 'R') )
  );
  comment on column prefixes.status is 'S:   - R: reserved';
  
  \copy prefixes from 'prefixes.fr.csv' with delimiter ; csv quote '"'

  create index idx_prefix on prefixes using gist(prefix);
 
=== Simple tests:

  dim=# select '123'::prefix_range @> '123456';
   ?column?
  ----------
   t
  (1 row)

Please note earlier versions of +prefix+ didn't use any restriction
nor join selectivity estimator functions for the +@>+ operator, so you
had to +set enable_seqscan to off+ to use the index. That's no more
the case, the +@>+ operator uses +contsel+ and +contjoinsel+ and the
planner is able to figure out by itself when to use the index or not.

If you don't understand previous mention, ignore it and use a more
recent version of +prefix+ than +0.2+.

==== Forcing seqcan

  dim=# select * from ranges where prefix @> '0146640123';
   prefix |      name      | shortname | state
  --------+----------------+-----------+-------
   0146[] | FRANCE TELECOM | FRTE      | S
  (1 row)
  
  Time: 4,071 ms

  dim=# select * from ranges where prefix @> '0100091234';
    prefix  |    name    | shortname | state
  ----------+------------+-----------+-------
   010009[] | LONG PHONE | LGPH      | S
  (1 row)
  
  Time: 4,110 ms

==== Using the Index

  dim=# select * from ranges where prefix @> '0146640123';
   prefix |      name      | shortname | state
  --------+----------------+-----------+-------
   0146[] | FRANCE TELECOM | FRTE      | S
  (1 row)
  
  Time: 1,036 ms

  dim=# select * from ranges where prefix @> '0100091234';
    prefix  |    name    | shortname | state
  ----------+------------+-----------+-------
   010009[] | LONG PHONE | LGPH      | S
  (1 row)
  
  Time: 0,771 ms

As of version 1.0, prefix_range GiST index supports also queries using the
<@, && and = operators (see below).

=== creating prefix_range, cast to and from text

There's a *constructor* function:

  prefix=# select prefix_range('123');
   prefix_range 
  --------------
   123
  (1 row)

  prefix=# select prefix_range('123', '4', '5');
   prefix_range 
  --------------
   123[4-5]
  (1 row)

Casting from unknown literal or text is as easy as usual:

  prefix=# select '123'::prefix_range;
   prefix_range 
  --------------
   123
  (1 row)

  prefix=# select x, x::prefix_range from (values('123'), ('123[4-5]'), ('[2-3]')) as t(x);
      x     |    x     
  ----------+----------
   123      | 123
   123[4-5] | 123[4-5]
   [2-3]    | [2-3]
  (3 rows)

And two casts are provided:

  CREATE CAST (text as prefix_range) WITH FUNCTION prefix_range(text) AS IMPLICIT;
  CREATE CAST (prefix_range as text) WITH FUNCTION text(prefix_range);

Which means you can use text expression in a context where a prefix_range is
expected and it'll get implicit casting, but prefix_range to text has to be
asked explicitely, so that you don't get strange behavior.

=== Provided operators

The prefix module is all about indexing prefix lookups, but in order to be
able to do this with some efficiency, it has to know a lot about prefix
ranges, such as basic comparing, containment, union, intersection and
overlapping.

The operators <=, <, =, <>, >= and > are read as usual, @> is read
*contains*, <@ is read *is contained by*, && is read *overlaps*, and | is
"union" and "&" is "intersect".

  prefix=# select a, b,
    a <= b as "<=", a < b as "<", a = b as "=", a <> b as "<>", a >= b as ">=", a > b as ">",
    a @> b as "@>", a <@ b as "<@", a && b as "&&"
  from  (select a::prefix_range, b::prefix_range
           from (values('123', '123'),
                       ('123', '124'),
                       ('123', '123[4-5]'),
                       ('123[4-5]', '123[2-7]'),
                       ('123', '[2-3]')) as t(a, b)
        ) as x;
      a     |    b     | <= | < | = | <> | >= | > | @> | <@ | && 
  ----------+----------+----+---+---+----+----+---+----+----+----
   123      | 123      | t  | f | t | f  | t  | f | t  | t  | t
   123      | 124      | t  | t | f | t  | f  | f | f  | f  | f
   123      | 123[4-5] | t  | t | f | t  | f  | f | t  | f  | t
   123[4-5] | 123[2-7] | f  | f | f | t  | t  | t | f  | t  | t
   123      | [2-3]    | t  | t | f | t  | f  | f | f  | f  | f
  (5 rows)

  prefix=# select a, b, a | b as union, a & b as intersect 
    from  (select a::prefix_range, b::prefix_range
             from (values('123', '123'), 
                         ('123', '124'), 
                         ('123', '123[4-5]'),
                         ('123[4-5]', '123[2-7]'), 
                         ('123', '[2-3]')) as t(a, b)
          ) as x;
      a     |    b     |  union   | intersect 
  ----------+----------+----------+-----------
   123      | 123      | 123      | 123
   123      | 124      | 12[3-4]  | 
   123      | 123[4-5] | 123      | 123
   123[4-5] | 123[2-7] | 123[2-7] | 123[4-5]
   123      | [2-3]    | [1-3]    | 
  (5 rows)


== See also

This link:TESTS.html[Tests] page is more developper oriented material,
but still of interest.