Skip to content

LIKE translation is collate sensitive #364

@sisidra

Description

@sisidra

What happens?

Consider query:

select endpoint from logs where endpoint like '/api/%'

It translates to following PostgreSQL queries (got from SET pg_debug_show_queries = TRUE):

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ

COPY (SELECT "endpoint" FROM "postgres"."logs" WHERE ctid BETWEEN '(0,0)'::tid AND '(1000,0)'::tid AND ("endpoint" >= '/api/' AND "endpoint" < '/api0')) TO STDOUT (FORMAT "binary");

COPY (SELECT "endpoint" FROM "postgres"."logs" WHERE ctid BETWEEN '(1000,0)'::tid AND '(4294967295,0)'::tid AND ("endpoint" >= '/api/' AND "endpoint" < '/api0')) TO STDOUT (FORMAT "binary");

COMMIT

Looks almost reasonable, only text comparison in Postgres is collate sensitive. See:

select
    '/api/something' < '/api0' COLLATE "en_US.utf8" AS collate_en_us_wrong,
    '/api/something' < '/api0' COLLATE "C" AS collate_c_correct,
    '/api/something' < '/api0' COLLATE "unicode" AS collate_unicode_correct,
    '/api/something' LIKE '/api/%' AS like_correct

Output:

collate_en_us_wrong collate_c_correct collate_unicode_correct like_correct
f t t t

As a result we get incorrect results - for example like '/%' yields no rows and like 'a%' yields all rows for table containing absolute paths.

To Reproduce

  1. Attach to postgresql
  2. Ensure default collate is "en_US.utf8"
  3. Match path-like data with LIKE '/%

OS:

MacOS Sequoia

PostgreSQL Version:

PostgreSQL 17.5 (Debian 17.5-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

DuckDB Version:

v1.3.2 (Ossivalis) 0b83e5d2f6

DuckDB Client:

v1.3.2 (Ossivalis) 0b83e5d2f6

Full Name:

Mārtiņš Kalvāns

Affiliation:

NymTech

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions