-
Notifications
You must be signed in to change notification settings - Fork 68
Open
Description
What happens?
Consider query:
select endpoint from logs where endpoint like '/api/%'
It translates to following PostgreSQL queries (got from SET pg_debug_show_queries = TRUE):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
COPY (SELECT "endpoint" FROM "postgres"."logs" WHERE ctid BETWEEN '(0,0)'::tid AND '(1000,0)'::tid AND ("endpoint" >= '/api/' AND "endpoint" < '/api0')) TO STDOUT (FORMAT "binary");
COPY (SELECT "endpoint" FROM "postgres"."logs" WHERE ctid BETWEEN '(1000,0)'::tid AND '(4294967295,0)'::tid AND ("endpoint" >= '/api/' AND "endpoint" < '/api0')) TO STDOUT (FORMAT "binary");
COMMIT
Looks almost reasonable, only text comparison in Postgres is collate sensitive. See:
select
'/api/something' < '/api0' COLLATE "en_US.utf8" AS collate_en_us_wrong,
'/api/something' < '/api0' COLLATE "C" AS collate_c_correct,
'/api/something' < '/api0' COLLATE "unicode" AS collate_unicode_correct,
'/api/something' LIKE '/api/%' AS like_correctOutput:
| collate_en_us_wrong | collate_c_correct | collate_unicode_correct | like_correct |
|---|---|---|---|
| f | t | t | t |
As a result we get incorrect results - for example like '/%' yields no rows and like 'a%' yields all rows for table containing absolute paths.
To Reproduce
- Attach to postgresql
- Ensure default collate is "en_US.utf8"
- Match path-like data with
LIKE '/%
OS:
MacOS Sequoia
PostgreSQL Version:
PostgreSQL 17.5 (Debian 17.5-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
DuckDB Version:
v1.3.2 (Ossivalis) 0b83e5d2f6
DuckDB Client:
v1.3.2 (Ossivalis) 0b83e5d2f6
Full Name:
Mārtiņš Kalvāns
Affiliation:
NymTech
Have you tried this on the latest main branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree
Metadata
Metadata
Assignees
Labels
No labels