Duplicate Unqualified Field Name #14799

LiaCastaneda · 2025-02-20T18:09:36Z

Describe the bug

I have encountered an error while doing a query with a join:

failed to optimize plan: Internal error: Input field name count(Int64(1)) does not match with the projection expression count(Int64(1)):1

This occurs when I do a join and the resulting schema has more than 2 fields with a count(*) (or any aggregator), I understand that datafusion internally tries to rename the column names to be able to create a DFSchema, since arrow does not support duplicate names. While debugging I saw that what it does internally is:

The name tracker sees count(Int64(1))
The name traccker sees count(Int64(1))again and renames it to count(Int64(1)):1
The name traccker sees count(Int64(1))again and renames it to count(Int64(1)):2 or count(Int64(1)):1 again?

I have another related error but this one arises when trying to create the logical plan that is:
failed to translate modified plan to DataFusion: Schema error: Schema contains duplicate unqualified field name "count(Int64(1)):1"

On my local clone, for the second error, I managed to make datafusion generate a logical plan by tweaking this function so in case it encounters count(Int64(1)):1 for the second time, its renamed to count(Int64(1)):2 and there are no duplicate names, but then when trying to build the physical plan I get the error I mentioned first.

To Reproduce

CREATE TABLE left_table (
    id INT PRIMARY KEY,
    category TEXT,
    timestamp TIMESTAMP
);

CREATE TABLE right_table (
    id INT PRIMARY KEY,
    category TEXT,
    timestamp TIMESTAMP
);

INSERT INTO left_table (id, category, timestamp) VALUES
(1, 'business_logic', '2024-02-15 10:00:00'),
(2, 'attack_attempt', '2024-02-15 10:05:00');

INSERT INTO right_table (id, category, timestamp) VALUES
(1, 'info', '2024-02-15 10:10:00'),
(2, 'low', '2024-02-15 10:15:00');

WITH first_agg AS (
    SELECT id, COUNT(*) AS count_first FROM left_table GROUP BY id
),
second_agg AS (
    SELECT id, COUNT(*) AS count_second FROM right_table GROUP BY id
),

third_agg AS (
    SELECT id, COUNT(*) AS count_third FROM right_table GROUP BY id
),
fourth_random_table AS (
    SELECT id, category FROM right_table GROUP BY id, category
)

select count_first, category, count_second, count_third from first_agg 
  LEFT JOIN fourth_random_table using (id) 
  LEFT JOIN  second_agg using (id)
  LEFT JOIN third_agg using  (id) `

Expected behavior

No response

Additional context

I'm using version 43.0

The text was updated successfully, but these errors were encountered:

kosiew · 2025-02-21T07:28:47Z

The error "Schema error: Ambiguous reference to unqualified field id" occurs because multiple tables in your query contain a column named id, and you're using USING (id), which requires id to be unambiguous in all participating tables.

In your sql, second_agg, third_agg, and fourth_random_table all originate from right_table, id becomes ambiguous when SQL tries to determine which one to use in further joins.

Here's how you can resolve the ambiquity:

WITH first_agg AS (
    SELECT id, COUNT(*) AS count_first FROM left_table GROUP BY id
),
second_agg AS (
    SELECT id, COUNT(*) AS count_second FROM right_table GROUP BY id
),
third_agg AS (
    SELECT id, COUNT(*) AS count_third FROM right_table GROUP BY id
),
fourth_random_table AS (
    SELECT id, category FROM right_table GROUP BY id, category
)
SELECT 
    fa.count_first, 
    frt.category, 
    sa.count_second, 
    ta.count_third 
FROM first_agg fa
LEFT JOIN fourth_random_table frt ON fa.id = frt.id
LEFT JOIN second_agg sa ON fa.id = sa.id
LEFT JOIN third_agg ta ON fa.id = ta.id;

Results

+-------------+----------+--------------+-------------+
| count_first | category | count_second | count_third |
+-------------+----------+--------------+-------------+
| 1           | info     | 1            | 1           |
| 1           | low      | 1            | 1           |
+-------------+----------+--------------+-------------+
2 row(s) fetched.

LiaCastaneda added the bug Something isn't working label Feb 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicate Unqualified Field Name #14799

Duplicate Unqualified Field Name #14799

LiaCastaneda commented Feb 20, 2025 •

edited

Loading

kosiew commented Feb 21, 2025 •

edited

Loading

Duplicate Unqualified Field Name #14799

Duplicate Unqualified Field Name #14799

Comments

LiaCastaneda commented Feb 20, 2025 • edited Loading

Describe the bug

To Reproduce

Expected behavior

Additional context

kosiew commented Feb 21, 2025 • edited Loading

LiaCastaneda commented Feb 20, 2025 •

edited

Loading

kosiew commented Feb 21, 2025 •

edited

Loading