Describe the bug
I encountered an error while running a query with joins:
failed to optimize plan: Internal error: Input field name count(Int64(1)) does not match with the projection expression count(Int64(1)):1
This occurs when I run a join and the resulting schema has more than two fields produced by count(*) (or any other aggregate). I understand that DataFusion internally renames columns so that it can build a DFSchema, since Arrow does not support duplicate names. While debugging, I saw that it does the following internally:
The name tracker sees count(Int64(1)).
The name tracker sees count(Int64(1)) again and renames it to count(Int64(1)):1.
The name tracker sees count(Int64(1)) again and renames it to count(Int64(1)):2, or to count(Int64(1)):1 again?
I have another related error, which arises when creating the logical plan:
failed to translate modified plan to DataFusion: Schema error: Schema contains duplicate unqualified field name "count(Int64(1)):1"
On my local clone, I got past the second error and made DataFusion generate a logical plan by tweaking this function so that when it encounters count(Int64(1)):1 a second time, it renames it to count(Int64(1)):2 and no duplicate names remain. However, when building the physical plan I then get the first error mentioned above.
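The report does not include the renaming function itself, so the following is only a hypothetical sketch of how a per-name counter could emit count(Int64(1)):1 twice once an earlier join has already produced a suffixed name, and of one collision-free alternative. `naive_dedup` and `unique_dedup` are illustrative names, not DataFusion APIs, and the input list is an assumed example rather than a dump of the real schema.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical naive renaming: the suffix counter is keyed by the incoming
/// name only, so a generated name such as "x:1" can collide with a field that
/// is already called "x:1" (e.g. one renamed by an earlier join).
fn naive_dedup(names: &[&str]) -> Vec<String> {
    let mut seen: HashMap<String, usize> = HashMap::new();
    names
        .iter()
        .map(|name| {
            let count = seen.entry(name.to_string()).or_insert(0);
            let out = if *count == 0 {
                name.to_string()
            } else {
                format!("{name}:{count}")
            };
            *count += 1;
            out
        })
        .collect()
}

/// Hypothetical collision-free renaming: every emitted name is checked against
/// the set of names already emitted, and the suffix is bumped until it is free.
fn unique_dedup(names: &[&str]) -> Vec<String> {
    let mut used: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(names.len());
    for name in names {
        let mut candidate = name.to_string();
        let mut suffix = 1;
        while used.contains(&candidate) {
            candidate = format!("{name}:{suffix}");
            suffix += 1;
        }
        used.insert(candidate.clone());
        out.push(candidate);
    }
    out
}

fn main() {
    // Assumed field order: an earlier join already renamed one count column.
    let input = ["count(Int64(1))", "count(Int64(1)):1", "count(Int64(1))"];
    // Produces "count(Int64(1)):1" twice.
    println!("naive:  {:?}", naive_dedup(&input));
    // Produces count(Int64(1)), count(Int64(1)):1, count(Int64(1)):2.
    println!("unique: {:?}", unique_dedup(&input));
}
```

Note that making the names unique is only half of the fix described above: any projection that refers to the renamed field must use the same new name, otherwise a mismatch like the "Input field name ... does not match with the projection expression" error can still appear.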
To Reproduce
CREATE TABLE left_table (
id INT PRIMARY KEY,
category TEXT,
timestamp TIMESTAMP
);
CREATE TABLE right_table (
id INT PRIMARY KEY,
category TEXT,
timestamp TIMESTAMP
);
INSERT INTO left_table (id, category, timestamp) VALUES
(1, 'business_logic', '2024-02-15 10:00:00'),
(2, 'attack_attempt', '2024-02-15 10:05:00');
INSERT INTO right_table (id, category, timestamp) VALUES
(1, 'info', '2024-02-15 10:10:00'),
(2, 'low', '2024-02-15 10:15:00');
WITH first_agg AS (
SELECT id, COUNT(*) AS count_first FROM left_table GROUP BY id
),
second_agg AS (
SELECT id, COUNT(*) AS count_second FROM right_table GROUP BY id
),
third_agg AS (
SELECT id, COUNT(*) AS count_third FROM right_table GROUP BY id
),
fourth_random_table AS (
SELECT id, category FROM right_table GROUP BY id, category
)
select count_first, category, count_second, count_third from first_agg
LEFT JOIN fourth_random_table using (id)
LEFT JOIN second_agg using (id)
LEFT JOIN third_agg using (id);
Expected behavior
No response
Additional context
I'm using DataFusion version 43.0.
The error "Schema error: Ambiguous reference to unqualified field id" occurs because multiple tables in your query contain a column named id, and USING (id) requires id to be unambiguous across all participating tables.
In your SQL, second_agg, third_agg, and fourth_random_table all originate from right_table, so id becomes ambiguous when the planner tries to determine which one to use in subsequent joins.
Here's how you can resolve the ambiguity by giving each CTE an alias, qualifying every column, and using explicit ON conditions:
WITH first_agg AS (
SELECT id, COUNT(*) AS count_first FROM left_table GROUP BY id
),
second_agg AS (
SELECT id, COUNT(*) AS count_second FROM right_table GROUP BY id
),
third_agg AS (
SELECT id, COUNT(*) AS count_third FROM right_table GROUP BY id
),
fourth_random_table AS (
SELECT id, category FROM right_table GROUP BY id, category
)
SELECT fa.count_first,
       frt.category,
       sa.count_second,
       ta.count_third
FROM first_agg fa
LEFT JOIN fourth_random_table frt ON fa.id = frt.id
LEFT JOIN second_agg sa ON fa.id = sa.id
LEFT JOIN third_agg ta ON fa.id = ta.id;