You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
since committees change names to reflect what they are supporting or opposing for an election
This problem was found under the following circumstance:
There are some differences in the build output that are due to DISTINCT in SQL picking out different rows when there are multiple committees for the same Filer_ID. As an example, take a look at the difference for build/_data/candidates/oakland/2014-11-04/bryan-parker.json. The value for Filer_NamL is different and it comes from this query in the independent_candidate_expenditures view:
SELECT DISTINCT ON ("Filer_ID") "Filer_ID", "Filer_NamL"
FROM committees WHERE "Filer_ID" = '983545';
The following query shows that there are two rows for the same Filer_ID and so the DISTINCT portion of the query picks a different one before and after the csvkit upgrade:
disclosure-backend=# SELECT "Filer_ID", "Filer_NamL"
FROM committees WHERE "Filer_ID" = '983545';
Filer_ID | Filer_NamL
----------+---------------------------------------------------
983545 | OAKPAC, Oakland Metropolitan Chamber of Commerce
983545 | OAK PAC, Oakland Metropolitan Chamber of Commerce
(2 rows)
If we force it to always pick the same one (like always pick the most recent one), it will help reduce the noise so that we can verify that there aren't any other differences that are more serious.
We want to try to do better than pick the most recent one this time around.
The text was updated successfully, but these errors were encountered:
since committees change names to reflect what they are supporting or opposing for an election
This problem was found under the following circumstance:
There are some differences in the build output that are due to DISTINCT in SQL picking out different rows when there are multiple committees for the same Filer_ID. As an example, take a look at the difference for build/_data/candidates/oakland/2014-11-04/bryan-parker.json. The value for Filer_NamL is different and it comes from this query in the independent_candidate_expenditures view:
SELECT DISTINCT ON ("Filer_ID") "Filer_ID", "Filer_NamL"
FROM committees WHERE "Filer_ID" = '983545';
The following query shows that there are two rows for the same Filer_ID and so the DISTINCT portion of the query picks a different one before and after the csvkit upgrade:
disclosure-backend=# SELECT "Filer_ID", "Filer_NamL"
FROM committees WHERE "Filer_ID" = '983545';
Filer_ID | Filer_NamL
----------+---------------------------------------------------
983545 | OAKPAC, Oakland Metropolitan Chamber of Commerce
983545 | OAK PAC, Oakland Metropolitan Chamber of Commerce
(2 rows)
If we force it to always pick the same one (like always pick the most recent one), it will help reduce the noise so that we can verify that there aren't any other differences that are more serious.
We want to try to do better than pick the most recent one this time around.
The text was updated successfully, but these errors were encountered: