Source Zuora: issues with high volume events (10m+ rows using live) #25319
Comments
Just corrected the above to note that the current date-window query appears to pull the full stream of rows on the Zuora side (hence breaching the API limit of 10m rows), even if the number of rows in the date window is small. This will be a showstopping issue, as the unlimited connection will be deprecated by Zuora at the end of April, meaning these streams will fail. Possible causes are the non-standard date format or the order by statement.
Looks like the date issue with rows is connected to the following pull:
Thanks @frans-k! I also tried building it myself, but had a similar issue. Not sure yet whether this is related to another change in the code base.
@frans-k, I've posted a comment on your pull request that fixed another bug in the connector, one that caused authentication to fail. Possibly another change caused this. You might be able to get past the earlier test errors with it, as they were not a sign of bad credentials.
Zuora was archived and won't receive new updates.
Hi @marcosmarxm, unfortunately there is no support for Zuora via the connector builder, since it creates outputs that need to be fetched. What changes need to be made to update the connector?
Environment
Current Behavior
It appears that the queries cause full table scans rather than being limited to rows within the configured date range. This could be because the queries do not use the recommended date formats, or possibly because of the order by statement (https://knowledgecenter.zuora.com/Zuora_Central_Platform/Query/Export_ZOQL/G_Dates_and_Datetimes). Accordingly, any dataset with more than 10m rows will fail no matter what date limits are set (using the live data source; the unlimited one will be deprecated by Zuora soon).
Additionally, the Zuora source connector uses date windows to reduce the number of events extracted from the Zuora API per query, so as to remain under API limits. However, because Zuora calculates revenue items or invoice items during billing runs, these data sources will very often have large volumes within a given window, as the rows are generated in very quick bursts (even when the window is set to sub-second values). This is seen in airbyte/airbyte-integrations/connectors/source-zuora/source_zuora/source.py at line 231, where the query function iterates over date windows.
To better handle streams with a high number of events, the connector should look at using the AQuA API.
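For illustration, here is a minimal sketch of the date-window approach described above. It is not the connector's actual code: the helper names `date_windows`, `format_ts` and `window_query` are hypothetical, UTC is assumed, and the query text simply mirrors the shape of the failing query in the Logs section.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def date_windows(
    start: datetime, end: datetime, window: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (lower, upper) bounds covering start..end."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    lower = start
    while lower < end:
        upper = min(lower + window, end)  # the last window may be shorter
        yield lower, upper
        lower = upper


def format_ts(value: datetime) -> str:
    """Render a timezone-aware datetime in the style seen in the log (UTC assumed)."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f") + " +00:00"


def window_query(table: str, lower: datetime, upper: datetime) -> str:
    """Build one windowed query; hypothetical, modelled on the logged query."""
    return (
        f"select * from {table} "
        f"where updateddate >= TIMESTAMP '{format_ts(lower)}' "
        f"and updateddate <= TIMESTAMP '{format_ts(upper)}' "
        f"order by updateddate asc"
    )
```

Shrinking `window` only reduces the rows that match the date bounds. If Zuora counts input rows before applying those bounds, as the error suggests, or if a billing run writes a large burst with near-identical timestamps, a smaller window does not help.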
Expected Behavior
The query should not cause Zuora to read more than 10m rows when the date range selects only a small number of them. As it stands, this causes any window size to fail.
Additionally, as the unlimited data source for Zuora is about to be removed, the connector should look at using the recommended AQuA API (which supports stateful updates) to handle high-volume events:
https://knowledgecenter.zuora.com/Zuora_Central_Platform/API/AB_Aggregate_Query_API
Alternatively, it could query from the last known event or start date (as done currently), but with a limit on rows equal to the maximum number of events found here (not using a date window). The subsequent query or stream slice should then use the last retrieved timestamp as the starting point for the next query.
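A rough sketch of this alternative, under several assumptions: the query endpoint accepts a `limit` clause, rows carry an `id` and an `updateddate` field, and timestamp strings in a single fixed format sort chronologically. `build_query`, `read_by_cursor` and the `fetch_page` callable are hypothetical names, not part of the connector or of any Zuora client.

```python
from typing import Callable, Dict, Iterator, List, Set

Row = Dict[str, str]


def build_query(table: str, cursor_value: str, limit: int) -> str:
    """Hypothetical query text: no upper date bound, only a row limit."""
    return (
        f"select * from {table} "
        f"where updateddate >= TIMESTAMP '{cursor_value}' "
        f"order by updateddate asc limit {limit}"
    )


def read_by_cursor(
    fetch_page: Callable[[str, int], List[Row]], start: str, limit: int
) -> Iterator[Row]:
    """Page through rows, restarting each query at the last timestamp seen."""
    cursor = start
    seen_at_cursor: Set[str] = set()  # ids already emitted at the cursor timestamp
    while True:
        page = fetch_page(cursor, limit)
        # '>=' re-reads rows at the cursor timestamp, so drop the ones already emitted.
        fresh = [
            row for row in page
            if not (row["updateddate"] == cursor and row["id"] in seen_at_cursor)
        ]
        if not fresh:
            if len(page) >= limit:
                # A full page of already-seen rows means no progress is possible.
                raise RuntimeError("more rows share one timestamp than fit in a page")
            return
        for row in fresh:
            if row["updateddate"] != cursor:
                cursor = row["updateddate"]
                seen_at_cursor = set()
            seen_at_cursor.add(row["id"])
            yield row
        if len(page) < limit:
            return  # a short page means the stream is exhausted
```

Here `fetch_page(cursor, limit)` would run `build_query(...)` against the API and return the parsed rows. The `>=` comparison plus de-duplication is there so that rows sharing the boundary timestamp are not lost between pages. Whether a row limit also keeps Zuora's input-row count under 10m is untested.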
Logs
2023-04-19 14:06:10 source > Query failed (#20230419_140036_49258_z969q): Input Rows for revenueevent exceeded limit (10000000), QUERY: select * from revenueevent where updateddate >= TIMESTAMP '2023-03-01 00:00:00.000000 +00:00' and updateddate <= TIMESTAMP '2023-03-01 00:00:00.864000 +00:00' order by updateddate asc
Steps to Reproduce