Issue 7935/integrate httpclient to bulk job creation and status update #38685

maxi297 · 2024-05-27T20:17:45Z

What

Partially address https://github.com/airbytehq/airbyte-internal-issues/issues/7935 (the bulk creation and status update part)

How

By:

Moving the HTTP query for bulk job creation outside the stream
Using the HttpClient to perform bulk creation and status update requests
Assuming all other HTTP responses will be handled as part of the current code

Review guide

Removing the query from base_stream: airbyte-integrations/connectors/source-shopify/source_shopify/streams/base_streams.py. While doing that, I also took the opportunity to align stream_slices with what cursors in the CDK do, so that there is one less thing to do when we standardize
Adding the query as part of the bulk: airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py

User Impact

The connector should be more resilient following this change, as TRANSIENT_EXCEPTIONS will be retried (see the updated test in airbyte-integrations/connectors/source-shopify/unit_tests/integration/test_bulk_stream.py). Apart from that, we expect no behavior change.

This will also add retries on 429 + 5XX errors which will affect not only bulk job creation and status update but also access scope (see #38678)

Can this PR be safely reverted and rolled back?

YES 💚
NO ❌

TODO

Migrate the deletion and getting results to HttpClient
Ensure that all error handling that is re-usable is in the ShopifyErrorHandler

maxi297 · 2024-05-27T20:25:33Z

airbyte-integrations/connectors/source-shopify/source_shopify/http_request.py

-                return ErrorResolution(ResponseAction.IGNORE, None, None)
+                return _NO_ERROR_RESOLUTION
+
+            if response.status_code == 429 or response.status_code >= 500:


This behavior was part of the HttpStream which was used for bulk job creation requests. Hence, I moved the logic here so that it applies to all the HTTP requests performed by HttpClient

maxi297 · 2024-05-27T20:28:44Z

airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py

 from requests.exceptions import JSONDecodeError
 from source_shopify.utils import ApiTypeEnum
 from source_shopify.utils import ShopifyRateLimiter as limiter

+from ...http_request import ShopifyErrorHandler


Is there a reason why we use relative import here instead of source_shopify.http_request?

Who should answer this question?

I don't know, but I haven't encountered any issue associated with this. It just felt odd that we do that here while we use fully qualified imports elsewhere

Currently there is no such import on master. I believe this one was added by you in this change; if not, please link the PR that introduced this line, because I'm confused

bazarnov · 2024-05-28T10:26:15Z

airbyte-integrations/connectors/source-shopify/source_shopify/http_request.py

@@ -21,6 +21,8 @@
    exceptions.SSLError,
 ) + RESPONSE_CONSUMPTION_EXCEPTIONS

+_NO_ERROR_RESOLUTION = ErrorResolution(ResponseAction.SUCCESS, None, None)


Can we add a comment about this: what it stands for and what pattern is used?

bazarnov · 2024-05-28T10:35:36Z

This change makes sense to me, but I'm concerned about the CAT tests. @maxi297, please take a look. It seems the API password expired, or something happened during the read cycle.

Let's ensure we still pass the correct authentication information to the HttpClient when:

the request is retried
the job is force-canceled and another job is created with a different slice range

Once CAT passes, I'll approve this change right away, since it's pretty straightforward. Thanks @maxi297

maxi297 · 2024-05-28T20:57:24Z

@bazarnov If you want to recheck this PR: two cases in the error retrying felt weird, so I changed a couple more things:

When there was a JSON parsing error on the job creation request, we would not retry (see this test that reproduces the issue). The issue seems to be that the retry was on job_process_created (see this), and this only redoes the HTTP request if it is a concurrency error here
We didn't retry on the first status update if there was an issue (see this test). This was caused by self._job_state not being set when a job was created

Let me know if you see issues with this. I'll move the job cancellation and fetching the job results in another PR tomorrow
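To illustrate the first case (no retry on a JSON parsing error), here is a rough sketch of a retry that wraps the request itself rather than the processing of its response. `create_job_with_retry` and `send_request` are hypothetical names used only for this example; the connector's actual retry goes through its own decorators and the HttpClient:

```python
import json
from typing import Any, Callable, Mapping


def create_job_with_retry(
    send_request: Callable[[], str],
    max_attempts: int = 3,
) -> Mapping[str, Any]:
    """Re-send the whole request whenever the response body is not valid JSON."""
    last_error = None
    for _ in range(max_attempts):
        # send_request is a hypothetical callable returning the raw response body.
        body = send_request()
        try:
            return json.loads(body)
        except json.JSONDecodeError as error:
            # Parsing failed: loop again so the HTTP request itself is repeated.
            last_error = error
    raise RuntimeError(f"No valid JSON after {max_attempts} attempts") from last_error


# Usage: the first response is garbage, the second one parses.
responses = iter(["<html>oops</html>", '{"data": {"id": "1"}}'])
job = create_job_with_retry(lambda: next(responses))
```

The point is only where the retry boundary sits: if it wraps just the response handling, a bad payload is re-parsed but never re-requested.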

bazarnov · 2024-05-28T21:44:39Z

airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py


    def _has_running_concurrent_job(self, errors: Optional[Iterable[Mapping[str, Any]]] = None) -> bool:
        """
-        When concurent BULK Job is already running for the same SHOP we receive:
+        When concurrent BULK Job is already running for the same SHOP we receive:


Why would we change concurrent to be concurent? I believe the English word is concurrent - article

bazarnov · 2024-05-28T21:53:34Z

This change looks good to me, left some small comments.

Also, after the latest CAT run, we see this schema error for the countries stream:

ERROR    root:test_core.py:902 
The countries stream has the following schema errors:
0.0 is not of type 'null', 'integer'

Failed validating 'type' in schema['properties']['provinces']['items']['properties']['tax_percentage']:
    {'description': 'Percentage value of tax applicable in the province.',
     'type': ['null', 'integer']}

On instance['provinces'][0]['tax_percentage']:
    0.0

Should we include this as a fix to this PR as well? I believe this one will be a breaking change for the countries stream.
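For context, a small sketch of why 0.0 fails here. `matches_type` is a hypothetical, simplified type check (not the validator CAT uses) in which a float never counts as an integer, as in the error above. Widening the type to 'number' is shown only as an assumed fix; the actual schema change is not part of this thread:

```python
from typing import Any, Sequence


def matches_type(value: Any, allowed: Sequence[str]) -> bool:
    """Strict check: a float is never an 'integer', and bool is not a number."""
    checks = {
        "null": lambda v: v is None,
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    }
    return any(checks[name](value) for name in allowed)


current = ["null", "integer"]  # type declared for tax_percentage in the failing schema
widened = ["null", "number"]   # assumed fix, not confirmed by this thread

print(matches_type(0.0, current))  # the failing case from the CAT output
print(matches_type(0.0, widened))
print(matches_type(5, widened))
```

Changing a declared type in this way is what would make it a breaking change for the stream's schema.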

bazarnov · 2024-05-28T21:55:02Z

I'll also test the latest changes tomorrow manually to see if there are no regressions on the retry on concurrency and cancelation and retry scenarios. Will update you here.

maxi297 · 2024-05-29T00:38:34Z

Should we include this as a fix to this PR as well? I believe this one will be a breaking change for the countries stream.

I would create another PR on master for that, as the two changes are independent. Does that make sense to you?

maxi297 · 2024-05-29T02:17:46Z

@bazarnov Here is the PR for the breaking change: #38746

bazarnov · 2024-05-29T09:09:18Z

@bazarnov Here is the PR for the breaking change: #38746

This one has been approved, we can merge it.

maxi297 added 2 commits May 27, 2024 14:52

Move query generation to 'request_body_json'

bcadad6

Move job creation to 'job.py' and use HttpClient for job creation and status update

c80bf11

maxi297 requested a review from a team May 27, 2024 20:17

octavia-squidington-iii added the area/connectors and connectors/source/shopify labels May 27, 2024

octavia-squidington-iv requested a review from a team May 27, 2024 20:18

maxi297 changed the title ~~Issue 7935/integrate httpclient to bulk job check~~ Issue 7935/integrate httpclient to bulk job creation and status update May 27, 2024

maxi297 commented May 27, 2024

Format and small fix

021dbb2

maxi297 commented May 27, 2024

Empty commit to retrigger CI

d85a613

bazarnov reviewed May 28, 2024

Pass the right session so that authentication occurs

204dea5

bazarnov approved these changes May 28, 2024

maxi297 added 3 commits May 28, 2024 15:38

Fix typing

e97debb

adding tests before fixing some things

bea166a

Fixing some of the retry

38f3106

bazarnov reviewed May 28, 2024

Merge branch 'issue-7935/integrate-httpclient-to-access-scopes' into issue-7935/integrate-httpclient-to-bulk-job-check

fd0d857

Merge branch 'issue-7935/integrate-httpclient-to-access-scopes' into issue-7935/integrate-httpclient-to-bulk-job-check

c774ef6

Move cancel and get_results to HttpClient (#38933)

fc6dc6c

maxi297 merged commit aa691e7 into issue-7935/integrate-httpclient-to-access-scopes Jun 5, 2024
16 of 19 checks passed

maxi297 deleted the issue-7935/integrate-httpclient-to-bulk-job-check branch June 5, 2024 13:19

maxi297 mentioned this pull request Jun 5, 2024

✨ Source Shopify: add resiliency on some transient errors using the HttpClient #38084

Merged

2 tasks
