Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use HEAD HTTP method to check if a document exists #192

Merged

Conversation

Darkheir
Copy link
Contributor

@Darkheir Darkheir commented Jan 21, 2022

The current has method requests the full document to only check if it exists or not. It may be overkill for big documents.

This PR replaces it with a simple HEAD call allowing to know if the document exists or not.

⚠️ This PR will change the error_code returned. Since we don't have a body in the response, the error_code value will default to status_code.
If the only codes we should handle are the ones in the test, I could update the response object to set the appropriate error_code based on the status code.

@Darkheir Darkheir force-pushed the feat/use_head_method_to_check_if_exists branch from 72e354d to f3b3519 Compare January 21, 2022 18:37
@joowani
Copy link
Contributor

joowani commented Jan 29, 2022

Hi @Darkheir, thanks for the PR. Could you please address the failing test? Thanks.

@joowani joowani changed the base branch from main to dev January 29, 2022 12:55
@joowani joowani changed the base branch from dev to main January 29, 2022 12:55
@Darkheir Darkheir force-pushed the feat/use_head_method_to_check_if_exists branch from f3b3519 to 5410a27 Compare January 30, 2022 19:03
@Darkheir
Copy link
Contributor Author

After some investigation it seems that ArangoDB server hangs on result retrieval for HEAD requests. I created an issue here: arangodb/arangodb#15639

To solve this issue I created a specific condition in the AsyncExecutor for HEAD requests where we directly get the result and return a dummy async job that imitate the async job signature.

IMO this is not a big issue since HEAD requests are never going to be costly and the only one I know is for the document's headers.

@Darkheir Darkheir force-pushed the feat/use_head_method_to_check_if_exists branch 5 times, most recently from f17d267 to d88243e Compare January 30, 2022 19:49
@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2022

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.95%. Comparing base (e44a3e8) to head (05eb769).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #192   +/-   ##
=======================================
  Coverage   95.95%   95.95%           
=======================================
  Files          26       26           
  Lines        4277     4277           
=======================================
  Hits         4104     4104           
  Misses        173      173           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joowani
Copy link
Contributor

joowani commented Feb 1, 2022

Hi @Darkheir,

I'm not in favor of introducing the dummy async job class. It seems too "hacky" for a small performance gain, and breaks the users' expectation of error codes from async API calls. I recommend that we undo the dummy class addition, and merge just the "head" bit once arangodb/arangodb#15639 is resolved. Thanks again for the pull request and for creating the issue with ArangoDB.

@Darkheir
Copy link
Contributor Author

Darkheir commented Feb 1, 2022

Hi @Darkheir,

I'm not in favor of introducing the dummy async job class. It seems too "hacky" for a small performance gain, and breaks the users' expectation of error codes from async API calls. I recommend that we undo the dummy class addition, and merge just the "head" bit once arangodb/arangodb#15639 is resolved. Thanks again for the pull request and for creating the issue with ArangoDB.

Fair enough, let's wait until ArangoDB fixed the issue :-)

The bad thing is that even if ArangoDB fixes it, all the people using an old ArangoDB version will get some errors when using python-arango...

@joowani
Copy link
Contributor

joowani commented Feb 1, 2022

That's a good point. You can leave this PR as is for now, and I'll think about it a bit more (maybe we'll merge this after all).

@apetenchea
Copy link
Member

apetenchea commented Mar 12, 2024

Hi @Darkheir,

Thanks again for your work. Are you still interested in continuing this PR? Otherwise, I can take over.

The task is even easier now, as DummyAsyncJob is no longer needed.

@Darkheir
Copy link
Contributor Author

Hi @Darkheir,

Thanks again for your work. Are you still interested in continuing this PR? Otherwise, I can take over.

The task is even easier now, as DummyAsyncJob is no longer needed.

Did ArangoDB fix the issue ? The ticket is still open on their side: #15639

@apetenchea
Copy link
Member

Did ArangoDB fix the issue ? The ticket is still open on their side: #15639

3.8 is not supported since long ago. I tried the head method on 3.11 and it works fine. Anyway, the driver has a CI now, which will tell whether there's any problem, though I doubt there will be any. Since we're planning the next release to be a major one, this would be the time to make such a change.

Please let me know if you still want to take this through (no pressure).

@Darkheir Darkheir force-pushed the feat/use_head_method_to_check_if_exists branch from d88243e to dcdcd80 Compare March 13, 2024 09:44
@Darkheir
Copy link
Contributor Author

Just updated the PR, let's see how the CI goes :-)

Copy link
Member

@apetenchea apetenchea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Change looks good and CI passed.

Because the response body for the "HEAD" method is always empty, as in this case "has" relies solely on return codes, could you please remove the special handing of the 1202 error code and 1200 from the tests? Just so we don't have obsolete code "rotting" in there.

@@ -640,9 +640,11 @@ def response_handler(resp: Response) -> bool:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be no longer needed

tests/test_document.py Outdated Show resolved Hide resolved
@Darkheir
Copy link
Contributor Author

It seems that the CI doesn't run all the tests (no --complete flag).

The complete test suite is failing locally and I still reproduce #15639 on the latest version of ArangoDB...

@apetenchea
Copy link
Member

It seems that the CI doesn't run all the tests (no --complete flag).

The complete test suite is failing locally and I still reproduce #15639 on the latest version of ArangoDB...

Ok, I'll take a look. Could you please let me know the pytest command you used? (to make sure we're talking about the same thing)

@Darkheir
Copy link
Contributor Author

I ran the test with pytest --complete

@apetenchea
Copy link
Member

apetenchea commented Mar 13, 2024

My bad, I ran it without the x-arango-async header. Indeed, the issue is still there 🙄

I will look into the ArangoDB C++ code when I get some time, perhaps I can fix this issue. But I really want this to get through one way or another (worst case we can add an "use_get" parameter to make it work).

I cannot make any promise regarding the timing, but I am keeping this on my list and will get back to you at some point 🙏

@apetenchea
Copy link
Member

Hello @Darkheir,

I fixed the issue from the C++ codebase: arangodb/arangodb#21087

This HEAD approach should work fine, starting from the next ArangoDB minor release. I would like to integrate your PR in the next driver release. I am aware that the next major driver release may still create issues for people using ArangoDB 3.11, in case anyone tries to send an async HEAD request, but that would be a very rare occurrence, to which there is a solution (keeping the driver on version 8, or using get instead of has).

If you're still interested in contributing (no pressure whatsoever), could you please:

  • Update your PR so it gets all the changes from upstream.
  • There is no longer an error_code associated with this type of HEAD request (because there is no body), but we'll rather work with status_code. That could be either 200, 412 or 404. Clean up the code handling of 1202, as it is no longer necessary.
  • Let's get rid of the "in" expressions from the tests (eg assert err.value.error_code in {412, 1200}) and instead use ==. I think it should work just fine, and it looks much simpler.

Again, thanks for your initiative and the laaaarge amount of patience that was needed for this PR 😄

@Darkheir
Copy link
Contributor Author

Darkheir commented Jul 1, 2024

Hi @apetenchea,

Is your fix included in a released version ? So I can make sure what I did is working fine.

Also it means that we would need to tests only for the latest version of ArangoDB and not the previous ones (right now both 3.11 and 3.12 are tested).

A less intrusive fix could be to introduce an intermediate release where depending on the server version a HEAD request is made or a GET one. If we send a GET request we could also raise a warning saying that this will be soon removed and the server should be updated.

@apetenchea
Copy link
Member

apetenchea commented Jul 1, 2024

Is your fix included in a released version ? So I can make sure what I did is working fine.

Not yet, will be included in 3.12.1 once it gets released officially. Even if we merge this now, I won't release the driver until 3.12.1 is out first.

Also it means that we would need to tests only for the latest version of ArangoDB and not the previous ones (right now both 3.11 and 3.12 are tested).

You just need to make sure the current CI passes. I'll take care of the async part, and will try it against the latest version compiled on my machine. But I know it works, I have tried something very similar to your PR already. We don't run the --complete tests here. Thanks for being proactive, but don't worry about that.

A less intrusive fix could be to introduce an intermediate release where depending on the server version a HEAD request is made or a GET one. If we send a GET request we could also raise a warning saying that this will be soon removed and the server should be updated.

I don't think it's worth all the trouble. I like to believe that nobody is sending async HEAD requests, as that wouldn't bring anything but an overly complicated way of looking up documents. The async API is mainly intended for long running tasks (a complex query or a bulk insert), and using it for such a basic lookup would be like using a crane to go grocery shopping. Don't get me wrong, I'm all for backwards compatibility, but in this very specific case, a lot more users would benefit from sending a HEAD request than a GET, whether they're using 3.11 or 3.12. Those who really get into trouble just need to replace has() by get(). On the other hand, if you insist, I don't see any problem with your suggestion.

@Darkheir Darkheir force-pushed the feat/use_head_method_to_check_if_exists branch from dcdcd80 to efbfe7d Compare July 1, 2024 12:28
@Darkheir Darkheir force-pushed the feat/use_head_method_to_check_if_exists branch from ede1a4b to 05eb769 Compare July 1, 2024 12:32
@Darkheir
Copy link
Contributor Author

Darkheir commented Jul 1, 2024

Alright, I made the changes.

Please tell me if everything seems of for you.

@apetenchea
Copy link
Member

apetenchea commented Jul 1, 2024

Looks good to me! Thank you for you contribution!

Also tested it locally with latest ArangoDB and it works. I'll merge it tomorrow. It's finally going to happen 😄

@apetenchea apetenchea merged commit 95b791c into arangodb:main Jul 2, 2024
36 checks passed
@apetenchea apetenchea mentioned this pull request Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants