Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust the "table_exists" behavior in the REST Catalog #1018

Closed
ndrluis opened this issue Aug 7, 2024 · 8 comments
Closed

Adjust the "table_exists" behavior in the REST Catalog #1018

ndrluis opened this issue Aug 7, 2024 · 8 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@ndrluis
Copy link
Collaborator

ndrluis commented Aug 7, 2024

Apache Iceberg version

0.7.0 (latest release)

Please describe the bug 🐞

Currently, we return True when the status code is 200 or 204, and False for all other status codes. According to the REST specification, we should return False when the catalog returns a 404 status code and raise an error for other status codes.

In the Java implementation, a try/catch block is used with load_table, and when it catches a NoSuchTableError, it currently returns False.

I believe that this behavior is suppressing errors and looks like it's a bug as reported in #1006

@kevinjqliu I noticed that you are looking into the Issue on the Polaris side and I believe that we need to adjust this on our side.

@ndrluis ndrluis added the good first issue Good for newcomers label Aug 7, 2024
@kevinjqliu
Copy link
Contributor

Thanks for looking into this. The behavior you described above seems like the correct one to me.

I believe that this behavior is suppressing errors and looks like it's a bug as reported in #1006

I don't follow this since the expected value should be True. According to the spec, the function returns True only for the 204 status code.
There's a theory in the other thread (apache/polaris#96 (comment)) that the error is due to managed Polaris not implementing the HEAD API, which would cause any HEAD request to return 404.

Reference

@TiansuYu
Copy link
Contributor

TiansuYu commented Aug 23, 2024

@ndrluis @kevinjqliu
I can contribute to this. Would you say the intended behavior should be:

  • 204 return True
  • 404 NoSuchTableException
  • others: raise a corresponding response error

@ndrluis
Copy link
Collaborator Author

ndrluis commented Aug 23, 2024

Does it make sense for a method suffixed with "exists" to never return false? As I mentioned, and as it is in the Java implementation, when a 404 status is returned, it should indicate false.

However, as @kevinjqliu pointed out, we might receive a 404 when the HEAD method is not implemented. I believe we should expect the Specification to be implemented as defined, and if it's not, it's a bug on the catalog side and not on ours.

Therefore, I believe we can follow the Java implementation in the sense that it returns false when a 404 status is encountered, but we should maintain the use of the HEAD call because it is expected to exist according to the Catalog spec.

@kevinjqliu @TiansuYu @Fokko @sungwy @HonahX What are your thoughts?

@sungwy
Copy link
Collaborator

sungwy commented Aug 23, 2024

Hi @TiansuYu and @ndrluis - thank you for bringing up this point, and sorry for not getting around to looking at this earlier.

Similar to what @TiansuYu suggested, I'm of the opinion that we should do the following:

  • 200, 204: return True
  • 404: return False
  • others, raise a corresponding response error.

The reason is, because there's a numerous different factors as to whether a REST endpoint will return a non non 204/404 response, and it would be erroneous for us to return false on any other status code. If users are relying on this endpoint to return affirmatively say that the table exists or not for their use case, and if the REST catalog returns a 5xx error due to an unknown reason, having that be interpreted as False is error-prone

@sungwy
Copy link
Collaborator

sungwy commented Aug 23, 2024

@ndrluis @kevinjqliu I can contribute to this. Would you say the intended behavior should be:

  • 204 return True
  • 404 NoSuchTableException
  • others: raise a corresponding response error

It looks like @ndrluis raised this issue, and has it assigned this to himself. Maybe we could help review his code :)

@Fokko
Copy link
Contributor

Fokko commented Aug 23, 2024

Thanks everyone for bringing this up.

Long term, I think we should leverage the endpoint discovery to see if the server provides the capability of table-exists: https://lists.apache.org/thread/8h86382omdx9cmvc15m2bf361p5rz4rk

Since the table-exists is in there from the beginning, I think we should follow the spec, and assume that:

  • 204: Table exists.
  • 404: Table doesn't exist.
  • Raise an exception for the rest, similar to other operations.

The HEAD operation does not require the REST Catalog to fetch the metadata, which will consume less resources on the server, but probably also return much faster for the client.

I think need to update the Java implementation to use the HEAD request, this will also uncover the issue on the Java side. For validation, hopefully, we can check this properly using apache/iceberg#10908 once that is in.

@Fokko
Copy link
Contributor

Fokko commented Aug 23, 2024

I checked the Java side, and I could not find the HEAD request. For raising visibility, I created an issue there: apache/iceberg#10993

@kevinjqliu
Copy link
Contributor

@ndrluis can this be closed now that #1096 is merged?

@ndrluis ndrluis closed this as completed Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants