Register a MIME type for the Parquet format. #381

Open
asfimport opened this issue Jul 28, 2020 · 15 comments

@asfimport

There is currently no MIME type registered for Parquet. Perhaps this is intentional.

If it is not intentional, I suggest steps be taken to register a MIME type with IANA.

https://www.iana.org/assignments/media-types/media-types.xhtml

Reporter: Mark Wood

Note: This issue was originally created as PARQUET-1889. Please see the migration documentation for further details.

@asfimport

Thomas Champagne:
Any news on a possible registration of a MIME type for the Parquet format?

I propose :)

 application/parquet

@asfimport

Weston Pace / @westonpace:
application/parquet would be cool but might be a bit of a challenge. The root namespace is technically reserved for IETF standards or for types recognized by a "standards related organization" (whatever that means). application/vnd.apache.parquet would probably be pretty trivial to register, though, and would be similar to application/vnd.apache.thrift.xyz and application/vnd.apache.arrow.xyz.

@asfimport

Xinli Shang / @shangxinli:
+1 on @westonpace's point

@asfimport

Bryce Mecum / @amoeba:
It looks like a request to IANA to register application/vnd.apache.parquet was submitted sometime early in 2023, as evidenced by this entry in the Parquet dev ML: https://lists.apache.org/thread/lrfsjhzoq20o95z5zn9zyrb8rdolqzz7. It looks like IANA has requested changes on the initial application so I'll keep an eye on the ML and update here when we can close this.

@asfimport

Bryce Mecum / @amoeba:
I've just submitted a request to IANA for application/vnd.apache.parquet. I'll update in this thread as that progresses.

@asfimport

Bryce Mecum / @amoeba:
The request has been received, given a ticket number of 1358674 with IANA, and sent off for a review.

@asfimport

Bryce Mecum / @amoeba:
The registration is done, and is now available: https://www.iana.org/assignments/media-types/application/vnd.apache.parquet.

I think it would be good if someone from the Parquet PMC could forward a note about this to the entire PMC asking for a quick review. I did my best filling in everything but would appreciate a review of the entire registration, especially the "Security considerations" and "Interoperability considerations" portions. Any comments can be directed either here or to me at [email protected] and I will update the registration accordingly.
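
For anyone who wants to start using the registered type from code, here is a minimal Python sketch. It assumes the `.parquet` file extension (not stated in this thread) and uses a private `mimetypes.MimeTypes` registry so the process-wide table is left alone. The helper name `make_registry` and the file name `events.parquet` are hypothetical, chosen only for illustration.

```python
import mimetypes

# Media type name as registered with IANA (from this thread).
PARQUET_MEDIA_TYPE = "application/vnd.apache.parquet"


def make_registry() -> mimetypes.MimeTypes:
    """Build a private registry that maps .parquet to the registered type.

    Assumption: ".parquet" is the file extension in use; adjust if yours differs.
    """
    registry = mimetypes.MimeTypes()
    registry.add_type(PARQUET_MEDIA_TYPE, ".parquet")
    return registry


registry = make_registry()
media_type, encoding = registry.guess_type("events.parquet")
print(media_type)  # application/vnd.apache.parquet
```

Using a separate `MimeTypes` instance rather than the module-level `mimetypes.add_type` keeps the mapping local, which may be preferable in a library.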

@asfimport

Gang Wu / @wgtmac:
@amoeba Thanks for your effort! Does it support encrypted Parquet files? Or should that be a separate MIME type?

cc PMCs for more advice [~[email protected]] @gszadovszky @ggershinsky

@asfimport

Bryce Mecum / @amoeba:
Hi @wgtmac,

When you say "encrypted parquet file", do you mean Parquet files that use Parquet's modular encryption? I think this media type registration covers that case so another media type (or profile) wouldn't be needed. I made mention of this use case for Parquet in the registration. That said, I would like to consider your question thoroughly. Are there any media types that have a similar mechanism to Parquet's we could look at?

@asfimport

Gang Wu / @wgtmac:
I'm not familiar with other media types. In your registration link I found the following:

Additional information: 2. Magic number(s): PAR1

The reason I asked about encryption is that an encrypted file has a different magic number, PARE.

@asfimport

Bryce Mecum / @amoeba:
Ah, I didn't know that. You might be right then, and thanks for catching it. I'll look into it further to see whether another media type makes sense, and I welcome others' thoughts on it too.

@asfimport

Gidon Gershinsky / @ggershinsky:
Agreed,
Additional information: 2. Magic number(s): PAR1
should be
Additional information: 2. Magic number(s): PAR1, PARE

(encrypted parquet files can have either magic number, depending on the encryption mode).

Otherwise, LGTM.
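
To illustrate why both magic numbers matter for content sniffing, here is a rough Python sketch. It assumes, based on the Parquet format specification rather than anything stated in this thread, that the 4-byte magic number appears at both the start and the end of a file. The function name `sniff_parquet` is hypothetical.

```python
from typing import Optional

PARQUET_MEDIA_TYPE = "application/vnd.apache.parquet"

# Magic numbers discussed in this thread: PAR1, and PARE for some encrypted files.
MAGIC_NUMBERS = (b"PAR1", b"PARE")


def sniff_parquet(data: bytes) -> Optional[str]:
    """Return the Parquet media type if `data` looks like a Parquet file.

    Assumption: the same magic number opens and closes the file.
    This is only a sniffing heuristic; it does not validate the footer.
    """
    if len(data) < 8:  # too short to hold a leading and a trailing magic number
        return None
    head, tail = data[:4], data[-4:]
    if head in MAGIC_NUMBERS and head == tail:
        return PARQUET_MEDIA_TYPE
    return None


# Synthetic byte strings, not real Parquet files.
print(sniff_parquet(b"PAR1" + b"\x00" * 16 + b"PAR1"))
print(sniff_parquet(b"PARE" + b"\x00" * 16 + b"PARE"))
print(sniff_parquet(b"not a parquet file"))
```

A sniffer that only knows PAR1 would return nothing for the second input, which is the gap the proposed registration change closes.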

@asfimport

Gabor Szadovszky / @gszadovszky:
I agree with @ggershinsky's suggestion. LGTM, otherwise.

@asfimport

Bryce Mecum / @amoeba:
Thanks all. I've submitted a change request to IANA to add the extra magic number. I'll update here when that change is active.

@asfimport

Bryce Mecum / @amoeba:
The registration has been updated to include PARE as a possible magic number, as discussed here. See https://www.iana.org/assignments/media-types/application/vnd.apache.parquet.

I think this can be closed. Thanks everyone for the help.
