Support uploading Parquet files #130

Open
5 tasks
calpaterson opened this issue Jun 11, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@calpaterson
Owner

Brief overview

AS A user of a dataframe library like Pandas/Polars/etc

I WANT to be able to upload my Parquet dataset

SO THAT I can skip all the nonsense around csv inference

Additional details

To support this, csvbase needs an analogous type for each type in the Parquet format. Some notable types are currently missing:

  • Datetime
  • Bytes/bytearrays - necessary for UUID as well
  • Enum (we're allowed to parse as strings if necessary)
  • Time
  • Interval

Perhaps we could get away without supporting everything to start with, but without at least datetime and probably bytes, there would be no real benefit to claiming any kind of support.
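
As a rough illustration of the gap described above (not csvbase's actual code), the sketch below maps some Parquet type names to analogous column types and reports which columns of a schema would be blocked by the five missing types. `ColumnType`, `PARQUET_ANALOGUES` and `blocked_columns` are hypothetical names, and the mapping mixes Parquet physical and logical type names in one table for brevity.

```python
from enum import Enum


class ColumnType(Enum):
    """Hypothetical column types, named for illustration only."""
    TEXT = "text"
    INTEGER = "integer"
    FLOAT = "float"
    BOOLEAN = "boolean"
    DATE = "date"
    DATETIME = "datetime"
    BYTES = "bytes"
    ENUM = "enum"
    TIME = "time"
    INTERVAL = "interval"


# The five types listed above as currently missing.
MISSING = {
    ColumnType.DATETIME,
    ColumnType.BYTES,
    ColumnType.ENUM,
    ColumnType.TIME,
    ColumnType.INTERVAL,
}

# Assumed analogues; a simplified, partial view of the Parquet type system.
PARQUET_ANALOGUES = {
    "BOOLEAN": ColumnType.BOOLEAN,
    "INT32": ColumnType.INTEGER,
    "INT64": ColumnType.INTEGER,
    "FLOAT": ColumnType.FLOAT,
    "DOUBLE": ColumnType.FLOAT,
    "STRING": ColumnType.TEXT,
    "DATE": ColumnType.DATE,
    "TIMESTAMP": ColumnType.DATETIME,
    "TIME": ColumnType.TIME,
    "INTERVAL": ColumnType.INTERVAL,
    "ENUM": ColumnType.ENUM,
    "BYTE_ARRAY": ColumnType.BYTES,
    "UUID": ColumnType.BYTES,  # UUIDs are stored as fixed-length bytes
}


def blocked_columns(schema):
    """Return {column: needed type} for columns that need a missing type.

    `schema` maps column names to Parquet type names. An unknown type
    name raises KeyError rather than being silently accepted.
    """
    blocked = {}
    for column, parquet_type in schema.items():
        analogue = PARQUET_ANALOGUES[parquet_type]
        if analogue in MISSING:
            blocked[column] = analogue
    return blocked
```

For a schema such as `{"id": "UUID", "name": "STRING", "created": "TIMESTAMP"}`, only `id` and `created` are reported, which matches the point that datetime and bytes cover the most common blockers.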

@calpaterson added the enhancement label Jun 11, 2024
@thedatadavis

Dupe of #99 ?

@calpaterson
Owner Author

> Dupe of #99 ?

It absolutely is, yes - my mistake. :)

I've closed the other issue as this has slightly more detail on the types csvbase is missing.

@Max1Truc

What about converting unsupported datatypes to STRING, until they are all supported?

@calpaterson
Owner Author

> What about converting unsupported datatypes to STRING, until they are all supported?

That's actually not a bad idea. We could mark it as experimental or something meanwhile.

Would that help you use csvbase for your use case?
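
A minimal sketch of the string fallback being discussed, assuming rows arrive as plain Python dicts after the Parquet file has been read. `NATIVE_TYPES`, `to_cell` and `stringify_unsupported` are hypothetical names, and the set of natively supported types is an assumption rather than a statement of what csvbase supports.

```python
import datetime
import uuid

# Assumption: Python types that already have a native column type.
# Exact type matching is used because datetime.datetime subclasses
# datetime.date and must not be mistaken for a plain date.
NATIVE_TYPES = (bool, int, float, str, datetime.date)


def to_cell(value):
    """Return (cell, fell_back) for a single value."""
    if value is None or type(value) in NATIVE_TYPES:
        return value, False
    if isinstance(value, (bytes, bytearray)):
        # Hex keeps arbitrary bytes representable as text.
        return bytes(value).hex(), True
    if isinstance(value, (datetime.datetime, datetime.time)):
        return value.isoformat(), True
    # UUIDs, timedeltas and anything else: plain str().
    return str(value), True


def stringify_unsupported(rows):
    """Convert rows (a list of dicts) and collect fallback column names.

    The returned set could drive an "experimental" marker on the
    affected columns.
    """
    fallback_columns = set()
    converted = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            cell, fell_back = to_cell(value)
            if fell_back:
                fallback_columns.add(name)
            new_row[name] = cell
        converted.append(new_row)
    return converted, fallback_columns
```

Returning the set of fallback columns alongside the data keeps the conversion visible to the user, so a column that was silently stringified can be flagged until a proper type exists.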

@Max1Truc

Sure, as my data source converts everything to strings anyway :P

@calpaterson
Owner Author

Ok, I think this can be moved up then. I'll try to have a go next week

@Max1Truc

Would you consider it a good first issue for new contributors?

If I do not need to know too much about the codebase to make the change I would gladly have a shot at it.

@calpaterson
Owner Author

> Would you consider it a good first issue for new contributors?
>
> If I do not need to know too much about the codebase to make the change I would gladly have a shot at it.

Hmm, probably not, as it requires a fair amount of knowledge and also involves making a load of design decisions.

Probably the best first changes are things related to getting it working locally for you. Many people use Docker (but I don't, so I don't discover the problems there). Does the Docker container work for you? Can you think of any ways to improve it? Can you remove tini and thereby possibly resolve #126?
