Road to v0.2 #154

Closed
18 of 20 tasks
matthewmturner opened this issue Sep 20, 2024 · 11 comments

Comments

@matthewmturner
Collaborator Author

@alamb let me know if there is anything you would like to add

@alamb
Contributor

alamb commented Sep 21, 2024

I think we should consider adding #148 and #158 as well

@matthewmturner
Collaborator Author

I think the DDL issue I have is actually a bug in how DataFusion registers external tables in a schema.

I have this in my `~/.datafusion/.datafusionrc`:

CREATE SCHEMA staging;

CREATE TABLE staging.foo AS VALUES (1);
CREATE EXTERNAL TABLE staging.min_aggs STORED AS PARQUET LOCATION 'ny2://atlas/sip/aggregates/minute_by_ticker_monthly_v2/year_month=2015-08/data.01.parquet';

But when I run SHOW TABLES I get the following:

[image: screenshot of the SHOW TABLES output]

Notice how the schema isn't picked up for the external table. I don't see anything in the docs for CREATE EXTERNAL TABLE that would suggest this shouldn't work, but maybe I'm misunderstanding.

@alamb do you have a view on this? If you agree it's a bug in DataFusion then I can create a ticket there.
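
To make the expected behaviour concrete, here is a minimal Python sketch of how a dotted table name would normally be resolved into catalog, schema, and table. It is an illustration only, not DataFusion's implementation: `resolve_table_reference` and `ResolvedTable` are hypothetical names, quoted identifiers are not handled, and the defaults assume DataFusion's out-of-the-box catalog (`datafusion`) and schema (`public`).

```python
from typing import NamedTuple

# Assumed defaults: DataFusion's out-of-the-box catalog and schema names.
DEFAULT_CATALOG = "datafusion"
DEFAULT_SCHEMA = "public"


class ResolvedTable(NamedTuple):
    catalog: str
    schema: str
    table: str


def resolve_table_reference(name: str) -> ResolvedTable:
    """Hypothetical helper: split a dotted name into (catalog, schema, table).

    Missing leading parts fall back to the defaults above. Quoted
    identifiers containing dots are deliberately not supported here.
    """
    parts = name.split(".")
    if any(part == "" for part in parts):
        raise ValueError(f"empty identifier in table reference: {name!r}")
    if len(parts) == 1:
        return ResolvedTable(DEFAULT_CATALOG, DEFAULT_SCHEMA, parts[0])
    if len(parts) == 2:
        return ResolvedTable(DEFAULT_CATALOG, parts[0], parts[1])
    if len(parts) == 3:
        return ResolvedTable(parts[0], parts[1], parts[2])
    raise ValueError(f"too many parts in table reference: {name!r}")


# Both statements in the rc file use a two-part name, so both tables
# would be expected to land in the "staging" schema.
foo = resolve_table_reference("staging.foo")
min_aggs = resolve_table_reference("staging.min_aggs")
```

Under this reading, `CREATE TABLE staging.foo` and `CREATE EXTERNAL TABLE staging.min_aggs` should both appear under `staging` in SHOW TABLES; only an unqualified name would fall back to the default schema.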

@alamb
Contributor

alamb commented Sep 24, 2024

I agree it is a datafusion bug. I will file a ticket

@alamb
Contributor

alamb commented Sep 24, 2024

Filed apache/datafusion#12607

@Kinrany

Kinrany commented Dec 27, 2024

Is there anything that the current trunk version does worse than v0.1?

@matthewmturner
Collaborator Author

@Kinrany I don't think so, although I haven't used v0.1 in a long time (years). I took a brief pause in December to work on a fun, non-work-related project, but I'm about to get back into dft at the beginning of the new year. I'm going to review where things are again and potentially do a release early on before proceeding to the next round of features I have in mind.

@matthewmturner
Collaborator Author

I am going to work on #257 and then do the 0.2 release.

@matthewmturner
Collaborator Author

I was actually just poking around OpenDAL (specifically its ObjectStore implementation) and it may be very easy to integrate Hugging Face. I'm going to see if it's as easy as I'm expecting, and if so I will include that as well (basically the code in the example I linked, but using services-huggingface instead of services-s3).

@matthewmturner
Collaborator Author

I was able to add the Hugging Face integration and I've started prepping for the 0.2 release. However, I'm currently pinned to a specific git commit for the hudi feature, and you can't publish versions that depend on a git commit. It looks like the Hudi team is close to a new version, but I'm not sure how long that will take.

I might see if I can use the FFI table provider interface instead, so I can prevent these version issues from coming up in the future.

@matthewmturner
Collaborator Author

I ended up removing the hudi feature for now so I could move forward with the 0.2 release. There are several features besides the FFI table provider I would like to work on, although I do plan to get back to that.
