Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support reading form csv file #550

Closed
wants to merge 3 commits into from

Conversation

infdahai
Copy link

@infdahai infdahai commented Mar 25, 2023

issue: #404

status: (don't need to review, still be coding)

material impl ref: duckdb, risinglight
copy syntax specification: copy statement

expected behavior:

copy a from 'b.csv';

task:

  1. add physical copy file plan [x]
    get bind_type and create file copy plan.(file path,file format)
  2. add copy file executor. [x]
    csv file: read file based on path and get columns.
  3. from csv to table [x]
  4. add copy command[x].

@infdahai infdahai changed the title try to support reading form csv file feat: support reading form csv file Mar 26, 2023
@infdahai
Copy link
Author

infdahai commented Mar 26, 2023

duckdb sql just supports copy a from b.csv; and the table a is a table with determinate types in catalog stage. Because we don't know the types in csv file except for specifying the test csv format in bustub, such as (bigint, bigint, bigint).

so we can only use 'create table' and then 'copy from' to do this task.

Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall on the right direction, good work!

CMakeLists.txt Outdated
@@ -311,6 +311,7 @@ add_custom_target(submit-p2
)

set(P3_FILES
"src/include/execution/executors/copy_file_executor.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not include in submission files, because it will not be modified by students

private:
const PhysicalCopyFileNode *plan_;
}
} // namespace bustub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably need to add a new line to avoid lint warnings


protected:
auto PlanNodeToString() const -> std::string override {
return fmt::format("PhysicalCopy {{ file_path={} }}", path_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In BusTub we don't add Physical to the plan node name

namespace bustub {
class BaseCsvReader {
public:
BaseCsvReader();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no implementation yet? 🤪

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I still read the duckdb code and I'm a little slow to catch on that.

1. build error => copyfrom executor needs to insert table heap.

Signed-off-by: clundro <[email protected]>
@infdahai infdahai marked this pull request as draft March 27, 2023 14:15
@infdahai infdahai closed this Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants