feat: read_csv #1112

Open
MarcoGorelli opened this issue Oct 1, 2024 · 1 comment
Labels
enhancement New feature or request needs discussion

Comments

@MarcoGorelli
Member

I was initially hesitant about adding IO methods, since the idea was "users provide their own dataframe, we just deal with how to process it". But we already have from_dict, and ImperialCollegeLondon/pycsvy#83 and Temporian look like good use cases for read_csv.

pandas and Polars each have dozens of read_* methods, so we may need to be careful about which ones we add, and perhaps start with only the most common ones.

The API would be something like:

import narwhals as nw
import pandas as pd
import polars as pl

nw.read_csv(file, native_namespace=pd)  # pandas-backed
nw.read_csv(file, native_namespace=pl)  # Polars-backed

We could do:

  • nw.read_csv: this is eager-only and always returns nw.DataFrame
  • nw.scan_csv: this is the most generic one, and returns nw.LazyFrame if possible (e.g. Polars), else nw.DataFrame

Alternatives

Keep the status quo: users are responsible for doing their own IO.

@lucianosrp
Member

I would generally prefer to keep narwhals's "just-pass-me-the-df" philosophy.


We could infer which namespace to use based on which module is already imported?

import pandas as pd
import narwhals as nw

df = nw.read_csv("data.csv") # < uses pandas

But then, what should we do with the already-imported pandas? If you are importing it anyway, you might as well use it for I/O directly.
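
For reference, inferring the namespace from already-imported modules could look roughly like the sketch below. It is a guess at one possible mechanism, not anything Narwhals provides: `infer_native_namespace` and `CANDIDATES` are hypothetical names, and the priority order is an arbitrary assumption. That order also shows a weak spot of the idea: if both libraries happen to be imported, the choice is not obvious.

```python
import sys

# Candidate backends in priority order; the order is an assumption.
CANDIDATES = ("polars", "pandas")


def infer_native_namespace(modules=None):
    """Return the first candidate backend that has already been imported."""
    modules = sys.modules if modules is None else modules
    for name in CANDIDATES:
        module = modules.get(name)
        if module is not None:
            return module
    raise RuntimeError(
        "no supported dataframe library has been imported; "
        "pass native_namespace explicitly"
    )
```

The `modules` parameter exists only so the lookup can be exercised with a plain dict instead of the real `sys.modules`.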

The only major reason to have I/O support (that I can think of) would be if you wanted to switch an entire "narwhals workflow/script" to another backend with one setting.

Another way I can think of:

nw.set_io_backend("pandas")
df = nw.read_csv("data.csv")
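
A minimal sketch of how such a global setting might work, assuming the backend is named by its importable module name and that the module exposes `read_csv`. Nothing here is existing Narwhals API: `set_io_backend`, the `_settings` dict, and the error message are all hypothetical.

```python
import importlib

# Module-level state holding the chosen backend name (hypothetical).
_settings = {"io_backend": None}


def set_io_backend(name):
    """Remember which library should handle subsequent IO calls."""
    _settings["io_backend"] = name


def read_csv(source):
    """Read a CSV file with whichever backend was configured."""
    name = _settings["io_backend"]
    if name is None:
        raise RuntimeError("no IO backend configured; call set_io_backend(...) first")
    # Import lazily so that only the selected backend needs to be installed.
    native_namespace = importlib.import_module(name)
    return native_namespace.read_csv(source)
```

The trade-off is hidden global state: the result of read_csv then depends on a call made somewhere else in the program.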
