feat!: bump polars to 0.44.2 #1271

Merged
merged 8 commits into from
Nov 17, 2024
Collaborator

@eitsupi eitsupi commented Nov 2, 2024

No description provided.

@eitsupi eitsupi marked this pull request as ready for review November 17, 2024 04:03
@eitsupi
Collaborator Author

eitsupi commented Nov 17, 2024

The following code appears to fail only on Windows.

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"
dtypes = list(
  "first_name" = pl$Categorical(),
  "gender" = pl$Categorical(),
  "type" = pl$Categorical(),
  "state" = pl$Categorical(),
  "party" = pl$Categorical()
)
# dtypes argument
dataset = pl$read_csv(url)$with_columns(
  pl$col("birthday")$str$strptime(pl$Date, "%Y-%m-%d")
)

Execution halted with the following contexts
   0: In R: in pl$read_csv():
   0: During function call [tools::buildVignettes(dir = ".", tangle = TRUE)]
   1: Encountered the following error in Rust-Polars:
      	could not parse `"George Read (American politician, born 1733)"
` as dtype `str` at column 'wikipedia_id
' (column number 36)
      The current offset in the file is 4916 bytes.
      You might want to try:
      - increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
      - specifying correct dtype with the `schema_overrides` argument
      - setting `ignore_errors` to `True`,
      - adding `"George Read (American politician, born 1733)"
` to the `null_values` list.
      Original error: ```invalid csv file
      Field `"George Read (American politician, born 1733)"
` is not properly escaped.```

The CSV cell in question currently looks like this; note that it contains a comma.

wikipedia_id
"George Read (American politician, born 1733)"
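
For reference, a comma inside a double-quoted field is valid CSV, so the comma alone should not break parsing. Below is a minimal sketch using Python's standard `csv` module. The two-column sample and the `parse_rows` helper are made up for illustration; only the `wikipedia_id` value comes from the error message above.

```python
import csv
import io

# Hypothetical two-column sample with CRLF line endings.
# Only the wikipedia_id value is taken from the error message.
sample = (
    "last_name,wikipedia_id\r\n"
    'Read,"George Read (American politician, born 1733)"\r\n'
)


def parse_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # newline="" disables newline translation so the csv module
    # sees the line endings exactly as they are in the text.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_rows(sample)
print(rows[1])
```

The quoted value comes back as a single field with the comma preserved, which is the behavior a conforming reader is expected to have.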

@etiennebacher Could you please check whether you can reproduce the problem on Windows (R and Python)?

Maybe related to this change: pola-rs/polars#19088

@eitsupi
Collaborator Author

eitsupi commented Nov 17, 2024

I have tried the following and it works fine in Python🤔

import ssl
import polars as pl
ssl._create_default_https_context = ssl._create_unverified_context
pl.read_csv("https://theunitedstates.io/congress-legislators/legislators-historical.csv")

Could the problem be in the mechanism R uses to download the CSV file?
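
One way to test that hypothesis would be to save the bytes that each download path produces and compare their line endings. This is only a sketch: it assumes the difference would show up as changed line endings, which is a guess, and the `count_line_endings` helper and the sample bytes are hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR sequences in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF contains one LF and one CR, so subtract them
        # to get the counts of the standalone characters.
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


# Hypothetical bytes standing in for two downloads of the same file.
unix_style = b"a,b\n1,2\n"
windows_style = b"a,b\r\n1,2\r\n"

print(count_line_endings(unix_style))
print(count_line_endings(windows_style))
```

If the two downloads of the real file gave different counts, that would point at the download step rather than the CSV parser.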

@eitsupi
Collaborator Author

eitsupi commented Nov 17, 2024

I will open a follow-up issue and merge this PR for now.

@eitsupi eitsupi merged commit 5737455 into main Nov 17, 2024
31 of 35 checks passed
@eitsupi eitsupi deleted the rs-0.44.2 branch November 17, 2024 05:43