Flights 3m only has 200k rows #607

Open

domoritz opened this issue Sep 19, 2024 · 1 comment

Member

domoritz commented Sep 19, 2024

https://github.com/vega/vega-datasets/blob/main/data/flights-3m.csv seems to only have 200k rows.

wc -l flights-3m.csv
  231084 flights-3m.csv

Added in 1e70098 by @arvind


Contributor

dsmedia commented Sep 20, 2024

Looks like the count in flights_200k may also be off.

from vega_datasets import data

datasets = ['flights_2k', 'flights_5k', 'flights_10k', 'flights_20k', 'flights_200k', 'flights_3m']

for dataset_name in datasets:
    dataset = getattr(data, dataset_name)()
    row_count = len(dataset)
    print(f"{dataset_name}: {row_count} rows")

Results:

flights_2k: 2000 rows
flights_5k: 5000 rows
flights_10k: 10000 rows
flights_20k: 20000 rows
flights_200k: 231083 rows
flights_3m: 231083 rows
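The 231083 rows reported here match the 231084 lines from `wc -l` in the first comment if the CSV has a single header line, since `wc -l` counts that line too. A minimal sketch of a header-aware row count using only the standard library follows; the helper name `count_csv_rows` and the sample column names are hypothetical, chosen for illustration rather than taken from the dataset.

```python
import csv
import io


def count_csv_rows(text_stream, has_header=True):
    """Count data records in a CSV text stream, skipping the header if present.

    Hypothetical helper for illustration; not part of vega_datasets.
    """
    reader = csv.reader(text_stream)
    if has_header:
        # Consume the header row; the default avoids StopIteration on empty input.
        next(reader, None)
    return sum(1 for _ in reader)


# Tiny in-memory sample (column names are made up): 1 header line + 2 data rows.
sample = io.StringIO("date,delay,distance\n1,5,100\n2,7,200\n")
print(count_csv_rows(sample))  # 2, whereas `wc -l` would report 3 lines
```

To check a local copy of the file, the same function could be called on `open("flights-3m.csv", newline="")`, assuming the file has been downloaded to the working directory.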

Should we regenerate the 3m rows using this script, create a CSV from the 3m parquet file here, or do something else?
