[Hotfix] Change to streaming reader for CSV schema inference. (#1471)
This PR uses pyarrow's streaming CSV reader for schema inference. Instead of reading and parsing the entire CSV file to fetch the schema, it fetches the schema from the first "block" read by the streaming reader. The block size is configurable at the pyarrow level through the CSV `ReadOptions` (we don't currently expose this to the user) and has a [default of 1 MB](https://github.com/apache/arrow/blob/5ad1cae024a0f3bc67ac49fa6d4d72d36afb2384/cpp/src/arrow/csv/options.h#L144-L149).