- Chunks arrays in the high-level writer so the memory limit is not reached (see option `chunkbytes()`; default size `2^30`). This fix is compatible with the arrow lib version in conda; `dev-chunked` is compatible with the GitHub version.
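The byte-budget arithmetic behind an option like `chunkbytes()` can be sketched in plain Python. This is a minimal illustration, not the plugin's implementation; the function and variable names (`chunk_bounds`, `rows_per_chunk`) are made up for the example.

```python
# Minimal sketch of chunking rows under a byte budget (default 2^30,
# as in the plugin's chunkbytes() option). Names are illustrative.

def chunk_bounds(num_rows, bytes_per_row, chunk_bytes=2**30):
    """Yield (start, length) slices so each slice uses <= chunk_bytes."""
    rows_per_chunk = max(1, chunk_bytes // bytes_per_row)
    for start in range(0, num_rows, rows_per_chunk):
        yield start, min(rows_per_chunk, num_rows - start)

# e.g. 10 million rows of 8-byte doubles with a 16 MiB budget
bounds = list(chunk_bounds(10_000_000, 8, chunk_bytes=16 * 2**20))
assert sum(length for _, length in bounds) == 10_000_000
assert all(length * 8 <= 16 * 2**20 for _, length in bounds)
```

Writing each slice as it is produced keeps only one chunk's worth of array data in memory at a time.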
- Adds a `progress()` indicator to `parquet write/save` (still not sure how to make it show 100% synchronously).
- `environment.yml` now pulls from `conda-forge`. Hence this also updates `LogicalType::None` to `ConvertedType::NONE` and `set_num_threads` to `set_use_threads` for arrow-cpp version 0.14.
- Fixes tests so they run even if python hive creation fails.
- Updated install instructions in README.
- Fixes install script and instructions to be compatible with latest conda version and corresponding packages.
- Much faster read times with a relatively large number of variables. Previous versions allocated the number of observations and then the columns in Stata; doing the reverse is orders of magnitude faster.
- `progress(x)` option displays progress every `x` seconds.
- `parquet desc` allows the user to glean the contents of a parquet file.
- Closes #17.
- `parquet use, rg()` allows the user to specify the row groups to read. For parquet datasets, each dataset in the folder is a group.
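The file-per-group convention described above can be sketched with ordinary files, where selecting row groups amounts to reading only the matching parts. CSV files stand in for parquet parts here so the example stays dependency-free; the file naming scheme and the `read_row_groups` helper are hypothetical, not the plugin's API.

```python
# Each part file in the folder plays the role of one row group.
# CSV stands in for parquet; names are illustrative.
import csv
import os
import tempfile

root = tempfile.mkdtemp()
groups = [[(1, "a"), (2, "b")], [(3, "c")], [(4, "d")]]
for g, rows in enumerate(groups, start=1):
    with open(os.path.join(root, f"part-{g}.csv"), "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_row_groups(folder, wanted):
    """Read only the requested groups, akin to selecting groups via rg()."""
    out = []
    for g in sorted(wanted):
        with open(os.path.join(folder, f"part-{g}.csv"), newline="") as f:
            out.extend(csv.reader(f))
    return out

# Read groups 1 and 3, skipping group 2 entirely
selected = read_row_groups(root, [1, 3])
```

Skipping unrequested groups means their bytes are never read at all, which is the point of the option.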
- If fewer observations are read than the number expected, make a note of it and only keep as many observations as were read.
- `parquet save` now accepts `if`.
- `parquet use` can concatenate every parquet file in a directory if a directory is passed instead of a file.
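Directory concatenation reduces to listing the folder and appending each part in a deterministic order. A minimal sketch, again with CSV files standing in for parquet parts; the `read_folder` helper and file names are made up for the example.

```python
# Concatenate every part file in a folder, sorted by name, the way a
# directory argument is handled. CSV stands in for parquet here.
import csv
import glob
import os
import tempfile

root = tempfile.mkdtemp()
parts = {"a.csv": [["1"], ["2"]], "b.csv": [["3"]]}
for name, rows in parts.items():
    with open(os.path.join(root, name), "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_folder(folder):
    """Append every part file in the folder, sorted for a stable order."""
    out = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            out.extend(csv.reader(f))
    return out

combined = read_folder(root)
```

Sorting the paths first gives a reproducible row order regardless of how the filesystem lists the directory.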
- The `highlevel` reader can use option `in()`, but it is not as efficient.
- The reader scans the entire string column by default, which might be slower, but I prefer lossless reads in this case.
- `lowlevel` is back in; option `in()` can be used to read a range using `lowlevel`.
- The `lowlevel` writer does not support missing values; the user receives a warning with option `lowlevel` and an error if missing values are encountered.
- High-level reader now supports missing values.
- The user is warned that extended missing values are coerced to NaN.
- `lowlevel` deprecated; moved to `debug_lowlevel`.
- The user is warned if the data is empty (no obs).
- Added column selector to file reader.
- High-level file reader. Faster but might suffer from a loss of generality.
- Scan string fields (ByteArray) to figure out the maximum length. Optionally specify `nostrscan` to fall back to `strbuffer`; optionally specify `strscan(.)` to scan the entire column.
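The string-scan step amounts to one pass over the column to find the widest value in bytes, so the target string width can be allocated exactly; a partial scan is faster but may under-estimate. A minimal sketch, assuming UTF-8 encoding; the `max_str_len` helper and its `limit` parameter are illustrative, not the plugin's API.

```python
# Sketch of scanning a ByteArray (string) column for the maximum byte
# length. limit=None scans the whole column (like strscan(.)); a small
# limit mimics a partial scan. Names are illustrative.

def max_str_len(column, limit=None):
    """Return the max UTF-8 byte length among up to `limit` values."""
    scanned = column if limit is None else column[:limit]
    return max((len(s.encode("utf-8")) for s in scanned), default=0)

col = ["a", "hello", "héllo!"]
assert max_str_len(col) == 7           # "héllo!" is 7 bytes in UTF-8
assert max_str_len(col, limit=1) == 1  # partial scan can under-estimate
```

The under-estimate in the partial scan is why falling back to a fixed `strbuffer` (skipping the scan) trades speed for possible truncation.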
- Some FixedLen variable support using the low-level reader and writer.
- Install from scratch is now working on NBER servers.
- Read and write parquet files via `parquet read` and `parquet write`.