Describe the bug
If I try to compile and run the example given in the documentation for `parquet::column`, the assertions at the end fail.
To Reproduce
`cargo build` with the following in `main.rs`:
```rust
use std::fs;
use std::path::Path;
use std::sync::Arc;

use parquet::column::reader::ColumnReader;
use parquet::data_type::Int32Type;
use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::SerializedFileReader;
use parquet::file::writer::SerializedFileWriter;
use parquet::schema::parser::parse_message_type;

fn main() {
    let path = Path::new("column_sample.parquet");

    // Writing data using column writer API.
    let message_type = "
        message schema {
            optional group values (LIST) {
                repeated group list {
                    optional INT32 element;
                }
            }
        }
    ";
    let schema = Arc::new(parse_message_type(message_type).unwrap());
    let file = fs::File::create(path).unwrap();
    let mut writer = SerializedFileWriter::new(file, schema, Default::default()).unwrap();
    let mut row_group_writer = writer.next_row_group().unwrap();
    while let Some(mut col_writer) = row_group_writer.next_column().unwrap() {
        col_writer
            .typed::<Int32Type>()
            .write_batch(&[1, 2, 3], Some(&[3, 3, 3, 2, 2]), Some(&[0, 1, 0, 1, 1]))
            .unwrap();
        col_writer.close().unwrap();
    }
    row_group_writer.close().unwrap();
    writer.close().unwrap();

    // Reading data using column reader API.
    let file = fs::File::open(path).unwrap();
    let reader = SerializedFileReader::new(file).unwrap();
    let metadata = reader.metadata();

    let mut values = vec![0; 8];
    let mut def_levels = vec![0; 8];
    let mut rep_levels = vec![0; 8];

    for i in 0..metadata.num_row_groups() {
        let row_group_reader = reader.get_row_group(i).unwrap();
        let row_group_metadata = metadata.row_group(i);
        for j in 0..row_group_metadata.num_columns() {
            let mut column_reader = row_group_reader.get_column_reader(j).unwrap();
            match column_reader {
                // You can also use `get_typed_column_reader` method to extract typed reader.
                ColumnReader::Int32ColumnReader(ref mut typed_reader) => {
                    let (records, values, levels) = typed_reader
                        .read_records(
                            8, // maximum records to read
                            Some(&mut def_levels),
                            Some(&mut rep_levels),
                            &mut values,
                        )
                        .unwrap();
                    assert_eq!(records, 2);
                    assert_eq!(levels, 5);
                    assert_eq!(values, 3);
                }
                _ => {}
            }
        }
    }

    assert_eq!(values, vec![1, 2, 3, 0, 0, 0, 0, 0]);
    assert_eq!(def_levels, vec![3, 3, 3, 2, 2, 0, 0, 0]);
    assert_eq!(rep_levels, vec![0, 1, 0, 1, 1, 0, 0, 0]);
}
```
Expected behavior
The assertions should be correct. I'm surprised that the examples in the documentation don't compile and run as part of the test suite.
Additional context
This line https://github.com/apache/arrow-rs/blob/master/parquet/src/column/reader/decoder.rs#L236 appears to be the culprit: it resizes the `values` vector, and then passes only the new part of the vector to `self.decoder.as_mut().unwrap().read()`. This was changed 3 months ago as part of #5177.
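To make the suspected failure mode concrete, here is a hypothetical, standalone sketch (not the actual arrow-rs code): if a decoder derives its write offset from the buffer's current length and then writes only into the newly resized tail, a caller-pre-sized buffer like the `vec![0; 8]` in the example is never overwritten, and the decoded values land after the zeros.

```rust
// Hypothetical sketch of the pitfall described in the issue (names invented,
// not taken from arrow-rs): the decoder treats everything already in the
// buffer as valid data, resizes, and writes only past the old length.
fn decode_into(buffer: &mut Vec<i32>, decoded: &[i32]) {
    let offset = buffer.len(); // assumes the existing contents are valid data
    buffer.resize(offset + decoded.len(), 0);
    // Only the "new" tail is written to:
    buffer[offset..].copy_from_slice(decoded);
}

fn main() {
    // Pre-sized by the caller, as in the doc example.
    let mut values = vec![0; 8];
    decode_into(&mut values, &[1, 2, 3]);
    // The decoded values land after the zeros, so an assertion like the doc
    // example's `values == vec![1, 2, 3, 0, 0, 0, 0, 0]` would fail:
    assert_eq!(values, vec![0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3]);
}
```

Under this (assumed) mechanism, the decoder would need to distinguish "buffer capacity handed in by the caller" from "values already decoded" when computing its write offset.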
Resolved by: Correct example code for column (apache#5560), commit 8aaf188; also landed as Correct example code for column (#5560) (#5561), commit 51ea388, co-authored by Zach Gershkoff <[email protected]>.