Merge remote-tracking branch 'upstream/master' into provide-access-to-inner-parquet-writers
tustvold committed Mar 8, 2024
2 parents 7609ed3 + 79634c0 commit 4b5a5c5
Showing 33 changed files with 292 additions and 142 deletions.
58 changes: 53 additions & 5 deletions CONTRIBUTING.md
@@ -92,19 +92,31 @@ export ARROW_TEST_DATA=$(cd ../testing/data; pwd)

From here on, this is a pure Rust project and `cargo` can be used to run tests, benchmarks, docs and examples as usual.

### Running the tests
## Running the tests

Run tests using the Rust standard `cargo test` command:

```bash
# run all tests.
# run all unit and integration tests
cargo test


# run only tests for the arrow crate
# run tests for the arrow crate
cargo test -p arrow
```

For some changes, you may want to run additional tests. You can find up-to-date information on the current CI tests in [.github/workflows](https://github.com/apache/arrow-rs/tree/master/.github/workflows). Here are some examples of additional tests you may want to run:

```bash
# run tests for the parquet crate
cargo test -p parquet

# run arrow tests with all features enabled
cargo test -p arrow --all-features

# run the doc tests
cargo test --doc
```

## Code Formatting

Our CI uses `rustfmt` to check code formatting. Before submitting a
@@ -118,10 +130,19 @@ cargo +stable fmt --all -- --check

We recommend using `clippy` for checking lints during development. While we do not yet enforce `clippy` checks, we recommend not introducing new `clippy` errors or warnings.

Run the following to check for clippy lints.
Run the following to check for `clippy` lints:

```bash
# run clippy with default settings
cargo clippy

```

More comprehensive `clippy` checks can be run by adding flags:

```bash
# run clippy on the arrow crate with all features enabled, targeting all tests, examples, and benchmarks
cargo clippy -p arrow --all-features --all-targets
```

If you use Visual Studio Code with the `rust-analyzer` plugin, you can enable `clippy` to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.
@@ -134,6 +155,33 @@ Search for `allow(clippy::` in the codebase to identify lints that are ignored/a
- If you have several lints on a function or module, you may disable the lint on the function or module.
- If a lint is pervasive across multiple modules, you may disable it at the crate level.

## Running Benchmarks

Running benchmarks is a good way to test the performance of a change. As benchmarks usually take a long time to run, we recommend running targeted benchmarks instead of the full suite.

```bash
# run all benchmarks
cargo bench

# run arrow benchmarks
cargo bench -p arrow

# run benchmark for the parse_time function within the arrow-cast crate
cargo bench -p arrow-cast --bench parse_time
```

To save a baseline for your benchmarks, use the `--save-baseline` flag, then compare a later run against it with `--baseline`:

```bash
git checkout master

cargo bench --bench parse_time -- --save-baseline master

git checkout feature

cargo bench --bench parse_time -- --baseline master
```

## Git Pre-Commit Hook

We can use [git pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) to automate various kinds of git pre-commit checking/formatting.
5 changes: 4 additions & 1 deletion arrow-array/src/array/primitive_array.rs
@@ -1557,7 +1557,10 @@ mod tests {
// roundtrip to and from datetime
assert_eq!(
1550902545147,
arr.value_as_datetime(i).unwrap().timestamp_millis()
arr.value_as_datetime(i)
.unwrap()
.and_utc()
.timestamp_millis()
);
} else {
assert!(arr.is_null(i));
5 changes: 5 additions & 0 deletions arrow-array/src/record_batch.rs
@@ -236,6 +236,11 @@ impl RecordBatch {
self.schema.clone()
}

/// Returns a reference to the [`Schema`] of the record batch.
pub fn schema_ref(&self) -> &SchemaRef {
&self.schema
}

/// Projects the schema onto the specified columns
pub fn project(&self, indices: &[usize]) -> Result<RecordBatch, ArrowError> {
let projected_schema = self.schema.project(indices)?;
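
The new `schema_ref` accessor hands out a borrow, while the existing `schema` accessor clones the shared pointer. The difference can be sketched with the standard library alone. This is an illustration only: `Schema`, `SchemaRef`, and `Batch` below are hypothetical stand-ins rather than the arrow types, and the sketch assumes `SchemaRef` is an `Arc`-wrapped schema.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's `Schema`; only here to have something to share.
struct Schema {
    fields: Vec<String>,
}

// Assumption: the schema is shared behind a reference-counted pointer.
type SchemaRef = Arc<Schema>;

// Hypothetical stand-in for `RecordBatch`, reduced to the schema field.
struct Batch {
    schema: SchemaRef,
}

impl Batch {
    // Returns an owned handle: clones the Arc, which bumps the reference count.
    fn schema(&self) -> SchemaRef {
        self.schema.clone()
    }

    // Returns a borrow: no clone and no reference-count update.
    fn schema_ref(&self) -> &SchemaRef {
        &self.schema
    }
}
```

A caller that only needs to inspect the schema can use `schema_ref()` and skip the clone. A caller that needs to keep the schema beyond the lifetime of the batch still wants the owned handle from `schema()`.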
46 changes: 25 additions & 21 deletions arrow-array/src/temporal_conversions.rs
@@ -43,20 +43,21 @@ pub const EPOCH_DAYS_FROM_CE: i32 = 719_163;
/// converts a `i32` representing a `date32` to [`NaiveDateTime`]
#[inline]
pub fn date32_to_datetime(v: i32) -> Option<NaiveDateTime> {
NaiveDateTime::from_timestamp_opt(v as i64 * SECONDS_IN_DAY, 0)
Some(DateTime::from_timestamp(v as i64 * SECONDS_IN_DAY, 0)?.naive_utc())
}

/// converts a `i64` representing a `date64` to [`NaiveDateTime`]
#[inline]
pub fn date64_to_datetime(v: i64) -> Option<NaiveDateTime> {
let (sec, milli_sec) = split_second(v, MILLISECONDS);

NaiveDateTime::from_timestamp_opt(
let datetime = DateTime::from_timestamp(
// extract seconds from milliseconds
sec,
// discard extracted seconds and convert milliseconds to nanoseconds
milli_sec * MICROSECONDS as u32,
)
)?;
Some(datetime.naive_utc())
}

/// converts a `i32` representing a `time32(s)` to [`NaiveDateTime`]
@@ -130,45 +131,48 @@ pub fn time_to_time64ns(v: NaiveTime) -> i64 {
/// converts a `i64` representing a `timestamp(s)` to [`NaiveDateTime`]
#[inline]
pub fn timestamp_s_to_datetime(v: i64) -> Option<NaiveDateTime> {
NaiveDateTime::from_timestamp_opt(v, 0)
Some(DateTime::from_timestamp(v, 0)?.naive_utc())
}

/// converts a `i64` representing a `timestamp(ms)` to [`NaiveDateTime`]
#[inline]
pub fn timestamp_ms_to_datetime(v: i64) -> Option<NaiveDateTime> {
let (sec, milli_sec) = split_second(v, MILLISECONDS);

NaiveDateTime::from_timestamp_opt(
let datetime = DateTime::from_timestamp(
// extract seconds from milliseconds
sec,
// discard extracted seconds and convert milliseconds to nanoseconds
milli_sec * MICROSECONDS as u32,
)
)?;
Some(datetime.naive_utc())
}

/// converts a `i64` representing a `timestamp(us)` to [`NaiveDateTime`]
#[inline]
pub fn timestamp_us_to_datetime(v: i64) -> Option<NaiveDateTime> {
let (sec, micro_sec) = split_second(v, MICROSECONDS);

NaiveDateTime::from_timestamp_opt(
let datetime = DateTime::from_timestamp(
// extract seconds from microseconds
sec,
// discard extracted seconds and convert microseconds to nanoseconds
micro_sec * MILLISECONDS as u32,
)
)?;
Some(datetime.naive_utc())
}

/// converts a `i64` representing a `timestamp(ns)` to [`NaiveDateTime`]
#[inline]
pub fn timestamp_ns_to_datetime(v: i64) -> Option<NaiveDateTime> {
let (sec, nano_sec) = split_second(v, NANOSECONDS);

NaiveDateTime::from_timestamp_opt(
let datetime = DateTime::from_timestamp(
// extract seconds from nanoseconds
sec, // discard extracted seconds
nano_sec,
)
)?;
Some(datetime.naive_utc())
}

#[inline]
@@ -179,13 +183,13 @@ pub(crate) fn split_second(v: i64, base: i64) -> (i64, u32) {
/// converts a `i64` representing a `duration(s)` to [`Duration`]
#[inline]
pub fn duration_s_to_duration(v: i64) -> Duration {
Duration::seconds(v)
Duration::try_seconds(v).unwrap()
}

/// converts a `i64` representing a `duration(ms)` to [`Duration`]
#[inline]
pub fn duration_ms_to_duration(v: i64) -> Duration {
Duration::milliseconds(v)
Duration::try_milliseconds(v).unwrap()
}

/// converts a `i64` representing a `duration(us)` to [`Duration`]
@@ -272,57 +276,57 @@ mod tests {
date64_to_datetime, split_second, timestamp_ms_to_datetime, timestamp_ns_to_datetime,
timestamp_us_to_datetime, NANOSECONDS,
};
use chrono::NaiveDateTime;
use chrono::DateTime;

#[test]
fn negative_input_timestamp_ns_to_datetime() {
assert_eq!(
timestamp_ns_to_datetime(-1),
NaiveDateTime::from_timestamp_opt(-1, 999_999_999)
DateTime::from_timestamp(-1, 999_999_999).map(|x| x.naive_utc())
);

assert_eq!(
timestamp_ns_to_datetime(-1_000_000_001),
NaiveDateTime::from_timestamp_opt(-2, 999_999_999)
DateTime::from_timestamp(-2, 999_999_999).map(|x| x.naive_utc())
);
}

#[test]
fn negative_input_timestamp_us_to_datetime() {
assert_eq!(
timestamp_us_to_datetime(-1),
NaiveDateTime::from_timestamp_opt(-1, 999_999_000)
DateTime::from_timestamp(-1, 999_999_000).map(|x| x.naive_utc())
);

assert_eq!(
timestamp_us_to_datetime(-1_000_001),
NaiveDateTime::from_timestamp_opt(-2, 999_999_000)
DateTime::from_timestamp(-2, 999_999_000).map(|x| x.naive_utc())
);
}

#[test]
fn negative_input_timestamp_ms_to_datetime() {
assert_eq!(
timestamp_ms_to_datetime(-1),
NaiveDateTime::from_timestamp_opt(-1, 999_000_000)
DateTime::from_timestamp(-1, 999_000_000).map(|x| x.naive_utc())
);

assert_eq!(
timestamp_ms_to_datetime(-1_001),
NaiveDateTime::from_timestamp_opt(-2, 999_000_000)
DateTime::from_timestamp(-2, 999_000_000).map(|x| x.naive_utc())
);
}

#[test]
fn negative_input_date64_to_datetime() {
assert_eq!(
date64_to_datetime(-1),
NaiveDateTime::from_timestamp_opt(-1, 999_000_000)
DateTime::from_timestamp(-1, 999_000_000).map(|x| x.naive_utc())
);

assert_eq!(
date64_to_datetime(-1_001),
NaiveDateTime::from_timestamp_opt(-2, 999_000_000)
DateTime::from_timestamp(-2, 999_000_000).map(|x| x.naive_utc())
);
}
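
The negative-input tests above expect a timestamp just before the epoch to map to the previous whole second plus a non-negative sub-second part. The body of `split_second` is not shown in this diff, so the following is only a sketch of one way to get that behaviour, assuming Euclidean division. The names `split_second_sketch` and `millis_to_parts` are hypothetical, and the code uses only the standard library rather than `chrono`.

```rust
/// Splits a signed count of sub-second units into (whole seconds, leftover units).
/// `base` is the number of units per second, e.g. 1_000 for milliseconds.
fn split_second_sketch(v: i64, base: i64) -> (i64, u32) {
    // For a positive base, Euclidean division rounds toward negative infinity,
    // so the remainder is always in 0..base and is never negative.
    (v.div_euclid(base), v.rem_euclid(base) as u32)
}

/// Converts a millisecond timestamp into (seconds, nanoseconds) parts,
/// mirroring the scaling step used by the millisecond conversions in the diff.
fn millis_to_parts(v: i64) -> (i64, u32) {
    let (sec, milli) = split_second_sketch(v, 1_000);
    // 1 millisecond = 1_000_000 nanoseconds; at most 999 * 1_000_000 fits in a u32.
    (sec, milli * 1_000_000)
}
```

Under these assumptions, `millis_to_parts(-1)` yields `(-1, 999_000_000)`, matching the arguments passed to `DateTime::from_timestamp` in `negative_input_timestamp_ms_to_datetime`. Plain `/` and `%` would instead truncate toward zero and leave a negative remainder, which cannot be represented as a `u32` nanosecond count.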
