feat!: timestamp literal support (#28)
# RFC 3339-compliant Timestamp Parsing

To ensure our timestamp parsing aligns closely with RFC 3339, the tests
listed under "Tests for RFC 3339 Compliance" verify each aspect of
timestamp formatting and parsing. The supported formats are summarized first.

## Current Capabilities
- Date and Time with Timezone:
  - [x] `2009-01-03T18:15:05Z` (UTC timezone)
  - [x] `2009-01-03T18:15:05+02:30` (Positive timezone offset)
  - [x] `2009-01-03T18:15:05-07:00` (Negative timezone offset)
- Date and Time with Fractional Seconds:
  - [x] `2009-01-03T18:15:05.123Z` (Milliseconds)
  - [x] `2009-01-03T18:15:05.123456Z` (Microseconds)
  - [x] `2009-01-03T18:15:05.123456789Z` (Nanoseconds)
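
As a rough illustration of the timezone cases listed above, the sketch below converts the RFC 3339 offset suffix (`Z`, `+HH:MM`, or `-HH:MM`) into seconds east of UTC. The `TimeZone` enum and `parse_offset` function are hypothetical stand-ins written for this note and use only the standard library; they are not the types or functions added by this commit.

```rust
/// Hypothetical stand-in for a timezone type; not the crate's definition.
#[derive(Debug, PartialEq, Eq)]
enum TimeZone {
    Utc,
    /// Offset from UTC in seconds (positive values are east of UTC).
    FixedOffset(i32),
}

/// Parses an RFC 3339 offset suffix: `Z`, `+HH:MM`, or `-HH:MM`.
/// Returns `None` for anything else.
fn parse_offset(s: &str) -> Option<TimeZone> {
    if s == "Z" {
        return Some(TimeZone::Utc);
    }
    let bytes = s.as_bytes();
    if bytes.len() != 6 || bytes[3] != b':' {
        return None;
    }
    let sign = match bytes[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    // Only ASCII digits are allowed in the hour and minute fields.
    let digits = [bytes[1], bytes[2], bytes[4], bytes[5]];
    if !digits.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let hours = i32::from(digits[0] - b'0') * 10 + i32::from(digits[1] - b'0');
    let minutes = i32::from(digits[2] - b'0') * 10 + i32::from(digits[3] - b'0');
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(TimeZone::FixedOffset(sign * (hours * 3600 + minutes * 60)))
}
```

With this sketch, `+02:30` maps to `FixedOffset(9000)` and `-07:00` to `FixedOffset(-25200)`. Note that `+00:00` yields `FixedOffset(0)` here; whether the actual parser normalizes that to UTC is not shown in this excerpt.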

## Unix Epoch Time Parsing Capability

This module includes functionality for parsing timestamps represented as
time units (e.g., seconds) since the Unix epoch (January 1, 1970, at
00:00:00 UTC). This allows for direct integration and manipulation of
time data sourced from systems that utilize Unix time (POSIX time).

### Features:

- **Unix Timestamp Parsing**: Capable of interpreting strings or numeric
values representing seconds since the Unix epoch and converting them
into a standard datetime format.
- **UTC Alignment**: All parsed Unix timestamps are automatically
aligned to UTC, ensuring consistency across different time-related
operations.
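
The two features above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the function names `parse_unix_seconds`, `civil_from_days`, and `to_rfc3339_utc` are hypothetical, and the date conversion uses the well-known days-to-civil-date arithmetic instead of the `chrono` crate that the commit depends on.

```rust
/// Parses a decimal string of seconds since the Unix epoch.
/// Hypothetical helper for illustration; not the crate's API.
fn parse_unix_seconds(input: &str) -> Result<i64, String> {
    input
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("invalid Unix timestamp {:?}: {}", input, e))
}

/// Converts days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift so that day 0 is 0000-03-01
    let era = z.div_euclid(146_097); // number of 400-year cycles
    let doe = z.rem_euclid(146_097); // day within the cycle
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // March-based month index
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Formats epoch seconds as an RFC 3339 string, always in UTC.
fn to_rfc3339_utc(seconds: i64) -> String {
    let days = seconds.div_euclid(86_400);
    let secs_of_day = seconds.rem_euclid(86_400);
    let (year, month, day) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year,
        month,
        day,
        secs_of_day / 3_600,
        (secs_of_day % 3_600) / 60,
        secs_of_day % 60
    )
}
```

For example, `parse_unix_seconds("1231006505")` yields `Ok(1231006505)`, and `to_rfc3339_utc(1231006505)` yields `"2009-01-03T18:15:05Z"`, the same instant used in the RFC 3339 examples.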

### Example Usage:

```rust
// Parsing an RFC 3339 timestamp with the UTC designator (`Z`):
let timestamp_str = "2009-01-03T18:15:05Z";
let intermediate_timestamp = IntermediateTimestamp::try_from(timestamp_str).unwrap();
assert_eq!(intermediate_timestamp.timezone, IntermediateTimeZone::Utc);

// Parsing an RFC 3339 timestamp with a positive timezone offset:
let timestamp_str_with_tz = "2009-01-03T18:15:05+03:00";
let intermediate_timestamp = IntermediateTimestamp::try_from(timestamp_str_with_tz).unwrap();
assert_eq!(intermediate_timestamp.timezone, IntermediateTimeZone::FixedOffset(10800)); // 3 hours in seconds

// Parsing a Unix epoch timestamp (assumed to be seconds and UTC):
let unix_time_str = "1231006505";
let intermediate_timestamp = IntermediateTimestamp::to_timestamp(unix_time_str).unwrap();
assert_eq!(intermediate_timestamp.timezone, IntermediateTimeZone::Utc);
```

# Tests for RFC 3339 Compliance

- [x] **Test UTC Timezone Parsing**  
  Ensure proper parsing of timestamps with the UTC timezone designator
  (`Z`).
- [x] **Test Positive Timezone Offset**  
  Ensure timestamps with positive timezone offsets are parsed correctly.
- [x] **Test Negative Timezone Offset**  
  Ensure timestamps with negative timezone offsets are parsed correctly.
- [x] **Test Zero Timezone Offset**  
  Validate parsing of timestamps where the timezone is explicitly set to UTC
  with `+00:00`.
- [x] **Test Unix Epoch Time Timezone**  
  Verify that Unix epoch timestamps are assumed to be in UTC.
- [x] **Test Unix Epoch Timestamp Parsing**  
  Check parsing of Unix epoch timestamps from string representations.
- [x] **Test Basic RFC 3339 Timestamp**  
  Confirm basic parsing of RFC 3339 compliant timestamps with no timezone
  offset specified.
- [x] **Test RFC 3339 Timestamp with Positive Offset**  
  Test parsing of timestamps with positive timezone offsets.
- [x] **Test RFC 3339 Timestamp with Negative Offset**  
  Test parsing of timestamps with negative timezone offsets.
- [x] **Test RFC 3339 Timestamp with UTC Designator**  
  Confirm parsing of timestamps with the UTC designator (`Z`).
- [x] **Test Invalid RFC 3339 Timestamp**  
  Ensure that non-compliant strings are not parsed as valid timestamps.
- [x] **Test Timestamp with Seconds Precision**  
  Confirm that timestamps with seconds precision are parsed correctly.
- [x] **Test RFC 3339 Timestamp with Milliseconds**  
  Validate parsing of timestamps that include millisecond precision.
- [x] **Test RFC 3339 Timestamp with Microseconds**  
  Validate parsing of timestamps that include microsecond precision.
- [x] **Test RFC 3339 Timestamp with Nanoseconds**  
  Validate parsing of timestamps that include nanosecond precision.
- [x] **Test General Parsing Error**  
  Check handling of malformed timestamp inputs.
- [x] **Test Basic Date-Time Support**  
  Ensure basic RFC 3339 formatted date-times are parsed correctly.
- [x] **Test Leap Seconds Handling**  
  Verify that leap seconds are handled correctly in timestamps.
- [x] **Test Rejection of Incorrect Formats**  
  Ensure that incorrect timestamp formats are properly rejected.
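
The precision-related tests above (seconds, milliseconds, microseconds, nanoseconds) can be pictured as classifying a timestamp by the number of fractional-second digits. The sketch below is an assumption about how such a classification might look; the `Precision` enum, the `precision_of` function, and the way digit counts are bucketed are hypothetical and are not taken from the commit.

```rust
/// Hypothetical precision classes; not the crate's actual unit type.
#[derive(Debug, PartialEq, Eq)]
enum Precision {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Classifies a timestamp by counting the digits after the decimal point
/// in its time portion. Returns `None` for inputs without a `T` separator,
/// with an empty fraction, or with more than nine fractional digits.
/// This only inspects the fraction; it does not validate the whole timestamp.
fn precision_of(timestamp: &str) -> Option<Precision> {
    let (_, time) = timestamp.split_once('T')?;
    let digits = match time.split_once('.') {
        None => return Some(Precision::Second),
        Some((_, rest)) => rest.chars().take_while(|c| c.is_ascii_digit()).count(),
    };
    match digits {
        1..=3 => Some(Precision::Millisecond),
        4..=6 => Some(Precision::Microsecond),
        7..=9 => Some(Precision::Nanosecond),
        _ => None,
    }
}
```

Under these assumptions, `2009-01-03T18:15:05.123Z` is classified as `Millisecond` and `2009-01-03T18:15:05.123456789Z` as `Nanosecond`.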
Dustin-Ray authored Jul 9, 2024
1 parent 14002ce commit 8b82373
Showing 34 changed files with 1,096 additions and 406 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -29,7 +29,7 @@ bytemuck = {version = "1.14.2" }
byte-slice-cast = { version = "1.2.1" }
clap = { version = "4.5.4" }
criterion = { version = "0.5.1" }
chrono-tz = {version = "0.9.0", features = ["serde"]}
chrono = { version = "0.4.38" }
curve25519-dalek = { version = "4", features = ["rand_core"] }
derive_more = { version = "0.99" }
dyn_partial_eq = { version = "0.1.2" }
2 changes: 2 additions & 0 deletions crates/proof-of-sql-parser/Cargo.toml
@@ -15,8 +15,10 @@ doctest = true
test = true

[dependencies]
arrow = { workspace = true }
arrayvec = { workspace = true, features = ["serde"] }
bigdecimal = { workspace = true }
chrono = { workspace = true, features = ["serde"] }
lalrpop-util = { workspace = true, features = ["lexer", "unicode"] }
serde = { workspace = true, features = ["serde_derive"] }
thiserror = { workspace = true }
41 changes: 41 additions & 0 deletions crates/proof-of-sql-parser/src/error.rs
@@ -1,3 +1,4 @@
use serde::{Deserialize, Serialize};
use thiserror::Error;

/// Errors encountered during the parsing process
@@ -14,4 +15,44 @@ pub enum ParseError {
ResourceIdParseError(String),
}

/// General parsing error that may occur, for example if the provided schema/object_name strings
/// aren't valid postgres-style identifiers (excluding dollar signs).
pub type ParseResult<T> = std::result::Result<T, ParseError>;

/// Errors related to time operations, including timezone and timestamp conversions.
#[derive(Error, Debug, Eq, PartialEq, Serialize, Deserialize)]
pub enum PoSQLTimestampError {
/// Error when the timezone string provided cannot be parsed into a valid timezone.
#[error("invalid timezone string: {0}")]
InvalidTimezone(String),

/// Error indicating an invalid timezone offset was provided.
#[error("invalid timezone offset")]
InvalidTimezoneOffset,

/// Indicates a failure to convert between different representations of time units.
#[error("Invalid time unit")]
InvalidTimeUnit(String),

/// The local time does not exist because there is a gap in the local time.
/// This variant may also be returned if there was an error while resolving the local time,
/// caused by for example missing time zone data files, an error in an OS API, or overflow.
#[error("Local time does not exist because there is a gap in the local time")]
LocalTimeDoesNotExist,

/// The local time is ambiguous because there is a fold in the local time.
/// This variant contains the two possible results, in the order (earliest, latest).
#[error("Unix timestamp is ambiguous because there is a fold in the local time.")]
Ambiguous(String),

/// Represents a catch-all for parsing errors not specifically covered by other variants.
#[error("Timestamp parsing error: {0}")]
ParsingError(String),
}

// This exists because TryFrom<DataType> for ColumnType error is String
impl From<PoSQLTimestampError> for String {
fn from(error: PoSQLTimestampError) -> Self {
error.to_string()
}
}
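
The `From<PoSQLTimestampError> for String` impl above lets the `?` operator convert a timestamp error inside a function whose error type is `String`. The sketch below shows that mechanism in isolation. It is a simplified stand-in: `TimestampError`, `check_offset`, and `offset_for_column` are hypothetical names, only two variants are reproduced, and `Display` is written by hand so the example needs no external crates (the commit itself derives it with `thiserror`).

```rust
use std::fmt;

/// Reduced, hypothetical copy of the error enum, with two variants only.
#[derive(Debug, PartialEq, Eq)]
enum TimestampError {
    InvalidTimezoneOffset,
    ParsingError(String),
}

// Hand-written Display; the real enum gets this from `thiserror`.
impl fmt::Display for TimestampError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidTimezoneOffset => write!(f, "invalid timezone offset"),
            Self::ParsingError(msg) => write!(f, "Timestamp parsing error: {}", msg),
        }
    }
}

// Same shape as the impl in the diff: render the error as its message.
impl From<TimestampError> for String {
    fn from(error: TimestampError) -> Self {
        error.to_string()
    }
}

/// Hypothetical check: an offset must be strictly less than one day.
fn check_offset(seconds: i32) -> Result<i32, TimestampError> {
    if (-86_399..=86_399).contains(&seconds) {
        Ok(seconds)
    } else {
        Err(TimestampError::InvalidTimezoneOffset)
    }
}

/// A caller whose error type is `String`; `?` uses the `From` impl above.
fn offset_for_column(seconds: i32) -> Result<i32, String> {
    let checked = check_offset(seconds)?;
    Ok(checked)
}
```

Here `offset_for_column(90_000)` returns `Err("invalid timezone offset".to_string())` without any explicit `map_err` at the call site.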
38 changes: 37 additions & 1 deletion crates/proof-of-sql-parser/src/identifier.rs
@@ -44,7 +44,8 @@ impl FromStr for Identifier {
fn from_str(string: &str) -> ParseResult<Self> {
let name = IdentifierParser::new()
.parse(string)
- .map_err(|e| ParseError::IdentifierParseError(format!("{:?}", e)))?;
+ .map_err(|e| ParseError::IdentifierParseError(
+ format!("failed to parse identifier, (you may have used a reserved keyword as an ID, i.e. 'timestamp') {:?}", e)))?;

Ok(Identifier::new(name))
}
@@ -152,6 +153,41 @@ mod tests {
assert!(Identifier::from_str("GOOD_IDENTIFIER.").is_err());
assert!(Identifier::from_str(".GOOD_IDENTIFIER").is_err());
assert!(Identifier::from_str(&"LONG_IDENTIFIER_OVER_64_CHARACTERS".repeat(12)).is_err());

// Test for reserved keywords
let keywords = [
"all",
"asc",
"desc",
"as",
"and",
"from",
"not",
"or",
"select",
"where",
"order",
"by",
"limit",
"offset",
"group",
"min",
"max",
"count",
"sum",
"true",
"false",
"timestamp",
"to_timestamp",
];

for keyword in keywords.iter() {
assert!(
Identifier::from_str(keyword).is_err(),
"Should not parse keyword as identifier: {}",
keyword
);
}
}

#[test]
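
The keyword test above asserts that reserved words, now including `timestamp` and `to_timestamp`, are rejected as identifiers. In the crate this rejection comes from the generated parser; the sketch below only mimics the observable behavior with a plain lookup. The `RESERVED_KEYWORDS` constant and `is_reserved_keyword` function are hypothetical, and the case-insensitive comparison is an assumption made for illustration.

```rust
/// Keyword list copied from the test in the diff above.
const RESERVED_KEYWORDS: &[&str] = &[
    "all", "asc", "desc", "as", "and", "from", "not", "or", "select", "where",
    "order", "by", "limit", "offset", "group", "min", "max", "count", "sum",
    "true", "false", "timestamp", "to_timestamp",
];

/// Returns true when `candidate` matches a reserved keyword.
/// The comparison ignores ASCII case (an assumption of this sketch).
fn is_reserved_keyword(candidate: &str) -> bool {
    RESERVED_KEYWORDS
        .iter()
        .any(|keyword| keyword.eq_ignore_ascii_case(candidate))
}
```

A caller could use such a check to produce a clearer message before handing the string to the parser, in the same spirit as the extended `IdentifierParseError` text in this diff.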
12 changes: 11 additions & 1 deletion crates/proof-of-sql-parser/src/intermediate_ast.rs
@@ -4,7 +4,9 @@
* https://docs.rs/vervolg/latest/vervolg/ast/enum.Statement.html
***/

- use crate::{intermediate_decimal::IntermediateDecimal, Identifier};
+ use crate::{
+ intermediate_decimal::IntermediateDecimal, posql_time::timestamp::PoSQLTimestamp, Identifier,
+ };
use serde::{Deserialize, Serialize};

/// Representation of a SetExpression, a collection of rows, each having one or more columns.
@@ -328,6 +330,8 @@ pub enum Literal {
VarChar(String),
/// Decimal Literal
Decimal(IntermediateDecimal),
/// Timestamp Literal
Timestamp(PoSQLTimestamp),
}

impl From<bool> for Literal {
@@ -379,6 +383,12 @@ impl From<IntermediateDecimal> for Literal {
}
}

impl From<PoSQLTimestamp> for Literal {
fn from(time: PoSQLTimestamp) -> Self {
Literal::Timestamp(time)
}
}

/// Helper function to append an item to a vector
pub(crate) fn append<T>(list: Vec<T>, item: T) -> Vec<T> {
let mut result = list;
5 changes: 4 additions & 1 deletion crates/proof-of-sql-parser/src/lib.rs
@@ -2,6 +2,8 @@

/// Module for handling an intermediate decimal type received from the lexer.
pub mod intermediate_decimal;
/// Module for handling an intermediate timestamp type received from the lexer.
pub mod posql_time;
#[macro_use]
extern crate lalrpop_util;

@@ -16,7 +18,8 @@ pub(crate) mod test_utility;
pub(crate) mod select_statement;
pub use select_statement::SelectStatement;

- pub(crate) mod error;
+ /// Error definitions for proof-of-sql-parser
+ pub mod error;
pub use error::ParseError;
pub(crate) use error::ParseResult;

6 changes: 6 additions & 0 deletions crates/proof-of-sql-parser/src/posql_time/mod.rs
@@ -0,0 +1,6 @@
/// Defines an RFC3339-formatted timestamp
pub mod timestamp;
/// Defines a timezone as count of seconds offset from UTC
pub mod timezone;
/// Defines the precision of the timestamp
pub mod unit;