Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: Use hackyOffset if it parses date strings in the expected format #1580

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

schleyfox
Copy link
Contributor

This is part of a series of PRs based on performance work we have done to
improve a use-case involving parsing/formatting hundreds of thousands of dates
where luxon was the bottleneck.

This includes the commit from #1574 to establish the benchmark

This is the sketchy optimization of the bunch and I would understand not wanting to support this, but it is much, much faster. In our observations, about 1.5% of web traffic outputs in a format that doesn't parse correctly, so we fall back to formatToParts.

The time zone offset of a date can be computed on platforms that support
it (anything that's not super ancient) by using
Intl.DateTimeFormat.formatToParts with en-US to output an array of the
date components. For legacy reasons, you can also generate a date string
using Intl.DateTimeFormat.format which can be parsed into an array using
regexes. The string/regex approach (hackyOffset) is way faster (2-4x),
but much more susceptible to weird client configurations.

This detects whether hackyOffset is able to parse a known date
correctly, and uses it if it does.

Benchmark Comparison (name | before | after | after/before):

DateTime.local with numbers and zone | 50,913 ±0.18% | 106,177 ±0.18% | 2.09x
DateTime.fromFormat with zone | 26,687 ±0.18% | 35,722 ±0.19% | 1.34x
DateTime#setZone | 175,791 ±0.29% | 302,007 ±0.34% | 1.72x

schleyfox and others added 2 commits January 19, 2024 17:01
The time zone offset of a date can be computed on platforms that support
it (anything that's not super ancient) by using
Intl.DateTimeFormat.formatToParts with en-US to output an array of the
date components. For legacy reasons, you can also generate a date string
using Intl.DateTimeFormat.format which can be parsed into an array using
regexes. The string/regex approach (hackyOffset) is way faster (2-4x),
but much more susceptible to weird client configurations.

This detects whether hackyOffset is able to parse a known date
correctly, and uses it if it does.
Copy link

linux-foundation-easycla bot commented Jan 22, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

*/
static hackyOffsetParsesCorrectly() {
if (hackyOffsetParsesCorrectly === undefined) {
const dtf = makeDTF("UTC");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useless DTF to cache since we use fixed offset zones for UTC. Unclear if there's a real zone that would make more sense to use here (I'm biased towards America/New_York)

@schleyfox schleyfox marked this pull request as ready for review January 22, 2024 16:50
@schleyfox
Copy link
Contributor Author

/easycla

@icambron
Copy link
Member

icambron commented Mar 9, 2024

This is a clever optimization and I'm not against it. But one question I have is how many calls it takes to amortize the cost of the upfront check. Do you have a sense of that? For all I know, it just takes one.

@schleyfox
Copy link
Contributor Author

that's a very good point

import * as b from "benny"

const ts = 1710020705493
const jsDate = new Date(ts)

const utcTestDate = new Date(Date.UTC(1969, 11, 31, 15, 45, 55))

function makeDtf(zone: string) {
	return new Intl.DateTimeFormat("en-US", {
		hour12: false,
		timeZone: zone,
		year: "numeric",
		month: "2-digit",
		day: "2-digit",
		hour: "2-digit",
		minute: "2-digit",
		second: "2-digit",
		era: "short",
	})
}

// from luxon
function hackyOffset(dtf: any, date: any) {
	const formatted = dtf.format(date).replace(/\u200E/g, ""),
		parsed = /(\d+)\/(\d+)\/(\d+) (AD|BC),? (\d+):(\d+):(\d+)/.exec(formatted),
		[, fMonth, fDay, fYear, fadOrBc, fHour, fMinute, fSecond] = parsed ?? []
	return [fYear, fMonth, fDay, fadOrBc, fHour, fMinute, fSecond]
}

const typeToPos = {
	year: 0,
	month: 1,
	day: 2,
	era: 3,
	hour: 4,
	minute: 5,
	second: 6,
}

function partsOffset(dtf: any, date: any) {
	const formatted = dtf.formatToParts(date)
	const filled = []
	for (let i = 0; i < formatted.length; i++) {
		const { type, value } = formatted[i]
		// @ts-expect-error[implicit-any-index-incorrect] hidden by suppressImplicitAnyIndexErrors
		const pos = typeToPos[type]

		if (type === "era") {
			filled[pos] = value
		} else if (pos !== undefined) {
			filled[pos] = parseInt(value, 10)
		}
	}
	return filled
}

const dtf = makeDtf("America/New_York")

void b.suite(
	"Intl.DateTimeFormat",
	b.add("hackyOffset", () => {
		hackyOffset(dtf, jsDate)
	}),
	b.add("partsOffset", () => {
		partsOffset(dtf, jsDate)
	}),
	b.add("makeDtf(UTC) no-cache", () => {
		makeDtf("UTC")
	}),
	b.add("uncached partsOffset", () => {
		const dtf = makeDtf("America/New_York")
		partsOffset(dtf, jsDate)
	}),
	b.add("uncached hackyOffset", () => {
		const utcDtf = makeDtf("UTC")
		hackyOffset(utcDtf, utcTestDate)
		const dtf = makeDtf("America/New_York")
		hackyOffset(dtf, jsDate)
	}),
	b.cycle(),
	b.complete()
)
$ notion ts-node src/test/benchmark/shared/helpers/datetime/datetimeformatOverhead.benchmark.ts
Running "Intl.DateTimeFormat" suite...
Progress: 100%

  hackyOffset:
    589 933 ops/s, ±0.28%   | fastest

  partsOffset:
    234 358 ops/s, ±0.24%   | 60.27% slower

  makeDtf(UTC) no-cache:
    27 116 ops/s, ±6.38%    | 95.4% slower

  uncached partsOffset:
    22 742 ops/s, ±4.06%    | 96.14% slower

  uncached hackyOffset:
    11 640 ops/s, ±11.52%    | slowest, 98.03% slower

Finished 5 cases!
  Fastest: hackyOffset
  Slowest: uncached hackyOffset

DTF construction is very slow, so doing it twice is quite bad. I don't know why the error bars are so high for it, however.

I think this math works, but let me know if it makes sense to you.

Analytically, we can see that constructing a DTF takes 1MM/27,116 = 36.9us . hackyOffset takes 1MM/589,933 = 1.7us, while partsOffset takes 1MM/234,358= 4.3us

we then expect a single uncached partsOffset to take 36.9us + 4.3us = 41.2us (observed is 44us)
likewise, we'd expect a single uncached hackyOffset to take 36.9us + 1.7us + 36.9us + 1.7us = 77.2us (observed is 86us)

77.2 - 41.2 = 36us to make up. We would save 4.3 - 1.7 = 2.6us per call, so 36/2.6 = 13.8 calls before our optimization pays off.

Note that we may have to call *Offset up to 2 or 3 (but mostly 2 except for time holes) times per DateTime, so this could pay off in as little as 7 DateTimes (or 4.6 if you are specifically testing edge cases). At a few hundred DateTimes, you start saving milliseconds.

@icambron
Copy link
Member

How about just adding Settings.setUseHackyOffset()? Seems like that would allow specific applications that do a ton of formats to speed up while sacrificing a few locales, and others to not worry about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants