Make FixedTimeZone isbits #354

Merged: 3 commits, Oct 12, 2021

@Wynand
Contributor

Wynand commented Aug 10, 2021

This switches FixedTimeZone.name from a String to an InlineString, which makes FixedTimeZone an isbits type.

As a side effect, this also limits the number of bytes in a FixedTimeZone name.

Usually timezones are named after locations, and the longest location name currently is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu. This is in New Zealand, so if a fixed time zone were created for it then it would be Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu

That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well

Related issue: #271
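
To make the trade-off described above concrete, here is a minimal sketch using only Base Julia. The struct names are hypothetical, and a fixed-size byte tuple stands in for an inline string; this is an illustration under those assumptions, not the TimeZones.jl definition.

```julia
# Hypothetical stand-ins, not the actual TimeZones.jl types.
struct HeapNamedZone
    name::String            # heap-allocated, so the struct holds a reference
    offset::Int
end

struct InlineNamedZone
    name::NTuple{16,UInt8}  # fixed-size bytes stored inline (stand-in for an inline string)
    offset::Int
end

heap_is_bits = isbitstype(HeapNamedZone)      # false: contains a String reference
inline_is_bits = isbitstype(InlineNamedZone)  # true: only plain-bits fields

# The cost of inline storage is a hard cap on the name length in bytes.
longest = "Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu"
fits_255 = ncodeunits(longest) <= 255

println((heap_is_bits, inline_is_bits, fits_255))
```

The check uses `ncodeunits` rather than `length` because the cap on an inline string is a byte count, not a character count.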

@Wynand
Contributor Author

Wynand commented Aug 10, 2021

Depends on: JuliaData/WeakRefStrings.jl#78

@omus
Member

omus commented Aug 10, 2021

Thanks, this seems promising. Out of curiosity do you also want VariableTimeZone to be an isbitstype or is your priority just FixedTimeZone?

@Wynand
Contributor Author

Wynand commented Aug 10, 2021

Thanks, this seems promising. Out of curiosity do you also want VariableTimeZone to be an isbitstype or is your priority just FixedTimeZone?

No problem!

For now I'm focusing on FixedTimeZone, since VariableTimeZone still has the transitions vector. However, Transition will also be an isbitstype once FixedTimeZone is an isbitstype, which should speed up garbage collection.

@nickrobinson251
Contributor

That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well

I think we should probably switch to InlineString127
Having every FixedTimeZone take up an extra 128 bytes unnecessarily is gonna add up (especially compared to the current size e.g. sizeof(tz"UTC") == 24)

I also wonder if there's any way to use even shorter InlineStrings... 🤔
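
The size concern above can be sketched with Base Julia alone. The layouts below are hypothetical: a name field followed by two 8-byte offset fields, with byte tuples of 16, 128, and 256 bytes standing in for 15-, 127-, and 255-byte inline strings (assuming each inline string occupies one byte more than its capacity).

```julia
# Hypothetical layouts, not the actual TimeZones.jl or InlineStrings types.
struct ZoneWithString
    name::String   # stored as a pointer-sized reference
    std::Int64
    dst::Int64
end

struct ZoneWithInline{N}
    name::NTuple{N,UInt8}  # N bytes stored inline
    std::Int64
    dst::Int64
end

size_string = sizeof(ZoneWithString)       # reference + 16 bytes of offsets
size_16  = sizeof(ZoneWithInline{16})
size_128 = sizeof(ZoneWithInline{128})
size_256 = sizeof(ZoneWithInline{256})

println("String-backed: ", size_string, " bytes")
println("16-byte inline: ", size_16, " bytes")
println("extra cost of 256 over 128: ", size_256 - size_128, " bytes")
```

Under these assumptions, the 256-byte variant costs 128 bytes more per value than the 128-byte one, which is the overhead the comment above is worried about.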

@omus
Member

omus commented Aug 12, 2021

For now I'm focusing on FixedTimeZone, since VariableTimeZone still has the transitions vector. However, Transition will also be an isbitstype once FixedTimeZone is an isbitstype, which should speed up garbage collection.

Great! I think using InlineStrings is a good stepping stone towards that goal.

I also wonder if there's anyway to use even shorter InlineStrings... 🤔

Funny you should ask... As an experiment I implemented a TZAbbr string type which should minimize the storage cost for the Transition vector: #355

I still think we should proceed with using InlineString as proposed in this PR as the TZAbbr is more specialized and will take some additional work to integrate.

@Wynand
Contributor Author

Wynand commented Aug 12, 2021

Worth noting: This will cause issues with projects using CSV.jl until its next release, since v0.8.5 doesn't support WeakRefStrings 1.2

@omus
Member

omus commented Aug 13, 2021

Worth noting: This will cause issues with projects using CSV.jl until its next release, since v0.8.5 doesn't support WeakRefStrings 1.2

It's doubtful anyone will explicitly require this latest release so falling back to a slightly older TimeZones.jl should be fine

@omus
Member

omus commented Aug 13, 2021

WeakRefStrings.jl has a Julia minimum of 1.3. All of its dependencies are set to use Julia 1 and the PR that bumped up the minimum was: JuliaData/WeakRefStrings.jl#73. From looking over that PR it seems like WeakRefStrings.jl could be set to use julia = "1"

@Wynand
Contributor Author

Wynand commented Oct 5, 2021

I'm looking into the performance of this change but I haven't benchmarked with Julia before, so here's what I did and the results:

First I created a test script, test.jl:

using TimeZones, BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = true


function bm_tzs()
    map( t -> TimeZone(t, TimeZones.Class(:ALL)), timezone_names())
    TimeZones._reset_tz_cache()
    GC.gc()
end

@benchmark bm_tzs()

For both master and my branch I switched to each like so:

> git checkout <branch name>
> git clean -dfX && julia --project=@. -e 'using Pkg; Pkg.build("TimeZones")'

Then I tested each with the following Julia commands:

julia --project=@. -t 1
include("test.jl")

Here are the results for master from one run:

BenchmarkTools.Trial: 9 samples with 1 evaluation.
 Range (min … max):  337.749 ms … 347.104 ms  ┊ GC (min … max): 20.89% … 22.29%
 Time  (median):     339.937 ms               ┊ GC (median):    21.09%
 Time  (mean ± σ):   340.773 ms ±   2.992 ms  ┊ GC (mean ± σ):  21.18% ±  0.56%

  █  █  ██      █        █  █        █                        █  
  █▁▁█▁▁██▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  338 ms           Histogram: frequency by time          347 ms <

 Memory estimate: 45.01 MiB, allocs estimate: 843644.

And from my branch:

BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  94.937 ms … 110.278 ms  ┊ GC (min … max): 57.50% … 50.64%
 Time  (median):     97.181 ms               ┊ GC (median):    57.46%
 Time  (mean ± σ):   97.859 ms ±   3.467 ms  ┊ GC (mean ± σ):  57.23% ±  1.85%

         ▃█  ▃                                                  
  ▇▇▁▇▇▁▁██▇▇█▁▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  94.9 ms         Histogram: frequency by time          110 ms <

 Memory estimate: 9.38 MiB, allocs estimate: 54047.

I'm going to follow up and make sure they use the same number of samples, but so far this looks good 🤷

@Wynand
Contributor Author

Wynand commented Oct 5, 2021

Reran with test.jl as

using TimeZones, BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = true
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 300
BenchmarkTools.DEFAULT_PARAMETERS.samples = 100


function bm_tzs()
    map( t -> TimeZone(t, TimeZones.Class(:ALL)), timezone_names())
    TimeZones._reset_tz_cache()
    GC.gc()
end

@benchmark bm_tzs()

Master results:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  278.357 ms … 318.445 ms  ┊ GC (min … max): 23.32% … 19.60%
 Time  (median):     289.653 ms               ┊ GC (median):    22.14%
 Time  (mean ± σ):   291.489 ms ±   8.360 ms  ┊ GC (mean ± σ):  22.02% ±  0.65%

     ▃▁▆ █  ▃ ▆ ▁▃▆▁ ▃   ▁  ▁▃    ▁▁ ▃ ▁                         
  ▄▁▄███▇█▁▄█▇█▇████▇█▄▄▇█▇▄██▇▇▄▁██▄█▁█▄▁▄▇▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄ ▄
  278 ms           Histogram: frequency by time          318 ms <

 Memory estimate: 45.01 MiB, allocs estimate: 843644.

this branch's results:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  94.168 ms … 100.176 ms  ┊ GC (min … max): 58.06% … 56.36%
 Time  (median):     96.990 ms               ┊ GC (median):    57.91%
 Time  (mean ± σ):   96.926 ms ±   1.017 ms  ┊ GC (mean ± σ):  58.00% ±  0.62%

                     ▂▆  ▆  ▄▂   ▄ █ ▆▂                         
  ▄▁▁▁▄▁▁▁▁▁▆▁▁▆▁▆▆▄▄███▆██▁██▆▆████▄██▆▄█▆▆▆▄▆▄▄▄▁▁▁▁▄▁▄▁▁▁▁▄ ▄
  94.2 ms         Histogram: frequency by time         99.7 ms <

 Memory estimate: 9.38 MiB, allocs estimate: 54047.
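
As a quick worked comparison of the two 100-sample runs above, the ratios can be computed directly from the reported medians and estimates. The variable names are illustrative; the numbers are copied from the output above.

```julia
# Medians and estimates copied from the two 100-sample runs above.
master = (median_ms = 289.653, mem_mib = 45.01, allocs = 843644)
branch = (median_ms = 96.990,  mem_mib = 9.38,  allocs = 54047)

speedup     = master.median_ms / branch.median_ms  # how many times faster the branch is
mem_ratio   = master.mem_mib / branch.mem_mib      # how many times less memory is allocated
alloc_ratio = master.allocs / branch.allocs        # how many times fewer allocations

println("speedup:     ", round(speedup, digits = 2))
println("memory:      ", round(mem_ratio, digits = 2))
println("allocations: ", round(alloc_ratio, digits = 2))
```

By these figures the branch is roughly 3x faster at the median, allocates about 4.8x less memory, and makes about 15.6x fewer allocations.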

Wynand changed the title from "WIP: Make FixedTimeZone isbits" to "Make FixedTimeZone isbits" on Oct 5, 2021
    h = hash(tz.name, h)
    h = hash(tz.offset, h)
    return h
end
Contributor

Why do we need to overload == and hash?

I expected making it isbits would decrease that need.
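
The expectation behind this question can be sketched in Base Julia. For an immutable struct whose fields are all plain bits, the default `==` falls back to `===`, which compares contents, and the default `hash` is consistent with that. The type below is a hypothetical stand-in; whether the overloads in this PR are still needed depends on the string type's own semantics, which this sketch does not cover.

```julia
# Hypothetical isbits stand-in, not the actual FixedTimeZone.
struct PlainZone
    name::NTuple{4,UInt8}
    offset::Int
end

utc_bytes = (UInt8('U'), UInt8('T'), UInt8('C'), 0x00)
a = PlainZone(utc_bytes, 0)
b = PlainZone(utc_bytes, 0)

same_identity = a === b              # isbits values are compared by content
same_equal    = a == b               # default == falls back to ===
same_isequal  = isequal(a, b)        # default isequal falls back to ==
same_hash     = hash(a) == hash(b)   # default hash agrees for identical bits

println((same_identity, same_equal, same_isequal, same_hash))
```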

  # Note: If the class `mask` does not match the time zone we'll still load the
  # information into the cache to ensure the result is consistent.
- tz, class = get!(_tz_cache(), str) do
-     tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)
+ tz, class = get!(_tz_cache(), String(str)) do
Contributor

Is this required?

Contributor Author

Removed pretty much all the changes that cast to strings, seems like it's handled better now

- tz, class = get!(_tz_cache(), str) do
-     tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)
+ tz, class = get!(_tz_cache(), String(str)) do
+     tz_path = joinpath(TZData.COMPILED_DIR, split(String(str), "/")...)
Contributor

An issue should be opened on InlineStrings.jl if this is required

Contributor Author

removed

- elseif occursin(FIXED_TIME_ZONE_REGEX, str)
-     FixedTimeZone(str), Class(:FIXED)
+ elseif occursin(FIXED_TIME_ZONE_REGEX, String(str))
+     FixedTimeZone(String(str)), Class(:FIXED)
Contributor

Is this required?
It seems like it is just going to be converted to an InlineString anyway, so why convert to a String first?

Contributor Author

removed

@oxinabox
Contributor

oxinabox commented Oct 5, 2021

Usually timezones are named after locations, and the longest location name currently is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu. This is in New Zealand, so if a fixed time zone were created for it then it would be Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu

That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well

I think we can do better.

VariableTimeZones are named after locations.
FixedTimeZones have codes like UTC+0800
See the FIXED_TIME_ZONE_REGEX.

I found before they all fit a ShortString15.
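
A rough way to sanity-check this claim is to measure a handful of fixed-offset names in bytes. The sample below is hypothetical and only follows the style mentioned above; it has not been checked against FIXED_TIME_ZONE_REGEX.

```julia
# Hypothetical sample of fixed-offset style names (illustrative only).
sample_names = ["UTC", "GMT", "UTC+0800", "UTC-03:30", "UTC+14:00:00"]

# A 15-byte inline string can hold any name of at most 15 code units.
max_bytes = maximum(ncodeunits, sample_names)
all_fit = all(n -> ncodeunits(n) <= 15, sample_names)

println("longest sample name: ", max_bytes, " bytes; all fit in 15: ", all_fit)
```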

@Wynand
Contributor Author

Wynand commented Oct 6, 2021

I found before they all fit a ShortString15

Switched to InlineString15, here are the results if I test without specifying -t:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):   97.166 ms … 198.882 ms  ┊ GC (min … max): 60.96% … 29.90%
 Time  (median):     105.430 ms               ┊ GC (median):    60.54%
 Time  (mean ± σ):   108.828 ms ±  13.095 ms  ┊ GC (mean ± σ):  60.96% ±  3.95%

  ▁▃██▃▃▆▆ ▃   ▆ ▃   ▁▁▁▃                ▁                       
  ████████▇█▇▇▄█▄█▇▇▄████▄▁▄▁▄▁▁▄▁▁▇▁▄▁▁▄█▁▇▁▄▇▁▁▄▄▇▁▁▁▄▁▁▁▁▁▁▇ ▄
  97.2 ms          Histogram: frequency by time          135 ms <

 Memory estimate: 5.04 MiB, allocs estimate: 53858.

Wynand force-pushed the wy/isbits-fixedtimezone branch 2 times, most recently from 339f04f to 5b24406, on October 6, 2021 17:21
@Wynand
Contributor Author

Wynand commented Oct 6, 2021

I can't think of any more code changes or cleanup needed, so if everything looks good this can be merged

Comment on lines 99 to 101
function Base.isequal(a::FixedTimeZone, b::FixedTimeZone)
    return isequal(a.name, b.name) && isequal(a.offset, b.offset)
end
Contributor

Why do we need to overload this?
(sorry I didn't mention it before)

Contributor

oxinabox left a comment

other than 1 last comment about isequal this looks good to me.
(I don't have merge rights though)

@iamed2
Member

iamed2 commented Oct 6, 2021

Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.

@iamed2
Member

iamed2 commented Oct 6, 2021

I can't think of any more code changes or cleanup needed, so if everything looks good this can be merged

This should go without saying but CI should be passing before merge

@Wynand
Contributor Author

Wynand commented Oct 7, 2021

Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.

I've switched to ShortStrings instead, which does support Julia 1.0. The only quirk so far is with regex matching, but I've also only tested locally with Julia 1.6

@codecov-commenter

codecov-commenter commented Oct 7, 2021

Codecov Report

Merging #354 (76b1fdd) into master (6eb9788) will decrease coverage by 1.26%.
The diff coverage is 100.00%.

❗ Current head 76b1fdd differs from pull request most recent head bf015db. Consider uploading reports for the commit bf015db to get more accurate results

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
- Coverage   93.75%   92.48%   -1.27%     
==========================================
  Files          31       31              
  Lines        1553     1477      -76     
==========================================
- Hits         1456     1366      -90     
- Misses         97      111      +14     
Impacted Files                        Coverage Δ
src/TimeZones.jl                      100.00% <ø> (ø)
src/types/fixedtimezone.jl            100.00% <100.00%> (ø)
src/winzone/WindowsTimeZoneIDs.jl     0.00% <0.00%> (-100.00%) ⬇️
src/build.jl                          83.33% <0.00%> (-16.67%) ⬇️
src/tzdata/version.jl                 57.14% <0.00%> (-3.39%) ⬇️
src/arithmetic.jl                     92.00% <0.00%> (-0.31%) ⬇️
src/tzdata/download.jl                90.62% <0.00%> (-0.29%) ⬇️
src/utils.jl                          97.56% <0.00%> (-0.27%) ⬇️
src/tzdata/timeoffset.jl              94.11% <0.00%> (-0.17%) ⬇️
src/types/zoneddatetime.jl            96.00% <0.00%> (-0.16%) ⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6eb9788...bf015db. Read the comment docs.

@Wynand
Contributor Author

Wynand commented Oct 7, 2021

I messed up updating the Project.toml, which I think is what caused the benchmarking failure.

I also re-ran performance tests after switching to ShortStrings, and here are the results:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):   72.273 ms … 325.138 ms  ┊ GC (min … max): 61.92% … 17.33%
 Time  (median):      95.901 ms               ┊ GC (median):    58.95%
 Time  (mean ± σ):   101.141 ms ±  28.220 ms  ┊ GC (mean ± σ):  55.53% ±  6.93%

   ▁      ▄▇█▇▅                                                  
  ▇█▅▇▅▁▁▁█████▇▁▇▅▁▅▅▁▇▅█▅▁▁▁▅▅▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
  72.3 ms       Histogram: log(frequency) by time        206 ms <

 Memory estimate: 5.04 MiB, allocs estimate: 53952.

@iamed2
Member

iamed2 commented Oct 7, 2021

Not sure but I think there may also be a code bug in the benchmarks based on this error: https://github.com/JuliaTime/TimeZones.jl/pull/354/checks?check_run_id=3828334686#step:5:263

@oxinabox
Contributor

oxinabox commented Oct 7, 2021

Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.

I've switched to ShortStrings instead, which does support Julia 1.0. The only quirk so far is with regex matching, but I've also only tested locally with Julia 1.6

I would rather not; we want to deprecate ShortStrings.jl in favor of InlineStrings.jl.

I wonder why InlineStrings doesn't support 1.0
JuliaStrings/InlineStrings.jl#9

@Wynand
Contributor Author

Wynand commented Oct 7, 2021

Not sure but I think there may also be a code bug in the benchmarks based on this error: https://github.com/JuliaTime/TimeZones.jl/pull/354/checks?check_run_id=3828334686#step:5:263

My best guess is that the older version of TimeZones is being built (in the compiled folder), and that is being used when benchmarking, which causes deserializing from those files to fail. To work around this I've pushed a change that moves checkout to before package building, similar to the julia tests in the package

Wynand force-pushed the wy/isbits-fixedtimezone branch 2 times, most recently from 2c3651c to 9c1e835, on October 7, 2021 17:35
@Wynand
Contributor Author

Wynand commented Oct 7, 2021

This should resolve the benchmarking issue: #360

I've tested running the CI code locally, replicated the issue, and fixed it by using this code as the base. I'll add more details on how I tested it in this MR

@oxinabox
Contributor

oxinabox commented Oct 8, 2021

@omus can we revert the commit changing to ShortStrings and drop 1.0 support instead?

@omus
Member

omus commented Oct 8, 2021

@omus can we revert the commit changing to ShortStrings and drop 1.0 support instead?

I'm okay with that. We should drop 1.0 support in a separate PR and be sure to remove all of the VERSION specific code.

@oxinabox
Contributor

oxinabox commented Oct 8, 2021

I'm okay with that. We should drop 1.0 support in a separate PR and be sure to remove all of the VERSION specific code.

I will make that PR presently

omus added this to the 1.6.0 milestone on Oct 12, 2021
@Wynand
Contributor Author

Wynand commented Oct 12, 2021

Reran the custom benchmark script

This branch:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  92.906 ms … 128.399 ms  ┊ GC (min … max): 59.50% … 54.53%
 Time  (median):     97.701 ms               ┊ GC (median):    59.53%
 Time  (mean ± σ):   98.881 ms ±   5.541 ms  ┊ GC (mean ± σ):  59.13% ±  0.91%

  ▂▃██▂ ▂▂▂   ▃▂  ▅▃                                            
  ██████████▄███▅▄██▇▁▄▇▄▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▄
  92.9 ms         Histogram: frequency by time          127 ms <

 Memory estimate: 5.04 MiB, allocs estimate: 53912.

master:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  265.216 ms … 308.959 ms  ┊ GC (min … max): 22.84% … 20.03%
 Time  (median):     275.566 ms               ┊ GC (median):    22.41%
 Time  (mean ± σ):   278.183 ms ±   8.808 ms  ┊ GC (mean ± σ):  22.28% ±  0.63%

        ██ ▁▃▃ ▃▁   ▁          ▃                                 
  ▄▁▄▁▇▆██▇███▄██▇▄▇█▆▄▄▇▇▆▇▆▁▄█▁▁▁▇▄▇▁▇▁▄▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▄▁▁▁▁▄ ▄
  265 ms           Histogram: frequency by time          307 ms <

 Memory estimate: 45.07 MiB, allocs estimate: 844893.

omus merged commit 2b2324f into JuliaTime:master on Oct 12, 2021
@omus
Member

omus commented Oct 13, 2021

I'll be making the 1.6.0 release today

6 participants