Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table.mgetOrPut without default val #22994

Merged
merged 3 commits into from
Nov 30, 2023
Merged

Conversation

inv2004
Copy link
Contributor

@inv2004 inv2004 commented Nov 27, 2023

RFC: nim-lang/RFCs#539

  • mgetOrPutDefaultImpl template into tableimpl.nim to avoid macros
  • mgetOrPut for Table, TableRef, OrderedTable, OrderedTableRef
  • tests/stdlib/tmget.nim tests update

@inv2004 inv2004 changed the title add mgetOrPut.mgetOrPut without val add table.mgetOrPut without val Nov 28, 2023
@inv2004
Copy link
Contributor Author

inv2004 commented Nov 28, 2023

checked failed js test locally - looks ok:

u@DESKTOP-H ~/Nim (mgetorput_with_default)> ./bin/nim js -d:release --run tests/js/t11353.nim
Hint: used config file '/home/u/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/u/Nim/config/config.nims' [Conf]
Hint: used config file '/home/u/config.nims' [Conf]
Hint: used config file '/home/u/Nim/tests/config.nims' [Conf]
Hint: opt: speed; options: -d:release
41513 lines; 0.351s; 37.875MiB peakmem; proj: t11353; out: t11353.js [SuccessX]
Hint: /usr/local/lib/nodejs/node-v20.10.0-linux-x64/bin/node --unhandled-rejections=strict /home/u/Nim/tests/js/t11353.js [Exec]
{}
{}
u@DESKTOP-H ~/Nim (mgetorput_with_default)> node --version
v20.10.0

@inv2004 inv2004 changed the title add table.mgetOrPut without val add table.mgetOrPut without default val Nov 28, 2023
@inv2004 inv2004 changed the title add table.mgetOrPut without default val table.mgetOrPut without default val Nov 28, 2023
@Araq
Copy link
Member

Araq commented Nov 29, 2023

What's the benefit over tab.mgetOrPut(key, default(B))?

@inv2004
Copy link
Contributor Author

inv2004 commented Nov 29, 2023

To avoid additiomal initialization of the default(B) in case if key is already in table

It is applicable to any function instead of default because it is evaluated before mgetOrPut

@Araq
Copy link
Member

Araq commented Nov 29, 2023

To avoid additiomal initialization of the default(B) in case if key is already in table

Why would it do that?

@inv2004
Copy link
Contributor Author

inv2004 commented Nov 29, 2023

because of function call and params evaluation order.

I expected it works the same for any f() at the place of the default(B) if not special optimizations here

@Araq
Copy link
Member

Araq commented Nov 29, 2023

Sorry, but please show the benefit of this with a benchmark.

@inv2004
Copy link
Contributor Author

inv2004 commented Nov 29, 2023

I agree that it is good point and it is not very clear if -O3 makes any different in perf on simple objects, but it is visible in -d:debug. but I would like to test the case a bit more

About more complicate objects you can find result here:

Code:
import criterion
import tables

const R = 100
type
  U = object
    c: seq[string]
    d: seq[string]
    e = 100
    f = 200
    j = 300
    h = 400

  T = object
    a: int
    b: array[100, U]
    c: seq[string]
    d: seq[string]
    e = 100
    f = 200
    j = 300
    h = 400

let cfg = newDefaultConfig()

benchmark cfg:
  var xOld = initTable[int, T]()
  var xNew = initTable[int, T]()

  proc defaultOld() {.measure.} =
    for _ in 1..10:
      for i in 1..R:
        xOld.mgetOrPut(i, default(T)).a.inc

  proc defaultNew() {.measure.} =
    for _ in 1..10:
      for i in 1..R:
        xNew.mgetOrPut(i).a.inc

Result:
image

@inv2004
Copy link
Contributor Author

inv2004 commented Nov 29, 2023

For

T = object
  a: int

results are the same:

image

@c-blake
Copy link
Contributor

c-blake commented Nov 29, 2023

Almost 300x faster is a pretty big difference. FWIW, I think tim or its library counterpart emin is less off track for this kind of measurement than criterion { basically estimate min dt and the error of that min rather than trying to do trimmed means & reporting vague "outlier" counts when you aren't even sure the distribution is even stationary (stable over time) }.

It still seems to me like a template solving the more general case of deferring running any sort of init code at all would be better than this new default-less overload for mgetOrPut that always uses default(T). (If you don't like my editOrInit name suggestion, you could do something closer to tables.withValue.)

{ There are actually even more general cases.. E.g., where the "key" itself is an integer offset into a string buffer that needs growing on puts, but not gets. Those seem out of scope for the current design, though. }

@inv2004
Copy link
Contributor Author

inv2004 commented Nov 29, 2023

@c-blake
There are few points I see here:

  • No reasons to have overcomplicated duplicates like proc mgetOrPut and proc mgetOrPutLazy at the moment, because of next item
  • it would be good to convert just mgetOrPut into template, like it was discussed in the RFC, but @Araq mentioned that it could be not back-compatible at the moment (maybe later)
  • withValue looks pretty complicate and even returns pointer, I think that most devs would prefer mgetOrPut old or new in most cases

That is why I decided to start with something simple, but which can cover most of the cases (default insert, probably, is the most popular case), but it covers lazy evaluation (not like template/macros, but a bit more than nothing) + it looks back-compatible

Even without perf gains, it would be good to have mgetOrPut...(..., val = default(B)) just to save few characters in code

@c-blake
Copy link
Contributor

c-blake commented Nov 29, 2023

I'm ok with your new overload, and mostly for the reasons you enumerate. I just think that in N years someone will also want the template anyway to cover more cases. :-)

@inv2004
Copy link
Contributor Author

inv2004 commented Nov 29, 2023

@c-blake Totally agree. And the reason I did not make it template at the moment because of the stopper with borrow'ed teamplate if I understood it correct

@c-blake
Copy link
Contributor

c-blake commented Nov 29, 2023

I also don't see a problem with a new name for the template. It's fundamentally a speed optimization. mgetOrPut itself is just a speed optimization over if x in tab: y else: z. So, this yet more optimized stage can use yet another new name. In my experience, speed-optimizers are a more forgiving audience about such { though I also see the charm of the same name, and your seeming insistence that a new name is a problem is challenging my experience :-) }.

One other thing I could say possibly constructively is that, from the caller's perspective, I think of withValue as more an "identifier interface" than a "pointer interface" (I know it uses pointers internally, but the caller needn't worry about that).

{ This is really just to respond to your rationale. As said, I don't have a problem with the new overload. }

@Araq Araq merged commit 0f7ebb4 into nim-lang:devel Nov 30, 2023
19 checks passed
Copy link
Contributor

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 0f7ebb4

Hint: mm: orc; opt: speed; options: -d:release
177235 lines; 7.581s; 766.727MiB peakmem

@c-blake
Copy link
Contributor

c-blake commented Dec 2, 2023

By the way, in case no one else noticed there is a strong ambiguity here in terms of extending the existing naming convention. This PR drops an argument from mgetOrPut, but one could also "add an m" to getOrDefault.

No Nim release has happened yet to this very fresh merge. So, if some future template mgetOrPut may happen someday (I still think a new name today is fine) which also handles cases sort of like this, mgetOrDefault might be the better name for this.

@inv2004
Copy link
Contributor Author

inv2004 commented Dec 2, 2023

@c-blake
I did not get why mgetOrDefault is better. It was my initial idea, but I decided that less new names => better

mget-or-put(with-default) or mgetordefault(and it means put too, mget is not just get) I am not sure and I am bad in naming, but I like less new functions :)

@c-blake
Copy link
Contributor

c-blake commented Dec 3, 2023

It's a somewhat tricky situation and neither choice is very wrong. On the one hand, this new operation is a lot like getOrDefault but with modifiable memory. On the other hand, in order to have said modifiable memory, a table slot must be allocated - so a put always happens. Which consistency is better depends upon what what you want to emphasize - the modifiability & default-ness aspects or the always-putting aspects. And which you want to emphasize depends upon what you think users attention will be drawn to. Kind of hard to say. So, your choice is not a terrible one and it indeed saves a new name - it's just a name you appear to want to use for a third thing (a template) whenever the compiler supports that.

Arguably, mgetOrPutDefault would be the clearest name of all. It is a little bit long, but is crystal clear and still solves all your performance concerns and is in some ways consistent with both of the other two names.

@inv2004
Copy link
Contributor Author

inv2004 commented Dec 3, 2023

Another thing I am afraid, of mgetOrPut will be macros one day - mgetOrDefaults will be needed to be deprecated, but not mgetOrPut

@c-blake
Copy link
Contributor

c-blake commented Dec 3, 2023

A "fourth consistency" I neglected to mention is higher-level. This module uses m, and English verbs pretty consistently (hasKeyOrPut, getOrDefault) instead of making up "neologisms". For this op, "upsert" is common in the DB world due to SQL (getPut is another option). So, upsertDefault would be an analogue along those lines. I have also seen creative distinguishing of similar verbs like set, put, init. I didn't mention this before since your first try here didn't violate that aspect of conventions. Just added it for completeness.

will be needed to be deprecated

Eventual obsolescence of this proc after a more general template/macro is added is behind my comments at the start of both this thread and your RFC, connecting the two ideas. So, that makes it seem like we are just looping, but with you agreeing more with me on the second iteration. Progress! :-) But deprecation also doesn't really depend on the name since Nim can deprecate just one definition.

Anyway, it's not a huge deal as mgetOrPut(.., key) also looks like a workaround for mgetOrPut(.., val=default(B)) being less optimized as has been noted. The tension between different consistencies just seemed undiscussed before the PR merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants