Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add strict boxed vectors #483

Closed
Shimuuar opened this issue Mar 1, 2024 · 12 comments · Fixed by #488
Closed

Add strict boxed vectors #483

Shimuuar opened this issue Mar 1, 2024 · 12 comments · Fixed by #488

Comments

@Shimuuar
Copy link
Contributor

Shimuuar commented Mar 1, 2024

This is inspired by thread on discource. We don 't have any good way to ensure that elements of boxed vector are evaluated. Namely any use of map, zip, etc will not evaluate vector's elements. And if used in iterative algorithms as in thread above it will leak memory. We don't even have standard way of evaluating all elements of vector to WHNF.

From users' PoV easiest way is to add boxed vector which is strict in its elements. It would be also most reliable. And since strictness is tied to strictness of basicWrite we have to define new data type.

As much as I hate adding (and supporting) another vector data type I think this addition is valuable enough. @lehins @Bodigrim what is your opinion?

I

@lehins
Copy link
Contributor

lehins commented Mar 1, 2024

I've done the same in massiv, so I am definitely in favor. Having a whole new module with duplication of all of the functions is of course pretty unfortunate. I don't have the same problem in massiv.

We could potentially take the same approach here and just define the types with instances and let the users rely on the Generic interface.

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Mar 1, 2024

I think it would be better to maintain same style for all modules. I guess we're stuck with the decision.

If GHC API has stabilized somewhat it would be possible to write tool for applying documentation changes and to generate reexports automatically

@konsumlamm
Copy link
Contributor

This could be a separate library. Then the argument for duplicating all the functions is less strong and it wouldn't make compilation slower for everyone who doesn't need strict boxed vectors.

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Mar 4, 2024

Separate library would result in rather poor usability. User would have to find another library and that's for rather basic functionality. As an example containers duplicate API for maps: there're strict and lazy version

@konsumlamm
Copy link
Contributor

The library could be linked from the vector documentation for better discoverability. As I showed in #458 (comment), redefinitions slow down compilation time significantly.

@lehins
Copy link
Contributor

lehins commented Mar 4, 2024

I am with @Shimuuar on this one.

I would even go so far as advocating strict boxed vector as the default one that users should reach out for, instead of the current lazy one.

Also, it would be in line with other packages, like containers, transformers, mtl`, etc.

Extra 10 seconds of compile time is a small price for having this feature, IMHO.

@Mikolaj
Copy link
Member

Mikolaj commented Mar 5, 2024

Is it the same as included in https://hackage.haskell.org/package/strict-containers? I use these with gusto.

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Mar 5, 2024

Yes. My plan was to copy definition of lazy vector, make basicWrite strict in value being written. Then there are functions for conversion from Array. Should we force all elements in Array?

It should be essentially the same.

@Mikolaj
Copy link
Member

Mikolaj commented Mar 5, 2024

I don't know or don't remember the details, so let me ping @infinity0.

As a user, I think I'd expect the conversion to force all elements, yes.

@infinity0
Copy link

I support the proposal since it means I'll have less maintenance work in strict-containers. :)

As for implementation, my patch is against vector-0.13 and I think basicWrite didn't in exist back in that version? I found it necessary to patch both basicUnsafeReplicate and basicUnsafeWrite, if that helps.

Also I provide explicit strict versions of the G.Vector API (it results in a smaller patch for me, to duplicate more of vector) and override elemseq to be strict as well, since the default definition is not.

There may have been other things I missed, but it worked well for me back when I used it - it passed a bunch of strictness tests I wrote, and it seems @Mikolaj hasn't found any bugs so far. :)

@infinity0
Copy link

More details about my patch. The results are here: https://github.com/haskellari/strict-containers/tree/master/strict-containers/src/Data/Strict/Vector

We only need to duplicate 3 files from Vector:

  • Data/Vector.hs -> Data/Strict/Vector/Autogen.hs
  • Data/Vector/Mutable.hs -> Data/Strict/Vector/Autogen/Mutable.hs
  • Data/Vector/Internal/Check.hs -> Data/Strict/Vector/Autogen/Internal/Check.hs

The patch linked above is then applied onto this. Then a bunch of stuff is re-exported as Data.Strict.Vector.

I didn't think about how vector itself might provide something like this. Maybe even duplication of 3 files is too much for you and you want to think about if there's a better way to do it. Then again, this may require messy stuff involving higher-order kinds etc, the options I could think of were not pretty.

@Bodigrim
Copy link
Contributor

The earlier discussion: #380

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants