Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove INLINE Pragma on indices #219

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

Tarmean
Copy link

@Tarmean Tarmean commented Feb 21, 2018

For context, here is a recent reddit discussion about a program that runs faster with optimizations disabled..

This is caused by an interaction between the INLINE pragma on indices and the INLINE [1] pragma on isInfixOf. For some reason this combination keeps buildTable 0 0 (nlen-2) from being floated out. The end result is buildTable being run in each iteration of scan resulting in some impressively slow searches.

There are other ways to solve this issue but indices doesn't participate in fusion anyway and this seems like it'd impact code readability the least.

This fixes an interaction with {-# INLINE [1] isInfixOf #-}
that made buildTable run once for each scan iteration
@hvr hvr requested a review from nomeata February 21, 2018 13:07
@hvr
Copy link
Member

hvr commented Feb 21, 2018

I'd like @nomeata to give this a look as he's been recently also looking at the fusion framework rules...

Copy link
Contributor

@nomeata nomeata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is independent of fusion, right?

But if leaving the inlining decision to GHC yields better performance, then that’s great of course.

@Tarmean, did you run benchmarks to test this?

@Tarmean
Copy link
Author

Tarmean commented Feb 21, 2018

...I really should have done that before opening the pull request, sorry. I am running into some problems when trying to run the benchmarks, though:

8.0.2:

 text/benchmarks > dist/build/text-benchmarks/text-benchmarks
q[1]    5174 segmentation fault (core dumped)  dist/build/text-benchmarks/text-benchmarks

8.2.2:

text-benchmarks: internal error: Unable to commit 1214251008 bytes of memory
    (GHC version 8.2.2 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
[1]    5201 abort (core dumped)  .stack-work/install/x86_64-linux-nopie/lts-10.6/8.2.2/bin/text-benchmarks

@hvr
Copy link
Member

hvr commented Feb 22, 2018

@Tarmean Did the benchmark executable fail right away or did it produce more output than you showed us? I can't reproduce neither the segfault nor the GHC panic with a text-benchmarks executable built using cabal new-build (with a GHC distro specifically built for Ubuntu). I suspect either your GHC installation to have a problem, some package-db corruption, or that you're running this on a memory-scarce machine.

@Tarmean
Copy link
Author

Tarmean commented Feb 22, 2018

The executable fails right away. I tried nuking everything ghc related and reinstalling but that didn't fix it.

The issue does seem to be memory related, the error changes when specifying a max heap size with -M. How much memory are the benchmarks supposed to use?

  ~/Projects/text/benchmarks  ➦ ab90c65 ✚  ~/Projects/text/benchmarks/.stack-work/install/x86_64-linux-nopie/lts-10.6/8.2.2/bin/text-benchmarks +RTS -M9G
text-benchmarks: Heap exhausted;
text-benchmarks: Current maximum heap size is 9663676416 bytes (9216 MB).
text-benchmarks: Use `+RTS -M<size>' to increase it.

@hvr
Copy link
Member

hvr commented Feb 22, 2018

@Tarmean ok, now that you mention it, I see it too... I got a 32GiB ram machine, and didn't realise how much memory is used shortly after startup (but then gets quickly GC it seems). In any case, the lowest value I was able to get it working was with +RTS -M20G. I need to look into when this regressed, as I don't think the benchmark suite always had such a huge memory spike.

PS: #204 is related

@Lysxia Lysxia added the internal No API-level changes label Mar 7, 2021
@sjakobi
Copy link
Member

sjakobi commented Mar 17, 2021

I've frequently seen buildTable in the midst of Core dumps from pandoc. The indices code seemed to significantly contribute to pandoc's terrible compile times (https://gitlab.haskell.org/ghc/ghc/-/issues/18010). At least I've been able to reduce the compile times substantially by actively preventing splitOn (which uses indices) from being inlined on overloaded string literals. See jgm/doclayout#1 and jgm/doclayout#2.

I strongly suspect that the terrible string matching performance observed in haskell/bytestring#307 (comment) is also related to this issue.

@sjakobi
Copy link
Member

sjakobi commented Mar 17, 2021

This is caused by an interaction between the INLINE pragma on indices and the INLINE [1] pragma on isInfixOf. For some reason this combination keeps buildTable 0 0 (nlen-2) from being floated out.

It might be good to report this on GHC's issue tracker. There might be a compiler bug hiding here.

@Tarmean
Copy link
Author

Tarmean commented Mar 17, 2021

I still think that removing the INLINE pragma should improve performance in most cases. Not at all certain anymore that nested INLINE pragmas are at fault.

The interaction between the bang pattern on buildTable and the INLINE pragma for some reason does seem to prevent buildTable from being floated out when indices is called in a loop. But buildTable also isn't floated when indices isn't inlined so that doesn't explain the weird performance regression on -O2.

Guess I'm gonna try if this still can be reproduced with profiling builds tomorrow.

@Bodigrim Bodigrim marked this pull request as draft February 28, 2023 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal No API-level changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants