Wrong conversion from double to int64 with SAFE_HEAP flag #1110

Open
metalpavel opened this issue Jul 24, 2017 · 12 comments

@metalpavel

Hi,
I created an example for this issue:

#include <iostream>
#include <cmath>
#include <cstdint>
#include <limits>

int64_t __attribute__ ((noinline)) toSInteger64(double x) {
  return (x > (double)std::numeric_limits<int64_t>::min() &&
          x < (double)std::numeric_limits<int64_t>::max())
             ? (int64_t)x
             : (std::signbit(x) ? std::numeric_limits<int64_t>::min()
                                : std::numeric_limits<int64_t>::max());
}

int main(int argc, char ** argv) {
    double a = -123.0 * double(argc + 1);
    int64_t b = toSInteger64(a);
    std::cout << "result: " << b << std::endl;
    return 0;
}

Compile command:
em++ hello.cpp -O3 -s WASM=1 --emit-symbol-map --bind -s TOTAL_MEMORY=67108864 -s ALLOW_MEMORY_GROWTH=1 --memory-init-file 1 -g -s SAFE_HEAP=1 -s BINARYEN=1 -o hello.html

It generates the function:
(func $__Z12toSInteger64d (param $0 f64) (result i32)
and it crashes at i32.trunc_u/f64.

Without SAFE_HEAP it generates correct code:
(func $__Z12toSInteger64d (param $0 f64) (result i64)
with the correct conversion i64.trunc_s/f64.

@kripken
Member

kripken commented Jul 24, 2017

This is a corner case I'm not sure what to do with. What's going on is that

  1. We decided to allow wasm traps by default. It's fast to emit a potentially-trapping i32.trunc_u/f64 (which is what traps here) rather than guard against it trapping.
  2. Enabling safe-heap adds some more int/double conversions, and one of them happens to hit this trap (it's not specific to safe-heap, it just instruments the code in ways that prevent optimizations from removing what would otherwise trap). This is bad since it makes safe-heap behave differently from normal mode.

You can work around this for now by building with -s "BINARYEN_TRAP_MODE='clamp'". Maybe we should just enable that to avoid safe-heap trapping, but that changes safe-heap in another way, so I'm not sure what the best solution is.
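
For reference, here is a minimal C++ sketch of what that clamping means semantically, under the assumption that clamp mode saturates out-of-range values to the target type's min/max and maps NaN to 0; it is an illustration only, not the code Emscripten actually emits:

#include <cstdint>
#include <cmath>
#include <limits>

// Illustrative saturating double -> int64_t conversion (assumption: this
// mirrors the clamping semantics; it is not the generated code itself).
int64_t clampToInt64(double x) {
  if (std::isnan(x)) return 0;  // assumed: clamp mode turns NaN into 0
  if (x <= (double)std::numeric_limits<int64_t>::min())
    return std::numeric_limits<int64_t>::min();
  if (x >= (double)std::numeric_limits<int64_t>::max())
    return std::numeric_limits<int64_t>::max();
  return (int64_t)x;  // in range: ordinary truncation, no trap possible
}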

@kripken
Member

kripken commented Jul 24, 2017

Meanwhile I documented the issue better on

https://github.com/kripken/emscripten/wiki/WebAssembly#debugging

Not sure yet what we can do aside from documenting this. A tricky point here that we may want to do something about is the interaction between the two optimizers. What can happen is

  • The user builds normally, which means we allow traps in wasm.
  • The user builds in a mode like SAFE_HEAP which enables the asm.js optimizer.
  • The asm.js optimizer assumes JavaScript semantics - no traps - and can do things that depend on that.
  • The code is converted to wasm, where it traps.

So the problem here is that the asm.js optimizer is unaware of the wasm trap mode. This is rarely an issue as we don't run the asm.js optimizer anymore, aside from special modes like SAFE_HEAP.

Perhaps when we allow trapping in wasm but are forced to run the asm.js optimizer, we should issue a warning? Or perhaps we should disallow traps in wasm when doing so?

@metalpavel
Author

Thank you for the explanation. A warning definitely should be added, but this is also an issue because it generates wrong code. Why does it generate trunc_u instead of trunc_s?

@kripken
Member

kripken commented Jul 25, 2017

The truncation operation uses >>>, which is an unsigned operation. For example, -1 >>> 0 => 4294967295.
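
A small plain-C++ illustration of that coercion (nothing Emscripten-specific, just to show the semantics): ">>> 0" reads the 32-bit result as unsigned, so a negative value wraps around, whereas wasm's i32.trunc_u/f64 has no unsigned result to produce for a negative double and traps instead.

#include <cstdint>
#include <cstdio>

int main() {
  double x = -1.0;
  // Signed truncation keeps the sign.
  int32_t s = (int32_t)x;             // -1
  // What "x >>> 0" yields in JS: the same 32 bits, read as unsigned.
  uint32_t u = (uint32_t)(int32_t)x;  // 4294967295
  std::printf("signed: %d, unsigned wrap: %u\n", s, u);
  return 0;
}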

@metalpavel
Author

Functions like uint32_t wasm::toUInteger32(double x) are useless with optimizations enabled unless you add __attribute__ ((optnone)).

For example:

#include <iostream>
#include <cmath>
#include <cstdint>
#include <limits>

uint32_t toUInteger32(double x) {
  return std::signbit(x) ? 0 : (x < std::numeric_limits<uint32_t>::max()
                                    ? (uint32_t)x
                                    : std::numeric_limits<uint32_t>::max());
}

int main(int argc, char ** argv) {
    double a = argc > 0 ? INFINITY : 123;

    std::cout << "a: " << a << std::endl;

    uint32_t b = toUInteger32(a);

    std::cout << "b: " << b << std::endl;

    return 0;
}

This will crash with Uncaught RuntimeError: integer result unrepresentable. Without optimizations it's fine: __attribute__ ((optnone)) uint32_t toUInteger32(double x).
Command: emcc hello.cpp -O3 -s WASM=1 --emit-symbol-map --bind -s TOTAL_MEMORY=67108864 -s ALLOW_MEMORY_GROWTH=1 --memory-init-file 1 -o hello.html

@kripken
Member

kripken commented Jul 31, 2017

Yeah, that's a good example of the same issue. I verified -s "BINARYEN_TRAP_MODE='clamp'" fixes that as well.

@metalpavel
Author

Unfortunately -s "BINARYEN_TRAP_MODE='clamp'" is bad for performance. In my app the FPS drops by nearly 2x with this option. For now, using an unoptimized function in several cases is better.

@binji
Member

binji commented Aug 2, 2017

@metalpavel you may want to follow along with the non-trapping-float-to-int proposal too. Those instructions aren't available yet, but they should help with this case.

@kripken
Member

kripken commented Aug 2, 2017

2 times slower is very bad. @metalpavel, can you maybe create a standalone benchmark of what you are seeing? Or if not, can you profile it and see what's going on? One possibility is that we could add separate flags for clamping different operations; this was proposed in the past, but we didn't have data that supported it yet.

@metalpavel
Author

Here is my simple benchmark with different types of floating-point cast operations:

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint>

const size_t COUNT = 10000000;

std::vector<double> doubles(COUNT);
std::vector<float> floats(COUNT);
std::vector<uint64_t> ints64(COUNT);
std::vector<uint32_t> ints32(COUNT);
std::vector<int64_t> results(COUNT);

void test() {
    double d = 0.0;
    float f = 0.f;
    uint64_t i64 = 0;
    uint32_t i32 = 0;

    for(size_t i = 0; i < COUNT; ++i) {
        doubles[i] = d += 0.25;
        floats[i] = f += 0.25f;
        ints64[i] = i64 += 1;
        ints32[i] = i32 += 1;
    }

    for(size_t i = 0; i < COUNT; ++i) {
        ints64[i] += doubles[i] * 0.25;
        ints32[i] += doubles[i] * 0.25;
        ints64[i] += floats[i] * 0.25f;
        ints32[i] += floats[i] * 0.25f;

        results[i] = ((doubles[i] * ints32[i]) + (floats[i] * ints32[i])) -
                     ((doubles[i] * ints64[i]) + (floats[i] * ints64[i]));
    }
}

int main(int argc, char ** argv) {

    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::time_point<Clock> TimePoint;

    const int TEST_COUNTS = 100;
    uint32_t sum = 0;

    for (int i = 0; i < TEST_COUNTS; ++i) {

        TimePoint start = Clock::now();

        test();

        TimePoint end = Clock::now();

        uint32_t duration =
            std::chrono::duration_cast<std::chrono::milliseconds> (
                end - start).count();

        std::cout << "step " << i << " time: " << duration << "ms"
                  << std::endl;

        sum += duration;
    }

    std::cout << "average: " << sum / TEST_COUNTS << "ms" << std::endl;

    return 0;
}

Tests:

Model Name: MacBook Pro
Model Identifier: MacBookPro11,4
Processor Name:   Intel Core i7
Processor Speed:  2.2 GHz
Number of Processors: 1
Total Number of Cores:    4
L2 Cache (per Core):  256 KB
L3 Cache: 6 MB
Memory:   16 GB
System Version: macOS 10.12.6 (16G29)

----------------------

Chrome (Version 60.0.3112.78 (Official Build) (64-bit)) average:

205ms: em++ hello.cpp -O3 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
346ms: em++ hello.cpp -O3 -s WASM=1 -s TOTAL_MEMORY=16777216 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
~40% diff

205ms: em++ hello.cpp -O2 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
347ms: em++ hello.cpp -O2 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
~40% diff

227ms: em++ hello.cpp -O1 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
375ms: em++ hello.cpp -O1 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
~40% diff

267ms: em++ hello.cpp -O0 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
424ms: em++ hello.cpp -O0 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
With memory growing at startup to: 536870912 !!!
~40% diff

----------------------

Firefox (54.0.1 (64-bit)) average:

257ms: em++ hello.cpp -O3 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
389ms: em++ hello.cpp -O3 -s WASM=1 -s TOTAL_MEMORY=16777216 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
~35% diff

260ms: em++ hello.cpp -O2 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
390ms: em++ hello.cpp -O2 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
~35% diff

269ms: em++ hello.cpp -O1 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
373ms: em++ hello.cpp -O1 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
~28% diff

306ms: em++ hello.cpp -O0 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html
411ms: em++ hello.cpp -O0 -s WASM=1 -s TOTAL_MEMORY=33554432 -s ALLOW_MEMORY_GROWTH=1 -o hello.html -s "BINARYEN_TRAP_MODE='clamp'"
With memory growing at startup to: 536870912 !!!
~25% diff

----------------------

Safari Technology Preview didn't make it =(

kripken mentioned this issue Aug 8, 2017
@kripken
Member

kripken commented Aug 8, 2017

Thanks for the benchmark. Some numbers about inlining are in #1125; it looks like we can get rid of most of the overhead that way, but not all.

@dschuff
Copy link
Member

dschuff commented Nov 14, 2017

See also emscripten-core/emscripten#5498
