Update README.md
ND4J's reduced sum is actually slower than Compute.scala on large arrays
Atry authored Mar 30, 2018
1 parent 7405139 commit dc522b0
Showing 1 changed file with 1 addition and 2 deletions: README.md
@@ -340,7 +340,6 @@ Some information can be found in the benchmark result:
* Apparently, Compute.scala supports both NVIDIA GPU and AMD GPU, while ND4J does not support AMD GPU.
* Compute.scala is faster than ND4J on large arrays or complex expressions.
* ND4J is faster than Compute.scala when performing one simple primary operation on very small arrays.
-* ND4J's reduced sum is faster than Compute.scala.
* ND4J's `permute` and `broadcast` are extremely slow, causing a very low score in the convolution benchmark. (Unlike this benchmark, Deeplearning4j's convolution operation internally uses some undocumented variant of `permute` and `broadcast` in ND4J, which is not extremely slow.)
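For readers unfamiliar with the term, a "reduced sum" collapses a whole tensor into a single scalar. The following is a minimal conceptual sketch in plain Scala (not Compute.scala's actual API) of the pairwise tree-reduction access pattern that a GPU reduction kernel typically uses; `reducedSum` is a hypothetical name for illustration only.

```scala
// Conceptual sketch of a "reduced sum": collapse an array to one
// scalar via pairwise tree reduction. Each pass halves the array by
// having "thread" i add element (i + half) into element i, which is
// the shape of a typical GPU reduction kernel.
def reducedSum(xs: Array[Float]): Float =
  if (xs.length == 1) xs(0)
  else {
    val half = (xs.length + 1) / 2
    val next = Array.tabulate(half) { i =>
      if (i + half < xs.length) xs(i) + xs(i + half) else xs(i)
    }
    reducedSum(next)
  }

println(reducedSum(Array.tabulate(1024)(_.toFloat))) // sum of 0..1023
```

On a GPU, each pass runs in parallel, so the whole reduction takes O(log n) passes; the benchmark above measures how efficiently each library schedules those passes on large arrays.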

## Future work
@@ -351,4 +350,4 @@ Now this project is only a minimum viable product. Many important features are s
* Add more OpenCL math functions ([#101](https://github.com/ThoughtWorksInc/Compute.scala/issues/101)).
* Further optimization of performance ([#62, #103](https://github.com/ThoughtWorksInc/Compute.scala/labels/performance)).

-Contribution is welcome. Check [good first issues](https://github.com/ThoughtWorksInc/Compute.scala/labels/good%20first%20issue) to start hacking.
+Contribution is welcome. Check [good first issues](https://github.com/ThoughtWorksInc/Compute.scala/labels/good%20first%20issue) to start hacking.
