From cc316c52a846be5998666ab948be6cb84db6c9c7 Mon Sep 17 00:00:00 2001
From: Mamy Ratsimbazafy
Date: Sun, 24 Sep 2017 18:48:47 +0200
Subject: [PATCH] Update of documentation for 0.2.0

---
 README.md                                  |  11 +-
 docs/Linear algebra notation comparison.md |   4 +-
 docs/README_0.2.0.rst                      | 983 +++++++++++++++++++++
 docs/autogen_nim_API.nim                   | 442 ++++++---
 4 files changed, 1300 insertions(+), 140 deletions(-)
 create mode 100644 docs/README_0.2.0.rst

diff --git a/README.md b/README.md
index 163df3d2c..e74e794f4 100644
--- a/README.md
+++ b/README.md
@@ -58,13 +58,14 @@ Putting a research model in production, on a drone or as a webservice for exampl
 All those pain points may seem like a huge undertaking; however, thanks to the Nim language, Arraymancer can:
 - Be as fast as C
 - Accelerate routines with Intel MKL/OpenBLAS or even NNPACK
-- Access to CUDA and reusing existing Torch, Tensorflow or Nervana Neon kernels
-- A Python-like syntax with custom operators `a * b` for tensor multiplication instead of `a.dot(b)` (Numpy/Tensorflow) or `a.mm(b)` (Torch) and Numpy-like slicing ergonomics `t[0..4, 2..10|2]`
+- Access CUDA and generate custom CUDA kernels on the fly via metaprogramming
+- Offer a Python-like syntax with custom operators `a * b` for tensor multiplication instead of `a.dot(b)` (Numpy/Tensorflow) or `a.mm(b)` (Torch)
+- Provide Numpy-like slicing ergonomics `t[0..4, 2..10|2]`

 ## Future ambitions
 Because apparently to be successful you need a vision, I would like Arraymancer to be:
 - The go-to tool for Deep Learning video processing. I.e. `vid = load_video("./cats/youtube_cat_video.mkv")`
-- Target javascript, WebAssembly, ARM devices, AMD Rocm, OpenCL.
+- Target javascript, WebAssembly, Apple Metal, ARM devices, AMD Rocm, OpenCL, you name it.
 - Target cryptominers' FPGAs because they drove the price of GPUs too high for honest deep-learners.
## Support (Types, OS, Hardware)
@@ -95,7 +96,7 @@ For now Arraymancer is still at the ndarray stage, however a [vision package](ht

 ### Speed

-On the demo benchmark, Arraymancer already reach speeds with comparable to Torch on logistic regression on OpenBLAS, though further MKL optimization are possible (batched matmul probably):
+On the demo benchmark, Arraymancer already reaches speeds comparable to Torch on logistic regression on OpenBLAS, though further MKL optimizations are possible (batched matmul probably):

 | Library | Timing |
 | ------ | ------ |
@@ -135,7 +136,7 @@ Here is a comparative table, note that this feature set is developing very rapidl
 | Iterating on a Tensor |[x]|[]|
 | Slicing a Tensor |[x]|[x]|
 | Slice mutation `a[1,_] = 10` |[x]|[]|
-| Comparison `==`|[x]|[]|
+| Comparison `==`|[x]| Coming soon|
 | Element-wise basic operations|[x]|[x]|
 | Universal functions |[x]|[x]|
 | Automatically broadcasted operations |[x]| Coming soon|
diff --git a/docs/Linear algebra notation comparison.md b/docs/Linear algebra notation comparison.md
index 235f2ae30..568dbba4f 100644
--- a/docs/Linear algebra notation comparison.md
+++ b/docs/Linear algebra notation comparison.md
@@ -1,8 +1,8 @@
 | Language/lib | Normal matmul | element-wise matmul (Hadamard) | vec-vec dot product | mat-vec multiplication|
 | ------------- | ---------------------------- | --- | --- | --- |
-| Arraymancer | A * B | \|*\| | A * B | A * B |
+| Arraymancer | A * B | .* | dot(A, B) | A * B |
 | neo/linalg | A * B | \|*\| | A * B | A * B |
-| Julia | A * B | .* | | dot(A, B) | A * B |
+| Julia & Matlab | A * B | .* | dot(A, B) | A * B |
 | Numpy ndarray| np.dot(A, B) or np.matmul(A, B) or A @ B| np.multiply(A, B) or A * B | np.dot(A, B) or np.inner(A, B) | np.dot(A, B) |
 | R | A %*% B | A * B | A %*% B or dot(A, B)| A %*% B |
 | Tensorflow | tf.matmul(A, B) or A @ B | tf.multiply(A, B) | tf.matmul(a, b, transpose_a=False, transpose_b=True) or tf.tensordot(a, b, 1) or tf.einsum('i,i->', x, y) | same reshape/transpose/einsum shenanigans as vec-vec|
diff --git a/docs/README_0.2.0.rst b/docs/README_0.2.0.rst
new file mode 100644
index 000000000..caf63228a
--- /dev/null
+++ b/docs/README_0.2.0.rst
@@ -0,0 +1,983 @@
|Join the chat at https://gitter.im/Arraymancer/Lobby| |Linux Build
Status (Travis)| |Windows build status (Appveyor)| |License| |Stability|

Arraymancer - A n-dimensional tensor (ndarray) library
======================================================

Arraymancer is a tensor (N-dimensional array) project. The main focus is
providing a fast and ergonomic CPU and GPU ndarray library on which to
build a numerical computing ecosystem and, in particular, a deep
learning ecosystem.

The library is inspired by Numpy and PyTorch.
.. raw:: html

- `Arraymancer - A n-dimensional tensor (ndarray)
  library <#arraymancer---a-n-dimensional-tensor-ndarray-library>`__

  - `Why Arraymancer <#why-arraymancer>`__
  - `Future ambitions <#future-ambitions>`__
  - `Support (Types, OS, Hardware) <#support-types-os-hardware>`__
  - `Limitations: <#limitations>`__
  - `Installation: <#installation>`__
  - `Features <#features>`__

    - `Speed <#speed>`__
    - `Safe vs unsafe: copy vs view <#safe-vs-unsafe-copy-vs-view>`__
    - `Tensors on CPU and on Cuda <#tensors-on-cpu-and-on-cuda>`__
    - `Tensor properties <#tensor-properties>`__
    - `Tensor creation <#tensor-creation>`__
    - `Accessing and modifying a
      value <#accessing-and-modifying-a-value>`__
    - `Copying <#copying>`__
    - `Slicing <#slicing>`__
    - `Slice mutations <#slice-mutations>`__
    - `Shapeshifting <#shapeshifting>`__

      - `Transposing <#transposing>`__
      - `Reshaping <#reshaping>`__
      - `Permuting - Reordering
        dimension <#permuting---reordering-dimension>`__
      - `Concatenation <#concatenation>`__

    - `Universal functions <#universal-functions>`__
    - `Type conversion <#type-conversion>`__
    - `Matrix and vector
      operations <#matrix-and-vector-operations>`__
    - `Broadcasting <#broadcasting>`__
    - `Iterators <#iterators>`__
    - `Higher-order functions (Map, Reduce,
      Fold) <#higher-order-functions-map-reduce-fold>`__

      - ```map``, ``apply``, ``map2``,
        ``apply2`` <#map-apply-map2-apply2>`__
      - ```reduce`` on the whole Tensor or along an
        axis <#reduce-on-the-whole-tensor-or-along-an-axis>`__
      - ```fold`` on the whole Tensor or along an
        axis <#fold-on-the-whole-tensor-or-along-an-axis>`__

    - `Aggregate and Statistics <#aggregate-and-statistics>`__

.. raw:: html

Why Arraymancer
---------------

The deep learning frameworks are currently in two camps:

- Research: Theano, Tensorflow, Keras, Torch, PyTorch
- Production: Caffe, Darknet, (Tensorflow)

Putting a research model in production, on a drone or as a webservice
for example, is difficult:

- Managing Python versions and environments is hell
- The Python data science ecosystem does not run on embedded devices
  (Nvidia Tegra/drones) or mobile phones
- Transforming a tuned research model (in Python) into a usable Caffe or
  Darknet model (in C) is almost impossible. PMML is supposed to be the
  "common" XML description of ML models but is not really supported by
  anyone.
  **Edit - Sept 7, 2017: Microsoft and Facebook are announcing `Open
  Neural Network
  Exchange `__**
- Tensorflow is supposed to bridge the gap between research and
  production, but its syntax and ergonomics are a pain to work with.
- Deployed models are static: there is no interface to add a new
  observation/training sample to any framework, and the end goal is to
  use a model as a webservice.

All those pain points may seem like a huge undertaking; however, thanks
to the Nim language, Arraymancer can:

- Be as fast as C
- Accelerate routines with Intel MKL/OpenBLAS or even NNPACK
- Access CUDA and generate custom CUDA kernels on the fly via
  metaprogramming
- Offer a Python-like syntax with custom operators ``a * b`` for tensor
  multiplication instead of ``a.dot(b)`` (Numpy/Tensorflow) or
  ``a.mm(b)`` (Torch)
- Provide Numpy-like slicing ergonomics ``t[0..4, 2..10|2]``

Future ambitions
----------------

Because apparently to be successful you need a vision, I would like
Arraymancer to be:

- The go-to tool for Deep Learning video processing, i.e.
  ``vid = load_video("./cats/youtube_cat_video.mkv")``
- Target javascript, WebAssembly, Apple Metal, ARM devices, AMD Rocm,
  OpenCL, you name it.
- Target cryptominers' FPGAs because they drove the price of GPUs too
  high for honest deep-learners.

Support (Types, OS, Hardware)
-----------------------------

Arraymancer's tensors support arbitrary types (floats, strings, objects
...).

| Arraymancer runs anywhere you can compile C code. Linux and MacOS are
  supported; Windows should work too, as Appveyor (continuous
  integration for Windows) never flashes red.
| Optionally you can compile Arraymancer with Cuda support.

Note: Arraymancer Tensors and CudaTensors are tensors in the machine
learning sense (a multidimensional array), not in the mathematical sense
(an object describing transformation laws).

Limitations:
------------

EXPERIMENTAL: Arraymancer may summon Ragnarok and cause the heat death
of the Universe.

#. Display of 5-dimensional or higher tensors is not implemented. (To be
   honest, Christopher Nolan had the same issue in Interstellar.)

Installation:
-------------

Nim is available in some Linux repositories and on Homebrew for macOS.

However, I recommend installing Nim in your user profile via
`choosenim `__. Once choosenim has installed Nim, you can run
``nimble install arraymancer``, which will pull Arraymancer and all its
dependencies.

Features
--------

The detailed API is available in Arraymancer's official
`documentation `__.

For now Arraymancer is still at the ndarray stage; however, a `vision
package `__ and a
`machine learning demo `__
have been started.

Speed
~~~~~

On the demo benchmark, Arraymancer already reaches speeds comparable to
Torch on logistic regression on OpenBLAS, though further MKL
optimizations are possible (batched matmul probably):

+------------------------+------------+
| Library                | Timing     |
+========================+============+
| Torch CUDA             | 582 ms     |
+------------------------+------------+
| Torch MKL              | 1417 ms    |
+------------------------+------------+
| Torch OpenBLAS         | 13044 ms   |
+------------------------+------------+
| Numpy MKL              | 17906 ms   |
+------------------------+------------+
| Arraymancer MKL        | 2325 ms    |
+------------------------+------------+
| Arraymancer OpenBLAS   | 12502 ms   |
+------------------------+------------+

::

    Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
    GeForce GTX 1080 Ti
    ArchLinux (kernel 4.9.51-1-lts, glibc 2.26)
    GCC 7.2.0
    MKL 2017.17.0.4.4
    OpenBLAS 0.2.20
    CUDA 8.0.61

| In the future, Arraymancer will leverage the Nim compiler to
  automatically fuse operations
| like ``alpha A*B + beta C`` or a combination of element-wise
  operations. This is already done to fuse ``toTensor`` and ``reshape``.

Safe vs unsafe: copy vs view
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compared to most frameworks, Arraymancer chooses to be safe by default,
but allows ``unsafe`` operations to optimize for speed and memory.
Tensors resulting from ``unsafe`` (no-copy) operations share their
underlying storage with the input tensor (they are also called views or
shallow copies). This is often a surprise for beginners.

In the future, Arraymancer will leverage the Nim compiler to detect
automatically when the original tensor is no longer used or modified,
and replace the operation with its ``unsafe`` equivalent.

For CudaTensors, operations are unsafe by default (including assignment
with ``=``) while waiting for further Nim optimizations for manually
managed memory. CudaTensors can be copied safely with ``.clone``.
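
As an illustration, here is a minimal sketch of the copy-vs-view
distinction, using the ``unsafeView`` proc described in the Copying
section below (semantics as documented there; exact output assumed):

.. code:: nim

    import arraymancer

    var a = [[1, 2], [3, 4]].toTensor()

    let b = a            # safe: deep copy, `b` owns its own data
    var v = a.unsafeView # unsafe: no copy, `v` shares `a`'s storage

    v[0, 0] = 999
    echo a[0, 0]         # 999: mutating the view also mutated `a`
    echo b[0, 0]         # 1: the deep copy is unaffected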

Tensors on CPU and on Cuda
~~~~~~~~~~~~~~~~~~~~~~~~~~

| Tensors and CudaTensors do not have the same features implemented yet.
| Also, CudaTensors can only hold float32 or float64 values, while CPU
  Tensors can hold integers, strings, booleans or any custom object.

Here is a comparative table; note that this feature set is developing
very rapidly.

==================================================  =======  ==========================================================================
Action                                              Tensor   CudaTensor
==================================================  =======  ==========================================================================
Accessing tensor properties                         [x]      [x]
Tensor creation                                     [x]      by converting a cpu Tensor
Accessing or modifying a single value               [x]      []
Iterating on a Tensor                               [x]      []
Slicing a Tensor                                    [x]      [x]
Slice mutation ``a[1,_] = 10``                      [x]      []
Comparison ``==``                                   [x]      Coming soon
Element-wise basic operations                       [x]      [x]
Universal functions                                 [x]      [x]
Automatically broadcasted operations                [x]      Coming soon
Matrix-Matrix and Matrix-Vector multiplication      [x]      [x] Note: sliced CudaTensors must explicitly be made contiguous for now
Displaying a tensor                                 [x]      [x]
Higher-order functions (map, apply, reduce, fold)   [x]      Apply, but only internally
Transposing                                         [x]      [x]
Converting to contiguous                            [x]      [x]
Reshaping                                           [x]      []
Explicit broadcast                                  [x]      Coming soon
Permuting dimensions                                [x]      Coming soon
Concatenating tensors along an existing dimension   [x]      []
Squeezing singleton dimensions                      [x]      Coming soon
Slicing + squeezing                                 [x]      Coming soon
==================================================  =======  ==========================================================================
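
For instance, per the first table row, a CudaTensor is created from an
existing CPU tensor. A minimal sketch, assuming a CUDA-enabled build and
that the conversion proc is named ``cuda`` (an assumption, not confirmed
by this document):

.. code:: nim

    import arraymancer

    # CudaTensors only support float32/float64 element types.
    let a = [[1'f32, 2'f32, 3'f32],
             [4'f32, 5'f32, 6'f32]].toTensor()

    let a_cuda = a.cuda  # copy the tensor to the GPU

    echo a_cuda          # displaying a CudaTensor is supported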

Tensor properties
~~~~~~~~~~~~~~~~~

Tensors have the following properties:

- ``rank``:

  - 0 for scalars (unfortunately they cannot be stored)
  - 1 for vectors
  - 2 for matrices
  - N for N-dimensional arrays

- ``shape``: a sequence of the tensor's dimensions along each axis.

The next properties are technical and listed for completeness:

- ``strides``: a sequence of the number of steps needed to reach the
  next item along each dimension.
- ``offset``: the position of the tensor's first element in the
  underlying storage.

.. code:: nim

    import arraymancer

    let d = [[1, 2, 3], [4, 5, 6]].toTensor()

    echo d
    # Tensor of shape 2x3 of type "int" on backend "Cpu"
    # |1      2       3|
    # |4      5       6|

    echo d.rank # 2
    echo d.shape # @[2, 3]
    echo d.strides # @[3, 1] => Next row is 3 elements away in memory while next column is 1 element away.
    echo d.offset # 0

Tensor creation
~~~~~~~~~~~~~~~

The canonical way to initialize a tensor is by converting a seq of seqs
of ... or an array of arrays of ... into a tensor using ``toTensor``.

``toTensor`` supports deeply nested sequences and arrays, even sequences
of arrays of sequences.

.. code:: nim

    import arraymancer

    let c = [
      [
        [1,2,3],
        [4,5,6]
      ],
      [
        [11,22,33],
        [44,55,66]
      ],
      [
        [111,222,333],
        [444,555,666]
      ],
      [
        [1111,2222,3333],
        [4444,5555,6666]
      ]
    ].toTensor()
    echo c

    # Tensor of shape 4x2x3 of type "int" on backend "Cpu"
    # | 1     2      3 | 11    22     33 | 111   222   333 | 1111  2222  3333|
    # | 4     5      6 | 44    55     66 | 444   555   666 | 4444  5555  6666|

The ``newTensor`` procedure can be used to initialize a tensor of a
specific shape with a default value (0 for numbers, false for bool ...).

``zeros`` and ``ones`` create a new tensor filled with 0 and 1
respectively.

``zeros_like`` and ``ones_like`` take an input tensor and output a
tensor of the same shape, filled with 0 and 1 respectively.

.. code:: nim

    let e = newTensor([2, 3], bool)
    # Tensor of shape 2x3 of type "bool" on backend "Cpu"
    # |false false false|
    # |false false false|

    let f = zeros([4, 3], float)
    # Tensor of shape 4x3 of type "float" on backend "Cpu"
    # |0.0 0.0 0.0|
    # |0.0 0.0 0.0|
    # |0.0 0.0 0.0|
    # |0.0 0.0 0.0|

    let g = ones([4, 3], float)
    # Tensor of shape 4x3 of type "float" on backend "Cpu"
    # |1.0 1.0 1.0|
    # |1.0 1.0 1.0|
    # |1.0 1.0 1.0|
    # |1.0 1.0 1.0|

    let tmp = [[1,2],[3,4]].toTensor()
    let h = tmp.zeros_like
    # Tensor of shape 2x2 of type "int" on backend "Cpu"
    # |0 0|
    # |0 0|

    let i = tmp.ones_like
    # Tensor of shape 2x2 of type "int" on backend "Cpu"
    # |1 1|
    # |1 1|

Accessing and modifying a value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tensor values can be retrieved or set with array brackets.

.. code:: nim

    import arraymancer, sequtils

    var a = toSeq(1..24).toTensor().reshape(2,3,4)

    echo a
    # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
    # |  1     2      3      4 |   13     14     15     16|
    # |  5     6      7      8 |   17     18     19     20|
    # |  9    10     11     12 |   21     22     23     24|

    echo a[1, 1, 1]
    # 18

    a[1, 1, 1] = 999
    echo a
    # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
    # |  1     2      3      4 |   13     14     15     16|
    # |  5     6      7      8 |   17    999     19     20|
    # |  9    10     11     12 |   21     22     23     24|

Copying
~~~~~~~

Tensor copies are deep by default (all the data is copied). In the
majority of cases the Nim compiler will detect and avoid useless copies.

``unsafeView`` can be used on a Tensor to enforce shallow copying (the
data is shared between the two variables). Most shape manipulation procs
also have an ``unsafe`` version.
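
For instance, a no-copy reshape might look like the following; a
hypothetical sketch, assuming the ``unsafe`` variant follows the naming
scheme above and is called ``unsafeReshape``:

.. code:: nim

    import arraymancer, sequtils

    var a = toSeq(1..6).toTensor()

    # Hypothetical name following the `unsafe` naming convention above.
    var m = a.unsafeReshape(2, 3)  # no copy: `m` shares `a`'s storage

    m[0, 0] = 999
    echo a[0]  # 999: the mutation is visible through `a` as well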

Slicing
~~~~~~~

Arraymancer supports the following slicing syntax. It allows for
selecting a subset of a dimension, taking a whole dimension, stepping
(e.g. one row out of two), reversing dimensions, and counting from the
end.

.. code:: nim

    import math, arraymancer, future

    const
      x = @[1, 2, 3, 4, 5]
      y = @[1, 2, 3, 4, 5]

    var
      vandermonde: seq[seq[int]]
      row: seq[int]

    vandermonde = newSeq[seq[int]]()

    for i, xx in x:
      row = newSeq[int]()
      vandermonde.add(row)
      for j, yy in y:
        vandermonde[i].add(xx^yy)

    let foo = vandermonde.toTensor()

    echo foo

    # Tensor of shape 5x5 of type "int" on backend "Cpu"
    # |1      1       1       1       1|
    # |2      4       8       16      32|
    # |3      9       27      81      243|
    # |4      16      64      256     1024|
    # |5      25      125     625     3125|

    echo foo[1..2, 3..4] # slice

    # Tensor of shape 2x2 of type "int" on backend "Cpu"
    # |16     32|
    # |81     243|

    echo foo[3.._, _] # span slice

    # Tensor of shape 2x5 of type "int" on backend "Cpu"
    # |4      16      64      256     1024|
    # |5      25      125     625     3125|

    echo foo[_..^3, _] # slice until (inclusive, consistent with Nim)

    # Tensor of shape 3x5 of type "int" on backend "Cpu"
    # |1      1       1       1       1|
    # |2      4       8       16      32|
    # |3      9       27      81      243|

    echo foo[_.._|2, _] # step

    # Tensor of shape 3x5 of type "int" on backend "Cpu"
    # |1      1       1       1       1|
    # |3      9       27      81      243|
    # |5      25      125     625     3125|

    echo foo[^1..0|-1, _] # reverse step

    # Tensor of shape 5x5 of type "int" on backend "Cpu"
    # |5      25      125     625     3125|
    # |4      16      64      256     1024|
    # |3      9       27      81      243|
    # |2      4       8       16      32|
    # |1      1       1       1       1|

Slice mutations
~~~~~~~~~~~~~~~

Slices can also be mutated with a single value, a nested seq or array, a
tensor or a tensor slice.

.. code:: nim

    import math, arraymancer, future

    const
      x = @[1, 2, 3, 4, 5]
      y = @[1, 2, 3, 4, 5]

    var
      vandermonde: seq[seq[int]]
      row: seq[int]

    vandermonde = newSeq[seq[int]]()

    for i, xx in x:
      row = newSeq[int]()
      vandermonde.add(row)
      for j, yy in y:
        vandermonde[i].add(xx^yy)

    var foo = vandermonde.toTensor()

    echo foo

    # Tensor of shape 5x5 of type "int" on backend "Cpu"
    # |1      1       1       1       1|
    # |2      4       8       16      32|
    # |3      9       27      81      243|
    # |4      16      64      256     1024|
    # |5      25      125     625     3125|

    # Mutation with a single value
    foo[1..2, 3..4] = 999

    echo foo
    # Tensor of shape 5x5 of type "int" on backend "Cpu"
    # |1      1       1       1       1|
    # |2      4       8       999     999|
    # |3      9       27      999     999|
    # |4      16      64      256     1024|
    # |5      25      125     625     3125|

    # Mutation with a nested array or nested seq
    foo[0..1,0..1] = [[111, 222], [333, 444]]

    echo foo
    # Tensor of shape 5x5 of type "int" on backend "Cpu"
    # |111    222     1       1       1|
    # |333    444     8       999     999|
    # |3      9       27      999     999|
    # |4      16      64      256     1024|
    # |5      25      125     625     3125|

    # Mutation with a tensor or tensor slice
    foo[^2..^1,2..4] = foo[^1..^2|-1, 4..2|-1]

    echo foo
    # Tensor of shape 5x5 of type "int" on backend "Cpu"
    # |111    222     1       1       1|
    # |333    444     8       999     999|
    # |3      9       27      999     999|
    # |4      16      3125    625     125|
    # |5      25      1024    256     64|

Shapeshifting
~~~~~~~~~~~~~

Transposing
^^^^^^^^^^^

The ``transpose`` function will reverse the dimensions of a tensor.
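
For a matrix this is the usual transpose; a quick sketch (output spacing
assumed to match the examples above):

.. code:: nim

    import arraymancer

    let a = [[1, 2, 3],
             [4, 5, 6]].toTensor()

    echo a.transpose
    # Tensor of shape 3x2 of type "int" on backend "Cpu"
    # |1      4|
    # |2      5|
    # |3      6|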

Reshaping
^^^^^^^^^

The ``reshape`` function will change the shape of a tensor. The number
of elements in the new and old shapes must be the same.

For example:

.. code:: nim

    import arraymancer, sequtils

    let a = toSeq(1..24).toTensor().reshape(2,3,4)

    # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
    # |  1     2      3      4 |   13     14     15     16|
    # |  5     6      7      8 |   17     18     19     20|
    # |  9    10     11     12 |   21     22     23     24|

Permuting - Reordering dimension
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

| The ``permute`` proc can be used to reorder dimensions.
| Its input is a tensor and the new dimension order.

.. code:: nim

    let a = toSeq(1..24).toTensor().reshape(2,3,4)
    echo a

    # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
    # |  1     2      3      4 |   13     14     15     16|
    # |  5     6      7      8 |   17     18     19     20|
    # |  9    10     11     12 |   21     22     23     24|

    echo a.permute(0,2,1) # dim 0 stays at 0, dim 1 becomes dim 2 and dim 2 becomes dim 1

    # Tensor of shape 2x4x3 of type "int" on backend "Cpu"
    # |  1     5      9 |   13     17     21|
    # |  2     6     10 |   14     18     22|
    # |  3     7     11 |   15     19     23|
    # |  4     8     12 |   16     20     24|

Concatenation
^^^^^^^^^^^^^

Tensors can be concatenated along an axis with the ``concat`` proc.

.. code:: nim

    import arraymancer, sequtils

    let a = toSeq(1..4).toTensor().reshape(2,2)

    let b = toSeq(5..8).toTensor().reshape(2,2)

    let c = toSeq(11..16).toTensor()
    let c0 = c.reshape(3,2)
    let c1 = c.reshape(2,3)

    echo concat(a,b,c0, axis = 0)
    # Tensor of shape 7x2 of type "int" on backend "Cpu"
    # |1      2|
    # |3      4|
    # |5      6|
    # |7      8|
    # |11     12|
    # |13     14|
    # |15     16|

    echo concat(a,b,c1, axis = 1)
    # Tensor of shape 2x7 of type "int" on backend "Cpu"
    # |1      2       5       6       11      12      13|
    # |3      4       7       8       14      15      16|

Universal functions
~~~~~~~~~~~~~~~~~~~

Functions that apply to a single element can work on a whole tensor,
similar to Numpy's universal functions.

Three procedures exist: ``makeUniversal``, ``makeUniversalLocal`` and
``map``.

| ``makeUniversal`` creates a function that applies to each element of a
  tensor, from any unary function. Most functions from the ``math``
  module have been generalized to tensors with ``makeUniversal(sin)``.
| Furthermore, those universal functions are exported and available for
  import.

``makeUniversalLocal`` does the same but does not export the resulting
universal functions.

``map`` is more generic and maps any function onto all elements of a
tensor. ``map`` works even if the function changes the type of the
tensor's elements.

.. code:: nim

    echo foo.map(x => x.isPowerOfTwo) # map a function (`=>` comes from the future module)

    # Tensor of shape 5x5 of type "bool" on backend "Cpu"
    # |true true true true true|
    # |true true true true true|
    # |false false false false false|
    # |true true true true true|
    # |false false false false false|

    let foo_float = foo.map(x => x.float)
    echo ln foo_float # universal function (convert first to float for ln)

    # Tensor of shape 5x5 of type "float" on backend "Cpu"
    # |0.0    0.0     0.0     0.0     0.0|
    # |0.6931471805599453     1.386294361119891       2.079441541679836       2.772588722239781       3.465735902799727|
    # |1.09861228866811       2.19722457733622        3.295836866004329       4.394449154672439       5.493061443340548|
    # |1.386294361119891      2.772588722239781       4.158883083359671       5.545177444479562       6.931471805599453|
    # |1.6094379124341        3.218875824868201       4.828313737302302       6.437751649736401       8.047189562170502|
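
As a sketch of how a user-defined unary proc becomes a universal one
(``plusTen`` is a made-up example, not a library function; the call
pattern is assumed to match the ``makeUniversal(sin)`` usage above):

.. code:: nim

    import arraymancer

    proc plusTen(x: int): int = x + 10

    # Generate the tensor-level version; the Local variant avoids exporting it.
    makeUniversalLocal(plusTen)

    let t = [1, 2, 3].toTensor()
    echo t.plusTen
    # Tensor of shape 3 of type "int" on backend "Cpu"
    # |11     12      13|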

Type conversion
~~~~~~~~~~~~~~~

A type conversion function, ``astype``, is provided for convenience:

.. code:: nim

    let foo_float = foo.astype(float)

Matrix and vector operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following linear algebra operations are supported for tensors of
rank 1 (vectors) and 2 (matrices):

- dot product (vector-vector) using ``dot``
- addition and subtraction (any rank) using ``+`` and ``-``
- in-place addition and subtraction (any rank) using ``+=`` and ``-=``
- multiplication or division by a scalar using ``*`` and ``/``
- matrix-matrix multiplication using ``*``
- matrix-vector multiplication using ``*``
- element-wise multiplication (Hadamard product) using ``.*``

Note: matrix operations for floats are accelerated using BLAS (Intel
MKL, OpenBLAS, Apple Accelerate ...). Unfortunately there is no such
acceleration routine for integers. Integer matrix-matrix and
matrix-vector multiplications are implemented via semi-optimized
routines (not naive loops, but they don't leverage CPU-specific
features).

.. code:: nim

    echo foo_float * foo_float # Accelerated matrix-matrix multiplication (needs float)
    # Tensor of shape 5x5 of type "float" on backend "Cpu"
    # |15.0      55.0      225.0      979.0      4425.0|
    # |258.0     1146.0    5274.0     24810.0    118458.0|
    # |1641.0    7653.0    36363.0    174945.0   849171.0|
    # |6372.0    30340.0   146244.0   710980.0   3478212.0|
    # |18555.0   89355.0   434205.0   2123655.0  10436805.0|

Broadcasting
~~~~~~~~~~~~

| Arraymancer supports explicit broadcasting with ``broadcast`` and its
  alias ``bc``.
| It also supports implicit broadcasting with operations beginning with
  a dot:

Image from Scipy

|image5|

.. code:: nim

    let j = [0, 10, 20, 30].toTensor().reshape(4,1)
    let k = [0, 1, 2].toTensor().reshape(1,3)

    echo j .+ k
    # Tensor of shape 4x3 of type "int" on backend "Cpu"
    # |0      1       2|
    # |10     11      12|
    # |20     21      22|
    # |30     31      32|

- ``.+``, ``.-``: broadcasted element-wise addition and subtraction
- ``.*``: broadcasted element-wise multiplication (also called the
  Hadamard product)
- ``./``: broadcasted element-wise division or integer division
- ``.+=``, ``.-=``, ``.*=``, ``./=``: in-place versions. Only the right
  operand is broadcastable.

Iterators
~~~~~~~~~

Tensors can be iterated in the proper order. Arraymancer provides:

- ``items`` and ``pairs``. ``pairs`` also returns the coordinates within
  the tensor.

.. code:: nim

    import arraymancer, sequtils

    let a = toSeq(1..24).toTensor().reshape(2,3,4)
    # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
    # |  1     2      3      4 |   13     14     15     16|
    # |  5     6      7      8 |   17     18     19     20|
    # |  9    10     11     12 |   21     22     23     24|

    for v in a:
      echo v

    for coord, v in a:
      echo coord
      echo v
    # @[0, 0, 0]
    # 1
    # @[0, 0, 1]
    # 2
    # @[0, 0, 2]
    # 3
    # @[0, 0, 3]
    # 4
    # @[0, 1, 0]
    # 5
    # @[0, 1, 1]
    # 6
    # @[0, 1, 2]
    # 7
    # @[0, 1, 3]
    # 8
    # @[0, 2, 0]
    # 9
    # ...

For convenience, a ``values`` closure iterator is available for iterator
chaining. ``values`` is equivalent to ``items``.

| A ``mitems`` iterator is available to directly mutate elements while
  iterating.
| An ``axis`` iterator is available to iterate along an axis.
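
For instance, a minimal sketch of in-place mutation with ``mitems``
(assuming it yields mutable elements, as in the standard library; output
assumed):

.. code:: nim

    import arraymancer, sequtils

    var a = toSeq(1..6).toTensor().reshape(2,3)

    # Double every element in place while iterating.
    for x in a.mitems:
      x *= 2

    echo a
    # Tensor of shape 2x3 of type "int" on backend "Cpu"
    # |2      4       6|
    # |8      10      12|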

Higher-order functions (Map, Reduce, Fold)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Arraymancer supports efficient higher-order functions on the whole
tensor or along an axis.

``map``, ``apply``, ``map2``, ``apply2``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: nim

    a.map(x => x+1)

or

.. code:: nim

    proc plusone[T](x: T): T =
      x + 1
    a.map(plusone) # Map the function plusone

Note: for basic operations you can use implicit broadcasting instead:
``a .+ 1``

``apply`` is the same as ``map``, but in-place.

``map2`` and ``apply2`` take two input tensors and, respectively, return
a new tensor or modify the first in-place.

.. code:: nim

    proc `**`[T](x, y: T): T = # We create a new power `**` function that works on 2 scalars
      pow(x, y)
    a.map2(`**`, b)
    # Or
    map2(a, `**`, b)

``reduce`` on the whole Tensor or along an axis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``reduce`` applies a function like ``+`` or ``max`` to the whole
Tensor[T], returning a single value T.

For example:

- Reducing with ``+`` returns the sum of all elements of the Tensor.
- Reducing with ``max`` returns the biggest element of the Tensor.

``reduce`` can also be applied along an axis, for example to sum along
the rows of a Tensor.

``fold`` on the whole Tensor or along an axis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``fold`` is a generalization of ``reduce``: its starting value is
supplied explicitly instead of being the first element of the Tensor.

It can do anything that ``reduce`` can, but also has other tricks
because it is not constrained by the Tensor's element type or starting
value.

For example:

- Folding with ``was_a_odd_and_what_about_b`` and a starting value of
  ``true`` returns ``true`` if all elements are odd, and ``false``
  otherwise.

Just in case, here is that procedure:

.. code:: nim

    proc was_a_odd_and_what_about_b[T: SomeInteger](a: bool, b: T): bool =
      return a and (b mod 2 == 1) # a is the result of previous computations, b is the new integer to check.

Aggregate and Statistics
~~~~~~~~~~~~~~~~~~~~~~~~

| ``sum`` and ``mean`` functions are available to compute the sum and
  mean of a tensor.
| ``sum`` and ``mean`` can also be computed along an axis with the
  ``axis`` argument.

Generic aggregates on the whole tensor or along an axis can be computed
with the ``agg`` and ``agg_inplace`` functions.

.. |Join the chat at https://gitter.im/Arraymancer/Lobby| image:: https://badges.gitter.im/Arraymancer/Lobby.svg
   :target: https://gitter.im/Arraymancer/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
.. |Linux Build Status (Travis)| image:: https://travis-ci.org/mratsim/Arraymancer.svg?branch=master
   :target: https://travis-ci.org/mratsim/Arraymancer
.. |Windows build status (Appveyor)| image:: https://ci.appveyor.com/api/projects/status/github/mratsim/arraymancer?branch=master&svg=true
   :target: https://ci.appveyor.com/project/mratsim/arraymancer
.. |License| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :target: https://opensource.org/licenses/Apache-2.0
.. |Stability| image:: https://img.shields.io/badge/stability-experimental-orange.svg
.. |image5| image:: https://scipy.github.io/old-wiki/pages/image004de9e.gif

diff --git a/docs/autogen_nim_API.nim b/docs/autogen_nim_API.nim
index 546cc0ce6..a14aa0231 100644
--- a/docs/autogen_nim_API.nim
+++ b/docs/autogen_nim_API.nim
@@ -2,7 +2,10 @@
 ## ======================================================
 ##
 ## Arraymancer is a tensor (N-dimensional array) project. The main focus is
-## providing a fast and ergonomic CPU and GPU ndarray library with deep learning and neural network capabilities.
+## providing a fast and ergonomic CPU and GPU ndarray library on which to
+## build a numerical computing ecosystem and, in particular, a deep learning one.
+##
+## The library is inspired by Numpy and PyTorch.
 ##
 ##
 ## Why Arraymancer
 ## ---------------
 ##
 ## The deep learning frameworks are currently in two camps:
 ##
-## - Research: Theano, Tensorflow, Keras, Torch, PyTorch, Mxnet
-## - Production: Caffe, Darknet, (Tensorflow, Mxnet)
+## - Research: Theano, Tensorflow, Keras, Torch, PyTorch
+## - Production: Caffe, Darknet, (Tensorflow)
+##
+## Putting a research model in production, on a drone or as a webservice
+## for example, is difficult:
+##
+## - Managing Python versions and environments is hell
+## - The Python data science ecosystem does not run on embedded devices
+##   (Nvidia Tegra/drones) or mobile phones
+## - ~Transforming a tuned research model (in Python) into a usable Caffe or
+##   Darknet model (in C) is almost impossible. PMML is supposed to be the
+##   "common" XML description of ML models but is not really supported by
+##   anyone.~
+##   **Edit - Sept 7, 2017**: Microsoft and Facebook are announcing `Open
+##   Neural Network
+##   Exchange `__
+## - Tensorflow is supposed to bridge the gap between research and
+##   production, but its syntax and ergonomics are a pain to work with.
+## - Deployed models are static: there is no interface to add a new
+##   observation/training sample to any framework, and the end goal is to
+##   use a model as a webservice.
+##
+## All those pain points may seem like a huge undertaking; however, thanks
+## to the Nim language, Arraymancer can:
+##
+## - Be as fast as C
+## - Accelerate routines with Intel MKL/OpenBLAS or even NNPACK
+## - Access CUDA and generate custom CUDA kernels on the fly via
+##   metaprogramming
+## - Offer a Python-like syntax with custom operators ``a * b`` for tensor
+##   multiplication instead of ``a.dot(b)`` (Numpy/Tensorflow) or
+##   ``a.mm(b)`` (Torch)
+## - Provide Numpy-like slicing ergonomics ``t[0..4, 2..10|2]``
+##
+## Future ambitions
+## ----------------
+##
+## Because apparently to be successful you need a vision, I would like
+## Arraymancer to be:
+##
+## - The go-to tool for Deep Learning video processing, i.e.
+##   ``vid = load_video("./cats/youtube_cat_video.mkv")``
+## - Target javascript, WebAssembly, Apple Metal, ARM devices, AMD Rocm,
+##   OpenCL, you name it.
+## - Target cryptominers' FPGAs because they drove the price of GPUs too
+##   high for honest deep-learners.
 ##
+## Support (Types, OS, Hardware)
+## -----------------------------
 ##
-## Putting a research model in production, on a drone or as a webservice for example, is difficult:
+## Arraymancer's tensors support arbitrary types (floats, strings, objects
+## ...).
 ##
-## - Managing Python versions and environment is hell
-## - Python does not run on embedded devices or mobile phones
-## - Transforming a tuned research model (in Python) to a usable Caffe or Darknet model (in C) is almost impossible. PMML is supposed to be the "common" XML description of ML models but is not really supported by anyone.
-## - Tensorflow is supposed to bridge the gap between research and production but its syntax and ergonomics are a pain to work with.
-## - Deployed models are static, there is no interface to add a new observation/training sample to any framework. The end goal is to use a model as a webservice.
-## - No framework are designed yet with javascript/WebAssembly in mind.
-##
-## All those pain points may seem like a huge undertaking however thanks to the Nim language, we can have Arraymancer:
+## | Arraymancer runs anywhere you can compile C code. Linux and MacOS are
Linux, MacOS are +## supported, Windows should work too as Appveyor (Continuous Integration +## for Windows) never flash red. +## | Optionally you can compile Arraymancer with Cuda support. ## -## - Be as fast as C -## - Accelerated routines with Intel MKL/OpenBLAS or even NNPACK -## - Access to CUDA and reusing existing Torch, Tensorflow or Nervana Neon kernels -## - A Python-like syntax with custom operators `a .* b` for tensor multiplication instead of `a.dot(b)` (Numpy/Tensorflow) or `a.mm(b)` (Torch) and Numpy-like slicing ergonomics `t[0..4, 2..10|2]` -## - Target javascript and soon WebAssembly +## Note: Arraymancer Tensors and CudaTensors are tensors in the machine +## learning sense (multidimensional array) not in the mathematical sense +## (describe transformation laws) ## +## Limitations: +## ------------ ## -## Support (Types, OS, Hardware) -## ----------------------------- +## EXPERIMENTAL: Arraymancer may summon Ragnarok and cause the heat death +## of the Universe. +## +## Display of 5-dimensional or more tensors is not implemented. (To be +## honest Christopher Nolan had the same issue in Interstellar) +## +## Installation: +## ------------- ## -## Arraymancer's tensors supports arbitrary types (floats, strings, objects ...). +## Nim is available in some Linux repositories and on Homebrew for macOS. ## -## Arraymancer will target PC and embedded devices running: +## I however recommend installing Nim in your user profile via +## `choosenim `__. Once choosenim +## installed Nim, you can ``nimble arraymancer`` which will pull +## arraymancer and all its dependencies. ## -## - Windows, MacOS, Linux -## - Javascript/WebAssembly browsers -## - X86, X86_64, ARM, Nvidia GPU +## Features +## -------- ## -## Jetson TX1 and embedded devices with GPU are also a target. +## Detailed API is available on Arraymancer official +## `documentation `__. ## -## Provided ROCm (RadeonOpenCompute) can successfully use CUDA code, AMD GPUs will also be supported. -##  -## Magma will be supported for simultaneous computation on CPU + CUDA GPUs. +## For now Arraymancer is still at the ndarray stage, however a `vision +## package `__ and a +## `machine learning demo `__ +## have started. ## -## Currently only CPU backends are working. -##  -## Note: Arraymancer tensors are tensors in the machine learning sense (multidimensional array) not in the mathematical sense (describe transformation laws) -##  +## Speed +## ~~~~~ ## -## Limitations -## ----------- +## On the demo benchmark, Arraymancer already reach speeds with comparable +## to Torch on logistic regression on OpenBLAS, though further MKL +## optimizations are possible (batched matmul probably): ## -## EXPERIMENTAL: Arraymancer may summon Ragnarok and cause the heat death of the Universe. -##  -## - There is no optimized routine for integer matrix and vector multiplication. I wrote my own for integer matrix-matrix multplication but matrix-vector is not implemented. -## - Display of 5-dimensional or more tensors is not implemented. 
+## ------------------------ ------------
+## Library                  Timing
+## ------------------------ ------------
+## Torch CUDA               582 ms
+## Torch MKL                1417 ms
+## Torch OpenBLAS           13044 ms
+## Numpy MKL                17906 ms
+## Arraymancer MKL          2325 ms
+## Arraymancer OpenBLAS     12502 ms
+## ------------------------ ------------
 ##
+## ::
 ##
-## Features
-## --------
+##     Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz GeForce GTX 1080 Ti ArchLinux (kernel 4.9.51-1-lts, glibc 2.26) GCC 7.2.0 MKL 2017.17.0.4.4 OpenBLAS 0.2.20 CUDA 8.0.61
+##
+## | In the future, Arraymancer will leverage the Nim compiler to
+##   automatically fuse operations
+## | like ``alpha A*B + beta C`` or a combination of element-wise
+##   operations. This is already done to fuse ``toTensor`` and ``reshape``.
+##
+## Safe vs unsafe: copy vs view
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+##
+## Compared to most frameworks, Arraymancer chooses to be safe by default,
+## but allows ``unsafe`` operations to optimize for speed and memory.
+## Tensors resulting from ``unsafe`` (no-copy) operations share their
+## underlying storage with the input tensor (they are also called views or
+## shallow copies). This is often a surprise for beginners.
+##
+## In the future, Arraymancer will leverage the Nim compiler to detect
+## automatically when the original tensor is no longer used or modified,
+## and replace the operation with its ``unsafe`` equivalent.
+##
+## For CudaTensors, operations are unsafe by default (including assignment
+## with ``=``) while waiting for further Nim optimizations for manually
+## managed memory. CudaTensors can be copied safely with ``.clone``.
+##
+## Tensors on CPU and on Cuda
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~
+##
+## | Tensors and CudaTensors do not have the same features implemented yet.
+## | Also, CudaTensors can only hold float32 or float64 values, while CPU
+##   Tensors can hold integers, strings, booleans or any custom object.
+##
+## Here is a comparative table; note that this feature set is developing
+## very rapidly.
+##
+## --------------------------------------------------- --------- -------------------------------------------------------------
+## Action                                              Tensor    CudaTensor
+## --------------------------------------------------- --------- -------------------------------------------------------------
+## Accessing tensor properties                         [x]       [x]
+## Tensor creation                                     [x]       by converting a cpu Tensor
+## Accessing or modifying a single value               [x]       []
+## Iterating on a Tensor                               [x]       []
+## Slicing a Tensor                                    [x]       [x]
+## Slice mutation ``a[1,_] = 10``                      [x]       []
+## Comparison ``==``                                   [x]       Coming soon
+## Element-wise basic operations                       [x]       [x]
+## Universal functions                                 [x]       [x]
+## Automatically broadcasted operations                [x]       Coming soon
+## Matrix-Matrix and Matrix-Vector multiplication      [x]       [x] Note: sliced CudaTensors must explicitly be made contiguous
+## Displaying a tensor                                 [x]       [x]
+## Higher-order functions (map, apply, reduce, fold)   [x]       Apply, but only for internal use
+## Transposing                                         [x]       [x]
+## Converting to contiguous                            [x]       [x]
+## Reshaping                                           [x]       []
+## Explicit broadcast                                  [x]       Coming soon
+## Permuting dimensions                                [x]       Coming soon
+## Concatenating along existing dimensions             [x]       []
+## Squeezing singleton dimensions                      [x]       Coming soon
+## Slicing + squeezing in one operation                [x]       Coming soon
+## --------------------------------------------------- --------- -------------------------------------------------------------
 ##
 ## Tensor properties
 ## ~~~~~~~~~~~~~~~~~
 ##
-## Properties are read-only.
-##
 ## Tensors have the following properties:
-## - ``rank``:
-## - 0 for scalar (unfortunately cannot be stored)
-## - 1 for vector
-## - 2 for matrices
-## - N for N-dimension array
-## - ``shape``: a sequence of the tensor dimensions along each axis.
+##
+## - ``rank``:
+##
+##   - 0 for scalars (unfortunately they cannot be stored)
+##   - 1 for vectors
+##   - 2 for matrices
+##   - N for N-dimensional arrays
+##
+## - ``shape``: a sequence of the tensor's dimensions along each axis.
 ##
 ## The next properties are technical and listed for completeness:
-## - ``strides``: a sequence of numbers of steps to get the next item along a dimension.
-## - ``offset``: the first element of the tensor
+##
+## - ``strides``: a sequence of the number of steps needed to reach the
+##   next item along each dimension.
+## - ``offset``: the position of the tensor's first element in the
+##   underlying storage.
 ##
 ## .. code:: nim
 ##
 ##     import arraymancer
 ##
-##     let d = [[1, 2, 3], [4, 5, 6]].toTensor(Cpu)
+##     let d = [[1, 2, 3], [4, 5, 6]].toTensor()
 ##
 ##     echo d
 ##     # Tensor of shape 2x3 of type "int" on backend "Cpu"
@@ -99,16 +221,14 @@
 ##     echo d.strides # @[3, 1] => Next row is 3 elements away in memory while next column is 1 element away.
 ##     echo d.offset # 0
 ##
-##
-##
 ## Tensor creation
 ## ~~~~~~~~~~~~~~~
 ##
 ## The canonical way to initialize a tensor is by converting a seq of seqs
 ## of ... or an array of arrays of ... into a tensor using ``toTensor``.
 ##
-## ``toTensor`` takes the backend (CPU-only currently) as a parameter and
-## supports deep nested sequences and arrays.
+## ``toTensor`` supports deeply nested sequences and arrays, even sequences
+## of arrays of sequences.
 ##
 ## .. code:: nim
 ##
@@ -131,14 +251,13 @@
 ##       [1111,2222,3333],
 ##       [4444,5555,6666]
 ##     ]
-##     ].toTensor(Cpu)
+##     ].toTensor()
 ##     echo c
 ##
 ##     # Tensor of shape 4x2x3 of type "int" on backend "Cpu"
 ##     # | 1     2      3 | 11    22     33 | 111   222   333 | 1111  2222  3333|
 ##     # | 4     5      6 | 44    55     66 | 444   555   666 | 4444  5555  6666|
 ##
-##
 ## The ``newTensor`` procedure can be used to initialize a tensor of a
 ## specific shape with a default value (0 for numbers, false for bool ...).
 ##
@@ -150,26 +269,26 @@
 ##
 ## .. code:: nim
 ##
-##     let e = newTensor([2, 3], bool, Cpu)
+##     let e = newTensor([2, 3], bool)
 ##     # Tensor of shape 2x3 of type "bool" on backend "Cpu"
 ##     # |false false false|
 ##     # |false false false|
 ##
-##     let f = zeros([4, 3], float, Cpu)
+##     let f = zeros([4, 3], float)
 ##     # Tensor of shape 4x3 of type "float" on backend "Cpu"
 ##     # |0.0 0.0 0.0|
 ##     # |0.0 0.0 0.0|
 ##     # |0.0 0.0 0.0|
 ##     # |0.0 0.0 0.0|
 ##
-##     let g = ones([4, 3], float, Cpu)
+##     let g = ones([4, 3], float)
 ##     # Tensor of shape 4x3 of type "float" on backend "Cpu"
 ##     # |1.0 1.0 1.0|
 ##     # |1.0 1.0 1.0|
 ##     # |1.0 1.0 1.0|
 ##     # |1.0 1.0 1.0|
 ##
-##     let tmp = [[1,2],[3,4]].toTensor(Cpu)
+##     let tmp = [[1,2],[3,4]].toTensor()
 ##     let h = tmp.zeros_like
 ##     # Tensor of shape 2x2 of type "int" on backend "Cpu"
 ##     # |0 0|
@@ -180,8 +299,6 @@
 ##     # |1 1|
 ##     # |1 1|
 ##
-##
-##
 ## Accessing and modifying a value
 ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ##
@@ -189,7 +306,7 @@
 ##
 ## .. code:: nim
 ##
-##     var a = toSeq(1..24).toTensor(Cpu).reshape(2,3,4)
+##     var a = toSeq(1..24).toTensor().reshape(2,3,4)
 ##
 ##     echo a
 ##     # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
@@ -207,18 +324,15 @@
 ##     # |  5     6      7      8 |   17    999     19     20|
 ##     # |  9    10     11     12 |   21     22     23     24|
 ##
-##
-##
 ## Copying
 ## ~~~~~~~
 ##
 ## Tensor copies are deep by default (all the data is copied). In the majority
 ## of cases the Nim compiler will detect and avoid useless copies.
 ##
-## ``shallowCopy`` can be used on a var Tensor to enforce shallow copying
-## (data is shared between the 2 variables).
-##
-##
+## ``unsafeView`` can be used on a Tensor to enforce shallow copying (the
+## data is shared between the two variables). Most shape manipulation procs
+## also have an ``unsafe`` version.
 ##
 ## Slicing
 ## ~~~~~~~
@@ -247,7 +361,7 @@
 ##       for j, yy in y:
 ##         vandermonde[i].add(xx^yy)
 ##
-##     let foo = vandermonde.toTensor(Cpu)
+##     let foo = vandermonde.toTensor()
 ##
 ##     echo foo
 ##
@@ -293,8 +407,6 @@
 ##     # |2      4       8       16      32|
 ##     # |1      1       1       1       1|
 ##
-##
-##
 ## Slice mutations
 ## ~~~~~~~~~~~~~~~
 ##
@@ -321,7 +433,7 @@
 ##       for j, yy in y:
 ##         vandermonde[i].add(xx^yy)
 ##
-##     var foo = vandermonde.toTensor(Cpu)
+##     var foo = vandermonde.toTensor()
 ##
 ##     echo foo
 ##
@@ -365,8 +477,6 @@
 ##     # |4      16      3125    625     125|
 ##     # |5      25      1024    256     64|
 ##
-##
-##
 ## Shapeshifting
 ## ~~~~~~~~~~~~~
 ##
@@ -375,7 +485,6 @@
 ##
 ## The ``transpose`` function will reverse the dimensions of a tensor.
 ##
-##
 ## Reshaping
 ## ^^^^^^^^^
 ##
@@ -386,42 +495,18 @@
 ##
 ## .. code:: nim
 ##
-##     let a = toSeq(1..24).toTensor(Cpu).reshape(2,3,4)
+##     let a = toSeq(1..24).toTensor().reshape(2,3,4)
 ##
 ##     # Tensor of shape 2x3x4 of type "int" on backend "Cpu"
 ##     #  |  1     2      3      4 |   13     14     15     16|
 ##     #  |  5     6      7      8 |   17     18     19     20|
 ##     #  |  9    10     11     12 |   21     22     23     24|
 ##
-##
-## Broadcasting
-## ^^^^^^^^^^^^
-##
-## Arraymancer supports explicit broadcasting with ``broadcast`` and its
-## alias ``bc``. A future aim is to use ``bc`` as an indicator to
-## automatically tune the shape of both tensors to make them compatible. To
-## avoid silent bugs, broadcasting is not implicit like for Numpy.
-##
-## .. image:: https://scipy.github.io/old-wiki/pages/image004de9e.gif
-##
-## .. code:: nim
-##
-##     let j = [0, 10, 20, 30].toTensor(Cpu).reshape(4,1)
-##     let k = [0, 1, 2].toTensor(Cpu).reshape(1,3)
-##
-##     echo j.bc([4,3]) + k.bc([4,3])
-##     # Tensor of shape 4x3 of type "int" on backend "Cpu"
-##     # |0 1 2|
-##     # |10 11 12|
-##     # |20 21 22|
-##     # |30 31 32|
-##
-##
 ## Permuting - Reordering dimension
 ## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ##
-## The ``permute`` proc can be used to reorder dimensions. Input is a
-## tensor and the new dimension order
+## | The ``permute`` proc can be used to reorder dimensions.
+## | Its input is a tensor and the new dimension order.
 ##
 ## .. code:: nim
 ##
@@ -441,7 +526,6 @@
 ##     # |  3     7     11 |   15     19     23|
 ##     # |  4     8     12 |   16     20     24|
 ##
-##
 ## Concatenation
 ## ^^^^^^^^^^^^^
 ##
@@ -475,7 +559,6 @@
 ##     # |1      2       5       6       11      12      13|
 ##     # |3      4       7       8       14      15      16|
 ##
-##
 ## Universal functions
 ## ~~~~~~~~~~~~~~~~~~~
 ##
@@ -483,23 +566,23 @@
 ## similar to Numpy's universal functions.
 ##
 ## Three procedures exist: ``makeUniversal``, ``makeUniversalLocal`` and
-## ``fmap``.
+## ``map``.
 ##
-## ``makeUniversal`` create a a function that applies to each element of a
-## tensor from any unary function. Most functions from the ``math`` module
-## have been generalized to tensors with ``makeUniversal(sin)``.
-## Furthermore those universal functions are exported and available for
-## import.
+## | ``makeUniversal`` creates a function that applies to each element of
+##   a tensor, from any unary function. Most functions from the ``math``
+##   module have been generalized to tensors with ``makeUniversal(sin)``.
+## | Furthermore, those universal functions are exported and available for
+##   import.
 ##
 ## ``makeUniversalLocal`` does not export the universal functions.
 ##
-## ``fmap`` is more generic and map any function to all element of a
-## tensor.
-## ``fmap`` works even if the function changes the type of the
-## tensor's elements.
+## ``map`` is more generic and maps any function onto all elements of a
+## tensor. ``map`` works even if the function changes the type of the
+## tensor's elements.
 ##
 ## .. code:: nim
 ##
-##     echo foo.fmap(x => x.isPowerOfTwo) # map a function (`=>` comes from the future module )
+##     echo foo.map(x => x.isPowerOfTwo) # map a function (`=>` comes from the future module)
 ##
 ##     # Tensor of shape 5x5 of type "bool" on backend "Cpu"
 ##     # |true true true true true|
@@ -508,7 +591,7 @@
 ##     # |true true true true true|
 ##     # |false false false false false|
 ##
-##     let foo_float = foo.fmap(x => x.float)
+##     let foo_float = foo.map(x => x.float)
 ##     echo ln foo_float # universal function (convert first to float for ln)
 ##
 ##     # Tensor of shape 5x5 of type "float" on backend "Cpu"
@@ -518,7 +601,6 @@
 ##     # |1.386294361119891 2.772588722239781 4.158883083359671 5.545177444479562 6.931471805599453|
 ##     # |1.6094379124341 3.218875824868201 4.828313737302302 6.437751649736401 8.047189562170502|
 ##
-##
 ## Type conversion
 ## ~~~~~~~~~~~~~~~
 ##
@@ -528,14 +610,13 @@
 ##
 ##     let foo_float = foo.astype(float)
 ##
-##
 ## Matrix and vector operations
 ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ##
 ## The following linear algebra operations are supported for tensors of
 ## rank 1 (vectors) and 2 (matrices):
 ##
-## - dot product (Vector to Vector) using ``.*``
+## - dot product (vector-vector) using ``dot``
 ## - addition and subtraction (any rank) using ``+`` and ``-``
 ## - in-place addition and subtraction (any rank) using ``+=`` and ``-=``
 ## - multiplication or division by a scalar using ``*`` and ``/``
@@ -545,8 +626,9 @@
 ##
 ## Note: Matrix operations for floats are accelerated using BLAS (Intel
 ## MKL, OpenBLAS, Apple Accelerate ...). Unfortunately there is no
-## acceleration routine for integers. I wrote a custom routine for
-## matrix-matrix but matrix-vector is not implemented.
+## acceleration routine for integers. Integer matrix-matrix and
+## matrix-vector multiplications are implemented via semi-optimized
+## routines (not naive loops, but they don't leverage CPU-specific features).
 ##
 ## .. code:: nim
 ##
@@ -558,6 +640,32 @@
 ##     # |6372.0    30340.0   146244.0   710980.0   3478212.0|
 ##     # |18555.0   89355.0   434205.0   2123655.0  10436805.0|
 ##
+## Broadcasting
+## ~~~~~~~~~~~~
+##
+## | Arraymancer supports explicit broadcasting with ``broadcast`` and its
+##   alias ``bc``.
+## | It also supports implicit broadcasting with operations beginning with
+##   a dot:
+##
+## .. code:: nim
+##
+##     let j = [0, 10, 20, 30].toTensor().reshape(4,1)
+##     let k = [0, 1, 2].toTensor().reshape(1,3)
+##
+##     echo j .+ k
+##     # Tensor of shape 4x3 of type "int" on backend "Cpu"
+##     # |0 1 2|
+##     # |10 11 12|
+##     # |20 21 22|
+##     # |30 31 32|
+##
+## - ``.+``, ``.-``: broadcasted element-wise addition and subtraction
+## - ``.*``: broadcasted element-wise multiplication (also called the
+##   Hadamard product)
+## - ``./``: broadcasted element-wise division or integer division
+## - ``.+=``, ``.-=``, ``.*=``, ``./=``: in-place versions. Only the right
+##   operand is broadcastable.
 ##
 ## Iterators
 ## ~~~~~~~~~
@@ -606,23 +714,91 @@
 ## For convenience, a ``values`` closure iterator is available for iterator
 ## chaining. ``values`` is equivalent to ``items``.
 ##
-## A ``mitems`` iterator is available to directly mutate elements while
-## iterating. An ``axis`` iterator is available to iterate along an axis.
+## | A ``mitems`` iterator is available to directly mutate elements while
+##   iterating.
+## | An ``axis`` iterator is available to iterate along an axis.
 ##
+## Higher-order functions (Map, Reduce, Fold)
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ##
-## Aggregate and Statistics
-## ~~~~~~~~~~~~~~~~~~~~~~~~
+## Arraymancer supports efficient higher-order functions on the whole
+## tensor or along an axis.
 ##
-## ``sum`` and ``mean`` functions are avalaible to compute the sum and mean
-## of a tensor. ``sum`` and ``mean`` can also be computed along an axis
-## with the ``axis`` argument.
+## ``map``, ``apply``, ``map2``, ``apply2``
+## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ##
-## Generic aggregates on the whole tensor or along an axis can be computed
-## with ``agg`` and ``agg_inplace`` functions.
+## .. code:: nim
+##
+##     a.map(x => x+1)
+##
+## or
+##
+## .. code:: nim
 ##
+##     proc plusone[T](x: T): T =
+##       x + 1
+##     a.map(plusone) # Map the function plusone
 ##
+## Note: for basic operations you can use implicit broadcasting instead:
+## ``a .+ 1``
 ##
+## ``apply`` is the same as ``map``, but in-place.
 ##
-## Arraymancer - Technical API
-## ===========================
+## ``map2`` and ``apply2`` take two input tensors and, respectively, return
+## a new tensor or modify the first in-place.
+##
+## .. code:: nim
+##
+##     proc `**`[T](x, y: T): T = # We create a new power `**` function that works on 2 scalars
+##       pow(x, y)
+##     a.map2(`**`, b)
+##     # Or
+##     map2(a, `**`, b)
+##
+## ``reduce`` on the whole Tensor or along an axis
+## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+##
+## ``reduce`` applies a function like ``+`` or ``max`` to the whole Tensor[T],
+## returning a single value T.
+##
+## For example:
+##
+## - Reducing with ``+`` returns the sum of all elements of the Tensor.
+## - Reducing with ``max`` returns the biggest element of the Tensor.
+##
+## ``reduce`` can also be applied along an axis, for example to sum along the
+## rows of a Tensor.
+##
+## ``fold`` on the whole Tensor or along an axis
+## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+##
+## ``fold`` is a generalization of ``reduce``: its starting value is supplied
+## explicitly instead of being the first element of the Tensor.
+##
+## It can do anything that ``reduce`` can, but also has other tricks because
+## it is not constrained by the Tensor's element type or starting value.
+##
+## For example:
+##
+## - Folding with ``was_a_odd_and_what_about_b`` and a starting value of
+##   ``true`` returns ``true`` if all elements are odd, and ``false``
+##   otherwise.
+##
+## Just in case, here is that procedure:
+##
+## .. code:: nim
+##
+##     proc was_a_odd_and_what_about_b[T: SomeInteger](a: bool, b: T): bool =
+##       return a and (b mod 2 == 1) # a is the result of previous computations, b is the new integer to check.
+##
+## Aggregate and Statistics
+## ~~~~~~~~~~~~~~~~~~~~~~~~
+##
+## | ``sum`` and ``mean`` functions are available to compute the sum and
+##   mean of a tensor.
+## | ``sum`` and ``mean`` can also be computed along an axis with the
+##   ``axis`` argument.
+##
+## Generic aggregates on the whole tensor or along an axis can be computed
+## with the ``agg`` and ``agg_inplace`` functions.
##
\ No newline at end of file