Tiny Big Float library
----------------------
Copyright (c) 2017-2020 Fabrice Bellard
LibBF is a small library to handle arbitrary precision binary or
decimal floating point numbers. Its compiled size is about 90 KB of
x86 code and it has no dependency on other libraries. It is neither
the fastest nor the smallest library, but it tries to stay simple
while using asymptotically optimal algorithms. The basic arithmetic
operations have a near linear running time.
The TinyPI example computes billions of digits of Pi using the
Chudnovsky formula.
1) Features
-----------
- Arbitrary precision floating point numbers in base 2 using the IEEE
754 semantics (including subnormal numbers, infinities and
NaN).
- All operations are exactly rounded using the 5 IEEE 754 rounding
modes (round to nearest with ties to even or away from zero, round
toward zero, round toward -/+ infinity). An additional
non-deterministic faithful rounding mode is available when a lower or
more predictable running time is required.
- Stateless API (each function takes the rounding mode and the
mantissa and exponent precisions in bits as input and returns the
IEEE status flags; a short sketch follows this list).
- The basic arithmetic operations (addition, subtraction,
multiplication, division, square root) have a near linear running
time.
- Multiplication using a SIMD optimized Number Theoretic Transform.
- Exactly rounded floating point input and output in any base between
2 and 36 with near linear running time. Floating point output can
select the smallest number of digits giving the required precision.
- Transcendental functions are supported (exp, log, pow, sin, cos, tan,
asin, acos, atan, atan2).
- Operations on arbitrarily large integers are supported by using a
special "infinite" precision. Integer division with remainder and
logical operations (assuming two's complement binary representation)
are implemented.
- Arbitrary precision floating point numbers in base 10 corresponding
to the IEEE 754-2008 semantics, with the limitation that the mantissa
is always normalized. The basic arithmetic operations, output and
input are supported with a quadratic running time.
- Easy to embed: only a few C files need to be copied, the memory
allocator can be redefined, and memory allocation failures are tested.
- MIT license.
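
The following is a minimal sketch of the stateless API mentioned
above. It is illustrative only: the function and constant names
(bf_context_init, bf_init, bf_div, bf_ftoa, BF_RNDN, ...) are those
found in libbf.h in recent releases and should be checked against the
header you actually build, and the trivial realloc wrapper is an
assumption of this example, not part of the library.

    #include <stdio.h>
    #include <stdlib.h>
    #include "libbf.h"

    /* allocator callback with the signature expected by bf_context_init() */
    static void *my_bf_realloc(void *opaque, void *ptr, size_t size)
    {
        (void)opaque;
        if (size == 0) {
            free(ptr);
            return NULL;
        }
        return realloc(ptr, size);
    }

    int main(void)
    {
        bf_context_t ctx;
        bf_t a, b, r;
        size_t len;
        char *str;
        int status;

        bf_context_init(&ctx, my_bf_realloc, NULL);
        bf_init(&ctx, &a);
        bf_init(&ctx, &b);
        bf_init(&ctx, &r);

        bf_set_si(&a, 1);
        bf_set_si(&b, 3);

        /* r = a / b, rounded to nearest with a 113 bit mantissa; the IEEE
           status flags are returned by the call instead of being stored
           in a global state */
        status = bf_div(&r, &a, &b, 113, BF_RNDN);
        if (status & BF_ST_INEXACT)
            printf("the result is inexact\n");

        /* decimal output using the smallest number of digits that
           preserves the 113 bit precision */
        str = bf_ftoa(&len, &r, 10, 113, BF_RNDN | BF_FTOA_FORMAT_FREE_MIN);
        printf("1/3 ~= %s\n", str);
        bf_free(&ctx, str);

        bf_delete(&a);
        bf_delete(&b);
        bf_delete(&r);
        bf_context_end(&ctx);
        return 0;
    }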
2) Compilation
--------------
Edit the top of the Makefile to select the build options. By default,
the MPFR library is used to compile the test tools (bftest and
bfbench) but it is not needed to build libbf. The included SoftFP code
(softfp* files) is only used by the bftest test tool.
TinyPI example: the "tinypi" executable uses the portable code. The
"tinypi-avx2" executable uses the AVX2 implementation. An x86 CPU of
at least the Intel Haswell generation is necessary for AVX2.
3) Design principles
--------------------
- Base 2 and IEEE 754 semantics were chosen so that it is possible to
get good performance and to compare the results with other libraries
or hardware implementations. Moreover, base 2 arbitrary precision is
easier to analyse and implement.
- The support of subnormal numbers and of a configurable number of
exponent bits allows the exact emulation of IEEE 754 floating point
hardware.
- The stateless API ensures that there is no global state to save and
restore between operations. The rounding mode, subnormal flag and
number of exponent bits are OR'ed into a single "flags" parameter to
limit the verbosity of the API. The number of exponent bits 'n' is
specified as '(M-n)', where M is the maximum number of exponent bits,
so that '0' always selects the maximum number of exponent bits (a
sketch follows this list).
- All the IEEE 754 status flags are returned by each operation. The
user can easily OR them together when necessary.
- Unlike other libraries (such as MPFR [2]), the numbers have no
attached precision. The general rule is that each operation is
internally computed with infinite precision and then rounded with
the precision and rounding mode specified for the operation.
- In many computations it is necessary to use arbitrarily large
integers. LibBF supports them without adding another number type by
providing a special "infinite" precision. There is of course a small
overhead because they are manipulated as floating point numbers, but
there is no cost to convert between floating point numbers and
integers.
- The faithful rounding mode (i.e. the result is rounded toward - or
+infinity non-deterministically) is supported for all operations. It
usually gives a faster and more predictable running time. The
transcendental functions, the inverse and the inverse square root are
internally implemented with faithful rounding. When a non-faithful
rounding is requested by the user, the Ziv rounding algorithm is
used.
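
As an illustration of the flags encoding and of the "infinite"
precision, here is a short sketch. The constant and helper names
(BF_RNDN, BF_RNDZ, BF_FLAG_SUBNORMAL, bf_set_exp_bits, BF_PREC_INF)
are those found in libbf.h and should be verified against your copy;
the wrapper functions are made up for the example and assume bf_t
values already initialized on a context as in the previous sketch.

    #include "libbf.h"

    /* Emulate IEEE 754 binary64: 53 bit mantissa, 11 exponent bits and
       subnormal support.  bf_set_exp_bits(11) encodes the exponent size
       as (M - 11) inside the flags word, so that 0 keeps the maximum
       number of exponent bits. */
    static int mul_as_binary64(bf_t *r, const bf_t *a, const bf_t *b)
    {
        bf_flags_t flags = BF_RNDN | BF_FLAG_SUBNORMAL | bf_set_exp_bits(11);
        return bf_mul(r, a, b, 53, flags);
    }

    /* Exact big integer product: the special "infinite" precision keeps
       all the bits of the result, so no rounding ever happens. */
    static int mul_exact_integer(bf_t *r, const bf_t *a, const bf_t *b)
    {
        return bf_mul(r, a, b, BF_PREC_INF, BF_RNDZ);
    }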
4) Implementation notes
-----------------------
- The code was tested on a 64 bit x86 CPU. It should be portable to
other CPUs. The portable version handles numbers with up to 4*10^16
digits. The AVX2 version handles numbers with up to 8*10^12 digits.
- 32 bits: the code compiles on 32 bit architectures but it is not
designed to be efficient or scalable in this case. The size of the
numbers is limited to about 10 million digits.
- The Number Theoretic Transform is not the fastest algorithm for
small to medium numbers (i.e. a few million digits), but it gets
better as the size of the numbers grows. There are no round-off
errors as with the Fast Fourier Transform, the memory usage is much
smaller and it is potentially easier to parallelize. This code
contains an original SIMD (AVX2 on x86) implementation using 64 bit
floating point numbers. It relies on the fact that the fused
multiply-add (FMA) operation gives access to the full precision of
the product of two 64 bit floating point numbers (a small sketch of
this trick follows the list). The portable code relies on the fact
that the C compiler supports a double word integer type (i.e. 128 bit
integers on a 64 bit host). The modulo operations were replaced with
multiplications, which are usually faster.
- Base conversion: the algorithm is not the fastest one but it is
simple and still gives a near linear running time.
- This library reuses some ideas from TachusPI
(http://bellard.org/pi/pi2700e9/tpi.html). LibBF is about 4 times
slower at computing Pi but is much smaller and simpler.
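
The FMA property mentioned above can be demonstrated outside of LibBF
with a few lines of standard C. This is not LibBF code, only a
standalone sketch of the two underlying techniques: fma(a, b, -a*b)
recovers the rounding error of a double product exactly (barring
overflow), and a compiler-provided 128 bit integer type gives the
double word product used by the portable path.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Error-free product: hi + lo == a * b exactly, because fma()
       rounds only once. */
    static void two_product(double a, double b, double *hi, double *lo)
    {
        *hi = a * b;
        *lo = fma(a, b, -*hi);
    }

    /* Portable analogue: double word integer product using the 128 bit
       integer type provided by GCC/Clang on 64 bit targets. */
    static void umul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
    {
        unsigned __int128 p = (unsigned __int128)a * b;
        *hi = (uint64_t)(p >> 64);
        *lo = (uint64_t)p;
    }

    int main(void)
    {
        double hi, lo;
        uint64_t h, l;

        two_product(1.0 + 0x1p-30, 1.0 + 0x1p-31, &hi, &lo);
        printf("hi=%.17g lo=%.17g\n", hi, lo);

        umul64(0x0123456789abcdefULL, 0xfedcba9876543210ULL, &h, &l);
        printf("hi=%016llx lo=%016llx\n",
               (unsigned long long)h, (unsigned long long)l);
        return 0;
    }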
5) Known limitations
--------------------
- In some operations (such as the transcendental ones), there is no
rigorous proof of the rounding error. We expect to improve this by
reusing ideas from the MPFR algorithms. Some unlikely
overflow/underflow cases are also not handled in exp or pow.
- The transcendental operations are not speed optimized and do not use
an asymptotically optimal algorithm (the running time is in
O(n^(1/2)*M(n)), where M(n) is the time to multiply two n bit
numbers). A possible solution would be to implement a binary
splitting algorithm for exp and sin/cos (see [1]) and to use a
Newton-based inversion to get log and atan.
- Memory allocation errors are not always correctly reported for the
transcendental operations.
6) References
-------------
[1] Modern Computer Arithmetic, Richard Brent and Paul Zimmermann,
Cambridge University Press, 2010
(https://members.loria.fr/PZimmermann/mca/pub226.html).
[2] The GNU MPFR Library (http://www.mpfr.org/)