-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathNEWS
181 lines (125 loc) · 6.36 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
NEWS for the OpenCL package
0.2-11
o Do not issue kernel compiler warning if the build log
string consists solely of whitespaces. (#21)
0.2-10 2023-12-22
o Adjust header declaration for ocl_get_device_info_entry.
o Windows: for single-arch R fall back to $OCL/lib/OpenCL.lib
if present.
0.2-9 2023-11-29
o Clean up superfluous class comparisons.
0.2-8 2023-11-27
o Change the name of the package dynamic object to R_OpenCL
to avoid name conflict on Windows. (#6, #20, thanks to
Aaron Puchert)
0.2-7 2023-06-28
o Windows: accept OCLLIB instead of OCL64LIB for single-arch R.
Update INSTALL documentation to be more specific about the
defaults and installation (#17).
o added (non-exported) .oclDeviceInfoEntry() for diagnostics
which allows the retrieval of any device information entries.
o added support for dynamic __local memory buffers as arguments
to kernel via the clLocal() function -- see documentation for
details. (#18)
0.2-6 2023-04-13
o Update oclRun() example to explicitly cast in order to avoid
ambiguity errors in built-ins on macOS.
0.2-5 2022-12-25
o Use correct error()/warning() for build logs in compiled
programs. Output from the OpenCL compilation is now included
in the error/warning. (#13)
o Use more verbose, textual representation of OpenCL errors.
0.2-4 2022-03-17
o Added full support for subsetting and subassignment for clBuffers.
Note that only contiguous operations (e.g. x[5:8] = 0) can be performed
natively, any other index will result in the entire buffer being
emerged and the operation done in R (CPU).
o Added mode= parameter to as.clBuffer() to ensure specific type
of the buffer. This is ofen required when passing buffers to
kernel since the kernel code must match teh buffer type.
o Improved oclRun() example to explicitly use numeric buffer for
reasons explained above. Previously, passing integer input
would result in integer buffer which does not match the kernel
code. (#10)
0.2-3 2021-12-21
o added buffer memory tracking and automatic garbage collection triggers.
See oclMemLimits() help for details. (#8)
0.2-2 2021-07-24
o added native symbol registration
o fix issue with R API in R-devel
0.2-1 2020-03-03
o OpenCL context and command queue can be persisted, allowing to keep data
between calls. The context also remembers whether to default to single-
or double-precision for numeric vectors.
o Data can stay on the OpenCL device (GPU) between kernel calls. This is
extremely valuable when working with discrete GPUs connected over a
relatively slow PCIe connection.
o A single-precision data type is no longer required. The conversion takes
place when transferring the data to the OpenCL device. On the R side,
data remains in numeric vectors.
o Kernels are executed asynchronously and possibly out-of-order, if the
OpenCL implementation allows it. Synchronization need not to be done
manually and happens without the user knowing: OpenCL events
corresponding to a kernel execution are attached to the output buffer.
Following kernel executions having the buffer as input then wait for the
event, hence for the preceding kernel execution to finish. Likewise,
reads from buffers wait on the attached event as well.
o OpenCL device information is amended by maximum frequency. Also the list
of extensions is broken down to make it easier searchable.
o By default, we choose GPU devices. CPU devices usually don't make a lot
of sense. Also, if there are multiple GPU devices available - think of
a notebook with integrated and discrete GPU - we try to choose the
faster device.
o There are now several tests covering most of the functionality.
o Windows configuration has been simplified. On Windows, you
have to set the OCL environemnt variable to the root of the
OpenCL SDK. By default the CPPFLAGS and LIBS will be
constrcuted from that location depending on whether 32-bit or
64-bit binaries are produced by appending lib/x64/OpenCL.lib
or lib/x86/OpenCL.lib respecitvely. Includes are assumed to be
in the includes directory. All of the above can be overridden
by setting OCLINC to the necessary pre-processor flags and
OCL32LIB or OCL64LIB for 32-bit and 64-bit linker flags.
(Note that OCL must still be set even if you override all
flags)
0.1-4
o devices with very long extensions strings could cause error
on retrieval. Fixed with larger static buffer.
(Thanks to Valerio Aimale again)
o Improve error reporting by always including the OpenCL
error code
0.1-3 2012-05-25
o fix a bug causing device enumeration to use the default
device for device count regradless of the specified type.
This affects only systems with more than one type of device.
(Thanks to Valerio Aimale for reporting)
o added dim argument to oclRun() which allows multidimensional
indexing (up to 3d) in the kernel. The dimensions can be
obtained in the kernel via get_global_size() and the index
values with get_global_id(). Note that using index vectors
instead of multidimensional indexing may perform better
depending on the device. The default is to use single
dimension (dim=size) which is the same as previous versions of
OpenCL.
o add precision="best" in oclSimpleKernel which switches
automatically to double-precision if supported by the device
o kernels objects are now less cryptic - they implement
print(), names() and $ methods for access to their attributes.
0.1-2 2012-03-07
o add the support for asynchronous calls, i.e., execution
parallel to the CPU or multiple parallel GPU operations.
This is done by using x <- oclRun(..., wait=FALSE) to
dispatch the kernel and then oclResult(x) to collect the
results later.
o minor cleanup
0.1-1 2011-08-09
o improve memory management and clean up on error in oclRun()
o use CL_MEM_USE_HOST_PTR instead of clEnqueueWriteBuffer() for
better performance on large input vectors
o add support for native single precision representation
(see ?clFloat and native.result argument in oclRun())
o added INSTALL file with links to common OpenCL implementations
0.1-0 2011-08-08
o first public release
includes support for single and double precision computations
as well as simple kernels (one output vector, arbitrary input)