v1.1.0

@dmed256 released this 10 Sep 03:37
c8a5876

🔥 Exciting News

Fortran API

I'm super excited to announce the Fortran API! This was single-handedly designed and built by @awehrfritz, so huge thanks!! The API is not finalized, but it is unlikely to change much since its design matches our other language APIs.

For more information, the initial PR can be found here: #341

Collaboration

Join our Slack workspace!

For much of OCCA's development, most of the work was done by a very small group of people. Over the last few years, the project has grown from a research project into one used by a few organizations.

During this release, we added CMake support. While it doesn't directly add any development features, it makes the OCCA library usable by a wider audience, which some might say is even more impactful than adding features. What makes this even more exciting is how many unrelated collaborators took part in this work!

Lots of PRs that made this happen: #310, #313, #319, #323, #329, #344, #345, #357

Many thanks to

⚠️ Breaking Changes

  • [de598e6] OCCA now compiles with C++11. C++ projects will need to add the -std=c++11 flag (for most compilers) when compiling.

  • [f4fea62] Renamed occa::hash_t methods

    • hash_t::toString() → hash_t::getString()
    • hash_t::toFullString() → hash_t::getFullString()
  • [#322] Updates occaFree to take in the argument by reference rather than value

    occaFree(value) → occaFree(&value)
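
For the C++11 requirement above, a project can fail fast with a clear message when the flag is missing. The following is a minimal sketch using only the standard `__cplusplus` macro; the helper name `builtWithCpp11` is hypothetical and not part of OCCA.

```cpp
// Stop the build early with a readable message if the standard is too old.
// Note: MSVC reports an old __cplusplus value unless /Zc:__cplusplus is set.
#if __cplusplus < 201103L
#  error "This project uses OCCA v1.1.0, which needs C++11: add -std=c++11 (or newer)"
#endif

// Hypothetical helper: reports at run time whether the translation unit
// was compiled as C++11 or newer.
bool builtWithCpp11() {
  return __cplusplus >= 201103L;
}
```

Placing the `#if` check in a header that every source file includes keeps the error close to the cause, rather than surfacing later as unrelated syntax errors.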

🃏Experimental

  • [#341] The Fortran API

  • [08b3a68] Adds the OCCA_JIT and OCCA_JIT_WITH_SCOPE macros. Examples are available for both C++ and C.

    For example:

      OCCA_JIT(
        (entries, a, b, ab),
        (
          for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
            ab[i] = 100 * (a[i] + b[i]);
          }
        )
      );
  • [0a77696] Adds okl-mode.el for editing OKL kernels in Emacs 🎉
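
To make the OCCA_JIT example above easier to read, here is a plain C++ sketch of the loop structure that `@tile(16, @outer, @inner)` is understood to describe: an outer loop over tiles of 16 entries and an inner loop over the entries of each tile. This reflects a general reading of OKL's tiling attribute, not the exact code OCCA generates; the function names `addScaled` and `runAddScaled` are hypothetical.

```cpp
#include <vector>

// Serial C++ equivalent of the tiled loop (assumed expansion, for illustration).
void addScaled(int entries, const float *a, const float *b, float *ab) {
  const int tileSize = 16;
  for (int tile = 0; tile < entries; tile += tileSize) {  // plays the role of @outer
    for (int i = tile; i < tile + tileSize; ++i) {        // plays the role of @inner
      if (i < entries) {  // guard the last, possibly partial, tile
        ab[i] = 100 * (a[i] + b[i]);
      }
    }
  }
}

// Convenience wrapper so the sketch can be called with std::vector inputs.
std::vector<float> runAddScaled(const std::vector<float> &a,
                                const std::vector<float> &b) {
  std::vector<float> ab(a.size(), 0.0f);
  addScaled(static_cast<int>(a.size()), a.data(), b.data(), ab.data());
  return ab;
}
```

The bounds check inside the inner loop matters whenever `entries` is not a multiple of the tile size, since the last tile would otherwise run past the end of the arrays.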

⭐️ Features

  • [f813c34] Adds templated malloc for easier use while keeping backwards compatibility

    Original malloc

    occa::memory mem = occa::malloc(10 * sizeof(float), src);

    Initial dtype malloc

    occa::memory mem = occa::malloc(10, occa::dtype::float_, src);

    New malloc

    occa::memory mem = occa::malloc<float>(10, src);
  • [92ffb58] Adds templated umalloc for easier use while keeping backwards compatibility

    Before

    float *a = (float*) occa::umalloc(10, occa::dtype::float_, src);
    void *b = occa::umalloc(10 * sizeof(float), src);

    After

    float *a = occa::umalloc<float>(10, src);
    void *b = occa::umalloc(10 * sizeof(float), src);
  • [c61d636] Adds templated ptr for easier use. The return type defaults to void* for backwards compatibility.

    Before

    occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
    float *ptr = (float*) mem.ptr();

    After

    occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
    float *ptr = mem.ptr<float>();
  • [c61d636] Adds use_host_pointer to memory props to auto-wrap source pointers during malloc calls

    float *hostPtr = new float[10];
    occa::memory mem = occa::malloc<float>(10, hostPtr, "use_host_pointer: true");
    mem.ptr<float>() == hostPtr; // true: the memory wraps hostPtr
  • Adds polyfills to test compilation of locally unsupported modes

  • [284aff8] Adds method to get the kernel hash

    • C++: occa::kernel::hash(), which returns an occa::hash_t object
    • C: occaKernelGetHash and occaKernelGetFullHash, which return the hash as a const char*
  • [f2f21a3] Adds Metal backend for GPGPU on macOS

    • Requires macOS 10.14 (Mojave) or later
    • Requires Xcode 10.2.1 or later
    • Metal does not support double or long types
    • Known issue: global typedefs can fail due to missing address space qualifiers
  • [386bc4c] Adds occa translate --launcher to get the host code needed to launch the device kernels (CUDA, HIP, OpenCL, Metal modes)

  • [#246] Adds the @directive preprocessor attribute to add directives inside macros, such as OCCA_JIT

    @directive("#pragma ivdep") → #pragma ivdep
  • [#265] Adds OCCA_CONFIG config file to set defaults. There is a config.defaults.json file explaining the properties that can be set, including mode-specific properties.

  • [#266] Allows HIP to compile CUDA kernels (Thanks @noelchalmers!)

  • [#270] Adds occa::null for passing a NULL equivalent to occa::kernels (occaNull in C)

  • [#284] Adds OCCA_LDFLAGS along with kernel/compiler_linker_flags (Thanks @stgeke!)

  • [#308] Adds OCCA_SHARED_FLAGS along with kernel/compiler_shared_flags

  • [#308] Adds support for building native C kernels: disable OKL by setting okl/enabled to false, and set kernel/compiler_language to C (it defaults to C++) (Thanks @amikstcyr!)

  • [#346] Supports #include of standard C and C++ headers in OKL kernels. Note this will print warnings since adding these headers is not a portable solution across supported backends.

  • [#347] Adds some standard defines on OKL kernels so users can check whether the source is being processed as an OKL kernel. This is useful when reusing source code for both OCCA and non-OCCA kernels.

  • [#349] Keeps some comments around after applying OKL transformation for cleaner generated kernels.

  • [#354] Adds OKL_KERNEL_HASH define to help debug which kernel is currently being run (Tip: printf and std::cout are available in Serial and OpenMP modes!)

  • [#349][#355][#358][#364] Keeps comments around when transpiling kernels
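
The templated ptr entry above notes that the return type defaults to void* for backwards compatibility. One way such an accessor can be written is with a default template argument, a C++11 feature for function templates. The sketch below uses a hypothetical `Buffer` class as a stand-in; it is not occa::memory and makes no claim about how OCCA implements it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a memory handle (illustration only, NOT occa::memory).
class Buffer {
 public:
  explicit Buffer(std::size_t bytes) : data_(std::malloc(bytes)), bytes_(bytes) {}
  ~Buffer() { std::free(data_); }

  // Non-copyable so the allocation is freed exactly once.
  Buffer(const Buffer &) = delete;
  Buffer &operator=(const Buffer &) = delete;

  // Default template argument: ptr() keeps returning void*,
  // while ptr<float>() returns float* without a cast at the call site.
  template <class T = void>
  T *ptr() { return static_cast<T *>(data_); }

  std::size_t size() const { return bytes_; }

 private:
  void *data_;
  std::size_t bytes_;
};
```

With this shape, existing callers that write `(float*) buf.ptr()` keep compiling unchanged, and new callers can write `buf.ptr<float>()` instead.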

🐛 Bugs Fixed

  • [ebdb659] Updates to HIP backend (Thanks @noelchalmers!)
  • [ac117fb] Fixed caching bugs (Thanks Nigel Nunn!)
  • [5420005] Use .dylib instead of .so on macOS (Thanks @thilinarmtb!)
  • [ce4df26] Properly copy over artifacts when building with PREFIX (Thanks @thilinarmtb!)
  • [#243] Properly avoid overriding and duplicating compiler shared flags (Thanks @noelchalmers!)
  • [f23ce88] Avoids writing lockfile when checking compiler vendor
  • [3df3955] Properly fixed untyped umalloc in C
  • [4d5d5bc] Kernels from strings were badly generating the launcher kernel
  • [27a7420] OpenCL translation was converting the const qualifier of const pointer typedefs to __constant
  • [#261] Invalid read in json -> properties unsafe cast (Thanks for pointing it out @stgeke!)
  • [#265] Fixes object- and mode-specific properties not propagating
  • [86dead2] OpenCL timing was done backwards, resulting in negative times. (Thanks @tcew!)
  • [#293] Fixed some reference counting issues with the kernelBuilder
  • [#400] CUDA context was not being set in a few places (Thanks @amikstcyr!)

🎉 Contributors