
Omega_h Usage


Notation

o:: precedes symbols from the Omega_h namespace

Omega_h Array Objects

The majority of code will either run directly inside, or be designed to be called from, an OMEGA_H_LAMBDA or SCS_LAMBDA. o::Write and o::Read array objects support random access on the device/GPU via the bracket ([ ]) operator. As the names suggest, o::HostWrite and o::HostRead support random access on the host.
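For example, a minimal sketch of device-side access (the array size and values are illustrative):

  o::Write<o::Real> w(10, 0.0);             // writable device array of 10 Reals
  o::parallel_for(w.size(), OMEGA_H_LAMBDA(const int& i) {
    w[i] = 2.0 * i;                         // random access on the device
  });
  o::Reals r(w);                            // o::Reals is o::Read<o::Real>: a read-only view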

File read/write is done on the host and uses o::HostWrite, which is similar to a C-style dynamic array created with a run-time size, but makes conversion to/from device data easier.
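A minimal sketch of this pattern (n and the file-reading step are placeholders, not part of the Omega_h API):

  o::HostWrite<o::Real> hostData(n);        // allocated on the host with a run-time size
  for (int i = 0; i < n; ++i)
    hostData[i] = 0.0;                      // fill on the host, e.g. from a file read
  o::Reals deviceData(hostData.write());    // HostWrite::write() copies the data to the device
  o::HostRead<o::Real> back(deviceData);    // copy back to the host, e.g. for file write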

Parallel For Loops

OMEGA_H_LAMBDA or SCS_LAMBDA can be used to access device data. If particle data (an SCS) is involved, SCS_LAMBDA is the correct choice. Note that the 'pid' passed to an SCS_LAMBDA is not the particle's sequential index or id; that id is stored separately as an SCS data member. When storing particle-specific data for intermediate use there are two options: 1) size the array by the SCS capacity, or 2) size the array by the total number of particles. With option (1) the filled array is valid only in the current iteration, specifically until the SCS is rebuilt. With option (2) the index must be the permanent particle id retrieved from the SCS data member. The disadvantage of option (1) is that the array may need to be very large to accommodate any 'pid', which potentially has no limit, or else the array must be re-created in each iteration, which appears to be inefficient on Summit. The disadvantage of option (2) is that the array has a fixed size, so many unused elements are carried around toward the end of the simulation.

OMEGA_H_LAMBDA is the right way to process device data when particle data is not used. When both particle and Omega_h array device data are used, SCS_LAMBDA is to be used, as in the sketch below.
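A sketch of the SCS_LAMBDA pattern with an intermediate array sized by the SCS capacity (option 1 above); 'scs', 'scsCapacity', and the mask check are assumptions based on the examples later in this page, not an exact reproduction of the particle-structure API:

  o::Write<o::Real> ptclData(scsCapacity, 0.0);  // option (1): sized by the SCS capacity
  auto setData = SCS_LAMBDA(const int& elem, const int& pid, const int& mask) {
    if (mask > 0)                                // assumed: mask == 0 marks unused slots
      ptclData[pid] = elem;                      // 'pid' is only valid for the current SCS layout
  };
  scs->parallel_for(setData);
  // ptclData must be refilled after the SCS is rebuilt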

Usage

//in class hpp file
o::Reals data;

// in cpp file
// dataIn is an o::HostWrite<o::Real> filled on the host;
// HostWrite::write() copies it to the device
data = o::Reals(dataIn.write());

OMEGA_H_DEVICE void subFunc(o::Reals& data) {
  //to print the first element
  printf("%g ", data[0]); //OK, device function accessing device memory
}

//somewhere in the class definition
inline void func(o::Reals& data) {
  //to print the first element
  printf("%g ", data[0]); //Error, device data accessed from host code

  //Use
  o::parallel_for(1, OMEGA_H_LAMBDA(const int& i){
    printf("%g ", data[i]);
  });

  auto lambda = OMEGA_H_LAMBDA(const int& i){
    subFunc(data);
  };
  o::parallel_for(data.size(), lambda);
}

int main() {
  ...create object, obj
  func(obj.data); //OK, launches kernels
  subFunc(obj.data); //ERROR, device function called from host code
}

Debugging

GDB is the recommended way to get a stack trace when there is a failure in GPU kernel code. Note that the stack trace will not include the line number within the kernel; print statements are currently the best way to determine the failing location.

Core dumps can also be used: https://jvns.ca/blog/2018/04/28/debugging-a-segfault-on-linux/

Memory errors are not always visible

  OMEGA_H_DEVICE void myFunction(o::Write<o::Real> &data) {
    //do something; an uninitialized-memory error here is swallowed
  }

  void kernel(o::Write<o::Real> &data) {
    auto lambda = SCS_LAMBDA(const int &elem, const int &pid, const int &mask) {
      // pass uninitialized data
      myFunction(data);
    };
    scs->parallel_for(lambda);
  }

The solution is to use assert() or OMEGA_H_CHECK() before accessing arrays, as in the sketch below.
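A minimal sketch (assuming checks are enabled in the Omega_h build):

  void kernel(o::Write<o::Real> data) {
    auto lambda = OMEGA_H_LAMBDA(const int& i) {
      OMEGA_H_CHECK(i < data.size());  // fail loudly instead of silently reading bad memory
      data[i] = 0.0;
    };
    o::parallel_for(data.size(), lambda);
  }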

Classes and Structs

Device data pointers to Omega_h array objects can be stored in classes or structs. Class member functions that need to be called from GPU kernels must be public; private functions cannot be called from kernels [REFERENCE NEEDED].
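A minimal sketch of such a class (the class and member names are hypothetical):

  class Field {
    public:
      o::Reals vals;  // device data; the view object itself is copied by value into kernels
      OMEGA_H_DEVICE o::Real at(o::LO i) const { return vals[i]; }  // public, callable from a kernel
  };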

Current kernels are written as inline functions in hpp files. The keyword 'inline' is also required when a function that is not part of a class or namespace has its declaration and definition together in a header rather than split across files; a sketch follows below.
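A minimal sketch of a header-defined free function (the file and function names are hypothetical):

  // someUtil.hpp
  inline void scaleAll(o::Write<o::Real> data, o::Real factor) {
    auto scale = OMEGA_H_LAMBDA(const int& i) { data[i] *= factor; };
    o::parallel_for(data.size(), scale);
  }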

Element Ordering

Omega_h

   See description: Omega_h_simplex.hpp, Omega_h_refine_topology.hpp line 26.
   face_vert: 0,2,1; 0,1,3; 1,2,3; 2,0,3.
   Corresponding opposite vertices: 3,2,0,1, by simplex_opposite_template(3, 2, iface, i) ?
   Side note: r3d.cpp line 528: 3,2,1; 0,2,3; 0,3,1; 0,1,2. Opposite vertices: 0,1,2,3.
              3
            / | \
          /   |   \
         0----|----2
          \   |   /
            \ | /
              1
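
For reference, the same template written out as a plain table (values copied from the lines above, not queried from the Omega_h API):

  // face f of a tet uses vertices faceVerts[f]; oppVert[f] is the vertex not on face f
  const int faceVerts[4][3] = {{0,2,1}, {0,1,3}, {1,2,3}, {2,0,3}};
  const int oppVert[4] = {3, 2, 0, 1};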

Omega_h_refine_topology.hpp says

The triangle-of-tet-to-vertices template specifies all triangles curling outward (such that implicit face
derivation orients all surface faces outward), while the tet-to-vertices convention has the bottom face
curling inward for consistency with Gmsh, PUMI, Simmetrix, etc. Rather than break either of those two
compatibilities, we will remember to flip the triangle here to get the right orientation for new tets.

But all the faces in PUMI have counter-clockwise vertex ordering when viewed from outside the tet, which means outward orientation!

PUMI ordering

The above figure is good for Omega_h. It is not clear whether the opposite_template usage is right.

Face vertices are in counter-clockwise order.

gmsh

It seems the ordering is different. The face ordering is clockwise.

http://gmsh.info/doc/texinfo/gmsh.html#Node-ordering

 Tetrahedron:                          Tetrahedron10:
                  v
                .
              ,/
             /
           2                                     2
         ,/|`\                                 ,/|`\
       ,/  |  `\                             ,/  |  `\
     ,/    '.   `\                         ,6    '.   `5
   ,/       |     `\                     ,/       8     `\
 ,/         |       `\                 ,/         |       `\
0-----------'.--------1 --> u         0--------4--'.--------1
 `\.         |      ,/                 `\.         |      ,/
    `\.      |    ,/                      `\.      |    ,9
       `\.   '. ,/                           `7.   '. ,/
          `\. |/                                `\. |/
             `3                                    `3
            `\.
               ` w