anlsys · fmonna · Aug 20, 2019 · Aug 27, 2019 · Sep 4, 2019 · May 10, 2021
diff --git a/doc/index.rst b/doc/index.rst
@@ -21,8 +21,8 @@ blocks*, used to develop explicit memory and data management policies. The goals
 of AML are:
 
 * **composability**: application developers and performance experts should be
-  able to pick and choose the building blocks to use depending on their specific
-  needs.
+  able to pick and choose which building blocks to use depending on their
+  specific needs.
 
 * **flexibility**: users should be able to customize, replace, or change the
   configuration of each building block as much as possible.
@@ -36,7 +36,7 @@ AML currently implements the following abstractions:
 * :doc:`Area <pages/areas>`, a set of addressable physical memories,
 * :doc:`Layout <pages/layout>`, a description of data structure organization,
 * :doc:`Tiling <pages/tilings>`, a description of data blocking (decomposition)
-* :doc:`DMA <pages/dmas>`, an engine to asynchronously move data structures between areas,
+* :doc:`DMA <pages/dmas>`, an engine to asynchronously move data structures between areas.
 
 Each of these abstractions has several implementations. For instance, areas
 may refer to the usual DRAM or its subset, to GPU memory, or to non-volatile memory.
@@ -76,7 +76,7 @@ Installation
 Workflow
 ~~~~~~~~
 
-Include the AML header:
+Include AML header:
 
 .. code-block:: c
 
@@ -93,7 +93,7 @@ Check the AML version:
       return 1;
   }
 
-Initialize and clean up the library:
+Initialize and clean up AML:
 
 .. code-block:: c
 
@@ -106,8 +106,8 @@ Initialize and clean up the library:
 
 Link your program with *-laml*.
 
-Check the above building-blocks-specific pages for further examples and
-information on the library features.
+See the above pages on specific building blocks for further examples and
+information on library features.
 
 Support
 -------

diff --git a/doc/pages/area_cuda_api.rst b/doc/pages/area_cuda_api.rst
@@ -1,4 +1,18 @@
 Area Cuda Implementation API
 =================================
+CUDA implementation of AML areas.
+
+.. code-block:: c
+
+        #include <aml/area/cuda.h>
+
+This building block relies on the CUDA implementation of
+malloc/free to provide mmap/munmap on device memory.
+Additional documentation of the CUDA runtime API can be found here:
+https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html
+
+AML CUDA areas may be created to allocate memory on the current CUDA device
+or on a specific one.
+Allocations can be private to a single device or shared across devices.
+They can also be backed by host memory.
 
 .. doxygengroup:: aml_area_cuda
diff --git a/doc/pages/area_linux_api.rst b/doc/pages/area_linux_api.rst
@@ -1,4 +1,89 @@
-Area Linux Implementation API
-=================================
+Area Linux Implementation 
+=========================
+
+This is the Linux implementation of AML areas.
+
+This building block relies on the libnuma implementation and the Linux
+mmap() / munmap() to provide mmap() / munmap() on NUMA host processor memory.
+New areas may be created to allocate from a specific subset of memories.
+This building block also includes a static declaration of a default initialized
+area that can be used out-of-the-box with the abstract area API.
+
+.. code-block:: c
+
+        #include <aml/area/linux.h>
+
+Example
+-------
+Using the built-in features of Linux areas:
+we allocate data accessible by several processes at the same address, spread
+across all CPU memories (using the Linux interleave policy).
+
+.. code-block:: c
+
+  // include ..
+
+  struct aml_area* area;
+  aml_area_linux_create(&area, AML_AREA_LINUX_MMAP_FLAG_SHARED, NULL,
+                        AML_AREA_LINUX_BINDING_FLAG_INTERLEAVE);
+
+  // When work is done with this area, free resources associated with it
+  aml_area_linux_destroy(&area);
+
+Building a new area implementation on top of Linux area features:
+suppose you need an area feature that is not integrated in AML, but you still
+want to work with the AML features built around areas.
+You can extend the Linux area data with additional fields and provide custom
+mmap and munmap functions that reuse the Linux area ones.
+
+.. code-block:: c
+
+  // include ..
+
+  // declaration of the data field used in generic areas
+  struct aml_area_data {
+     // uses features of linux areas
+     struct aml_area_linux_data linux_data;
+     // implements additional features
+     void* my_data;
+  };
+
+  // create your custom area data with custom linux settings
+  struct aml_area_data my_area_data = {
+     .linux_data = {
+         .nodeset = NULL,
+         .binding_flags = AML_AREA_LINUX_BINDING_FLAG_INTERLEAVE,
+         .mmap_flags = AML_AREA_LINUX_MMAP_FLAG_SHARED,
+     },
+     .my_data = whatever_floats_your_boat,
+  };
+
+  // implements mmap using linux area features and custom features
+  void* my_mmap(const struct aml_area_data* data, void* ptr, size_t size){
+      void* program_data = aml_area_linux_mmap(&data->linux_data, ptr, size);
+      aml_area_linux_mbind(&data->linux_data, program_data, size);
+      // additional work we want to do on top of the linux area work
+      whatever_shark(data->my_data, program_data, size);
+      return program_data;
+  }
+  // same for munmap
+  int my_munmap(const struct aml_area_data* data, void* ptr, size_t size);
+
+  // builds your custom area
+  struct aml_area_ops my_area_ops = {
+     .mmap = my_mmap,
+     .munmap = my_munmap,
+  };
+
+  struct aml_area my_area = {
+     .ops = &my_area_ops,
+     .data = &my_area_data,
+  };
+
+  void* program_data = aml_area_mmap(&my_area, NULL, size);
+
+
+And now you can call the generic API on your area.
+
+Area Linux API
+==============
 
 .. doxygengroup:: aml_area_linux
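
Reviewer note: the extension pattern described in this file (wrap the base area data in a larger struct, then route mmap/munmap through an ops table) can be tried out without AML installed. The following is a minimal, self-contained sketch in plain C on top of POSIX mmap(). Every `demo_*` name is hypothetical and is not part of the AML API, and the byte counter merely stands in for whatever extra state a custom area might carry.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-ins; none of these are AML symbols. */
struct demo_area_data {
    int mmap_flags;      /* flags forwarded to mmap(), e.g. MAP_PRIVATE */
    size_t bytes_mapped; /* custom bookkeeping layered on top of mmap */
};

/* Table of operations an area implementation must provide. */
struct demo_area_ops {
    void *(*mmap)(struct demo_area_data *data, size_t size);
    int (*munmap)(struct demo_area_data *data, void *ptr, size_t size);
};

/* An area pairs an implementation (ops) with its private state (data). */
struct demo_area {
    const struct demo_area_ops *ops;
    struct demo_area_data *data;
};

/* Custom mmap: delegate to the system call, then do the extra work. */
static void *demo_mmap(struct demo_area_data *data, size_t size)
{
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     data->mmap_flags, -1, 0);
    if (ptr == MAP_FAILED)
        return NULL;
    data->bytes_mapped += size;
    return ptr;
}

/* Custom munmap: release the mapping and update the bookkeeping. */
static int demo_munmap(struct demo_area_data *data, void *ptr, size_t size)
{
    int err = munmap(ptr, size);
    if (err == 0)
        data->bytes_mapped -= size;
    return err;
}

static const struct demo_area_ops demo_ops = {
    .mmap = demo_mmap,
    .munmap = demo_munmap,
};

/* Generic entry points: callers only ever see these two functions. */
void *demo_area_mmap(const struct demo_area *area, size_t size)
{
    assert(area != NULL && area->ops != NULL);
    return area->ops->mmap(area->data, size);
}

int demo_area_munmap(const struct demo_area *area, void *ptr, size_t size)
{
    assert(area != NULL && area->ops != NULL);
    return area->ops->munmap(area->data, ptr, size);
}
```

To use it, fill a `struct demo_area_data` with `MAP_PRIVATE | MAP_ANONYMOUS`, point a `struct demo_area` at `demo_ops` and that data, and call `demo_area_mmap()` and `demo_area_munmap()`. The caller never needs to know which implementation sits behind the ops table, which is the property the documentation above relies on.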
diff --git a/doc/pages/area_opencl_api.rst b/doc/pages/area_opencl_api.rst
@@ -1,4 +1,15 @@
 Area OpenCL Implementation API
 =================================
 
+OpenCL implementation of AML areas.
+
+.. code-block:: c
+
+        #include <aml/area/opencl.h>
+
+This building block relies on the OpenCL implementation of
+device memory allocation to provide mmap/munmap on device memory.
+Additional documentation of the OpenCL memory model can be found here:
+https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html#_memory_model
+
 .. doxygengroup:: aml_area_opencl
diff --git a/doc/pages/area_ze_api.rst b/doc/pages/area_ze_api.rst
@@ -1,4 +1,16 @@
 Area Level Zero Implementation API
 ==================================
 
+Implementation of AML areas with the Level Zero API.
+
+.. code-block:: c
+
+        #include <aml/area/ze.h>
+
+This building block relies on the Level Zero implementation of
+host and device memory mapping to provide mmap/munmap on device memory.
+Additional documentation of the Level Zero memory model can be found here:
+
+https://spec.oneapi.com/level-zero/latest/core/api.html#memory
+
 .. doxygengroup:: aml_area_ze
diff --git a/doc/pages/areas.rst b/doc/pages/areas.rst
@@ -1,10 +1,90 @@
 Areas: Addressable Physical Memories
 ====================================
 
+AML areas represent places where data can be stored.
+In shared memory systems, locality is a major concern for performance.
+Being able to request memory from specific places is key to achieving good
+locality.
+AML areas provide low-level mmap() / munmap() functions to request memory from
+specific places materialized as areas.
+Available area implementations dictate the way such places can be arranged and
+their properties.
+
+.. image:: ../img/area.png
+   :width: 700px
+
+"Illustration of areas on a complex system."
+
+An AML area is an implementation of memory operations for several types of
+devices through a consistent abstraction.
+This abstraction is meant to be implemented for several kinds of devices, i.e.
+the same function calls allocate memory on different kinds of devices depending
+on the area implementation provided.
+
+With the high level API, you can:
+
+* Use an area to allocate space for your data
+* Release the data in this area
+
+Example
+-------
+
+Let's look at how these operations can be done in a C program.
+
+.. code-block:: c
+
+  #include <aml.h>
+  #include <aml/area/linux.h>
+
+  int main(){
+
+      void* data = aml_area_mmap(&aml_area_linux, s);
+      do_work(data);
+      // release the memory through the same area that provided it
+      aml_area_munmap(&aml_area_linux, data, s);
+  }
+
+We start by including the AML interface, as well as the area implementation we
+want to use.
+
+We then proceed to allocate space for the data of size s using the default area
+from the AML Linux implementation.
+The data will be visible only to this process and bound to the CPU with the
+default Linux allocation policy.
+
+Finally, when the work is done with data, we free it.
+
+
+Area API
+--------
+
+Note that the functions provided through the Area API are low-level functions
+and, unlike allocators, are not optimized for performance.
+
 .. doxygengroup:: aml_area
 
+
 Implementations
 ---------------
+Advanced users may create or modify an implementation by assembling the
+appropriate operations in an aml_area_ops structure.
+
+The Linux implementation is the go-to choice for simple areas on NUMA CPUs
+running the Linux operating system.
+
+Work on hwloc, CUDA and OpenCL areas is ongoing.
+
+Let's look at an example that dynamically creates a Linux area identical to the
+static default aml_area_linux:
+
+.. code-block:: c
+
+  #include <aml.h>
+  #include <aml/area/linux.h>
+
+  int main(){
+      struct aml_area* area;
+      aml_area_linux_create(&area, AML_AREA_LINUX_MMAP_FLAG_PRIVATE, NULL,
+                            AML_AREA_LINUX_BINDING_FLAG_DEFAULT);
+      do_work(area);
+      aml_area_linux_destroy(&area);
+  }
 
 .. toctree::