
Storing containers within HCL containers is broken #5

Open
ChristopherHogan opened this issue Apr 10, 2020 · 12 comments


@ChristopherHogan
Collaborator

I had a hunch that storing containers in HCL containers was not working correctly after reading this from the Boost.Interprocess documentation:

Boost.Interprocess containers are placed in shared memory/memory mapped files, etc... using two mechanisms at the same time:

Boost.Interprocess construct<>, find_or_construct<>... functions. These functions place a C++ object in the shared memory/memory mapped file. But this places only the object, but not the memory that this object may allocate dynamically.
Shared memory allocators. These allow allocating shared memory/memory mapped file portions so that containers can allocate dynamically fragments of memory to store newly inserted elements.
This means that to place any Boost.Interprocess container (including Boost.Interprocess strings) in shared memory or memory mapped files, containers must:

Define their template allocator parameter to a Boost.Interprocess allocator.
Every container constructor must take the Boost.Interprocess allocator as parameter.
You must use construct<>/find_or_construct<>... functions to place the container in the managed memory.
If you do the first two points but you don't use construct<> or find_or_construct<> you are creating a container placed only in your process but that allocates memory for contained types from shared memory/memory mapped file.

We follow this advice properly for the HCL container itself. For example, the internal bip::unordered_map in the hcl::unordered_map takes a Boost.Interprocess shared memory allocator as a template parameter. However, if we then try to store a bip::string in our hcl::unordered_map, the documentation says the string's constructor must also receive a proper Boost.Interprocess allocator, which is currently not possible without changing the library. To verify my hunch, I did some investigation in the debugger. First, I examined the shared memory segment in hcl::unordered_map to find the valid range of memory addresses that it encompasses.

(gdb) p segment.m_mfile.m_mapped_region
$10 = {
  m_base = 0x7fffea218000, 
  m_size = 134217728, 
  m_page_offset = 0, 
  m_mode = boost::interprocess::read_write, 
  m_is_xsi = false
}

So the shared memory addresses range from 0x7fffea218000 to 0x7ffff2218000.

Now let's see where our container data is being stored. Here, iterator is the result of calling mymap->find(key):

(gdb) p &(*iterator)->second
$12 = (boost::container::basic_string<char, std::char_traits<char>, boost::container::new_allocator<char> > *) 0x7fffea2181e0

The bip::string itself is stored within the valid shared memory address range. This is expected, since we define the appropriate shared memory allocator on the hcl::unordered_map. Now let's see where the bip::string has allocated its internal data:

(gdb) p (*iterator)->second.data()
$11 = 0x555555fcee70 'x' <repeats 200 times>...

This address is well outside the bounds of our shared memory. As a sanity check, I performed the same test on an example of containers within containers from the Boost.Interprocess documentation:

(gdb) p shm.m_mapped_region.m_base
$10 = (void *) 0x7ffff7ff5000
(gdb) p myshmvector 
$11 = (MyShmStringVector *) 0x7ffff7ff5268
(gdb) p myshmvector->data()
$12 = (boost::container::basic_string<...> *) 0x7ffff7ff52a0
(gdb) p (*myshmvector)[0].data()
$13 = 0x7ffff7ff52a9 "this is my text"

Here we see that myshmvector (a bip::vector of bip::string, with all allocators correctly defined) is constructed in the shared memory address range, as are its first element and the internal data of that element.

My guess is that fixing this issue will also resolve #4.

@ChristopherHogan
Collaborator Author

Here's a program that triggers a crash because of this issue:

#include <assert.h>
#include <stdio.h>

#include <mpi.h>
#include <boost/interprocess/containers/string.hpp>

#include <hcl/map/map.h>

namespace bip = boost::interprocess;

struct KeyType{
  size_t a;
  KeyType():a(0){}
  KeyType(size_t a_):a(a_){}
#ifdef HCL_ENABLE_RPCLIB
  MSGPACK_DEFINE(a);
#endif
  /* Equality operator for comparing two KeyType objects. */
  bool operator==(const KeyType &o) const {
    return a == o.a;
  }
  KeyType& operator=( const KeyType& other ) {
    a = other.a;
    return *this;
  }
  bool operator<(const KeyType &o) const {
    return a < o.a;
  }
  bool operator>(const KeyType &o) const {
    return a > o.a;
  }
  bool Contains(const KeyType &o) const {
    return a==o.a;
  }
#if defined(HCL_ENABLE_THALLIUM_TCP) || defined(HCL_ENABLE_THALLIUM_ROCE)
  template<typename A>
  void serialize(A& ar) const {
    ar & a;
  }
#endif
};

namespace std {
template<>
struct hash<KeyType> {
  size_t operator()(const KeyType &k) const {
    return k.a;
  }
};
}  // namespace std

namespace thallium {
template<class A>
inline void save(A& ar, bip::string& s) {
  size_t size = s.size();
  ar.write(&size);
  ar.write((const char*)(&s[0]), size);
}

template<class A>
inline void load(A& ar, bip::string& s) {
  size_t size;
  s.clear();
  ar.read(&size);
  s.resize(size);
  ar.read((char*)(&s[0]),size);
}
}  // namespace thallium

// Run with 3 MPI processes
int main(int argc,char* argv[])
{
  int provided;
  MPI_Init_thread(&argc,&argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE) {
    printf("Didn't receive appropriate MPI threading specification\n");
    exit(EXIT_FAILURE);
  }

  int comm_size;
  int my_rank;
  MPI_Comm_size(MPI_COMM_WORLD,&comm_size);
  MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);

  if (comm_size < 3) {
    fprintf(stderr, "Run with at least 3 MPI processes to reproduce the error\n");
    MPI_Abort(MPI_COMM_WORLD, -1);
  }

  bool is_server = my_rank == 0;
  HCL_CONF->IS_SERVER = is_server;
  HCL_CONF->MY_SERVER = 0;
  HCL_CONF->NUM_SERVERS = 1;
  HCL_CONF->SERVER_ON_NODE = 1;
  HCL_CONF->SERVER_LIST_PATH = "./server_list";

  const int request_size = 4096;

  hcl::map<KeyType, bip::string> *map;
  if (is_server) {
    map = new hcl::map<KeyType, bip::string>();
  }
  MPI_Barrier(MPI_COMM_WORLD);
  if (!is_server) {
    map = new hcl::map<KeyType, bip::string>();
  }

  MPI_Comm client_comm;
  MPI_Comm_split(MPI_COMM_WORLD, !is_server, my_rank, &client_comm);
  int client_comm_size;
  MPI_Comm_size(client_comm, &client_comm_size);
  int client_rank;
  MPI_Comm_rank(client_comm, &client_rank);

  MPI_Barrier(MPI_COMM_WORLD);
  if (!is_server) {

    auto key = KeyType(0);
    bip::string my_vals(request_size, 'x');
    if (client_rank == 0) {
      map->Put(key, my_vals);
    }
    MPI_Barrier(client_comm);

    if (client_rank != 0) {
      std::pair<bool, bip::string> result_pair = map->Get(key);
      bip::string result = result_pair.second;
      assert(result.size() == my_vals.size());
      for (size_t j = 0; j < result.size(); ++j) {
        assert(result[j] == my_vals[j]);
      }
    }
  }
  MPI_Barrier(MPI_COMM_WORLD);
  delete(map);
  MPI_Finalize();

  return 0;
}

@hariharan-devarajan
Collaborator

hariharan-devarajan commented Dec 2, 2020

@ChristopherHogan So to solve this as you recommend, we need to expose the shared memory segment, so that users can use our allocator to build their structures.

so instead of

bip::basic_string mystring("hello");

users will do

/* These two lines can be defined by us for users*/
typedef boost::interprocess::allocator<char, boost::interprocess::managed_mapped_file::segment_manager> HermesCharAllocator;
typedef bip::basic_string<char, std::char_traits<char>, HermesCharAllocator> HermesString;
/* Then they can do */
HermesCharAllocator charallocator(map->segment.get_segment_manager());
HermesString mystring(charallocator);
mystring="hello";

Is this what you meant?

@ChristopherHogan
Collaborator Author

That's part of it, but not the whole story. If we want to have a hcl::map<HermesString, int> for example, then we need to pass an allocator instance to the map when we construct it. Look at the containers of containers example in the docs.

@hariharan-devarajan
Collaborator

Not necessarily. We can get the allocator of the Map itself. Something like this

    typedef boost::interprocess::allocator<char, boost::interprocess::managed_mapped_file::segment_manager> CharAllocator;
    typedef bip::basic_string<char, std::char_traits<char>, CharAllocator> MyShmString;
    hcl::unordered_map<KeyType,MyShmString> *map;
    if (is_server) {
        map = new hcl::unordered_map<KeyType,MyShmString>();
    }
    MPI_Barrier(MPI_COMM_WORLD);
    if (!is_server) {
        map = new hcl::unordered_map<KeyType,MyShmString>();
    }

    CharAllocator charallocator(map->segment.get_segment_manager());
    MyShmString shared_vals(charallocator);
    shared_vals.resize(size_of_request,'x');

Essentially, this is analogous to the example you posted, where an allocator is built from the same shared memory segment. The point is to use the map's own shared memory segment to build the allocator for the string.

Am I missing something?

@ChristopherHogan
Collaborator Author

If that fixes the crash, then sure.

@ChristopherHogan
Collaborator Author

MyShmString shared_vals(charallocator);

I think this line is what Thallium will have a problem with. It needs to construct a MyShmString with a default constructor during serialization, but there is no default constructor since you have to pass charallocator.

@hariharan-devarajan
Collaborator

hariharan-devarajan commented Dec 2, 2020

Yes, that is a separate problem altogether. The point is that we need to use the shared memory allocator for local storage but normal containers for serialization.

@hariharan-devarajan
Collaborator

hariharan-devarajan commented Dec 2, 2020

@ChristopherHogan What about this? The idea comes from general unordered_maps, which take allocators for the map structure. Additionally, I can add value allocators and handle the conversion internally.

/*Class change*/
template<typename KeyType, typename MappedType, class Allocator=std::allocator<std::pair<const KeyType, MappedType>>,class SharedType=std::nullptr_t>
class unordered_map;

/* users can do the following */
typedef boost::interprocess::allocator<char, boost::interprocess::managed_mapped_file::segment_manager> CharAllocator;
typedef bip::basic_string<char, std::char_traits<char>, CharAllocator> MappedUnitString;
hcl::unordered_map<KeyType, std::string, CharAllocator, MappedUnitString> *map;
if (is_server) {
    map = new hcl::unordered_map<KeyType, std::string, CharAllocator, MappedUnitString>();
    map->SetIsDynamic(true);
}
MPI_Barrier(MPI_COMM_WORLD);
if (!is_server) {
    map = new hcl::unordered_map<KeyType, std::string, CharAllocator, MappedUnitString>();
}

/* Internally, unordered_map::Put can do the allocation itself */

if (is_dynamic) {
    Allocator allocator(segment.get_segment_manager());
    SharedType value(allocator);
    value = data;
    myHashMap->insert_or_assign(key, value);
} else {
    myHashMap->insert_or_assign(key, data);
}

Note it works with both thallium and shared memory this way.
The above flow works. Do you see any optimizations?

@hariharan-devarajan
Collaborator

As per my understanding, we are currently not using these dynamic containers on Hermes at all. We generally have structs of constant size as types. Is that correct?

@ChristopherHogan
Collaborator Author

> As per my understanding, we are currently not using these dynamic containers on Hermes at all. We generally have structs of constant size as types. Is that correct?

We have a map of (int -> char*) that needs the ability to grow dynamically, although the realloc isn't implemented yet.

@hariharan-devarajan
Collaborator

These are the use cases for names, right? If so, they are generally used with a char[256] upper bound. I have tested in the past that it's better to bound them. The exception would be truly dynamic data, but I think you're also managing that through fixed-size arrays.

@ChristopherHogan
Collaborator Author

Yes, the names have an upper bound. I just meant the map itself probably should be dynamic eventually. Currently it's a fixed size though.
