Storing containers within HCL containers is broken #5
Comments
Here's a program that triggers a crash because of this issue:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <boost/interprocess/containers/string.hpp>
#include <hcl/map/map.h>
namespace bip = boost::interprocess;
struct KeyType{
size_t a;
KeyType():a(0){}
KeyType(size_t a_):a(a_){}
#ifdef HCL_ENABLE_RPCLIB
MSGPACK_DEFINE(a);
#endif
/* Equality operator for comparing two KeyType values. */
bool operator==(const KeyType &o) const {
return a == o.a;
}
KeyType& operator=( const KeyType& other ) {
a = other.a;
return *this;
}
bool operator<(const KeyType &o) const {
return a < o.a;
}
bool operator>(const KeyType &o) const {
return a > o.a;
}
bool Contains(const KeyType &o) const {
return a==o.a;
}
#if defined(HCL_ENABLE_THALLIUM_TCP) || defined(HCL_ENABLE_THALLIUM_ROCE)
template<typename A>
void serialize(A& ar) const {
ar & a;
}
#endif
};
namespace std {
template<>
struct hash<KeyType> {
size_t operator()(const KeyType &k) const {
return k.a;
}
};
} // namespace std
namespace thallium {
template<class A>
inline void save(A& ar, bip::string& s) {
size_t size = s.size();
ar.write(&size);
ar.write((const char*)(&s[0]), size);
}
template<class A>
inline void load(A& ar, bip::string& s) {
size_t size;
s.clear();
ar.read(&size);
s.resize(size);
ar.read((char*)(&s[0]),size);
}
} // namespace thallium
// Run with 3 MPI processes
int main(int argc,char* argv[])
{
int provided;
MPI_Init_thread(&argc,&argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
printf("Didn't receive appropriate MPI threading specification\n");
exit(EXIT_FAILURE);
}
int comm_size;
int my_rank;
MPI_Comm_size(MPI_COMM_WORLD,&comm_size);
MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);
if (comm_size < 3) {
fprintf(stderr, "Run with at least 3 MPI processes to reproduce the error\n");
MPI_Abort(MPI_COMM_WORLD, -1);
}
bool is_server = my_rank == 0;
HCL_CONF->IS_SERVER = is_server;
HCL_CONF->MY_SERVER = 0;
HCL_CONF->NUM_SERVERS = 1;
HCL_CONF->SERVER_ON_NODE = 1;
HCL_CONF->SERVER_LIST_PATH = "./server_list";
const int request_size = 4096;
hcl::map<KeyType, bip::string> *map;
if (is_server) {
map = new hcl::map<KeyType, bip::string>();
}
MPI_Barrier(MPI_COMM_WORLD);
if (!is_server) {
map = new hcl::map<KeyType, bip::string>();
}
MPI_Comm client_comm;
MPI_Comm_split(MPI_COMM_WORLD, !is_server, my_rank, &client_comm);
int client_comm_size;
MPI_Comm_size(client_comm, &client_comm_size);
int client_rank;
MPI_Comm_rank(client_comm, &client_rank);
MPI_Barrier(MPI_COMM_WORLD);
if (!is_server) {
auto key = KeyType(0);
bip::string my_vals(request_size, 'x');
if (client_rank == 0) {
map->Put(key, my_vals);
}
MPI_Barrier(client_comm);
if (client_rank != 0) {
std::pair<bool, bip::string> result_pair = map->Get(key);
bip::string result = result_pair.second;
assert(result.size() == my_vals.size());
for (size_t j = 0; j < result.size(); ++j) {
assert(result[j] == my_vals[j]);
}
}
}
MPI_Barrier(MPI_COMM_WORLD);
delete(map);
MPI_Finalize();
return 0;
} |
@ChristopherHogan So to solve this as you recommend, we need to expose the shared memory segment outside, so that users can use our allocator to build their structures. So instead of
users will do
Is this what you meant? |
That's part of it, but not the whole story. If we want to have a |
Not necessarily. We can get the allocator of the Map itself. Something like this
Essentially this is analogous to the example you put in, where the same segment is used to build an allocator out of the shared memory segment. The point is to use the same shared memory segment to build your allocator for the string. Am I missing something? |
If that fixes the crash, then sure. |
I think this line is what Thallium will have a problem with. It needs to construct a |
Yes, that is a separate problem altogether. The point is that we need to use the shared memory allocator for local use but normal containers for serialization. |
@ChristopherHogan What about this? The idea comes from general unordered_maps, which take allocators for the map structure. Additionally, I can add value allocators and handle them internally myself.
Note it works with both thallium and shared memory this way. |
As per my understanding, currently, on Hermes, we are not using these dynamic containers at all. We generally have structs as types that are of const size. Is that correct? |
We have a map of ( |
These are the use cases for names, right? If so, they are generally used with a char[256] upper bound. I have found in past testing that it's better to bound them, unless you're using them for dynamic data, which I think you're also managing through fixed-size arrays. |
Yes, the names have an upper bound. I just meant the map itself probably should be dynamic eventually. Currently it's a fixed size though. |
I had a hunch that storing containers in HCL containers was not working correctly after reading this from the Boost.Interprocess documentation:
We follow this advice properly for the HCL container itself. For example, the internal bip::unordered_map in the hcl::unordered_map takes a Boost.Interprocess shared memory allocator as a template parameter. However, if we then try to store a bip::string in our hcl::unordered_map, the documentation suggests that the string constructor needs to take a proper Boost.Interprocess allocator (which is currently not possible without changing the library).
To verify my hunch, I did some investigation in the debugger. First, I examined the shared memory segment in hcl::unordered_map to see the valid range of memory addresses that it encompasses.
So the shared memory addresses range from 0x7fffea218000 to 0x7ffff2218000.
So let's see where our container data is being stored.
iterator here is the result of calling mymap->find(key);
The bip::string itself is stored within the valid shared memory address range. This is expected, since we define the appropriate shared memory allocator on the hcl::unordered_map. Now let's see where the bip::string has allocated its internal data:
This address is well outside the bounds of our shared memory. As a sanity check, I performed the same test on an example of containers within containers from the Boost.Interprocess documentation:
Here we see that myshmvector (which is a bip::vector of bip::string, with all allocators correctly defined) is constructed in the shared memory address range, as is its first element, as well as the internal data of the first element.
My guess is that fixing this issue will also resolve #4.