hpx-parcelport analysis/code review skeleton #4

Open. Wants to merge 10 commits into base: master
3 changes: 3 additions & 0 deletions catalogs/README.md
@@ -0,0 +1,3 @@
This directory contains files that 'catalog' libfabric primitives (types,
variables, function calls) used in existing C or C++ codes (libraries,
applications, etc).
11 changes: 11 additions & 0 deletions catalogs/hpx_parcelport.csv
@@ -0,0 +1,11 @@
libfabric-primitive,file,line,description,,
fi_info * fabric_info_,/plugins/parcelport/libfabric/libfabric_controller.hpp,124,variable decl,,
fid_fabric * fabric_,/plugins/parcelport/libfabric/libfabric_controller.hpp,125,variable decl,,
fid_domain * fabric_domain_,/plugins/parcelport/libfabric/libfabric_controller.hpp,126,variable decl,,
fid_pep * ep_passive_,/plugins/parcelport/libfabric/libfabric_controller.hpp,128,variable decl - server/listener for RDMA connections,,
fid_ep * ep_active_,/plugins/parcelport/libfabric/libfabric_controller.hpp,129,variable decl - server/listener for RDMA connections,,
fid_ep * ep_shared_rx_ctx_,/plugins/parcelport/libfabric/libfabric_controller.hpp,130,variable decl - server/listener for RDMA connections,,
fid_eq * event_queue_,/plugins/parcelport/libfabric/libfabric_controller.hpp,133,variable decl,one event queue for all connections,,
fid_cq * txcq_,/plugins/parcelport/libfabric/libfabric_controller.hpp,134,variable decl,one event queue for all connections,,
fid_cq * rxcq_,/plugins/parcelport/libfabric/libfabric_controller.hpp,134,variable decl,one event queue for all connections,,
fid_av * av_,/plugins/parcelport/libfabric/libfabric_controller.hpp,135,variable decl,one event queue for all connections,,
123 changes: 123 additions & 0 deletions catalogs/hpx_parcelport.md
@@ -0,0 +1,123 @@
the file hpx_parcelport.csv contains a table cataloging the following
information about the use of libfabric primitives in the hpx runtime.

libfabric-primitive,file,line,description,,

this table includes information about how the hpx runtime uses libfabric in
its parcelport, or communication, system.

here is a description of the columns:

* libfabric-primitive - contains the name of the type and variable used in hpx
parcelport
* file - the file name containing the libfabric primitive
* line - the line number referencing the libfabric primitive's instantiation
* description - a quick explanation of how the libfabric primitive's instance
is used
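
as a rough illustration of the column layout described above, here is a minimal c++ sketch that reads one row of hpx_parcelport.csv into a record. the names catalog_entry, split_csv and parse_entry are hypothetical and not part of hpx or libfabric. the sketch assumes fields contain no quoted commas, and it folds any extra non-empty trailing fields into the description, since some rows carry a second description fragment after an additional comma.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record mirroring the four documented columns.
struct catalog_entry {
    std::string primitive;    // e.g. "fid_fabric * fabric_"
    std::string file;         // path of the reviewed source file
    int line = 0;             // line number of the declaration
    std::string description;  // short note on how the instance is used
};

// Split one CSV row on commas. Assumes no quoted fields.
// A trailing comma does not produce a final empty field.
std::vector<std::string> split_csv(const std::string& row) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(row);
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Build a record from one data row (not the header row).
catalog_entry parse_entry(const std::string& row) {
    const std::vector<std::string> f = split_csv(row);
    assert(f.size() >= 4 && "expected at least four columns");

    catalog_entry e;
    e.primitive = f[0];
    e.file = f[1];
    e.line = std::stoi(f[2]);

    // Join every non-empty field from the fourth column onward,
    // so a description split by a stray comma is kept whole.
    for (std::size_t i = 3; i < f.size(); ++i) {
        if (f[i].empty()) continue;
        if (!e.description.empty()) e.description += ", ";
        e.description += f[i];
    }
    return e;
}
```

a caller would pass each data row (skipping the header row) to parse_entry and could then group the records by file or by primitive type.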

this document will contain a more contextual analysis describing the hpx
parcelport's design and implementation. this document will attempt to provide
a more general perspective, or analysis, that ties together contents included
in the csv file.

file reviewed /plugins/parcelport/libfabric/libfabric_controller.hpp

*** summary of lines 142:176

libfabric_controller::libfabric_controller(
std::string provider,
std::string domain,
std::string endpoint,
int port=7910);

the constructor uses a function called open_fabric(provider, domain, endpoint) to
access the hardware.

the constructor creates a memory pool object,
rma_memory_pool<libfabric_region_provider>(fabric_domain_), which manages the
memory data segments.
the constructor also initializes a passive listener (or an active RDM endpoint)
using the function called create_local_endpoint().

*** summary of lines 315:402

libfabric_controller::open_fabric(
std::string provider,
std::string domain,
std::string endpoint_type);

this method calls fi_allocinfo() and stores results into a local variable called
struct fi_info * fabric_hints_. the method checks to see if fi_allocinfo returns
correctly.

fabric_hints_->caps is set to FI_MSG|FI_RMA|FI_SOURCE|FI_WRITE
|FI_READ|FI_REMOTE_READ|FI_REMOTE_WRITE|FI_RMA_EVENT

fabric_hints_->mode is set to FI_CONTEXT|FI_LOCAL_MR

fabric_hints_->fabric_attr->prov_name is set to provider.c_str()

fabric_hints_->domain_attr->name is set to domain.c_str()

fabric_hints_->domain_attr->mr_mode is set to FI_MR_BASIC (basic IB
registration)

progress threads are disabled by setting

fabric_hints_->domain_attr->control_progress to FI_PROGRESS_MANUAL
fabric_hints_->domain_attr->data_progress to FI_PROGRESS_MANUAL

thread safe mode is enabled (a note in the code says this does not work with
the psm2 provider) by setting

fabric_hints_->domain_attr->threading to FI_THREAD_SAFE

resource management is enabled by setting

fabric_hints_->domain_attr->resource_mgmt to FI_RM_ENABLED

shared recv context is enabled for active endpoints

fabric_hints_->ep_attr->rx_ctx_cnt = FI_SHARED_CONTEXT

if the endpoint_type value is set to "msg" then the following value
is set:

fabric_hints_->ep_attr->type to FI_EP_MSG

if the endpoint_type value is set to "rdm" then the following value
is set:

fabric_hints_->ep_attr->type to FI_EP_RDM

if the endpoint_type value is set to "dgram" then the following value
is set:

fabric_hints_->ep_attr->type to FI_EP_DGRAM
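
the string-to-endpoint-type selection described above can be sketched as a small lookup. this is only an illustration: endpoint_kind is a hypothetical stand-in for libfabric's FI_EP_MSG, FI_EP_RDM and FI_EP_DGRAM values so that the sketch compiles without libfabric headers, and parse_endpoint_type is not a function in hpx. how the reviewed code treats an unrecognized string is not covered by this summary, so the sketch simply reports failure.

```cpp
#include <string>

// Hypothetical stand-ins for FI_EP_MSG, FI_EP_RDM and FI_EP_DGRAM.
enum class endpoint_kind { msg, rdm, dgram };

// Map the endpoint_type string to an endpoint kind.
// Returns true and writes `out` on a recognized name,
// returns false and leaves `out` untouched otherwise.
bool parse_endpoint_type(const std::string& name, endpoint_kind& out) {
    if (name == "msg") {
        out = endpoint_kind::msg;    // connection-oriented endpoint
        return true;
    }
    if (name == "rdm") {
        out = endpoint_kind::rdm;    // reliable datagram endpoint
        return true;
    }
    if (name == "dgram") {
        out = endpoint_kind::dgram;  // unreliable datagram endpoint
        return true;
    }
    return false;
}
```

in a version that uses libfabric directly, the three branches would assign the corresponding FI_EP_* constant to the hints' ep_attr->type field.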

by default, the method wants completions on both tx/rx events and sets

fabric_hints_->tx_attr->op_flags to FI_COMPLETION
fabric_hints_->rx_attr->op_flags to FI_COMPLETION

fi_getinfo is called, and the method then tests whether the following flags
are set:

(fabric_info_->rx_attr->mode & FI_RX_CQ_DATA) != 0
(fabric_hints_->mode & FI_CONTEXT) != 0
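
a note on writing such flag tests in c or c++: `!=` binds more tightly than `&`, so the mask needs its own parentheses. the sketch below shows the difference using a made-up flag value. demo_flag is hypothetical and is not the value of FI_CONTEXT or FI_RX_CQ_DATA, and both function names are illustrative only.

```cpp
#include <cstdint>

// Hypothetical flag bit for illustration only.
constexpr std::uint64_t demo_flag = 1ULL << 3;

// Intended test: mask first, then compare against zero.
constexpr bool has_flag(std::uint64_t mode) {
    return (mode & demo_flag) != 0;
}

// What `mode & demo_flag != 0` means without parentheses:
// the comparison runs first and yields 1, so only bit 0 of
// `mode` is inspected.
constexpr bool unparenthesized_meaning(std::uint64_t mode) {
    return (mode & static_cast<std::uint64_t>(demo_flag != 0)) != 0;
}
```

with mode equal to demo_flag, has_flag reports true while the unparenthesized reading reports false, because bit 0 is clear.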

fi_fabric is called and given the following arguments

fi_fabric(fabric_info_->fabric_attr, &fabric_, nullptr)

fi_domain is called and given the following arguments

fi_domain(fabric_, fabric_info_, &fabric_domain_, nullptr)

on Cray systems, a method called '_set_disable_registration()' is called; it
disables memory registration caching.

fi_freeinfo is called and passed the following arguments

fi_freeinfo(fabric_hints_)


10 changes: 5 additions & 5 deletions proposal.md
@@ -16,30 +16,30 @@ VII. Acknowledgements
## I. Introduction

In the fall of 2017, the Open Fabrics Working Group (OFIWG) discussed proposing an extension to the current version of the ISO C++
-Networking Technical Specification (N4643) to include support for HPC Fabrics. The intent of this proposal is to improve the
+Networking Technical Specification (N4695) to include support for HPC Fabrics. The intent of this proposal is to improve the
programmability and accessibility of HPC interconnect hardware.

## II. Motivation and Scope

-N4643 currently targets commodity, ethernet based, interconnects. Developing a fabric extension to N4643 will increase the accessibility
+N4695 currently targets commodity, ethernet based, interconnects. Developing a fabric extension to N4695 will increase the accessibility
of fabric interconnects to HPC applications, runtimes, and languages. A fabric extension to the C++ Networking Technical Specification
will provide new mechanisms improving HPC application, runtime, and language performance and efficiencies.

## III. Impact On the Standard

-The proposed fabric extension will depend on N4643. The proposed fabric extension is a "pure extension" of N4643. The current suite of
+The proposed fabric extension will depend on N4695. The proposed fabric extension is a "pure extension" of N4695. The current suite of
libraries used for HPC fabrics are implemented in C99. This proposed fabric extension can be implemented, at a minimum, using C++11
compilers and libraries.

## IV. Design Decisions

-Design decisions in this proposal are presented as an extension to N4643. This proposal may impact N4643.
+Design decisions in this proposal are presented as an extension to N4695. This proposal may impact N4695.

## V. Technical Specifications

## VI. References

## VII. Acknowledgements

-This document is based on N4643, the ISO C++ Networking Technical Specification.
+This document is based on N4695, the ISO C++ Networking Technical Specification.