Delete topo map #41

Open
wants to merge 162 commits into
base: huawei
from

Conversation

shizhibao

No description provided.

alinask and others added 30 commits March 27, 2019 08:54
…_check

UCT: Disable the cm transport if the ib_ucm.ko module is not loaded - v1.6.x
UCX won't pass host memory to rocm_ipc, so remove
the CMA code in IPC which is used to support host
memory.

Signed-off-by: Qiang Yu <[email protected]>
dlerror(), which is called from the dynamic module loader, calls
asprintf(), which allocates memory. Under valgrind, reloc hooks do
not catch intra-glibc calls to malloc, so a special hook is needed for
asprintf().
UCM: Add reloc hooks for (v)asprintf - v1.6.x
Since libucm.so is no longer linked with "nodelete", it may be unloaded
before it installs malloc hooks. In this case we would not have any
environment strings, so there is nothing to release. If it did install hooks,
the library was already reopened with RTLD_NODELETE, so there is no need
to release the strings anyway.

This fixes a case where libucm.so was unloaded before hooks were
installed, and called clearenv() which removed all environment variables
from the program, leading to segfault.

Since we are not clearing the environment now, we also cannot release
the allocated strings. This is not reported as memory leak by Valgrind,
since they are still reachable from the global array of environ strings.
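
The effect described above can be reproduced in isolation. The following is a minimal sketch, not the libucm code: `demo_cleanup_old`, `demo_cleanup_fixed`, and `env_is_set` are hypothetical names introduced only for illustration, and the sketch assumes glibc, where `clearenv()` is available. It shows that a cleanup routine calling `clearenv()` removes every variable in the process, while a cleanup routine that leaves the environment alone does not.

```c
#define _DEFAULT_SOURCE   /* exposes clearenv() and setenv() on glibc */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the variable is set, 0 otherwise. */
static int env_is_set(const char *name)
{
    assert(name != NULL);
    return getenv(name) != NULL;
}

/* Hypothetical stand-in for the old destructor behaviour:
 * clearenv() wipes the whole process environment, not only the
 * strings the library itself added. */
static void demo_cleanup_old(void)
{
    clearenv();
}

/* Hypothetical stand-in for the fixed destructor behaviour:
 * leave the environment untouched. Strings the library added stay
 * allocated but remain reachable through environ. */
static void demo_cleanup_fixed(void)
{
    /* intentionally empty */
}
```

A caller that sets a variable, runs `demo_cleanup_fixed()`, and then reads the variable still finds it; after `demo_cleanup_old()` the same lookup returns NULL, which is the situation that can break code relying on its environment.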
GTEST/JENKINS: Fix CUDA testing - v1.6.x
…-v1.6.x

UCM: Avoid releasing environment strings in library destructor - v1.6.x
…et_orig_v1_6_x

UCM: Make ucm_reloc_get_orig() a static function - v1.6.x
(cherry picked from commit 086f664)
Also, unite with purge code and add unit test.

(cherry picked from commit 5dfef6d)
- In some cases the reply to FC_HARD_REQ cannot be sent immediately due
  to a lack of HW resources, so the request is pushed into the arbiter.
  If the peer falls into the same situation, this can cause a
  deadlock.
- Fix: add the FC grant request with high priority so that it is sent out of order.
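
The idea of sending a high-priority element ahead of already queued work can be sketched with a plain linked queue. This is an illustration only, not the UCS arbiter API: the `demo_*` types and functions are hypothetical, and the real arbiter schedules groups of elements rather than a single flat list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element and queue types; not the UCS arbiter structures. */
typedef struct demo_elem {
    int               id;
    struct demo_elem *next;
} demo_elem_t;

typedef struct {
    demo_elem_t *head;
    demo_elem_t *tail;
} demo_queue_t;

static void demo_queue_init(demo_queue_t *q)
{
    assert(q != NULL);
    q->head = NULL;
    q->tail = NULL;
}

/* Normal path: pending requests wait in FIFO order. */
static void demo_push_tail(demo_queue_t *q, demo_elem_t *e)
{
    e->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = e;
    } else {
        q->head = e;
    }
    q->tail = e;
}

/* High-priority path: the element jumps ahead of everything queued. */
static void demo_push_head(demo_queue_t *q, demo_elem_t *e)
{
    e->next = q->head;
    q->head = e;
    if (q->tail == NULL) {
        q->tail = e;
    }
}

/* Removes and returns the first element, or NULL if the queue is empty. */
static demo_elem_t *demo_pop(demo_queue_t *q)
{
    demo_elem_t *e = q->head;
    if (e != NULL) {
        q->head = e->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        e->next = NULL;
    }
    return e;
}
```

With two ordinary requests pushed at the tail and a grant pushed at the head, the grant is popped first, so it is not stuck behind requests that are themselves waiting for resources.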

(cherry picked from commit 0e3f71a)
(cherry picked from commit fb78062)
(cherry picked from commit c4d448d)
…r-head-v1.6

UCS/ARBITER: Add function to push element to group head, fixed RC FC deadlock - v1.6
…_v1.6.x

UCT/DC: Fix OOO support for TM DCIs - v1.6.x
…nd-v1.6.x

UCM/TEST: Fix handling of shmat(SHM_REMAP|SHM_RND) - v1.6.x
nvidia-driver-devel replaced xorg-x11-drv-nvidia-devel, and we can have
boolean OR expressions only with rpm >= v4.13 (on rh7.4 it's v4.11).
…ild-deps-v1.6.x

SPEC: Remove CUDA BuildRequires, since package name is not consistent - v1.6.x
rpmbuild includes them in %postun scriptlet, which causes issues with
RPM uninstall step.
nsosnsos and others added 28 commits December 10, 2020 22:10
Fix UT test for discontig datatype
Create function_test.sh for automatic function test
Add communicator operation test cases
upgrade ucx to 1.9.0
Fixed the problem of long running time of some test scripts
fix bug for big messages of TCP
Modify GTEST to match the refactored code
Reduce the memory of the step structure