TOOLS/INFO: Validate -m flag #10524

Merged: 1 commit, Mar 4, 2025


@ovidiusm commented Feb 28, 2025

What?

Fix input validation for the -m flag in ucx_info so that only positive memory sizes are accepted.
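
A check of this kind can be sketched as follows. This is a minimal illustration and not the code merged in this PR: `parse_positive_size` is a hypothetical helper, it assumes a `<size>,<type>` spec like the `0,host` argument used in this PR, and it handles plain decimal sizes only.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of UCX: parse the leading size of a
 * "<size>,<type>" spec and accept only values strictly greater than 0.
 * Returns 0 on success, -1 on invalid input. */
int parse_positive_size(const char *spec, size_t *size_p)
{
    const char *p = spec;
    char *end;
    unsigned long long value;

    assert((spec != NULL) && (size_p != NULL));

    /* Skip leading whitespace, as strtoull() itself would. */
    while (isspace((unsigned char)*p)) {
        ++p;
    }

    /* strtoull() accepts a leading '-' and negates the result in the
     * unsigned type, so reject anything that does not start with a digit. */
    if (!isdigit((unsigned char)*p)) {
        return -1;
    }

    errno = 0;
    value = strtoull(p, &end, 10);
    if ((errno == ERANGE) || (value == 0) || (value > SIZE_MAX)) {
        return -1;
    }

    /* Only end of string or a ",<type>" suffix may follow the number. */
    if ((*end != '\0') && (*end != ',')) {
        return -1;
    }

    *size_p = (size_t)value;
    return 0;
}
```

Rejecting the sign character before conversion is a deliberate choice in this sketch: checking only the converted value would let `-1` through as a very large positive number.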

Why?

Before:

ucx_info -u a -m -1,host
[1740736432.183591] [rock16:2056552:0]             sys.c:924  UCX  ERROR   shmget(size=0 flags=0xfb0) for user memory failed: Invalid argument, please check shared memory limits by 'ipcs -l'
[1740736432.183613] [rock16:2056552:0]             sys.c:924  UCX  ERROR   shmget(size=0 flags=0x7b0) for user memory failed: Invalid argument, please check shared memory limits by 'ipcs -l'
[1740736432.183621] [rock16:2056552:0]         mm_sysv.c:114  UCX  ERROR   failed to allocate 18446744073709551615 bytes with mm for user memory
[1740736432.183628] [rock16:2056552:0]         uct_mem.c:161  UCX  ERROR   failed to allocate 18446744073709551615 bytes using md sysv for user memory: Out of memory
[1740736445.064992] [rock16:2056552:0]        mm_posix.c:208  UCX  ERROR   Not enough memory to write total of 18446744073709551615 bytes. Please check that /dev/shm or the directory you specified has more available memory.
[1740736448.669457] [rock16:2056552:0]         uct_mem.c:161  UCX  ERROR   failed to allocate 18446744073709551615 bytes using md posix for user memory: Out of memory
#
# UCP memory allocation
#
#  allocated 0 at address 0x7f4ff8200000 with thp, pagesize: 4K
#  registered on: self mlx5_0 mlx5_1 mlx5_2 mlx5_1 knem
#
<Failed to pack rkey: Invalid parameter>
[rock16:2056552:0:2056552]    rcache.inl:89   Assertion `region->refcount > 0' failed

/.autodirect/mtrswgwork/ovidium/ucx/src/ucs/memory/rcache.inl: [ ucs_rcache_region_put_unsafe() ]
      ...
       86 {
       87     ucs_rcache_region_lru_add(rcache, region);
       88
==>    89     ucs_assert(region->refcount > 0);
       90     if (ucs_unlikely(--region->refcount == 0)) {
       91         ucs_mem_region_destroy_internal(rcache, region, 0);
       92     }

==== backtrace (tid:2056552) ====
 0 0x00000000000582d6 ucs_rcache_region_put_unsafe()  /.autodirect/mtrswgwork/ovidium/ucx/src/ucs/memory/rcache.inl:89
 1 0x00000000000582d6 ucp_memh_put_rcache()  /.autodirect/mtrswgwork/ovidium/ucx/src/ucp/core/ucp_mm.c:421
 2 0x00000000000583f7 ucp_memh_cleanup()  /.autodirect/mtrswgwork/ovidium/ucx/src/ucp/core/ucp_mm.c:450
 3 0x000000000005b592 ucp_mem_unmap()  /.autodirect/mtrswgwork/ovidium/ucx/src/ucp/core/ucp_mm.c:1166
 4 0x000000000005cb8e ucp_mem_print_info()  /.autodirect/mtrswgwork/ovidium/ucx/src/ucp/core/ucp_mm.c:1571
 5 0x0000000000403c27 print_ucp_info()  /.autodirect/mtrswgwork/ovidium/ucx/src/tools/info/proto_info.c:382
 6 0x0000000000402ced main()  /.autodirect/mtrswgwork/ovidium/ucx/src/tools/info/ucx_info.c:318
 7 0x000000000003aca3 __libc_start_main()  ???:0
 8 0x0000000000402dfe _start()  ???:0
=================================
ucx_info -u a -m 0,host
#
# UCP memory allocation
#
#  allocated 0 at address (nil) with [rock16:2059140:0:2059140] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:2059140) ====
 0 0x0000000000012ce0 __funlockfile()  :0
 1 0x00000000000ccc95 __strlen_avx2()  :0
 2 0x0000000000087f69 __GI__IO_fputs()  :0
 3 0x000000000005cc26 ucp_mem_print_info()  /.autodirect/mtrswgwork/ovidium/ucx/src/ucp/core/ucp_mm.c:1541
 4 0x0000000000403c27 print_ucp_info()  /.autodirect/mtrswgwork/ovidium/ucx/src/tools/info/proto_info.c:382
 5 0x0000000000402ced main()  /.autodirect/mtrswgwork/ovidium/ucx/src/tools/info/ucx_info.c:318
 6 0x000000000003aca3 __libc_start_main()  ???:0
 7 0x0000000000402dfe _start()  ???:0
=================================
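
The value 18446744073709551615 in the first log equals 2^64 - 1, which is what -1 becomes when it is stored in a 64-bit unsigned size. One plausible reading of `shmget(size=0 ...)` is that this value was then rounded up to a page boundary and overflowed to 0. The sketch below only demonstrates that arithmetic; both helpers are hypothetical and are not taken from the UCX sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal string into a 64-bit unsigned value
 * with no sign check, as a naive parser might. */
uint64_t naive_parse_u64(const char *str)
{
    assert(str != NULL);
    /* strtoull() accepts a leading '-' and negates in the unsigned type. */
    return (uint64_t)strtoull(str, NULL, 10);
}

/* Hypothetical helper: round value up to a multiple of a power-of-two
 * alignment. The addition wraps around for values close to UINT64_MAX. */
uint64_t align_up_pow2(uint64_t value, uint64_t align)
{
    assert((align != 0) && ((align & (align - 1)) == 0));
    return (value + align - 1) & ~(align - 1);
}
```

Under these assumptions, `naive_parse_u64("-1")` yields `UINT64_MAX`, and `align_up_pow2(UINT64_MAX, 4096)` yields 0.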

After:

ucx_info -u a -m -1,host
<Memory size must be greater than 0>
ucx_info -u a -m 0,host
<Memory size must be greater than 0>

@ovidiusm changed the title from "Tools/validate input m flag" to "TOOLS/INFO: Validate input m flag" on Feb 28, 2025
@ovidiusm changed the title from "TOOLS/INFO: Validate input m flag" to "TOOLS/INFO: Validate -m flag" on Feb 28, 2025
@ovidiusm force-pushed the tools/validate_input_m_flag branch from fe19c30 to a8e3a39 on February 28, 2025 10:33
@ovidiusm force-pushed the tools/validate_input_m_flag branch 3 times, most recently from def84a3 to 777477d, on February 28, 2025 11:25
@ovidiusm force-pushed the tools/validate_input_m_flag branch 3 times, most recently from 0ef1747 to cb43ce2, on February 28, 2025 15:33
Co-authored-by: Raul Akhmetshin <[email protected]>
@ovidiusm force-pushed the tools/validate_input_m_flag branch from cb43ce2 to 2bebe6e on February 28, 2025 16:03
@brminich merged commit e79ee1f into openucx:master on Mar 4, 2025
151 checks passed