support HMM profiling event #96

Open: wants to merge 7 commits into master

Commits on Jan 26, 2022

  1. Add license file to smi-lib package

    Install LICENSE.txt to share/doc/smi-lib
    
    Change-Id: Idcbb70db8808111203e8e4a4c3ab4d1e070ac79d
    bill-shuzhou-liu committed Jan 26, 2022 (0da6e0e)
  2. Add rpm License header

    Add the RPM License header for CPack.
    
    Change-Id: I2f4a89015b6389cfde801f41d4f6e0f59e7087aa
    bill-shuzhou-liu committed Jan 26, 2022 (bd3fda7)

Commits on Jan 28, 2022

  1. Add fix to check for vector size while reading pp_dpm_pcie

    pop_back() caused a segmentation fault when the pp_dpm_pcie file was empty and reading it returned only whitespace.
    
    Signed-off-by: Divya Shikre <[email protected]>
    Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
    (cherry picked from commit ec71380)
    dishikre committed Jan 28, 2022 (66e101a)
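The size check described in the Jan 28 commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `SplitDpmText` is a hypothetical helper, and it assumes the file contents are split into lines before a trailing whitespace-only entry is dropped with `pop_back()`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split sysfs text (e.g. the contents of pp_dpm_pcie)
// into lines, then drop a trailing whitespace-only line.
std::vector<std::string> SplitDpmText(const std::string& text) {
  std::vector<std::string> lines;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    lines.push_back(line);
  }
  // Guard the pop_back(): if the file was empty, `lines` is empty, and
  // calling back() or pop_back() on an empty vector is undefined behavior
  // (in practice often a segfault).
  if (!lines.empty() &&
      lines.back().find_first_not_of(" \t\r\n") == std::string::npos) {
    lines.pop_back();
  }
  return lines;
}
```

The check is `!lines.empty()` before touching `back()`, because `std::vector::pop_back` has no precondition check of its own.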

Commits on Feb 15, 2022

  1. ROCm SMI CLI: fix showevents from multiple GPUs

    The SMI library function rsmi_event_notification_get reads events from
    all GPUs, and each returned event carries its device index, dv_idx.
    Currently the CLI creates a reader thread for each GPU. This is
    unnecessary, because every thread reads the same events, and each
    thread displays events from other GPUs with an incorrect GPU index.
    
    Create one reader thread for all GPUs, and display each event with the
    correct GPU index taken from data.dv_idx.
    
    Signed-off-by: Philip Yang <[email protected]>
    PhilipYangA committed Feb 15, 2022 (e96385d)
  2. ROCm SMI LIB: add HMM migration and recoverable page fault events

    Update kfd_ioctl.h from KFD to add the HMM migration, recoverable page
    fault, and queue eviction and restore events, along with the event
    trigger defines.
    
    Update rocm_smi.h to add the new SMI notification event and trigger
    defines, using the same enum values as kfd_ioctl.h so that the SMI
    library does not need to translate values.
    
    Change the fscanf format from %63s to %MAX_EVENT_NOTIFICATION_MSG_SIZE[^\n]
    so that the entire line is read as one message.
    
    Signed-off-by: Philip Yang <[email protected]>
    PhilipYangA committed Feb 15, 2022 (6020b7d)
  3. ROCm SMI CLI: support HMM migration events

    Use SMI_EVENT_ALL_PROCESS to receive events from all processes.
    HMM migration events are per-process events, and KFD requires this
    flag plus superuser permission to receive events from other processes,
    so showevents calls relaunchAsSudo if any of its arguments is in the
    new event list.
    
    Users can specify an event name in a short form; for example,
    "--showevents migrate" shows the MIGRATE_START and MIGRATE_END events.
    
    Define the message size using the macro from rocm_smi.h.
    
    Signed-off-by: Philip Yang <[email protected]>
    PhilipYangA committed Feb 15, 2022 (2e7eaf4)
  4. Unit test for HMM migration events

    Add the new event name defines, and set the event mask
    RSMI_EVT_NOTIF_ALL_PROCESS to receive events from all processes.
    
    Add a protection check in case KFD returns a new event type, to avoid
    a segmentation fault from an out-of-range access.
    
    Signed-off-by: Philip Yang <[email protected]>
    PhilipYangA committed Feb 15, 2022 (0128466)
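For "ROCm SMI CLI: fix showevents from multiple GPUs", the key point is that the GPU index shown comes from each event, not from the thread that read it. The sketch below illustrates only that labeling step. `EventData` and `FormatEvents` are hypothetical, the field name `dv_idx` follows the commit message rather than a verified header, and the `GPU[n]:` output format is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one notification returned by the library; the
// field name dv_idx follows the commit message, not a verified header.
struct EventData {
  uint32_t dv_idx;
  std::string message;
};

// A single reader handles the batch for all GPUs. Each label comes from the
// event's own dv_idx rather than from a per-thread "current GPU" variable,
// which is what produced wrong indices with one thread per GPU.
std::vector<std::string> FormatEvents(const std::vector<EventData>& batch) {
  std::vector<std::string> lines;
  for (const EventData& e : batch) {
    lines.push_back("GPU[" + std::to_string(e.dv_idx) + "]: " + e.message);
  }
  return lines;
}
```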
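"ROCm SMI LIB: add HMM migration and recoverable page fault events" gives the rocm_smi.h defines the same values as kfd_ioctl.h so that no translation table is needed. A minimal sketch of that idea follows. The enum names and values are placeholders, not the real KFD or SMI constants, and the `static_assert` guard is an optional addition for illustration, not something the commit is known to include.

```cpp
#include <cstdint>

// Hypothetical kernel-side event IDs (stand-ins for kfd_ioctl.h values).
enum KfdEvent : uint32_t { KFD_EVT_A = 1, KFD_EVT_B = 2 };

// Hypothetical library-side IDs, given the same values on purpose.
enum SmiEvent : uint32_t { SMI_EVT_A = 1, SMI_EVT_B = 2 };

// Fail the build if the two headers ever drift apart.
static_assert(static_cast<uint32_t>(KFD_EVT_A) ==
                  static_cast<uint32_t>(SMI_EVT_A),
              "SMI and KFD event IDs must match");
static_assert(static_cast<uint32_t>(KFD_EVT_B) ==
                  static_cast<uint32_t>(SMI_EVT_B),
              "SMI and KFD event IDs must match");

// Because the values agree, a raw ID read from KFD converts with a cast;
// no lookup table is needed.
inline SmiEvent ToSmiEvent(uint32_t kfd_id) {
  return static_cast<SmiEvent>(kfd_id);
}
```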
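The same commit's fscanf change matters because `%s` stops at the first whitespace, while a `[^\n]` scanset reads up to the newline. A macro cannot appear inside a format string literally, so one common approach is to stringify it, as sketched below. This sketch uses `sscanf` on a string for easy testing. `EXAMPLE_MSG_WIDTH` is a hypothetical stand-in for `MAX_EVENT_NOTIFICATION_MSG_SIZE`, and `ReadMessage` and `ReadFirstWord` are illustrative helpers.

```cpp
#include <cstdio>
#include <string>

#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

// Hypothetical field width; the real code uses MAX_EVENT_NOTIFICATION_MSG_SIZE.
#define EXAMPLE_MSG_WIDTH 63

// Reads the whole line (up to the width) as one message: "%63[^\n]".
std::string ReadMessage(const char* line) {
  char buf[EXAMPLE_MSG_WIDTH + 1] = {0};  // +1 for the terminating NUL
  if (std::sscanf(line, "%" STRINGIFY(EXAMPLE_MSG_WIDTH) "[^\n]", buf) != 1) {
    return "";
  }
  return buf;
}

// The old behavior for comparison: "%63s" stops at the first space.
std::string ReadFirstWord(const char* line) {
  char buf[EXAMPLE_MSG_WIDTH + 1] = {0};
  if (std::sscanf(line, "%" STRINGIFY(EXAMPLE_MSG_WIDTH) "s", buf) != 1) {
    return "";
  }
  return buf;
}
```

The field width excludes the NUL terminator, so the destination buffer needs one byte more than the width.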
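"ROCm SMI CLI: support HMM migration events" combines two steps: expanding a short event name such as `migrate` into full names, and relaunching with superuser rights when a per-process event is requested. The sketch below models both in C++, although the actual CLI is separate code. The event lists are illustrative subsets, and the case-insensitive prefix match is an assumed rule, not necessarily the one the CLI uses.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Illustrative subset of event names; the real CLI keeps its own list.
const std::vector<std::string> kAllEvents = {"VM_FAULT", "MIGRATE_START",
                                             "MIGRATE_END"};
// Hypothetical subset of the events that need all-process delivery.
const std::vector<std::string> kPerProcessEvents = {"MIGRATE_START",
                                                    "MIGRATE_END"};

// Assumed rule: a short name selects every event whose name starts with it,
// ignoring case, so "migrate" -> MIGRATE_START, MIGRATE_END.
std::vector<std::string> ExpandEventName(std::string arg) {
  std::transform(arg.begin(), arg.end(), arg.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  std::vector<std::string> matches;
  for (const std::string& name : kAllEvents) {
    if (name.rfind(arg, 0) == 0) matches.push_back(name);  // prefix match
  }
  return matches;
}

// Relaunch with elevated privileges only when a per-process event was
// requested and the caller is not already root (effective UID 0).
bool ShouldRelaunchAsRoot(const std::vector<std::string>& requested,
                          unsigned effective_uid) {
  if (effective_uid == 0) return false;
  for (const std::string& name : requested) {
    if (std::find(kPerProcessEvents.begin(), kPerProcessEvents.end(), name) !=
        kPerProcessEvents.end()) {
      return true;
    }
  }
  return false;
}
```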
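The protection check in "Unit test for HMM migration events" guards against KFD reporting an event type that a name table does not yet cover. A minimal sketch of such a bounds check follows. The table contents and the `EventName` helper are hypothetical and are not taken from the test itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical name table indexed by event type; the real test defines its own.
const char* const kEventNames[] = {"EVENT_0", "EVENT_1", "EVENT_2"};
constexpr std::size_t kNumEventNames =
    sizeof(kEventNames) / sizeof(kEventNames[0]);

// KFD may report an event type added after this table was written. Checking
// the index first turns that case into a readable label instead of an
// out-of-range read.
std::string EventName(uint32_t type) {
  if (type >= kNumEventNames) {
    return "UNKNOWN_EVENT(" + std::to_string(type) + ")";
  }
  return kEventNames[type];
}
```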