Version

https://hub.docker.com/r/rapidsai/base
version: 24.10

Which installation method(s) does this occur on?

No response

Describe the bug.
Hi.
I want to profile cuGraph with Nsight Systems to check GPU DRAM bandwidth and PCIe bandwidth.
For this, I use the nsys profile --gpu-metrics-device=0 command.
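For reference, the full invocation looks roughly like this (the report name is arbitrary, and bfs.py with its arguments refers to the benchmark script shown further below):

```
nsys profile --gpu-metrics-device=0 -o bfs_report python bfs.py --n_workers 1 --visible_devices 0 --dataset /HUVM/dataset/graph/soc-twitter-2010.csv
```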
I got the profiling result, but it contains an error.
Below is the output of the nsys profile --gpu-metrics-device=0 command.
```
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items { Type: DeviceId Value: "Local (CLI)" }
}
Error {
  Type: RuntimeError
  Props {
    Items {
      Type: ErrorText
      Value: "GPU Metrics [0]: NVPA_STATUS_ERROR
- API function: Nvpw.GPU_PeriodicSampler_DecodeCounters_V2(&params)
- Error code: 1
- Source function: virtual QuadDDaemon::EventSource::PwMetrics::PeriodicSampler::DecodeResult QuadDDaemon::EventSource::{anonymous}::GpuPeriodicSampler::DecodeCounters(uint8_t*, size_t) const
- Source location: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Target/Daemon/EventSource/GpuMetrics.cpp:242"
    }
  }
}
```
The attached image shows that GPU metrics collection suddenly stopped.
This is my BFS benchmark code:

```python
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import cugraph
import cugraph.dask as dask_cugraph
import cugraph.dask.comms.comms as Comms
from cugraph.generators.rmat import rmat
import time
import argparse
import rmm


def main():
    parser = argparse.ArgumentParser()
    description = '''python bfs.py --n_workers 1 --visible_devices 0,1,2,3 --dataset /HUVM/dataset/graph/soc-twitter-2010.csv --loop'''
    parser.add_argument('--n_workers', type=int, required=True, help='number of workers')
    parser.add_argument('--visible_devices', type=str, required=True,
                        help='comma-separated CUDA_VISIBLE_DEVICES (e.g. 0,1,2,3)')
    parser.add_argument('--dataset', type=str, required=True, help='path to graph dataset')
    parser.add_argument('--loop', default=False, action='store_true', help='run one time or in loop')
    args = parser.parse_args()

    # Initialize the CUDA cluster
    cluster = LocalCUDACluster(
        rmm_managed_memory=True,
        rmm_pool_size="50GB",
        CUDA_VISIBLE_DEVICES=args.visible_devices,
        n_workers=args.n_workers
    )
    client = Client(cluster)
    Comms.initialize(p2p=True)  # Initialize multi-GPU communication

    # Set the reader chunk size to automatically get one partition per GPU
    chunksize = dask_cugraph.get_chunksize(args.dataset)

    # Multi-GPU CSV reader
    e_list = dask_cudf.read_csv(
        args.dataset,
        chunksize=chunksize,
        delimiter=' ',
        names=['src', 'dst'],
        dtype=['int32', 'int32']
    )

    # Create a directed graph from the edge list
    G = cugraph.Graph(directed=True)
    G.from_dask_cudf_edgelist(e_list, source='src', destination='dst')

    # Run BFS in a loop or once based on the argument
    if args.loop:
        while True:
            t_start = time.time()
            result = dask_cugraph.bfs(G, start=1)  # Use 'start' argument
            # wait(result)  # Ensure computation finishes
            print("Execution time: ", time.time() - t_start)
    else:
        t_start = time.time()
        result = dask_cugraph.bfs(G, start=1)  # Use 'start' argument
        # wait(result)  # Ensure computation finishes
        print("Execution time: ", time.time() - t_start)

    # Clean up
    Comms.destroy()
    client.close()
    cluster.close()


if __name__ == "__main__":
    main()
```
Is there a known issue when profiling cuGraph applications with GPU performance counters?
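In case a simpler reproducer is useful, a minimal single-GPU version of the same BFS (a rough sketch, not the exact script I profiled; the dataset path is just the example from above) would look like:

```python
import time

import cudf
import cugraph

# Load the edge list on a single GPU (space-delimited src/dst columns,
# same format as the Dask version above).
edges = cudf.read_csv(
    "/HUVM/dataset/graph/soc-twitter-2010.csv",
    sep=" ",
    names=["src", "dst"],
    dtype=["int32", "int32"],
)

# Build a directed graph and run BFS from the same start vertex.
G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source="src", destination="dst")

t_start = time.time()
result = cugraph.bfs(G, start=1)
print("Execution time: ", time.time() - t_start)
```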