Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMGX crashed due to out of memory[Issue] #306

Open
haochenscut opened this issue May 22, 2024 · 1 comment
Open

AMGX crashed due to out of memory[Issue] #306

haochenscut opened this issue May 22, 2024 · 1 comment
Labels

Comments

@haochenscut
Copy link

Hello!

One of my AMGX cases crashed due to an error: out of memory.

Backtrace

See details below:

Caught amgx exception: Cuda failure: 'out of memory'
 at: /home/test/software/amgx/src/solvers/dense_lu_solver.cu:632
Stack trace:
 /home/test/software/cu_amgx/lib/libamgxsh.so : void amgx::dense_lu_solver::DenseLUSolver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::allocMem<float, int>(float*&, int, bool)+0x333
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::dense_lu_solver::DenseLUSolver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::solver_setup(bool)+0x361
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&, bool)+0x108
 /home/test/software/cu_amgx/lib/libamgxsh.so : void amgx::AMG_Setup<(AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2>::setup<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2>, (AMGX_MemorySpace)1, (AMGX_MemorySpace)0>(amgx::AMG<(AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2>*, amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&)+0x114
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::AMG<(AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2>::setup(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&)+0xeb
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::AlgebraicMultigrid_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::solver_setup(bool)+0x67
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&, bool)+0x108
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::FGMRES_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::solver_setup(bool)+0x2e6
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&, bool)+0x108
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::setup_no_throw(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&, bool)+0x80
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::AMG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >::setup(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)1, (AMGX_MatPrecision)1, (AMGX_IndPrecision)2> >&)+0x53
 /home/test/software/cu_amgx/lib/libamgxsh.so : amgx::AMGX_ERROR amgx::(anonymous namespace)::set_solver_with_shared<(AMGX_Mode)8465, amgx::AMG_Solver, amgx::Matrix>(AMGX_solver_handle_struct*, AMGX_matrix_handle_struct*, amgx::Resources*, amgx::AMGX_ERROR (amgx::AMG_Solver<amgx::TemplateMode<(AMGX_Mode)8465>::Type>::*)(std::shared_ptr<amgx::Matrix<amgx::TemplateMode<(AMGX_Mode)8465>::Type> >))+0x3eb
 /home/test/software/cu_amgx/lib/libamgxsh.so : AMGX_solver_setup()+0x282
 /home/test/workspace/mylib/bin/../lib/libmysolverlib.so : MySolver::solve()+0x8e4
 mysolverbin : main()+0x34a6
 /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0xf3
 mysolverbin : ()+0x27b4e

The program only involves an Ax=b solution, where A is a CSR square (float type) matrix with:

nnz = 373760

rowx = rowy = 51200

I have read your paper in SISC and realized the maximum nnz of the matrix can be far larger than mine.

Additionally, your GPU (K40) has identical memory capacity to mine (RTX 4070 Ti).

Config

Would you mind giving some advice on whether I need to set the memory pool size manually?

My configuration string only includes the solver:

    const char* config_string = R"(
        {
            "config_version": 2, 
            "solver": {
                "preconditioner": {
                    "error_scaling": 0, 
                    "print_grid_stats": 0, 
                    "max_uncolored_percentage": 0.05, 
                    "algorithm": "AGGREGATION", 
                    "solver": "AMG", 
                    "smoother": "MULTICOLOR_DILU", 
                    "presweeps": 0, 
                    "selector": "SIZE_2", 
                    "coarse_solver": "DENSE_LU_SOLVER", 
                    "max_iters": 1, 
                    "postsweeps": 2, 
                    "min_coarse_rows": 2, 
                    "relaxation_factor": 0.75, 
                    "scope": "amg", 
                    "max_levels": 1, 
                    "matrix_coloring_scheme": "PARALLEL_GREEDY", 
                    "cycle": "V"
                }, 
                "use_scalar_norm": 1, 
                "solver": "FGMRES", 
                "print_solve_stats": 0, 
                "obtain_timings": 0, 
                "max_iters": 10, 
                "monitor_residual": 1, 
                "gmres_n_restart": 10, 
                "convergence": "RELATIVE_INI_CORE", 
                "scope": "main", 
                "tolerance": 1e-10, 
                "norm": "L2"
            }
        }
    )";

Platform

NO_MPI, NO_OPENMP

Ubuntu 20.04 GCC 9.4.0 CUDA 12.2 CARD: RTX 4070 Ti (sm_89)

AMGX: the latest commit

@mattmartineau
Copy link
Collaborator

Apologies for the slow reply here.

If you are still encountering this, please share the full output.

AFAICT you are hitting the issue because you are setting max_levels to 1. The likely consequence is that the dense solver is immediately invoked and so we will convert the input sparse matrix to a dense matrix, which will be huge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants