Testing merge #17

MarcelKoch · 2024-12-09T10:12:18Z

No description provided.

…ead_distributed

Additive read distributed

This PR adds an option to communicate overlap in the distributed matrix's `read_distributed`. If the option is used, nonzero entries present on multiple ranks are added up on the owning rank rather than discarded in `read_distributed`. This functionality is also available through the free function `assemble_rows_from_neighbors`, which only needs `device_matrix_data`. This is useful, for example, in a domain-decomposed finite element setting where each rank assembles its local contribution to a global system matrix, so that the information on the subdomain boundaries has to be exchanged when assembling the global system matrix. Related PR: ginkgo-project#1650
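The additive behavior described above can be illustrated with a small host-only sketch. It does not use the Ginkgo API: the `entry` type, the block-wise `owner_of` rule, and the `assemble_additive` function are hypothetical names introduced only for this illustration, and the "communication" is modeled as a plain loop over all ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One nonzero entry in global coordinates (hypothetical type for this sketch).
struct entry {
    int row;
    int col;
    double value;
};

// Hypothetical ownership rule: contiguous blocks of `rows_per_rank` rows.
inline std::size_t owner_of(int row, int rows_per_rank)
{
    return static_cast<std::size_t>(row / rows_per_rank);
}

// contributions[r] holds the entries that rank r assembled locally.
// Returns, for every rank, the summed entries of the rows it owns.
inline std::vector<std::map<std::pair<int, int>, double>> assemble_additive(
    const std::vector<std::vector<entry>>& contributions, int rows_per_rank)
{
    std::vector<std::map<std::pair<int, int>, double>> owned(
        contributions.size());
    for (const auto& local : contributions) {
        for (const auto& e : local) {
            const auto owner = owner_of(e.row, rows_per_rank);
            assert(owner < owned.size());
            // In a distributed run this would be a message to the owner;
            // duplicates of the same (row, col) are added, not dropped.
            owned[owner][{e.row, e.col}] += e.value;
        }
    }
    return owned;
}
```

With two ranks that both contribute to a shared boundary row, the owning rank ends up with the sum of both contributions instead of only its own value.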

yhmtsai and others added 30 commits August 20, 2024 10:12

use tbb from onemkl, and add the path after installing

b7aa7d3

Merge fix not-found tbb in github action

3bf0ce7

This PR adds a workaround for TBB on GitHub Actions because the environment does not install oneTBB. Related PR: ginkgo-project#1666

e uses next level precision, but the coarsest solver uses the last level precision. Thus, we cannot cast e to current level precision unless it is the last level

bee3291

Merge Fix the casting issue in mixed multigrid

9e1c73c

This PR fixes the casting issue in mixed multigrid. Related PR: ginkgo-project#1663

unify cuda/hip batch_mvec

c7336ca

[cuda,hip] update namespaces and includes

69fbc2c

[ref, omp] move kernels to headers

af240e0

[kernels] remove GKO_DEVICE_NAMESPACE

56f38c4

[dpcpp] move to proper headers

fa7c43f

[format] Format files

a4b4e22

Co-authored-by: Pratik Nayak

Merge (ginkgo-project#1651): Unify batch functionality: Multivector

83a577c

Unify and simplify batch functionality: Multivector. Related PR: ginkgo-project#1651

[cuda, hip] unify csr, dense and ell kernels

c0e434c

[ref, omp] unify csr, dense, ell kernels

88f330a

+ also fix kernel names: remove _kernel suffix

[dpcpp] unify dpcpp kernels

58b184b

[hip, cuda] remove unnecessary .hip.cpp/.cu files

dc8c904

fixup! [dpcpp] unify dpcpp kernels

1714994

[cuda, hip] unify batch_struct headers

daa1087

[cuda, hip] rem anon namespace, type defs

e567dd2

[ref] set device namespace with CMake

114bf3e

[unified] rem device_namespace defines in source

404de48

Merge (ginkgo-project#1669): Unify batch functionality: Matrix formats

70669c6

Unify and simplify batch functionality: Matrix formats (csr, dense, ell). Related PR: ginkgo-project#1669

add schwarz config whose global index from file

bb7385c

only set the global index via type descriptor

82e8d62

add schwarz config test

ca1b236

update documentation and use macro

9335476

Co-authored-by: Marcel Koch
Co-authored-by: Pratik Nayak

Merge Add Schwarz config

9929854

This PR adds the file config for the Schwarz preconditioner. Moreover, it now adds the global_index to type_descriptor. Related PR: ginkgo-project#1658

remove assertion workaround

eb97b49

This causes some kernels on ROCm debug builds to fail

fix ROCm 6.x segfaults on MI50

0d66f5e

There is some weird interaction between inlining of shfl_xor and the (otherwise unused) members of thread_block_tile. The easiest way of working around it is to inline them explicitly as __shfl_xor(_sync).
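For readers unfamiliar with XOR shuffles, the following host-side model shows what a butterfly reduction built from them computes. It is only an illustration of the access pattern, not the kernel code touched by this commit: `shuffle_xor` and `butterfly_sum` are hypothetical helper names, and a `std::vector` stands in for the lanes of a subwarp, whereas the device code uses the `__shfl_xor` / `__shfl_xor_sync` intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of one XOR shuffle step: lane i reads the value held by
// lane (i ^ mask). All lanes read the old values before anything is written.
inline std::vector<int> shuffle_xor(const std::vector<int>& lanes,
                                    std::size_t mask)
{
    std::vector<int> result(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        assert((i ^ mask) < lanes.size());
        result[i] = lanes[i ^ mask];
    }
    return result;
}

// Butterfly all-reduce: after log2(n) exchange steps, every lane holds the
// sum over all lanes. The lane count must be a power of two.
inline std::vector<int> butterfly_sum(std::vector<int> lanes)
{
    const std::size_t n = lanes.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t mask = n / 2; mask > 0; mask /= 2) {
        const auto partner = shuffle_xor(lanes, mask);
        for (std::size_t i = 0; i < n; ++i) {
            lanes[i] += partner[i];
        }
    }
    return lanes;
}
```

Because the partner index is computed with XOR against a mask smaller than the lane count, it always stays inside the tile, which is the property the shuffle bounds in a real kernel have to respect.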

more precise shuffle bounds

acb4ccc

Merge fix for ROCm 6.1 issues

c09529f

This fixes some issues with assertions and the bitonic sorting kernels on ROCm 6.x. Related PR: ginkgo-project#1670

Fritz Goebel and others added 29 commits November 28, 2024 13:51

Fix circular dependency with array.fill

7d04306

Address Review comments

fd6c6c4

Move additive read distributed to free function

b7026a2

Add documentation for the assemble function

6ffc9b3

Address review comments

e7feebd

Move fill_send_buffers to unified kernels

8c75a11

Address review comments

248688c

Address review comments

9f01c53

Co-authored-by: Yu-Hsiang M. Tsai
Co-authored-by: Marcel Koch

Fix multiple definitions in dpcpp

1bb85b9

Move assembly_helpers to assembly

d1a56f3

add failed test when given symbolic without fillin.

7c1c8f4

- bitmap: leads to a wrong answer or a segfault
- hashmap: infinite loop
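The hashmap failure mode can be reproduced with a generic open-addressing table. The sketch below is an assumption about the general mechanism, not Ginkgo's lookup code: `insert_key`, `lookup_unsafe`, and `lookup_checked` are hypothetical names. An unchecked probe loop is only safe if the key is guaranteed to be stored, which no longer holds when the symbolic factorization lacks the fill-in entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int empty_slot = -1;  // sentinel marking an unused slot

// Insert `key` with linear probing. Assumes a free slot exists or the key
// is already stored.
inline void insert_key(std::vector<int>& table, int key)
{
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    std::size_t pos = static_cast<std::size_t>(key) % n;
    while (table[pos] != empty_slot && table[pos] != key) {
        pos = (pos + 1) % n;
    }
    table[pos] = key;
}

// Unsafe lookup: only valid if `key` is known to be stored. On a full table
// that does not contain `key`, this loop never terminates.
inline std::size_t lookup_unsafe(const std::vector<int>& table, int key)
{
    const std::size_t n = table.size();
    std::size_t pos = static_cast<std::size_t>(key) % n;
    while (table[pos] != key) {
        pos = (pos + 1) % n;
    }
    return pos;
}

// Checked lookup: stops at an empty slot or after probing every slot once,
// and reports a miss by returning table.size().
inline std::size_t lookup_checked(const std::vector<int>& table, int key)
{
    const std::size_t n = table.size();
    std::size_t pos = static_cast<std::size_t>(key) % n;
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (table[pos] == key) return pos;
        if (table[pos] == empty_slot) break;
        pos = (pos + 1) % n;
    }
    return n;
}
```

The checked variant costs one extra comparison per probe, so keeping the unsafe version for cases where the key is known to be present is a reasonable trade-off.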

fix infinite loop of lookup_hash_unsafe and add test in reference

68cb9db

add checked_lookup into LU

df8eab4

add ilu syncfree through lu implementation

8179c0b

update the documentation, change checked_lookup -> has_full_fillin (opposite behavior).

e7330b1

Co-authored-by: Tobias Ribizel
Co-authored-by: Natalie Beams

use unpack directly from factorization and add unpack with strategy

8d4d85c

remove the duplicated initialization

cbde108

Revert "use unpack directly from factorization and add unpack with strategy" to avoid new strategy input

d2ee3a3

update documentation

38bc404

cholesky failed test

c258472

move the algorithm enum to another header

8951e72

cholesky with safe lookup, wrap it into Ic, and add some missing tests

7df9746

copy the call from lu/cholesky to ilu/ic and delete full_fillin

29c7ce2

Co-authored-by: Tobias Ribizel

refine the wording and fix wrong bool value

da98fc2

Co-authored-by: Natalie Beams

update to incomplete_factorization and throw with omp using sparselib

c28c9a2

fix Ref also considered as Omp executor

cd31504

Co-authored-by: Marcel Koch

use if constexpr when it is possible and reverse if-else

f1951bb

Co-authored-by: Tobias Ribizel

Merge ginkgo-project#1684 Add ginkgo own ILU and IC

e848148

This PR adds Ginkgo's own implementation of ILU and IC. It is controlled by `incomplete_algorithm::sync_free`. Related PR: ginkgo-project#1684
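As background on what an incomplete LU factorization computes, here is a sequential textbook ILU(0) on a CSR matrix. It is a minimal sketch and not the sync-free implementation added by this PR: the `csr` struct and the functions `find_entry` and `ilu0_in_place` are hypothetical names, and the sketch assumes sorted column indices and a nonzero diagonal in every row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR storage (hypothetical type for this sketch).
struct csr {
    std::size_t n;
    std::vector<std::size_t> row_ptrs;
    std::vector<std::size_t> col_idxs;  // sorted within each row
    std::vector<double> values;
};

// Returns the position of (row, col), or row_ptrs[row + 1] if it is absent.
inline std::size_t find_entry(const csr& m, std::size_t row, std::size_t col)
{
    for (std::size_t p = m.row_ptrs[row]; p < m.row_ptrs[row + 1]; ++p) {
        if (m.col_idxs[p] == col) return p;
    }
    return m.row_ptrs[row + 1];
}

// Overwrites `m` with its ILU(0) factors: the strictly lower part holds L
// (unit diagonal implied), the upper part including the diagonal holds U.
// Updates outside the sparsity pattern of `m` are dropped.
inline void ilu0_in_place(csr& m)
{
    for (std::size_t i = 1; i < m.n; ++i) {
        const auto row_end = m.row_ptrs[i + 1];
        for (auto p = m.row_ptrs[i]; p < row_end && m.col_idxs[p] < i; ++p) {
            const auto k = m.col_idxs[p];
            const auto diag = find_entry(m, k, k);
            assert(diag < m.row_ptrs[k + 1] && m.values[diag] != 0.0);
            m.values[p] /= m.values[diag];  // l_ik = a_ik / u_kk
            for (auto q = p + 1; q < row_end; ++q) {
                const auto kj = find_entry(m, k, m.col_idxs[q]);
                if (kj < m.row_ptrs[k + 1]) {
                    m.values[q] -= m.values[p] * m.values[kj];
                }
            }
        }
    }
}
```

For a tridiagonal matrix the exact LU factors have no fill-in, so ILU(0) coincides with the full LU factorization, which makes such a matrix a convenient check.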

MarcelKoch merged commit 97c825c into main Dec 9, 2024
5 of 21 checks passed
