Bugs about reading symmetric mtx data and processing data #5

Open
YangWang92 opened this issue Dec 22, 2019 · 8 comments
YangWang92 commented Dec 22, 2019

Hi all, I found a bug in reading symmetric mtx data.
For example, I ran

./gspmm --debug=true --max_ncols=4 ./4_4coo_dense.mtx

to read a 4x4 dense matrix from 4_4coo_dense.mtx.

When the file is stored in symmetric format, the loaded matrix is broken: the diagonal entries other than 1 1 are missing.

Wrong results

%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 13
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:2 [6]:3 [7]:0 [8]:1 [9]:3 [10]:0 [11]:1 [12]:2 [13]:0 [14]:4113 [15]:0 [16]:0 [17]:0 [18]:0 [19]:0 [20]:0 [21]:0 [22]:0 [23]:0 [24]:0 [25]:0 [26]:0 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:0 [33]:0 [34]:0 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:7 [3]:10 [4]:13 [5]:14 [6]:81 [7]:0 [8]:0 [9]:0 [10]:0 [11]:0 [12]:1 [13]:1 [14]:1 [15]:2 [16]:2 [17]:2 [18]:3 [19]:3 [20]:3 [21]:35143 [22]:3 [23]:3 [24]:35143 [25]:-1456 [26]:81 [27]:0 [28]:0 [29]:1 [30]:2 [31]:3 [32]:0 [33]:2 [34]:3 [35]:0 [36]:1 [37]:3 [38]:0 [39]:1
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:9.10844e-44 [15]:0 [16]:1.51901e-38 [17]:0 [18]:1.49695e-38 [19]:0 [20]:2.69808e-38 [21]:0 [22]:0 [23]:1.44118e+17 [24]:1.05553e+14 [25]:4.58715e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:1.23145e+14 [33]:4.58715e-41 [34]:2.93874e-38 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
pretty print:
x x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.036416 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
x x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 13.

Correct results

%%MatrixMarket matrix coordinate real general
%
4 4 16
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 1
2 3 1
2 4 1
3 1 1
3 2 1
3 3 1
3 4 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:35143 [17]:96 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:81 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:0 [9]:0 [10]:4482352 [11]:0 [12]:-1672595478 [13]:7953 [14]:33 [15]:0 [16]:42126000 [17]:0 [18]:42126096 [19]:0 [20]:32 [21]:0 [22]:49 [23]:0 [24]:0 [25]:0 [26]:34833056 [27]:0 [28]:42189248 [29]:0 [30]:1638970554 [31]:1868983913 [32]:203358240 [33]:7967 [34]:49 [35]:0 [36]:0 [37]:0 [38]:41772304 [39]:0
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:1.13505e-43 [19]:0 [20]:0 [21]:0 [22]:1 [23]:1 [24]:1 [25]:1 [26]:1 [27]:1 [28]:1 [29]:1 [30]:1 [31]:1 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2.95797e+17 [37]:4.58631e-41 [38]:2.70451e-43 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.033792 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.

Contributor

ctcyang commented Dec 24, 2019

Hi @wyatuestc, I tried the latest commit on master (e37dea0) and was unable to reproduce the error you got. Here's what I got:

ctcyang@mario:~/merge-spmm/build$ bin/gspmm --debug=true --max_ncols=4 ../4_4coo_dense.mtx
ta:    32
tb:    32
nt:    128
row:   1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:861104459 [17]:1414221919 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:273 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:40336064 [9]:0 [10]:40336496 [11]:0 [12]:0 [13]:0 [14]:49 [15]:0 [16]:40312448 [17]:0 [18]:1685221231 [19]:1952542313 [20]:1701978213 [21]:1730178145 [22]:1919250021 [23]:27745 [24]:1162162274 [25]:1768519237 [26]:81 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2 [37]:2 [38]:2 [39]:2
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:3.82554e-43 [19]:0 [20]:1.70078e-37 [21]:0 [22]:1.70092e-37 [23]:0 [24]:8.96831e-44 [25]:0 [26]:1.70057e-37 [27]:0 [28]:1.56318e-37 [29]:0 [30]:-7.03773e+13 [31]:4.56599e-41 [32]:-7.03773e+13 [33]:4.56599e-41 [34]:1.43493e-42 [35]:0 [36]:7.60905e-43 [37]:0 [38]:1.56322e-37 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.044384 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.

Could you double-check that you are on the latest commit of the master branch?

Author

YangWang92 commented Dec 24, 2019

Hi Carel,
Thanks for the reply.
I mean that it cannot read "symmetric" mtx data; "general" data is read correctly.
You can try reading this file:

%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1

Thanks!
Yang

Contributor

ctcyang commented Dec 24, 2019

Hi @wyatuestc, thanks, I have confirmed that this is indeed a bug. It should be fixed in the new commit c147935. As pointed out in this issue, the code is meant to filter out self-loops (i.e. elements on the diagonal). However, a bug caused it to skip the filtering when the diagonal nonzero was the 1 1 element (i.e. the first element). Now the code behaves as intended:

ctcyang@mario:~/merge-spmm/build$ bin/gspmm --debug=true --max_ncols=4 ../4sym_coo_dense.mtx
ta:    32
tb:    32
nt:    128
row:   1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 12
csrColInd:
[0]:1 [1]:2 [2]:3 [3]:0 [4]:2 [5]:3 [6]:0 [7]:1 [8]:3 [9]:0 [10]:1 [11]:2 [12]:3 [13]:1065353216 [14]:65 [15]:0 [16]:1065353216 [17]:1065353216 [18]:1065353216 [19]:1065353216 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:7 [29]:1065353216 [30]:65 [31]:0 [32]:17620512 [33]:0 [34]:17483824 [35]:0 [36]:26246064 [37]:0 [38]:0 [39]:1543504138
csrRowPtr:
[0]:0 [1]:3 [2]:6 [3]:9 [4]:12 [5]:0 [6]:33 [7]:0 [8]:26245536 [9]:0 [10]:26245968 [11]:0 [12]:0 [13]:0 [14]:49 [15]:0 [16]:26222064 [17]:0 [18]:1685221231 [19]:1952542313 [20]:1701978213 [21]:1931504737 [22]:1701670265 [23]:1667854964 [24]:1162162176 [25]:1768519237 [26]:81 [27]:0 [28]:0 [29]:0 [30]:0 [31]:1 [32]:1 [33]:1 [34]:2 [35]:2 [36]:2 [37]:3 [38]:3 [39]:3
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:9.80909e-45 [13]:1 [14]:9.10844e-44 [15]:0 [16]:2.58733e-38 [17]:0 [18]:2.54902e-38 [19]:0 [20]:5.30747e-38 [21]:0 [22]:0 [23]:1.4412e+17 [24]:7.03687e+13 [25]:4.56683e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:4.37148e-38 [33]:0 [34]:0 [35]:0 [36]:7.03687e+13 [37]:4.56683e-41 [38]:2.93874e-39 [39]:0
pretty print:
0 x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.054528 ms
denseVal:
[0]:0 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
0 x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 12.
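
The counts in these runs follow from how a symmetric MatrixMarket file is expanded: the file stores only the lower triangle, so every off-diagonal entry is mirrored and every diagonal entry is either kept once or dropped. The sketch below is an illustration of that idea only, not the repository's reader; the Entry struct and the expand_symmetric function are hypothetical names, and only remove_self_loops is taken from the thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical coordinate entry with 0-based indices.
struct Entry {
  int row;
  int col;
  float val;
};

// Expand the stored lower triangle of a symmetric matrix into a full
// coordinate list. Off-diagonal entries are mirrored across the diagonal.
// Diagonal entries (self-loops) are kept once, or dropped entirely when
// remove_self_loops is true, regardless of where they appear in the input.
std::vector<Entry> expand_symmetric(const std::vector<Entry>& stored,
                                    bool remove_self_loops) {
  std::vector<Entry> full;
  full.reserve(2 * stored.size());
  for (const Entry& e : stored) {
    assert(e.row >= e.col);  // symmetric files list the lower triangle only
    if (e.row == e.col) {
      if (!remove_self_loops) full.push_back(e);
      continue;
    }
    full.push_back(e);
    full.push_back(Entry{e.col, e.row, e.val});  // mirrored entry
  }
  return full;
}
```

For the 10-entry file above there are 4 diagonal and 6 off-diagonal entries, so the expansion holds 12 entries with self-loops removed and 16 with them kept. The 13 reported in the original bug is the 12 plus the one diagonal entry that slipped through.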

As pointed out in the issue, if you don't want it to filter out diagonal elements, you will have to change the default value of the following parameter to false:

bool remove_self_loops=true ) {

Then the result will be the same as the general case:

ctcyang@mario:~/merge-spmm/build$ bin/gspmm --debug=true --max_ncols=4 ../4sym_coo_dense.mtx
ta:    32
tb:    32
nt:    128
row:   1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:861104459 [17]:1414221919 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:273 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:27720240 [9]:0 [10]:27720672 [11]:0 [12]:0 [13]:0 [14]:49 [15]:0 [16]:27696624 [17]:0 [18]:1685221231 [19]:1952542313 [20]:1701978213 [21]:1931504737 [22]:1701670265 [23]:1667854964 [24]:1162162176 [25]:1768519237 [26]:81 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2 [37]:2 [38]:2 [39]:2
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:3.82554e-43 [19]:0 [20]:6.13444e-38 [21]:0 [22]:6.13518e-38 [23]:0 [24]:8.96831e-44 [25]:0 [26]:6.13342e-38 [27]:0 [28]:5.44648e-38 [29]:0 [30]:4.00049 [31]:4.56151e-41 [32]:4.00049 [33]:4.56151e-41 [34]:1.43493e-42 [35]:0 [36]:7.60905e-43 [37]:0 [38]:5.44664e-38 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.046368 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.

@YangWang92
Author

Thanks!
BTW, is it possible to run this code on newer GPU architectures (Turing/Volta/Pascal)?

Contributor

ctcyang commented Dec 24, 2019

I've tested on Volta and Pascal and it works, but have not tested Turing. It should work for Turing in theory, or at least with minimal modification.

Author

YangWang92 commented Dec 24, 2019

Thanks! I'm running gspmm on an RTX 2080 (Turing).
I found that the correctness of gspmm depends on the shape of the matrix.
For example, it worked well on a 4x4 matrix but crashed on a 4x32 matrix.
I also ran gspmm on some square matrices and found that it fails on some of them (512x512, 1024x1024).
I'm not sure whether this is related to the GPU architecture or to some corner cases in the source code.
Thanks!
Yang

  • 4x4 Correct
  • 4x32 Wrong
  • 16x16 Correct
  • 32x32 Correct
  • 64x64 Correct
  • 128x128 Correct
  • 256x256 Correct
  • 512x512 Wrong
  • 1024x1024 Wrong

command:

./bin/gspmm --debug=true --mode="mergepath" ./[matrix].mtx

data:
data.zip

@YangWang92 YangWang92 changed the title Bug in reading symmetric mtx data. Bugs about reading symmetric mtx data and processing data Dec 25, 2019
ctcyang pushed a commit that referenced this issue May 5, 2020
Contributor

ctcyang commented May 5, 2020

Sorry for the slow response. Thank you so much for bringing this to my attention, Yang! I tried your datasets, and the 4 x 32 and 1024 x 1024 results are indeed wrong. I tested on a Tesla V GPU with 12GB memory.

I tracked down the 4 x 32 error to an incorrect assumption in the test file gspmm.cu, where I assumed square matrices. As a result, only the first A.nrows rows of the dense B matrix were initialized correctly. This should be fixed in commit 100ddca. Please see the diff here: 100ddca
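
To make the shape requirement concrete: in C = A * B, the dense matrix B needs one row per column of A, so for a 4 x 32 input it needs 32 rows, not 4. The sketch below is not the code from commit 100ddca. It is a minimal illustration in which make_dense_b, a_ncols and b_ncols are hypothetical names, and B is assumed to be held in one flat array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocate and fill the dense right-hand side B for
// C = A * B, where A is a_nrows x a_ncols and B is a_ncols x b_ncols.
// The row count of B comes from the number of *columns* of A; using A's row
// count instead is only correct when A happens to be square.
std::vector<float> make_dense_b(int a_ncols, int b_ncols, float fill) {
  assert(a_ncols > 0 && b_ncols > 0);
  const std::size_t size =
      static_cast<std::size_t>(a_ncols) * static_cast<std::size_t>(b_ncols);
  return std::vector<float>(size, fill);
}
```

With a 4 x 32 matrix A and 4 columns in B, this fills 32 * 4 = 128 elements, whereas a square assumption would fill only 4 * 4 = 16 and leave the rest undefined.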

Need some more time to investigate 1024 x 1024 error.

Contributor

ctcyang commented Jul 31, 2020

Hi @YangWang92, thanks for pointing out the error. If you use the command: bin/gspmm --debug 1 --mode="mergepath" --nt=512 --iter=10 dataset/data/1024_1024coo_dense.mtx, it gives the correct solution.

Still investigating why this value for nt is the magic number. I suspect it has to do with how the number of blocks is calculated.
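
The thread does not identify the actual cause, so the following is not a diagnosis. It only illustrates the kind of calculation being referred to: when work is split into blocks of nt threads, the block count is normally a ceiling division, and a truncating division silently drops the last partial block. The function names are hypothetical and do not come from the repository.

```cpp
#include <cassert>

// Ceiling division: the smallest number of blocks of nt threads that covers
// every work item, including a final partially filled block.
int num_blocks(int work_items, int nt) {
  assert(work_items >= 0 && nt > 0);
  return (work_items + nt - 1) / nt;
}

// Truncating division, shown only for contrast: it under-counts whenever
// work_items is not a multiple of nt.
int num_blocks_truncating(int work_items, int nt) {
  assert(work_items >= 0 && nt > 0);
  return work_items / nt;
}
```

For 1000 work items and nt = 128 the ceiling version returns 8 and the truncating one returns 7; the two agree only when the work size is an exact multiple of nt.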
