
What happened when three different ends, and why. #503

Closed
ha0lyu opened this issue Nov 13, 2024 · 14 comments

@ha0lyu

ha0lyu commented Nov 13, 2024

Related component: cosim with a REF

Describe the question
I built the environment with xs-env and ran emu to test XiangShan, with NEMU as the reference model. Recently, I encountered three different ways for the simulation to end that I had never seen before.

  1. When instr trace & ref regs are printed:
    1.1 "Core 0 dump: critical_error raise" was printed.
    What happened here, and why?
    1.2 "Core 0: ABORT at pc = "
    Usually, when "Core 0: ABORT at pc = " is printed, it is followed by "xxx different at pc =", but this time it was not.
    This is abnormal, isn't it?

  2. “Assertion failed at /home/xs-env/XiangShan/build/rtl/DCache.sv:1430. The simulation stopped. There might be some assertion failed.”:
    What happened and why? Was it triggered by a bug?

What you expect us (DiffTest developers) to do for you
I need someone to help me figure out these questions.

Thanks a lot.

@ha0lyu ha0lyu added the question Further information is requested label Nov 13, 2024
@poemonsense
Member

  1. When instr trace & ref regs are printed:
    1.1 "Core 0 dump: critical_error raise" was printed.
    What happened here, and why?

See #486

1.2 "Core 0: ABORT at pc = "
Usually, when "Core 0: ABORT at pc = " is printed, it is followed by "xxx different at pc =", but this time it was not.
This is abnormal, isn't it?

Please list full messages for us to help you.

  2. “Assertion failed at /home/xs-env/XiangShan/build/rtl/DCache.sv:1430. The simulation stopped. There might be some assertion failed.”:
    What happened and why? Was it triggered by a bug?

Simply as it says: the assertion failed.

@ha0lyu
Author

ha0lyu commented Nov 13, 2024

Here is the tail of a log:

......
[24] commit pc 0000000080000014 inst 00000297 wen 1 dst 05 data 0000000080000014 idx 00e auipc   t0, 0x0
[25] commit pc 0000000080000018 inst 00028293 wen 1 dst 05 data 0000000080000014 idx 00f mv      t0, t0
[26] commit pc 000000008000001c inst 30529073 wen 0 dst 00 data 0000000000000000 idx 010 csrw    mtvec, t0 <--
......
virtualization mode: 0
privilege mode:3
pmp: 16 entries active, details:
......
vtype: 0x8000000000000000 vstart: 0x0000000000000000 vxsat: 0x0000000000000001
vxrm: 0x0000000000000003 vl: 0x0000000000000000 vcsr: 0x0000000000000007
tselect: 0x0000000000000000
privilegeMode: 3
Core 0: ABORT at pc = 0x80000e78
Core-0 instrCnt = 7,025, cycleCnt = 70,921, IPC = 0.099054
Seed=0 Guest cycle spent: 70,924 (this will be different from cycleCnt if emu loads a snapshot)
Host time spent: 126,463ms

I guess there is a mismatch in the mtval register, but the log had no message about it, in particular no "different at pc =" line. When I ran this workload again, the message appeared. The earlier output redirection may have dropped it; it now shows up in the terminal.

Thank you sir.
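
For anyone checking a captured log for the mismatch line: a minimal Python sketch that strips the terminal color codes (the `[31m` / `[0m` sequences visible in the raw output) and then searches for a message. The helper names `strip_ansi` and `find_lines` are made up for this illustration, and the sample text is a shortened copy of the tail above with the escape bytes restored; it is not part of DiffTest.

```python
import re

# Matches ANSI CSI sequences such as ESC[31m (red) and ESC[0m (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Remove terminal color/control sequences from captured output."""
    return ANSI_CSI.sub("", text)


def find_lines(text: str, needle: str) -> list[str]:
    """Return the cleaned log lines that contain `needle`."""
    return [line for line in strip_ansi(text).splitlines() if needle in line]


# Shortened sample based on the log tail in this thread (escape bytes restored).
sample = (
    "privilegeMode: 3\n"
    "Core 0: \x1b[31mABORT at pc = 0x80000e78\n"
    "\x1b[0m\x1b[35mCore-0 instrCnt = 7,025, cycleCnt = 70,921, IPC = 0.099054\n"
    "\x1b[0m"
)

print(find_lines(sample, "ABORT at pc"))
print(find_lines(sample, "different at pc"))  # empty for this sample
```

An empty result for "different at pc" only says the line is not in that particular file; it does not say whether the simulator printed it to another stream.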

@ha0lyu ha0lyu closed this as completed Nov 13, 2024
@poemonsense
Member

Please list the full message here. Otherwise, we are unable to help.

@ha0lyu
Author

ha0lyu commented Nov 18, 2024

1.2 We set raaddr as the trap handler, so ra holds the address of the raaddr function, and (ra) is the first instruction of that function. When we run amoand.w gp, sp, (ra) to set the instruction at (ra) to 0x00000000 and then jump to (ra), it should execute the 0x00000000 instruction, which is c.unimp, right?
image

But when we run it under DiffTest, the instruction at pc=0x80000010 is not changed, and we get an abort later.
basic_abort.zip
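
To make the expectation in this comment concrete, here is a small Python model of what `amoand.w rd, rs2, (rs1)` does according to the RISC-V "A" extension: it loads the 32-bit word at the address, writes back `old & rs2`, and puts the sign-extended old value in `rd`. It is only an architectural sketch, not how XiangShan or NEMU implement it. The initial memory contents (a `nop`) and `sp = 0` are assumptions for illustration, since the thread does not state the actual register values.

```python
MASK32 = 0xFFFFFFFF


def sext32(value: int) -> int:
    """Sign-extend a 32-bit value to 64 bits (returned as an unsigned int)."""
    value &= MASK32
    return (value | 0xFFFFFFFF00000000) if (value & 0x80000000) else value


def amoand_w(mem: dict[int, int], addr: int, rs2: int) -> int:
    """Model `amoand.w rd, rs2, (rs1)`: update mem[addr], return the new rd."""
    old = mem[addr] & MASK32
    mem[addr] = old & rs2 & MASK32  # memory receives old AND rs2
    return sext32(old)              # rd receives the old memory word


def low_parcel_is_zero(word: int) -> bool:
    """True if the low 16 bits are 0x0000, the encoding reserved as illegal."""
    return (word & 0xFFFF) == 0


RAADDR = 0x80000010                 # address taken from the logs in this thread
mem = {RAADDR: 0x00000013}          # assumption: a nop (addi x0, x0, 0) lives here
gp = amoand_w(mem, RAADDR, rs2=0)   # assumption: sp holds 0

print(hex(gp), hex(mem[RAADDR]), low_parcel_is_zero(mem[RAADDR]))
```

In this model the word only becomes 0x00000000 when every bit set in memory is clear in `sp`. Separately, the RISC-V ISA only guarantees that a hart's instruction fetch observes its own earlier stores after a FENCE.I; whether the test in this thread issues one is not stated.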

@ha0lyu ha0lyu reopened this Nov 18, 2024
@ha0lyu
Author

ha0lyu commented Nov 18, 2024

Please note the "In the last commit group" field in the log files. Building on case A described above, I tested cases B, C, and D.

B: Do something before the nops in the raaddr function:

raaddr:
    # DO SOMETHING #
    nop
    nop

B-1: addi t0, zero, 0x50. XiangShan executes it but NEMU does not.
B-2: addi t0, zero, 0x0 and la t0, raaddr: NEMU did not load the address, and XiangShan loaded a wrong value.
case-B.zip

@ha0lyu
Author

ha0lyu commented Nov 18, 2024

C: load imm:

raaddr:
    la t0, raaddr
    li t0, imm
    nop
    nop

If we don't set t0 = 0x0 before executing the illegal instruction, NEMU does not load the immediate.

With imm = 123, we get:
t0 different at pc = 0x0080000010, right= 0x0000000080000010, wrong = 0x000000000000007b

We set t0 = 0x0:

when: li t0, 0x80000000:
t0 different at pc = 0x0080000010, right= 0x0000000000000000, wrong = 0x0000000000000001
when: li t0, 0x8ffff
t0 different at pc = 0x0080000010, right= 0x0000000000000000, wrong = 0x000000000008ffff
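
When comparing several of these runs, it can help to turn the mismatch lines into numbers. The following is a minimal sketch, assuming the line layout shown above and input that is already free of color codes; `Mismatch` and `parse_mismatch` are names invented for this example and are not part of DiffTest. Judging from case B-1 in this thread, `right` corresponds to the value from the reference model (NEMU) and `wrong` to the value from XiangShan.

```python
import re
from typing import NamedTuple, Optional


class Mismatch(NamedTuple):
    reg: str
    pc: int
    right: int   # value reported as "right"
    wrong: int   # value reported as "wrong"


# Layout taken from the lines quoted in this thread; spacing around '=' varies.
PATTERN = re.compile(
    r"^\s*(\S+) different at pc = (0x[0-9a-fA-F]+),"
    r"\s*right\s*=\s*(0x[0-9a-fA-F]+),"
    r"\s*wrong\s*=\s*(0x[0-9a-fA-F]+)"
)


def parse_mismatch(line: str) -> Optional[Mismatch]:
    """Parse one 'different at pc' line, or return None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    reg, pc, right, wrong = m.groups()
    return Mismatch(reg, int(pc, 16), int(right, 16), int(wrong, 16))


line = ("t0 different at pc = 0x0080000010, "
        "right= 0x0000000080000010, wrong = 0x000000000000007b")
result = parse_mismatch(line)
print(result)
print(result.wrong == 123)  # the imm used in case C
```

Lines that do not follow this layout return `None`, so the same function can be mapped over a whole log and the `None` entries filtered out.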

@ha0lyu
Author

ha0lyu commented Nov 18, 2024

Last one, D: XiangShan loads raaddr correctly:

raaddr:
    la t0, raaddr
    nop
    nop

we got:

t0 different at pc = 0x0080000010, right= 0x0000000000000000, wrong = 0x0000000080000010

C.log
D.log

Environment
XiangShan branch: Master
XiangShan commit id: 011f1ef
NEMU commit id: b9507fb

@ha0lyu
Author

ha0lyu commented Nov 18, 2024

I am not sure whether the bugs are in XiangShan, NEMU, or DiffTest. I hit the 1.2 ABORT first, and I got these results while investigating it.

@poemonsense
Member

As the cosim framework developers, we are responsible for ensuring DiffTest reports the correct error information to the user. We will not respond to bugs you encounter in any design, because that is outside the scope of this repo.

Thus, if you are facing DiffTest errors on XiangShan with some error messages, please raise an issue in the XiangShan repo.

If you are facing a case where difftest aborts but no error information is shown, please list the full message here, so that we can help see what happens.

@ha0lyu
Author

ha0lyu commented Nov 18, 2024

Thanks for your reply.
I'm not sure whether there is a problem with DiffTest, and I want to confirm that there is none. Please check the log file in the zip file I attached before. I think I've attached the full message; if that's not it, please tell me what the full message is and how to generate it.
BTW: I generated the log by running emu -i /path2/test.bin 2>/dev/null 1>/path2/test.log.

@poemonsense
Member

Please also redirect stderr to file and provide us with full messages.
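
As an illustration of the difference: with `2>/dev/null`, anything the program writes to stderr is discarded, so a log made that way can be missing messages. A minimal Python sketch of capturing both streams into one file follows. The child process here is a stand-in written for this example, not emu, and `run_and_log` is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(cmd: list[str], log_path: str) -> int:
    """Run cmd with stdout and stderr both written to log_path (like `> log 2>&1`)."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Stand-in child that writes one line to each stream (assumption: not emu).
child = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run.log")
    code = run_and_log(child, path)
    with open(path) as f:
        captured = f.read()

print(code)
print(captured)
```

Both lines end up in the file, but their relative order can differ from what a terminal shows, because the two streams are buffered differently when redirected.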

@ha0lyu
Author

ha0lyu commented Nov 19, 2024

I ran ./build/emu -i /home/path2/b1.bin > /home/path2/b1.log 2>&1 and got this:
b1.log
I have no more messages.

@poemonsense
Member

In your log file: t0 different at pc = 0x0080000010, right= 0x0000000000000000, wrong = 0x0000000000000050.

Though this log seems different from the last zip file you provided, I believe DiffTest shows the correct error information for this case.

@ha0lyu
Author

ha0lyu commented Nov 20, 2024

Thanks.

@ha0lyu ha0lyu closed this as completed Nov 20, 2024