Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(BPU): btbHit is incorrectly invalidated causing the correct PHT result to be invalidated #147

Merged
merged 2 commits into from
Dec 4, 2023

Conversation

ngc7331
Copy link
Contributor

@ngc7331 ngc7331 commented Nov 29, 2023

Behavior

I was trying to implement GShare algorithm on NutShell's BPU(ngc7331@bc7cddf), then I noticed something strange (lines 1, 6 and 7 of the log below).

1 [                3005] GHR update spec pc=0080004c7c ghr=77ff->0efff
2 [                3005] valid=1, btbHit=1, btbRead._type === BTBtype.B=1, phtTaken=1
3 [                3006] valid=0, btbHit=0, btbRead._type === BTBtype.B=1, phtTaken=1
4 [                3007] valid=0, btbHit=0, btbRead._type === BTBtype.B=1, phtTaken=1
5 [                3028] predictWrong=1, taken=1, isBranch=1, brIdx(0)=0, predicted at                 3007
6 [                3029] GHR update miss pc=0080004c7c ghr=efff->0efff, predicted at                 3007
7 [                3030] PHT update pc=0080004c7c, ghr=77ff, cnt=2->3
  1. 3005: PHT gives the prediction 1(=taken) and updates GHR speculatively.
  2. 3029: GHR receives an update request from ALU, suggesting that 3007 made an incorrect prediction, but the real branch result is 1(=taken) too.

So I added more logs to the ALU and BPU, giving line 5 and lines 2~4 above, respectively. We can see that:

  1. 3005: BTB gives hit, type = BTBtype.B, and PHT gives prediction = taken, so the BPU gives redirect valid = 1. (correct)
  2. 3006: Somehow btbHit resets to 0, so the BPU gives valid = 0. (wrong)
  3. 3007: IFU handshake with imem bus successful, select snpc as fetch address based on BPU output (valid=0). (wrong)
  4. 3028: ALU detects a branch prediction error, the real result is taken, and issued a redirect request.
  5. 3029/3030: BPU updates accroding to the request.

To sum up, the correct PHT result is invalidated and caused the IFU to operate on the wrong fetch path, reducing branch prediction accuracy and IPC.

The correct log should look like:

1 [                3005] GHR update spec pc=0080004c7c ghr=77ff->0efff
2 [                3005] valid=1, btbHit=1, btbRead._type === BTBtype.B=1, phtTaken=1
3 [                3006] valid=1, btbHit=1, btbRead._type === BTBtype.B=1, phtTaken=1
4 [                3007] valid=1, btbHit=1, btbRead._type === BTBtype.B=1, phtTaken=1
5 [                3028] predictWrong=0, taken=1, isBranch=1, brIdx(0)=1, predicted at                 3007
// no update is needed

The original BPU behaves exactly the same except for the GHR update mechanism, so it's not a bug of my changes on the fork.

Solution

btb.io.r.req.valid := io.in.pc.valid

val btbHit = btbRead.valid && btbRead.tag === btbAddr.getTag(pcLatch) && !flush && RegNext(btb.io.r.req.fire, init = false.B) && !(pcLatch(1) && btbRead.brIdx(0))

Notice the RegNext(btb.io.r.req.fire, init = false.B) in btbHit. It is only valid for one cycle after the request is valid (io.in.pc.valid = 1), but the IFU is not guaranteed a successful handshake with the imem in that cycle, causing the problem.

On the other hand, btbRead.valid seems to be enough to ensure that btbHit is valid.

Therefore, a quick solution is to just delete this RegNext.

The modifications are tested using:

  • microbench test
  • microbench ref
  • dhrystone 500000 runs
  • coremark 1000 iterations

And the result shows a 2%~14% increase on prediction accuracy and 1.5%~7% increase on IPC over baseline (aeb6019).

image

Note: In the above table, the "right" column in the "IPC" row is the actual IPC value and the "acc" column is the ratio to baseline.

@ngc7331
Copy link
Contributor Author

ngc7331 commented Nov 29, 2023

BPU_embedded is not changed because it looks like it hasn't been maintained in a long time. I can't even compile it by make CORE=embedded emu.

@ngc7331 ngc7331 marked this pull request as draft November 29, 2023 08:03
@ngc7331
Copy link
Contributor Author

ngc7331 commented Nov 29, 2023

Update:

If just delete RegNext(btb.io.r.req.fire, init = false.B), btbRead.valid could be fake when SRAM is doing reset (e.g. fence.i and sfence.vma will trigger BPU flush), and further caused linux kernel panic.

So I use btb.io.r.req.ready to replace btb.io.r.req.fire. On the one hand, it indicates whether SRAM is doing reset, if it is false.B, btbHit will not be set; on the other hand, it will not be affected by io.in.pc.valid, still solve the original problem.

@ngc7331 ngc7331 marked this pull request as ready for review November 29, 2023 08:51
@sashimi-yzh sashimi-yzh merged commit 63a2687 into OSCPU:master Dec 4, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants