Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test-triage] chip_sw_pwrmgr_main_power_glitch_reset #23961

Closed
2 tasks
elliotb-lowrisc opened this issue Jul 8, 2024 · 5 comments · Fixed by #24007
Closed
2 tasks

[test-triage] chip_sw_pwrmgr_main_power_glitch_reset #23961

elliotb-lowrisc opened this issue Jul 8, 2024 · 5 comments · Fixed by #24007

Comments

@elliotb-lowrisc
Copy link
Contributor

Hierarchy of regression failure

Chip Level

Failure Description

    Offending '(rst_sys_req[1] || rst_sys_src_n[1])'
  UVM_ERROR @ 1924.502411 us: (pwrmgr_rstmgr_sva_if.sv:44) [ASSERT FAILED] SysHandshakeOff_A

Steps to Reproduce

  • GitHub Revision: 2e5d86c9b5
  • dvsim invocation command to reproduce the failure, inclusive of build and run seeds:
    ./util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_pwrmgr_main_power_glitch_reset --build-seed 75221189197949424635294305394615322888112457483844341597147780944629972574676 --fixed-seed 19739929326588231339427436212387352319943555198022910986496607969950925791351 --waves fsdb
  • Kokoro build number if applicable

Tests with similar or related failures

  • Test_name_1
  • Test_name_2
@elliotb-lowrisc elliotb-lowrisc changed the title [test-triage] chip_sw_pwrmgr_main_power_glitch_reset [test-triage] chip_sw_pwrmgr_main_power_glitch_reset Jul 8, 2024
@vogelpi vogelpi assigned matutem and unassigned hayleynewton Jul 11, 2024
@vogelpi
Copy link
Contributor

vogelpi commented Jul 11, 2024

Asked @matutem whether he could look into this and he accepted :-)

@vogelpi vogelpi added this to the Earlgrey-PROD.M7 milestone Jul 11, 2024
@matutem
Copy link
Contributor

matutem commented Jul 11, 2024

This is an SVA issue: the assertion is missing the por_n_i rstmgr input. Bit 0 of por_n_i will cause a POR, but bit 1 indicates a glitch in the main power, and it will trigger a Domain0 reset. The SVA fix is easy.

One interesting and disturbing issue is that bit 1 of por_n_i is set to the ast's ast)pwst.main_pok signal, which will be active when there is a power glitch in the main (not aon) power, but this is synchronized to the AON clock in rstmgr. This means a short main glitch may not be sampled by the aon clock, and won't cause a reset. This also depends on the logic driving ast_pwst_o.main_pok in the AST. For safety in rstmgr rtl we could sync por_n_i[1] to its clk_i, and only clear that on rst_ni.

@andreaskurth
Copy link
Contributor

One interesting and disturbing issue is that bit 1 of por_n_i is set to the ast's ast)pwst.main_pok signal, which will be active when there is a power glitch in the main (not aon) power, but this is synchronized to the AON clock in rstmgr. This means a short main glitch may not be sampled by the aon clock, and won't cause a reset. This also depends on the logic driving ast_pwst_o.main_pok in the AST. For safety in rstmgr rtl we could sync por_n_i[1] to its clk_i, and only clear that on rst_ni.

Are you sure rstmgr samples its por_n_i input with clk_aon_i? I only found it connected to the rst_ni port of two modules:

rstmgr_por u_rst_por_aon (
.clk_i(clk_aon_i),
.rst_ni(por_n_i[i]),
.scan_rst_ni,
.scanmode_i(prim_mubi_pkg::mubi4_test_true_strict(por_scanmode[0])),
.rst_no(rst_por_aon_n[i])
);

prim_flop_2sync #(
.Width(1),
.ResetValue('0)
) u_por_domain_sync (
.clk_i(clk_aon_i),
// do not release from reset if aon has not
.rst_ni(rst_por_aon_n[DomainAonSel] & por_n_i[i]),
.d_i(1'b1),
.q_o(rst_por_aon_premux)
);

and I this reset is asynchronous, thus not sampled with clk_aon_i.

Am I missing something? If so, could you please point to the code you're referring to?

@matutem
Copy link
Contributor

matutem commented Jul 12, 2024

My concern is about u_por_domain_sync in line 114 show above. The clock used for the flops is clk_aon_i, so por_n_i is only captured once every 5 micros.

matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 12, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 15, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
@andreaskurth
Copy link
Contributor

Just discussed with @matutem that the reset is asynchronous, so using clk_aon_i is not a problem. PR #24007 (DV fix) can thus close this issue.

matutem added a commit to matutem/opentitan that referenced this issue Jul 17, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
matutem added a commit to matutem/opentitan that referenced this issue Jul 17, 2024
The assertions pwrmgr_rstmgr_sva_if need to be disabled when either aon or
main_pok are inactive. It is more reliable to bind them to rstmgr for
top-level simulations. This PR does that and connects rst_slow_ni to
&rst_por_aon_n, since rst_por_aon_n is 2-bit wide (one per domain), and
captures the rstmgr behavior more accurately.

This also removes the rstmgr_unit_only* files for simplicity.

Fixes lowRISC#23961

Signed-off-by: Guillermo Maturana <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants