Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xtensa Multi-core critical-section is not sound(?) #1171

Closed
i404788 opened this issue Feb 15, 2024 · 7 comments · Fixed by #1175
Closed

Xtensa Multi-core critical-section is not sound(?) #1171

i404788 opened this issue Feb 15, 2024 · 7 comments · Fixed by #1175
Assignees
Labels
bug Something isn't working

Comments

@i404788
Copy link
Contributor

i404788 commented Feb 15, 2024

I'm currently debugging a weird issue on the ESP32S3 with embassy multi-core related to critical section, as it seems like it tries to double-unlock the MULTICORE_LOCK from different threads when using the embassy executor and I'm not entirely sure why it happens.

What seems to happen:

  • esp-hal's embassy impl uses set_alarm for the executor poll (Seems to be on thread_id=1, token=4294967295)
  • In set_alarm it aquires a critical_section, and at the end attempts to drop it
  • While Dropping (at https://github.com/esp-rs/esp-hal/blob/main/esp-hal/src/lib.rs#L279) a level3 interrupt hits
  • The interrupt gets handled by time_driver_timg.rs EmbassyTimer::on_interrupt
  • EmbassyTimer::on_interrupt aquires it's own critical_section and when it tries to drop the owner isn't thread_id() (thread_id seems to be 0 at this stack frame).

I'm not sure how it's possible to have 2 threads in one call stack, but it seems like thread_1 unlocks, and before finishing the sync thread_0 locks, causing thread_1 to overwrite the owner. Let me know if I'm missing something

@i404788 i404788 changed the title Multi-core critical-section is not sound(?) Xtensa Multi-core critical-section is not sound(?) Feb 15, 2024
@MabezDev
Copy link
Member

@bugadani sorry for the ping, but do you have any ideas on this one?

@jessebraham jessebraham added the bug Something isn't working label Feb 15, 2024
@bugadani
Copy link
Contributor

Someone broke this assumption

@bjoernQ
Copy link
Contributor

bjoernQ commented Feb 16, 2024

Oh this

fn get_raw_core() -> usize {
    ((xtensa_lx::get_processor_id() as usize & 0x2000) != 0) as usize
}

will very likely produce a value of 1

@MabezDev
Copy link
Member

Git blaming that line points me to this PR: #1126 I'm not sure what the crash was? I don't see an issue associated with it.

@bjoernQ
Copy link
Contributor

bjoernQ commented Feb 16, 2024

@MabezDev
Copy link
Member

@i404788 Apologies for the breakage, I have the fix in #1175, would you mind giving it a try? Likewise @ProfFan would you mind testing that I haven't broken your use case too?

@ProfFan
Copy link
Contributor

ProfFan commented Feb 16, 2024

Git blaming that line points me to this PR: #1126 I'm not sure what the crash was? I don't see an issue associated with it.

Oops, it's me :P

@github-project-automation github-project-automation bot moved this from Todo to Done in esp-rs Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants