Tracking issue: refactor RROS kernel code #48

Open
3 tasks
shannmu opened this issue Oct 1, 2024 · 8 comments
@shannmu
Contributor

shannmu commented Oct 1, 2024

Tracking issue

Additional Information

@shannmu
Contributor Author

shannmu commented Oct 1, 2024

Refactoring the current code is a breaking change that we will not implement immediately. This thread is for discussing and documenting the issues encountered during development so far: unreasonable code design, unsafe code implementations, missing infrastructure, hacky code that was needed to push the project forward, and so on. We will also discuss how to resolve these issues so that contributors can contribute to the community more easily and benefit from it.

If you find any part of the current code unreasonable, or there are areas you would like to improve, feel free to give us your feedback.

@Richardhongyu
Contributor

Currently, RROS relies on dynamic detection to warn when a non-real-time API is called in the out-of-band code path. This cannot cover all cases.

One possible solution is to use the Rust type system to prevent calling non-real-time APIs in the real-time code path. This would also help with recovering the R kernel when Linux breaks due to bugs or wild pointers.
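
A minimal sketch of how the type-system approach could look. Everything here is an assumption for illustration: the marker types `InBand` and `OutOfBand`, the zero-sized token `Ctx<S>`, and the two API functions are hypothetical stand-ins, not existing RROS code, and `Vec` is used only so the example runs in user space.

```rust
use core::marker::PhantomData;

/// Marker for the in-band (regular Linux) execution stage. Hypothetical.
pub struct InBand;
/// Marker for the out-of-band (real-time) execution stage. Hypothetical.
pub struct OutOfBand;

/// Zero-sized proof that the caller runs in stage `S`.
pub struct Ctx<S> {
    _stage: PhantomData<S>,
}

impl<S> Ctx<S> {
    /// Creates a context token.
    ///
    /// # Safety
    /// The caller vouches that the code really runs in stage `S`; only
    /// trusted entry code should call this.
    pub unsafe fn assume() -> Self {
        Ctx { _stage: PhantomData }
    }
}

/// Stand-in for a non-real-time API (for example, one that may sleep).
/// It demands an in-band token, so out-of-band code cannot call it.
pub fn may_sleep_alloc(_ctx: &Ctx<InBand>, len: usize) -> Vec<u8> {
    vec![0; len]
}

/// Stand-in for a real-time-safe API: callable from any stage.
pub fn rt_safe_checksum<S>(_ctx: &Ctx<S>, data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// An out-of-band handler only ever holds an `OutOfBand` token.
pub fn oob_handler(ctx: &Ctx<OutOfBand>, data: &[u8]) -> u32 {
    // may_sleep_alloc(ctx, 16); // rejected: expected `&Ctx<InBand>`
    rt_safe_checksum(ctx, data)
}
```

With this shape, a call to `may_sleep_alloc` from `oob_handler` is a type error at compile time instead of a warning at run time. The remaining trust is concentrated in the few places that call `Ctx::assume`.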

@shannmu
Contributor Author

shannmu commented Oct 8, 2024

I am considering whether we need to introduce Cargo. Currently, we are facing several issues:

  • Introducing Cargo into the project requires additional work, mainly some modifications to the compilation process.
  • Introducing Cargo would bring about the issue of kernel code referencing external code, which might be unacceptable to the upstream community.

The benefits of introducing Cargo are also obvious:

  • A richer ecosystem of no_std crates.
  • Cargo makes project management simpler.
  • It allows the no_std components implemented by RROS to be published as crates for external users, making the code more reusable.

@shannmu
Contributor Author

shannmu commented Oct 8, 2024

I think we need stronger lint checks to standardize the community's coding style, which will also make our code more idiomatic Rust.
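
A small sketch of what a stricter lint policy could look like. The choice of lints is an assumption, not something the project has agreed on: `missing_docs` and `unsafe_op_in_unsafe_fn` are rustc lints, and `clippy::undocumented_unsafe_blocks` only takes effect when Clippy is run. The module and function names are hypothetical.

```rust
/// Helpers written under a stricter lint policy than the compiler default.
#[deny(
    missing_docs,
    unsafe_op_in_unsafe_fn,
    clippy::undocumented_unsafe_blocks
)]
pub mod strict {
    /// Reads the first byte behind `ptr`.
    ///
    /// # Safety
    /// `ptr` must be non-null and point to at least one readable byte.
    pub unsafe fn first_byte(ptr: *const u8) -> u8 {
        // SAFETY: the caller guarantees `ptr` is valid for a one-byte read.
        unsafe { ptr.read() }
    }
}
```

Under this policy an undocumented public item, an unsafe operation outside an explicit `unsafe` block, or an `unsafe` block without a `SAFETY` comment fails the build rather than relying on review to catch it.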

@shannmu
Contributor Author

shannmu commented Oct 8, 2024

I think the detailed code refactoring is actually more important than the overall design. To keep this thread from getting too messy, I'm going to split the detailed code refactoring into a separate issue. See #53.

@Richardhongyu
Contributor

Richardhongyu commented Oct 8, 2024

I am considering whether we need to introduce Cargo. Currently, we are facing several issues:

As a kernel project that relies on Linux, we need to stay in sync with upstream. Whether to use Cargo is up to RFL (Rust for Linux).

  • A richer ecosystem of no_std crates.

Using Cargo can bring more crates. But importing crates can be achieved with Cargo. You can check the following:

  • Cargo makes project management simpler.

Linux has its own build toolchain. Using Cargo may conflict with it.

IMO, we could use Cargo for importing crates but not for building our system. When to enable Cargo formally depends on RFL. But we could experimentally set up a no_std development environment based on Cargo to explore this.

@ruiqurm
Contributor

ruiqurm commented Oct 8, 2024

I am considering whether we need to introduce Cargo. Currently, we are facing several issues:

As a Linux kernel project, we need to stay in sync with upstream. Whether to use Cargo is up to RFL (Rust for Linux).

  • A richer ecosystem of no_std crates.

Using Cargo can bring more crates. But importing crates can be achieved with Cargo. You can check the following:

  • Cargo makes project management simpler.

Linux has its own build toolchain. Using Cargo may conflict with it.

IMO, we could use Cargo for importing crates but not for building our system. When to enable Cargo formally depends on RFL. But we could experimentally set up a no_std development environment based on Cargo.

Using Cargo as the build tool:
Pros:

  • Rich ecosystem
  • We may be able to run an independent OS without Linux someday.

Using Kbuild:
Pros:

  • Integrates perfectly with the original Linux code.
  • It seems that the RFL maintainers do not want to introduce Cargo. Using Kbuild makes it easier to stay in sync with them.

There is another issue we need to consider: to what extent can no_std libraries be adapted for our use? I noticed that some Rust OSes, such as Redox, do not use many dependencies. So we may not get much benefit from the no_std ecosystem?

@Richardhongyu
Contributor

Using Cargo as the build tool: Pros:

  • Rich ecosystem

OSDK is organized with Cargo. If we could utilize Rust drivers from OSDK, we could import Cargo. We need to investigate whether this is possible.

  • We may be able to run an independent OS without Linux someday.

If a powerful open-source GPOS that is widely used in smart control tasks exists and we decide to support it, Cargo is a good choice. But for the foreseeable future, RROS is still bound to Linux.

Using Kbuild: Pros:

  • Integrates perfectly with the original Linux code.
  • It seems that the RFL maintainers do not want to introduce Cargo. Using Kbuild makes it easier to stay in sync with them.

There is another issue we need to consider: to what extent can no_std libraries be adapted for our use? I noticed that some Rust OSes, such as Redox, do not use many dependencies. So we may not get much benefit from the no_std ecosystem?

I think the main benefit is reusing drivers such as virtio, as well as some language features that are not yet merged into rustc but exist as libraries.

shannmu pushed a commit to shannmu/RROS that referenced this issue Dec 15, 2024
During the migration of Soundwire runtime stream allocation from
the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845
soundcard was forgotten.

At this point any playback attempt or audio daemon startup, for instance
on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer
NULL dereference:

 Unable to handle kernel NULL pointer dereference at virtual
 address 0000000000000020
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000
 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Modules linked in: ...
 CPU: 5 UID: 0 PID: 1198 Comm: aplay
 Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty #18
 Hardware name: Thundercomm Dragonboard 845c (DT)
 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus]
 lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus]
 sp : ffff80008a2035c0
 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000
 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800
 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003
 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000
 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000
 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec
 x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003
 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8
 Call trace:
  sdw_stream_add_slave+0x44/0x380 [soundwire_bus]
  wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x]
  snd_soc_dai_hw_params+0x3c/0xa4
  __soc_pcm_hw_params+0x230/0x660
  dpcm_be_dai_hw_params+0x1d0/0x3f8
  dpcm_fe_dai_hw_params+0x98/0x268
  snd_pcm_hw_params+0x124/0x460
  snd_pcm_common_ioctl+0x998/0x16e8
  snd_pcm_ioctl+0x34/0x58
  __arm64_sys_ioctl+0xac/0xf8
  invoke_syscall+0x48/0x104
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x34/0xe0
  el0t_64_sync_handler+0x120/0x12c
  el0t_64_sync+0x190/0x194
 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22)
 ---[ end trace 0000000000000000 ]---

0000000000006108 <sdw_stream_add_slave>:
    6108:       d503233f        paciasp
    610c:       a9b97bfd        stp     x29, x30, [sp, #-112]!
    6110:       910003fd        mov     x29, sp
    6114:       a90153f3        stp     x19, x20, [sp, #16]
    6118:       a9025bf5        stp     x21, x22, [sp, #32]
    611c:       aa0103f6        mov     x22, x1
    6120:       2a0303f5        mov     w21, w3
    6124:       a90363f7        stp     x23, x24, [sp, #48]
    6128:       aa0003f8        mov     x24, x0
    612c:       aa0203f7        mov     x23, x2
    6130:       a9046bf9        stp     x25, x26, [sp, #64]
    6134:       aa0403f9        mov     x25, x4        <-- x4 copied to x25
    6138:       a90573fb        stp     x27, x28, [sp, #80]
    613c:       aa0403fb        mov     x27, x4
    6140:       f9418400        ldr     x0, [x0, #776]
    6144:       9100e000        add     x0, x0, #0x38
    6148:       94000000        bl      0 <mutex_lock>
    614c:       f8420f22        ldr     x2, [x25, #32]!  <-- offset 0x44
    ^^^
This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave()
where data abort happens.
wsa881x_hw_params() is called with stream = NULL and passes it further
in register x4 (5th argument) to sdw_stream_add_slave() without any checks.
Value from x4 is copied to x25 and finally it aborts on trying to load
a value from address in x25 plus offset 32 (in dec) which corresponds
to master_list member in struct sdw_stream_runtime:

struct sdw_stream_runtime {
        const char  *              name;	/*     0     8 */
        struct sdw_stream_params   params;	/*     8    12 */
        enum sdw_stream_state      state;	/*    20     4 */
        enum sdw_stream_type       type;	/*    24     4 */
        /* XXX 4 bytes hole, try to pack */
 here-> struct list_head           master_list;	/*    32    16 */
        int                        m_rt_count;	/*    48     4 */
        /* size: 56, cachelines: 1, members: 6 */
        /* sum members: 48, holes: 1, sum holes: 4 */
        /* padding: 4 */
        /* last cacheline: 56 bytes */

Fix this by adding required calls to qcom_snd_sdw_startup() and
sdw_release_stream() to startup and shutdown routines which restores
the previous correct behaviour when ->set_stream() method is called to
set a valid stream runtime pointer on playback startup.

Reproduced and then fix was tested on db845c RB3 board.

Reported-by: Dmitry Baryshkov <[email protected]>
Cc: [email protected]
Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards")
Cc: Srinivas Kandagatla <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Alexey Klimov <[email protected]>
Tested-by: Steev Klimaszewski <[email protected]> # Lenovo Yoga C630
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Srinivas Kandagatla <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>