SoundWire: start adding BPT/BRA support #5266

Open
wants to merge 16 commits into
base: topic/sof-dev
337 changes: 337 additions & 0 deletions Documentation/driver-api/soundwire/bra.rst
@@ -0,0 +1,337 @@
==========================
Bulk Register Access (BRA)
==========================

Conventions
-----------

Capitalized words used in this documentation are intentional and refer
to concepts of the SoundWire 1.x specification.

Introduction
------------

The SoundWire 1.x specification provides a mechanism to speed up
command/control transfers by reclaiming parts of the audio
bandwidth. The Bulk Register Access (BRA) protocol is a standard
solution based on the Bulk Payload Transport (BPT) definitions.

The regular control channel uses Column 0 and can only send/retrieve
one byte per frame with write/read commands. With a typical 48 kHz
frame rate, only 48 kB/s can be transferred.
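
As a rough illustration of the figure above, the time needed to push a
firmware image through the regular command channel can be estimated as
shown below. The helper name and the 64 KiB image size are assumptions
chosen for the example; only the one-byte-per-frame rule and the 48 kHz
frame rate come from the text.

```c
#include <stdint.h>

/* Regular command channel: one byte per frame (from the text above). */
#define CMD_BYTES_PER_FRAME 1u

/*
 * Hypothetical helper: time in milliseconds, rounded up, needed to move
 * 'len' bytes over Column 0 at the given frame rate (frames per second).
 */
static uint64_t cmd_channel_time_ms(uint64_t len, uint32_t frame_rate_hz)
{
	uint64_t bytes_per_sec = (uint64_t)frame_rate_hz * CMD_BYTES_PER_FRAME;

	return (len * 1000u + bytes_per_sec - 1) / bytes_per_sec;
}
```

With these assumptions, ``cmd_channel_time_ms(65536, 48000)`` evaluates
to 1366, i.e. more than a second for a 64 KiB image.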

The optional Bulk Register Access capability can transmit up to 12
Mbit/s and reduce transfer times by more than an order of magnitude, but
has multiple design constraints:

(1) Each frame can only support a read or a write transfer, with a
10-byte overhead per frame (header and footer response).

(2) The reads/writes SHALL be from/to contiguous register addresses
in the same frame. A fragmented register space decreases the
efficiency of the protocol by requiring multiple BRA transfers
scheduled in different frames.

(3) The targeted Peripheral device SHALL support the optional Data
Port 0, and likewise the Manager SHALL expose audio-like Ports
to insert BRA packets in the audio payload using the concepts of
Sample Interval, HSTART, HSTOP, etc.

(4) The BRA transport efficiency depends on the available
bandwidth. If there are no on-going audio transfers, the entire
frame minus Column 0 can be reclaimed for BRA. The frame shape
also impacts efficiency: since Column 0 cannot be used for
BPT/BRA, the frame should rely on a large number of columns and
minimize the number of rows. The bus clock should be as high as
possible.

(5) The number of bits transferred per frame SHALL be a multiple of
8 bits. Padding bits SHALL be inserted if necessary at the end
of the data.

(6) The regular read/write commands can be issued in parallel with
BRA transfers. This is convenient to e.g. deal with alerts, jack
detection or change the volume during firmware download, but
accessing the same address with two independent protocols must
be avoided to prevent undefined behavior.

(7) Some implementations may not be capable of handling the
bandwidth of the BRA protocol, e.g. in the case of a slow I2C
bus behind the SoundWire IP. In this case, the transfers may
need to be spaced in time or flow-controlled.

(8) Each BRA packet SHALL be marked as 'Active' when valid data is
to be transmitted. This allows software to allocate a BRA
stream without transmitting valid data while it processes the
results or prepares the next batch of data, or while the
Peripheral deals with the previous transfer. In addition, a BRA
transfer can be started early, before the data is ready.

(9) Up to 470 bytes may be transmitted per frame.

(10) The address is represented with 32 bits and does not rely on
the paging registers used for the regular command/control
protocol in Column 0.
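
Constraints (1), (5) and (9) can be turned into simple arithmetic. The
sketch below is illustrative only: the helper names are hypothetical,
and it ASSUMES that the 10-byte overhead counts against the per-frame
byte budget, which the text above does not state explicitly.

```c
#include <stdint.h>

#define BRA_OVERHEAD_BYTES  10u  /* header and footer response, constraint (1) */
#define BRA_MAX_FRAME_BYTES 470u /* upper bound per frame, constraint (9) */

/* Padding bits needed to round nbits up to a multiple of 8, constraint (5). */
static uint32_t bra_padding_bits(uint32_t nbits)
{
	return (8u - (nbits % 8u)) % 8u;
}

/*
 * Hypothetical helper: number of frames needed to move 'len' data bytes
 * when each frame carries 'frame_bytes' bytes in total.
 * ASSUMPTION: the overhead is part of frame_bytes.
 * Returns 0 if frame_bytes is out of range.
 */
static uint32_t bra_num_frames(uint32_t len, uint32_t frame_bytes)
{
	uint32_t data_per_frame;

	if (frame_bytes <= BRA_OVERHEAD_BYTES ||
	    frame_bytes > BRA_MAX_FRAME_BYTES)
		return 0;

	data_per_frame = frame_bytes - BRA_OVERHEAD_BYTES;

	/* round up without risking overflow on large lengths */
	return len / data_per_frame + (len % data_per_frame != 0);
}
```

Under this assumption, a 64 KiB image sent in 470-byte frames needs
``bra_num_frames(65536, 470)``, which evaluates to 143 frames.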


Error checking
--------------

Firmware download is one of the key usages of the Bulk Register Access
protocol. To make sure the binary data integrity is not compromised by
transmission or programming errors, each BRA packet provides:

(1) A CRC on the 7-byte header. This CRC helps the Peripheral Device
check if it is addressed and set the start address and number of
bytes. The Peripheral Device provides a response in Byte 7.

(2) A CRC on the data block (header excluded). This CRC is
transmitted as the last-but-one byte in the packet, prior to the
footer response.

The header response can be one of
(a) Ack
(b) Nak
(c) Not Ready

The footer response can be one of
(1) Ack
(2) Nak (CRC failure)
(3) Good (operation completed)
(4) Bad (operation failed)
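
The two response sets can be modelled as enumerations. This is a sketch
only: the enumerator names and values are hypothetical and are NOT the
on-wire encodings, and the success/failure classification is an
assumption based on the descriptions above.

```c
#include <stdbool.h>

/* Hypothetical enumerators: the values are NOT the on-wire encodings. */
enum bra_header_response {
	BRA_HDR_ACK,
	BRA_HDR_NAK,
	BRA_HDR_NOT_READY,
};

enum bra_footer_response {
	BRA_FTR_ACK,
	BRA_FTR_NAK,  /* CRC failure */
	BRA_FTR_GOOD, /* operation completed */
	BRA_FTR_BAD,  /* operation failed */
};

/* ASSUMPTION: only Ack counts as success for the header. */
static bool bra_header_ok(enum bra_header_response r)
{
	return r == BRA_HDR_ACK;
}

/* ASSUMPTION: Ack and Good count as success for the footer. */
static bool bra_footer_ok(enum bra_footer_response r)
{
	return r == BRA_FTR_ACK || r == BRA_FTR_GOOD;
}
```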

Example frame
-------------

The example below is not to scale and makes simplifying assumptions
for clarity. The different chunks in the BRA packets are not required
to start on a new SoundWire Row, and the scale of data may vary.

::

    +---+--------------------------------------------+
    +   |                                            |
    +   |                 BRA HEADER                 |
    +   |                                            |
    +   +--------------------------------------------+
    + C |                 HEADER CRC                 |
    + O +--------------------------------------------+
    + M |              HEADER RESPONSE               |
    + M +--------------------------------------------+
    + A |                                            |
    + N |                                            |
    + D |                    DATA                    |
    +   |                                            |
    +   |                                            |
    +   |                                            |
    +   +--------------------------------------------+
    +   |                  DATA CRC                  |
    +   +--------------------------------------------+
    +   |              FOOTER RESPONSE               |
    +---+--------------------------------------------+


Assuming the frame uses N columns, the configuration shown above can
be programmed by setting the DP0 registers as:

- HSTART = 1
- HSTOP = N - 1
- Sample Interval = N
- WordLength = N - 1
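
The DP0 settings listed above follow directly from the number of
columns. A minimal sketch, assuming a hypothetical parameter structure
(not a kernel type) and storing the logical values as listed rather
than any register encoding:

```c
#include <stdint.h>

/* Hypothetical container for the DP0 settings listed above. */
struct bra_dp0_params {
	uint32_t hstart;
	uint32_t hstop;
	uint32_t sample_interval;
	uint32_t word_length;
};

/*
 * Fill 'p' for a frame with 'num_cols' columns when BRA reclaims every
 * column except Column 0. Returns 0 on success, -1 on invalid input
 * (a frame needs at least Column 0 plus one data column).
 */
static int bra_dp0_params_init(struct bra_dp0_params *p, uint32_t num_cols)
{
	if (!p || num_cols < 2)
		return -1;

	p->hstart = 1;                  /* skip Column 0 */
	p->hstop = num_cols - 1;        /* last column of the frame */
	p->sample_interval = num_cols;
	p->word_length = num_cols - 1;
	return 0;
}
```

For a 16-column frame this yields HSTART = 1, HSTOP = 15, a Sample
Interval of 16 and a WordLength of 15.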

Addressing restrictions
-----------------------

The Device Number specified in the Header follows the SoundWire
definitions, and broadcast and group addressing are permitted. For now
the Linux implementation only allows for a single BPT transfer to a
single device at a time. This might be revisited at a later point as
an optimization to send the same firmware to multiple devices, but
this would only be beneficial for single-link solutions.

In the case of multiple Peripheral devices attached to different
Managers, broadcast and group addressing are not supported by the
SoundWire specification. Each device must be handled with separate BRA
streams, possibly in parallel, since the links are fully independent.

Unsupported features
--------------------

The Bulk Register Access specification provides a number of
capabilities that are not supported in known implementations, such as:

(1) Transfers initiated by a Peripheral Device. The BRA Initiator is
always the Manager Device.

(2) Flow-control capabilities and retransmission based on the
'NotReady' header response require extra buffering in the
SoundWire IP and are not implemented.

Bi-directional handling
-----------------------

The BRA protocol can handle writes as well as reads, and in each
packet the header and footer response are provided by the Peripheral
Target device. On the Peripheral device, the BRA protocol is handled
by a single DP0 data port, and at the low level the bus ownership
will change for the header/footer response as well as for the data
transmitted during a read.

On the host side, most implementations rely on a Port-like concept,
with two FIFOs consuming/generating data transfers in parallel
(Host->Peripheral and Peripheral->Host). The amount of data
consumed/produced by these FIFOs is not symmetrical. As a result,
hardware typically inserts markers to help software and hardware
interpret raw data.

Each packet will typically have:

(1) a 'Start of Packet' indicator.

(2) an 'End of Packet' indicator.

(3) a packet identifier to correlate the data requested and
transmitted, and the error status for each frame.

Hardware implementations can check errors at the frame level, and
retry a transfer in case of errors. However, as for the flow-control
case, this requires extra buffering and intelligence in the
hardware. The Linux support assumes that the entire transfer is
cancelled if a single error is detected in one of the responses.
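
The cancel-on-first-error policy can be sketched as a scan over
per-frame results. The structure and function below are hypothetical
and only illustrate the policy; they are not taken from the Linux
implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-frame status, as reported by Manager-specific code. */
struct bra_frame_status {
	bool header_ok;
	bool footer_ok;
};

/*
 * Returns 0 if every frame succeeded. On the first failing frame, stores
 * its index in *bad_frame (if non-NULL) and returns -EIO. No retry is
 * attempted: the whole transfer is considered cancelled.
 */
static int bra_check_transfer(const struct bra_frame_status *st, size_t n,
			      size_t *bad_frame)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!st[i].header_ok || !st[i].footer_ok) {
			if (bad_frame)
				*bad_frame = i;
			return -EIO;
		}
	}
	return 0;
}
```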

Abstraction required
~~~~~~~~~~~~~~~~~~~~

There are no standard registers or mandatory implementation at the
Manager level, so the low-level BPT/BRA details must be hidden in
Manager-specific code. For example, the format used by the Cadence IP
is not known to the codec drivers.

Likewise, codec drivers should not have to know the frame size. The
computation of CRC and handling of responses are done in helpers and
Manager-specific code.

The host BRA driver may also have restrictions on pages allocated for
DMA, or other host-DSP communication protocols. The codec driver
should not be aware of any of these restrictions, since it might be
reused in combination with different implementations of Manager IPs.

Concurrency between BRA and regular read/write
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The existing 'nread/nwrite' API already relies on a notion of start
address and number of bytes, so it would be possible to extend this
API with a 'hint' requesting BPT/BRA be used.

However, BRA transfers could be quite long, and the use of a single
mutex for regular read/write and BRA is a show-stopper. Independent
operation of the control/command and BRA transfers is a fundamental
requirement, e.g. to change the volume level with the existing regmap
interface while downloading firmware. The integration must however
ensure that there is no concurrent access to the same address with
the command/control protocol and the BRA protocol.
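
One way to detect a conflicting access is a plain range-overlap test
between the BRA transfer in flight and a regular read/write. The helper
below is a hypothetical sketch, not part of the proposed API; each
access is described by a start address and a number of bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [a_addr, a_addr + a_len) and [b_addr, b_addr + b_len) share at
 * least one register address. 64-bit arithmetic avoids wrap-around at
 * the top of the 32-bit address space.
 */
static bool reg_ranges_overlap(uint32_t a_addr, uint32_t a_len,
			       uint32_t b_addr, uint32_t b_len)
{
	uint64_t a_end = (uint64_t)a_addr + a_len; /* exclusive end */
	uint64_t b_end = (uint64_t)b_addr + b_len; /* exclusive end */

	if (a_len == 0 || b_len == 0)
		return false;

	return a_addr < b_end && b_addr < a_end;
}
```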

In addition, the 'sdw_msg' structure hard-codes support for 16-bit
addresses and paging registers which are irrelevant for BPT/BRA
support based on native 32-bit addresses. A separate API with
'sdw_bpt_msg' makes more sense.

One possible strategy to speed up all initialization tasks would be to
start a BRA transfer for firmware download, then deal with all the
"regular" read/writes in parallel with the command channel, and finally
wait for the BRA transfers to complete. This would allow for a
degree of overlap instead of a purely sequential solution. As a
result, the BRA API must support async transfers and expose a
separate wait function.


Peripheral/bus interface
------------------------

The bus interface for BPT/BRA is made of two functions:

- sdw_bpt_message_send_async(bpt_message)

This function sends the data using the Manager
implementation-defined capabilities (typically DMA or IPC
protocol).

Queueing is currently not supported: the caller
needs to wait for completion of the requested transfer.

- sdw_bpt_message_wait()

This function waits for completion of the entire message provided
by the codec driver in the 'send_async' stage. Intermediate status
for smaller chunks is not reported back to the codec driver;
only a single return code is provided.
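
The intended calling pattern (start the transfer, issue regular
accesses, then wait) can be sketched in user-space C with stand-in
functions. Everything prefixed with ``mock_`` is hypothetical: the real
signatures of sdw_bpt_message_send_async() and sdw_bpt_message_wait()
are not given in this document, and the structure below is not the
kernel's 'sdw_bpt_msg'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message descriptor; NOT the kernel's struct sdw_bpt_msg. */
struct mock_bpt_msg {
	uint32_t addr;      /* native 32-bit start address */
	const uint8_t *buf; /* data to write */
	size_t len;         /* number of bytes */
};

struct mock_bus {
	const struct mock_bpt_msg *pending; /* at most one: no queueing */
	unsigned int regular_writes;
};

/* Stand-in for the send_async step; fails if a transfer is in flight. */
static int mock_bpt_send_async(struct mock_bus *bus,
			       const struct mock_bpt_msg *msg)
{
	if (bus->pending)
		return -1;
	bus->pending = msg;
	return 0;
}

/* Stand-in for the wait step; one return code for the whole message. */
static int mock_bpt_wait(struct mock_bus *bus)
{
	if (!bus->pending)
		return -1;
	bus->pending = NULL; /* pretend the transfer completed */
	return 0;
}

/* Stand-in for a regular command-channel write. */
static void mock_regular_write(struct mock_bus *bus, uint32_t addr,
			       uint8_t val)
{
	(void)addr;
	(void)val;
	bus->regular_writes++;
}

/* Firmware download overlapped with regular initialization writes. */
static int mock_codec_init(struct mock_bus *bus,
			   const struct mock_bpt_msg *fw)
{
	int ret = mock_bpt_send_async(bus, fw);

	if (ret)
		return ret;

	/* made-up addresses, chosen outside the firmware range */
	mock_regular_write(bus, 0x40, 0x01);
	mock_regular_write(bus, 0x41, 0x7f);

	return mock_bpt_wait(bus);
}
```

Because queueing is not supported, a second ``mock_bpt_send_async()``
call before the wait is rejected in this sketch.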


Regmap use
~~~~~~~~~~

Existing codec drivers rely on regmap to download firmware to
Peripherals. regmap exposes an async interface similar to the
send/wait API suggested above, so at a high-level it would seem
natural to combine BRA and regmap. The regmap layer could check if BRA
is available or not, and use a regular read-write command channel in
the latter case.

The regmap integration will be handled in a second step.

BRA stream model
----------------

For regular audio transfers, the machine driver exposes a dailink
connecting CPU DAI(s) and Codec DAI(s).

This model is not required for BRA support:

(1) The SoundWire DAIs are mainly wrappers for SoundWire Data
Ports, with possibly some analog or audio conversion
capabilities bolted behind the Data Port. In the context of
BRA, the DP0 is the destination. DP0 registers are standard and
can be programmed blindly without knowing what Peripheral is
connected to each link. In addition, if there are multiple
Peripherals on a link and some of them do not support DP0, the
write commands to program DP0 registers will generate harmless
COMMAND_IGNORED responses that will be wired-ORed with
responses from Peripherals which support DP0. In other words,
the DP0 programming can be done with broadcast commands, and
the information on the Target device can be added only in the
BRA Header.

(2) At the CPU level, the DAI concept is not useful for BRA; the
machine driver will not create a dailink relying on DP0. The
only concept that is needed is the notion of port.

(3) The stream concept relies on a set of master_rt and slave_rt
concepts. All of these entities represent ports and not DAIs.

(4) With the assumption that a single BRA stream is used per link,
that stream can connect master ports as well as all peripheral
DP0 ports.

(5) BRA transfers only make sense in the context of one
Manager/Link, so the BRA stream handling does not rely on the
concept of multi-link aggregation allowed by regular DAI links.

Audio DMA support
-----------------

Some DMAs, such as HDaudio, require an audio format field to be
set. This format is in turn used to define acceptable bursts. BPT/BRA
support is not fully compatible with these definitions in that the
format and bandwidth may vary between read and write commands.

In addition, on Intel HDaudio platforms the DMAs need to be
programmed with a PCM format matching the bandwidth of the BPT/BRA
transfer. The format is based on 192 kHz 32-bit samples, and the number
of channels varies to adjust the bandwidth. The channel count is
purely notional since the data is not typical audio
PCM. Programming such channels helps reserve enough bandwidth and adjust
FIFO sizes to avoid xruns.

Alignment requirements are currently not enforced at the core level
but at the platform-level, e.g. for Intel the data sizes must be
multiples of 32 bytes.
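
The two Intel-specific constraints above can be sketched as follows.
The helper names are hypothetical, and the channel computation is a
simple rounding-up of the required bandwidth; it is an illustration of
the idea, not the computation used by the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPT_PCM_RATE_HZ      192000u /* 192 kHz, from the text above */
#define BPT_PCM_SAMPLE_BYTES 4u      /* 32-bit samples */
#define BPT_INTEL_ALIGN      32u     /* data sizes: multiples of 32 bytes */

/* Smallest channel count whose PCM bandwidth covers bytes_per_sec. */
static uint32_t bpt_pcm_channels(uint32_t bytes_per_sec)
{
	/* 768000 bytes per second for each notional channel */
	uint32_t per_channel = BPT_PCM_RATE_HZ * BPT_PCM_SAMPLE_BYTES;

	return bytes_per_sec / per_channel +
	       (bytes_per_sec % per_channel != 0);
}

/* Platform-level check: is the data size a multiple of 32 bytes? */
static bool bpt_size_aligned(uint32_t len)
{
	return len % BPT_INTEL_ALIGN == 0;
}
```

For example, a 1.5 MB/s transfer would need two notional channels in
this model, since one channel only covers 768000 bytes per second.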