Kal Sze edited this page Jun 13, 2017 · 4 revisions

I look back at past Linux USB issues and collect their root causes and resolutions here in case I forget something.

xHCI xhci_drop_endpoint called with disabled ep

This happens during libusb_set_configuration() and libusb_set_interface_alt_setting(). The warning seems to carry no useful information.

Resolution: this warning was suppressed in kernel 4.0 (https://github.com/torvalds/linux/commit/a6134136d938ed9298f15e865e4a035f9c0eeb9c). Kernels 4.0 and later should no longer print it.

WARN Event TRB for slot x ep 2 with no TDs queued?

This warning shows up four times, during the libusb bulk transfers that query P0Tables and other camera parameters. Bulk transfers seem to trigger it when their buffer is larger than the data actually received. The warning is harmless.

Resolution: probably fixed in https://github.com/torvalds/linux/commit/e210c422b6fdd2dc123bedc588f399aefd8bf9de (in 4.3-rc7, i.e. kernel 4.3 and later; it may also have been backported to older kernels).

ERROR Transfer event TRB DMA ptr not part of current TD

[  503.234842] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
...
[  509.238571] xhci_hcd 0000:03:00.0: xHCI host not responding to stop endpoint command.
[  509.238580] xhci_hcd 0000:03:00.0: Assuming host is dying, halting host.
[  509.238799] usb 9-2.1: usbfs: usb_submit_urb returned -22
[  509.238886] usb 9-2.1: usbfs: usb_submit_urb returned -22
[  509.238941] usb 9-2.1: usbfs: usb_submit_urb returned -22
[  509.238995] usb 9-2.1: usbfs: usb_submit_urb returned -22
[  509.239049] xhci_hcd 0000:03:00.0: HC died; cleaning up

ASMedia controllers seem to show this behavior. In the kernel's xHCI driver, this is the code right before the point where the first error is printed:

/* Some host controllers give a spurious
 * successful event after a short transfer.
 * Ignore it.
 */
if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
		ep_ring->last_td_was_short) {
	ep_ring->last_td_was_short = false;
	ret = 0;
	goto cleanup;
}
/* HC is busted, give up! */
xhci_err(xhci,
	"ERROR Transfer event TRB DMA ptr not "
...

Maybe this happens because the controller needs the XHCI_SPURIOUS_SUCCESS quirk but the driver does not apply it.

This can be tested by checking the controller's HCI version with dmesg | grep 'hci version', then booting the kernel with the parameter xhci_hcd.quirks=0x10 to add this quirk manually.
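A minimal sketch of that check in C, assuming the xhci_hcd dmesg line has the form "... hci version 0x96 quirks 0x00080000" (the exact format may vary between kernel versions). It reads the HCI version and the quirks mask and tests bit 0x10, the same value passed with xhci_hcd.quirks=0x10. My understanding, which is an assumption not verified here, is that the driver adds XHCI_SPURIOUS_SUCCESS on its own only when the HCI version is above 0x96, so a 0.96 controller would not get it by default. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Bit 0x10 of the xhci_hcd quirks mask, i.e. the value used in
 * xhci_hcd.quirks=0x10 to force XHCI_SPURIOUS_SUCCESS. */
#define SPURIOUS_SUCCESS_BIT 0x10ULL

/* Parse a dmesg line assumed to contain "hci version 0x<ver> quirks 0x<mask>".
 * Returns 0 on success, -1 if the line does not match that format. */
int parse_xhci_line(const char *line, unsigned int *hci_version,
                    unsigned long long *quirks)
{
    const char *p = strstr(line, "hci version");
    if (p == NULL)
        return -1;
    if (sscanf(p, "hci version 0x%x quirks 0x%llx", hci_version, quirks) != 2)
        return -1;
    return 0;
}

/* Nonzero if the quirks mask already includes the spurious-success bit. */
int has_spurious_success(unsigned long long quirks)
{
    return (quirks & SPURIOUS_SUCCESS_BIT) != 0;
}
```

If has_spurious_success() is false for the controller that shows the error, booting with xhci_hcd.quirks=0x10 is the experiment suggested above.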

Not enough bandwidth for new device state.

Bandwidth reservation is decided almost entirely by the controller's firmware when it processes the Configure Endpoint command. According to the software bandwidth check used in the Linux kernel (xhci_get_ss_bw_consumed(), which counts in 16-byte blocks: (SS_OVERHEAD_BURST+mult*num_packets*(SS_OVERHEAD+mps/SS_BLOCK))*SS_BLOCK/125e-6*8/1e9 Gbps, with mps in bytes), the Kinect2's isochronous endpoint reserves about 2.47 Gbps. The RGB endpoint does not reserve bandwidth, but still effectively consumes 0.25 Gbps. Note that these two endpoints transfer in bursts and stay idle most of the time, so they need an extra bandwidth margin. Other endpoints reserve bandwidth as well. So 3 Gbps per Kinect2 should be a reasonable minimum estimate, which may be useful when planning multi-Kinect configurations.
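The formula above can be checked numerically. This sketch assumes SS_OVERHEAD_BURST = 32, SS_OVERHEAD = 8 and SS_BLOCK = 16 (values I believe match the kernel's xhci.h, and which reproduce the 2.47 Gbps figure), with mps rounded up to 16-byte blocks and the endpoint serviced every 125 us. For the Kinect2 isochronous endpoint, mult = 3, num_packets = bMaxBurst + 1 = 11 and mps = 1024. The function names are hypothetical.

```c
/* Assumed constants, in 16-byte blocks except SS_BLOCK itself. */
#define SS_BLOCK           16     /* bytes per bandwidth block */
#define SS_OVERHEAD        8      /* per-packet overhead, in blocks */
#define SS_OVERHEAD_BURST  32     /* per-burst overhead, in blocks */
#define SERVICE_INTERVAL_S 125e-6 /* one microframe */

/* Blocks consumed per service interval.
 * mult: packets per burst multiplier (Mult + 1, i.e. 1..3)
 * num_packets: packets per burst (bMaxBurst + 1)
 * mps: max packet size in bytes, rounded up to whole blocks */
unsigned int ss_blocks_consumed(unsigned int mult, unsigned int num_packets,
                                unsigned int mps)
{
    unsigned int mps_blocks = (mps + SS_BLOCK - 1) / SS_BLOCK;
    return SS_OVERHEAD_BURST + mult * num_packets * (SS_OVERHEAD + mps_blocks);
}

/* Reserved bandwidth in Gbps for an endpoint serviced every microframe. */
double ss_reserved_gbps(unsigned int mult, unsigned int num_packets,
                        unsigned int mps)
{
    double bytes = (double)ss_blocks_consumed(mult, num_packets, mps) * SS_BLOCK;
    return bytes / SERVICE_INTERVAL_S * 8.0 / 1e9;
}
```

For example, ss_blocks_consumed(3, 11, 1024) gives 2408 blocks, i.e. 38528 bytes per 125 us, or about 2.466 Gbps.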

A PCIe v1.x single lane (x1) provides 2 Gbps of usable bandwidth, not enough for the isochronous endpoint. A PCIe v2.x single lane provides 4 Gbps, which is enough. A multi-lane PCIe v1.x link might also work.
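Where the 2 and 4 Gbps figures come from: PCIe 1.x signals at 2.5 GT/s and PCIe 2.x at 5 GT/s per lane, and both use 8b/10b line encoding. This sketch (hypothetical helper name) ignores packet and protocol overhead, so real throughput is somewhat lower.

```c
/* Usable PCIe data rate in Gbps after 8b/10b encoding (PCIe 1.x/2.x only).
 * gt_per_s: raw signalling rate per lane (2.5 for 1.x, 5.0 for 2.x)
 * lanes: link width (1, 2, 4, ...) */
double pcie_gbps(double gt_per_s, unsigned int lanes)
{
    return gt_per_s * 8.0 / 10.0 * lanes;
}
```

With the 3 Gbps per-Kinect estimate above, pcie_gbps(2.5, 1) = 2 falls short, while pcie_gbps(5.0, 1) = 4 and pcie_gbps(2.5, 2) = 4 leave some headroom.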

MaxIsoPacketSize being 33792 (0x8400)

Although this value is taken from wBytesPerInterval, it can also be computed as (Mult+1)*(bMaxBurst+1)*wMaxPacketSize. lsusb shows bMaxBurst = 10, which keeps the value below 49152, the largest value the Linux kernel currently allows. bMaxBurst is reported by the device in its SuperSpeed Endpoint Companion descriptor, so the kernel cannot change it. Since it comes from the Kinect firmware, this value is probably the same on all platforms.
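The same product as a small sketch with a hypothetical helper name. Mult here is the raw bmAttributes field (0 to 2), so the Kinect's 33792 implies Mult = 2, since 33792 / (11 * 1024) = 3. The largest allowed combination (Mult = 2, bMaxBurst = 15, wMaxPacketSize = 1024) gives the 49152 limit.

```c
/* Maximum bytes per service interval of a SuperSpeed isochronous endpoint.
 * mult_field: bmAttributes bits 1:0 of the SS endpoint companion descriptor (0..2)
 * max_burst:  bMaxBurst (0..15)
 * max_packet: wMaxPacketSize (at most 1024 for SuperSpeed) */
unsigned int ss_iso_bytes_per_interval(unsigned int mult_field,
                                       unsigned int max_burst,
                                       unsigned int max_packet)
{
    return (mult_field + 1) * (max_burst + 1) * max_packet;
}
```

ss_iso_bytes_per_interval(2, 10, 1024) gives 33792 (0x8400), and ss_iso_bytes_per_interval(2, 15, 1024) gives 49152.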

Other USB limits

libusb: submit_iso_transfer(transfer):

  • number of URBs = 1 + (num_iso_packets * packet length)/6MB
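The libusb estimate above as a sketch, taking "6MB" to mean 6 MiB (an assumption) and using integer division as written. As far as I know libusb packs whole packets into URBs, so the real count can differ slightly from this rough formula. The names are hypothetical.

```c
#include <stddef.h>

/* "6MB" from the note above, assumed here to be 6 MiB. */
#define ISO_URB_BYTES (6UL * 1024 * 1024)

/* Rough number of URBs libusb uses for one iso transfer:
 * 1 + total_bytes / 6MB, with integer division. */
size_t iso_urbs_needed(size_t num_iso_packets, size_t packet_length)
{
    return 1 + (num_iso_packets * packet_length) / ISO_URB_BYTES;
}
```

For example, a transfer of 8 packets of 33792 bytes needs one URB, while 200 such packets (6758400 bytes) need two.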

kernel usb/core/devio.c:

USBDEVFS_URB_TYPE_ISO
if number_of_packets > 128, return -EINVAL
/*
 * arbitrary limit need for USB 3.0
 * bMaxBurst (0~15 allowed, 1~16 packets)
 * bmAttributes (bit 1:0, mult 0~2, 1~3 packets)
 * sizemax: 1024 * 16 * 3 = 49152
 */
if isopkt[u].length > 49152, return -EINVAL

if totlen >= USBFS_XFER_MAX, return -EINVAL
if usbfs_increase_memory_usage(totlen + structs), return -ENOMEM
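A user-space sketch of the usbfs checks listed above, which could be used to validate a transfer before submitting it. USBFS_XFER_MAX is passed in as a parameter because its value is not given here, and the -ENOMEM memory accounting check (usbfs_increase_memory_usage) is not modelled. The function name is hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define USBFS_MAX_ISO_PACKETS 128
#define USBFS_MAX_ISO_PKT_LEN 49152  /* 1024 * 16 * 3 */

/* Mirrors the listed usbfs checks for an iso URB.
 * lengths: requested length of each iso packet
 * xfer_max: stand-in for USBFS_XFER_MAX
 * Returns 0 if the URB would pass these checks, -EINVAL otherwise. */
int check_iso_urb(const unsigned int *lengths, size_t number_of_packets,
                  size_t xfer_max)
{
    size_t totlen = 0;

    if (number_of_packets > USBFS_MAX_ISO_PACKETS)
        return -EINVAL;
    for (size_t u = 0; u < number_of_packets; u++) {
        if (lengths[u] > USBFS_MAX_ISO_PKT_LEN)
            return -EINVAL;
        totlen += lengths[u];
    }
    if (totlen >= xfer_max)
        return -EINVAL;
    return 0;
}
```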

AppleUSBXHCI::UIMCreateIsochTransfer:

if ((command->GetNumFrames() == 0) || (command->GetNumFrames() > 1000)), fail with "bad frameCount"

Design of alternative transfer pool.

I am looking into whether using fewer transfers with more packets per transfer is a good idea on Linux. One problem is that the Linux kernel discourages kmalloc() allocations larger than 2MB, so transfers with buffers larger than 2MB are probably not a good idea. The Mac OS X kernel may or may not have the same issue.
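A quick calculation of what a 2MB buffer cap means here, assuming 2MB = 2 MiB and that every packet slot is sized for the largest 33792-byte packet: one transfer holds at most 62 packets, so covering a whole 267-packet period (see the timing pattern below) takes at least 5 transfers. The helper names are hypothetical.

```c
#include <stddef.h>

/* Packets of packet_size bytes that fit in a buffer of buf_limit bytes. */
size_t max_packets_per_transfer(size_t buf_limit, size_t packet_size)
{
    return buf_limit / packet_size;
}

/* Minimum number of capped transfers needed to cover period_packets
 * packets; returns 0 if not even one packet fits under the cap. */
size_t transfers_per_period(size_t period_packets, size_t buf_limit,
                            size_t packet_size)
{
    size_t per = max_packets_per_transfer(buf_limit, packet_size);
    if (per == 0)
        return 0;
    return (period_packets + per - 1) / per;  /* ceiling division */
}
```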

The transfer timing pattern looks like this (a: a 33792-byte packet; b: a 28312-byte packet; .: an empty packet; each aaaaaaaab is a 298648-byte depth "sub-layer")

aaaaaaaab.aaaaaaaab...aaaaaaaab.aaaaaaaab...aaaaaaaab.aaaaaaaab..aaaaaaaab..aaaaaaaab...aaaaaaaab................................aaaaaaaab.................................................................................................................................
aaaaaaaab.aaaaaaaab..aaaaaaaab..aaaaaaaab..aaaaaaaab..aaaaaaaab..aaaaaaaab.aaaaaaaab...aaaaaaaab................................aaaaaaaab.................................................................................................................................

The period of these transfers is 266 or 267 packets. Each packet takes one 125us service interval to transmit. Therefore a period averages 266.67 * 125us = 33.33375ms, which matches the Kinect 2 frame rate of 30Hz (1/30 s = 33.33ms).
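The period arithmetic as a sketch, modelling the device as starting frame k in the 125 us interval that contains time k/30 s (an idealized assumption, not measured). With 8000 intervals per second, 30Hz gives 266.67 intervals per frame, which works out to a repeating 266, 267, 267 pattern summing to 800 intervals every 100 ms, consistent with the observed period of 266 or 267 packets. The helper names are hypothetical.

```c
/* 8000 service intervals of 125 us per second. */
#define INTERVALS_PER_SECOND 8000UL

/* Average number of 125 us intervals per frame. */
double packets_per_frame(unsigned long fps)
{
    return (double)INTERVALS_PER_SECOND / (double)fps;
}

/* Whole intervals in frame k, assuming frame k starts in the interval that
 * contains time k/fps (index floor(k * 8000 / fps)). */
unsigned long packets_in_frame(unsigned long k, unsigned long fps)
{
    return ((k + 1) * INTERVALS_PER_SECOND) / fps
         - (k * INTERVALS_PER_SECOND) / fps;
}
```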

The 10th "sub-layer" is not actually used for depth computation, and the first 9 "sub-layers" take fewer than 100 packets to transmit. After the first 9 sub-layers, the rest of the period carries no data that we need to buffer. Perhaps we can get away with not submitting a transfer for that part at all, waiting until the next period begins and then submitting another transfer. However, there is no getting away with not submitting transfers while idle, as the kernel's URB documentation says:

 * Device drivers must explicitly request that repetition, by ensuring that
 * some URB is always on the endpoint's queue (except possibly for short
 * periods during completion callbacks).  When there is no longer an urb
 * queued, the endpoint's bandwidth reservation is canceled.
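A sketch for checking the "fewer than 100 packets" figure against the timing pattern strings above: it counts characters up to the end of the n-th sub-layer and totals the payload bytes ('a' = 33792, 'b' = 28312, '.' = empty). On the first pattern line, the 9th sub-layer ends at packet 97 (96 on the second line). The helper names are hypothetical.

```c
/* Packet sizes from the timing pattern legend. */
#define PKT_A 33792UL
#define PKT_B 28312UL

/* Number of packets (characters) up to and including the end of the n-th
 * sub-layer, i.e. the n-th 'b'. Returns -1 if there are fewer than n. */
long packets_until_sublayer(const char *pattern, int n)
{
    int seen = 0;
    for (long i = 0; pattern[i] != '\0'; i++) {
        if (pattern[i] == 'b' && ++seen == n)
            return i + 1;
    }
    return -1;
}

/* Payload bytes carried by a pattern string; empty packets carry none. */
unsigned long pattern_bytes(const char *pattern)
{
    unsigned long total = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == 'a')
            total += PKT_A;
        else if (*p == 'b')
            total += PKT_B;
    }
    return total;
}
```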

So maybe in the future, a stream "non-parser" will just take a libusb buffer, check its packet lengths, and pass it to the depth processors without any copy-based parsing. This will also reduce the number of transfers required.
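A minimal sketch of the length-checking half of such a non-parser, assuming a transfer starts on a sub-layer boundary and that sub-layers always look like the pattern above (eight 33792-byte packets, then one 28312-byte packet, with empty packets only between sub-layers). It only reads per-packet lengths, such as the actual_length field of libusb's libusb_iso_packet_descriptor, and never copies payload. struct iso_len is a hypothetical stand-in for that descriptor, and signature checks are not modelled.

```c
#include <stddef.h>

#define PKT_A 33792u        /* full packet ('a' in the pattern) */
#define PKT_B 28312u        /* last packet of a sub-layer ('b') */
#define A_PER_SUBLAYER 8

/* Hypothetical stand-in for libusb_iso_packet_descriptor's actual_length. */
struct iso_len {
    unsigned int actual_length;
};

/* Counts complete sub-layers in a transfer by packet length alone.
 * Returns the count, or -1 if a length does not fit the assumed pattern.
 * A trailing partial sub-layer is not an error; it is simply not counted. */
int count_sublayers(const struct iso_len *pkts, size_t n)
{
    int sublayers = 0;
    int a_run = 0;  /* PKT_A packets seen in the current sub-layer */

    for (size_t i = 0; i < n; i++) {
        unsigned int len = pkts[i].actual_length;
        if (len == 0) {
            if (a_run != 0)
                return -1;  /* empty packet inside a sub-layer */
        } else if (len == PKT_A) {
            if (++a_run > A_PER_SUBLAYER)
                return -1;
        } else if (len == PKT_B) {
            if (a_run != A_PER_SUBLAYER)
                return -1;
            sublayers++;
            a_run = 0;
        } else {
            return -1;      /* unexpected packet length */
        }
    }
    return sublayers;
}
```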

Zerocopy design

Before:

  • (kernel) DMA to kernel memory
  • (kernel) copy_to_user() user buffer
  • (stream parser) memcpy() assembles small pieces to a whole frame buffer
  • (opencl/cuda) memcpy() to a page locked memory (already removed)
  • (opencl/cuda) DMA to GPU memory
  • (opencl/cuda) DMA result to CPU memory

Use usb zerocopy:

  • (kernel) DMA to user memory
  • (stream parser) memcpy() assembles small pieces to a page-locked frame
  • (opencl/cuda) DMA to GPU memory
  • (opencl/cuda) DMA result to CPU memory

Use non-stream parser:

  • (kernel) DMA to kernel memory
  • (kernel) copy_to_user() page-locked user buffer
  • (stream parser) length and signature validation only, no memcpy()
  • (opencl/cuda) DMA to GPU memory
  • (opencl/cuda) DMA result to CPU memory

Use usb zerocopy and non-stream parser:

  • (kernel) DMA to user memory
  • (stream parser) length and signature validation only, no memcpy()
  • (opencl/cuda) DMA to GPU memory
  • (opencl/cuda) DMA result to CPU memory

USB audio

On kernel 4.2 (maybe earlier), the Kinect is automatically recognized as a USB audio device.

Its parameters seem to be 16000Hz, 4 channels, S32_LE format. arecord -L lists the various PCMs exported by the device. To test it:

arecord -t wav -r 16000 -c 4 -f S32_LE -D hw:CARD=Sensor test.wav
# make some sound
aplay test.wav