Hello 👋
As the subject line says, it appears that the RP2040 USB hardware lockup described in Erratum E15 can be triggered under more conditions than those listed in the erratum.
We have a custom data acquisition product whose latest revision has been upgraded to use RP2040s. The product has passed certifications, and was performing fine during development and initial field testing. However, during our field rollout we have now encountered two sites where we see the USB device controller hang to the point where the SOF_RD register is no longer updating and it requires a controller reset to make USB go again. The problem persists across units, but is currently only seen at these two field sites. Given the strong interference we're seeing in the signal data we're capturing from external devices, we're assuming similar interference is present on the USB bus and is the trigger of the lockups. The source of said interference is not known, but is presumed to be some manner of radio transmitter.
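For anyone wanting to spot this condition from firmware, here is a minimal sketch of a stall detector for the frame counter. It is plain C with no SDK dependency. The type `sof_stall_detector_t` and the function `sof_stall_update` are hypothetical names made up for illustration, and the timeout is an assumption to be tuned. On real hardware the `frame` argument would presumably come from the 11-bit count in SOF_RD (`usb_hw->sof_rd` in the Pico SDK). SOFs also stop legitimately during suspend or disconnect, so a stall should only be treated as a lockup while the bus is expected to be active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper state; not part of the Pico SDK. */
typedef struct {
    uint16_t last_frame;     /* last frame number sampled (11-bit SOF count) */
    uint32_t last_change_ms; /* timestamp at which the count last changed */
    bool     primed;         /* false until the first sample has been taken */
} sof_stall_detector_t;

/* Feed one sample of the frame counter together with the current time.
 * Returns true once the counter has stayed unchanged for at least
 * timeout_ms. The caller decides what to do about it (for example,
 * reset the USB controller). */
bool sof_stall_update(sof_stall_detector_t *d, uint16_t frame,
                      uint32_t now_ms, uint32_t timeout_ms)
{
    frame &= 0x7FFu; /* only the low 11 bits carry the frame number */

    if (!d->primed || frame != d->last_frame) {
        d->last_frame = frame;
        d->last_change_ms = now_ms;
        d->primed = true;
        return false;
    }

    /* Unsigned subtraction stays correct across timer wrap-around. */
    return (uint32_t)(now_ms - d->last_change_ms) >= timeout_ms;
}
```

The idea is to call this from a periodic tick that runs much more often than the 2048 ms it takes the 11-bit counter to wrap, with a zero-initialised detector struct.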
Our device comprises four RP2040s hanging off two daisy-chained Texas Instruments TUSB2036VFR full-speed hub chips, with two RPs off each hub. The device is directly connected to an Advantech RSB-4411 host using a regular shielded USB A-B cable. The RSB-4411 is an i.MX6-based solution from Advantech which provides four USB 2.0 ports using Transaction Translators rather than a companion OHCI controller.
With some effort I have managed to reproduce these lockups in the lab by injecting interference directly on one or both of the USB data lines. Attached is a tiny program for a regular Pico which, when run, generates interference spikes on IO0/1. To use it, I first exposed the ground, D- and D+ lines on the USB cable to the device under test. I then cut a couple of Dupont cables in half, exposed the ends, and clipped the ground one onto the USB ground sheath. Touching one of the IO0 or IO1 outputs to a USB data line for a handful of seconds is enough to trigger the USB device controller to lock up in the same manner observed in the field. For longer-running tests of our workarounds I left IO1 clipped onto a data line (see attached photo).
In most (but not all) cases all four RP2040s lock up simultaneously, leading us to suspect that a corrupted SOF packet is the triggering cause, but we have not attempted to verify this in any detail as our focus has been on working around the issue.
The attached interference generator has been tuned for aggressive lockups on this particular setup (one spike generated every ~1.5 ms). More frequent spikes tend to make the host unhappy enough that it kicks the device off. If testing with a different host, try playing around with the timing between the spikes to find the sweet spot where the host is happy but the RP2040 is not. The USB 3.0 controller in my workstation, for example, would not have a bar of this at all, so I would strongly suggest using only USB 2.0 ports for this. It's probably obvious, but the timing should be such that the spikes get spread across the frame window for the best chance of hitting an uncomfortable spot for the device controller.
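To make the point about spreading spikes across the frame window concrete, here is a small idealised sketch. It is not the attached generator. It assumes a 1 ms full-speed frame, a perfectly steady spike period, and no clock drift between host and generator, none of which hold exactly on a real bench. The function name `frame_bins_hit` is hypothetical. It counts how many equal slices of the frame receive at least one spike for a given period.

```c
#include <stdint.h>

#define FRAME_US 1000u /* full-speed USB frame length: 1 ms */

/* Count how many of `bins` equal slices of the 1 ms frame receive at least
 * one spike when `n_spikes` spikes are fired `period_us` apart, starting at
 * the beginning of a frame. `bins` must be between 1 and 64. */
unsigned frame_bins_hit(uint32_t period_us, unsigned n_spikes, unsigned bins)
{
    uint64_t seen = 0;  /* bit i set once slice i has been hit */
    unsigned count = 0;

    if (bins == 0 || bins > 64) {
        return 0;
    }

    for (unsigned k = 0; k < n_spikes; k++) {
        /* Position of spike k inside its frame, in microseconds. */
        uint32_t phase_us = (uint32_t)(((uint64_t)k * period_us) % FRAME_US);
        unsigned bin = (unsigned)(((uint64_t)phase_us * bins) / FRAME_US);

        if (!(seen & (1ull << bin))) {
            seen |= 1ull << bin;
            count++;
        }
    }
    return count;
}
```

In this idealised model a period of exactly 1500 µs only ever lands at two points in the frame, whereas 1510 µs reaches all ten slices of a ten-way split. In practice the two clocks are not synchronised, so drift does some of the spreading for you, but a period that is not a simple ratio of the frame length helps.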
So, this is my pretty-please-fix-the-USB-device-controller-state-machine-in-the-next-hardware-revision plea 🙂
If you need any more information or are having trouble reproducing on your end, feel free to reach out!
~J
Statistics: Posted by johny — Mon Jul 22, 2024 6:46 am — Replies 0 — Views 40