tRAS of 40 may be too low for newer Carambola 2 devices #207
Comments
Hi @codehero,
Could you give more details here: what kind of issues are you having, and how can they be reproduced (if you have already found a way)?
Have you tried changing it yourself? Does a higher value solve your issues? Cheers, |
Hi Piotr, I am confused by mantas-p's comment: "Is not the DDR speed 400 MHz??" As you are taking RAM timings in nanoseconds, are you calculating timing values using 400 MHz or 200 MHz? |
Hi @codehero,
The internal clock of the DRAM controller runs at 400 MHz because of how DDR memories work (Double Data Rate means that with a 200 MHz clock you get a theoretical 400 Mbps throughput per pin, since data is transferred twice per clock cycle). So the internal DRAM controller clock is in fact 400 MHz (that's the PLL value @mantas-p mentioned above and the value the kernel shows as the DDR clock), but the external one, observed on the CLK line (DDR_CK_P/N) with a scope, is 200 MHz.
There are two things about setting tRAS and other timing values in QCA datasheets (first part comes from AR934x):
The default tRAS value in AR9331 DDR_CONFIG register is 0x10/16. If you assume that value is rounded up in external clock cycles as @mantas-p wrote (200 MHz/5 ns) then you will get tRAS = 80 ns. But... in my code I assume that these values are rounded up in DRAM controller clock cycles (in this case 400 MHz/2.5 ns). There are two main reasons for this assumption:
I just looked at a Carambola 2 running my latest image, and the tRAS value in the register is set to 0x15/21 which, if I'm correct above, gives 52.5 ns, or 105 ns if I'm not. The reason is that for DDR2 I'm using a higher minimum clock for the calculations, see these lines: https://github.com/pepe2k/u-boot_mod/blob/master/u-boot/cpu/mips/ar7240/qca_dram.c#L917 and https://github.com/pepe2k/u-boot_mod/blob/master/u-boot/cpu/mips/ar7240/qca_dram.c#L517. With the latest Caraboot image, I see tRAS set to 0x10/16. Anyway, I don't think the tRAS value is the reason for your problems. I have some other ideas about what might be wrong here, but currently I lack the free time to help you with that... maybe next week. Cheers, |
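The arithmetic in the comment above can be checked with a short sketch. It only converts the two register values mentioned in the thread (0x10 and 0x15) into nanoseconds under each of the two competing assumptions: that the field counts 400 MHz controller cycles, or 200 MHz external clock cycles. The function and constant names are hypothetical and are not taken from u-boot_mod or Caraboot.

```python
def cycles_to_ns(cycles: int, clock_mhz: int) -> float:
    """Return the duration of `cycles` clock periods at `clock_mhz`, in ns."""
    # One period at f MHz lasts 1000 / f nanoseconds.
    return cycles * 1000.0 / clock_mhz


# Assumption: these are the two candidate clocks discussed in the thread.
CONTROLLER_MHZ = 400  # internal DRAM controller clock (2.5 ns period)
EXTERNAL_MHZ = 200    # clock seen on DDR_CK_P/N (5 ns period)

# tRAS field values reported in the thread: Caraboot (0x10), u-boot_mod (0x15).
for raw in (0x10, 0x15):
    as_controller = cycles_to_ns(raw, CONTROLLER_MHZ)
    as_external = cycles_to_ns(raw, EXTERNAL_MHZ)
    print(f"tRAS field {raw:#04x} ({raw}): "
          f"{as_controller} ns at 400 MHz, {as_external} ns at 200 MHz")
```

Which of the two interpretations the hardware actually uses is exactly the open question in this discussion; the sketch does not settle it.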
I would certainly be glad to hear what your ideas are. FWIW, after changing 40 to 45 I have not noticed any issues on the few dozen I have programmed. |
Hi @codehero
I suppose you mean changing it to 45 ns in Caraboot, not in my code? As I wrote above, a Carambola 2 with my code will have tRAS set to 52.5 ns. Cheers, |
Hi @codehero! We have seen similar issues on hardware we have in production, which uses the skylabs SKW72A. We actually started receiving batches of devices that switched from A3R12E40DBF-AH (Zentel) to W9751G6KB-25 (Winbond) DRAM, which coincidentally is the same part you struggled with! We are still trying a few things here, like resetting the DDR registers closer to the original AP121 reference settings (uboot), but since you are already way ahead of us with this, can you please let me know if you resolved this completely with the adjustment to tRAS? Interestingly, the actual value in the DDR_CONFIG register for tRAS was 21; originally, in AP121 uboot, it was 16, but it sounds like you would have adjusted this to 24 (if our calculations are correct). Can you please confirm? I am also running the same stress test here on a few devices, as this problem has been elusive: so far it has only shown up on devices with Winbond DRAM, and the MTBF on some devices has been 4-5 days of usage in the field. Thanks! David. |
Hi @codehero, An update, we have continued with our testing here, so far I can state the following: -
Just curious if you were able to get to the bottom of this in the end? Regards, David. |
Hi @thornley-touchstar ,
A bug report was logged on LEDE's bug system, but I did not get anywhere other than it possibly being a DRAM issue. I've also gone back and forth between the DRAM timings from the Arduino Yun's bootloader and pepe2k's standard ones, as well as messing with the tRAS and CAS timings, with no change in behavior (other than no boot if they are misconfigured). The Carambola 2 setup is pretty much identical to the Arduino Yun. Do you know if memtest does large buffer flushes or large DMA burst writes, like I suspect the network driver does when it shuts down or creates network buffers? Those RAM interactions might be more sensitive to RAM timings than regular single-page reads and writes. Do you have inline resistors on your DRAM lines (like the Arduino Yun, to attenuate signal reflections), or are they directly connected and only length matched (like ours are)? The Arduino Yun uses the same Winbond memory but has series resistors on the DQ, clk and addr lines. |
Hi @DanielRIOT, we tried the alt memory test in pepe2k's uboot and couldn't reproduce the problem. Additionally, we tried memtester (mentioned in 8devices/Caraboot#5) and also a combination of stress (CPU) and iperf (WiFi)... with no luck. I imagine the latter would stress DMA, but the memory tests themselves don't specifically target DMA (to my knowledge). @pepe2k did suggest that we drop the memory bus clock (from 200 MHz), but we haven't got around to that yet. It's interesting that you mentioned switching WiFi might be related; my understanding is that the AR9331 originally had issues with USB stability related to this. It could be a process issue or limitations of the WSoC design itself, all speculation. I did manage to source an Arduino Yun, and I can see a lot of additional resistors around the DRAM chip when compared to the SKW72A and Carambola 2, which also use the Winbond part. Which device are you actually testing with? Also, do you know if the Arduino Yun specifically does not suffer from this problem? |
I've also tried to get mine to fail with memtester, but they ran flawlessly for a few days until I cycled the wifi a few times (change channel, call "wifi") as a sanity check, and they crashed again. I have not had the same issues on my modified Yuns, only on our layout. We started on the Arduino Yun as a proof of concept and then created our own board from microscope images and bits and pieces of AR9331 schematics and layouts around the internet (QCA is not very helpful with new product development in EMEA until one gets the needed volumes). We use a different antenna arrangement as well as GPS and a few other peripherals on the main PCB. I'll keep poking at the RAM configuration from u-boot (it looks like Linux doesn't touch it after u-boot sets it up), as well as add the 22R resistors on the DQ lines and do further tests (on-die termination is not enabled in this application, so it could be ringing on the lines that messes with the data). I have a spectrum analyser (40 MHz BW, but the LO moves around) and a 100 MHz scope, so I cannot easily capture a time series at 200 MHz with those fast rise times to "see" ringing. |
@DanielRIOT, today I tried switching channels with memtester running and iperf (pushing data over wifi) and could not reproduce what you experience (on a SKW72) I use the following method to randomly change channel: -
I am curious about one thing, though: what happens if you push 'memtester' so hard that it allocates all available memory? In my case, the 'want' is 40M but it shows 'got' as 24MB, and when this occurs it tends to crash anyway as the kernel runs out of memory (I don't have any swap). Do you experience the same? |
@thornley-touchstar I also get an eventual crash when I allocate large blocks with memtester (40M-ish), but those all had an obvious "out of memory" type message. My channel change was a shell script from the OpenWrt bug page. It's a strange crash, and I only get it when the WiFi channel or power is changed, and sometimes on boot (when the wifi system comes up; I'm usually running in 5 MHz wifi mode, but even when I put it in normal 20 MHz mode it acts the same). I've disabled my batmesh and VLAN configurations and it still happens.
Most other people who looked at the logs say it's a RAM issue, so this week will be "scrape tracks under a microscope and add series resistors" week.
|
@DanielRIOT understood, if you do have any success with adding the 22R resistors it would be great to hear if it resolves the issue :) |
@thornley-touchstar after adding inline resistors to the DQ and control lines ( following a TI and Micron app note ) it didn't seem to change my crash behavior... will keep digging |
I eventually reflowed a new RAM module (micrel art IIRC) onto the device: I destroyed one board and the other showed no change in behavior. Bummer. I also found a few older bug reports related to kernel crashes when networking stuff happens (https://dev.archive.openwrt.org/ticket/22283 and https://dev.archive.openwrt.org/ticket/22265.html), so I removed 2 patches (020-backport_netfilter_rtcache and 150-bridge_allow_receiption_on_disabled_port). There was also a mention of removing these in the 19.07 pull requests. The system was more stable, and while trying to get a clearer picture of why the crash happened (in the few times I got a kernel crashlog) I enabled a few more debugging features: CONFIG_DEBUG=y, and most notably CONFIG_KERNEL_PROVE_LOCKING=y, which seemed to make the crash on wireless start go away. But then the system would crash on reboot, for which I found this bug report: it seems that re-enabling the interrupts right after writing to the reset register caused the reset bit to be cleared before the system could reset. So I placed a delay right after the register write, and now reboot also works again. It seems like a few race conditions are around :( but my system works "good enough" for now. |
Sorry but this project is no longer maintained. |
I am having very intermittent stability issues using both Caraboot and pepe2k's u-boot_mod.
Both u-boot versions assume a tRAS value of 40 ns.
This was true when the Carambola 2 module used the W9751G6JB25 DDR2 chip.
However, I popped the cap off a newer Carambola 2 module, and it now uses the W9751G6KB25.
The tRAS value specified for the KB25 is 45 ns.
Should the default safe value be 45 ns?
See datasheets
Page 45 of
http://digichip.ru/datasheet/PDF/df799b2e552ae92d5acb3f8b9c437f77/68da5750c408c276e3bcd1df60096ddc/W9751G6JB25.pdf
Page 45 of
https://www.winbond.com/resource-files/da00-w9751g6kbg1.pdf
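To make the 40 ns versus 45 ns question concrete, here is a minimal sketch that rounds a tRAS requirement up to whole clock cycles. It assumes only what the thread states: the two datasheet values (40 ns and 45 ns) and the two candidate clocks (400 MHz controller clock, 200 MHz external clock). The helper name is hypothetical and does not come from either bootloader.

```python
def ns_to_cycles(t_ns: int, clock_mhz: int) -> int:
    """Smallest whole number of clock cycles that lasts at least `t_ns` ns."""
    # t_ns * clock_mhz / 1000 is the exact cycle count; round it up
    # using integer arithmetic to avoid floating-point surprises.
    return (t_ns * clock_mhz + 999) // 1000


# Datasheet tRAS minimums quoted in this issue.
for part, tras_ns in (("W9751G6JB25", 40), ("W9751G6KB25", 45)):
    at_400 = ns_to_cycles(tras_ns, 400)
    at_200 = ns_to_cycles(tras_ns, 200)
    print(f"{part}: tRAS {tras_ns} ns -> "
          f"{at_400} cycles at 400 MHz, {at_200} cycles at 200 MHz")
```

If the register field counts 400 MHz cycles, 45 ns needs 18 cycles, so a field value of 16 would fall short while 21 would leave margin. If it counts 200 MHz cycles, 9 cycles would already be enough.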