XDMA QuickStart Guide Tutorial and/or Wiki #255
Thanks! :) Cool. Doesn't look like Xilinx/AMD is doing anything with the pull requests though :( |
These instructions look very nice; even a SW person like me could probably build an FPGA image with them.
They have not been doing that for as long as I can remember. This repo is more of a one-way street and is updated very infrequently, but that is their choice. After all, this is called the 'reference' implementation of the XDMA driver, and one is free not to use it if the quality is not good enough (I don't know what QDMA looks like; I'm working with XDMA only). It seems that the way this (XDMA) driver is developed, it will never end up in the upstream Linux kernel. I guess the reasons it cannot be upstreamed include the custom DMA engine implementation (libxdma.c) and a too-narrow use case (userspace char device interfaces only).

What AMD is now trying to do is get only the DMA engine parts into the upstream kernel (i.e. https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/tree/drivers/dma/xilinx/xdma.c) as a dmaengine kernel module. This means that none of the char device interfaces that we see here today are being upstreamed. That driver is still not accepted upstream, but at least there is a desire to get it there. If one is interested in how it would be used, there is a Video4Linux driver that utilizes it (https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/tree/drivers/media/pci/mgb4/mgb4_core.c). I guess the idea is that the XDMA dmaengine driver, once upstreamed, can be used with a variety of subsystems and not only expose char devices to userspace.

I've tinkered with that dmaengine driver and retrofitted it into this XDMA driver as a quick hack to see which parts get replaced and what it would take to use it. Essentially, almost all of the code in https://github.com/Xilinx/dma_ip_drivers/blob/master/XDMA/linux-kernel/xdma/libxdma.c can be considered obsolete when the new XDMA dmaengine driver is used. Other parts, like the interrupt handling code, would be slightly different, but the char interface code remains pretty much the same. If there are enough knowledgeable folks around here to develop a new char interface driver based on the XDMA dmaengine driver, I'm willing to participate in the effort and in testing on the hardware I have. |
That driver has been accepted upstream: https://github.com/torvalds/linux/blob/master/drivers/dma/xilinx/xdma.c I replaced this Xilinx driver with a custom implementation a while ago for use in our driver, but I plan to rewrite it using the upstream driver at some point. If nothing happens until then, we'll probably release it publicly. |
Cool, did not notice that, thanks! |
Hi Matthew, I see that it provides a README.md file in the 'Markdown' plain-text file format. It looks best when pasting it right here and then pressing 'Preview'. Perhaps you know a better way to make this file readable? Also, before we dive into fancy DDR4 memory controllers: I was able to change the number of 36K BRAM blocks to 248. Trying to allocate more results in complaints from Vivado. There should be 300 available. My guess is that 52 of the 36K BRAM blocks are being used by the Opal Kelly firmware? But, when I change xdma_mm.sh line 13 (as indicated by Opal Kelly) to either:
it fails at 2). Any idea why this is happening? |
I'm not sure what you are trying to accomplish but I gave up on the dma_ip_drivers XDMA
There are cascade limits/issues/caveats with built-in memory, and some of it is probably being used by other IP internal to the example design. Keep in mind the Block RAM is distributed, so parts of it get used throughout your design and they block the ideal cascading of all of it. About 50 BRAM blocks are required by the XDMA Block. Do not edit the Block Memory Generator. If you want to create a larger contiguous block of memory, change the Range in the Address Editor.
You can also use
|
There is no Address Editor when opening the AXI4 Memory Mapped Default Example Design from a DMA/Bridge Subsystem for PCI Express added through the IP Catalog, because there is no Block Design associated with this example. The section does state that the BRAM size can be changed, so I can either try changing the settings in the IP Customizations PCIe: BARs tab, or I can try the Vivado IP Integrator-Based Example Design which starts out by opening a Block Design. |
The example designs are not easily altered. They exist to prove functionality. I strongly recommend you move on to an IP Integrator Block Diagram design. You can follow the XDMA Example Design notes or my tutorial. The sooner you are communicating between your host system and an AXI Block the sooner you can pursue your project. Notice all the test scripts were last edited over 2 years ago. If you are working with a custom board, it can be useful to delay your motherboard's BIOS Boot to allow for FPGA Configuration. It is difficult to meet the 100ms PCIe startup requirement. You can do this by pressing the POWER button, then pressing and holding the RESET button for a second before releasing it. Or, connect a 330uF-1000uF capacitor across the reset pins of an ATX motherboard's Front Panel Header: |
Well, this example design does not function very well with the scripts provided. Even with the BRAM set to its default size, the scripts are complaining about data integrity.
I'll take the detour, I'm going to rewrite them.
Actually I was still struggling with the fan. While the board is brand new, the firmware is controlled by a 20-year-old closed-source software library. |
Matthew,
That's exactly where to edit the BRAM size! I was at exactly the right spot!
It was the scripts that were not functioning as they should. Let me work out the details. One moment please. |
When you edit the Range (Memory Size) in the Address Editor and leave the Block Memory Generator on Auto, that value (Address Width) will be propagated throughout the project. From the Block Memory Generator Product Guide (PG058), Pg#96: "Width and depth parameters in BMG are calculated and ..."
What you are doing will work for your current goals, but in a large project it may lead to some obscure bug.
It does not look like this project will accept any pull requests. The test scripts use
Create a file full of random data:
All-zeros file:
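Something along these lines works (the file names and size here are just examples; use whatever size the scripts expect):

# random test data, 256 KBytes
dd if=/dev/urandom of=/tmp/datafile-256-K bs=1K count=256
# all-zeros file of the same size
dd if=/dev/zero of=/tmp/datafile-zeros-256-K bs=1K count=256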
|
Thanks for the link, hadn't found this one yet. Figured it would happen like that.
But for the moment, I'm just trying to get the Default Example Design to work. It has no Address Editor, since it has no Block Design, and still, from the DMA/Bridge Subsystem for PCI Express Product Guide (PG195): The example design from the IP catalog has only 4 KB block RAM; you can regenerate the subsystem for larger block RAM size, if wanted. Et voilà, which the necessary modifications to the scripts: ===>./io.sh xdma0, channel 0:0, block size 262144, address 0, offset 0, data file: /tmp/xdma0_h2c0c2h0/datafile-256-K, integrity 1, dmesg 1. xdma0 channel 0:0: block size 262144, address 0, offset 0, data match. Bigger BRAM sizes will follow. The 4KB example works as well, with the modifications. Why do you think merge requests are being blocked and pull requests are being ignored? Isn't it of vital importance to AMD that at least the default examples are properly functioning? Maybe that's just it, only they aren't. I'll try to get their attention when I'm done. |
Matthew, I've tried a script like yours, and I noticed two things while testing different IP Customizations:
|
The XDMA IP Block uses Block RAM for internal buffers, variables, etc. That reduces the available pool of BRAM Blocks. When you generated your XDMA Example you likely chose PCIe x8 8.0GT/s, which is the maximum the XCAU25P-FFVB676 you mentioned supports, as it has 3 GTY Quads adjacent to the PCIE4 Block. That is roughly 8GBytes/s of PCIe bandwidth. The XDMA Block attempts to match the AXI bandwidth by setting the AXI Width to 256-bit and the AXI Clock Frequency to 250MHz. Refer to the Block Memory Generator Product Guide Pg#90, Block RAM Usage.
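As a rough sanity check on those numbers (my arithmetic, not from the product guides; 8.0 GT/s uses 128b/130b encoding), the two bandwidths do line up:

$$ 8\ \text{lanes} \times 8.0\,\text{GT/s} \times \tfrac{128}{130} \approx 63\,\text{Gb/s} \approx 7.9\,\text{GB/s} \quad \text{(PCIe Gen3 x8)} $$
$$ 256\,\text{bit} \times 250\,\text{MHz} = 64\,\text{Gb/s} = 8\,\text{GB/s} \quad \text{(AXI)} $$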
Please post a screenshot of your Design Run and Project Summary - Utilization. The Artix+ Datasheet mentions 300 BRAM Blocks and you should be able to count on using at least half. |
About my first item: Thanks for your elaborate answer, but:
"When using the minimum area algorithm, it is not as easy to determine the exact block RAM ..."
These BRAM blocks are 36 kbit, instead of 36 kbyte :)
About my second item: Using a Port A Width (PW) of 128, 64, or 32 bits, can you reproduce the malfunction?
|
In the XDMA Example project, Vivado will not let you directly connect two busses with different data widths, as that implies data loss. You need an AXI Interconnect or AXI SmartConnect block between them. Here is a Block Diagram recreation of my understanding of the XDMA Example project:
The AXI BRAM Controller is set up with the same data width as the XDMA. The BRAM is mapped to address
When I run Synthesis+Implementation, Vivado uses 50 BRAM Blocks for XDMA and 128 for the Block Memory Generator. The Address Range must be a power of 2, so the next larger option is 1M. When I set the Range to 1M, over 300 BRAM Blocks are required:
Which causes Implementation to fail:
By adding an AXI SmartConnect block I can connect two AXI BRAM Controllers:
Mapping their address ranges consecutively allows for 768KBytes of memory, which is accessible as a single block of memory by other AXI Blocks. The design now uses 242 BRAM Blocks:
If I then add 3 more BRAM Controllers with consecutive addresses I can get
For a total BRAM Block usage of 298. 248 of those Blocks are used by the memory array.
When I Implement the design for a board I have, I am able to write and read the complete BRAM Address space:
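The host-side check looks roughly like the following. This is only a sketch: it assumes the dma_to_device / dma_from_device test utilities from this repo's tools directory, example device node and file names, and a 768 KByte span starting at AXI address 0 (check each tool's --help for the exact option names):

# generate 768 KBytes of random data
dd if=/dev/urandom of=/tmp/tx.bin bs=1K count=768
# write it to the BRAM over H2C channel 0, then read it back over C2H channel 0
./dma_to_device   -d /dev/xdma0_h2c_0 -a 0x0 -s 786432 -f /tmp/tx.bin
./dma_from_device -d /dev/xdma0_c2h_0 -a 0x0 -s 786432 -f /tmp/rx.bin
# compare what went out with what came back
cmp /tmp/tx.bin /tmp/rx.bin && echo "data match"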
PCIe has 128 to 4096 byte payload sizes. The default payload size that shows up (
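The negotiated payload settings can be checked from the host side, for example for a Xilinx/AMD device (vendor ID 10ee):

sudo lspci -vv -d 10ee: | grep -E 'MaxPayload|MaxReadReq'
# DevCap shows what the endpoint supports, DevCtl what was actually negotiated,
# e.g. "MaxPayload 256 bytes, MaxReadReq 512 bytes" |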
Hi Matthew :) Once again, thank you for your elaborate response. We learn as we go.
This brought me onto something. There is another Data Width, the AXI Data Width, when re-customizing the DMA/Bridge Subsystem for PCI Express. I set the Maximum Link Speed to 2.5GT/s, thereby reducing the AXI Data Width to 64 bits. Et voilà, then it is 2) that starts working.
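Which makes sense bandwidth-wise (rough arithmetic on my part, assuming the x8 link width from earlier; 2.5 GT/s uses 8b/10b encoding):

$$ 8\ \text{lanes} \times 2.5\,\text{GT/s} \times \tfrac{8}{10} = 16\,\text{Gb/s} = 2\,\text{GB/s} \quad \text{(PCIe Gen1 x8)} $$
$$ 64\,\text{bit} \times 250\,\text{MHz} = 16\,\text{Gb/s} = 2\,\text{GB/s} \quad \text{(AXI)} $$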
Couldn't find any 32-bit AXI Data Width option for any of the Lane Widths and Maximum Link Speeds, though. I'll be continuing to Section 4.2: Tandem Configuration now. Thank you for your support! I'll know where to find you when I'm at a loss somewhere en route. |
From the XDMA Product Guide (PG195): Support for 64, 128, 256, 512-bit datapath.
I took the 5 BRAM Controller project:
Changed all the BRAM Controllers to use a 32-bit AXI Data Width (AXI SmartConnect performs clock domain crossing and data width translation):
Implementation results in the same 298 total BRAM Blocks used: |
Hi @jberaud, It would be great if you published a userspace bridge to the xdma upstream driver; I might be able to help where I can, to allow people to drop this buggy driver. What I am thinking is something similar to: int fd = open(...); // open the device node
void *region = mmap(fd, ..., size); // pin memory
struct blob blob = {
.addr = p, // must be within region
.size = s,
};
write(fd, &blob, sizeof(blob)); // send blobs to fill for receive or using sendmsg
poll(fd); // support async - important
read(fd, &blob, sizeof(blob)); // receive ready blobs
For outgoing transfers the blobs are filled with data and the notification arrives when the send has completed; for incoming transfers the blobs are submitted as candidates and are filled before the notification. Thanks, |
As @alonbl says, guidance on how to upgrade this driver to use the latest upstream DMA engine implementation, while maintaining a simple file descriptor interface, would be greatly appreciated. To be honest, I am surprised that there is no example code out there yet. Or am I just missing something? |
WinDriver, a driver development toolkit, offers both XDMA and QDMA code samples. You can download WinDriver from its official website and enjoy a 30-day free evaluation. If you have any questions, feel free to contact me at [email protected]. |
It would have been very useful, when I started an XDMA-based project, to have had any kind of notes or a tutorial like this; the same goes for QDMA. I had to dive into the driver code just to figure out basic usage. Now that the XDMA driver is in the Linux kernel, please improve the documentation.
I would like to have been able to sit down at a system with Vivado and a PCIe-based card installed and get to a working XDMA-based design that I am confidently able to modify in about two hours.
I have made an attempt at such notes. The associated images take up 1.9MB in color or 925KB in grayscale. I can submit a pull request or help out with the wiki.
GitHub Settings:
Allows you to enable Wikis or disable Issues, if that is the intention.