High-speed FPGA to CPU Interface

From time to time, one needs to transfer lots of data between an FPGA and a CPU. Unfortunately, there are not that many ways to do that if you want to get more than a couple of MByte/s. There are a few approaches people have been using in the past:

  • Memory bus interface
  • PCIe interface
  • An integrated FPGA+SoC chip
  • InterChip USB

The memory bus is probably the canonical solution for this problem. It has seen much use ever since FPGAs became available. Unfortunately, modern CPUs, especially SoCs, do not export a "normal" memory interface anymore, or if they do, it is shared with the DRAM and NAND flash interface, making it unavailable for all practical purposes. Another problem with this approach is the large pin count needed to achieve a decent throughput. But at least the approach is simple, easy to implement, and also easy to deal with in software. The new serial memory interfaces (like HyperBus) might change that in the future, but currently there are only memory modules supporting them and no SoCs.

The PCIe interface is becoming more and more common, as it is a low pin count interface for extending the capabilities of SoCs. But it still requires an SoC that is "bigger" and thus more expensive than one might actually want to deal with. Additionally, PCIe is not trivial to get working properly, even when using an IP core. The control logic needed to handle the PCIe (or actually PCI in general) communication is quite complex. On top of that, the signals, with a signaling rate of 2.5 GT/s per lane, are not easy to handle if you have to debug the bus.

Similar things can be said about InterChip USB. Although USB is available on almost all SoCs these days (and even on micro-controllers), InterChip USB is rather rare. In this case, the FPGA emulates a USB device, which is connected directly to the SoC (or a USB hub, if needed) without going through a connector. The very low pin count (just two wires) is a great advantage, but in reality one would need something like UTMI or its cousin ULPI to connect a USB PHY to the FPGA. Still, with just 12 signals (or 8 if using 4-bit signaling), ULPI is quite lightweight. Here again, the drawback is a relatively complex signaling protocol between the device and the host, due to the complexity of USB.

The combination of an FPGA and an ARM SoC on the same chip seems to be getting more popular these days, and both Xilinx and Altera have quite good offerings there. This also offers a very easy and very fast (both in terms of throughput and latency) interface between the FPGA and the SoC part. But unfortunately, these chips are rather expensive, and the FPGA part is often smaller than what you would get from a similar dedicated FPGA chip.

Today I was talking about this very problem with Marek Vasut. After enumerating the obvious solutions above, we had the idea of emulating an Ethernet PHY on the FPGA. This means the FPGA acts like a network port and transmits and receives Ethernet packets. An RGMII interface needs only 12 pins, runs at 125MHz (albeit DDR) and can transfer up to 100MByte/s of data.

Sending and receiving is pretty simple. For sending, just build an Ethernet frame (possibly containing an IP packet and a UDP header) where most of the boilerplate data is just constant. For receiving, the FPGA needs to detect the beginning of the Ethernet frame and extract the data (possibly validating that it is actually the desired data and not some rogue application trying to configure the network interface). Besides that, there is no complex control protocol running and no state to be kept inside the FPGA. There is no need for big memory pages holding static configuration just to fulfill some protocol requirements.

And what makes this so beautiful: SoCs with more than one RGMII interface are quite common, and the interface does not even need special drivers in the operating system, as long as the right packets can be produced and received (which all operating systems I know of support). And if one is a bit mischievous, one could think of ways to make the FPGA pass all packets that are not meant for it on to a real Ethernet PHY, and thus not occupy any additional resources on the SoC at all.
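
To give a feeling for how little is involved, here is a minimal Python sketch of the send and receive paths described above. It is only a software model of what the FPGA logic would do, not an implementation: the MAC addresses, IP addresses, the UDP port and the function names are all made up for illustration, and the preamble and start-of-frame delimiter that precede every frame on the wire are left out.

```python
import struct
import zlib

# Hypothetical addresses and port, chosen only for illustration.
FPGA_MAC = bytes.fromhex("020000000002")  # locally administered MAC
HOST_MAC = bytes.fromhex("020000000001")
FPGA_IP = bytes([192, 168, 77, 2])
HOST_IP = bytes([192, 168, 77, 1])
DATA_PORT = 50000

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
HEADER_LEN = 14 + 20 + 8  # Ethernet + IPv4 (no options) + UDP


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frame(payload, src_mac, dst_mac, src_ip, dst_ip):
    """Build an Ethernet/IPv4/UDP frame including the FCS."""
    eth = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4)
    # version/IHL, DSCP, total length, id, flags (DF), TTL, protocol, checksum
    ip = struct.pack("!BBHHHBBH", 0x45, 0, 20 + 8 + len(payload),
                     0, 0x4000, 64, IPPROTO_UDP, 0) + src_ip + dst_ip
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    # A UDP checksum of 0 means "not used", which IPv4 permits.
    udp = struct.pack("!HHHH", DATA_PORT, DATA_PORT, 8 + len(payload), 0)
    body = eth + ip + udp + payload
    body += bytes(max(0, 60 - len(body)))  # pad to the 64-byte minimum frame
    return body + struct.pack("<I", zlib.crc32(body))  # append FCS


def extract_payload(frame: bytes):
    """Return the UDP payload if the frame is meant for the FPGA, else None."""
    if len(frame) < HEADER_LEN + 4:
        return None
    if struct.unpack("<I", frame[-4:])[0] != zlib.crc32(frame[:-4]):
        return None  # corrupted frame
    if frame[0:6] != FPGA_MAC:
        return None  # not addressed to us
    if frame[12:14] != struct.pack("!H", ETHERTYPE_IPV4):
        return None  # not IPv4 (e.g. ARP)
    if frame[14] != 0x45 or frame[23] != IPPROTO_UDP:
        return None  # IP options or not UDP
    dst_port, udp_len = struct.unpack("!HH", frame[36:40])
    if dst_port != DATA_PORT or udp_len < 8:
        return None  # some other application
    if 34 + udp_len > len(frame) - 4:
        return None  # length field does not fit the frame
    return frame[HEADER_LEN:34 + udp_len]


if __name__ == "__main__":
    to_host = build_frame(b"sample data", FPGA_MAC, HOST_MAC, FPGA_IP, HOST_IP)
    to_fpga = build_frame(b"command", HOST_MAC, FPGA_MAC, HOST_IP, FPGA_IP)
    print(len(to_host), extract_payload(to_fpga))
```

Note that with a fixed payload size, every byte of the 42-byte header is constant, including the IPv4 checksum. In that case the FPGA would only need a small ROM for the header and a CRC-32 generator for the frame check sequence.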