Tips for using Xilinx Ultra RAMs

Started by weetabixharry 5 years ago, 2 replies, latest reply 5 years ago, 2387 views

What are the main considerations that I should be aware of if I want to use Ultra RAMs in my design?

I am familiar with the Block RAMs used in 5-, 6- and 7-series Xilinx devices. As far as I am aware, the BRAMs in UltraScale and UltraScale+ devices are similar to 7-series: 36k, true dual port, asynchronous, built-in FIFO logic.

However, I'm interested in what's different about URAMs. As far as I can see, they are 288k, true dual port, but synchronous and with no dedicated FIFO logic. (So 8x as deep, but similar r/w interfaces to a BRAM, albeit with a single clock?). It also seems like they cascade together efficiently across the whole device, so you can get extra deep (or wide?) RAMs with good timing performance without using much of the general FPGA fabric.
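To make the size comparison concrete, here is a rough back-of-envelope sketch. It assumes each URAM has a fixed 4K x 72 organisation (my reading of the UltraScale memory resources guide, not something stated above) and that a larger memory is built by simple tiling: blocks cascaded for depth, blocks side by side for width. The helper name `urams_needed` is hypothetical, and the tools may pack things differently.

```python
import math

# Assumed fixed URAM organisation: 4K words x 72 bits = 288 Kb
URAM_DEPTH = 4096
URAM_WIDTH = 72
BRAM36_BITS = 36 * 1024  # one 36k block RAM


def urams_needed(depth: int, width: int) -> int:
    """Estimate URAM blocks for a depth x width memory, assuming simple tiling."""
    rows = math.ceil(depth / URAM_DEPTH)  # blocks cascaded to extend depth
    cols = math.ceil(width / URAM_WIDTH)  # blocks side by side to extend width
    return rows * cols


# Capacity of one URAM relative to one 36k BRAM
print(URAM_DEPTH * URAM_WIDTH // BRAM36_BITS)  # 8

# A 16K x 64 memory: four blocks cascaded for depth
print(urams_needed(16384, 64))  # 4

# A 4K x 144 memory: two blocks side by side for width
print(urams_needed(4096, 144))  # 2
```

Note that a 64-bit word still occupies a full 72-bit row in this estimate, so narrow or shallow memories can waste a good part of a block.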

What are the other main differences?

Do the tools (Vivado) infer URAMs sensibly from general RTL code?

What other implementation considerations are there? (e.g. number of pipelining registers on read ports).

Reply by rajkeerthy18, June 21, 2019

UltraRAM works in RAM-only mode; no FIFO mode is available. Single clock (not an asynchronous memory), 2 ports but not a true dual port. Synthesis infers UltraRAM. Fixed behaviour, Port A operation completes first. Programmable latency. Interesting block to design with, though not as flexible as block RAMs.

Reply by weetabixharry, June 21, 2019
2 ports but not a true dual port... Fixed behaviour, Port A operation completes first.

Ah yes, thanks for the correction. It seems like the URAM dual-port behaviour is slightly restricted compared to true dual-port BRAMs. In many use cases, I can see this "Port A then Port B" operation will not be a problem. However, I also notice in the Xilinx docs that "Each port can only execute either a write or read operation in one cycle. When executing a write operation, the read outputs are unchanged and hold the previous value." I can imagine some scenarios where this would be an interesting challenge to manage.
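To check my own understanding of the two rules above, here is a toy behavioural model in Python. It is only a sketch of how I read "Port A then Port B" and "a write leaves the read output unchanged": the class name `UramModel` and the tuple encoding of operations are made up for illustration, it assumes an idle port also holds its output, and it ignores pipeline registers and read latency entirely.

```python
class UramModel:
    """Toy single-clock, two-port memory where Port A completes before Port B."""

    def __init__(self, depth=4096):
        self.mem = [0] * depth
        self.dout_a = 0  # read outputs hold their last value
        self.dout_b = 0

    def _port(self, op, held):
        # op is None (idle), ("rd", addr) or ("wr", addr, data)
        if op is None:
            return held  # assumption: an idle port holds its output
        if op[0] == "wr":
            self.mem[op[1]] = op[2]
            return held  # a write leaves the read output unchanged
        return self.mem[op[1]]  # a read updates the output

    def clock(self, op_a=None, op_b=None):
        """One clock cycle: apply the Port A operation first, then Port B."""
        self.dout_a = self._port(op_a, self.dout_a)
        self.dout_b = self._port(op_b, self.dout_b)
        return self.dout_a, self.dout_b


ram = UramModel()
ram.clock(op_a=("wr", 5, 111))

# A writes while B reads the same address: B already sees the new data
print(ram.clock(op_a=("wr", 5, 222), op_b=("rd", 5)))  # (0, 222)

# A reads while B writes the same address: A still sees the old data
print(ram.clock(op_a=("rd", 5), op_b=("wr", 5, 333)))  # (222, 222)
```

In the last cycle Port B's output stays at 222 even though it has just written 333, which is the kind of case I would expect to need care in a read-modify-write loop.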

Programmable latency. 

Could you say something more about this? I can't see the relevant info in the Xilinx docs. Is read latency ≤ 2 clock cycles like with BRAM?

Interesting block to design with

Absolutely. I hope I get a chance to soon :)