Storing Image Data in Block RAM on a Xilinx FPGA

Now that we have a VGA synchronization circuit, we can move on to designing a pixel generation circuit that specifies unique RGB data for certain pixels (i.e. an image). Before we go there, I thought I would talk separately about how to store image data on an FPGA. This discussion will focus mainly on using a Xilinx FPGA, more specifically the one on the Basys 3, which uses 12-bit color.

Raw images are arrays of pixel data. Each pixel has a number of bits that specify the intensity of the red, green, and blue color components. Assuming that an image is stored in 24-bit “True Color”, there are 8 bits specifying each color component. Since we are using the Basys 3, we need only the upper four bits of each color component (3 colors * 4 bits/color = 12 bits). So we need 12 bits per pixel to represent color, and y*x pixels, where y is the image height and x is the image width.
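As a rough sketch of that truncation, the snippet below packs the upper four bits of each 8-bit component into one 12-bit word. The function name to_12_bit and the example color are hypothetical, chosen only for illustration; the example uses a 24-bit color whose upper nibbles are 6, C, and C.

```python
def to_12_bit(r, g, b):
    """Pack the upper four bits of each 8-bit component into a 12-bit word."""
    # word layout: r3 r2 r1 r0 g3 g2 g1 g0 b3 b2 b1 b0
    return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)

# hypothetical 24-bit color (0x6F, 0xCF, 0xCF)
word = to_12_bit(0x6F, 0xCF, 0xCF)
print(format(word, '012b'))  # 011011001100
```

The lower four bits of each component are simply discarded, so several 24-bit colors map to the same 12-bit word.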

The image of Yoshi shown above is scaled up so that we can easily see each pixel as a block of color on our screen. This image has a height of 32 pixels and a width of 25 pixels. That means that in total we have 800 pixels, each needing 12 bits to represent color, so 9600 bits altogether. There are a few ways to store all of these bits.

The first option is to use an external RAM chip, which is the best option in terms of having a lot of memory to work with. This option requires designing a memory controller in HDL, which is not a trivial task. Perhaps another time.

The second option is to use an FPGA’s distributed RAM. The FPGA logic cells have look-up tables (LUTs) that can be configured as memory. The Artix-7 FPGA used on the Basys 3 has 5200 slices with 4 LUTs per slice, and a LUT can act as 64 bits of synchronous RAM. Multiplying these out gives an upper bound of 1331 kb, but only the LUTs in SLICEM slices can be configured as RAM, and the Xilinx 7 Series overview (DS180) lists a maximum of 400 kb of distributed RAM for the XC7A35T. Either way, this memory comes at the cost of logic cells we need to implement our logic circuits. Because of this trade-off, distributed RAM is generally used for small memories.

The third option is to use Block RAM, or BRAM: dedicated memory modules embedded in the FPGA. The Basys 3 has 100 BRAM modules of 18 kb each, for a total of 1800 kb. BRAM can be configured as single- or dual-port RAM, a ROM, or even a FIFO. To store our image data we will use a ROM, or read-only memory.

One way to instantiate a ROM using BRAM is to use the Xilinx LogiCORE Block Memory Generator, which comes with ISE or Vivado. This tool can instantiate Xilinx IP memory modules, which are device specific and can be initialized with image data using a coefficient (.coe) file.

Instead, we will use the Xilinx language templates to infer a ROM using BRAM. While this method is still Xilinx specific, it is semi device agnostic. The ROM we infer in HDL will work for both the Basys 2 and Basys 3, among others. The XST User Guide documents in detail the Verilog and VHDL language templates that can infer many different types of logic components.

Here is a general outline of how to infer a synchronous ROM with 3 address bits that stores bytes:

module bram_rom
	(
		input wire clk,
		input wire [2:0] address,
		output reg [7:0] data
	);

	(* rom_style = "block" *)

	//signal declaration
	reg [2:0] address_reg;

	always @(posedge clk)
		address_reg <= address;

	always @*
	case (address_reg)
		3'b000: data = 8'b00000001;
		.
		.
		.
		default: data = 8'b11111111;
	endcase
endmodule

The module is clocked, and has an input port for the address to read from and an output port for the data at that address. The ROM is synchronous: the address register is updated on each rising clock edge, and the output data follows the registered address. The output data is selected by a case statement in an always block that maps address values to output values. The default case must be included if not every address has a corresponding data output, because omitting it would infer a latch.
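To make the timing concrete, here is a small behavioral model of such a ROM in Python. It is only a sketch: the class name SyncRom, the clock() method, and the reset value of the address register are assumptions made for illustration, and the only ROM entry used is the one shown in the outline above.

```python
class SyncRom:
    """Behavioral model: registered address, combinational lookup."""

    def __init__(self, contents, default=0b11111111):
        self.contents = dict(contents)  # address -> data, like the case items
        self.default = default          # value of the default case
        self.address_reg = 0            # assumption: register starts at 0

    @property
    def data(self):
        # models the case statement on the registered address
        return self.contents.get(self.address_reg, self.default)

    def clock(self, address):
        # models a rising clock edge: the register captures the address input
        self.address_reg = address


rom = SyncRom({0b000: 0b00000001})
rom.clock(0b101)   # address 5 has no entry, so the default applies
print(rom.data)    # 255
rom.clock(0b000)
print(rom.data)    # 1
```

The output only changes when clock() is called, which mirrors the one-cycle delay between presenting an address and seeing its data.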

Starting from this basic template, we can infer a synchronous ROM for our 32 x 25 image of Yoshi. While an image can be modeled as a two-dimensional array, the inferred ROM is flat, with one address per pixel, so we have to combine the y, x (row, column) pixel coordinates into a single address for the case statement. The row and column addresses each need at least 5 bits (log₂32 = 5 for the 32 rows, and the 25 columns also need 5 bits because 2⁴ = 16 < 25), so we will have a 5-bit input for each and use the concatenation of the row and column addresses, {row_reg, col_reg}, in the case statement. Because only 25 of the 32 possible column values are used, the 800 pixel addresses are not contiguous; the unused addresses fall through to the default case. We will also need to widen the output port to 12 bits, because we are storing 12 bits for each pixel.
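The address arithmetic can be checked quickly in Python. This is a sketch with hypothetical names (ROW_BITS, COL_BITS, rom_address); it mimics concatenating a 5-bit row with a 5-bit column by shifting the row left by the column width.

```python
ROW_BITS = 5  # 32 rows
COL_BITS = 5  # 25 columns still need 5 bits

def rom_address(row, col):
    """Equivalent of the Verilog concatenation {row, col}."""
    return (row << COL_BITS) | col

# every pixel of the 32 x 25 image gets its own 10-bit address
used = {rom_address(r, c) for r in range(32) for c in range(25)}
print(len(used))                               # 800
print(rom_address(0, 24), rom_address(1, 0))   # 24 32
print(format(rom_address(31, 24), '010b'))     # 1111111000
```

Addresses 25 through 31 of each row are never generated, which is why the address space spans 1024 entries even though only 800 of them hold pixels.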

module yoshi_rom
	(
		input wire clk,
		input wire [4:0] row,
		input wire [4:0] col,
		output reg [11:0] color_data
	);

	(* rom_style = "block" *)

	//signal declarations
	reg [4:0] row_reg;
	reg [4:0] col_reg;

	always @(posedge clk)
		begin
		row_reg <= row;
		col_reg <= col;
		end

always @*
    case ({row_reg, col_reg})
        10'b0000000000: color_data = 12'b011011001100;
        10'b0000000001: color_data = 12'b011011001100;
        10'b0000000010: color_data = 12'b011011001100;
        10'b0000000011: color_data = 12'b011011001100;
        10'b0000000100: color_data = 12'b011011001100;
        10'b0000000101: color_data = 12'b011011001100;
        10'b0000000110: color_data = 12'b011011001100;
        10'b0000000111: color_data = 12'b011011001100;
        10'b0000001000: color_data = 12'b011011001100;
        10'b0000001001: color_data = 12'b000000000000;
        10'b0000001010: color_data = 12'b000000000000;
        10'b0000001011: color_data = 12'b011011001100;
        10'b0000001100: color_data = 12'b011011001100;
        10'b0000001101: color_data = 12'b011011001100;
        10'b0000001110: color_data = 12'b011011001100;
        10'b0000001111: color_data = 12'b011011001100;
        10'b0000010000: color_data = 12'b011011001100;
		.
		.
		.
		
		default: color_data = 12'b000000000000;
	endcase
endmodule

Above is the modified code for the Yoshi ROM, with most of the cases omitted for brevity. Clearly it would be a good idea to automate the creation of the Verilog HDL for a ROM, given an image. Below is a Python program I wrote that does this.

# converts image to Verilog HDL that infers a ROM using Xilinx Block RAM
# note: 12-bit color map word is r3, r2, r1, r0, g3, g2, g1, g0, b3, b2, b1, b0

import numpy as np
from PIL import Image

# returns string of 12-bit color at row y, column x of image
def get_color_bits(im, y, x):
    # keep the upper four bits of each 8-bit color component
    rx = format(int(im[y][x][0]) >> 4, '04b')
    gx = format(int(im[y][x][1]) >> 4, '04b')
    bx = format(int(im[y][x][2]) >> 4, '04b')

    # return concatenation of RGB bits
    return rx + gx + bx

# write to file Verilog HDL
# takes name of file, image array,
# pixel coordinates of background color to mask as 0
def rom_12_bit(name, im, mask=False, rem_x=-1, rem_y=-1):

    # get color bits of background color
    # if coordinates left at default, map all data without masking
    if rem_x == -1 or rem_y == -1:
        a = "000000000000"

    # else set mask compare bits (the image array is indexed row y, column x)
    else:
        a = get_color_bits(im, rem_y, rem_x)

    # make module name and output filename from input
    base = name.split('.')[0]
    file_name = base + "_12_bit_rom.v"

    # get image dimensions
    y_max, x_max, z = im.shape

    # get width of row and column case words:
    # bits needed to hold the largest index, at least 1
    row_width = max(1, (y_max - 1).bit_length())
    col_width = max(1, (x_max - 1).bit_length())

    # open file (closed automatically at the end of the with block)
    with open(file_name, 'w') as f:
        # write beginning part of module up to case statements
        f.write("module " + base + "_rom\n\t(\n\t\t")
        f.write("input wire clk,\n\t\tinput wire [" + str(row_width-1) + ":0] row,\n\t\t")
        f.write("input wire [" + str(col_width-1) + ":0] col,\n\t\t")
        f.write("output reg [11:0] color_data\n\t);\n\n\t")
        f.write("(* rom_style = \"block\" *)\n\n\t//signal declaration\n\t")
        f.write("reg [" + str(row_width-1) + ":0] row_reg;\n\t")
        f.write("reg [" + str(col_width-1) + ":0] col_reg;\n\n\t")
        f.write("always @(posedge clk)\n\t\tbegin\n\t\trow_reg <= row;\n\t\tcol_reg <= col;\n\t\tend\n\n\t")
        f.write("always @*\n\tcase ({row_reg, col_reg})\n")

        # loops through y rows and x columns
        for y in range(y_max):
            for x in range(x_max):
                # write <address>: color_data = 12'b
                case = format(y, 'b').zfill(row_width) + format(x, 'b').zfill(col_width)
                f.write("\t\t" + str(row_width + col_width) + "'b" + case + ": color_data = 12'b")

                color = get_color_bits(im, y, x)

                # if masking and the pixel matches the background color, write 0
                if mask and color == a:
                    f.write("000000000000;\n")

                # otherwise write the color bits
                else:
                    f.write(color + ";\n")

            f.write("\n")

        # write end of module
        f.write("\t\tdefault: color_data = 12'b000000000000;\n\tendcase\nendmodule")

# driver function
def generate(name, rem_x=-1, rem_y=-1):
    # scipy.misc.imread is no longer available, so read the image with Pillow
    im = np.array(Image.open(name).convert('RGB'))
    print("width: " + str(im.shape[1]) + ", height: " + str(im.shape[0]))

    # mask only when background pixel coordinates were given
    mask = rem_x != -1 and rem_y != -1
    rom_12_bit(name, im, mask, rem_x, rem_y)

# generate rom from full bitmap image
generate("yoshi.bmp")

This Python code is written for Python 3 and needs numpy and PIL (Pillow) installed. It originally read the image with scipy.misc.imread, which is no longer available in current SciPy releases, so the image is read with Pillow instead. The generate function is called at the bottom with the name of the image file to read in. It has two optional arguments, rem_x and rem_y, which are passed on to rom_12_bit(); masking is enabled only when both are given. Inside generate, the image is read and its width and height are immediately printed for our reference.

Next rom_12_bit() is called, which creates the Verilog file. This function has three optional arguments: mask, rem_x, and rem_y. If mask is set to True, every pixel whose color matches the color at location rem_x, rem_y is replaced with 0 in the output ROM. If the defaults are left alone, the ROM will contain the exact image data. Later, when displaying the image in HDL, we can replace the cyan background color with a background color of our choice. Because of this, it is important to make sure that the background color in the image we use doesn’t appear in the actual sprite of Yoshi.
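The masking idea can be illustrated on its own with a tiny made-up sprite. This is a simplified sketch, not the script's code: the helper mask_background and the pixel values are hypothetical, and pixels are represented directly as 12-bit strings.

```python
def mask_background(pixels, bg_row, bg_col):
    """Return a copy of pixels with every occurrence of the color found at
    (bg_row, bg_col) replaced by twelve zero bits."""
    background = pixels[bg_row][bg_col]
    return [["000000000000" if p == background else p for p in row]
            for row in pixels]

# hypothetical 2 x 3 sprite: cyan background with one green pixel
sprite = [
    ["011011001100", "011011001100", "011011001100"],
    ["011011001100", "000011110000", "011011001100"],
]

masked = mask_background(sprite, 0, 0)
print(masked[0])     # ['000000000000', '000000000000', '000000000000']
print(masked[1][1])  # 000011110000
```

One consequence of masking with 0 is that pixels that were already black in the sprite become indistinguishable from masked background pixels.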

The synthesis report from Vivado shows that the Verilog HDL inferred a 1024 x 12 ROM using block RAM. Note that the actual image data is 800 x 12, with 800 being between 2⁹ = 512 and 2¹⁰ = 1024, which is why 1024 was used for that dimension of the ROM.

The utilization report also shows that one 18kb block RAM module was used for our image data.
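These numbers can be sanity-checked with a few lines of Python. The sketch below assumes the ROM depth is set by the 10-bit concatenated address and that "18 kb" means 18 * 1024 bits; the helper name inferred_rom_shape is hypothetical, and the check only compares capacities, it does not predict how the synthesis tool will map the memory.

```python
def inferred_rom_shape(height, width, bits_per_pixel=12):
    """Depth and word width of a ROM addressed by {row, col}."""
    row_bits = max(1, (height - 1).bit_length())
    col_bits = max(1, (width - 1).bit_length())
    return 2 ** (row_bits + col_bits), bits_per_pixel

BRAM_BITS = 18 * 1024  # assumption: one 18 kb block holds 18 * 1024 bits

depth, word_width = inferred_rom_shape(32, 25)
image_bits = 32 * 25 * 12
rom_bits = depth * word_width

print(depth, word_width)      # 1024 12
print(image_bits, rom_bits)   # 9600 12288
print(rom_bits <= BRAM_BITS)  # True
```

Under these assumptions the full 1024 x 12 ROM fits within a single 18 kb block, which is consistent with the utilization report.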

To see a complete FPGA video game project that utilizes block RAM to store sprites, click here.

3 thoughts on “Storing Image Data in Block RAM on a Xilinx FPGA”

jm says:

October 23, 2019 at 12:57 am

If we have a grayscale image then how many bits to represent each pixel

David Marion says:

March 26, 2022 at 1:04 am

In the Python file, in the generate function, misc.imread() is now deprecated. Need to use imageio and change line to im = imageio.imread(name) Great script, thanks so much!

Matt says:

April 22, 2023 at 4:15 am

Hi Joey,

Thanks for the Blog. I am googling for the max distributed RAM size on Basys 3 and found your blog.
Your calculation based on the number of Slices is impressive.
But may be due to other constrains on the FGPA, I later found a XilinxFPGA doc
https://docs.xilinx.com/v/u/en-US/ds180_7Series_Overview
It states on Table 4. Page 4. that XC7A35T (Basys 3) has 400kb of distributed RAM.

Embedded Thoughts

A blog about Microcontroller and FPGA projects
