Storing Image Data in Block RAM on a Xilinx FPGA

Now that we have a VGA synchronization circuit we can move on to designing a pixel generation circuit that specifies unique RGB data for certain pixels (i.e. an image). Before we actually go there, I thought I would separately talk a little bit about how to store image data on an FPGA. This discussion will focus mainly on using a Xilinx FPGA, more specifically the one on the Basys 3, which uses 12-bit color.

Raw images are arrays of pixel data. Each pixel has a number of bits that specifies the intensity of the red, green, and blue color components. Assuming that an image is stored in 24-bit “True Color”, there are 8 bits specifying each color component. Since we are using the Basys 3, we will need only the upper four bits of each color component (3 colors * 4 bits/color = 12 bits). So we need 12 bits per pixel to represent color, and y*x pixels, where y is the image height and x is the image width.
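As a quick illustration of the truncation described above, here is a minimal Python sketch that keeps the upper four bits of each 8-bit component and packs them into one 12-bit word. The function name `to_12_bit` and the sample color are my own choices for illustration, not part of the original project.

```python
def to_12_bit(r, g, b):
    """Pack 8-bit R, G, B values into a 12-bit word: r3..r0 g3..g0 b3..b0."""
    # Shifting right by 4 discards the lower four bits of each component.
    return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)


# Hypothetical 24-bit sample color (0x6F, 0xC8, 0xCC).
word = to_12_bit(0x6F, 0xC8, 0xCC)
print(format(word, "012b"))
```

Note that this simply drops the low bits rather than rounding, which matches taking "the upper four bits" of each component.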

The image of Yoshi shown above is scaled up such that we can easily see each pixel as a block of color on our screen. This image has a height of 32 pixels and a width of 25 pixels. That means that in total we have 800 pixels, each needing 12 bits to represent color, so 9600 bits altogether. There are a few ways to store all of these bits.

The first option is to use an external RAM chip, which is the best option in terms of having a lot of memory to work with. This option requires designing a memory controller in HDL, which is not a trivial task. Perhaps another time.

The second option is to use an FPGA’s distributed RAM. The FPGA logic cells have look-up tables (LUTs) that can be configured as memory. For the Artix-7 FPGA used on the Basys 3 there are 5200 slices with 4 LUTs per slice, with each LUT able to act as 64 bits of synchronous RAM. This gives an upper bound of about 1331 kb of RAM (5200 * 4 * 64 bits), although only the LUTs in memory-capable slices (SLICEMs) can actually be configured this way, so the usable amount is lower. It also comes at the cost of using the logic cells we need to implement our logic circuits. Because of this trade-off, distributed RAM is generally used only for small memories.

The third option is to use Block RAM, or BRAM, which consists of dedicated memory modules embedded in the FPGA. The Basys 3 has 100 BRAM modules with 18 kb each, for a total of 1800 kb. BRAM can be configured as single- or dual-port RAM, a ROM, or even a FIFO. To store our image data we will use a ROM, or read-only memory.
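To put the numbers from the last few paragraphs side by side, here is a small back-of-the-envelope calculation in Python. It assumes that an "18 kb" block holds 18 * 1024 bits and it only compares raw bit counts; how the synthesis tool actually maps a memory onto blocks depends on the supported width and depth configurations, so treat the result as a rough estimate.

```python
import math

IMAGE_HEIGHT = 32
IMAGE_WIDTH = 25
BITS_PER_PIXEL = 12

# Total storage needed for the Yoshi image.
image_bits = IMAGE_HEIGHT * IMAGE_WIDTH * BITS_PER_PIXEL

# Distributed RAM figure from the text: slices * LUTs per slice * bits per LUT.
distributed_bits = 5200 * 4 * 64

# Assumption: one 18 kb block RAM holds 18 * 1024 bits.
BRAM_BLOCK_BITS = 18 * 1024
bram_total_bits = 100 * BRAM_BLOCK_BITS

# Rough estimate of how many blocks the raw image data would occupy.
blocks_needed = math.ceil(image_bits / BRAM_BLOCK_BITS)

print(image_bits, distributed_bits, bram_total_bits, blocks_needed)
```

By this estimate the whole image fits comfortably inside a single block.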

One way to instantiate a ROM using BRAM is to use the Xilinx LogiCORE Block Memory Generator, which comes with ISE or Vivado. This tool can instantiate Xilinx IP memory modules, which are device specific and can be initialized with image data using a coefficient (.coe) file.
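We won't go the .coe route here, but for reference, a memory initialization file is just a radix declaration followed by a comma-separated list of words. Below is a hedged Python sketch that builds such text from a list of 12-bit values. The helper name `make_coe` is hypothetical, and it only handles radix 2 and 16.

```python
def make_coe(words, radix=16):
    """Return .coe file text for a list of 12-bit words (sketch only)."""
    if radix == 16:
        items = [format(w, "03X") for w in words]
    elif radix == 2:
        items = [format(w, "012b") for w in words]
    else:
        raise ValueError("only radix 2 and 16 are handled in this sketch")
    header = "memory_initialization_radix=" + str(radix) + ";\n"
    # The vector is a comma-separated list terminated by a semicolon.
    return header + "memory_initialization_vector=\n" + ",\n".join(items) + ";\n"


coe_text = make_coe([0x6CC, 0x6CC, 0x000])
print(coe_text)
```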

Instead, we will use the Xilinx language templates to infer a ROM using BRAM. While this method is still Xilinx specific, it is semi device agnostic. The ROM we infer in HDL will work for both the Basys 2 and Basys 3, among others. The XST User Guide documents in detail the Verilog and VHDL language templates that can infer many different types of logic components.

Here is a general outline of how to infer a synchronous ROM with 3 address bits that stores bytes:

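module bram_rom
	(
		input wire clk,
		input wire [2:0] address,
		output reg [7:0] data
	);

	(* rom_style = "block" *)

	//signal declaration
	reg [2:0] address_reg;

	// register the address on each rising clock edge
	always @(posedge clk)
		address_reg <= address;

	// map the registered address to the output data
	always @*
	case (address_reg)
		3'b000: data = 8'b00000001;
		// ... remaining addresses ...
		default: data = 8'b11111111;
	endcase

endmodule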

The module is clocked, and has an input port for the address to read from and an output port for the data at that address. The ROM is synchronous, so the address register is updated on each rising clock edge, which also updates the output data synchronously. The output data is selected by a case statement in an always block that maps address values to output values. The default case must be included if not every address has a corresponding data output, because leaving it out would infer a latch.
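To make the timing behavior concrete, here is a small behavioral model of the same idea in Python. This is only a sketch: the class name `SyncRom` is made up for illustration, and it assumes the address register powers up as 0.

```python
class SyncRom:
    """Behavioral model of a ROM whose address is registered on the clock."""

    def __init__(self, contents, default):
        self.contents = contents  # dict mapping address -> data
        self.default = default    # value returned for unlisted addresses
        self.address_reg = 0      # assumed power-up value

    def clock(self, address):
        """Model one rising clock edge: capture the address input."""
        self.address_reg = address

    @property
    def data(self):
        """Model the case statement: look up the registered address."""
        return self.contents.get(self.address_reg, self.default)


rom = SyncRom({0b000: 0b00000001}, default=0b11111111)
rom.clock(0b011)
after_first = rom.data   # address 3 is not listed, so the default is returned
rom.clock(0b000)
after_second = rom.data  # address 0 is listed
print(after_first, after_second)
```

The point of the model is that the output only changes after a clock edge has captured a new address, which is what lets the synthesis tool map the design onto block RAM.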

We can expand this basic template to infer a synchronous ROM for our 32 x 25 image of Yoshi. While an image can be modeled as a two-dimensional array, the inferred ROM will be flat (i.e. indexed by a single address), so we will have to combine the y, x (row, column) pixel addresses into a single address for the case statement. The row and column addresses each need at least 5 bits (log2(32) = 5, and the 25 columns also need 5 bits), so we will have inputs for these, and use a concatenation of the row and column addresses for the case statement. We will also need to widen the output port to 12 bits, because we are storing 12 bits for each pixel.
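Before looking at the module itself, the address concatenation can be checked in software. The sketch below mimics the Verilog concatenation of a 5-bit row and a 5-bit column with a shift and an OR. The helper name `pack_address` is my own.

```python
ROW_BITS = 5
COL_BITS = 5


def pack_address(row, col):
    """Software equivalent of concatenating row (upper bits) and col (lower bits)."""
    return (row << COL_BITS) | col


# Every (row, col) pair in the 32 x 25 image.
addresses = [pack_address(r, c) for r in range(32) for c in range(25)]

print(len(addresses), max(addresses), pack_address(1, 0))
```

Note that the first pixel of row 1 lands at address 32, not 25, so the used addresses are not contiguous: each row leaves 7 unused entries, which the default case covers.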


module yoshi_rom
	(
		input wire clk,
		input wire [4:0] row,
		input wire [4:0] col,
		output reg [11:0] color_data
	);

	(* rom_style = "block" *)

	//signal declarations
	reg [4:0] row_reg;
	reg [4:0] col_reg;

	// register the row and column addresses on each rising clock edge
	always @(posedge clk)
		begin
		row_reg <= row;
		col_reg <= col;
		end

	always @*
	case ({row_reg, col_reg})
		10'b0000000000: color_data = 12'b011011001100;
		10'b0000000001: color_data = 12'b011011001100;
		10'b0000000010: color_data = 12'b011011001100;
		10'b0000000011: color_data = 12'b011011001100;
		10'b0000000100: color_data = 12'b011011001100;
		10'b0000000101: color_data = 12'b011011001100;
		10'b0000000110: color_data = 12'b011011001100;
		10'b0000000111: color_data = 12'b011011001100;
		10'b0000001000: color_data = 12'b011011001100;
		10'b0000001001: color_data = 12'b000000000000;
		10'b0000001010: color_data = 12'b000000000000;
		10'b0000001011: color_data = 12'b011011001100;
		10'b0000001100: color_data = 12'b011011001100;
		10'b0000001101: color_data = 12'b011011001100;
		10'b0000001110: color_data = 12'b011011001100;
		10'b0000001111: color_data = 12'b011011001100;
		10'b0000010000: color_data = 12'b011011001100;
		// ... remaining cases omitted ...
		default: color_data = 12'b000000000000;
	endcase

endmodule


Above is the modified code for the Yoshi ROM, with most of the cases omitted for brevity. Clearly it would be a good idea to automate the creation of the Verilog HDL for a ROM, given an image. Below is a Python program I wrote that does this.

# converts image to Verilog HDL that infers a ROM using Xilinx Block RAM
# note: 12-bit color map word is r3, r2, r1, r0, g3, g2, g1, g0, b3, b2, b1, b0

from scipy import misc
import math

# returns string of 12-bit color at row y, column x of image
def get_color_bits(im, y, x):
    # convert color components to byte string and slice needed upper bits
    b  = format(im[y][x][0], 'b').zfill(8)
    rx = b[0:4]
    b  = format(im[y][x][1], 'b').zfill(8)
    gx = b[0:4]
    b  = format(im[y][x][2], 'b').zfill(8)
    bx = b[0:4]

    # return concatenation of RGB bits
    return str(rx+gx+bx)

# write to file Verilog HDL
# takes name of file, image array,
# pixel coordinates of background color to mask as 0
def rom_12_bit(name, im, mask=False, rem_x=-1, rem_y=-1):

    # get color bits of background color
    # if coordinates left at default, use all zeros as the compare value
    if rem_x == -1 or rem_y == -1:
        a = "000000000000"
    # else set mask compare value from the pixel at column rem_x, row rem_y
    else:
        a = get_color_bits(im, rem_y, rem_x)

    # make output filename from input
    file_name = name.split('.')[0] + "_12_bit_rom.v"

    # open file
    f = open(file_name, 'w')

    # get image dimensions
    y_max, x_max, z = im.shape

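    # get width of row and column case words
    # (bits needed for the largest index; a log-based form under-counts
    # when a dimension is one more than a power of two)
    row_width = max(1, int(y_max - 1).bit_length())
    col_width = max(1, int(x_max - 1).bit_length())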

    # write beginning part of module up to case statements
    f.write("module " + name.split('.')[0] + "_rom\n\t(\n\t\t")
    f.write("input wire clk,\n\t\tinput wire [" + str(row_width-1) + ":0] row,\n\t\t")
    f.write("input wire [" + str(col_width-1) + ":0] col,\n\t\t")
    f.write("output reg [11:0] color_data\n\t);\n\n\t")
    f.write("(* rom_style = \"block\" *)\n\n\t//signal declaration\n\t")
    f.write("reg [" + str(row_width-1) + ":0] row_reg;\n\t")
    f.write("reg [" + str(col_width-1) + ":0] col_reg;\n\n\t")
    f.write("always @(posedge clk)\n\t\tbegin\n\t\trow_reg <= row;\n\t\tcol_reg <= col;\n\t\tend\n\n\t")
    f.write("always @*\n\tcase ({row_reg, col_reg})\n")

    # loops through y rows and x columns
    for y in range(y_max):
        for x in range(x_max):
            # write "<address>: color_data = 12'b"
            case = format(y, 'b').zfill(row_width) + format(x, 'b').zfill(col_width)
            f.write("\t\t" + str(row_width + col_width) + "'b" + case + ": color_data = " + str(12) + "'b")

            # if mask is set to false, just write color data
            if(mask == False):
                f.write(get_color_bits(im, y, x))
                f.write(";\n")

            elif(get_color_bits(im, y, x) != a):
                # write color bits to file
                f.write(get_color_bits(im, y, x))
                f.write(";\n")

            else:
                # masked background pixel: write zeros instead
                f.write("000000000000;\n")

    # write end of module
    f.write("\t\tdefault: color_data = 12'b000000000000;\n\tendcase\nendmodule")

    # close file
    f.close()

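# driver function
def generate(name, rem_x=-1, rem_y=-1):
    im = misc.imread(name, mode = 'RGB')
    print("width: " + str(im.shape[1]) + ", height: " + str(im.shape[0]))
    # mask only when both background coordinates were supplied
    mask = rem_x != -1 and rem_y != -1
    rom_12_bit(name, im, mask, rem_x, rem_y)

# generate rom from full bitmap image
generate("yoshi.bmp")  # hypothetical filename; use the name of your own image file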


This Python code is written for Python 3, and needs to have numpy, scipy, and PIL installed. Note that misc.imread was removed in SciPy 1.2, so the code as written needs an older SciPy release. The generate function is called at the bottom, with the name of the image file to read in. The generate function has two optional arguments, rem_x and rem_y, which are passed on to another function. Inside the generate call, the image is read, and the width and height are immediately printed out for our reference.

Next rom_12_bit() is called, which creates the Verilog file. This function has three optional arguments: mask, rem_x, and rem_y. If mask is set to True, every pixel whose color matches the pixel at location rem_x, rem_y is replaced with 0 in the output ROM. If the defaults are left alone, the ROM will contain the exact image data. Later, when displaying the image in HDL, we can replace the cyan background color with a background color of our choice. Because of this, it is important to make sure that the background color in the image we use doesn’t appear in the actual sprite of Yoshi.
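The two ways of dealing with the background can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the key value 0x6CC is an assumption based on the cyan value in the first ROM entries shown earlier.

```python
BACKGROUND_KEY = 0x6CC  # assumed cyan background value


def mask_background(words, key=BACKGROUND_KEY):
    """Masking at generation time: replace every word equal to the key with 0."""
    return [0x000 if w == key else w for w in words]


def display_color(rom_word, new_background, key=BACKGROUND_KEY):
    """Substitution at display time: show new_background where the ROM holds the key."""
    return new_background if rom_word == key else rom_word


row = [0x6CC, 0x000, 0x0F0]  # background, black outline, green
masked = mask_background(row)
shown = [display_color(w, 0x00F) for w in row]
print(masked, shown)
```

One thing the example shows is that masking to 0 makes the background indistinguishable from true black pixels, whereas keeping the exact image data and substituting at display time preserves the black outline.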

The synthesis report from Vivado shows that the Verilog HDL inferred a 1024 x 12 ROM using block RAM. Note that the actual image data is 800 x 12, with 800 being between 2^9 = 512 and 2^10 = 1024, which is why 1024 was used for that dimension of the ROM.
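The 1024 figure can also be reached from the address widths: 5 row bits plus 5 column bits give a 10-bit address, and 2^10 = 1024. A minimal Python sketch of that calculation, with helper names of my own choosing:

```python
def address_bits(n):
    """Bits needed to index n items, numbered 0 to n - 1."""
    return max(1, (n - 1).bit_length())


def rom_depth(height, width):
    """Depth of a ROM addressed by the concatenated row and column bits."""
    return 2 ** (address_bits(height) + address_bits(width))


depth = rom_depth(32, 25)
used = 32 * 25
print(depth, used, depth - used)
```

So 224 of the 1024 entries are never addressed by a valid pixel and simply return the default value.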

The utilization report also shows that one 18 kb block RAM module was used for our image data.

To see a complete FPGA video game project that utilizes block RAM to store sprites, click here.

