Multioutput Scaler Reference Design

This application note describes the Altera® Multioutput Scaler Reference Design. Scaling an input video stream to multiple output resolutions is common in many video conferencing and studio multiviewer products. Dedicating a full scaling engine, such as the Altera Scaler II MegaCore® function, to each output resolution can lead to inefficient solutions, because all the scaling engines can share the video line buffers that each IP core requires. Depending on the output resolutions, the design can also time-division multiplex the algorithmic IP core of a single scaling engine to produce multiple output resolutions.

The Multioutput Scaler reference design demonstrates how to perform the following actions:

■ Combine IP cores from the video and image processing component library to create flexible scaling solutions.
■ Share line buffers across multiple algorithmic functions.
■ Use time-division multiplexed (TDM) algorithmic functions across multiple outputs.

Features

The reference design offers the following features:

■ One 3G-serial digital interface (3G-SDI) 1080p60 input.
■ One 3G-SDI 1080p60 output containing up to five output resolutions mixed over a test pattern base layer.
■ Three scaler algorithmic IP cores:
  ■ One with four horizontal and vertical taps for upscale only
  ■ One with 12 horizontal and vertical taps for upscale and up to three-times downscale
  ■ One with 16 horizontal and vertical taps for upscale and up to four-times downscale
■ System initialization and run-time configuration in software.
■ Rapid system capture and design with Qsys, the Quartus® II software, and the Nios® II development environments.

General Description

The multioutput scaler reference design performs the following actions:

- Takes a 1080p60 or 720p60 input over a 3G-SDI
- Scales the input to up to five output resolutions
- Mixes the five output resolutions over a 1080p test pattern base layer
- Outputs the result over a 3G-SDI

The Altera SDI IP core implements the 3G-SDI input and output in the FPGA. The reference design has the following five output resolutions:

- $1920 \times 1080$
- Pass-through of input
- Multimode, cycling between $1920 \times 1080$, $1280 \times 720$, $854 \times 480$, and 1/4 input resolution
- 1/2 input resolution
- 1/3 input resolution

Use push button 1 (PB1) on the Stratix IV GX FPGA development board to select the current resolution for the multimode output. Use push button 0 (PB0) to enable and disable mixer layers, which turns each output resolution on and off. Figure 1 shows an example output from the reference design with a 720p version of Lena as the input.

Figure 1. Reference Design Output

The reference design uses IP cores from the Video and Image Processing Suite and components from the video and image processing component library. The component library is a collection of common video function building blocks that you can use to build video and image processing IP cores or reference designs, and to create more complex systems than the Video and Image Processing Suite offers. You cannot use component library components alone; you must also use a scheduler, for example, a CPU or state machine.

For more information about the Video and Image Processing Suite, refer to the Video and Image Processing Suite User Guide.

Performance and Resource Utilization

Table 1 lists the resource utilization on a Stratix IV GX device (S4GX230).

Table 1. Resource Usage

<table>
<thead>
<tr>
<th>Usage</th>
<th>ALUTs</th>
<th>Logic Registers</th>
<th>Logic Utilization</th>
<th>Total Block Memory Bits</th>
<th>Total RAM Block Bits</th>
<th>M9K</th>
<th>M144K</th>
<th>DSP Block 18-Bit Elements</th>
</tr>
</thead>
<tbody>
<tr>
<td>Used on device</td>
<td>29,230</td>
<td>38,700</td>
<td>47,949</td>
<td>2,115,714</td>
<td>4,386,816</td>
<td>268</td>
<td>13</td>
<td>132</td>
</tr>
<tr>
<td>Total available on device</td>
<td>182,400</td>
<td>182,400</td>
<td>182,400</td>
<td>14,625,792</td>
<td>14,625,792</td>
<td>1,235</td>
<td>22</td>
<td>1,288</td>
</tr>
<tr>
<td>Percentage used on device</td>
<td>16%</td>
<td>21%</td>
<td>26%</td>
<td>14%</td>
<td>30%</td>
<td>22%</td>
<td>59%</td>
<td>10%</td>
</tr>
</tbody>
</table>

Functional Description

Figure 2 shows a block diagram of the reference design.

Figure 2. Block Diagram

[Block diagram showing the SDI input, clocked video input, video input bridge, line buffer, kernel creator, scaler algorithmic cores, packet writers, frame readers, test pattern generator, gamma corrector, clocked video output, and SDI output, together with the Nios II scheduler, Nios II message interface unit, packet switch, DDR3 SDRAM controller, and DDR3 SDRAM. The legend distinguishes Avalon-ST Video, Avalon-MM, and Avalon-ST message (command, response, and data) connections, and Video and Image Processing Suite IP cores, component library blocks, and other blocks.]

Table 2 describes the video input blocks. The video input takes a single SDI input, checks that it is in a supported format, and then sends it to the line buffer.

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clocked video input</td>
<td>Video and Image Processing Suite</td>
<td>The clocked video input converts the output of the SDI IP core into the Avalon-ST Video protocol.</td>
</tr>
<tr>
<td>Video input bridge</td>
<td>Video and image processing component library</td>
<td>The video input bridge alerts the scheduler to a new packet arriving on its Avalon-ST Video input and then sends it to the destination that the scheduler commands.</td>
</tr>
</tbody>
</table>

Table 3 describes the video pipeline blocks. The video pipeline takes the input video, produces the five output resolutions, and writes them to memory.

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Line buffer</td>
<td>Video and image processing component library</td>
<td>The line buffer uses on-chip memory to store multiple lines and then outputs them in parallel as one packet. The line buffer can send each output packet to multiple destinations simultaneously through its multiple outputs. This reference design configures the line buffer to store 16 lines of input video and to have five outputs.</td>
</tr>
<tr>
<td>Scaler algorithmic</td>
<td>Video and image processing component library</td>
<td>The scaler algorithmic IP core processes a kernel of input lines to produce one output line, upscaling or downscaling by a specified factor.</td>
</tr>
<tr>
<td>Packet writer</td>
<td>Video and image processing component library</td>
<td>The packet writer writes each line of an output video frame to external memory.</td>
</tr>
</tbody>
</table>

Table 4 describes the video control blocks. The video control blocks direct the actions of the video pipeline blocks: they configure, start, and stop the Video and Image Processing Suite IP cores, and schedule the actions of the component library cores. The components require much lower-level control, because they perform tasks, such as processing input packets, only when they receive a command from the scheduler.

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Kernel creator</td>
<td>Video and image processing component library</td>
<td>The kernel creator is a hardware accelerator block that returns the input lines required to produce each output line. The scheduler uses this component to determine which input lines need to be stored in the line buffer.</td>
</tr>
<tr>
<td>Packet switch</td>
<td>Video and image processing component library</td>
<td>The packet switch routes messages to the end point specified in the destination address. This process allows the Nios II processor to send messages to any component in the reference design by altering the destination address.</td>
</tr>
</tbody>
</table>
Table 5 describes the video output blocks. The video output reads the five output resolutions from external memory, mixes the outputs over a test pattern base layer, and sends the resulting video to the SDI output.

The DDR3 SDRAM Controller with UniPHY and the multiport front end perform the following actions:

- Buffer the video to and from external DDR3 SDRAM.
- Arbitrate multiple packet writers and frame reader masters on the single slave interface of the DDR3 SDRAM Controller with UniPHY.

### Video Pipeline

The Multioutput Scaler Reference Design has a single video line buffer component that feeds three scaler algorithmic IP cores. This reference design allows you to scale a single input video stream to up to five output resolutions simultaneously, although one output resolution is an unscaled pass-through of the input. The design buffers each of the five output resolutions on a per-frame basis in off-chip DDR3 SDRAM, and then mixes them to form a single 1080p60 output video stream. The design mixes the output resolutions into a single video stream for demonstration purposes; it can also output each resolution independently as a separate stream. The frame buffering (double buffering) prevents the underflow and overflow issues that occur when mixing smaller and larger resolutions in a single output. Without mixing, you may omit the double buffering; the clocked video input and clocked video output IP cores typically require only a single line of buffering.
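The double buffering described above can be sketched in software as follows. This is a minimal model under stated assumptions, not code from the reference design: the class `DoubleBuffer` and its methods are hypothetical, the two addresses stand for two frame-sized regions in DDR3 SDRAM, and the policy (repeat the displayed frame when no new frame is ready, drop an incoming frame while a completed one is still waiting) is one common way to avoid underflow and overflow.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of double buffering for one output resolution.
// addr0/addr1 stand in for two frame-sized regions in DDR3 SDRAM.
class DoubleBuffer {
public:
    DoubleBuffer(std::uint32_t addr0, std::uint32_t addr1) : addr_{{addr0, addr1}} {
        assert(addr0 != addr1);  // the two frames must be distinct regions
    }

    // Frame that the frame reader scans out (front buffer).
    std::uint32_t read_addr() const { return addr_[front_]; }

    // Frame that the packet writer fills (back buffer).
    std::uint32_t write_addr() const { return addr_[1 - front_]; }

    // Called before the packet writer starts a new frame. Returns false if a
    // completed frame is still waiting to be displayed; the writer then drops
    // this input frame instead of overwriting it (avoids overflow).
    bool writer_frame_start() const { return !back_ready_; }

    // Called after the packet writer has written the last line of a frame.
    void writer_frame_done() { back_ready_ = true; }

    // Called at the start of each output frame. Swaps buffers only when a
    // complete frame is waiting; otherwise the reader repeats the current
    // frame (avoids underflow).
    void reader_frame_start() {
        if (back_ready_) {
            front_ = 1 - front_;
            back_ready_ = false;
        }
    }

private:
    std::array<std::uint32_t, 2> addr_;
    int front_ = 0;
    bool back_ready_ = false;
};
```

Because the swap happens only at a reader frame boundary and only when a complete frame is waiting, the frame reader in this model never scans out a partially written frame.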

Each scaler algorithmic IP core processes a set of input video lines, producing a single output line for each input set. For each scaler algorithmic IP core, you can upscale or downscale the output line, or set the output line to the same length as the lines in the input set. Each scaler algorithmic IP core is clocked at 150 MHz and can scale input data up to 1080p (the input and output frame rates are fixed at 60 fps). The scaler algorithmic IP cores run at up to 300 MHz on Stratix IV devices, supporting larger output resolutions or higher frame rates.
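As a rough check of these clock rates, assume for illustration only that a scaler algorithmic IP core handles one pixel per clock cycle (this note does not state the throughput per clock). The active picture of a 1080p60 stream then requires:

$$1920 \times 1080 \times 60 = 124{,}416{,}000\ \text{active pixels per second} \approx 124.4\ \text{MHz}$$

Including horizontal and vertical blanking, the 1080p60 raster is $2200 \times 1125$ samples per frame, which gives the 148.5 MHz rate used by the SDI and video pipeline clocks:

$$2200 \times 1125 \times 60 = 148{,}500{,}000\ \text{samples per second} = 148.5\ \text{MHz}$$

Under the one-pixel-per-clock assumption, a core clocked at 300 MHz has roughly twice this throughput, which is the headroom for larger output resolutions or higher frame rates.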

The number of horizontal and vertical taps that the polyphase algorithm uses is parameterizable and determines the maximum downscale factor for each scaler algorithmic IP core. For a Lanczos two-coefficient set, Altera recommends that the number of taps be four times the downscale factor. This design includes three scaler algorithmic IP cores: two with 16 and 12 taps (both horizontal and vertical), allowing downscale factors of four and three respectively, and one with 4 taps, which is only recommended for upscale or pass-through. You may modify the design to include more scaler algorithmic IP cores with varying numbers of taps. You can configure the video line buffer with up to 16 outputs, allowing up to 16 scaler algorithmic IP cores to share a single line buffer, and you can configure each scaler algorithmic IP core with up to 64 horizontal or vertical taps, allowing downscale factors of up to 16.
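The tap rule of thumb translates directly into a check of whether a core is within its recommended downscale range. A minimal sketch; the function names are hypothetical, the rule is the one stated above, and the sketch only handles tap counts that are multiples of four, as in this design:

```cpp
#include <cassert>

// Rule of thumb from this application note for a Lanczos two-coefficient set:
// taps = 4 x maximum downscale factor. A result of 1 means the core is only
// recommended for upscaling or pass-through.
int max_downscale_factor(int taps) {
    assert(taps >= 4 && taps % 4 == 0);  // this sketch handles multiples of 4 only
    return taps / 4;
}

// True if scaling in_lines input lines to out_lines output lines stays within
// the recommended range: in_lines / out_lines <= taps / 4, rearranged so the
// comparison uses integer arithmetic only.
bool within_recommended_downscale(int taps, int in_lines, int out_lines) {
    assert(in_lines > 0 && out_lines > 0);
    return static_cast<long long>(in_lines) <=
           static_cast<long long>(max_downscale_factor(taps)) * out_lines;
}
```

For example, the 12-tap core can take 1080 lines down to 360 (1/3), but not to 270 (1/4), which needs the 16-tap core.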

Output Resolutions

The Multioutput Scaler Reference Design time-division multiplexes the 12-tap scaler algorithmic IP core to produce the following output resolutions:

- 1/2 the input resolution
- 1/3 the input resolution

The design time-division multiplexes per line, with the scaler algorithmic IP core switching between output lines for the two resolutions as directed by top-level scheduling. This design shares the scaler algorithmic IP core across two output video streams. However, you can share one algorithmic IP core across three or more output resolutions, if the sum of the output lines is less than or equal to the number of input lines.
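The sharing condition above is simple arithmetic. The following sketch, with a hypothetical function name, checks it for a list of output heights:

```cpp
#include <cassert>
#include <vector>

// Sharing rule stated in this application note: one scaler algorithmic core
// can be time-division multiplexed across several output resolutions if the
// total number of output lines per frame does not exceed the number of input
// lines per frame.
bool can_share_scaler(int in_height, const std::vector<int>& out_heights) {
    assert(in_height > 0);
    long long total = 0;
    for (int h : out_heights) {
        assert(h > 0);
        total += h;
    }
    return total <= in_height;
}
```

For a 1080-line input, the 1/2 and 1/3 outputs need 540 + 360 = 900 lines, which fits; adding a 1/4 output (270 lines) would need 1,170 lines and would not.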

Altera configures the Multioutput Scaler Reference Design to detect 1080p60 or 720p60 inputs. When the design detects either input, it produces a 1080p60 output with up to five output resolutions mixed over a 1080p test pattern base layer. For any other input format, the design disables the output. To support further input formats, you can modify the control software, which runs on the Nios II processor.

The control software configures the 4-tap scaler algorithmic IP core to always produce a 1920 × 1080 output. The 12-tap scaler algorithmic IP core produces one output at 1/2 the horizontal and vertical input resolution, and a second output at 1/3 the horizontal and vertical input resolution. The control software allows you to switch the output resolution of the 16-tap scaler algorithmic IP core between the following four modes:

- 1920 × 1080
- 1280 × 720
- 854 × 480
- 1/4 horizontal and vertical input resolution

You can switch between output modes, and the software also allows you to enable and disable mixer layers. Each mixer layer carries one output resolution. At reset, the design enables all five output resolutions. You can disable mixer layers until only the base test pattern remains, and then re-enable the layers in reverse order.
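The mode cycling and layer control can be modelled as two small state machines. This is a sketch only: the enum `Mode`, the class `LayerControl`, and the press-by-press behaviour (one layer per PB0 press, disabling the highest layer first) are assumptions for illustration, not taken from the reference design's `main.cpp`.

```cpp
#include <cassert>

struct Size {
    int width;
    int height;
};

// Hypothetical encoding of the 16-tap core's four output modes.
enum class Mode { FullHD, HD720, SD480, Quarter };

// Next mode in the cycle (for example, on each PB1 press).
Mode next_mode(Mode m) {
    switch (m) {
        case Mode::FullHD:  return Mode::HD720;
        case Mode::HD720:   return Mode::SD480;
        case Mode::SD480:   return Mode::Quarter;
        case Mode::Quarter: return Mode::FullHD;
    }
    return Mode::FullHD;  // unreachable for valid enum values
}

// Output size of the 16-tap core in a given mode.
Size mode_size(Mode m, Size input) {
    assert(input.width > 0 && input.height > 0);
    switch (m) {
        case Mode::FullHD:  return {1920, 1080};
        case Mode::HD720:   return {1280, 720};
        case Mode::SD480:   return {854, 480};
        case Mode::Quarter: return {input.width / 4, input.height / 4};
    }
    return input;  // unreachable for valid enum values
}

// Hypothetical layer control: each press disables the highest enabled layer
// until only the test pattern remains, then re-enables layers in reverse order.
class LayerControl {
public:
    static constexpr int kLayers = 5;  // one mixer layer per output resolution

    int enabled() const { return enabled_; }  // layers 1..enabled() are on

    void press() {
        if (disabling_) {
            if (--enabled_ == 0) disabling_ = false;
        } else {
            if (++enabled_ == kLayers) disabling_ = true;
        }
    }

private:
    int enabled_ = kLayers;  // all five output resolutions enabled at reset
    bool disabling_ = true;
};
```

For a 720p input, the quarter-resolution mode in this model yields a 320 × 180 output.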

**Control Interfaces**

The reference design uses IP cores from the video and image processing suite and components from the video and image processing component library. The design uses the video and image processing IP cores on the output side of the design, where they read data from external DDR3 SDRAM and mix the five output resolutions to form a single output stream. Each video and image processing IP core is a stand-alone block that processes video data on a frame-by-frame basis.

Each IP core embeds its own main control logic. After the Nios II scheduler configures and starts the IP cores through their Avalon-MM slave interfaces, the cores process data continually with no further intervention.

The components from the video and image processing component library process data line by line. Each component embeds the logic for fine-grained intra-line control; however, an external scheduler must send commands to the components over an Avalon-ST message interface to drive the line-by-line actions. The Nios II scheduler provides this line-by-line control for the component library blocks. It executes a software schedule (C code) that sends commands and receives responses through its Avalon-MM master interface. The message interface unit converts the commands from Avalon-MM transactions to Avalon-ST messages and routes them through the packet switch to the correct component. The packet switch routes responses from the components back to the message interface unit, which converts them to Avalon-MM data for the Nios II scheduler to read.

**Nios II Scheduler**

The Nios II scheduler is a Nios II processor that runs the scheduler software in the `main.cpp` file; the comments in that file provide a full description. A set of C macros in the `nios_miu.h` file uses the message interface unit to allow the Nios II processor to send and receive messages. The macros translate to simple memory-mapped reads or writes and form the application programming interface (API) for the Nios II message interface unit.

For more details on the Nios II message interface unit, refer to the *Video and Image Processing Component Library Functional Description* (available from Altera).

A loop encloses the main body of the Nios II scheduler code. The loop executes once for each input line or once for each output line of the largest output resolution, whichever is larger. As the video input bridge starts each new video line, it sends a response to the Nios II scheduler to indicate that a new line is waiting. Rather than waiting for each new input line before deciding which commands to generate and send, the Nios II scheduler uses speculative execution to create the commands one line in advance. The scheduler pushes the commands that it must send when the next input line arrives into a FIFO buffer in the message interface unit, which holds them until the new line response arrives. When the scheduler receives the expected new line response, it sends a further command to the message interface unit to send the FIFO contents. If the next response is unexpected, the scheduler can instead instruct the message interface unit to discard the FIFO contents without sending them. Speculative command generation minimizes the delay in processing the input video data, which accumulates in the input FIFO buffer of the clocked video input IP core until the scheduler sends the commands.
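The speculative command path can be modelled with a queue that is either committed or discarded when the next response arrives. A minimal sketch; `Command`, `SpeculativeQueue`, the FIFO depth, and the response values are hypothetical and do not reflect the message interface unit's actual register interface or the `nios_miu.h` macros:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Illustrative command: a destination address for the packet switch and an
// opcode. Both fields are hypothetical.
struct Command {
    std::uint32_t dest;
    std::uint32_t opcode;
};

// Hypothetical model of the command FIFO in the message interface unit.
class SpeculativeQueue {
public:
    static constexpr std::size_t kDepth = 64;  // assumed FIFO depth

    // Queue a command generated speculatively for the next input line.
    void push(const Command& c) {
        assert(pending_.size() < kDepth);  // a hardware FIFO would overflow here
        pending_.push_back(c);
    }

    // Expected "new line" response: release every queued command in order.
    std::vector<Command> commit() {
        std::vector<Command> sent(pending_.begin(), pending_.end());
        pending_.clear();
        return sent;
    }

    // Unexpected response: drop the speculative commands unsent.
    void discard() { pending_.clear(); }

    std::size_t size() const { return pending_.size(); }

private:
    std::deque<Command> pending_;
};

// Scheduler step on receiving a response: returns the commands actually sent.
std::vector<Command> on_response(SpeculativeQueue& q, std::uint32_t response,
                                 std::uint32_t expected) {
    if (response == expected) return q.commit();
    q.discard();
    return {};
}
```

The commands for a line are ready before the line arrives, so the only work left when the expected response appears is a single commit.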

For each new line response, the scheduler sends a command to the video input bridge instructing it to send the data to the line buffer. The scheduler commands the line buffer to shift its current stored lines through the kernel by one line and receive the new input line. If the new line completes the kernel required for the next output line of any scaler algorithmic IP core, the scheduler also commands the line buffer to send the new kernel through the appropriate outputs. The scheduler commands the appropriate scaler algorithmic IP cores to process the kernel with the correct scaling factor. The scheduler commands the appropriate packet writers to write the resulting line to memory. The design uses the kernel creator as a hardware accelerator to determine which input line should form the center of the kernel for each output line of each output resolution.

At the end of the loop iteration for the new input line response, the loop may run again if any scaler algorithmic IP core is upscaling and needs to reuse the same kernel for another output line. In this case, the scheduler sends no command to the video input bridge and the scheduler instructs the line buffer to send the existing kernel without receiving any new data.
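The per-line loop described in the last two paragraphs can be sketched as follows. This is a hypothetical software model, not the contents of `main.cpp`: command names are illustrative strings, all kernels are assumed to be aligned on one center line, and the latency between a line entering the line buffer and reaching the kernel center is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One scaler output fed from the shared line buffer.
struct ScalerOut {
    int in_height;
    int out_height;
    int next_line = 0;  // next output line to produce

    ScalerOut(int in, int out) : in_height(in), out_height(out) {}
    int center_of(int y) const { return y * in_height / out_height; }
    bool done() const { return next_line >= out_height; }
};

// Hypothetical sketch of one frame of the scheduler loop; returns a log of
// the (illustrative) commands it would issue.
std::vector<std::string> run_frame(std::vector<ScalerOut>& outs, int in_height) {
    assert(in_height > 0);
    std::vector<std::string> log;
    int center = 0;        // input line at the kernel center
    bool new_line = true;  // false: reuse the current kernel (upscaling)
    while (center < in_height) {
        if (new_line) {
            log.push_back("bridge:send_line");
            log.push_back("line_buffer:shift_and_receive");
        } else {
            log.push_back("line_buffer:resend_kernel");  // no new input data
        }
        bool reuse = false;
        for (std::size_t i = 0; i < outs.size(); ++i) {
            ScalerOut& o = outs[i];
            if (o.done() || o.center_of(o.next_line) != center) continue;
            log.push_back("scaler" + std::to_string(i) + ":process");
            log.push_back("writer" + std::to_string(i) + ":write_line");
            ++o.next_line;
            // Upscaling: the next output line needs the same kernel again.
            if (!o.done() && o.center_of(o.next_line) == center) reuse = true;
        }
        new_line = !reuse;
        if (!reuse) ++center;
    }
    return log;
}
```

For a single 720-to-1080-line upscale, this loop runs 1,080 times (720 iterations that receive a new input line plus 360 that resend the kernel), consistent with the loop running once per input line or per output line of the largest output, whichever is larger.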

**Line Buffer Details**

You can parameterize the number of outputs from the video line buffer, the kernel size (number of lines) for each output, and the offset of each output relative to the oldest line stored in the buffer. For a line buffer with a total of \( N \) lines of storage, line 0 is the oldest line in the buffer and line \( (N - 1) \) is the newest. All the outputs share the memory that stores the video data, with each output selecting the lines it requires from the memory read output. Output 0 must start at line 0; you can set every other enabled output to any start line greater than or equal to 0. The total size of the line buffer is the maximum, across all outputs, of the start line plus the kernel size. In this reference design, output 0 feeds the 16-tap scaler algorithmic IP core and has a kernel size of 16 lines. Output 1 feeds the 4-tap scaler algorithmic IP core and has a kernel size of 4 lines. Output 2 feeds a packet writer to create the pass-through output and requires only a single line. Outputs 3 and 4 feed the 12-tap scaler algorithmic IP core through a packet multiplexer component (configured to select an input on a first-come, first-served basis), and each requires 12 lines. Table 6 shows how the kernels for the outputs are offset.

Table 6. Output Line Buffers

<table>
<thead>
<tr>
<th>Output</th>
<th>15</th>
<th>14</th>
<th>13</th>
<th>12</th>
<th>11</th>
<th>10</th>
<th>9</th>
<th>8</th>
<th>7</th>
<th>6</th>
<th>5</th>
<th>4</th>
<th>3</th>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>x</td>
</tr>
<tr>
<td>2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table 6 shows that the design needs 16 lines of buffering to serve all of the outputs. The 'x' marked for each output denotes the center line for that output. A scaler algorithmic IP core with $N$ vertical taps requires an input kernel of $N$ lines. The input kernel comprises a center line, $\lfloor (N - 1)/2 \rfloor$ lines above the center line in the input frame, and $\lfloor N/2 \rfloor$ lines below the center line in the input frame. The number of the input line that the scaler algorithmic IP core requires as the center line to produce output line $x$ is:

$$\left\lfloor \frac{x \times \text{in\_height}}{\text{out\_height}} \right\rfloor$$

where:

- $\text{in\_height}$ and $\text{out\_height}$ are the number of lines in the input and output video frames respectively.
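The center-line formula and the kernel extent can be written as two small functions. This is a sketch with hypothetical names; it uses integer division to implement the floor for non-negative values and does not model how the design handles kernels that extend past the top or bottom of the frame.

```cpp
#include <cassert>

// Center input line for output line x: floor(x * in_height / out_height).
// Integer division floors for the non-negative operands used here.
int center_line(int x, int in_height, int out_height) {
    assert(x >= 0 && in_height > 0 && out_height > 0);
    return static_cast<int>(static_cast<long long>(x) * in_height / out_height);
}

// Input lines covered by an n_taps kernel: floor((N - 1) / 2) lines above the
// center (earlier in the frame) and floor(N / 2) lines below it.
struct Kernel {
    int first;
    int last;
};

Kernel kernel_lines(int center, int n_taps) {
    assert(n_taps > 0);
    return {center - (n_taps - 1) / 2, center + n_taps / 2};
}
```

For a 720-line input and a 1080-line output, `center_line(x, 720, 1080)` for x = 0 to 10 gives 0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 6, which matches the 1080p column of Table 7.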

To keep the software scheduler for the system simple, the design aligns the kernels for all the outputs (except output 4) to have a center line of 7. The scheduler tracks a single center line value for the whole line buffer and keeps all the scaler algorithmic IP cores synchronized to the same input line.
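Aligning the kernels on a common center line fixes each output's start line and hence the total line buffer size. The sketch below (hypothetical names) derives both from the kernel sizes and center lines given in this section, with line 0 as the oldest line:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One line-buffer output: kernel size in lines and the buffer line that its
// kernel is centered on.
struct OutputCfg {
    int taps;
    int center;
};

// First (oldest) buffer line of the kernel: floor((taps - 1) / 2) lines before
// the center.
int start_line(const OutputCfg& o) { return o.center - (o.taps - 1) / 2; }

// Total buffer size: maximum of (start line + kernel size) across all outputs.
int buffer_lines(const std::vector<OutputCfg>& outs) {
    int total = 0;
    for (const OutputCfg& o : outs) {
        assert(start_line(o) >= 0);  // every kernel must fit inside the buffer
        total = std::max(total, start_line(o) + o.taps);
    }
    return total;
}
```

With kernels of 16, 4, 1, 12, and 12 lines centered on lines 7, 7, 7, 7, and 6, the start lines are 0, 6, 7, 2, and 1, and the buffer needs 16 lines, as Table 6 shows.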

Line buffer output 4 uses line 6 as its center line, offsetting it by one line towards the older lines relative to all the other kernels. This offset allows the design to time-division multiplex the 12-tap scaler algorithmic IP core across the 1/2 and 1/3 input resolution outputs. Output 3 does not simply serve the 1/2 resolution output and output 4 the 1/3 resolution output. The design always uses output 3 to generate the 1/2 resolution output and, where possible, also uses output 3 to generate the 1/3 resolution output, to avoid multiplexing between the two outputs. In some cases, however, both output resolutions require the same center line from the input frame. The design sends the kernel through output 3, with the correct center line, to generate the 1/2 resolution output, but to keep up with the input video rate, the line buffer may need to shift the kernel by one line every time the design sends a kernel. After the shift, the center line of the kernel on output 3 is one line too new, so sending it through output 3 again would not produce the correct 1/3 resolution output line. Because output 4 is offset one line closer to line 0, its center line is now correct, and the design sends the kernel through output 4 instead. Where the 1/3 resolution output uses a center line that the 1/2 resolution output does not use, the design still sends the kernel through output 3. Table 7 and Table 8 show how the design uses outputs 3 and 4 for different input resolutions.

Table 7. Kernel Center Lines for 720p Input

<table>
<thead>
<tr>
<th>Loop Iteration</th>
<th>Center Line (1080p Output)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>6</td>
<td>4</td>
</tr>
<tr>
<td>7</td>
<td>4</td>
</tr>
<tr>
<td>8</td>
<td>5</td>
</tr>
<tr>
<td>9</td>
<td>6</td>
</tr>
<tr>
<td>10</td>
<td>6</td>
</tr>
</tbody>
</table>

Table 7 shows the following points:
- The center line that the 1080p upscale output requires on each pass through the loop in the scheduler code for a 720p input.
- The lines that the 1/2 and 1/3 resolutions require.
- The 1/2 and 1/3 resolutions require the same center line for every third input line.

However, the 4-tap scaler algorithmic IP core uses each even-numbered input line twice (that is, the design sends that kernel twice) to generate the upscaled 1080p output. Hence, when the 1/2 and 1/3 resolutions require the same center line, the design can send the kernel to the 12-tap scaler algorithmic IP core through output 3 a second time, delaying generation of the 1/3 resolution output by one line and avoiding the need to use output 4. Table 7 shows in bold when the 1/3 resolution output uses the second send of the kernel.
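The repeated use of even-numbered lines follows from the center-line formula. This sketch (hypothetical function name) counts how many upscaled output lines share each input center line:

```cpp
#include <cassert>
#include <vector>

// For an upscale, count how many output lines use each input line as their
// kernel center, with center = floor(x * in_height / out_height).
std::vector<int> sends_per_center(int in_height, int out_height) {
    assert(in_height > 0 && out_height >= in_height);
    std::vector<int> sends(in_height, 0);
    for (int x = 0; x < out_height; ++x) {
        const int center =
            static_cast<int>(static_cast<long long>(x) * in_height / out_height);
        ++sends[center];
    }
    return sends;
}
```

For 720 to 1080 lines, the counts alternate 2, 1, 2, 1, and so on, so each even-numbered input line is sent twice and each odd-numbered line once.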

Table 8. Kernel Center Lines for 1080p Input (Part 1 of 2)

<table>
<thead>
<tr>
<th>Loop Iteration</th>
<th>Center Line (1080p Output)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>7</td>
<td>7</td>
</tr>
<tr>
<td>8</td>
<td>8</td>
</tr>
</tbody>
</table>
Table 8 shows an example in which the design requires output 4 and uses the same scaler algorithmic IP core to generate the two outputs. The input is 1080p and so the design must shift the kernel by one line every time it is sent, to keep up with the input video rate. The 1/2 and 1/3 output resolutions still clash on their required center line on every sixth input line, but the center line for the kernel of output 3 (which is the same as the 1080p center line in Table 8) shifts by one before the design sends the output 3 kernel for a second time. In these cases (bold in Table 8), you must use output 4. The scheduler code controls the switching between outputs 3 and 4.
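The switching between outputs 3 and 4 for a 1080p input can be modelled as follows. This is a hypothetical sketch, not the scheduler code: it assumes the kernel shifts by one line every loop iteration (as for a 1080p input, where no output upscales), that a 1/3 resolution line which clashes with a 1/2 resolution line is deferred by exactly one iteration, and it ignores the line-buffer latency.

```cpp
#include <cassert>
#include <vector>

enum class Port { Out3, Out4 };  // line-buffer output that supplies the kernel

struct Slot {
    int iteration;  // loop iteration; equals the center line on output 3 here
    bool half;      // true: 1/2-resolution line; false: 1/3-resolution line
    int out_line;   // output line number within that resolution
    Port port;
};

// Hypothetical schedule for the shared 12-tap core. Output 4 sits one line
// older than output 3, so a 1/3-resolution line that lost a clash with the
// 1/2-resolution line can be produced one iteration later from output 4.
std::vector<Slot> schedule_12tap(int in_height) {
    assert(in_height > 0 && in_height % 6 == 0);  // exact 1/2 and 1/3 heights
    const int half_h = in_height / 2, third_h = in_height / 3;
    int next_half = 0, next_third = 0;
    std::vector<Slot> slots;
    for (int i = 0; i < in_height; ++i) {
        bool out3_used = false;
        if (next_half < half_h && next_half * in_height / half_h == i) {
            slots.push_back({i, true, next_half++, Port::Out3});
            out3_used = true;
        }
        if (next_third < third_h) {
            const int center = next_third * in_height / third_h;
            if (center == i && !out3_used) {
                slots.push_back({i, false, next_third++, Port::Out3});
            } else if (center == i - 1) {
                // Kernel on output 3 has moved on; output 4 still centers here.
                slots.push_back({i, false, next_third++, Port::Out4});
            }
        }
    }
    return slots;
}
```

For a 1080-line input this model produces 540 + 360 = 900 output lines, 180 of which (one every sixth input line) come through output 4.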

### Clocks

Table 9 lists the clocks and frequencies.

Table 9. Clocks and Frequencies

<table>
<thead>
<tr>
<th>Clock Domain</th>
<th>$f_{\text{MAX}}$ (MHz)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>sdi_rx_clk[0]</td>
<td>148.5</td>
<td>The SDI input clock.</td>
</tr>
<tr>
<td>sdi_clk148</td>
<td>148.5</td>
<td>The SDI output clock.</td>
</tr>
<tr>
<td>vip_clk</td>
<td>148.5</td>
<td>The video processing pipelines clock.</td>
</tr>
<tr>
<td>uniphy_ddr3_afi_clk</td>
<td>200.0</td>
<td>The local interface of the memory controller clock.</td>
</tr>
<tr>
<td>DDR3 clock</td>
<td>400.0</td>
<td>The DDR3 SDRAM is clocked at 400 MHz.</td>
</tr>
</tbody>
</table>

### Getting Started

This section describes the following topics:

- Hardware and Software Requirements
- Downloading and Installing the Reference Design
- Generating the Qsys System
- Compiling the Software
- Compiling the Design
- Programming a Device

### Hardware and Software Requirements

The reference design requires the following hardware:

- Stratix IV GX FPGA development board
- Terasic Transceiver SDI High-Speed Mezzanine Card (HSMC) board
- A 1080p60 or 720p60 SDI video source
The reference design requires the following software:

- Quartus II software v11.1
- Nios II EDS v11.1

**Downloading and Installing the Reference Design**

To download and install the reference design, perform the following steps:

1. Request the reference design (.zip) files from the Multioutput Scaler Reference Design web page.
2. Extract the contents of the archive file to a directory on your computer. Do not use spaces in the directory path name.

*Figure 3* shows the reference design directory structure.

**Figure 3. Directory Structure**

```
<path>
  Installation directory.
  multioutput-<version>
    Contains the multioutput scaler reference design files.
    es
      Contains the precompiled .sof for engineering sample devices.
    s4gx_pcie
      Contains the Quartus II project.
      clock_gen
        Contains the PLLs for SDI clock generation.
      ip
        Contains beta versions of IP MegaCore functions.
      sdi_dprio_siv
        Contains the SDI transceiver reconfiguration controller.
      sdi_dual
        Contains the SDI MegaCore function configurations.
    software
      Contains the Nios II application project and Nios II C++ source code.
    top
      Contains the top-level design file (s4gx_pcie.v) and interface configuration file (config.v).
        make_project.bat
          Run this script on Windows to create the project.
        make_project.sh
          Run this script on Linux to create the project.
      Multi_Scaler.qsys
        The Qsys file.
      s4gx_pcie.sdc
        The timing constraints file.
      s4gx_pcie.tcl
        Tcl file that make_project script uses to create the Quartus II project.
    s4gx_pcie.sof
      The precompiled .sof for production devices.
```

**Generating the Qsys System**

To generate the Qsys system, perform the following steps:
1. Create the Quartus II project file s4gx_pcie.qpf:
   - On Windows operating systems, run the make_project.bat script.
   - On Linux operating systems, run the make_project.sh script.
2. Open the Quartus II project file s4gx_pcie.qpf.
3. Open the Command Prompt and change to the <Quartus II installation>\quartus\sopc_builder\bin directory.
4. Type the following command to open Qsys in debug mode:
   
   qsys-edit --debug
5. In Qsys, on the File menu, click Open and select Multi_Scaler.qsys.
6. On the Component Library tab, right-click Library and click Show Hidden Components. Expand Video and Image Processing, and then expand Component Library to see all the video and image processing library components.
7. In Qsys, click the Generation tab.
8. Click Generate.
9. When the system generation is successful, in Qsys, on the Tools menu, click Nios II Software Build Tools for Eclipse.

Compiling the Software

To compile the software and create the onchip_memory2_0.hex file, in the Nios II Software Build Tools for Eclipse, perform the following steps:

1. In the Workspace Launcher window click Browse... and create a new workspace directory, workspace, in the project s4gx_pcie directory. Then click OK to open the workspace.
2. In the Nios II – Eclipse window, right-click in the Project Explorer tab, point to New and select Nios II Application and BSP from Template.
3. In the Nios II Application and BSP from Template window fill in the following information:
   - For SOPC Information File Name, browse to locate the Multi_Scaler.sopcinfo file.
   - For Project name, enter s4gx_pcie_controller.
   - For Templates, select Blank Project.
4. Click the Finish button.
5. In the Project Explorer tab, right-click on s4gx_pcie_controller_bsp, point to Nios II, and select Generate BSP.
6. In the Nios II – Eclipse window, click the + symbol to the left of s4gx_pcie_controller to open the list of files. Right-click the file main.cpp and select Add to Nios II Build.
7. In the Project Explorer tab, right-click s4gx_pcie_controller and select Properties.
8. In the Properties for s4gx_pcie_controller window, select Nios II Application Properties and change the Optimization level to Level 3. Then click OK.

9. In the Project Explorer tab, right-click on s4gx_pcie_controller_bsp and select Properties.

10. In the Properties for s4gx_pcie_controller_bsp window, select Nios II BSP Properties and change the Optimization level to Level 3. Then click OK.

11. In the Project Explorer tab, right-click on s4gx_pcie_controller, and select Build Project.

12. In the Project Explorer tab, right-click on s4gx_pcie_controller, point to Make Targets and select Build....

13. In the Make Targets window, select mem_init_generate and then click Build.

14. The design creates the cpu_memory.hex file.

**Compiling the Design**

To compile the design in the Quartus II software, on the Processing menu, click Start Compilation. When compilation completes, the Quartus II software creates the s4gx_pcie.sof file.

**Programming a Device**

To program the FPGA and set up the reference design, perform the following steps:

1. Connect the SDI HSMC board to the HSMC PORT A connector on the Stratix IV GX FPGA development board.

2. Connect the SDI monitor cables to the SDI_OUT1 output.

3. Turn on the Stratix IV GX FPGA development board.

4. In the Quartus II software, on the Tools menu, click Programmer and program the FPGA with the s4gx_pcie.sof file.

5. Check that LED0 flashes.

6. Connect your 1080p60 or 720p60 SDI source cable to the SDI_IN1 input. Various LEDs illuminate or flash (Table 10).

Table 10 describes the Stratix IV GX FPGA development board LEDs.

<table>
<thead>
<tr>
<th>LED</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Software heartbeat. Flashes when the software is running on the Nios II processor.</td>
</tr>
<tr>
<td>1</td>
<td>Illuminates when the output is running.</td>
</tr>
<tr>
<td>2</td>
<td>Illuminates when the input is locked.</td>
</tr>
<tr>
<td>3</td>
<td>Illuminates when the board detects an overflow.</td>
</tr>
<tr>
<td>4</td>
<td>Illuminates when the board detects an underflow.</td>
</tr>
</tbody>
</table>
### Document Revision History

Table 11 shows the revision history for this document.

<table>
<thead>
<tr>
<th>Date</th>
<th>Version</th>
<th>Changes</th>
</tr>
</thead>
<tbody>
<tr>
<td>August 2012</td>
<td>1.1</td>
<td>Corrected minor errors.</td>
</tr>
<tr>
<td>January 2012</td>
<td>1.0</td>
<td>Initial release.</td>
</tr>
</tbody>
</table>