This application note describes a 4K format conversion reference design. 4K resolution is the next major enhancement in video because of the benefits in picture clarity and realism. Many leading projector, broadcast, and camera manufacturers are shipping 4K-enabled systems. Altera enables this next-generation format conversion by reducing the system device count, which lowers the overall cost, reduces the cost of development, and simplifies board design. Previous systems required as many as nine off-the-shelf devices to perform 4K format conversion: four 1080p format conversion devices and five devices for serial digital interface (SDI) input and output. The 4K format conversion reference design uses less than 50% of a single Altera® Stratix® IV EP4SGX230 FPGA. With migration paths to any Altera FPGA device family, you can integrate these functions with ample headroom remaining to incorporate other video functionality and interfaces, such as DisplayPort and video compression (encoding or decoding) processing.

4K resolution refers to any resolution that has approximately 4,000 horizontal pixels on a display screen. In digital cinema, a typical resolution is 4,096 by 2,160 pixels, and in computer graphics, quad full high definition (QFHD) is 3,840 by 2,160 pixels. Typically, the processing of 4K video requires more than four times the processing capability of 1080p60 video.
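
The "four times" figure follows from the active pixel counts alone. The following minimal sketch works through that arithmetic; the function and dictionary names are illustrative and do not come from the reference design.

```python
# Active pixel counts for the formats named above (blanking is ignored).
FORMATS = {
    "1080p": (1920, 1080),
    "QFHD": (3840, 2160),
    "Digital cinema 4K": (4096, 2160),
}


def pixels_per_frame(width, height):
    """Return the number of active pixels in one frame."""
    return width * height


def ratio_to_1080p(name):
    """Return how many times more active pixels a format has than 1080p."""
    base = pixels_per_frame(*FORMATS["1080p"])
    return pixels_per_frame(*FORMATS[name]) / base


if __name__ == "__main__":
    for name, (width, height) in FORMATS.items():
        print(name, pixels_per_frame(width, height), ratio_to_1080p(name))
```

QFHD has exactly four times the active pixels of 1080p, and the digital cinema format has about 4.27 times as many.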

Features

The reference design offers the following features:

- Support for the following inputs:
  - PAL
  - 720p60
  - 1080i50
  - 1080i60
  - 1080p60
- One QFHD output transmitted over four 3G-SDI 1080p60 outputs.
- Four video processing pipelines running at 148.5 MHz and each including:
  - A prescaler clipper to select a portion of the input video lines to be upscaled
  - A scaler with eight horizontal and eight vertical taps performing a four times upscale
  - Double buffering of the video to external DDR3 SDRAM
- System initialization and run-time configuration in software.

General Description

The 4K format conversion reference design takes a PAL, 720p, 1080i, or 1080p input over a 3G-SDI interface and upscales it to a QFHD resolution output over four 3G-SDI interfaces. The Altera SDI IP core supports the 3G-SDI interfaces in the FPGA. A video server provides the SDI input; the four SDI outputs display on four separate monitors or on a quartered 4K-capable monitor. The reference design was demonstrated on a single Stratix IV EP4SGX230 FPGA at the International Broadcasting Convention (IBC) 2010.

The reference design uses IP cores from the Video and Image Processing Suite and components from the video and image processing component library, which is a collection of components that you use to build video and image processing IP cores or reference designs. The component library allows you to create more complex systems than the Video and Image Processing Suite offers. You cannot use component library components alone—you must also use a scheduler, for example, a CPU or state machine.

For more information about the Video and Image Processing Suite, refer to the Video and Image Processing Suite User Guide.

Table 1 lists the resource utilization on a Stratix IV GX device (S4GX230).

**Table 1. Resource Usage**

<table>
<thead>
<tr>
<th>Usage</th>
<th>ALUTs</th>
<th>Logic Registers</th>
<th>Logic Utilization</th>
<th>Total Block Memory Bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>On device</td>
<td>51,810</td>
<td>60,616</td>
<td>77,148</td>
<td>4,824,456</td>
</tr>
<tr>
<td>Total available on device</td>
<td>182,400</td>
<td>182,400</td>
<td>182,400</td>
<td>14,625,792</td>
</tr>
<tr>
<td>Percentage used on device</td>
<td>28%</td>
<td>33%</td>
<td>42%</td>
<td>33%</td>
</tr>
</tbody>
</table>
**Functional Description**

Figure 1 shows a block diagram of the reference design.

**Figure 1. Block Diagram**

![Block Diagram](image-url)
Table 2 describes the video input blocks. The video input takes a single SDI input, checks that it is in a supported format, and then duplicates it to the four video pipelines.

Table 2. Video Input Blocks

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clocked video input IP core</td>
<td>Video and image processing suite</td>
<td>The clocked video input converts the output of the SDI IP core into Avalon-ST Video protocol.</td>
</tr>
<tr>
<td>Video input bridge</td>
<td>Video and image processing component library</td>
<td>The video input bridge alerts the scheduler to a new packet arriving on its Avalon-ST Video input and then sends it to the destination that the scheduler commands.</td>
</tr>
<tr>
<td>Duplicator</td>
<td>Video and image processing component library</td>
<td>The duplicator duplicates each received input packet to all four of its outputs.</td>
</tr>
</tbody>
</table>

Table 3 describes the video pipeline blocks. The video pipeline takes a portion of the video input and upscales it before writing the result into external memory.

Table 3. Video Pipeline Blocks

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clipper algorithmic IP core</td>
<td>Video and image processing component library</td>
<td>The clipper algorithmic IP core clips away most of the input line, allowing only approximately a quarter of it to propagate to its output. The quarter that is propagated is different for each video pipeline.</td>
</tr>
<tr>
<td>Line buffer</td>
<td>Video and image processing component library</td>
<td>The line buffer uses on-chip memory to store multiple lines and then outputs them in parallel as one packet. This reference design configures each line buffer to support four vertical taps.</td>
</tr>
<tr>
<td>Scaler algorithmic IP core</td>
<td>Video and image processing component library</td>
<td>The scaler algorithmic IP core upscales the input line by a factor of two.</td>
</tr>
<tr>
<td>Packet writer</td>
<td>Video and image processing component library</td>
<td>The packet writer writes the lines of the output video frame (packets) into external memory.</td>
</tr>
</tbody>
</table>

Table 4 describes the video control blocks. The Nios II scheduler controls the IP cores. It allows you to use the register maps to configure, start, and stop the IP cores and also the component library components. The components require much lower-level control as they only perform tasks, such as processing input packets, when they receive a command from the scheduler.

Table 4. Video Control Blocks

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scaler kernel creator</td>
<td>Video and image processing component library</td>
<td>The scaler kernel creator is a hardware accelerator block that returns the input lines required to produce an output line. The scheduler uses this information to determine which input lines need to be stored in the line buffer.</td>
</tr>
<tr>
<td>Packet switch</td>
<td>Video and image processing component library</td>
<td>The packet switch routes messages to the end point specified in the destination address. This process allows the Nios II processor to send messages to any component in the reference design by altering the destination address.</td>
</tr>
<tr>
<td>Nios II message interface unit</td>
<td>Video and image processing component library</td>
<td>Components use the Avalon-ST Message format to send and receive messages. The message interface unit is a memory-mapped peripheral that the Nios II processor uses to send and receive messages.</td>
</tr>
<tr>
<td>Control slave</td>
<td>Video and image processing component library</td>
<td>The control slave provides a register map interface between the Nios II controller and the Nios II scheduler. It updates scaling parameters and synchronizes the switching of the read and write sides of the DDR3 double buffer.</td>
</tr>
<tr>
<td>Nios II scheduler</td>
<td>Qsys</td>
<td>The design uses a Nios II processor to schedule the scaling subsystem. The scheduler sends command messages to the components and receives response messages from the components.</td>
</tr>
<tr>
<td>Nios II controller</td>
<td>Qsys</td>
<td>The design uses another Nios II processor to control the system. The controller reacts to interrupts that the clocked video input raises when the input video format changes, and configures the IP cores through their register maps.</td>
</tr>
</tbody>
</table>
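
The scaler kernel creator row describes a mapping from one output line to the input lines needed to produce it. The following is a minimal software sketch of such a mapping, not the accelerator's actual algorithm: the centre-aligned coordinate mapping, the edge clamping, and the function name are all assumptions made for illustration.

```python
import math


def kernel_lines(out_line, in_height, out_height, taps):
    """Return the input line indices needed to produce one output line.

    Assumes centre-aligned sampling and repeats the edge line when the
    kernel would reach outside the frame (both are illustrative choices).
    """
    # Map the centre of the output line back into input-line coordinates.
    centre = (out_line + 0.5) * in_height / out_height - 0.5
    # First line of a kernel of `taps` lines that straddles that position.
    first = math.floor(centre) - taps // 2 + 1
    # Clamp each index to the valid range of input lines.
    return [min(max(first + t, 0), in_height - 1) for t in range(taps)]


# Example: 1080-line input upscaled to 2160 lines with four vertical taps.
print(kernel_lines(100, 1080, 2160, 4))
```

A scheduler can compare the lines returned for consecutive output lines to decide when the line buffer must accept a new input line and when it can reuse the lines it already holds.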

Table 5 describes the video output blocks. The video output reads the upscaled output video from external memory and feeds it to four SDI outputs.

Table 5. Video Output Blocks

<table>
<thead>
<tr>
<th>Block</th>
<th>Source</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frame reader</td>
<td>Video and image processing suite</td>
<td>The frame reader reads frames from external memory and converts them into Avalon-ST Video packets.</td>
</tr>
<tr>
<td>Gamma corrector</td>
<td>Video and image processing suite</td>
<td>The gamma corrector corrects any out of range values that occur as a result of scaling. This block brings the color values back into the acceptable SDI range of $64 \leq Y \leq 940$ and $64 \leq Cb/Cr \leq 960$.</td>
</tr>
<tr>
<td>Clocked video output</td>
<td>Video and image processing suite</td>
<td>In this reference design the clocked video outputs convert Avalon-ST Video to a format the SDI IP core can take in.</td>
</tr>
</tbody>
</table>
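
The range correction described for the gamma corrector can be pictured as a per-component lookup table. The sketch below assumes 10-bit samples (implied by the 64 to 940 and 64 to 960 limits) and a simple clamp; the table contents that the reference design actually loads are not given in this document, and all names are illustrative.

```python
Y_RANGE = (64, 940)   # legal luma range from the table above
C_RANGE = (64, 960)   # legal Cb/Cr range from the table above


def build_clamp_lut(low, high, bits=10):
    """Build a lookup table that clamps every possible sample value.

    The 10-bit sample width is an assumption.
    """
    return [min(max(value, low), high) for value in range(1 << bits)]


Y_LUT = build_clamp_lut(*Y_RANGE)
C_LUT = build_clamp_lut(*C_RANGE)


def legalize(y, cb, cr):
    """Bring one Y, Cb, Cr sample triple back into the legal SDI range."""
    return Y_LUT[y], C_LUT[cb], C_LUT[cr]


# An overshoot in luma and an undershoot in Cb are both pulled into range.
print(legalize(1023, 0, 512))
```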

The DDR3 SDRAM Controller with ALTMEMPHY and the multiport front end perform the following actions:

- Perform the buffering of video to and from external DDR3 SDRAM.
- Handle the arbitration of multiple packet writers and frame reader masters on the single slave interface of the DDR3 SDRAM Controller with ALTMEMPHY.

**Video Pipelines**

The reference design consists of four video pipelines that can each process 1080p60 video. The design splits the 4K upscale processing across the four video pipelines. Extra video processing pipelines can be added to process higher resolutions; the only limits are the size of the FPGA and the available DDR3 SDRAM bandwidth. Conversely, the number of video processing pipelines required falls as the achievable $f_{\text{MAX}}$ increases. For example, doubling the $f_{\text{MAX}}$ halves the number of video processing pipelines required.

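
The relationship between $f_{\text{MAX}}$ and pipeline count can be estimated with a short calculation. This sketch assumes each pipeline produces one output pixel per clock cycle and ignores blanking, band overlap, and stalls; those simplifications and the function name are assumptions, not properties stated for the design.

```python
import math


def pipelines_required(width, height, fps, f_max_hz, pixels_per_clock=1):
    """Estimate how many pipelines are needed to sustain an output format.

    Assumes each pipeline emits `pixels_per_clock` pixels every cycle.
    """
    pixel_rate = width * height * fps  # active output pixels per second
    return math.ceil(pixel_rate / (f_max_hz * pixels_per_clock))


# QFHD at 60 frames per second with the 148.5 MHz video clock.
print(pipelines_required(3840, 2160, 60, 148_500_000))
# The same output with twice the clock frequency.
print(pipelines_required(3840, 2160, 60, 297_000_000))
```

Under these assumptions the estimate is four pipelines at 148.5 MHz and two at 297 MHz, which matches the halving described above.
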
You can parameterize each video pipeline, which consists of a chain of processing functions. Avalon-ST interfaces allow you to connect any number of processing functions with Qsys. Figure 2 shows a video pipeline, which has the following features:

- A clipper algorithmic IP core selects the part of the input line that this video pipeline processes
- A line buffer buffers multiple input lines and provides the scaler algorithmic IP core with the correct pixel kernels to produce the required output lines
- A scaler algorithmic IP core performs the upscale and produces the output lines
- A packet writer writes the output lines into DDR3 SDRAM

**Figure 2. Video Pipeline Block Diagram**

Vertical bands split the incoming video across the four pipelines. Each video pipeline processes a different vertical band of the incoming video frame. Figure 3 shows that each pipeline selects a different quarter of the line (with a small overlap) and upscales only that portion of the video.

**Figure 3. A Video Frame Split Over The Video Processing Pipelines**

![Diagram showing how video is divided and processed through four pipelines](image)
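
The band selection can be sketched as a calculation of one clip window per pipeline. The overlap width is not stated in this document, so the 8-pixel value below is purely an assumption, as is the function name; the overlap presumably gives the scaler's horizontal taps valid neighbouring pixels at each band boundary.

```python
def band_windows(line_width, pipelines=4, overlap=8):
    """Return one (start, end) pixel window per pipeline, end exclusive.

    Each window is a quarter of the line, widened by `overlap` pixels on
    every side that borders another band (the overlap value is assumed).
    """
    band = line_width // pipelines
    windows = []
    for index in range(pipelines):
        start = max(index * band - overlap, 0)
        end = min((index + 1) * band + overlap, line_width)
        windows.append((start, end))
    return windows


# Clip windows for a 1920-pixel input line split over four pipelines.
for index, window in enumerate(band_windows(1920)):
    print(index, window)
```
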
To simultaneously produce the required four SDI outputs, the design must write the 4K frame into a double buffer. Figure 4 shows buffering of the 4K frame. The design writes the 4K frame as vertical bands, then swaps the double buffer, reads out the frame as four quadrants, and sends each to a separate SDI output.

**Figure 4. Buffering of 4K Video Frame in DDR3 SDRAM**

The design then recombines the upscaled output lines, removing any overlap, to produce the 4K frame (Figure 5).

**Figure 5. 4K Frame**

![Diagram of 4K Frame](image)
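
The write-as-bands, read-as-quadrants behavior can be modelled in a few lines. This is a toy software model under stated assumptions: the frame is a list of rows rather than DDR3 addresses, overlap has already been removed from each band line, and the quadrant numbering (0 for top left through 3 for bottom right) is illustrative.

```python
class DoubleBuffer:
    """Toy model of a double-buffered frame store."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.frames = [
            [[0] * width for _ in range(height)] for _ in range(2)
        ]
        self.write_index = 0  # the other frame is the read side

    def write_band_line(self, band, bands, row, pixels):
        """Write one upscaled line of a vertical band into the write frame."""
        band_width = self.width // bands
        if len(pixels) != band_width:
            raise ValueError("band line must be exactly one band wide")
        start = band * band_width
        self.frames[self.write_index][row][start:start + band_width] = pixels

    def swap(self):
        """Exchange the read and write sides once a frame is complete."""
        self.write_index ^= 1

    def read_quadrant(self, quadrant):
        """Return the rows of one quadrant of the read frame."""
        frame = self.frames[self.write_index ^ 1]
        half_w, half_h = self.width // 2, self.height // 2
        x0 = (quadrant % 2) * half_w
        y0 = (quadrant // 2) * half_h
        return [row[x0:x0 + half_w] for row in frame[y0:y0 + half_h]]


# Fill a tiny 8 x 4 frame as four bands, swap, then read two quadrants.
buf = DoubleBuffer(8, 4)
for band in range(4):
    for row in range(4):
        buf.write_band_line(band, 4, row, [band] * 2)
buf.swap()
print(buf.read_quadrant(0), buf.read_quadrant(3))
```

Each quadrant spans two bands, which is why the whole frame must be written before any quadrant can be read out.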

**Control Interfaces**

The reference design introduces the video and image processing component library, which allows greater flexibility and control in video processing designs. The component library is a collection of common video function building blocks from which the Video and Image Processing Suite IP cores are built. When you use the component library as part of a MegaCore function, the software hides the control and parameterization of the components.

In the reference design, a selection of components (the clipper algorithmic IP core, line buffer, scaler algorithmic IP core, and packet writer) creates a flexible video processing pipeline. Each component has a command interface through which it receives messages from a scheduler instructing it to perform a specific function. Example functions for a line buffer include receiving a new line, shifting the lines it contains, and sending a kernel of pixels to another component. You can implement the scheduler as an HDL state machine that controls a handful of components, or as a CPU that runs software to control large systems of components. The reference design scheduler is a Nios II processor.

For each line the video input bridge receives, it sends a response message to the scheduler. The scheduler then sends out a message to the video input bridge, which instructs it to send the line to a particular destination. In this case, the design sends the line to a duplicator, which sends copies of the line to multiple destinations. The scheduler also sends messages to each of the components in the video pipelines. The messages instruct the components which functions to perform on the line.
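
The message flow described above can be sketched as a small simulation. Everything in it is hypothetical: the class, the message fields, and the command names such as `send_line` and `process_line` are invented for illustration and do not reflect the real commands listed in `alt_vip_common_pkg.h`.

```python
from collections import deque


class Component:
    """A component that only acts when it receives a command."""

    def __init__(self, name):
        self.name = name
        self.commands = []  # record of every command received

    def command(self, action, **fields):
        self.commands.append((action, fields))


def make_pipeline(index):
    """Create the four components of one video pipeline (names assumed)."""
    stages = ("clipper", "line_buffer", "scaler", "packet_writer")
    return [Component(f"{stage}_{index}") for stage in stages]


def run_scheduler(responses, bridge, pipelines):
    """Drain the response queue, issuing commands for each received line."""
    handled = 0
    while responses:
        source, event, line = responses.popleft()
        if source != bridge.name or event != "line_received":
            continue  # this sketch ignores all other responses
        # Tell the bridge where to send the line it has just received.
        bridge.command("send_line", line=line, destination="duplicator")
        # Tell every pipeline component what to do with its copy.
        for pipeline in pipelines:
            for stage in pipeline:
                stage.command("process_line", line=line)
        handled += 1
    return handled


bridge = Component("video_input_bridge")
pipelines = [make_pipeline(i) for i in range(4)]
responses = deque(
    ("video_input_bridge", "line_received", line) for line in range(2)
)
print(run_scheduler(responses, bridge, pipelines))
```

Because the components do nothing until commanded, changing the schedule in this model only means changing `run_scheduler`, which mirrors the flexibility argument for a software scheduler.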

A software scheduler provides increased flexibility in both debugging the system and modifying its run-time functionality, which gives a more productive design cycle and a shorter time to market.

Software Schedule

The `main_core.cpp` file contains the software schedule that the Nios II scheduler uses, together with comments and a detailed description of the schedule. C macros in the `alt_vip_beta_nios_ii_miui_api.h` and `alt_vip_beta_nios_ii_miui_regs.h` files form the API for the Nios II message interface unit: they allow the Nios II processor to send and receive messages, and they translate to simple memory-mapped reads or writes.

For more details on the Nios II message interface unit, refer to the Video and Image Processing Component Library Functional Description (available from Altera).

Each component in the reference design also has a set of commands that it accepts and a set of responses that it returns. The `alt_vip_common_pkg.h` file lists the commands.

The `main_top.cpp` file contains the top-level control software that monitors and configures the input and output video interfaces. The Nios II controller switches the read side of the video double buffer. The top-level control software contains detailed comments that describe its operation.

Clocks

Table 6 lists the clocks and frequencies.

<table>
<thead>
<tr>
<th>Clock Domain</th>
<th>$f_{\text{MAX}}$ (MHz)</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>sdi_rx_clk[0]</code></td>
<td>148.5</td>
<td>The SDI input clock.</td>
</tr>
<tr>
<td><code>sdi_clk148</code></td>
<td>148.5</td>
<td>The SDI output clock for the four outputs.</td>
</tr>
<tr>
<td><code>vip_clk</code></td>
<td>148.5</td>
<td>The video processing pipelines clock.</td>
</tr>
<tr>
<td><code>altmemddr_0_sysclk</code></td>
<td>200.0</td>
<td>The local interface of the memory controller clock.</td>
</tr>
<tr>
<td>DDR3 clock</td>
<td>400.0</td>
<td>The DDR3 SDRAM is clocked at 400 MHz.</td>
</tr>
</tbody>
</table>
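
The 148.5 MHz video clocks match the standard 1080p60 pixel clock, and the memory clocks relate to each other by a factor of two. The check below assumes the standard 1080p60 total raster of 2200 by 1125 samples including blanking, which this document does not state; reading the 200 MHz local interface clock as a half-rate controller clock is likewise an inference.

```python
def pixel_clock_hz(total_width, total_height, fps):
    """Pixel clock for a raster, counting blanking as well as active samples."""
    return total_width * total_height * fps


def ddr_transfers_per_second(memory_clock_hz):
    """Double data rate memory transfers data on both clock edges."""
    return 2 * memory_clock_hz


# Assumed 1080p60 total raster (active 1920 x 1080 plus blanking).
VIDEO_CLOCK_HZ = pixel_clock_hz(2200, 1125, 60)
DDR3_TRANSFERS = ddr_transfers_per_second(400_000_000)

print(VIDEO_CLOCK_HZ, DDR3_TRANSFERS)
```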

Getting Started

This section describes the following topics:

- Hardware and Software Requirements
- Downloading and Installing the Reference Design
- Generating the Qsys System
- Compiling the Software
- Compiling the Design
- Programming a Device

Hardware and Software Requirements

The reference design requires the following hardware:

- Stratix IV GX FPGA development board
- Two Terasic Transceiver SDI High-Speed Mezzanine Card (HSMC) boards
- A 1080p60 SDI video source
- Four 1080p60 SDI monitors or DVI monitors with SDI-to-DVI converters

The reference design requires the following software:

- Quartus II software v10.1
- Nios II EDS v10.1

**Downloading and Installing the Reference Design**

To download and install the reference design, follow these steps:

1. Request the reference design (.zip) files from the 4K Format Conversion Reference Design web page.
2. Extract the contents of the archive file to a directory on your computer. Do not use spaces in the directory path name.

Figure 6 shows the reference design directory structure.

**Figure 6. Directory Structure**

```
fourk-<version>
├── es                      Contains the precompiled .sof for engineering sample devices.
└── s4gx_pcie               Contains the Quartus II project.
    ├── clock_gen           Contains the PLLs for SDI clock generation.
    ├── ip                  Contains beta versions of IP MegaCore functions.
    ├── sdi_dprio_siv       Contains the SDI transceiver reconfiguration controller.
    ├── sdi_dual            Contains the SDI MegaCore function configurations.
    ├── software            Contains the Nios II application project and Nios II C++ source code.
    ├── top                 Contains the top-level design file (s4gx_pcie.v) and interface configuration file (config.v).
    ├── FourK.qsys          Qsys system file.
    ├── FourK_core.qsys     Qsys system file.
    ├── make_project.bat    Run this script on Windows to create the project.
    ├── make_project.sh     Run this script on Linux to create the project.
    ├── s4gx_pcie.sdc       The timing constraints file.
    ├── s4gx_pcie.tcl       Tcl file that the make_project script uses to create the Quartus II project.
    └── s4gx_pcie.sof       The precompiled .sof for production devices.
```
Generating the Qsys System

To generate the Qsys system, follow these steps:

1. Create the Quartus II project file s4gx_pcie.qpf:
   - On Windows operating systems, run the make_project.bat script from the Nios II EDS Command Shell by typing .\make_project.bat.
   - On Linux operating systems, run the make_project.sh script.
2. Open the Quartus II software and open the project file s4gx_pcie.qpf.
3. On the Tools menu click Qsys.
4. Open the FourK.qsys system.

   The reference design includes two Qsys systems. The FourK_core.qsys system contains the four video scaling pipelines and the components required to schedule them. The FourK.qsys system includes one instance of the FourK_core.qsys system.

5. In Qsys, click the Generation tab.
6. Click Generate.
7. When the System generation was successful message displays, start the Nios II Software Build Tools for Eclipse: on the Windows Start menu, point to Altera, click Nios II EDS <version> and click Nios II Software Build Tools for Eclipse.

Compiling the Software

This section describes how to create the FourK_top_cpu_memory.hex and FourK_FourK_core_inst_control_memory.hex files in the Nios II Software Build Tools for Eclipse. To create the FourK_top_cpu_memory.hex file, follow these steps:

1. In the Workspace Launcher window click Browse... and create a new workspace directory, workspace, in the project s4gx_pcie directory. Then click OK to open the workspace.
2. In the Nios II – Eclipse window, right-click in the Project Explorer tab, point to New and click Nios II Application and BSP from Template.
3. In the Nios II Application and BSP from Template window fill in the following information:
   - For SOPC Information File Name, browse to locate the FourK.sopcinfo file.
   - For CPU name, select top_cpu.
   - For Project name, enter s4gx_pcie_controller.
   - For Templates, select Blank Project.
4. Click Finish.
5. On the Project Explorer tab, right-click on s4gx_pcie_controller_bsp, point to Nios II, and click BSP Editor.
6. On the Main tab for stdout select lcd. Turn on enable_interrupt_stack and for interrupt_stack_memory_region_name select top_cpu_memory.
7. On the Linker Script tab for .bss, .heap, .rodata, .rwdata, .stack and .text set the Linker Region Name to top_cpu_memory. For reset set the Memory Device Name to top_cpu_memory.

8. Click Generate.

9. In the Nios II – Eclipse window, expand s4gx_pcie_controller to open the list of files. Right-click on main_top.cpp and video_standard.cpp and click Add to Nios II Build. Collapse the list of files.

10. In the Project Explorer tab, right-click on s4gx_pcie_controller and click Properties.

11. In the Properties for s4gx_pcie_controller window, select Nios II Application Properties and change the Optimization level: to Level 3. Click OK.

12. In the Project Explorer tab, right-click on the s4gx_pcie_controller_bsp, and select Properties.

13. In the Properties for s4gx_pcie_controller_bsp window, select Nios II BSP Properties and change the Optimization level: to Level 3. Click OK.

14. On the Project Explorer tab, right-click on the s4gx_pcie_controller, and click Build Project.

15. On the Project Explorer tab, right-click on the s4gx_pcie_controller, point to Make Targets and click Build....

16. In the Make Targets window, select mem_init_generate and then click Build.

17. The software creates the FourK_top_cpu_memory.hex file.

To create the FourK_FourK_core_inst_control_memory.hex file, follow these steps:

1. In the Nios II – Eclipse window, right-click in the Project Explorer tab, point to New and click Nios II Application and BSP from Template.

2. In the Nios II Application and BSP from Template window fill in the following information:

   - For SOPC Information File Name, browse to locate the FourK.sopcinfo file.
   - For CPU name, select FourK_core_inst_control_cpu.
   - For Project name, enter core_controller.
   - For Templates, select Blank Project.

3. Click Finish.

4. In the Nios II - Eclipse window, expand core_controller, to open the list of files. Right-click on main_core.cpp and click Add to Nios II Build.

5. On the Project Explorer tab, right-click on core_controller and click Properties.

6. In the Properties window, select Nios II Application Properties and change Optimization level: to Level 3. Click OK.

7. On the Project Explorer tab, right-click on core_controller_bsp, and click Properties.

8. In the Properties window, select Nios II BSP Properties and change Optimization level: to Level 3. Click OK.
9. On the Project Explorer tab, right-click on core_controller, and click Build Project.

10. On the Project Explorer tab, right-click on core_controller, point to Make Targets and click Build....

11. In the Make Targets window, select mem_init_generate and click Build.

12. The software creates the FourK_FourK_core_inst_control_memory.hex file.

**Compiling the Design**

To compile the design in the Quartus II software and create the s4gx_pcie.sof file, follow these steps:

1. On the Processing menu, click Start Compilation.

2. When compilation completes, the Quartus II software creates the s4gx_pcie.sof file.

**Programming a Device**

To program the FPGA and set up the reference design, follow these steps:
1. Connect the two SDI HSMC boards as Figure 7 shows.

**Figure 7. Reference Design Setup**

2. Connect the four SDI monitor cables to the SDI_OUT1 and SDI_OUT2 outputs of each SDI HSMC board.
3. Turn on the Stratix IV GX FPGA development board.
4. In the Quartus II software, on the Tools menu, click **Programmer**, to program the FPGA with the s4gx_pcie.sof file.
5. Check that LED0 flashes.
6. Connect your SDI source cable to the SDI_IN1 input. When you connect an input, the LCD displays the detected format.
7. Press push button 0 (PB0) to enable or disable the edge adaptive scaling mode within the scaler algorithmic IP cores.
8. Press push button 1 (PB1) to increase the edge threshold that the edge adaptive scaling algorithm uses.
9. Press push button 2 (PB2) to decrease the edge threshold.

Table 7 describes the Stratix IV GX FPGA Development Board LEDs.

**Table 7. LEDs**

<table>
<thead>
<tr>
<th>LED</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Software heartbeat. Flashes when the software is running on the Nios II processor.</td>
</tr>
<tr>
<td>1</td>
<td>Illuminates when the output runs.</td>
</tr>
<tr>
<td>2</td>
<td>Illuminates when the SDI_IN1 input detects a supported input format.</td>
</tr>
<tr>
<td>3</td>
<td>Illuminates when the input overflows.</td>
</tr>
<tr>
<td>4</td>
<td>Illuminates when the output underflows.</td>
</tr>
<tr>
<td>5</td>
<td>60 frames per second heartbeat. Flashes every 60 frames of the input video.</td>
</tr>
<tr>
<td>6</td>
<td>Illuminates when the output is generator locked to the input.</td>
</tr>
<tr>
<td>7</td>
<td>Illuminates when you enable edge adaptive scaling.</td>
</tr>
</tbody>
</table>

**Document Revision History**

Table 8 shows the revision history for this document.

**Table 8. Document Revision History**

<table>
<thead>
<tr>
<th>Date</th>
<th>Version</th>
<th>Changes</th>
</tr>
</thead>
<tbody>
<tr>
<td>August 2012</td>
<td>2.0</td>
<td>Updated for Qsys and the Quartus II software v12.0.</td>
</tr>
<tr>
<td>May 2011</td>
<td>1.0</td>
<td>Initial release.</td>
</tr>
</tbody>
</table>