FFT IP Core: User Guide
About This IP Core
Intel DSP IP Core Features
 Avalon^{®} Streaming (AvalonST) interfaces
 DSP Builder for Intel^{®} FPGAs ready
 Testbenches to verify the IP core
 IP functional simulation models for use in Intelsupported VHDL and Verilog HDL simulators
FFT IP Core Features
 Bitaccurate MATLAB models
 Variable streaming FFT:
 Singleprecision floatingpoint or fixedpoint representation
 Radix4, mixed radix4/2 implementations (for floatingpoint FFT), and radix2^{2} single delay feedback implementation (for fixedpoint FFT)
 Input and output orders: natural order, or digitreversed, and DCcentered (N/2 to N/2)
 Reduced memory requirements
 Support for 8 to 32bit data and twiddle width (fixedpoint FFTs)
 Fixed transform size FFT that implements block
floatingpoint FFTs and maintains the maximum dynamic range of data during processing
(not for variable streaming FFTs)
 Multiple I/O data flow options: streaming, buffered burst, and burst
 Uses embedded memory
 Maximum system clock frequency more than 300 MHz
 Optimized to use Stratix series DSP blocks and TriMatrix memory
 High throughput quadoutput radix 4 FFT engine
 Support for multiple singleoutput and quadoutput engines in parallel
 User control over optimization in DSP blocks or in speed in Stratix V devices, for streaming, buffered burst, burst, and variable streaming fixedpoint FFTs
 Avalon Streaming (AvalonST) compliant input and output interfaces
 Parameterizationspecific VHDL and Verilog HDL testbench generation
 Transform direction (FFT/IFFT) specifiable on a perblock basis
General Description
The FFT MegaCore function implements:
 Fixed transform size FFT
 Variable streaming FFT
Fixed Transform Size FFT
The fixed transform FFT accepts a two's complement format complex data vector of length N inputs, where N is the desired transform length in natural order. The function outputs the transformdomain complex vector in natural order. The FFT produces an accumulated block exponent to indicate any data scaling that has occurred during the transform to maintain precision and maximize the internal signaltonoise ratio. You can specify the transform direction on a perblock basis using an input port.
Variable Streaming FFT
The fixedpoint representation grows the data widths naturally from input through to output thereby maintaining a high SNR at the output. The single precision floatingpoint representation allows a large dynamic range of values to be represented while maintaining a high SNR at the output.
The order of the input data vector of size N can be natural or digitreversed, or N/2 to N/2 (DCcentered). The fixedpoint representation supports a natural or DCcentered order and the floating point representation supports a natural, digitreversed order. The FFT outputs the transformdomain complex vector in natural or digitreversed order. You can specify the transform direction on a perblock basis using an input port.
DSP IP Core Device Family Support
Intel offers the following device support levels for Intel FPGA IP cores:
 Advance support—the IP core is available for simulation and compilation for this device family. FPGA programming file (.pof) support is not available for Quartus Prime Pro Stratix 10 Edition Beta software and as such IP timing closure cannot be guaranteed. Timing models include initial engineering estimates of delays based on early postlayout information. The timing models are subject to change as silicon testing improves the correlation between the actual silicon and the timing models. You can use this IP core for system architecture and resource utilization studies, simulation, pinout, system latency assessments, basic timing assessments (pipeline budgeting), and I/O transfer strategy (datapath width, burst depth, I/O standards tradeoffs).
 Preliminary support—Intel verifies the IP core with preliminary timing models for this device family. The IP core meets all functional requirements, but might still be undergoing timing analysis for the device family. You can use it in production designs with caution.
 Final support—Intel verifies the IP core with final timing models for this device family. The IP core meets all functional and timing requirements for the device family. You can use it in production designs.
Device Family  Support 

Arria^{®} II GX  Final 
Arria II GZ  Final 
Arria V  Final 
Intel^{®} Arria^{®} 10  Final 
Cyclone^{®} IV  Final 
Cyclone V  Final 
Intel^{®} Cyclone^{®} 10  Final 
Intel^{®} MAX^{®} 10 FPGA  Final 
Stratix^{®} IV GT  Final 
Stratix IV GX/E  Final 
Stratix V  Final 
Intel^{®} Stratix^{®} 10  Advance 
Other device families  No support 
DSP IP Core Verification
FFT IP Core Release Information
Item  Description 

Version  17.1 
Release Date  November 2017 
Ordering Code  IPFFT 
Performance and Resource Utilization
Device  Parameters  ALM  DSP Blocks  Memory  Registers  f_{MAX} (MHz)  

Type  Length  Engines  M10K  M20K  Primary  Secondary  
Arria V  Buffered Burst  1,024  1  1,572  6  16    3,903  143  275 
Arria V  Buffered Burst  1,024  2  2,512  12  30    6,027  272  274 
Arria V  Buffered Burst  1,024  4  4,485  24  59    10,765  426  262 
Arria V  Buffered Burst  256  1  1,532  6  16    3,713  136  275 
Arria V  Buffered Burst  256  2  2,459  12  30    5,829  246  245 
Arria V  Buffered Burst  256  4  4,405  24  59    10,539  389  260 
Arria V  Buffered Burst  4,096  1  1,627  6  59    4,085  130  275 
Arria V  Buffered Burst  4,096  2  2,555  12  59    6,244  252  275 
Arria V  Buffered Burst  4,096  4  4,526  24  59    10,986  438  265 
Arria V  Burst Quad Output  1,024  1  1,565  6  8    3,807  147  273 
Arria V  Burst Quad Output  1,024  2  2,497  12  14    5,952  225  275 
Arria V  Burst Quad Output  1,024  4  4,461  24  27    10,677  347  257 
Arria V  Burst Quad Output  256  1  1,527  6  8    3,610  153  272 
Arria V  Burst Quad Output  256  2  2,474  12  14    5,768  233  275 
Arria V  Burst Quad Output  256  4  4,403  24  27    10,443  437  257 
Arria V  Burst Quad Output  4,096  1  1,597  6  27    3,949  151  275 
Arria V  Burst Quad Output  4,096  2  2,551  12  27    6,119  223  275 
Arria V  Burst Quad Output  4,096  4  4,494  24  27    10,844  392  256 
Arria V  Burst Single Output  1,024  1  672  2  6    1,488  101  275 
Arria V  Burst Single Output  1,024  2  994  4  10    2,433  182  275 
Arria V  Burst Single Output  256  1  636  2  3    1,442  95  275 
Arria V  Burst Single Output  256  2  969  4  8    2,375  152  275 
Arria V  Burst Single Output  4,096  1  702  2  19    1,522  126  270 
Arria V  Burst Single Output  4,096  2  1,001  4  25    2,521  156  275 
Arria V  Streaming  1,024  —  1,880  6  20    4,565  167  275 
Arria V  Streaming  256  —  1,647  6  20    3,838  137  275 
Arria V  Streaming  4,096  —  1,819  6  71    4,655  137  275 
Arria V  Variable Streaming Floating Point  1,024  —  11,195  48  89    18,843  748  163 
Arria V  Variable Streaming Floating Point  256  —  8,639  36  62    15,127  609  161 
Arria V  Variable Streaming Floating Point  4,096  —  13,947  60  138    22,598  854  162 
Arria V  Variable Streaming  1,024  —  2,535  11  14    6,269  179  223 
Arria V  Variable Streaming  256  —  1,913  8  8    4,798  148  229 
Arria V  Variable Streaming  4,096  —  3,232  15  31    7,762  285  210 
Cyclone V  Buffered Burst  1,024  1  1,599  6  16    3,912  114  226 
Cyclone V  Buffered Burst  1,024  2  2,506  12  30    6,078  199  219 
Cyclone V  Buffered Burst  1,024  4  4,505  24  59    10,700  421  207 
Cyclone V  Buffered Burst  256  1  1,528  6  16    3,713  115  227 
Cyclone V  Buffered Burst  256  2  2,452  12  30    5,833  211  232 
Cyclone V  Buffered Burst  256  4  4,487  24  59    10,483  424  221 
Cyclone V  Buffered Burst  4,096  1  1,649  6  59    4,060  138  223 
Cyclone V  Buffered Burst  4,096  2  2,555  12  59    6,254  199  227 
Cyclone V  Buffered Burst  4,096  4  4,576  24  59    10,980  377  214 
Cyclone V  Burst Quad Output  1,024  1  1,562  6  8    3,810  122  225 
Cyclone V  Burst Quad Output  1,024  2  2,501  12  14    5,972  196  231 
Cyclone V  Burst Quad Output  1,024  4  4,480  24  27    10,643  372  216 
Cyclone V  Burst Quad Output  256  1  1,534  6  8    3,617  120  226 
Cyclone V  Burst Quad Output  256  2  2,444  12  14    5,793  153  224 
Cyclone V  Burst Quad Output  256  4  4,443  24  27    10,402  379  223 
Cyclone V  Burst Quad Output  4,096  1  1,590  6  27    3,968  120  237 
Cyclone V  Burst Quad Output  4,096  2  2,547  12  27    6,135  209  227 
Cyclone V  Burst Quad Output  4,096  4  4,512  24  27    10,798  388  210 
Cyclone V  Burst Single Output  1,024  1  673  2  6    1,508  83  222 
Cyclone V  Burst Single Output  1,024  2  984  4  10    2,475  126  231 
Cyclone V  Burst Single Output  256  1  639  2  3    1,382  159  229 
Cyclone V  Burst Single Output  256  2  967  4  8    2,353  169  240 
Cyclone V  Burst Single Output  4,096  1  695  2  19    1,540  105  237 
Cyclone V  Burst Single Output  4,096  2  1,009  4  25    2,536  116  240 
Cyclone V  Streaming  1,024  —  1,869  6  20    4,573  132  211 
Cyclone V  Streaming  256  —  1,651  6  20    3,878  85  226 
Cyclone V  Streaming  4,096  —  1,822  6  71    4,673  124  199 
Cyclone V  Variable Streaming Floating Point  1,024  —  11,184  48  89    18,830  628  133 
Cyclone V  Variable Streaming Floating Point  256  —  8,611  36  62    15,156  467  133 
Cyclone V  Variable Streaming Floating Point  4,096  —  13,945  60  138    22,615  701  132 
Cyclone V  Variable Streaming  1,024  —  2,533  11  14    6,254  240  179 
Cyclone V  Variable Streaming  256  —  1,911  8  8    4,786  176  180 
Cyclone V  Variable Streaming  4,096  —  3,226  15  31    7,761  320  176 
Stratix V  Buffered Burst  1,024  1  1,610  6    16  4,141  107  424 
Stratix V  Buffered Burst  1,024  2  2,545  12    30  6,517  170  427 
Stratix V  Buffered Burst  1,024  4  4,554  24    59  11,687  250  366 
Stratix V  Buffered Burst  256  1  1,546  6    16  3,959  110  493 
Stratix V  Buffered Burst  256  2  2,475  12    30  6,314  134  440 
Stratix V  Buffered Burst  256  4  4,480  24    59  11,477  281  383 
Stratix V  Buffered Burst  4,096  1  1,668  6    30  4,312  122  432 
Stratix V  Buffered Burst  4,096  2  2,602  12    30  6,718  176  416 
Stratix V  Buffered Burst  4,096  4  4,623  24    59  11,876  249  392 
Stratix V  Burst Quad Output  1,024  1  1,550  6    8  4,037  115  455 
Stratix V  Burst Quad Output  1,024  2  2,444  12    14  6,417  164  433 
Stratix V  Burst Quad Output  1,024  4  4,397  24    27  11,548  330  416 
Stratix V  Burst Quad Output  256  1  1,487  6    8  3,868  83  477 
Stratix V  Burst Quad Output  256  2  2,387  12    14  6,211  164  458 
Stratix V  Burst Quad Output  256  4  4,338  24    27  11,360  307  409 
Stratix V  Burst Quad Output  4,096  1  1,593  6    14  4,222  93  448 
Stratix V  Burst Quad Output  4,096  2  2,512  12    14  6,588  154  470 
Stratix V  Burst Quad Output  4,096  4  4,468  24    27  11,773  267  403 
Stratix V  Burst Single Output  1,024  1  652  2    4  1,553  111  500 
Stratix V  Burst Single Output  1,024  2  1,011  4    8  2,687  149  476 
Stratix V  Burst Single Output  256  1  621  2    3  1,502  132  500 
Stratix V  Burst Single Output  256  2  978  4    8  2,555  173  500 
Stratix V  Burst Single Output  4,096  1  681  2    9  1,589  149  500 
Stratix V  Burst Single Output  4,096  2  1,039  4    14  2,755  161  476 
Stratix V  Streaming  1,024  —  1,896  6    20  4,814  144  490 
Stratix V  Streaming  256  —  1,604  6    20  4,062  99  449 
Stratix V  Streaming  4,096  —  1,866  6    38  4,889  118  461 
Stratix V  Variable Streaming Floating Point  1,024  —  11,607  32    87  19,031  974  355 
Stratix V  Variable Streaming Floating Point  256  —  8,850  24    59  15,297  820  374 
Stratix V  Variable Streaming Floating Point  4,096  —  14,335  40    115  22,839  1,047  325 
Stratix V  Variable Streaming  1,024  —  2,334  14    13  5,623  201  382 
Stratix V  Variable Streaming  256  —  1,801  10    8  4,443  174  365 
Stratix V  Variable Streaming  4,096  —  2,924  18    23  6,818  238  355 
FFT IP Core Getting Started
Installing and Licensing Intel FPGA IP Cores
The Intel^{®} Quartus^{®} Prime software installs IP cores in the following locations by default:
Location  Software  Platform 

<drive>:\intelFPGA_pro\quartus\ip\altera  Intel^{®} Quartus^{®} Prime Pro Edition  Windows* 
<drive>:\intelFPGA\quartus\ip\altera  Intel^{®} Quartus^{®} Prime Standard Edition  Windows 
<home directory>:/intelFPGA_pro/quartus/ip/altera  Intel^{®} Quartus^{®} Prime Pro Edition  Linux* 
<home directory>:/intelFPGA/quartus/ip/altera  Intel^{®} Quartus^{®} Prime Standard Edition  Linux 
Intel FPGA IP Evaluation Mode
 Simulate the behavior of a licensed Intel^{®} FPGA IP core in your system.
 Verify the functionality, size, and speed of the IP core quickly and easily.
 Generate timelimited device programming files for designs that include IP cores.
 Program a device with your IP core and verify your design in hardware.
Intel^{®} FPGA IP Evaluation Mode supports the following operation modes:
 Tethered—Allows running the design containing the licensed Intel^{®} FPGA IP indefinitely with a connection between your board and the host computer. Tethered mode requires a serial joint test action group (JTAG) cable connected between the JTAG port on your board and the host computer, which is running the Intel^{®} Quartus^{®} Prime Programmer for the duration of the hardware evaluation period. The Programmer only requires a minimum installation of the Intel^{®} Quartus^{®} Prime software, and requires no Intel^{®} Quartus^{®} Prime license. The host computer controls the evaluation time by sending a periodic signal to the device via the JTAG port. If all licensed IP cores in the design support tethered mode, the evaluation time runs until any IP core evaluation expires. If all of the IP cores support unlimited evaluation time, the device does not timeout.
 Untethered—Allows running the design containing the licensed IP for a limited time. The IP core reverts to untethered mode if the device disconnects from the host computer running the Intel^{®} Quartus^{®} Prime software. The IP core also reverts to untethered mode if any other licensed IP core in the design does not support tethered mode.
When the evaluation time expires for any licensed Intel^{®} FPGA IP in the design, the design stops functioning. All IP cores that use the Intel^{®} FPGA IP Evaluation Mode time out simultaneously when any IP core in the design times out. When the evaluation time expires, you must reprogram the FPGA device before continuing hardware verification. To extend use of the IP core for production, purchase a full production license for the IP core.
You must purchase the license and generate a full production license key before you can generate an unrestricted device programming file. During Intel^{®} FPGA IP Evaluation Mode, the Compiler only generates a timelimited device programming file (<project name> _time_limited.sof) that expires at the time limit.
Intel^{®} licenses IP cores on a perseat, perpetual basis. The license fee includes firstyear maintenance and support. You must renew the maintenance contract to receive updates, bug fixes, and technical support beyond the first year. You must purchase a full production license for Intel^{®} FPGA IP cores that require a production license, before generating programming files that you may use for an unlimited time. During Intel^{®} FPGA IP Evaluation Mode, the Compiler only generates a timelimited device programming file (<project name> _time_limited.sof) that expires at the time limit. To obtain your production license keys, visit the SelfService Licensing Center.
The Intel^{®} FPGA Software License Agreements govern the installation and use of licensed IP cores, the Intel^{®} Quartus^{®} Prime design software, and all unlicensed IP cores.
FFT IP Core Intel FPGA IP Evaluation Mode Timeout Behavior
For IP cores, the untethered timeout is 1 hour; the tethered timeout value is indefinite. Your design stops working after the hardware evaluation time expires. The Quartus Prime software uses Intel^{®} FPGA IP Evaluation Mode Files (.ocp) in your project directory to identify your use of the Intel^{®} FPGA IP Evaluation Mode evaluation program. After you activate the feature, do not delete these files..
When the evaluation time expires, the source_real, source_imag, and source_exp signals go low.
IP Catalog and Parameter Editor
 Filter IP Catalog to Show IP for active device family or Show IP for all device families. If you have no project open, select the Device Family in IP Catalog.
 Type in the Search field to locate any full or partial IP core name in IP Catalog.
 Rightclick an IP core name in IP Catalog to display details about supported devices, to open the IP core's installation folder, and for links to IP documentation.
 Click Search for Partner IP to access partner IP information on the web.
The parameter editor prompts you to specify an IP variation name, optional ports, and output file generation options. The parameter editor generates a toplevel Intel^{®} Quartus^{®} Prime IP file (.ip) for an IP variation in Intel^{®} Quartus^{®} Prime Pro Edition projects.
The parameter editor generates a toplevel Quartus IP file (.qip) for an IP variation in Intel^{®} Quartus^{®} Prime Standard Edition projects. These files represent the IP variation in the project, and store parameterization information.
Generating IP Cores ( Intel Quartus Prime Pro Edition)
Follow these steps to locate, instantiate, and customize an IP core in the parameter editor:
 Create or open an Intel^{®} Quartus^{®} Prime project (.qpf) to contain the instantiated IP variation.
 In the IP Catalog (Tools > IP Catalog), locate and doubleclick the name of the IP core to customize. To locate a specific component, type some or all of the component’s name in the IP Catalog search box. The New IP Variation window appears.

Specify a toplevel name for your custom IP variation. Do not
include spaces in IP variation names or paths. The parameter editor saves the IP
variation settings in a file named
<your_ip>
.ip. Click OK. The parameter editor appears.
Figure 4. IP Parameter Editor ( Intel^{®} Quartus^{®} Prime Pro Edition)

Set the parameter values in the parameter editor and view the
block diagram for the component. The Parameterization Messages tab at the bottom displays any errors
in IP parameters:
 Optionally, select preset parameter values if provided for your IP core. Presets specify initial parameter values for specific applications.
 Specify parameters defining the IP core functionality, port configurations, and devicespecific features.
 Specify options for processing the IP core files in other EDA tools.
Note: Refer to your IP core user guide for information about specific IP core parameters.  Click Generate HDL. The Generation dialog box appears.
 Specify output file generation options, and then click Generate. The synthesis and simulation files generate according to your specifications.
 To generate a simulation testbench, click Generate > Generate Testbench System. Specify testbench generation options, and then click Generate.
 To generate an HDL instantiation template that you can copy and paste into your text editor, click Generate > Show Instantiation Template.
 Click Finish. Click Yes if prompted to add files representing the IP variation to your project.

After generating
and instantiating your IP variation, make appropriate pin assignments to
connect ports.
Note: Some IP cores generate different HDL implementations according to the IP core parameters. The underlying RTL of these IP cores contains a unique hash code that prevents module name collisions between different variations of the IP core. This unique code remains consistent, given the same IP settings and software version during IP generation. This unique code can change if you edit the IP core's parameters or upgrade the IP core version. To avoid dependency on these unique codes in your simulation environment, refer to Generating a Combined Simulator Setup Script.
IP Core Generation Output ( Intel Quartus Prime Pro Edition)
File Name  Description 

<your_ip>.ip  Toplevel IP variation file that contains the parameterization of an IP core in your project. If the IP variation is part of a Platform Designer system, the parameter editor also generates a .qsys file. 
<your_ip>.cmp  The VHDL Component Declaration (.cmp) file is a text file that contains local generic and port definitions that you use in VHDL design files. 
<your_ip>_generation.rpt  IP or Platform Designer generation log file. Displays a summary of the messages during IP generation. 
<your_ip>.qgsimc (Platform Designer systems only)  Simulation caching file that compares the .qsys and .ip files with the current parameterization of the Platform Designer system and IP core. This comparison determines if Platform Designer can skip regeneration of the HDL. 
<your_ip>.qgsynth (Platform Designer systems only)  Synthesis caching file that compares the .qsys and .ip files with the current parameterization of the Platform Designer system and IP core. This comparison determines if Platform Designer can skip regeneration of the HDL. 
<your_ip>.qip  Contains all information to integrate and compile the IP component. 
<your_ip>.csv  Contains information about the upgrade status of the IP component. 
<your_ip>.bsf  A symbol representation of the IP variation for use in Block Diagram Files (.bdf). 
<your_ip>.spd  Input file that ipmakesimscript requires to generate simulation scripts. The .spd file contains a list of files you generate for simulation, along with information about memories that you initialize. 
<your_ip>.ppf  The Pin Planner File (.ppf) stores the port and node assignments for IP components you create for use with the Pin Planner. 
<your_ip>_bb.v  Use the Verilog blackbox (_bb.v) file as an empty module declaration for use as a blackbox. 
<your_ip>_inst.v or _inst.vhd  HDL example instantiation template. Copy and paste the contents of this file into your HDL file to instantiate the IP variation. 
<your_ip>.regmap  If the IP contains register information, the Intel^{®} Quartus^{®} Prime software generates the .regmap file. The .regmap file describes the register map information of master and slave interfaces. This file complements the .sopcinfo file by providing more detailed register information about the system. This file enables register display views and user customizable statistics in System Console. 
<your_ip>.svd 
Allows HPS System Debug tools to view the register maps of peripherals that connect to HPS within a Platform Designer system. During synthesis, the Intel^{®} Quartus^{®} Prime software stores the .svd files for slave interface visible to the System Console masters in the .sof file in the debug session. System Console reads this section, which Platform Designer queries for register map information. For system slaves, Platform Designer accesses the registers by name. 
<your_ip>.v <your_ip>.vhd 
HDL files that instantiate each submodule or child IP core for synthesis or simulation. 
mentor/  Contains a msim_setup.tcl script to set up and run a simulation. 
aldec/  Contains a script rivierapro_setup.tcl to setup and run a simulation. 
/synopsys/vcs /synopsys/vcsmx 
Contains a shell script vcs_setup.sh to set up and run a simulation. Contains a shell script vcsmx_setup.sh and synopsys_sim.setup file to set up and run a simulation. 
/cadence  Contains a shell script ncsim_setup.sh and other setup files to set up and run an simulation. 
/xcelium  Contains an Parallel simulator shell script xcelium_setup.sh and other setup files to set up and run a simulation. 
/submodules  Contains HDL files for the IP core submodule. 
<IP submodule>/  Platform Designer generates /synth and /sim subdirectories for each IP submodule directory that Platform Designer generates. 
Generating IP Cores ( Intel Quartus Prime Standard Edition)
 In the IP Catalog (Tools > IP Catalog), locate and doubleclick the name of the IP core to customize. The parameter editor appears.
 Specify a toplevel name and output HDL file type for your IP variation. This name identifies the IP core variation files in your project. Click OK. Do not include spaces in IP variation names or paths.
 Specify the parameters and options for your IP variation in the parameter editor. Refer to your IP core user guide for information about specific IP core parameters.

Click
Finish or
Generate (depending on the parameter editor
version). The parameter editor generates the files for your IP variation
according to your specifications. Click
Exit if prompted when generation is complete.
The parameter editor adds the toplevel
.qip file to the current project
automatically.
Note: For devices released prior to Intel^{®} Arria^{®} 10 devices, the generated .qip and .sip files must be added to your project to represent IP and Platform Designer systems. To manually add an IP variation generated with legacy parameter editor to a project, click Project > Add/Remove Files in Project and add the IP variation .qip file.
IP Core Generation Output ( Intel Quartus Prime Standard Edition)
Simulating Intel FPGA IP Cores
The Intel^{®} Quartus^{®} Prime software provides integration with many simulators and supports multiple simulation flows, including your own scripted and custom simulation flows. Whichever flow you choose, IP core simulation involves the following steps:
 Generate simulation model, testbench (or example design), and simulator setup script files.
 Set up your simulator environment and any simulation scripts.
 Compile simulation model libraries.
 Run your simulator.
Simulating the FixedTransform FFT IP Core in the MATLAB Software
The model takes a complex vector as input and it outputs the transformdomain complex vector and corresponding block exponent values. The length and direction of the transform (FFT/IFFT) are also passed as inputs to the model. If the input vector length is an integral multiple of N, the transform length, the length of the output vector(s) is equal to the length of the input vector. However, if the input vector is not an integral multiple of N, it is zeropadded to extend the length to be so. The wizard also creates the MATLAB testbench file <variation name>_tb.m. This file creates the stimuli for the MATLAB model by reading the input complex random data from generated files. If you selected Floating point data representation, the IP core generates the input data in hexadecimal format.
 Run the MATLAB software.

Simulate the desgn:

Type help <variation name>_model
at the command prompt to view the input and output vectors that are
required to run the MATLAB model as a standalone Mfunction. Create your
input vector and make a function call to <variation name>_model.
For example:
N=2048; INVERSE = 0; % 0 => FFT 1=> IFFT x = (2^12)*rand(1,N) + j*(2^12)*rand(1,N); [y,e] = <variation name>_model(x,N,INVERSE);
 Alternatively, run the provided testbench by typing the name of the testbench, <variation name>_tb at the command prompt.

Type help <variation name>_model
at the command prompt to view the input and output vectors that are
required to run the MATLAB model as a standalone Mfunction. Create your
input vector and make a function call to <variation name>_model.
For example:
Simulating the Variable Streaming FFT IP Core in the MATLAB Software
The model takes a complex vector as input and it outputs the transformdomain complex vector. The lengths and direction of the transforms (FFT/IFFT) (specified as one entry per block) are also passed as an input to the model. You must ensure that the length of the input vector is at least as large as the sum of the transform sizes for the model to function correctly. The wizard also creates the MATLAB testbench file <variation name>_tb.m. This file creates the stimuli for the MATLAB model by reading the input complex random data from the generated files.
 Run the MATLAB software.
 In the MATLAB command window, change to the working directory for your project.

Simulate the design:

Type help
<variation name>_model
at the command prompt to view the input and output vectors that are
required to run the MATLAB model as a standalone Mfunction. Create your
input vector and make a function call to
<variation
name>_model. For example:
nps=[256,2048]; inverse = [0,1]; % 0 => FFT 1=> IFFT x = (2^12)*rand(1,sum(nps)) + j*(2^12)*rand(1,sum(nps)); [y] = <variation name>_model(x,nps,inverse);

Alternaitvely, run the provided testbench by typing the name of the
testbench,
<variation name>_tb at
the command prompt.
Note: If you select digitreversed output order, you can reorder the data with the following MATLAB code:
y = y(digit_reverse(0:(FFTSIZE1), log2(FFTSIZE)) + 1);
where digit_reverse is:
function y = digit_reverse(x, n_bits) if mod(n_bits,2) z = dec2bin(x, n_bits); for i=1:2:n_bits1 p(:,i) = z(:,n_bitsi); p(:,i+1) = z(:,n_bitsi+1); end p(:,n_bits) = z(:,1); y=bin2dec(p); else y=digitrevorder(x,4); end

Type help
<variation name>_model
at the command prompt to view the input and output vectors that are
required to run the MATLAB model as a standalone Mfunction. Create your
input vector and make a function call to
<variation
name>_model. For example:
DSP Builder for Intel FPGAs Design Flow
This IP core supports DSP Builder for Intel^{®} FPGAs. Use the DSP Builder for Intel^{®} FPGAs flow if you want to create a DSP Builder for Intel^{®} FPGAs model that includes an IP core variation; use IP Catalog if you want to create an IP core variation that you can instantiate manually in your design.
FFT IP Core Functional Description
Fixed Transform FFTs
To maintain a high signaltonoise ratio throughout the transform computation, the fixed transform FFTs use a blockfloatingpoint architecture, which is a tradeoff point between fixedpoint and fullfloatingpoint architectures.
Variable Streaming FFTs
If you select the fixedpoint data representation, the FFT variation uses a radix 2^{2} single delay feedback, which is fully pipelined. If you select the floating point representation, the FFT variation uses a mixed radix4/2. For a length N transform, log_{4}(N) stages are concatenated together. The radix 2^{2} algorithm has the same multiplicative complexity of a fully pipelined radix4 FFT, but the butterfly unit retains a radix2 FFT. The radix4/2 algorithm combines radix4 and radix2 FFTs to achieve the computational advantage of the radix4 algorithm while supporting FFT computation with a wider range of transform lengths. The butterfly units use the DIF decomposition.
Fixed point representation allows for natural word growth through the pipeline. The maximum growth of each stage is 2 bits. After the complex multiplication the data is rounded down to the expanded data size using convergent rounding. The overall bit growth is less than or equal to log_{2}(N)+1.
The floating point internal data representation is singleprecision floatingpoint (32bit, IEEE 754 representation). Floatingpoint operations provide more precise computation results but are costly in hardware resources. To reduce the amount of logic required for floating point operations, the variable streaming FFT uses fused floating point kernels. The reduction in logic occurs by fusing together several floating point operations and reducing the number of normalizations that need to occur.
FixedPoint Variable Streaming FFTs
Log_{2}(N) stages each containing a single butterfly unit and a feedback delay unit that delays the incoming data by a specified number of cycles, halved at every stage. These delays effectively align the correct samples at the input of the butterfly unit for the butterfly calculations. Every second stage contains a modified radix2 butterfly whereby a trivial multiplication by j is performed before the radix2 butterfly operations.
The following scheduled operations occur in the pipeline for an FFT of length N = 16.
 For the first 8 clock cycles, the samples are fed unmodified through the butterfly unit to the delay feedback unit.
 The next 8 clock cycles perform the butterfly calculation using the data from the delay feedback unit and the incoming data. The higher order calculations are sent through to the delay feedback unit while the lower order calculations are sent to the next stage.
 The next 8 clock cycles feed the higher order calculations stored in the delay feedback unit unmodified through the butterfly unit to the next stage.
Subsequent data stages use the same principles. However, the delays in the feedback path are adjusted accordingly.
FloatingPoint Variable Streaming FFTs
The FFT has ceiling(log _{4} (N)) stages. If transform length is an integral power of four, a radix4 FFT implements all of the log _{4} (N) stages. If transform length is not an integral power of four, the FFT implements ceiling(log _{4} (N)) – 1 of the stages in a radix4, and implements the remaining stage using a radix2.
Each stage contains a single butterfly unit and a feedback delay unit. The feedback delay unit delays the incoming data by a specified number of cycles; in each stage the number of cycles of delay is one quarter of the number of cycles of delay in the previous stage. The delays align the butterfly input samples correctly for the butterfly calculations. The output of the pipeline is in indexreversed order.
FFT Processor Engines
QuadOutput FFT Engine
The FFT reads complex data samples x[k,m] from internal memory in parallel and reorders by switch (SW). Next, the radix4 butterfly processor processes the ordered samples to form the complex outputs G[k,m]. Because of the inherent mathematics of the radix4 DIF decomposition, only three complex multipliers perform the three nontrivial twiddlefactor multiplications on the outputs of the butterfly processor. To discern the maximum dynamic range of the samples, the blockfloating point units (BFPU) evaluate the four outputs in parallel. The FFT discards the appropriate LSBs and rounds and reorders the complex values before writing them back to internal memory.
SingleOutput FFT Engine
I/O Data Flow
Streaming FFT
The streaming FFT generates a design with a quad output FFT engine and the minimum number of parallel FFT engines for the required throughput.
A single FFT engine provides enough performance for up to a 1,024point streaming I/O data flow FFT.
Using the Streaming FFT
When the final sample loads, the source asserts sink_eop and sink_valid for the last data transfer.
Changing the Direction on a BlockbyBlock Basis
When the FFT completes the transform of the input block, it asserts source_valid and outputs the complex transform domain data block in natural order. The FFT function asserts source_sop to indicate the first output sample.
After N data transfers, the FFT asserts source_eop to indicate the end of the output data block
Enabling the Streaming FFT
 You must assert the sink_valid signal for the FFT to assert source_valid (and a valid data output).
 To extract the final frames of data from the FFT, you need to provide several frames where the sink_valid signal is asserted and apply the sink_sop and sink_eop signals in accordance with the AvalonST specification.
Variable Streaming
Changing Block Size
fftpts  Transform Size 

10000000000  1,024 
01000000000  512 
00100000000  256 
00010000000  128 
00001000000  64 
Changing Direction
When the FFT completes the transform of the input block, it asserts source_valid and outputs the complex transform domain data block. The FFT function asserts the source_sop to indicate the first output sample. The order of the output data depends on the output order that you select in IP Toolbench. The output of the FFT may be in natural order order.
I/O Order
Order  Description 

Natural order  The FFT requires the order of the input samples to be sequential (1, 2 …, n – 1, n) where n is the size of the current transform. 
Digit Reverse Order  The FFT requires the input samples to be in digitreversed order. 
–N/2 to N/2  The FFT requires the input samples to be in the order –N/2 to (N/2) – 1 (also known as DCcentered order) 
Similarly the output order specifies the order in which the FFT generates the output. Whether you can select Bit Reverse Order or Digit Reverse Order depends on your Data Representation (Fixed Point or Floating Point). If you select Fixed Point, the FFT variation implements the radix22 algorithm and the reverse I/O order option is Bit Reverse Order. If you select Floating Point, the FFT variation implements the mixed radix4/2 algorithm and the reverse I/O order option is Digit Reverse Order.
For sample digitreversed order, if n is a power of four, the order is radix4 digitreversed order, in which twobit digits in the sample number are units in the reverse ordering. For example, if n = 16, sample number 4 becomes the second sample in the sample stream (by reversal of the digits in 0001, the location in the sample stream, to 0100). However, in mixed radix4/2 algorithm, n need not be a power of four. If n is not a power of four, the twobit digits are grouped from the least significant bit, and the most significant bit becomes the least significant bit in the digitreversed order. For example, if n = 32, the sample number 18 (10010) in the natural ordering becomes sample number 17 (10001) in the digitreversed order.
Enabling the Variable Streaming FFT
 Assert sink_valid.
 Transfer valid data to the FFT. The FFT processes data.
FFT Behavior When sink_valid is Deasserted
Dynamically Changing the FFT Size
I/O Order
If the FFT operates in engineonly mode, the output data is available after approximately N + latency clocks cycles after the first sample was input to the FFT. Latency represents a small latency through the FFT core and depends on the transform size. For engine with bitreversal mode, the output is available after approximately 2N + latency cycles.
Buffered Burst
Enabling the Buffered Burst FFT
When the FFT completes the transform of the input block, it asserts the source_valid and outputs the complex transform domain data block in natural order .
Signals source_sop and source_eop indicate the startofpacket and endofpacket for the output block data respectively.
 Deassert the system reset.
 Asserts sink_valid to indicate to the FFT function that valid data is available for input. A successful data transfer occurs when both the sink_valid and the sink_ready are asserted.
 Load the first complex data sample into the FFT function and simultaneously asserts sink_sop to indicate the start of the input block.
 On the next clock cycle, sink_sop is deasserted and you must load the following N – 1 complex input data samples in natural order.
 On the last complex data sample, assert sink_eop.
 When you load the input block, the FFT function begins computing the transform on the stored input block. Hold the sink_ready signal high as you can transfer the first few samples of the subsequent frame into the small FIFO at the input. If this FIFO buffer is filled, the FFT deasserts the sink_ready signal. It is not mandatory to transfer samples during sink_ready cycles.
FFT Buffered Burst Data Flow Simulation Waveform
Burst
In a burst I/O data flow FFT, the FFT can process a single input block only. A small FIFO buffer at the sink of the block and sink_ready is not deasserted until this FIFO buffer is full. You can provide a small number of additional input samples associated with the subsequent input block. You don’t have to provide data to the FFT during sink_ready cycles. The burst FFT can load the rest of the subsequent FFT frame only when the previous transform is fully unloaded.
FFT IP Core Parameters
Parameter  Value  Description 

Transform Length  64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, or 65536. Variable streaming also allows 8, 16, 32, 131072, and 262144.  The transform length. For variable streaming, this value is the maximum FFT length. 
Transform Direction  Forward, reverse, bidirectional  The transform direction. 
I/O Data Flow  Streaming Variable Streaming Buffered Burst Burst  If you select Variable Streaming and Floating Point, the precision is automatically set to 32, and the reverse I/O order options are Digit Reverse Order. 
I/O Order  Bit Reverse Order, Digit Reverse Order, Natural Order, N/2 to N/2  The input and output order for data entering and leaving the FFT (variable streaming FFT only). The Digit Reverse Order option replaces the Bit Reverse Order in variable streaming floating point variations. 
Data Representation  Fixed point or single floating point, or block floating point  The internal data representation type (variable streaming FFT only), either fixed point with natural bitgrowth or single precision floating point. Floatingpoint bidirectional IP cores expect input in natural order for forward transforms and digit reverse order for reverse transforms. The output order is digit reverse order for forward transforms and natural order for reverse transforms. 
Data Width  8, 10, 12, 14, 16, 18, 20, 24, 28, 32  The data precision. The values 28 and 32 are available for variable streaming only. 
Twiddle Width  8, 10, 12, 14, 16, 18, 20, 24, 28, 32  The twiddle precision. The values 28 and 32 are available for variable streaming only. Twiddle factor precision must be less than or equal to data precision. 
The FFT IP core's advanced parameters.
Parameter  Value  Description 

FFT Engine Architecture  Quad Output, Single Output  Choose between one,
two, and four quadoutput FFT engines working in parallel. Alternatively, if
you have selected a singleoutput FFT engine architecture, you may choose to
implement one or two engines in parallel. Multiple parallel engines reduce
transform time at the expense of device resources, which allows you to
select the desired area and throughput tradeoff point. Not available for variable streaming or streaming FFTs. 
Number of Parallel FFT Engines  1, 2, 4  
DSP Block Resource Optimization  On or Off  Turn on for multiplier structure optimizations. These optimizations use different DSP block configurations to pack multiply operations and reduce DSP resource requirements. This optimization may reduce F_{MAX} because of the structure of the specific configurations of the DSP blocks when compared to the basic operation. Specifically, on Stratix V devices, this optimization may also come at the expense of accuracy. You can evaluate it using the MATLAB model provided and bit wise accurate simulation models. If you turn on DSP Block Resource Optimization and your variation has data precision between 18 and 25 bits, inclusive, and twiddle precision less than or equal to 18 bits, the FFT MegaCore function configures the DSP blocks in complex 18 x 25 multiplication mode. 
Enable Hard Floating Point Blocks  On or off  For Arria 10 devices and singlefloatingpoint FFTs only. 
FFT IP Core Interfaces and Signals
The FFT MegaCore function has a READY_LATENCY value of zero.
AvalonST Interfaces in DSP IP Cores
The input interface is an AvalonST sink and the output interface is an AvalonST source. The AvalonST interface supports packet transfers with packets interleaved across multiple channels.
AvalonST interface signals can describe traditional streaming interfaces supporting a single stream of data without knowledge of channels or packet boundaries. Such interfaces typically contain data, ready, and valid signals. AvalonST interfaces can also support more complex protocols for burst and packet transfers with packets interleaved across multiple channels. The AvalonST interface inherently synchronizes multichannel designs, which allows you to achieve efficient, timemultiplexed implementations without having to implement complex control logic.
AvalonST interfaces support backpressure, which is a flow control mechanism where a sink can signal to a source to stop sending data. The sink typically uses backpressure to stop the flow of data when its FIFO buffers are full or when it has congestion on its output.
FFT IP Core AvalonST Signals
Signal Name  Direction  AvalonST Type  Size  Description 

clk  Input  clk  1  Clock signal that clocks all internal FFT engine components. 
reset_n  Input  reset_n  1  Activelow asynchronous reset signal.This signal can be asserted asynchronously, but must remain asserted at least one clk clock cycle and must be deasserted synchronously with clk. 
sink_eop  Input  endofpacket  1  Indicates the end of the incoming FFT frame. 
sink_error  Input  error  2  Indicates an error has occurred in an upstream module, because of an illegal usage of the AvalonST protocol. The following errors are defined:

sink_imag  Input  data  data precision width  Imaginary input data, which represents a signed number of data precision bits. 
sink_ready  Output  ready  1  Asserted by the FFT engine when it can accept data. It is not mandatory to provide data to the FFT during ready cycles. 
sink_real  Input  data  data precision width  Real input data, which represents a signed number of data precision bits. 
sink_sop  Input  startofpacket  1  Indicates the start of the incoming FFT frame. 
sink_valid  Input  valid  1  Asserted when data on the data bus is valid. When sink_valid and sink_ready are asserted, a data transfer takes place.. 
sink_data  Input  data  Variable 
In Qsys systems, this AvalonSTcompliant data bus includes all the AvalonST input data signals from MSB to LSB:

source_eop  Output  endofpacket  1  Marks the end of the outgoing FFT frame. Only valid when source_valid is asserted. 
source_error  Output  error  2  Indicates an error has occurred either in an upstream module or within the FFT module (logical OR of sink_error with errors generated in the FFT). 
source_exp  Output  data  6  Streaming, burst, and buffered burst FFTs only. Signed block exponent: Accounts for scaling of internal signal values during FFT computation. 
source_imag  Output  data  (data precision width + growth)  Imaginary output data. For burst, buffered burst, streaming, and variable streaming floating point FFTs, the output data width is equal to the input data width. For variable streaming fixed point FFTs, the size of the output data is dependent on the number of stages defined for the FFT and is 2 bits per radix 2^{2} stage. 
source_ready  Input  ready  1  Asserted by the downstream module if it is able to accept data. 
source_real  Output  data  (data precision width + growth)  Real output data. For burst, buffered burst, streaming, and variable streaming floating point FFTs, the output data width is equal to the input data width. For variable streaming fixed point FFTs, the size of the output data is dependent on the number of stages defined for the FFT and is 2 bits per radix 2^{2} stage. Variable streaming fixed point FFT only. Growth is log_{2}(N)+1. 
source_sop  Output  startofpacket  1  Marks the start of the outgoing FFT frame. Only valid when source_valid is asserted. 
source_valid  Output  valid  1  Asserted by the FFT when there is valid data to output. 
source_data  Output  data  Variable 
In Qsys systems, this AvalonSTcompliant data bus includes all the AvalonST output data signals from MSB to LSB:

Component Specific Signals
Signal Name  Direction  Size  Description 

fftpts_in  Input  log_{2}(maximum number of points)  The number of points in this FFT frame. If you do not specify this value, the FFT can not be a variable length. The default behavior is for the FFT to have fixed length of maximum points. Only sampled at SOP. Always drive fftpts_in even if you are not dynamically changing the block size. For a fixed implementation drive it to match the transform length in the parameter editor. 
fftpts_out  Output  log_{2}(maximum number of points)  The number of points in this FFT frame synchronized to the AvalonST source interface. Variable streaming only. 
inverse  Input  1  Inverse FFT calculated if asserted. Only sampled at SOP. 
Incorrect usage of the AvalonST interface protocol on the sink interface results in a error on source_error. Table 3–8 defines the behavior of the FFT when an incorrect AvalonST transfer is detected. If an error occurs, the behavior of the FFT is undefined and you must reset the FFT with reset_n.
Error  source_error  Description 

Missing SOP  01  Asserted when valid goes high, but there is no start of frame. 
Missing EOP  10  Asserted if the FFT accepts N valid samples of an FFT frame, but there is no EOP signal. 
Unexpected EOP  11  Asserted if EOP is asserted before N valid samples are accepted. 
Block Floating Point Scaling
In fixedpoint FFTs, the data precision needs to be large enough to adequately represent all intermediate values throughout the transform computation. For large FFT transform sizes, an FFT fixedpoint implementation that allows for word growth can make either the data width excessive or can lead to a loss of precision.
Floatingpoint FFTs represents each number as a mantissa with an individual exponent. The improved precision is offset by demand for increased device resources.
In a blockfloating point FFT, all of the values have an independent mantissa but share a common exponent in each data block. Data is input to the FFT function as fixed point complex numbers (even though the exponent is effectively 0, you do not enter an exponent).
The blockfloating point FFT ensures full use of the data width within the FFT function and throughout the transform. After every pass through a radix4 FFT, the data width may grow up to log_{2} (42) = 2.5 bits. The data scales according to a measure of the block dynamic range on the output of the previous pass. The FFT accumulates the number of shifts and then outputs them as an exponent for the entire block. This shifting ensures that the minimum of least significant bits (LSBs) are discarded prior to the rounding of the postmultiplication output. In effect, the blockfloating point representation is as a digital automatic gain control. To yield uniform scaling across successive output blocks, you must scale the FFT function output by the final exponent.
In comparing the blockfloating point output of the FFT MegaCore function to the output of a full precision FFT from a tool like MATLAB, you must scale the output by 2 (–exponent_out) to account for the discarded LSBs during the transform.
Unlike an FFT block that uses floating point arithmetic, a blockfloatingpoint FFT block does not provide an input for exponents. Internally, the IP core represents a complex value integer pair with a single scale factor that it typically shares among other complex value integer pairs. After each stage of the FFT, the IP core detects the largest output value and scales the intermediate result to improve the precision. The exponent records the number of left or right shifts to perform the scaling. As a result, the output magnitude relative to the input level is:
output*2exponent
For example, if exponent = –3, the input samples are shifted right by three bits, and hence the magnitude of the output is output*23.
After every pass through a radix2 or radix4 engine in the FFT core, the addition and multiplication operations cause the data bits width to grow. In other words, the total data bits width from the FFT operation grows proportionally to the number of passes. The number of passes of the FFT/IFFT computation depends on the logarithm of the number of points.
A fixedpoint FFT needs a huge multiplier and memory block to accommodate the large bit width growth to represent the high dynamic range. Though floatingpoint is powerful in arithmetic operations, its power comes at the cost of higher design complexity such as a floatingpoint multiplier and a floatingpoint adder. BFP arithmetic combines the advantages of floatingpoint and fixedpoint arithmetic. BFP arithmetic offers a better signaltonoise ratio (SNR) and dynamic range than does floatingpoint and fixedpoint arithmetic with the same number of bits in the hardware implementation.
In a blockfloatingpoint FFT, the radix2 or radix4 computation of each pass shares the same hardware, the IP core reads the data from memory, passes ti through the engine, and writtes back to memory. Before entering the next pass, the IP core shifts each data sample right (an operation called "scaling") if the addition and multiplication operations produce a carryout bit. The IP core bases the number of bits that it shifts on the difference in bit growth between the data sample and the maximum data sample it detect in the previous stage. The IP core records the maximum bit growth in the exponent register. Each data sample now shares the same exponent value and data bit width to go to the next core engine. You can reuse the same core engine without incurring the expense of a larger engine to accommodate the bit growth.
The output SNR depends on how many bits of right shift occur and at what stages of the radix core computation they occur. Tthat the signaltonoise ratio is data dependent and you need to know the input signal to compute the SNR.
Possible Exponent Values
P = ceil{log_{4}N}, where N is the transform length
R = 0 if log_{2}N is even, otherwise R = 1
Single output range = (–3P+R, P+R–4)
Quad output range = (–3P+R+1, P+R–7)
These equations translate to the values in Table A–1.
N  P  Single Output Engine  Quad Output Engine  

Max ^{ }(2)  Min ^{ }(2)  Max ^{ }(2)  Min ^{ }(2)  
64  3  –9  –1  –8  –4 
128  4  –11  1  –10  –2 
256  4  –12  0  –11  –3 
512  5  –14  2  –13  –1 
1,024  5  –15  1  –14  –2 
2,048  6  –17  3  –16  0 
4,096  6  –18  2  –17  –1 
8,192  7  –20  4  –19  1 
16,384  7  –21  3  –20  0 
Note to
Table A–1
:

Implementing Scaling
 Determine the length of the resulting full scale dynamic range storage register. To get the length, add the width of the data to the number of times the data is shifted. For example, for a 16bit data, 256point Quad Output FFT/IFFT with Max = –11 and Min = –3. The Max value indicates 11 shifts to the left, so the resulting full scaled data width is 16 + 11, or 27 bits.
 Map the output data to the appropriate location within the expanded dynamic range register based upon the exponent value. To continue the above example, the 16bit output data [15..0] from the FFT/IFFT is mapped to [26..11] for an exponent of –11, to [25..10] for an exponent of –10, to [24..9] for an exponent of –9, and so on.
 Sign extend the data within the full scale register.
Example of Scaling
case (exp) 6'b110101 : //11 Set data equal to MSBs begin full_range_real_out[26:0] <= {real_in[15:0],11'b0}; full_range_imag_out[26:0] <= {imag_in[15:0],11'b0}; end 6'b110110 : //10 Equals left shift by 10 with sign extension begin full_range_real_out[26] <= {real_in[15]}; full_range_real_out[25:0] <= {real_in[15:0],10'b0}; full_range_imag_out[26] <= {imag_in[15]}; full_range_imag_out[25:0] <= {imag_in[15:0],10'b0}; end 6'b110111 : //9 Equals left shift by 9 with sign extension begin full_range_real_out[26:25] <= {real_in[15],real_in[15]}; full_range_real_out[24:0] <= {real_in[15:0],9'b0}; full_range_imag_out[26:25] <= {imag_in[15],imag_in[15]}; full_range_imag_out[24:0] <= {imag_in[15:0],9'b0}; end . . . endcase
In this example, the output provides a full scale 27bit word. You must choose how many and which bits must be carried forward in the processing chain. The choice of bits determines the absolute gain relative to the input sample level.
Figure A–1 on page A–5 demonstrates the effect of scaling for all possible values for the 256point quad output FFT with an input signal level of 0x5000. The output of the FFT is 0x280 when the exponent = –5. The figure illustrates all cases of valid exponent values of scaling to the full scale storage register [26..0]. Because the exponent is –5, you must check the register values for that column. This data is shown in the last two columns in the figure. Note that the last column represents the gain compensated data after the scaling (0x0005000), which agrees with the input data as expected. If you want to keep 16 bits for subsequent processing, you can choose the bottom 16 bits that result in 0x5000. However, if you choose a different bit range, such as the top 16 bits, the result is 0x000A. Therefore, the choice of bits affects the relative gain through the processing chain.
Because this example has 27 bits of full scale resolution and 16 bits of output resolution, choose the bottom 16 bits to maintain unity gain relative to the input signal. Choosing the LSBs is not the only solution or the correct one for all cases. The choice depends on which signal levels are important. One way to empirically select the proper range is by simulating test cases that implement expected system data. The output of the simulations must tell what range of bits to use as the output register. If the full scale data is not used (or just the MSBs), you must saturate the data to avoid wraparound problems.
Unity Gain in an IFFT+FFT Pair
BFP arithmetic does not provide an input for the exponent, so you must keep track of the exponent from the IFFT block if you are feeding the output to the FFT block immediately thereafter and divide by N at the end to acquire the original signal magnitude.
where:
x0 = Input data to IFFT
X0 = Output data from IFFT
N = number of points
data1 = IFFT output data and FFT input data
data2 = FFT output data
exp1 = IFFT output exponent
exp2 = FFT output exponent
IFFTa = IFFT
FFTa = FFT
Any scaling operation on X0 followed by truncation loses the value of exp1 and does not result in unity gain at x0. Any scaling operation must be done on X0 only when it is the final result. If the intermediate result X0 is first padded with exp1 number of zeros and then truncated or if the data bits of X0 are truncated, the scaling information is lost.
To keep unity gain, you can pas the exp1 value to the output of the FFT block. Alternatively, preserve the full precision of data1×2^{–}exp1 and use this value as input to the FFT block. The second method requires a large size for the FFT to accept the input with growing bit width from IFFT operations. The resolution required to accommodate this bit width in most cases, exceeds the maximum data width supported by the IP core.
Document Revision History
Date  Version  Changes 

2018.05.31  17.1 

2017.11.06  17.1 

2017.01.14  16.1.1 

2016.11.11  16.1  Added note about fftpts signal. 
2016.08.01  16.1S10  Added support for Stratix 10 devices 
2016.05.01  16.0  Added MATLAB simulation flow. 
2015.10.01  15.1  Added more info to sink_data and source_data signals 
2014.12.15  14.1 

August 2014  14.0 Arria 10 Edition 

June 2014  14.0 

November 2013  13.1 

November 2012  12.1  Added support for Arria V GZ devices. 
FFT IP Core User Guide Document Archive
IP Core Version  User Guide 

16.1  FFT IP Core User Guide 
15.1  FFT IP Core User Guide 
15.0  FFT IP Core User Guide 
14.1  FFT IP Core User Guide 