Intel Stratix 10 Variable Precision DSP Blocks User Guide
Intel Stratix 10 Variable Precision DSP Blocks Overview
The variable-precision digital signal processing (DSP) blocks in Intel® Stratix® 10 devices support fixed-point arithmetic and single-precision floating-point arithmetic. The Intel® Stratix® 10 DSP blocks provide high design flexibility and are optimized to support high-performance DSP applications.
Features
The Intel® Stratix® 10 fixed-point arithmetic features include:
 High-performance, power-optimized, and fully registered multiplication operations
 18-bit and 27-bit word lengths
 Two 18 x 19 multipliers or one 27 x 27 multiplier per DSP block
 Built-in addition, subtraction, and 64-bit double accumulation register to combine multiplication results
 Cascading 19-bit or 27-bit, and cascading 18-bit when the pre-adder is used, to form the tap-delay line for filtering applications
 Cascading 64-bit output bus to propagate output results from one block to the next block without external logic support
 Hard pre-adder supported in 18-bit and 27-bit DSP operation modes for symmetric filters
 Internal coefficient register bank in both 18-bit and 27-bit modes for filter implementation
 18-bit and 27-bit systolic finite impulse response (FIR) filters with distributed output adder
 Biased rounding support
The Intel® Stratix® 10 floating-point arithmetic features include:
 Multiplication, addition, subtraction, multiply-add, and multiply-subtract
 Multiplication with accumulation capability and a dynamic accumulator reset control
 Multiplication with cascade summation and subtraction capability
 Complex multiplication
 Direct vector dot product
 Systolic vector dot product
 Sequential vector dot product
 Exception handling support using exception flags
Supported Operational Modes in Intel Stratix 10 Devices
Variable-Precision DSP Block Resource  Operation Mode  Supported Operation Instance  Pre-Adder Support  Coefficient Support  Input Cascade Support  Chainin Support  Chainout Support

1 variable precision DSP block  Fixed-point independent 18 x 19 multiplication  2 ^{1}  Yes  Yes  Yes ^{2}  No  No
1 variable precision DSP block  Fixed-point independent 27 x 27 multiplication  1  Yes  Yes  Yes ^{3}  Yes  Yes
1 variable precision DSP block  Fixed-point two 18 x 19 multiplier adder mode  1  Yes  Yes  Yes ^{2}  Yes  Yes
1 variable precision DSP block  Fixed-point 18 x 18 multiplier adder summed with 36-bit input  1  No  No  No  Yes  Yes
1 variable precision DSP block  Fixed-point 18 x 19 systolic mode  1  Yes  Yes  Yes ^{2}  Yes  Yes
1 variable precision DSP block  Floating-point multiplication mode  1  No  No  No  No  Yes
1 variable precision DSP block  Floating-point adder or subtract mode  1  No  No  No  No  Yes
1 variable precision DSP block  Floating-point multiplier adder or subtract mode  1  No  No  No  Yes  Yes
1 variable precision DSP block  Floating-point multiplier accumulate mode  1  No  No  No  No  Yes
1 variable precision DSP block  Floating-point vector one mode  1  No  No  No  Yes  Yes
1 variable precision DSP block  Floating-point vector two mode  1  No  No  No  Yes  Yes
2 variable precision DSP blocks  Fixed-point complex 18 x 19 multiplication  1  No  No  No  No  No
4 variable precision DSP blocks  Floating-point complex multiplication  1  No  No  No  No  No
Variable-Precision DSP Block Resource  Operation Mode  Dynamic ACCUMULATE  Dynamic LOADCONST  Dynamic SUB  Dynamic NEGATE

1 variable precision DSP block  Fixed-point independent 18 x 19 multiplication  No  No  No  No
1 variable precision DSP block  Fixed-point independent 27 x 27 multiplication  Yes  Yes  No  Yes
1 variable precision DSP block  Fixed-point two 18 x 19 multiplier adder mode  Yes  Yes  Yes  Yes
1 variable precision DSP block  Fixed-point 18 x 18 multiplier adder summed with 36-bit input  Yes  Yes  Yes  Yes
1 variable precision DSP block  Fixed-point 18 x 19 systolic mode  Yes  Yes  Yes  Yes
1 variable precision DSP block  Floating-point multiplication mode  No  No  No  No
1 variable precision DSP block  Floating-point adder or subtract mode  No  No  No  No
1 variable precision DSP block  Floating-point multiplier adder or subtract mode  No  No  No  No
1 variable precision DSP block  Floating-point multiplier accumulate mode  Yes  No  No  No
1 variable precision DSP block  Floating-point vector one mode  No  No  No  No
1 variable precision DSP block  Floating-point vector two mode  No  No  No  No
2 variable precision DSP blocks  Fixed-point complex 18 x 19 multiplication  No  No  No  No
4 variable precision DSP blocks  Floating-point complex multiplication  No  No  No  No
Resources
Product Line  Number of Variable-Precision DSP Blocks  Independent 18 x 19 Multipliers  Independent 27 x 27 Multipliers  Single-Precision Floating-Point Multipliers  Single-Precision Floating-Point Adders  18 x 19 Multiplier Adder Sum Mode  18 x 18 Multiplier Adder Summed with 36-Bit Input

GX 400/ SX 400  648  1,296  648  648  648  648  648 
GX 650/ SX 650  1,152  2,304  1,152  1,152  1,152  1,152  1,152 
GX 850/ SX 850  2,016  4,032  2,016  2,016  2,016  2,016  2,016 
GX 1100/ SX 1100  2,592  5,184  2,592  2,592  2,592  2,592  2,592 
GX 1650/ SX 1650  3,145  6,290  3,145  3,145  3,145  3,145  3,145 
GX 2100/ SX 2100  3,744  7,488  3,744  3,744  3,744  3,744  3,744 
GX 2500/ SX 2500  5,011  10,022  5,011  5,011  5,011  5,011  5,011 
GX 2800/ SX 2800  5,760  11,520  5,760  5,760  5,760  5,760  5,760 
GX 4500/ SX 4500  1,980  3,960  1,980  1,980  1,980  1,980  1,980 
GX 5500/ SX 5500  1,980  3,960  1,980  1,980  1,980  1,980  1,980 
TX 1650  3,326  6,652  3,326  3,326  3,326  3,326  3,326 
TX 2100  3,960  7,920  3,960  3,960  3,960  3,960  3,960 
TX 2500  5,011  10,022  5,011  5,011  5,011  5,011  5,011 
TX 2800  5,760  11,520  5,760  5,760  5,760  5,760  5,760 
MX 1100  2,592  5,184  2,592  2,592  2,592  2,592  2,592 
MX 1650  3,326  6,652  3,326  3,326  3,326  3,326  3,326 
MX 2100  3,960  7,920  3,960  3,960  3,960  3,960  3,960 
Block Architecture Overview
DSP Implementations  Block Architecture

Fixed-Point Arithmetic  (figure not available)
Floating-Point Arithmetic  (figure not available)
Input Register Bank for Fixed-Point and Floating-Point Arithmetic
Fixed-Point Arithmetic  Floating-Point Arithmetic



All the registers in the DSP blocks are positive-edge triggered and cleared on power up. Each multiplier operand can feed an input register, or feed a multiplier directly and bypass the input registers.
The following variable precision DSP block signals control the input registers:
 CLK[2..0]
 ENA[2..0]
 CLR[0]
Pipeline Registers for Fixed-Point and Floating-Point Arithmetic
In addition to the input and output registers, there are two columns of pipeline registers for fixed-point arithmetic. Use the pipeline registers to reach the maximum Fmax; you can bypass them if high Fmax is not needed.
The following variable precision DSP block signals control the pipeline registers:
 CLK[2..0]
 ENA[2..0]
 CLR[1]
Floating-point arithmetic has three latency layers of pipeline registers. You can bypass all latency layers or use any one, two, or three of them.
Pre-Adder for Fixed-Point Arithmetic
Each variable precision DSP block has two 19-bit pre-adders. You can configure these pre-adders as follows:
 18-bit (signed or unsigned) addition or 18-bit (signed) subtraction for 18 x 19 mode
 26-bit addition or subtraction for 27 x 27 mode
For 18 x 19 mode, when both pre-adders within the same DSP block are used, they must share the same operation type (either addition or subtraction).
Internal Coefficient for Fixed-Point Arithmetic
The Intel® Stratix® 10 variable precision DSP block can take the multiplicand from either the dynamic input or the internal coefficient.
The internal coefficient supports up to eight constant coefficients for the multiplicands in 18-bit and 27-bit modes. When you enable the internal coefficient feature, COEFSELA and COEFSELB control the selection of the coefficient multiplexer.
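As a rough behavioral sketch of how the pre-adder and the internal coefficient bank can combine into one symmetric-filter tap, the following Python model selects one of up to eight stored coefficients and multiplies it by the pre-adder output. The function name `preadder_coef_tap` and its argument names are hypothetical; the model ignores bit widths, registers, and timing.

```python
def preadder_coef_tap(ay, az, coefsel, coefficients, subtract=False):
    """Behavioral model of one tap: coefficient[coefsel] * (ay +/- az).

    `coefficients` stands in for the internal coefficient bank
    (at most eight entries); `coefsel` models COEFSELA or COEFSELB.
    """
    if len(coefficients) > 8:
        raise ValueError("the coefficient bank holds at most eight values")
    if not 0 <= coefsel < len(coefficients):
        raise ValueError("coefsel does not address a stored coefficient")
    # Pre-adder: addition or subtraction of the two data inputs.
    pre = ay - az if subtract else ay + az
    # Multiplier: the multiplicand comes from the coefficient bank.
    return coefficients[coefsel] * pre


tap = preadder_coef_tap(3, 5, coefsel=2, coefficients=[1, 2, 4, 8])
```

A 3-bit select is enough to address eight coefficients, which matches the COEFSELA and COEFSELB width listed in the input data width table.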
Multipliers for Fixed-Point and Floating-Point Arithmetic
A single variable precision DSP block can perform many multiplications in parallel, depending on the data width of the multiplier and implementation.
There are two multipliers per variable precision DSP block. You can configure these two multipliers in several operational modes:
Fixed-Point Arithmetic  Floating-Point Arithmetic



Adder or Subtractor for Fixed-Point and Floating-Point Arithmetic
Depending on the operational mode, you can use the adder or subtractor as follows:
 One 38-bit adder for fixed-point addition and subtraction between the two multipliers within a DSP block.
 One single-precision floating-point adder or subtractor.
Operation  Description  SUB Signal

Addition  Adds the results of the two multipliers within one DSP block.  0
Subtraction  Subtracts the results of the two multipliers within the same DSP block.  1
The dynamic SUB port is not supported in floating-point arithmetic.
Accumulator, Chainout Adder, and Preload Constant for Fixed-Point Arithmetic
The Intel® Stratix® 10 variable precision DSP block supports an accumulator and adder of up to 64 bits for fixed-point arithmetic.
The following signals can dynamically control the function of the accumulator and the chainout adder:
 NEGATE
 LOADCONST
 ACCUMULATE
The accumulator and chainout adder features are not available in the fixed-point independent 18 x 19 multiplication mode, which uses two multipliers.
Function  Description  NEGATE  LOADCONST  ACCUMULATE

Zeroing  Disables the accumulator.  0  0  0
Preload  The result is always added to the preload value. Only one bit of the 64-bit preload value can be "1". You can use this function to round the DSP result to any position of the 64-bit result.  0  1  0
Accumulation  Adds the current result to the previous accumulate result.  0  X  1
Decimation + Accumulation  This function takes the current result, converts it into two's complement, and adds it to the previous result.  1  X  1
Decimation + Chainout Adder  This function takes the current result, converts it into two's complement, and adds it to the output of the previous DSP block.  1  0  0
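The control table above can be read as a small lookup. The sketch below is only an illustration of that reading; `decode_function` is a hypothetical helper, not part of any Intel tool, and it returns `None` for signal combinations the table does not list.

```python
from typing import Optional

# Rows of the table as (NEGATE, LOADCONST, ACCUMULATE) -> function name.
# None stands for the table's "X" (don't care).
_FUNCTION_TABLE = [
    ((0, 0, 0), "Zeroing"),
    ((0, 1, 0), "Preload"),
    ((0, None, 1), "Accumulation"),
    ((1, None, 1), "Decimation + Accumulation"),
    ((1, 0, 0), "Decimation + Chainout Adder"),
]


def decode_function(negate: int, loadconst: int, accumulate: int) -> Optional[str]:
    """Return the table row selected by the three dynamic control signals."""
    signals = (negate, loadconst, accumulate)
    for pattern, name in _FUNCTION_TABLE:
        if all(p is None or p == s for p, s in zip(pattern, signals)):
            return name
    return None  # combination not listed in the table


selected = decode_function(negate=0, loadconst=1, accumulate=1)
```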
Systolic Register for Fixed-Point Arithmetic
There are two sets of systolic registers per variable precision DSP block, and each set supports a chainin and chainout adder of up to 44 bits. If the variable precision DSP block is not configured in fixed-point systolic FIR mode, both sets of systolic registers are bypassed.
The first set of systolic registers consists of 18-bit and 19-bit registers that register the 18-bit and 19-bit inputs of the upper multiplier, respectively.
The second set of systolic registers delays the chainin input from the previous variable precision DSP block.
 The input and output registers must be enabled when using systolic registers.
 The first and second pipeline registers are optional when using systolic registers. If the second pipeline register is enabled, use the same clock as the input systolic register.
 The chainin systolic register always has the same clock source as the output register.
 Using the same clock source for all registers is recommended to ensure correct systolic operation.
Double Accumulation Register for Fixed-Point Arithmetic
The accumulator supports double accumulation by enabling the 64-bit double accumulation register located between the output register bank and the accumulator feedback path.
If the double accumulation register is enabled, an extra clock cycle of delay is added to the feedback path of the accumulator.
This register has the same CLK, ENA, and CLR settings as the output register bank.
By enabling this register, you can have two accumulator channels using the same number of variable precision DSP blocks. This is useful when processing interleaved complex data (I, Q).
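To see why one extra cycle of feedback delay yields two accumulator channels, consider this cycle-level sketch. It is a simplified model under stated assumptions: `out_reg` and `dbl_reg` are hypothetical names for the output register and the double accumulation register, one product arrives per clock, and widths and resets are ignored.

```python
def accumulate_interleaved(products, double_accumulation=True):
    """Return the accumulator output after each clock cycle."""
    out_reg = 0  # output register bank
    dbl_reg = 0  # double accumulation register in the feedback path
    outputs = []
    for p in products:
        # With double accumulation, feedback is the output from two cycles ago.
        feedback = dbl_reg if double_accumulation else out_reg
        new_out = p + feedback
        dbl_reg = out_reg  # previous output moves into the extra register
        out_reg = new_out
        outputs.append(out_reg)
    return outputs


# Interleaved stream: I0, Q0, I1, Q1, I2, Q2
stream = [1, 10, 2, 20, 3, 30]
two_channels = accumulate_interleaved(stream)
one_channel = accumulate_interleaved(stream, double_accumulation=False)
```

With the extra register, even-numbered cycles accumulate only the I samples and odd-numbered cycles only the Q samples, so the two running sums never mix.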
Output Register Bank for Fixed-Point and Floating-Point Arithmetic
The 74-bit bypassable output register bank is positive-edge triggered and is cleared after power up.
The following variable precision DSP block signals control the output register per variable precision DSP block:
 CLK[2..0]
 ENA[2..0]
 CLR[1]
Exception Handling for Floating-Point Arithmetic
The Intel® Stratix® 10 floating-point arithmetic supports exception handling for the multiplier and adder blocks.
Exception Flags  Width  Description

Multiplication
mult_overflow  1  This signal indicates whether the multiplier result is larger than the maximum representable value. 1: The multiplier result is larger than the maximum representable value and the result is cast to infinity. 0: The multiplier result is not larger than the maximum representable value. This signal is not available in Adder or Subtract Mode.
mult_underflow  1  This signal indicates whether the multiplier result is smaller than the minimum representable value. 1: The multiplier result is smaller than the minimum representable value and the result is flushed to zero. 0: The multiplier result is larger than the minimum representable value. This signal is not available in Adder or Subtract Mode.
mult_inexact  1  This signal indicates if the multiplier result is an exact representation. 1: If the multiplier result is:
0: If the multiplier result does not meet any of the criteria above. This signal is not available in Adder or Subtract Mode.
mult_invalid  1  This signal indicates whether the multiplier operation is ill-defined and produces an invalid result. 1: The multiplier result is invalid and cast to qNaN. 0: The multiplier result is not an invalid number. This signal is not available in Adder or Subtract Mode.
Addition
adder_overflow  1  This signal indicates whether the adder result is larger than the maximum representable value. 1: The adder result is larger than the maximum representable value and the result is cast to infinity. 0: The adder result is not larger than the maximum representable value. This signal is not available in Multiplication Mode.
adder_underflow  1  This signal indicates whether the adder result is smaller than the minimum representable value. 1: The adder result is smaller than the minimum representable value and the result is flushed to zero. 0: The adder result is larger than the minimum representable value. This signal is not available in Multiplication Mode.
adder_inexact  1  This signal indicates if the adder result is an exact representation. 1: If the adder result is:
0: If the adder result does not meet any of the criteria above. This signal is not available in Multiplication Mode.
adder_invalid  1  This signal indicates whether the adder operation is ill-defined and produces an invalid result. 1: The adder result is invalid and cast to qNaN. 0: The adder result is not an invalid number. This signal is not available in Multiplication Mode.
Input A  Input B  Result ^{4}  Flags (Overflow/Underflow/Inexact/Invalid)

Normalized  Normalized  Normalized value  0/0/0/0 
Normalized (rounded) value  0/0/1/0  
Positive/negative infinity value  1/0/1/0  
Subnormal (denormal) value  0/1/1/0  
0 or Subnormal (denormal)  Normalized  0 value  0/0/0/0 
Positive/negative infinity  Normalized  Positive/negative infinity value  0/0/0/0 
Quiet Not A Number (qNaN)  Normalized  qNaN value  0/0/0/0 
0 or Subnormal (denormal)  0 or Subnormal (denormal)  0 value  0/0/0/0 
Positive/negative infinity  0 or Subnormal (denormal)  qNaN value  0/0/0/1 
Quiet Not A Number (qNaN)  0 or Subnormal (denormal)  qNaN value  0/0/0/0 
Positive/negative infinity  Positive/negative Infinity  Positive/negative infinity value  0/0/0/0 
Quiet Not A Number (qNaN)  Positive/negative Infinity  qNaN value  0/0/0/0 
Quiet Not A Number (qNaN)  Quiet Not A Number (qNaN)  qNaN value  0/0/0/0 
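The special-operand rows of the multiplication table above can be sketched as a classification step followed by a lookup. This is an illustrative model only: `classify` and `multiply_special` are hypothetical helpers, the table is assumed to be symmetric in Input A and Input B, and rows where both operands are normalized are left out because their result depends on the actual product.

```python
import struct


def classify(x: float) -> str:
    """Classify a value by its IEEE 754 single-precision encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "nan" if mantissa else "inf"
    if exponent == 0:
        return "zero_or_subnormal"  # zero and subnormal share a table row
    return "normalized"


def multiply_special(a: float, b: float):
    """Return (result class, invalid flag), or None if both are normalized."""
    kinds = {classify(a), classify(b)}
    if "nan" in kinds:
        return ("qNaN", 0)
    if kinds == {"inf", "zero_or_subnormal"}:
        return ("qNaN", 1)  # infinity times zero is the invalid case
    if "inf" in kinds:
        return ("infinity", 0)
    if "zero_or_subnormal" in kinds:
        return ("zero", 0)
    return None


case = multiply_special(float("inf"), 0.0)
```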
Input A  Input B  Result ^{4}  Flags (Overflow/Underflow/Inexact/Invalid)

Normalized  Normalized  Normalized value  0/0/0/0
Normalized (rounded) value  0/0/1/0
Positive/negative infinity value  1/0/1/0
0 value (sign bit = 0)  0/0/0/0
Subnormal (denormal) value (the sign is preserved)  0/1/1/0
0 or Subnormal (denormal)  Normalized  Input B  0/0/0/0
Positive/negative infinity  Normalized  Positive/negative infinity value  0/0/0/0
Quiet Not A Number (qNaN)  Normalized  qNaN value  0/0/0/0
0 or Subnormal (denormal)  0 or Subnormal (denormal)  0 value (for the (-0) + (-0) equation, sign bit = 1; for any other equation, sign bit = 0)  0/0/0/0
Positive/negative infinity  0 or Subnormal (denormal)  Positive/negative infinity value  0/0/0/0 
Quiet Not A Number (qNaN)  0 or Subnormal (denormal)  qNaN value  0/0/0/0 
Positive/negative infinity  Positive/negative infinity  qNaN value for invalid cases; positive/negative infinity value for valid cases  0/0/0/1 for invalid cases; 0/0/0/0 for valid cases. Valid cases are:

Quiet Not A Number (qNaN)  Positive/negative infinity  qNaN value  0/0/0/0 
Quiet Not A Number (qNaN)  Quiet Not A Number (qNaN)  qNaN value  0/0/0/0 
Operational Mode Descriptions
This section describes how you can configure the Intel® Stratix® 10 variable precision DSP block to efficiently support the fixed-point and floating-point arithmetic operational modes.
Fixed-Point Arithmetic  Floating-Point Arithmetic



Operational Modes for Fixed-Point Arithmetic
Independent Multiplier Mode
In independent input and output multiplier mode, the variable precision DSP blocks perform individual multiplication operations for general-purpose multipliers.
Configuration  Multipliers per Block 

18 (unsigned) x 18 (unsigned)  2 
18 (signed) x 19 (signed)  2 
27 (signed or unsigned) x 27 (signed or unsigned)  1 
18 × 18 or 18 × 19 Independent Multiplier
The 18 × 18 or 18 × 19 independent multiplier mode uses the following equations:
resulta = ax * ay
resultb = bx * by
In this figure, the variables are defined as follows:
 n = 19 and m = 37 for 18 × 19 signed operands
 n = 18 and m = 36 for 18 × 18 unsigned operands
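The operand and result widths above can be checked with a short behavioral sketch. It assumes the x operand is 18 bits and the y operand is n bits, as in the input data width table; the helper names are hypothetical and the model ignores registers and timing.

```python
def check_width(value: int, bits: int, signed: bool) -> None:
    """Raise ValueError if value does not fit in the given width."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")


def mult_18x19_signed(ax: int, ay: int) -> int:
    """18-bit signed x times 19-bit signed y, 37-bit signed result."""
    check_width(ax, 18, signed=True)
    check_width(ay, 19, signed=True)
    result = ax * ay
    check_width(result, 37, signed=True)  # m = 37
    return result


def mult_18x18_unsigned(ax: int, ay: int) -> int:
    """18-bit unsigned x times 18-bit unsigned y, 36-bit unsigned result."""
    check_width(ax, 18, signed=False)
    check_width(ay, 18, signed=False)
    result = ax * ay
    check_width(result, 36, signed=False)  # m = 36
    return result


# Largest-magnitude signed product: (-2^17) * (-2^18) = 2^35
extreme = mult_18x19_signed(-(1 << 17), -(1 << 18))
```

The extreme products still pass the result width check, which is why m = n + 18 bits is sufficient in both cases.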
27 × 27 Independent Multiplier
The 27 x 27 independent multiplier mode uses the equation of resulta = ay * ax.
Multiplier Adder Sum Mode
The multiplier adder sum mode uses the equations:
 resulta = (bx * by) + (ax * ay) to calculate the sum of the two 18 x 19 multiplications.
 resulta = (bx * by) - (ax * ay) to calculate the difference of the two 18 x 19 multiplications.
In this figure, the variable is defined as follows:
 n = 19 for 18 × 19 signed operands
 n = 18 for 18 × 18 unsigned operands
Set the SUB dynamic control signal high to calculate the difference of the two 18 × 19 multiplications.
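A minimal behavioral sketch of the two equations, with the SUB signal modeled as a function argument. The function name is hypothetical and bit widths are not enforced.

```python
def multiplier_adder_sum(ax: int, ay: int, bx: int, by: int, sub: int = 0) -> int:
    """Model resulta = (bx * by) +/- (ax * ay), selected by SUB."""
    top = ax * ay
    bottom = bx * by
    # SUB = 1 selects the difference, SUB = 0 selects the sum.
    return bottom - top if sub else bottom + top


total = multiplier_adder_sum(2, 3, 4, 5, sub=0)       # 20 + 6
difference = multiplier_adder_sum(2, 3, 4, 5, sub=1)  # 20 - 6
```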
Independent Complex Multiplier
The Intel® Stratix® 10 devices support the 18 × 19 complex multiplier mode using two blocks in fixed-point multiplier adder sum mode.
The imaginary part [(a × d) + (b × c)] is implemented in the first variable-precision DSP block, while the real part [(a × c) - (b × d)] is implemented in the second variable-precision DSP block.
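As an illustration, the product (a + jb)(c + jd) can be assembled from two instances of a multiplier adder sum model. The port assignment below is one possible mapping chosen for this sketch, not necessarily the one the tools use, and `sum_of_two` is a hypothetical helper.

```python
def sum_of_two(ax: int, ay: int, bx: int, by: int, sub: int = 0) -> int:
    """Model resulta = (bx * by) +/- (ax * ay) of one DSP block."""
    return bx * by - ax * ay if sub else bx * by + ax * ay


def complex_multiply(a: int, b: int, c: int, d: int):
    """Return (real, imag) of (a + jb) * (c + jd) using two blocks."""
    # First block: imaginary part (a * d) + (b * c), SUB = 0.
    imag = sum_of_two(ax=a, ay=d, bx=b, by=c, sub=0)
    # Second block: real part (a * c) - (b * d), SUB = 1.
    real = sum_of_two(ax=b, ay=d, bx=a, by=c, sub=1)
    return real, imag


# (3 + 4j) * (1 - 2j) = 11 - 2j
product = complex_multiply(3, 4, 1, -2)
```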
18 × 19 Multiplication Summed with 36-Bit Input Mode
Intel® Stratix® 10 variable precision DSP blocks support one 18 × 19 multiplication summed with a 36-bit input.
 resulta = (ax * ay) + by to sum the 18 x 19 multiplication with the 36-bit input.
 resulta = (ax * ay) - by to subtract the 36-bit input from the 18 x 19 multiplication.
Use the upper multiplier to provide the input for an 18 × 19 multiplication, while the bottom multiplier is bypassed. The by[17..0] and bx[35..18] signals are concatenated to produce a 36-bit input.
Use the SUB dynamic control signal to select whether the adder performs addition or subtraction.
In this figure, the variable is defined as follows:
 n = 19 for 18 × 19 signed operands
 n = 18 for 18 × 18 unsigned operands
Systolic FIR Mode
The basic structure of a FIR filter consists of a series of multiplications followed by an addition.
Depending on the number of taps and the input sizes, the delay through chaining a high number of adders can become quite large. To overcome the delay performance issue, the systolic form is used with additional delay elements placed per tap to increase the performance at the cost of increased latency.
Intel^{®} Stratix^{®} 10 variable precision DSP blocks support the following systolic FIR structures:
 18-bit
 27-bit
In systolic FIR mode, the input of the multiplier can come from four different sets of sources:
 Two dynamic inputs
 One dynamic input and one coefficient input
 One coefficient input and one pre-adder output
 One dynamic input and one pre-adder output
Mapping Systolic Mode User View to Variable Precision Block Architecture View
The following figure shows the implementation of the systolic FIR filter (a) using the Intel® Stratix® 10 variable precision DSP blocks (d) by retiming the registers and restructuring the adder. Register B can be retimed into systolic registers at the chainin, ay, and ax input paths as shown in (b). The end result of the register retiming is shown in (c). The adder is then restructured to sum the outputs of both multipliers. The adder result is sent to the chainout adder to be summed with the chainin value from the previous DSP block as shown in (d).
18-Bit Systolic FIR Mode
In 18-bit systolic FIR mode, the adders are configured as dual 44-bit adders, giving 7 bits of overhead when using an 18 x 19 operation mode, which produces a 37-bit result. This allows a total of sixteen 18 x 19 multipliers, or eight Intel® Stratix® 10 variable precision DSP blocks, to be cascaded as a systolic FIR structure.
27-Bit Systolic FIR Mode
In 27-bit systolic FIR mode, the chainout adder or accumulator is configured for 64-bit operation, providing 10 bits of overhead when using 27-bit data (54-bit products). This allows a total of eleven 27 x 27 multipliers, or eleven Intel® Stratix® 10 variable precision DSP blocks, to be cascaded as a systolic FIR structure.
The 27-bit systolic FIR mode allows the implementation of one systolic filter stage per DSP block. Systolic registers are not required in this mode.
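The overhead figures quoted for the two systolic modes follow from subtracting the product width from the adder width. A small sketch of that arithmetic, using a hypothetical helper name:

```python
def overhead_bits(adder_width: int, x_width: int, y_width: int) -> int:
    """Adder bits left over above the full-width product."""
    product_width = x_width + y_width
    return adder_width - product_width


overhead_18 = overhead_bits(adder_width=44, x_width=18, y_width=19)  # 37-bit product
overhead_27 = overhead_bits(adder_width=64, x_width=27, y_width=27)  # 54-bit product
```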
Operational Modes for Floating-Point Arithmetic
Single Floating-Point Arithmetic Functions
 Multiplication mode
 Adder or subtract mode
 Multiply accumulate mode
Multiplication Mode
This mode allows you to apply the basic floating-point multiplication equation:
result = ay*az
This mode supports the following exception flags:
 mult_invalid
 mult_inexact
 mult_overflow
 mult_underflow
Adder or Subtract Mode
This mode allows you to apply the following equations:
result = ax+ay
result = ay-ax
This mode supports the following exception flags:
 adder_invalid
 adder_inexact
 adder_overflow
 adder_underflow
Multiply Accumulate Mode
This mode performs floating-point multiplication followed by floating-point addition or subtraction with the previous multiplication result.
When the ACCUMULATE signal is high, this mode uses the equation result = (ay*az) +/- previous value.
When the ACCUMULATE signal is low, this mode uses the equation result = (ay*az).
This mode supports the following exception flags:
 mult_invalid
 mult_inexact
 mult_overflow
 mult_underflow
 adder_invalid
 adder_inexact
 adder_overflow
 adder_underflow
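The effect of the ACCUMULATE signal can be sketched as follows. This is a simplified model that covers only the addition case, rounds to single precision through `struct`, and does not model exception flags or the flushing of subnormal values; the function names are hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_accumulate(ay: float, az: float, previous: float, accumulate: bool) -> float:
    """Model result = (ay*az) + previous when ACCUMULATE is high, else ay*az."""
    product = to_single(to_single(ay) * to_single(az))
    if accumulate:
        return to_single(product + previous)
    return product  # ACCUMULATE low restarts the running sum


acc = 0.0
for ay, az in [(1.5, 2.0), (0.5, 4.0)]:
    acc = multiply_accumulate(ay, az, acc, accumulate=True)
restart = multiply_accumulate(2.0, 3.0, acc, accumulate=False)
```

Driving ACCUMULATE low for one cycle loads the new product without the previous sum, which is how a dynamic accumulator reset behaves in this model.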
Multiple Floating-Point Arithmetic Functions
 Multiply-add or multiply-subtract mode, which uses a single floating-point arithmetic DSP block if the chainin parameter is turned off
 Vector one mode
 Vector two mode
 Direct vector dot product
 Complex multiplication
Multiply-Add or Multiply-Subtract Mode
This mode performs floating-point multiplication followed by floating-point addition or floating-point subtraction. The chainin parameter allows you to enable a multiple-chain mode.
Chainin Parameter  Multiply-Add Mode  Multiply-Subtract Mode

Disable  result = (ay*az) + ax  result = (ay*az) - ax
Enable  result = (ay*az) + chainin  result = (ay*az) - chainin
 mult_invalid
 mult_inexact
 mult_overflow
 mult_underflow
 adder_invalid
 adder_inexact
 adder_overflow
 adder_underflow
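The chainin parameter table can be modeled with one function, and chaining several such blocks gives a dot product. The sketch assumes each block forwards its result to the next block's chainin, uses plain Python floats instead of single-precision rounding, and uses hypothetical names throughout.

```python
def multiply_add(ay, az, ax=0.0, chainin=None, subtract=False):
    """Model result = (ay*az) +/- ax, or +/- chainin when chainin is enabled."""
    addend = ax if chainin is None else chainin
    product = ay * az
    return product - addend if subtract else product + addend


def chained_dot_product(xs, ys):
    """Chain one multiply-add block per element pair."""
    chain = None
    for x, y in zip(xs, ys):
        if chain is None:
            chain = multiply_add(x, y, ax=0.0)        # first block, chainin disabled
        else:
            chain = multiply_add(x, y, chainin=chain)  # later blocks, chainin enabled
    return chain


dot = chained_dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```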
Vector One Mode
This mode performs floating-point multiplication followed by floating-point addition or subtraction with the chainin input from the previous variable precision DSP block. Input ax is directly fed into chainout.
Chainin Parameter  Vector One with Floating-Point Addition  Vector One with Floating-Point Subtraction

Disable  result = ay * az, chainout = ax  result = ay * az, chainout = ax
Enable  result = (ay * az) + chainin, chainout = ax  result = (ay * az) - chainin, chainout = ax
 mult_invalid
 mult_inexact
 mult_overflow
 mult_underflow
 adder_invalid
 adder_inexact
 adder_overflow
 adder_underflow
Vector Two Mode
This mode performs floating-point multiplication where the multiplication result is directly fed to chainout. The chainin input from the previous variable precision DSP block is then added to or subtracted from input ax to form the output result.
Chainin Parameter  Vector Two with Floating-Point Addition  Vector Two with Floating-Point Subtraction

Disable  result = ax, chainout = ay * az  result = ax, chainout = ay * az
Enable  result = ax + chainin, chainout = ay * az  result = ax - chainin, chainout = ay * az
 mult_invalid
 mult_inexact
 mult_overflow
 mult_underflow
 adder_invalid
 adder_inexact
 adder_overflow
 adder_underflow
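The difference between the two vector modes is which value goes to `result` and which goes to `chainout`. The following sketch mirrors the two chainin parameter tables; it uses plain Python floats, ignores rounding and exception flags, and the function names are hypothetical.

```python
def vector_one(ax, ay, az, chainin=None, subtract=False):
    """Vector one: result = ay*az (+/- chainin), chainout = ax."""
    product = ay * az
    if chainin is None:
        result = product
    else:
        result = product - chainin if subtract else product + chainin
    return result, ax  # (result, chainout)


def vector_two(ax, ay, az, chainin=None, subtract=False):
    """Vector two: result = ax (+/- chainin), chainout = ay*az."""
    chainout = ay * az
    if chainin is None:
        result = ax
    else:
        result = ax - chainin if subtract else ax + chainin
    return result, chainout  # (result, chainout)


v1 = vector_one(5.0, 2.0, 3.0, chainin=1.0)
v2 = vector_two(5.0, 2.0, 3.0, chainin=1.0, subtract=True)
```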
Direct Vector Dot Product
 Multiply-add and subtract mode with the chainin parameter turned on
 Vector one
 Vector two
Complex Multiplication
The Intel® Stratix® 10 devices support the single-precision floating-point complex multiplier using four Intel® Stratix® 10 variable-precision DSP blocks.
The imaginary part [(a × d) + (b × c)] is implemented in the first two variable-precision DSP blocks, while the real part [(a × c) - (b × d)] is implemented in the next two variable-precision DSP blocks.
Design Considerations
You should consider the following elements in your design:
DSP Functions  Design Elements 

Fixed-point arithmetic

Floating-point arithmetic

Internal Coefficient and Pre-Adder for Fixed-Point Arithmetic
In both 18-bit and 27-bit modes, you can use the coefficient feature and the pre-adder feature independently.
When the pre-adder feature is enabled in 18-bit modes, you must enable both the top and bottom pre-adders.
When the internal coefficient feature is enabled in 18-bit modes, you must enable both the top and bottom coefficients.
Accumulator for Fixed-Point Arithmetic
The accumulator in the Intel® Stratix® 10 devices supports double accumulation by enabling the 64-bit double accumulation register located between the output register bank and the accumulator.
Chainout Adder
Fixed-Point Arithmetic  Floating-Point Arithmetic

You can use the output chaining path to add results from another DSP block. This is supported in all operational modes except the 18 x 18 or 18 x 19 independent multiplier mode and the 27 x 27 independent multiplier mode.
You can use the output chaining path to add results from another DSP block. This is supported in certain operation modes:

Input Cascade for Fixed-Point Arithmetic
The input register bank in the Intel® Stratix® 10 variable precision DSP block supports the input cascade feature, which cascades the input bus within a DSP block and to another DSP block.
 The top multiplier Y input drives the bottom multiplier Y input within a DSP block
 The bottom multiplier Y input of the first DSP block drives the top multiplier Y input of the subsequent DSP block
For 27 × 27 mode, the multiplier Y input of the first DSP block drives the multiplier Y input of the subsequent DSP block. This feature is not supported when the pre-adder is enabled.
There are two delay registers, the top delay register and the bottom delay register, that you can use to balance the latency requirements when you use both the input cascade and chainout features in fixed-point 18 x 19 mode. The ay input register must be enabled when the top delay register is enabled, and both registers must use the same clock source. Similarly, the by input register must be enabled when the bottom delay register is enabled, and both registers must use the same clock source.
The delay registers are only supported in the 18 x 18 or 18 x 19 independent multiplier mode, the multiplier adder sum mode, and the 18-bit systolic FIR mode.
Intel Stratix 10 Variable Precision DSP Blocks Implementation Guide
The Intel^{®} Quartus^{®} Prime software contains tools for you to create and compile your design, and configure your device.
You can prepare for device migration, set pin assignments, define placement restrictions, set up timing constraints, and customize IP cores using the Intel® Quartus® Prime software.
Native Fixed Point DSP Intel Stratix 10 FPGA IP Core References
The Native Fixed Point DSP Intel^{®} Stratix^{®} 10 FPGA IP core instantiates and controls a single Intel^{®} Stratix^{®} 10 Variable Precision DSP block.
 18 × 18 full mode
 18 × 18 full top mode
 18 × 18 sum-of-2 mode
 18 × 18 plus 36 mode
 18 × 18 systolic mode
 27 × 27 mode
Supported Operational Modes
Operational Modes  Description 

18 × 18 Full Mode  This mode operates as two independent 18 (signed) × 19 (signed) or 18 (unsigned) × 18 (unsigned) multipliers with 37-bit output. This mode applies the following equations:

18 × 18 Full Top Mode  This mode operates as a single 18 (signed) × 19 (signed) or 18 (unsigned) × 18 (unsigned) multiplier with 37-bit output. This mode applies the following equation:

18 × 18 Sum of Two Mode  This mode operates as the sum of two 18 × 19 multiplications. This mode applies the equations of:
The resulta output bus can support up to 64 bits when you enable the accumulator or chainout adder.
18 × 18 Plus 36 Mode  This mode operates as one 18 × 19 multiplication summed with a 36-bit input. This mode applies the equation resulta = (ax * ay) + (bx, by). When the input bus is narrower than 36 bits in this mode, you must provide the necessary sign extension to fill the 36-bit input. When you enable the accumulator, the resulta output bus can support up to 64 bits.
18 × 18 Systolic Mode  This mode operates as an 18-bit systolic FIR. Enable the input systolic register and the output register when using this operational mode. When you enable the chainout adder, the chainout and chainin width can support up to 44 bits. When you enable the accumulator, the resulta output bus can support up to 64 bits.
27 × 27 Mode  This mode operates as one independent 27 (signed/unsigned) × 27 (signed/unsigned) multiplier. This mode applies the equation resulta = ax * ay. The resulta output bus can support up to 64 bits when you enable the accumulator or chainout adder.
Maximum Input Data Width for Fixed-Point Arithmetic
Operation Mode  ax  ay  az  bx  by  bz  COEFSELA  COEFSELB

Without Pre-Adder or Internal Coefficient
m18x18_full  18 (signed) / 18 (unsigned)  19 (signed) / 18 (unsigned)  Not used  18 (signed) / 18 (unsigned)  19 (signed) / 18 (unsigned)  Not used  Not used  Not used
m18x18_full_top  18 (signed) / 18 (unsigned)  19 (signed) / 18 (unsigned)  Not used  Not used  Not used  Not used  Not used  Not used
m18x18_sumof2  18 (signed) / 18 (unsigned) ^{6}  19 (signed) / 18 (unsigned)  Not used  18 (signed) / 18 (unsigned) ^{6}  19 (signed) / 18 (unsigned)  Not used  Not used  Not used
m18x18_systolic  18 (signed) / 18 (unsigned) ^{6}  19 (signed) / 18 (unsigned)  Not used  18 (signed) / 18 (unsigned) ^{6}  19 (signed) / 18 (unsigned)  Not used  Not used  Not used
m18x18_plus36  18 (signed) / 18 (unsigned)  19 (signed) / 18 (unsigned)  Not used  18 (signed) / 18 (unsigned)  18 (unsigned) ^{7}  Not used  Not used  Not used
m27x27  27 (signed) / 27 (unsigned) ^{8}  27 (signed) / 27 (unsigned)  Not used  Not used  Not used  Not used  Not used  Not used

With Pre-Adder Feature Only
m18x18_full  18 (signed) / 18 (unsigned)  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  18 (signed) / 18 (unsigned)  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  Not used
m18x18_full_top  18 (signed) / 18 (unsigned)  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  Not used  Not used  Not used  Not used
m18x18_sumof2  18 (signed) / 18 (unsigned) ^{6}  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  18 (signed) / 18 (unsigned) ^{6}  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  Not used
m18x18_systolic  18 (signed) / 18 (unsigned) ^{6}  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  18 (signed) / 18 (unsigned) ^{6}  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  Not used
m27x27  27 (signed) / 27 (unsigned) ^{8}  26 (signed) / 26 (unsigned)  26 (signed) / 26 (unsigned)  Not used  Not used  Not used  Not used  Not used

With Internal Coefficient Feature Only
m18x18_full  Not used  19 (signed) / 18 (unsigned)  Not used  Not used  19 (signed) / 18 (unsigned)  Not used  3  3
m18x18_full_top  Not used  19 (signed) / 18 (unsigned)  Not used  Not used  Not used  Not used  3  Not used
m18x18_sumof2  Not used  19 (signed) / 18 (unsigned)  Not used  Not used  19 (signed) / 18 (unsigned)  Not used  3  3
m18x18_systolic  Not used  19 (signed) / 18 (unsigned)  Not used  Not used  19 (signed) / 18 (unsigned)  Not used  3  3
m27x27  Not used  27 (signed) / 27 (unsigned)  Not used  Not used  Not used  Not used  3  Not used

With Pre-Adder and Internal Coefficient Features
m18x18_full  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  3  3
m18x18_full_top  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  Not used  Not used  3  Not used
m18x18_sumof2  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  3  3
m18x18_systolic  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  Not used  18 (signed) / 17 (unsigned)  18 (signed) / 17 (unsigned)  3  3
m27x27  Not used  26 (signed) / 26 (unsigned)  26 (signed) / 26 (unsigned)  Not used  Not used  Not used  3  Not used
Using Less Than 36-Bit Operand In 18 x 18 Plus 36 Mode Example
This example shows how to configure the Native Fixed Point DSP Intel^{®} Stratix^{®} 10 FPGA IP core to use the 18 × 18 Plus 36 operational mode with signed 12-bit input data of 101010101010 (binary) instead of a 36-bit operand.
 Set Representation format for bottom multiplier x operand to signed.
 Set Representation format for bottom multiplier y operand to unsigned.
 Set 'bx' input bus width to 18.
 Set 'by' input bus width to 18.

 Provide 18-bit signed representation data, for example '111111111111111111', to the bx input bus. This step performs sign extension: the initial 12-bit input is extended to 36 bits, with bx holding the most significant 18 bits.
 Provide the least significant 18 bits of the sign-extended data, for example '111111101010101010', to the by input bus.
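The sign extension in the steps above can be checked with a short Python sketch. It is only a behavioral illustration: the function name `split_signed_operand` and its parameters are hypothetical and are not part of the IP core or of any Intel tool.

```python
def split_signed_operand(value_bits: str, total_width: int = 36, half_width: int = 18):
    """Sign-extend a two's-complement bit string and split it into (bx, by).

    Hypothetical helper for illustration only; it models the data preparation
    described in the example, not the DSP block itself.
    """
    if not value_bits or set(value_bits) - {"0", "1"}:
        raise ValueError("value_bits must be a non-empty binary string")
    if len(value_bits) > total_width:
        raise ValueError("operand is wider than the target width")
    # Replicate the sign bit (the leftmost bit) to fill the upper bits.
    extended = value_bits[0] * (total_width - len(value_bits)) + value_bits
    bx = extended[:half_width]   # most significant 18 bits, driven on bx (signed)
    by = extended[half_width:]   # least significant 18 bits, driven on by (unsigned)
    return bx, by


bx, by = split_signed_operand("101010101010")
print(bx)  # 111111111111111111
print(by)  # 111111101010101010
```

The two printed strings match the bx and by values given in the example.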
Parameterizing Native Fixed Point DSP IP Core
 In Intel^{®} Quartus^{®} Prime Pro Edition, create a new project that targets an Intel^{®} Stratix^{®} 10 device.
 In IP Catalog, click Library > DSP > Primitive DSP > Native Fixed Point DSP. The Native Fixed Point DSP IP core parameter editor opens.
 In the New IP Variation dialog box, enter an Entity Name and click OK.
 Under Parameters, select the operation mode, multiplier configuration, clear signal, port width, and internal coefficient configurations according to the variant of your IP core.
 In the DSP Block View, switch the clock of each valid register.
 Click the input and output ports in the GUI to select your desired inputs and outputs.
 Click the Preadder symbols in the GUI to select addition or subtraction.
 Click the Top delay register and Bottom delay register symbols in the GUI to enable the delay registers.
 Click the multiplexer symbols in the GUI to enable the preadder modules and the internal coefficient modules.
 Click the clken port symbols to create clock enable signal for each valid register.
 Click the clr port symbols to create clear signal for each valid register.
 Click Generate HDL.
 Click Finish.
Native Fixed Point DSP Intel Stratix 10 FPGA IP Parameters
Parameter  IP Generated Parameter  Value  Default Value  Description 

Operation Mode  
Select the Operation Mode  operation_mode 
m18×18_full m18×18_full_top m18×18_sumof2 m18×18_plus36 m18×18_systolic m27×27 
m18×18_full  Select the desired operational mode. 
Multiplier Configuration  
Representation format for AX input bus  signed_max 
signed unsigned 
unsigned  Specify the representation format for the top multiplier x operand. 
Representation format for AY/AZ input buses  signed_may 
signed unsigned 
unsigned 
Specify the representation format for the top multiplier y operand. 
Representation format for BX input bus  signed_mbx 
signed unsigned 
unsigned  Specify the representation format for the bottom multiplier x operand. 
Representation format for BY/BZ input buses  signed_mby 
signed unsigned 
unsigned  Specify the representation format for the bottom multiplier y operand. Always select unsigned for m18×18_plus36. 
Clear Signal Setting  
Type of clear signal  clear_type 
none aclr sclr 
none 
Select aclr to use asynchronous clear signal type for all registers. Select sclr to use synchronous clear signal type for all registers. 
Port Width Setting  
How wide should AX input bus be?  ax_width  1–27  18  Specify the width of ax input bus. Refer to Maximum Input Data Width for FixedPoint Arithmetic. 
How wide should BX input bus be?  bx_width  1–18  18  Specify the width of bx input bus. Set this parameter to 0 when using m18x18_full_top mode. Refer to Maximum Input Data Width for FixedPoint Arithmetic. 
How wide should AY input bus be?  ay_scan_in_width  1–27  18  Specify the width of ay or scanin input bus. Refer to Maximum Input Data Width for FixedPoint Arithmetic. 
How wide should BY input bus be?  by_width  1–19  18  Specify the width of by input bus. Set this parameter to 0 when using m18x18_full_top mode. Refer to Maximum Input Data Width for FixedPoint Arithmetic. 
How wide should AZ input bus be?  az_width  0–18  0  Specify the width of az input bus. Refer to Maximum Input Data Width for FixedPoint Arithmetic. 
How wide should BZ input bus be?  bz_width  0–18  0  Specify the width of bz input bus. Set this parameter to 0 when using m18x18_full_top mode. Refer to Maximum Input Data Width for FixedPoint Arithmetic. 
How wide should result A width?  result_a_width  1–64  37  Specify the width of resulta output bus. 
How wide should result B width?  result_b_width  1–37  37  Specify the width of resultb output bus. This parameter is supported only in m18x18_full mode. 
How wide should result scanout port (1)  scan_out_width  1–27  0  Specify the width of scanout output bus. 
Parameter  Value  Default Value  Description 

loadconst 
Disable Enable 
Disable 
Click the port symbol to enable loadconst port and its input register. 
accumulate port (2) 
Disable Enable 
Disable  Click the port symbol to enable accumulate port and its input register. 
negate port (3) 
Disable Enable 
Disable  Click the port symbol to enable negate port and its input register. 
sub port (4) 
Disable Enable 
Disable  Click the port symbol to enable sub port and its input register. 
Top delay register (5) 
Disable Enable 
Disable  Click to enable the top delay register for ay input bus. This feature is not supported in m18×18_plus36 and m27x27 operational modes. 
Bottom delay register (6) 
Disable Enable 
Disable  Click to enable bottom delay register for by input bus. This feature is not supported in m18×18_plus36, m18x18_full_top, and m27x27 operational modes. 
Scanout output bus (7) 
Disable Enable 
Disable  Click to enable scanout output bus. 
Input cascade for ay input (8) 
Disable Enable 
Disable 
Click to enable input cascade module for ay input. When you enable input cascade module, the Stratix 10 Native Fixed Point DSP IP core uses the scanin input signals as input instead of ay input signal. 
Input cascade for by input (9) 
Disable Enable 
Disable 
Click to enable input cascade module for by input. When you enable input cascade module, the Stratix 10 Native Fixed Point DSP IP core uses the ay input signals as input instead of by input signal. 
Register clock (10) 
None Clock 0 Clock 1 Clock 2 
Clock 0 
To bypass any register, switch the register clock to None. Switch the register clock to:

Top preadder (11) 
Disable Enable 
Disable 
Click to enable top preadder module. This uses az input bus as one of the operand sources. To use preadder feature, both top and bottom preadder modules must be enabled. 
Top Preadder operation (12) 
+  
+  Click to switch the operation of top preadder between addition and subtraction. 
Top coefficient module (13) 
Disable Enable 
Disable 
Click to enable top internal coefficient module. To use internal coefficient feature, both top and bottom internal coefficient modules must be enabled. 
Bottom preadder (14) 
Disable Enable 
Disable 
Click to enable bottom preadder module. This uses bz input bus as one of the operand sources. To use preadder feature, both top and bottom preadder modules must be enabled. 
Bottom coefficient module (15) 
Disable Enable 
Disable 
Click to enable bottom internal coefficient module. To use internal coefficient feature, both top and bottom internal coefficient modules must be enabled. 
Bottom Preadder operation (16) 
+  
+  Click to switch the operation of bottom preadder between addition and subtraction. 
Chainin input bus (17) 
Disable Enable 
Disable  Click to enable Chainin input bus. 
Clock enable for clock 0 (18) 
Disable Enable 
Disable  Click to create clock enable signal for clock 0. 
Clock enable for clock 1 (19) 
Disable Enable 
Disable  Click to create clock enable signal for clock 1. 
Clock enable for clock 2 (20) 
Disable Enable 
Disable  Click to create clock enable signal for clock 2. 
Clear signal for input registers (21) 
Disable Enable 
Disable  Click to create Clr[0] signal for all input registers. Use the Type of clear signal parameter to select asynchronous clear or synchronous clear for the input registers. 
Clear signal for output and pipeline registers (22) 
Disable Enable 
Disable  Click to create Clr[1] signal for all output and pipeline registers. Use the Type of clear signal parameter to select asynchronous clear or synchronous clear for the output and pipeline registers. 
Double accumulator module (23) 
Disable Enable 
Disable  Click to enable double accumulator feature. 
Chainout output bus (24) 
Disable Enable 
Disable  Click to enable Chainout output bus. 
Parameter  IP Generated Parameter  Value  Default Value  Description 

Load Const Setting  
What is the value for loadconst?  load_const_value  0–63  0  Specify the preset constant value N. The loaded constant is 2^{N}. 
Coefficient A Storage Configuration  
Coef_a_0  coef_a_0  Integer  0  Specify the coefficient values for ax input bus. For 18-bit operation mode, the maximum input value is 2^{18} – 1. For 27-bit operation, the maximum value is 2^{27} – 1. 
Coef_a_1  coef_a_1  
Coef_a_2  coef_a_2  
Coef_a_3  coef_a_3  
Coef_a_4  coef_a_4  
Coef_a_5  coef_a_5  
Coef_a_6  coef_a_6  
Coef_a_7  coef_a_7  
Coefficient B Storage Configuration  
Coef_b_0  coef_b_0  Integer  0  Specify the coefficient values for bx input bus. Set coefficient values to more than 67108864 when operand is set to unsigned and negate is enabled. 
Coef_b_1  coef_b_1  
Coef_b_2  coef_b_2  
Coef_b_3  coef_b_3  
Coef_b_4  coef_b_4  
Coef_b_5  coef_b_5  
Coef_b_6  coef_b_6  
Coef_b_7  coef_b_7 
Signals
The following figure shows the input and output signals of the Native Fixed Point DSP Intel^{®} Stratix^{®} 10 FPGA IP core.
Signal Name  Type  Width  Description 

ax[26:0]  Input  27  Input data bus to top multiplier. This signal is not available when internal coefficient feature is enabled. 
ay[26:0]  Input  27  Input data bus to top multiplier. When preadder is enabled, these signals serve as input to the top preadder. 
az[25:0]  Input  26  These signals are input to the top preadder. These signals are only available when preadder is enabled and are not available in m18x18_plus36 operational mode. 
bx[17:0]  Input  18  Input data bus to bottom multiplier. These signals are not available in m27×27 operational mode and when internal coefficient feature is enabled. 
by[18:0]  Input  19  Input data bus to bottom multiplier. When preadder is enabled, these signals serve as input signals to the bottom preadder. These signals are not available in m27×27 operational mode. 
bz[17:0]  Input  18 
These signals are input signals to the bottom preadder. These signals are only available when preadder is enabled. These signals are not available in m18x18_plus36 and m27×27 operational modes. 
Signal Name  Type  Width  Description 

resulta[63:0]  Output  64  Output data bus from top multiplier. In m18×18_full mode, these signals support only up to 37 bits. 
resultb[36:0]  Output  37  Output data bus from bottom multiplier. These signals are only available in m18×18_full operational mode. 
Signal Name  Type  Width  Description 

clk[2:0]  Input  3  Input clock for all registers. These clocks are only available if any of the input registers, pipeline registers, or output register is set to Clock0, Clock1, or Clock2. 
ena[2:0]  Input  3  Clock enable for clk[2:0]. These signals are active-High. 
clr[1:0]  Input  2  These signals can be asynchronous or synchronous clear input signals for all registers. You may select the type of clear input signal using the Type of clear signal parameter. These signals are active-High. Use clr[0] for all input registers and use clr[1] for all pipeline and output registers. By default, this signal is deasserted. 
Signal Name  Type  Width  Description 

sub  Input  1  Dynamic input signal to control the operation of the adder module. By default, this signal is deasserted. You can assert or deassert this signal during runtime. This signal is not available in m18x18_full, m18x18_full_top, and m27x27 operational modes. 
negate  Input  1  Dynamic input signal to control the operation of the chainout adder module. By default, this signal is deasserted. You can assert or deassert this signal during runtime. This signal is not available in m18x18_full and m18x18_full_top operational modes. 
accumulate  Input  1  Input signal to enable or disable the accumulator feature. By default, this signal is deasserted. You can assert or deassert this signal during runtime. This signal is not available in m18x18_full and m18x18_full_top operational modes. 
loadconst  Input  1  Input signal to enable or disable the load constant feature. By default, this signal is deasserted. You can assert or deassert this signal during runtime. This signal is not available in m18x18_full and m18x18_full_top operational modes. 
Signal Name  Type  Width  Description 

coefsela[2:0]  Input  3  Input selection signals for 8 coefficient values defined by user for the top multiplier. The coefficient values are stored in the internal memory and specified by parameters coef_a_0 to coef_a_7. These signals are only available when the internal coefficient feature is enabled. These signals are not available in m18x18_plus36 operational mode. 
coefselb[2:0]  Input  3  Input selection signals for 8 coefficient values defined by user for the bottom multiplier. The coefficient values are stored in the internal memory and specified by parameters coef_b_0 to coef_b_7. These signals are only available when the internal coefficient feature is enabled. These signals are not available in m18x18_full, m18x18_plus36 and m27x27 operational modes. 
Signal Name  Type  Width  Description 

scanin[26:0]  Input  27  Input data bus for input cascade module. Connect these signals to the scanout signals from the preceding DSP core. 
scanout[26:0]  Output  27  Output data bus of the input cascade module. Connect these signals to the scanin signals of the next DSP core. 
Signal Name  Type  Width  Description 

chainin[63:0]  Input  64  Input data bus for output cascade module. Connect these signals to the chainout signals from the preceding DSP core. In 18 x 18 systolic mode, only 44 bits of output cascade is supported. 
chainout[63:0]  Output  64  Output data bus of the output cascade module. Connect these signals to the chainin signals of the next DSP core. In 18 x 18 systolic mode, only 44 bits of output cascade is supported. 
Multiply Adder IP Core References
The Multiply Adder Intel^{®} FPGA IP core allows you to implement a multiplier-adder.^{9}
The following figure shows the ports for the Multiply Adder Intel^{®} FPGA IP core.
A multiplier-adder accepts pairs of inputs, multiplies the values together, and then adds to or subtracts from the products of all other pairs.
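As a rough behavioral illustration of the multiplier-adder described above, the following Python sketch multiplies up to four input pairs and combines the products. The function `multiply_add` and its arguments are hypothetical names. The way the two pair results are combined (added together) is an assumption made for this sketch. The sketch ignores bit widths, registers, and latency.

```python
def multiply_add(pairs, subtract_first=False, subtract_second=False):
    """Behavioral model of a 1- to 4-multiplier multiply-adder (not cycle accurate).

    pairs: list of (a, b) integer tuples, one per multiplier.
    subtract_first: subtract the second product from the first instead of adding.
    subtract_second: subtract the fourth product from the third instead of adding.
    """
    if not 1 <= len(pairs) <= 4:
        raise ValueError("this sketch models 1 to 4 multipliers")
    products = [a * b for a, b in pairs]
    products += [0] * (4 - len(products))  # unused multipliers contribute zero
    first = products[0] - products[1] if subtract_first else products[0] + products[1]
    second = products[2] - products[3] if subtract_second else products[2] + products[3]
    # Assumption: the results of the two multiplier pairs are added together.
    return first + second


print(multiply_add([(2, 3), (4, 5)]))                       # 26
print(multiply_add([(2, 3), (4, 5)], subtract_first=True))  # -14
```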
The DSP block uses 18 × 19-bit input multipliers to process data with widths up to 18 bits and 27 × 27-bit input multipliers to process data with widths between 18 and 27 bits. For data wider than 27 bits, the DSP block uses a partial products algorithm to process the data.
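The idea behind a partial products approach can be sketched in Python as follows. This is a generic schoolbook decomposition shown for illustration only. The exact algorithm the IP core generates is not described here, and the names `split_limbs` and `wide_multiply` are hypothetical. The sketch handles unsigned operands only.

```python
def split_limbs(value: int, limb_bits: int) -> list[int]:
    """Split a non-negative integer into little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    limbs = [value & mask]
    value >>= limb_bits
    while value:
        limbs.append(value & mask)
        value >>= limb_bits
    return limbs


def wide_multiply(a: int, b: int, limb_bits: int = 27) -> int:
    """Multiply two unsigned integers by summing shifted partial products."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles unsigned operands only")
    total = 0
    for i, a_limb in enumerate(split_limbs(a, limb_bits)):
        for j, b_limb in enumerate(split_limbs(b, limb_bits)):
            # Each partial product uses operands no wider than limb_bits,
            # so it fits a limb_bits x limb_bits multiplier.
            total += (a_limb * b_limb) << ((i + j) * limb_bits)
    return total


a, b = 2**40 + 12345, 2**35 + 678
print(wide_multiply(a, b) == a * b)  # True
```

Each additional limb adds more multiplications and additions, which is one way to see why multipliers larger than the native size can cost performance.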
The registers and extra pipeline registers for the following signals are also placed inside the DSP block:
 Data input
 Signed or unsigned select
 Add or subtract select
 Products of multipliers
For the output result, the first register is placed in the DSP block. However, the extra latency registers are placed in logic elements outside the block. Signals peripheral to the DSP block, including data inputs to the multiplier, control signal inputs, and outputs of the adder, use regular routing to communicate with the rest of the device. All connections within the function use dedicated routing inside the DSP block. This dedicated routing includes the shift register chains when you select the option to shift a multiplier's registered input data from one multiplier to an adjacent multiplier.
Features
The Multiply Adder Intel^{®} FPGA IP core offers the following features:
 Generates a multiplier to perform multiplication operations of two numbers. Note: When building multipliers larger than the natively supported size, there may be a performance impact resulting from the partial products implementation.
 Supports data widths of 1–256 bits
 Supports signed and unsigned data representation format
 Supports pipelining with configurable input latency
 Provides an option to dynamically switch between signed and unsigned data support
 Provides an option to dynamically switch between add and subtract operation
 Supports optional asynchronous and synchronous clear and clock enable input ports
 Supports systolic delay register mode
 Supports preadder with 8 preload coefficients per multiplier
 Supports preload constant to complement accumulator feedback
Preadder
With preadder, additions or subtractions are done prior to feeding the multiplier.
There are five preadder modes:
 Simple mode
 Coefficient mode
 Input mode
 Square mode
 Constant mode
Preadder Simple Mode
In this mode, both operands derive from the input ports and the preadder is bypassed. This is the default mode.
Preadder Coefficient Mode
In this mode, one multiplier operand derives from the preadder, and the other operand derives from the internal coefficient storage. The coefficient storage allows up to 8 preset constants. The coefficient selection signals are coefsel[0..3].
This mode is expressed in the following equation.
The following shows the preadder coefficient mode of a multiplier.
Preadder Input Mode
In this mode, one multiplier operand derives from the preadder, and the other operand derives from the datac[] input port.
This mode is expressed in the following equation.
The following shows the preadder input mode of a multiplier.
Preadder Square Mode
This mode is expressed in the following equation.
The following shows the preadder square mode of two multipliers.
Preadder Constant Mode
In this mode, one multiplier operand derives from the input port, and the other operand derives from the internal coefficient storage. The coefficient storage allows up to 8 preset constants. The coefficient selection signals are coefsel[0..3].
This mode is expressed in the following equation.
The following figure shows the preadder constant mode of a multiplier.
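The five preadder modes can be summarized with a small behavioral model in Python. This is a sketch under stated assumptions, not the IP implementation. It assumes the preadder combines the dataa and datab inputs, that coefsel indexes a table of 8 preset constants, and that widths and registers can be ignored. The function name `preadder_multiply` and its argument names are hypothetical.

```python
def preadder_multiply(mode, dataa, datab=0, datac=0, coef=(0,) * 8, coefsel=0,
                      subtract=False):
    """Return the multiplier output for one preadder mode (behavioral sketch)."""
    if len(coef) != 8:
        raise ValueError("the coefficient storage holds 8 preset constants")
    # Assumption: the preadder adds or subtracts the dataa and datab inputs.
    pre = dataa - datab if subtract else dataa + datab
    if mode == "SIMPLE":      # preadder bypassed, both operands from input ports
        return dataa * datab
    if mode == "COEF":        # preadder output times selected coefficient
        return pre * coef[coefsel]
    if mode == "INPUT":       # preadder output times datac input
        return pre * datac
    if mode == "SQUARE":      # preadder output feeds both multiplier operands
        return pre * pre
    if mode == "CONSTANT":    # dataa (preadder bypassed) times selected coefficient
        return dataa * coef[coefsel]
    raise ValueError(f"unknown preadder mode: {mode}")


coefficients = (1, 2, 3, 4, 5, 6, 7, 8)
print(preadder_multiply("COEF", 3, 4, coef=coefficients, coefsel=2))  # 21
print(preadder_multiply("SQUARE", 5, 2, subtract=True))               # 9
```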
Systolic Delay Register
In a systolic architecture, the input data is fed into a cascade of registers acting as a data buffer. Each register delivers an input sample to a multiplier where it is multiplied by the respective coefficient. The chain adder stores the gradually combined results from the multiplier and the previously registered result from the chainin[] input port to form the final result. Each multiply-add element must be delayed by a single cycle so that the results synchronize appropriately when added together. Each successive delay is used to address both the coefficient memory and the data buffer of their respective multiply-add elements. For example, the second multiply-add element uses a single delay, the third multiply-add element uses two delays, and so on.
x(t) represents the results from a continuous stream of input samples, and y(t) represents the summation of a set of input samples, in time, multiplied by their respective coefficients. Both the input and output results flow from left to right. c(0) to c(N–1) denote the coefficients. The systolic delay registers are denoted by S^{–1}, where the ^{–1} represents a single clock delay. Systolic delay registers are added at the inputs and outputs for pipelining in a way that ensures the results from the multiplier operand and the accumulated sums stay in sync. This processing element is replicated to form a circuit that computes the filtering function. This function is expressed in the following equation.
N represents the number of cycles of data that have entered into the accumulator, y(t) represents the output at time t, A(t) represents the input at time t, and B(i) are the coefficients. The t and i in the equation correspond to a particular instant in time, so to compute the output sample y(t) at time t, a group of input samples at N different points in time, or A(n), A(n–1), A(n–2), … A(n–N+1), is required. The group of N input samples are multiplied by N coefficients and summed together to form the final result y.
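The filtering function described above can be read as y(t) = B(0)·A(t) + B(1)·A(t–1) + … + B(N–1)·A(t–N+1). This reading is reconstructed from the surrounding definitions and should be treated as an assumption. The Python sketch below computes that sum directly. It does not model the systolic registers, so it shows the arithmetic result only, not the pipeline timing. The function name `fir_output` is hypothetical.

```python
def fir_output(samples, coefficients, t):
    """Compute y(t) = sum over i of B(i) * A(t - i) for i = 0 .. N-1.

    samples: input sequence A(0), A(1), ...
    coefficients: B(0) .. B(N-1)
    Samples before time 0 (or beyond the end of the list) are taken as zero.
    """
    total = 0
    for i, b in enumerate(coefficients):
        if 0 <= t - i < len(samples):
            total += b * samples[t - i]
    return total


samples = [1, 2, 3, 4]
coefficients = [1, 10, 100]
print(fir_output(samples, coefficients, 2))  # 1*3 + 10*2 + 100*1 = 123
```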
The systolic register architecture is available only for sumof2 and sumof4 modes.
The following figure shows the systolic delay register implementation of 2 multipliers.
The sum of two multipliers is expressed in the following equation.
The following figure shows the systolic delay register implementation of 4 multipliers.
The sum of four multipliers is expressed in the following equation.
The following lists the advantages of systolic register implementation:
 Reduces DSP resource usage
 Enables efficient mapping in the DSP block using the chain adder structure
Preload Constant
The preload constant controls the accumulator operand and complements the accumulator feedback. The valid LOADCONST_VALUE ranges from 0–64. The constant value is equal to 2^{ N }, where N = LOADCONST_VALUE. When the LOADCONST_VALUE is set to 64, the constant value is equal to 0. This function can be used as biased rounding.
The following figure shows the preload constant implementation.
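As an illustration of how a preloaded power-of-two constant supports biased rounding, the following Python sketch adds 2^{N} before discarding the N+1 least significant bits. The helper names `preload_constant` and `round_half_up` are hypothetical, and the sketch models only the arithmetic, not the accumulator hardware.

```python
def preload_constant(loadconst_value: int) -> int:
    """Return the constant for a given LOADCONST_VALUE (2**N, or 0 when N is 64)."""
    if not 0 <= loadconst_value <= 64:
        raise ValueError("LOADCONST_VALUE must be in the range 0 to 64")
    return 0 if loadconst_value == 64 else 1 << loadconst_value


def round_half_up(value: int, drop_bits: int) -> int:
    """Discard drop_bits LSBs after adding half of the discarded range."""
    if drop_bits < 1:
        raise ValueError("drop_bits must be at least 1")
    # Preloading 2**(drop_bits - 1) biases the truncation toward rounding.
    return (value + preload_constant(drop_bits - 1)) >> drop_bits


print(round_half_up(11, 2))  # 11 / 4 = 2.75, rounds to 3
print(round_half_up(9, 2))   # 9 / 4 = 2.25, rounds to 2
```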
Double Accumulator
The double accumulator feature adds an additional register in the accumulator feedback path that processes interleaved complex data (I, Q). The double accumulator register follows the output register, which includes the clock, clock enable, and aclr. The additional accumulator register returns the result with a one-cycle delay. This feature enables you to have two accumulator channels with the same resource count.
The following figure shows the double accumulator implementation.
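The effect of the extra feedback register can be sketched in Python. The model assumes that the adder feedback is taken from the additional register, so each new sample is added to the sum from two cycles earlier. That detail is an assumption made for illustration, and the function name `double_accumulate` is hypothetical.

```python
def double_accumulate(stream):
    """Model a double accumulator fed with interleaved I, Q, I, Q, ... samples.

    Returns the value of the output register after each sample.
    """
    out_reg = 0    # accumulator output register
    extra_reg = 0  # additional register in the feedback path
    outputs = []
    for sample in stream:
        # Feedback comes from the extra register: the sum from two samples ago.
        new_sum = extra_reg + sample
        extra_reg = out_reg
        out_reg = new_sum
        outputs.append(out_reg)
    return outputs


# I samples: 1, 2, 3; Q samples: 10, 20, 30
print(double_accumulate([1, 10, 2, 20, 3, 30]))  # [1, 10, 3, 30, 6, 60]
```

The output alternates between the running I sum and the running Q sum, which is the two-channel behavior the section describes.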
Parameters
You can customize the Multiply Adder Intel^{®} FPGA IP core by specifying the parameters using the parameter editor in the Intel^{®} Quartus^{®} Prime software.
General Tab
Parameter  Value  Default Value  Description 

What is the number of multipliers? 
1–4 
1  Number of multipliers to be added together. Values are 1 up to 4. 
How wide should the A input buses be?  1–256  16  Specify the width of the dataa[] port. 
How wide should the B input buses be?  1–256  16  Specify the width of the datab[] port. 
How wide should the 'result' output bus be?  1–256  32  Specify the width of the result[] port. 
Create an associated clock enable for each clock 
On Off 
Off  Select this option to create clock enable for each clock. 
Extra Modes
Parameter  Value  Default Value  Description 

Outputs Configuration  
Register output of the adder unit 
On Off 
Off 
Turn on this option to enable output register of the adder module. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to enable and specify the clock source for output registers. You must select Register output of the adder unit to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the adder output register. You must select Register output of the adder unit to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the adder output register. You must select Register output of the adder unit to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
Adder Operation  
What operation should be performed on outputs of the first pair of multipliers? 
ADD, SUB, VARIABLE 
ADD 
Select addition or subtraction operation to perform for the outputs between the first and second multipliers.
When VARIABLE value is selected:
You must select more than two multipliers to enable this parameter. 
Register 'addnsub1' input 
On Off 
Off  Turn on this option to enable input register for addnsub1 port. You must select VARIABLE for What operation should be performed on outputs of the first pair of multipliers to enable this parameter. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to specify the input clock signal for addnsub1 register. You must select Register 'addnsub1' input to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the addnsub1 register. You must select Register 'addnsub1' input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the addnsub1 register. You must select Register 'addnsub1' input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What operation should be performed on outputs of the second pair of multipliers? 
ADD, SUB, VARIABLE 
ADD 
Select addition or subtraction operation to perform for the outputs between the third and fourth multipliers.
When VARIABLE value is selected:
You must select the value 4 for What is the number of multipliers? to enable this parameter. 
Register 'addnsub3' input 
On Off 
Off  Turn on this option to enable input register for addnsub3 signal. You must select VARIABLE for What operation should be performed on outputs of the second pair of multipliers to enable this parameter. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to specify the input clock signal for addnsub3 register. You must select Register 'addnsub3' input to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the addnsub3 register. You must select Register 'addnsub3' input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the addnsub3 register. You must select Register 'addnsub3' input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
Polarity  
Enable ‘use_subadd’ 
On Off 
Off 
Turn on this option to reverse the function of addnsub input port. When this option is turned on, do the following:

Multipliers Tab
Parameter  Value  Default Value  Description 

What is the representation format for Multipliers A inputs? 
SIGNED, UNSIGNED, VARIABLE 
UNSIGNED  Specify the representation format for the multiplier A input. 
Register ‘signa’ input 
On Off 
Off  Select this option to enable signa register. You must select VARIABLE value for What is the representation format for Multipliers A inputs? parameter to enable this option. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to enable and specify the input clock signal for signa register. You must select Register ‘signa’ input to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the signa register. You must select Register ‘signa’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the signa register. You must select Register ‘signa’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the representation format for Multipliers B inputs? 
SIGNED, UNSIGNED, VARIABLE 
UNSIGNED  Specify the representation format for the multiplier B input. 
Register ‘signb’ input 
On Off 
Off  Turn on this option to enable signb register. You must select VARIABLE value for What is the representation format for Multipliers B inputs? parameter to enable this option. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to enable and specify the input clock signal for signb register. You must select Register ‘signb’ input to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the signb register. You must select Register ‘signb’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the signb register. You must select Register ‘signb’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
Input Configuration  
Register input A of the multiplier 
On Off 
Off  Turn on this option to enable input register for dataa input bus. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to enable and specify the register input clock signal for dataa input bus. You must select Register input A of the multiplier to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the register asynchronous clear source for the dataa input bus. You must select Register input A of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the register synchronous clear source for the dataa input bus. You must select Register input A of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
Register input B of the multiplier 
On Off 
Off  Turn on this option to enable input register for datab input bus. 
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to enable and specify the register input clock signal for datab input bus. You must select Register input B of the multiplier to enable this parameter. 
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the register asynchronous clear source for the datab input bus. You must select Register input B of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the register synchronous clear source for the datab input bus. You must select Register input B of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both. 
What is the input A of the multiplier connected to? 
Multiplier input Scan chain input 
Multiplier input  Select the input source for input A of the multiplier. Select Multiplier input to use dataa input bus as the source to the multiplier. Select Scan chain input to use scanin input bus as the source to the multiplier and enable the scanout output bus. This parameter is available when you select 2, 3 or 4 for What is the number of multipliers? parameter. 
Scanout A Register Configuration  
Register output of the scan chain 
On Off 
Off  Turn on this option to enable the output register for the scanouta output bus. You must select Scan chain input for the What is the input A of the multiplier connected to? parameter to enable this option.
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to enable and specify the register input clock signal for the scanouta output bus. You must turn on the Register output of the scan chain parameter to enable this option.
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the register asynchronous clear source for the scanouta output bus. You must turn on the Register output of the scan chain parameter to enable this option. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the register synchronous clear source for the scanouta output bus. You must turn on the Register output of the scan chain parameter to enable this option. The IP core supports either asynchronous or synchronous clear but not both.
Preadder Tab
Parameter  Value  Default Value  Description 

Select preadder mode 
SIMPLE, COEF, INPUT, SQUARE, CONSTANT 
SIMPLE 
Specifies the operation mode for the preadder module. SIMPLE: This mode bypasses the preadder. This is the default mode. COEF: This mode uses the output of the preadder and the coefsel input bus as the inputs to the multiplier. INPUT: This mode uses the output of the preadder and the datac input bus as the inputs to the multiplier. SQUARE: This mode uses the output of the preadder as both inputs to the multiplier. CONSTANT: This mode uses the dataa input bus, with the preadder bypassed, and the coefsel input bus as the inputs to the multiplier.
Select preadder direction 
ADD, SUB 
ADD  Specifies the operation of the preadder. To enable this parameter, select the following for Select preadder mode:

How wide should the C input buses be?  1–256  16  Specifies the number of bits for the C input bus. You must select INPUT for Select preadder mode to enable this parameter.
Data C Input Register Configuration  
Register datac input 
On Off 
On  Turn on this option to enable the input register for the datac input bus. You must select INPUT for the Select preadder mode parameter to enable this option.
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to specify the input clock signal for the datac input register. You must select Register datac input to enable this parameter.
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the datac input register. You must select Register datac input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the datac input register. You must select Register datac input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
Coefficients  
How wide should the coef width be?  1–27  18  Specifies the number of bits for the coefsel input bus. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coef Register Configuration  
Register the coefsel input 
On Off 
On  Select this option to enable the input register for the coefsel input bus. You must select COEF or CONSTANT for preadder mode to enable this parameter.
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to specify the input clock signal for the coefsel input register. You must select Register the coefsel input to enable this parameter.
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the coefsel input register. You must select Register the coefsel input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input?
NONE SCLR0 SCLR1
NONE  Specifies the synchronous clear source for the coefsel input register. You must select Register the coefsel input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
Coefficient_0 Configuration  0x00000 – 0xFFFFFFF  0x00000000  Specifies the coefficient values for the first multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coefficient_1 Configuration  0x00000 – 0xFFFFFFF  0x00000000  Specifies the coefficient values for the second multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coefficient_2 Configuration  0x00000 – 0xFFFFFFF  0x00000000  Specifies the coefficient values for the third multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coefficient_3 Configuration  0x00000 – 0xFFFFFFF  0x00000000  Specifies the coefficient values for the fourth multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
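The preadder modes described in this tab can be illustrated with a small behavioral sketch in Python. This is not generated by the IP core and is not the IP's implementation. It assumes that the preadder combines dataa and datab (dataa + datab for ADD, dataa - datab for SUB), that SIMPLE mode passes dataa and datab straight to the multiplier, and that the hypothetical argument coef stands for the coefficient value delivered through the coefsel path. Bit widths and registers are ignored.

```python
def multiplier_operands(mode, dataa, datab, datac=0, coef=0, direction="ADD"):
    """Return the two operands that reach the multiplier for a preadder mode.

    Assumptions (not stated in the guide): the preadder operates on dataa
    and datab, and `coef` is the coefficient value selected via coefsel.
    """
    if direction not in ("ADD", "SUB"):
        raise ValueError("direction must be ADD or SUB")
    preadder = dataa + datab if direction == "ADD" else dataa - datab

    if mode == "SIMPLE":      # preadder bypassed
        return dataa, datab
    if mode == "COEF":        # preadder output and coefficient
        return preadder, coef
    if mode == "INPUT":       # preadder output and datac
        return preadder, datac
    if mode == "SQUARE":      # preadder output on both multiplier inputs
        return preadder, preadder
    if mode == "CONSTANT":    # dataa (preadder bypassed) and coefficient
        return dataa, coef
    raise ValueError("unknown preadder mode: " + str(mode))


def multiply(mode, dataa, datab, **kwargs):
    """Multiply the two operands selected for the given preadder mode."""
    x, y = multiplier_operands(mode, dataa, datab, **kwargs)
    return x * y
```

For example, under these assumptions `multiply("COEF", 3, 4, coef=5)` models (3 + 4) × 5.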
Accumulator Tab
Parameter  Value  Default Value  Description 

Enable accumulator? 
YES, NO 
NO  Select YES to enable the accumulator. You must select Register output of adder unit when using the accumulator feature.
What is the accumulator operation type? 
ADD, SUB 
ADD  Specifies the operation of the accumulator:
You must select YES for Enable accumulator? parameter to enable this option. 
Preload Constant  
Enable preload constant 
On Off 
Off  Enables the accum_sload or sload_accum signal and its register input to dynamically select the input to the accumulator. When accum_sload is low or sload_accum is high, the multiplier output is fed into the accumulator. When accum_sload is high or sload_accum is low, a user-specified preload constant is fed into the accumulator. You must select YES for the Enable accumulator? parameter to enable this option.
What is the input of accumulate port connected to? 
ACCUM_SLOAD, SLOAD_ACCUM 
ACCUM_SLOAD  Specifies the behavior of the accum_sload/sload_accum signal. ACCUM_SLOAD: Drive accum_sload low to load the multiplier output to the accumulator. SLOAD_ACCUM: Drive sload_accum high to load the multiplier output to the accumulator. You must select the Enable preload constant option to enable this parameter.
Select value for preload constant  0–64  64  Specifies the preset constant value. This value can be 2^{N}, where N is the preset constant value. N=64 represents a constant zero. You must select the Enable preload constant option to enable this parameter.
What is the source for clock input? 
Clock0 Clock1 Clock2 
Clock0  Select Clock0, Clock1, or Clock2 to specify the input clock signal for the accum_sload/sload_accum register. You must select the Enable preload constant option to enable this parameter.
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the accum_sload/sload_accum register. You must select the Enable preload constant option to enable this parameter.
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the accum_sload/sload_accum register. You must select the Enable preload constant option to enable this parameter.
Enable double accumulator 
TRUE FALSE 
FALSE  Select TRUE to enable the double accumulator feature. Select FALSE to disable it.
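The preload constant selection described in this tab can be sketched as follows. This Python model is only an illustration under stated assumptions: the function names are hypothetical, the model follows the ACCUM_SLOAD convention (accum_sload low selects the multiplier output, accum_sload high selects the preload constant), and it does not model the accumulator register or its bit width.

```python
def preload_constant(n):
    """Return the preload constant for setting N: 2**N, or zero when N is 64."""
    if not 0 <= n <= 64:
        raise ValueError("N must be in the range 0 to 64")
    return 0 if n == 64 else 1 << n


def accumulator_input(mult_out, accum_sload, n=64):
    """Select the value fed into the accumulator (ACCUM_SLOAD convention).

    accum_sload low  -> the multiplier output is fed into the accumulator.
    accum_sload high -> the preload constant is fed into the accumulator.
    """
    return preload_constant(n) if accum_sload else mult_out
```

With the SLOAD_ACCUM setting the polarity is inverted, so a model of that setting would pass `not sload_accum` as the second argument.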
Systolic/Chainout Tab
Parameter  Value  Default Value  Description 

Enable chainout adder 
YES, NO 
NO  Select YES to enable chainout adder module. 
What is the chainout adder operation type? 
ADD, SUB 
ADD  Specifies the chainout adder operation. For subtraction operation, SIGNED must be selected for What is the representation format for Multipliers A inputs? and What is the representation format for Multipliers B inputs? in the Multipliers Tab. 
Enable ‘negate’ input for chainout adder? 
PORT_USED, PORT_UNUSED 
PORT_UNUSED  Select PORT_USED to enable negate input signal. This parameter is invalid when chainout adder is disabled. 
Register ‘negate’ input? 
UNREGISTERED, CLOCK0, CLOCK1, CLOCK2, CLOCK3 
UNREGISTERED  Enables the input register for the negate input signal and specifies the input clock signal for the negate register. Select UNREGISTERED if the negate input register is not needed. This parameter is invalid when you select:

What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the negate register. This parameter is invalid when you select:

What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the negate register. This parameter is invalid when you select:

Systolic Delay  
Enable systolic delay registers 
On Off 
Off  Select this option to enable systolic mode. This parameter is available when you select 2 or 4 for the What is the number of multipliers? parameter. You must enable the Register output of the adder unit option to use the systolic delay registers.
What is the source for clock input? 
CLOCK0, CLOCK1, CLOCK2
CLOCK0  Specifies the input clock signal for the systolic delay register. You must turn on Enable systolic delay registers to enable this option.
What is the source for asynchronous clear input? 
NONE ACLR0 ACLR1 
NONE  Specifies the asynchronous clear source for the systolic delay register. You must turn on Enable systolic delay registers to enable this option.
What is the source for synchronous clear input? 
NONE SCLR0 SCLR1 
NONE  Specifies the synchronous clear source for the systolic delay register. You must turn on Enable systolic delay registers to enable this option.
Pipelining Tab
Parameter  IP Generated Parameter  Value  Default Value  Description 

Pipelining Configuration  
Do you want to add pipeline register to the input?  gui_pipelining 
No, Yes 
No  Select Yes to enable an additional level of pipeline register to the input signals. You must specify a value greater than 0 for the Please specify the number of latency clock cycles parameter.
Please specify the number of latency clock cycles  latency  Any value greater than 0  0  Specifies the desired latency in clock cycles. Each level of pipeline register adds one clock cycle of latency. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
What is the source for clock input?  gui_input_latency_clock 
CLOCK0, CLOCK1, CLOCK2 
CLOCK0  Select Clock0, Clock1, or Clock2 to enable and specify the pipeline register input clock signal. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
What is the source for asynchronous clear input?  gui_input_latency_aclr 
NONE ACLR0 ACLR1 
NONE  Specifies the register asynchronous clear source for the additional pipeline register. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
What is the source for synchronous clear input?  gui_input_latency_sclr 
NONE SCLR0 SCLR1 
NONE  Specifies the register synchronous clear source for the additional pipeline register. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
Signals
The following tables list the input and output signals of the Multiply Adder Intel^{®} FPGA IP core.
Signal  Required  Description 

dataa_0[]/dataa_1[]/dataa_2[]/dataa_3[]  Yes  Data input to the multiplier. Input signal [NUMBER_OF_MULTIPLIERS * WIDTH_A - 1 … 0] wide.
datab_0[]/datab_1[]/datab_2[]/datab_3[]  Yes  Data input to the multiplier. Input signal [NUMBER_OF_MULTIPLIERS * WIDTH_B - 1 … 0] wide.
datac_0[]/datac_1[]/datac_2[]/datac_3[]  No  Data input to the multiplier. Input signal [NUMBER_OF_MULTIPLIERS * WIDTH_C - 1 … 0] wide. Select INPUT for the Select preadder mode parameter to enable these signals.
clock[1:0]  No  Clock input port to the corresponding register. This signal can be used by any register in the IP core. 
aclr[1:0]  No  Asynchronous clear input to the corresponding register. 
sclr[1:0]  No  Synchronous clear input to the corresponding register. 
ena[1:0]  No  Enable signal input to the corresponding register. 
signa  No  Specifies the numerical representation of the multiplier input A. If the signa signal is high, the multiplier treats the multiplier input A signal as a signed number. If the signa signal is low, the multiplier treats the multiplier input A signal as an unsigned number. Select VARIABLE for the What is the representation format for Multipliers A inputs? parameter to enable this signal.
signb  No  Specifies the numerical representation of the multiplier input B signal. If the signb signal is high, the multiplier treats the multiplier input B signal as a signed two's complement number. If the signb signal is low, the multiplier treats the multiplier input B signal as an unsigned number. 
scanina[]  No  Input for scan chain A. Input signal [WIDTH_A - 1 … 0] wide. When the INPUT_SOURCE_A parameter has a value of SCANA, the scanina[] signal is required.
accum_sload  No  Dynamically specifies whether the accumulator value is constant. If the accum_sload signal is low, then the multiplier output is loaded into the accumulator. Do not use accum_sload and sload_accum simultaneously. 
sload_accum  No  Dynamically specifies whether the accumulator value is constant. If the sload_accum signal is high, then the multiplier output is loaded into the accumulator. Do not use accum_sload and sload_accum simultaneously. 
chainin[]  No  Adder result input bus from the preceding stage. Input signal [WIDTH_CHAININ - 1 … 0] wide.
addnsub1  No  Performs addition or subtraction on the outputs from the first pair of multipliers. Drive the addnsub1 signal to 1 to add the outputs from the first pair of multipliers. Drive the addnsub1 signal to 0 to subtract the outputs from the first pair of multipliers.
addnsub3  No  Performs addition or subtraction on the outputs from the second pair of multipliers. Drive the addnsub3 signal to 1 to add the outputs from the second pair of multipliers. Drive the addnsub3 signal to 0 to subtract the outputs from the second pair of multipliers.
coefsel0[]  No  Coefficient input signal [0:3] to the first multiplier.
coefsel1[]  No  Coefficient input signal [0:3] to the second multiplier.
coefsel2[]  No  Coefficient input signal [0:3] to the third multiplier.
coefsel3[]  No  Coefficient input signal [0:3] to the fourth multiplier.
Signal  Required  Description 

result[]  Yes  Multiplier output signal. Output signal [WIDTH_RESULT - 1 … 0] wide.
scanouta[]  No  Output of scan chain A. Output signal [WIDTH_A - 1 … 0] wide. Select more than 2 for the number of multipliers and choose Scan chain input for the What is the input A of the multiplier connected to? parameter to enable this signal.
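The effect of the signa and signb signals on how an operand is read can be sketched in Python. The helper name interpret_operand is hypothetical, and the sketch assumes two's complement encoding for signed operands (the guide states this explicitly only for signb).

```python
def interpret_operand(bits, width, sign):
    """Interpret a raw `width`-bit operand as the multiplier would.

    sign high (True)  -> signed, assumed two's complement.
    sign low  (False) -> unsigned.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    bits &= (1 << width) - 1          # keep only the low `width` bits
    if sign and (bits >> (width - 1)) & 1:
        return bits - (1 << width)    # MSB set: negative value
    return bits
```

For example, the 4-bit pattern 1111 reads as -1 when the sign signal is high and as 15 when it is low.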
ALTMULT_COMPLEX Intel FPGA IP Core Reference
You can use the ALTMULT_COMPLEX Intel^{®} FPGA IP core to implement the complex multiplier by instantiating two multipliers.
Features
The ALTMULT_COMPLEX Intel^{®} FPGA IP core offers the following features:
 Generates a multiplier to perform multiplication operations of two complex numbers. Note: When building multipliers larger than the natively supported size, there may be a performance impact resulting from the partial product calculations.
 Supports data width of 1–256 bits
 Supports signed and unsigned data representation format
 Supports pipelining with configurable output latency
 Supports optional asynchronous and synchronous clear and clock enable input ports
Complex Multiplication
Complex numbers are numbers in the form of the following equation:
a + ib
Where:
 a and b are real numbers
 i is the imaginary unit, which equals the square root of -1.
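For two complex numbers a + ib and c + id, the product is (ac - bd) + i(ad + bc), because i squared equals -1. The following Python sketch illustrates this arithmetic only. The argument names are chosen to resemble the IP core signal names for readability, and the sketch does not model bit widths, pipelining, or how the IP core maps the operation onto DSP blocks.

```python
def complex_multiply(dataa_real, dataa_imag, datab_real, datab_imag):
    """Multiply (dataa_real + i*dataa_imag) by (datab_real + i*datab_imag).

    Returns the pair (result_real, result_imag).
    """
    result_real = dataa_real * datab_real - dataa_imag * datab_imag
    result_imag = dataa_real * datab_imag + dataa_imag * datab_real
    return result_real, result_imag
```

For example, (1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8i² = -5 + 10i, so `complex_multiply(1, 2, 3, 4)` returns `(-5, 10)`.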
Parameters
Parameter  Value  Default Value  Description 

General  
How wide should the A input buses be?  1–256  18  Specifies the number of bits for dataa_imag and dataa_real input buses. 
How wide should the B input buses be?  1–256  18  Specifies the number of bits for datab_imag and datab_real input buses. 
How wide should the ‘result’ output bus be?  1–256  36  Specifies the number of bits for ‘result’ output bus. 
Input Representation  
What is the representation format for A inputs? 
Signed, Unsigned 
Signed  Specifies the representation format for A inputs. Only Signed representation format is supported in Intel^{®} Stratix^{®} 10 devices. 
What is the representation format for B inputs? 
Signed, Unsigned 
Signed  Specifies the representation format for B inputs. Only Signed representation format is supported in Intel^{®} Stratix^{®} 10 devices. 
Implementation Style  
Which implementation style should be used? 
Automatically select a style for best tradeoff for the current settings Canonical. (Minimize the number of simple multipliers) Conventional. (Minimize the use of logic cells) 
Automatically select a style for best tradeoff for the current settings  Intel^{®} Stratix^{®} 10 device supports only Automatically select a style for best tradeoff for the current settings style. Intel^{®} Quartus^{®} Prime software will determine the best implementation based on the selected device family and input width. 
Pipelining  
Output latency  0–11  4  Specifies the number of clock cycles for output latency.
Create a Clear input? 
NONE ACLR SCLR 
NONE  Select this option to create aclr or sclr signal for the complex multiplier. 
Create a Clock Enable input? 
On Off 
Off  Select this option to create ena signal for the complex multiplier clock. 
Signals
Signal  Required  Description 

aclr  No  Asynchronous clear for the complex multiplier. When the aclr signal is asserted high, the function is asynchronously cleared. 
sclr  No  Synchronous clear for the complex multiplier. When the sclr signal is asserted high, the function is synchronously cleared.
clock  Yes  Clock input to the ALTMULT_COMPLEX function. 
dataa_imag[]  Yes  Imaginary input value for the data A signal of the complex multiplier. The size of the input signal depends on the WIDTH_A parameter value. 
dataa_real[]  Yes  Real input value for the data A signal of the complex multiplier. The size of the input signal depends on the WIDTH_A parameter value. 
datab_imag[]  Yes  Imaginary input value for the data B signal of the complex multiplier. The size of the input signal depends on the WIDTH_B parameter value. 
datab_real[]  Yes  Real input value for the data B signal of the complex multiplier. The size of the input signal depends on the WIDTH_B parameter value. 
ena  No  Active high clock enable for the clock signal of the complex multiplier. 
complex  No  Optional input to enable dynamic switching between 36 × 36 normal mode and 18 × 18 complex mode. This input is only available in Stratix V devices. In the GUI, this parameter is referred to as Dynamic Complex Mode.
Signal  Required  Description 

result_imag  Yes  Imaginary output value of the multiplier. The size of the output signal depends on the WIDTH_RESULT parameter value. 
result_real  Yes  Real output value of the multiplier. The size of the output signal depends on the WIDTH_RESULT parameter value. 
LPM_MULT Intel FPGA IP Core References
The LPM_MULT Intel^{®} FPGA IP core implements a multiplier to multiply two input data values to produce a product as an output.
Features
 Generates a multiplier that multiplies two input data values
 Supports data width of 1–256 bits
 Supports signed and unsigned data representation format
 Supports area or speed optimization
 Supports pipelining with configurable output latency
 Provides an option for implementation in dedicated digital signal processing (DSP) block circuitry or logic elements (LEs). Note: When building multipliers larger than the natively supported size, there may be a performance impact resulting from the cascading of the DSP blocks.
 Supports optional asynchronous and synchronous clear and clock enable input ports
Parameters
You can customize the Intel^{®} Stratix^{®} 10 LPM_MULT Intel^{®} FPGA IP core by specifying the parameters using the IP Parameter Editor in the Intel^{®} Quartus^{®} Prime software.
General Tab
Parameter  Value  Default Value  Description 

Multiplier Configuration  
Type 
Multiply 'dataa' input by 'datab' input Multiply 'dataa' input by itself (squaring operation) 
Multiply 'dataa' input by 'datab' input  Select the desired configuration for the multiplier. 
Data Port Widths  
Dataa width  1–256 bits  8 bits  Specify the width of the dataa[] port.
Datab width  1–256 bits  8 bits  Specify the width of the datab[] port.
How should the width of the 'result' output be determined?  
Type 
Automatically calculate the width Restrict the width 
Automatically calculate the width  Select the desired method to determine the width of the result[] port. 
Value  1–512 bits  16 bits  Specify the width of the result[] port. This value is effective only if you select Restrict the width in the Type parameter.
Result width  1–512 bits  —  Displays the effective width of the result[] port.
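As an illustration of the Automatically calculate the width option, a full-precision product of two operands needs as many bits as the two operand widths combined, which is consistent with the defaults in this table (8 + 8 = 16) and with the 512-bit maximum (256 + 256). The Python sketch below assumes that this is how the automatic width is derived. The function name is hypothetical, and the Restrict the width option is not modeled.

```python
def auto_result_width(dataa_width, datab_width=None):
    """Width of a full-precision product, assumed equal to the automatic width.

    Pass only dataa_width to model the squaring configuration, where
    dataa is multiplied by itself.
    """
    if datab_width is None:
        datab_width = dataa_width
    for width in (dataa_width, datab_width):
        if not 1 <= width <= 256:
            raise ValueError("port widths must be in the range 1 to 256")
    return dataa_width + datab_width
```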
General 2 Tab
Parameter  Value  Default Value  Description 

Datab Input  
Does the 'datab' input bus have a constant value? 
No Yes 
No  Select Yes to specify the constant value of the ‘datab’ input bus, if any. 
Value  Any value greater than 0  0  Specify the constant value of datab[] port. 
Multiplication Type  
Which type of multiplication do you want? 
Unsigned Signed 
Unsigned  Specify the representation format for both dataa[] and datab[] inputs. 
Implementation Style  
Which multiplier implementation should be used? 
Use the default implementation Use the dedicated multiplier circuitry (Not available for all families) Use logic elements 
Use the default implementation 
Select the desired implementation style for the multiplier. When SCLR is selected for the Clear Signal Type parameter, only the Use the dedicated multiplier circuitry (Not available for all families) option is available.
Pipelining Tab
Parameter  Value  Default Value  Description 

Do you want to pipeline the function?  
Pipeline 
No Yes 
No  Select Yes to enable pipeline register to the multiplier's output. Enabling the pipeline register adds extra latency to the output. 
Latency  Any value greater than 0.  1  Specify the desired output latency in clock cycles.
Clear Signal Type 
NONE ACLR SCLR 
NONE  Specify the type of reset for the pipeline register. Select NONE if you do not use any pipeline register. Select ACLR to use asynchronous clear for the pipeline register. This generates ACLR port. Select SCLR to use synchronous clear for the pipeline register. This generates SCLR port. 
Create a 'clken' clock enable clock 
Off On 
Off 
Specifies an active-high clock enable for the clock port of the pipeline register.
What type of optimization do you want?  
Type 
Default Speed Area 
Default  Specify the desired optimization for the IP core. Select Default to let the Intel^{®} Quartus^{®} Prime software determine the best optimization for the IP core.
Signals
Signal Name  Required  Description 

dataa[]  Yes  Data input. The size of the input signal depends on the Dataa width parameter value.
datab[]  Yes  Data input. The size of the input signal depends on the Datab width parameter value.
clock  No  Clock input for pipelined usage. For Latency values other than 1 (default), the clock signal must be enabled.
clken  No  Clock enable for pipelined usage. When the clken signal is asserted high, the multiplication operation takes place. When the signal is low, no operation occurs. If omitted, the default value is 1.
aclr  No  Asynchronous clear signal used at any time to reset the pipeline to all 0s, asynchronously to the clock signal. The pipeline initializes to an undefined (X) logic level. The outputs are a consistent, but nonzero value. 
sclr  No  Synchronous clear signal used at any time to reset the pipeline to all 0s, synchronously to the clock signal. The pipeline initializes to an undefined (X) logic level. The outputs are a consistent, but nonzero value. 
Signal Name  Required  Description

result[]  Yes  Data output. The size of the output signal depends on the Result width parameter.
Native Floating Point DSP Intel Stratix 10 FPGA IP References
The Native Floating Point DSP Intel^{®} Stratix^{®} 10 FPGA IP instantiates and controls a single Intel^{®} Stratix^{®} 10 Variable Precision DSP block.
Native Floating Point DSP Intel Stratix 10 FPGA IP Core Supported Operational Modes
Operational Modes  Description  Supported Exception Flags 

Multiply mode 
This mode performs a single-precision multiplication operation. This mode applies the following equation:


Add mode  This mode performs a single-precision addition or subtraction operation. This mode applies the following equations:


Multiply Add mode 
This mode performs single-precision multiplication, followed by addition or subtraction operations. This mode applies the following equations:


Multiply Accumulate mode 
This mode performs floating-point multiplication followed by floating-point addition or subtraction with the previous multiplication result. This mode applies the following equations:


Vector Mode 1 
This mode performs floating-point multiplication followed by floating-point addition or subtraction with the chainin input from the previous variable-precision DSP block. This mode applies the following equations:


Vector Mode 2  This mode performs floating-point multiplication where the multiplication result is directly fed to chainout. The chainin input from the previous variable-precision DSP block is then added to or subtracted from input Ax as the output result. This mode applies the following equations:

Parameterizing the Native Floating Point DSP Intel Stratix 10 FPGA IP
 In Intel^{®} Quartus^{®} Prime Pro Edition, create a new project that targets an Intel^{®} Stratix^{®} 10 device.
 In IP Catalog, click Library > DSP > Primitive DSP > Native Floating Point DSP Intel^{®} Stratix^{®} 10 FPGA IP. The Native Floating Point DSP Intel^{®} Stratix^{®} 10 FPGA IP Core IP parameter editor opens.
 In the New IP Variation dialog box, enter an Entity Name and click OK.
 Under Parameters, select the DSP Template and the View you want for your IP core.
 In the DSP Block View, switch the clock or reset of each valid register.
 For Multiply Add or Vector Mode 1, click the Chain In multiplexer in the GUI to select input from chainin port or Ax port.
 Click the Adder symbol in the GUI to select addition or subtraction.
 Click the Chain Out multiplexer in the GUI to enable chainout port.
 Click Generate HDL.
 Click Finish.
Native Floating Point DSP Intel Stratix 10 FPGA IP Parameters
Parameter  Value  Default Value  Description 

DSP Template 
Multiply, Add, Multiply Add, Multiply Accumulate, Vector Mode 1, Vector Mode 2
Multiply 
Select the desired operational mode for the DSP block. The selected operation is reflected in the DSP Block View. 
View 
Register Enables Register Clears 
Register Enables 
Options to select clocking scheme or reset scheme for registers view. The selected operation is reflected in the DSP Block View. Select Register Enables for DSP Block View to show registers clocking scheme. You can change the clocks for each of the registers in this view. Select Register Clears for DSP Block View to show registers reset scheme. Turn on Use Single Clear to change the registers reset scheme. 
Clear Type 
None Synchronous Asynchronous 
Synchronous 
Options to select the reset type for all registers. Select None to not reset the registers. Select Synchronous to use the synchronous clear signal type for all registers. Select Asynchronous to use the asynchronous clear signal type for all registers.
Single Clear  On or off  Off 
Turn on this parameter if you want a single reset to reset all the registers in the DSP block. Turn off this parameter to use different reset ports to reset the registers. This parameter is disabled when you select None for Clear Type.
Connect Exception Flags 
On Off 
Off 
Turn on this parameter to use and generate exception flag output ports for the DSP block. When you turn off this parameter, the IP core does not generate exception flag output ports.
DSP Block View  
Chain In Multiplexer (1) 
Enable Disable 
Disable 
Click the multiplexer to enable chainin port. 
Chain Out Multiplexer (2) 
Disable Enable 
Disable 
Click the multiplexer to enable chainout port. 
Adder (3) 
+, -
+ 
Click the Adder symbol to select addition or subtraction mode. 
Register Clock (4) 
None Clock 0 Clock 1 Clock 2 
Clock 0 
To bypass any register, switch the register clock to None. Switch the register clock to:
You can only change these settings when you select Register Enables in View parameter. 
Register Clear (4) 
Clear 0 Clear 1 
Clear 0 for input registers Clear 1 for output and pipeline registers 
This view shows the IP core reset scheme. Clear 0 uses clr[0] signal. Clear 1 uses clr[1] signal. All input registers use clr[0] reset signal. All output and pipeline registers use clr[1] reset signal. 
Native Floating Point DSP Intel Stratix 10 FPGA IP Core Signals
The figure shows the input and output signals of the Native Floating Point DSP Intel^{®} Stratix^{®} 10 FPGA IP core.
Signal Name  Type  Width  Default  Description 

ax[31:0]  Input  32  Low  Input data bus to the multiplier. Available in:

ay[31:0]  Input  32  Low  Input data bus to the multiplier. Available in all floating-point operational modes.
az[31:0]  Input  32  Low  Input data bus to the multiplier. Available in:

chainin[31:0]  Input  32  Low  Connect these signals to the chainout signals from the preceding floating-point DSP IP core.
clk[2:0]  Input  3  Low  Input clock signals for all registers. These clock signals are only available if any of the input registers, pipeline registers, or output register is set to Clock0 or Clock1 or Clock2. 
ena[2:0]  Input  3  High  Clock enable for clk[2:0]. These signals are active-high.

clr[1:0]  Input  2  Low  These signals are active-high. Use clr[0] for all input registers and use clr[1] for all pipeline and output registers.
accumulate  Input  1  Low  Input signal to enable or disable the accumulator feature. You can assert or deassert this signal during run time. Available in Multiply Accumulate mode.
chainout[31:0]  Output  32  —  Connect these signals to the chainin signals of the next floating-point DSP IP core.
result[31:0]  Output  32  —  Output data bus from IP core. 
mult_overflow  Output  1  — 
This signal indicates if the multiplier result is larger than the maximum representable value. 1: If the multiplier result is larger than the maximum representable value and the result is cast to infinity. 0: If the multiplier result is not larger than the maximum representable value. Not available in Add mode. 
mult_underflow  Output  1  — 
This signal indicates if the multiplier result is smaller than the minimum representable value. 1: If the multiplier result is smaller than the minimum representable value and the result is flushed to zero. 0: If the multiplier result is larger than the minimum representable value. Not available in Add mode. 
mult_inexact  Output  1  — 
This signal indicates whether the multiplier result is an exact representation. 1: If the multiplier result is:
0: If the multiplier result does not meet any of the criteria above. Not available in Add mode. 
mult_invalid  Output  1  — 
This signal indicates if the multiplier operation is ill-defined and produces an invalid result. 1: If the multiplier result is invalid and cast to qNaN. 0: If the multiplier result is not an invalid number. Not available in Add mode. 
adder_overflow  Output  1  — 
This signal indicates if the adder result is larger than the maximum representable value. 1: If the adder result is larger than the maximum representable value and the result is cast to infinity. 0: If the adder result is not larger than the maximum representable value. Not available in Multiply mode. 
adder_underflow  Output  1  — 
This signal indicates if the adder result is smaller than the minimum representable value. 1: If the adder result is smaller than the minimum representable value and the result is flushed to zero. 0: If the adder result is larger than the minimum representable value. Not available in Multiply mode. 
adder_inexact  Output  1  — 
This signal indicates whether the adder result is an exact representation. 1: If the adder result is:
0: If the adder result does not meet any of the criteria above. Not available in Multiply mode. 
adder_invalid  Output  1  — 
This signal indicates if the adder operation is ill-defined and produces an invalid result. 1: If the adder result is invalid and cast to qNaN. 0: If the adder result is not an invalid number. Not available in Multiply mode. 
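The overflow and underflow flag behavior described in the table can be approximated with the following Python sketch. It is a rough behavioral illustration, not the DSP block's implementation. It assumes that both inputs are finite values exactly representable in single precision, that the "minimum representable value" is the smallest normal single-precision number (so smaller nonzero results are flushed to zero), and it ignores rounding, the inexact and invalid flags, and NaN or infinity inputs. The function and constant names are hypothetical.

```python
import math

# Largest finite and smallest normal single-precision magnitudes.
FLOAT32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127
FLOAT32_MIN_NORMAL = 2.0 ** -126


def multiply_with_flags(ay, az):
    """Return (result, mult_overflow, mult_underflow) for ay * az.

    Assumes finite inputs that are exact single-precision values, so the
    double-precision product below is exact.
    """
    product = ay * az
    magnitude = abs(product)
    overflow = magnitude > FLOAT32_MAX
    underflow = 0.0 < magnitude < FLOAT32_MIN_NORMAL
    if overflow:
        result = math.copysign(math.inf, product)   # cast to infinity
    elif underflow:
        result = math.copysign(0.0, product)        # flushed to zero
    else:
        result = product
    return result, overflow, underflow
```

Under these assumptions, multiplying 2^100 by 2^100 sets the overflow flag and returns infinity, while multiplying 2^-100 by 2^-100 sets the underflow flag and returns zero.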
Intel Stratix 10 Variable Precision DSP Blocks User Guide Document Archives
IP Core Version  User Guide 

17.1  Intel Stratix 10 Variable Precision DSP Blocks User Guide 
Document Revision History for Intel Stratix 10 Variable Precision DSP Blocks User Guide
Document Version  Intel^{®} Quartus^{®} Prime Version  Changes 

2018.09.24  18.1 

2018.05.07  18.0 

Date  Version  Changes 

November 2017  2017.11.06 

May 2017  2017.05.08  Updated the behavior description of the sload_accum and accum_sload signals in the ALTERA_MULT_ADD Input Signals table. 
October 2016  2016.10.31  Initial release. 