10.3.4. CSR Map and Descriptor Queue
The CSR interface uses a 32-bit data path, and all accesses are aligned to 32 bits; however, addresses are byte addresses. The CSR address space is 2048 bytes (11-bit byte address). The following table lists the regions within the CSR address space.
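| Feature | Base Address |
|---|---|
| Discovery ROM | 0x000 |
| Interrupt Control | 0x200 |
| DMA Descriptor Queue | 0x210 |
| DMA Control Registers | 0x220 |
| Performance Registers | 0x240 |
| Debug Network Registers | 0x250 |
| DMA License Register | 0x260 |
| DMA Transaction Counters | 0x264 |
| Model Update Registers | 0x300 |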
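As a minimal sketch of how host software might address these regions, the C fragment below assumes the CSR space is memory-mapped and reached through a `volatile uint32_t *` that points at byte address 0x000. The macro names and the `csr_addr`/`csr_read32` helpers are hypothetical, not part of the FPGA AI Suite runtime, and the region names follow the subsection headings of this section. Because the data path is 32 bits wide but addresses are byte addresses, the register at byte address A is word index A / 4.

```c
#include <assert.h>
#include <stdint.h>

/* Region base byte addresses from the CSR map (hypothetical macro names). */
#define CSR_SPACE_BYTES         0x800u  /* 2048 bytes, 11-bit byte address */
#define CSR_BASE_DISCOVERY_ROM  0x000u
#define CSR_BASE_INTERRUPT      0x200u
#define CSR_BASE_DESC_QUEUE     0x210u
#define CSR_BASE_DMA_CONTROL    0x220u
#define CSR_BASE_PERF           0x240u
#define CSR_BASE_DEBUG_NETWORK  0x250u
#define CSR_BASE_LICENSE        0x260u
#define CSR_BASE_TXN_COUNTERS   0x264u
#define CSR_BASE_MODEL_UPDATE   0x300u

/* Region base + register offset -> byte address. Every access is a 32-bit
 * word, so the result must be 4-byte aligned and inside the CSR space. */
static uint32_t csr_addr(uint32_t region_base, uint32_t reg_offset)
{
    uint32_t addr = region_base + reg_offset;
    assert((addr & 0x3u) == 0 && addr < CSR_SPACE_BYTES);
    return addr;
}

/* Read one 32-bit CSR. The pointer indexes 32-bit words, so the byte
 * address is divided by 4. */
static uint32_t csr_read32(volatile const uint32_t *csr, uint32_t byte_addr)
{
    return csr[byte_addr / 4u];
}
```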
Register and Bit Attribute Definitions
The following notation describes the CSR registers.
| Attribute | Expansion | Description |
|---|---|---|
| RW | Read/Write | Software can read or write this bit. |
| RO | Read Only | Only hardware sets this bit. Software can only read it; writes have no effect. |
| RW1C | Read/Write 1 to Clear | Software can read or clear this bit. To clear it, software must write 1; writing 0 to an RW1C bit has no effect. An RW1C field can be multiple bits wide; in that case, writing 1 to any bit of the field clears all bits in the field. |
| RsvdZ | Reserved and Zero | Reserved for future RW1C implementations. When you write to a register that contains RsvdZ bits, write only zeros to these bits. |
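The RW1C rules are easy to misapply in a read-modify-write sequence. The following C sketch models, in software only, how a single register write affects RW, RW1C, and multibit RW1C bits. It illustrates the attribute definitions under the assumption that each bit has exactly one attribute; it is not a description of the hardware, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one write to a register mixing RW and RW1C bits.
 * rw_mask:   bits with RW semantics (take the written value).
 * rw1c_mask: bits with RW1C semantics (1 clears, 0 has no effect).
 * Bits outside both masks (RO, RsvdZ) are unchanged in this model. */
static uint32_t apply_write(uint32_t current, uint32_t written,
                            uint32_t rw_mask, uint32_t rw1c_mask)
{
    assert((rw_mask & rw1c_mask) == 0);  /* each bit has one attribute */
    uint32_t next = (current & ~rw_mask) | (written & rw_mask);
    next &= ~(written & rw1c_mask);      /* RW1C bits written as 1 clear */
    return next;
}

/* Multibit RW1C field: writing 1 to any bit of the field clears the
 * whole field; writing all zeros to the field leaves it unchanged. */
static uint32_t apply_rw1c_field(uint32_t current, uint32_t written,
                                 uint32_t field_mask)
{
    return (written & field_mask) != 0 ? (current & ~field_mask) : current;
}
```

One practical consequence: to clear a single RW1C bit, write a value with only that bit set. Writing back the value you just read would clear every RW1C bit that happened to be set.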
Discovery ROM
The discovery ROM stores metadata. The metadata includes a hash for the architecture that the IP corresponds to and the FPGA AI Suite version that was used to create the IP.
The host runtime can use this information to determine whether an incoming inference job can run on the IP instances. For example, if the architecture hash stored in the ROM does not match the architecture that the job was compiled for, inference is not possible.
The layout of the discovery ROM is as follows:
| Base Byte Address | Length (in bytes) | Feature |
|---|---|---|
| 0x000 | 16 | Hash of the Architecture Description File (.arch) |
| 0x010 | 32 | Human-readable FPGA AI Suite version string |
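A host-side compatibility check could read the ROM using 32-bit accesses only, as the CSR interface requires. The C sketch below assumes the ROM bytes are packed little-endian within each 32-bit word (least significant byte at the lowest byte address) and that the expected 16-byte hash is supplied by the caller from the compiled graph. Neither assumption is stated in this section, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARCH_HASH_ADDR   0x000u  /* Discovery ROM base 0x000 + 0x000 */
#define ARCH_HASH_BYTES  16u
#define VERSION_ADDR     0x010u  /* Discovery ROM base 0x000 + 0x010 */
#define VERSION_BYTES    32u

/* Copy len bytes (a multiple of 4) out of the CSR space with 32-bit reads.
 * ASSUMPTION: little-endian byte order within each word. */
static void csr_read_bytes(volatile const uint32_t *csr, uint32_t byte_addr,
                           uint8_t *dst, uint32_t len)
{
    assert(byte_addr % 4u == 0 && len % 4u == 0);
    for (uint32_t i = 0; i < len; i += 4u) {
        uint32_t w = csr[(byte_addr + i) / 4u];
        dst[i + 0] = (uint8_t)(w);
        dst[i + 1] = (uint8_t)(w >> 8);
        dst[i + 2] = (uint8_t)(w >> 16);
        dst[i + 3] = (uint8_t)(w >> 24);
    }
}

/* Returns 1 if the ROM hash equals the hash the job expects, else 0. */
static int arch_hash_matches(volatile const uint32_t *csr,
                             const uint8_t expected[ARCH_HASH_BYTES])
{
    uint8_t rom_hash[ARCH_HASH_BYTES];
    csr_read_bytes(csr, ARCH_HASH_ADDR, rom_hash, ARCH_HASH_BYTES);
    return memcmp(rom_hash, expected, ARCH_HASH_BYTES) == 0;
}

/* out must hold VERSION_BYTES + 1 bytes; the extra byte guarantees a
 * NUL terminator even if the 32-byte ROM field has none. */
static void read_version_string(volatile const uint32_t *csr,
                                char out[VERSION_BYTES + 1])
{
    csr_read_bytes(csr, VERSION_ADDR, (uint8_t *)out, VERSION_BYTES);
    out[VERSION_BYTES] = '\0';
}
```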
Interrupt Control
The interrupt control feature registers are as follows:
| Register | Offset | Attribute | Description |
|---|---|---|---|
| ICR | 0x000 | RW1C | DMA interrupt control register |
| IMR | 0x004 | RW | DMA interrupt mask register |
The DMA can optionally generate level-sensitive interrupt signals in response to various events.
Whenever such an event occurs, the hardware sets the corresponding bit in the ICR register.
An interrupt is generated on a 0-to-1 transition of an ICR bit only if the corresponding IMR bit is set to 1. A 0-to-1 transition of an IMR bit also generates an interrupt if the corresponding ICR bit is already set to 1.
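One way to read this rule is that, for each event bit, an interrupt is generated whenever the logical AND of its ICR bit and its IMR bit goes from 0 to 1. The C model below encodes that reading, using the bit positions from the ICR and IMR field tables in this section (Error at bit 0, Inference_complete at bit 1). It is a software model for reasoning about the rule, not the RTL, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_ERROR         (1u << 0)  /* Error / Error_mask */
#define IRQ_INF_COMPLETE  (1u << 1)  /* Inference_complete / ..._mask */
#define IRQ_DEFINED_BITS  (IRQ_ERROR | IRQ_INF_COMPLETE)

/* Event bits that are both pending (ICR) and enabled (IMR). */
static uint32_t irq_active_bits(uint32_t icr, uint32_t imr)
{
    return icr & imr & IRQ_DEFINED_BITS;
}

/* Nonzero if moving from (old_icr, old_imr) to (new_icr, new_imr)
 * generates an interrupt: some bit's ICR AND IMR rises from 0 to 1.
 * This covers both cases in the text: ICR rises while IMR is set, or
 * IMR rises while ICR is set. */
static int irq_generated(uint32_t old_icr, uint32_t old_imr,
                         uint32_t new_icr, uint32_t new_imr)
{
    assert((new_imr & ~IRQ_DEFINED_BITS) == 0);  /* RsvdZ: software writes 0 */
    uint32_t rising = irq_active_bits(new_icr, new_imr)
                      & ~irq_active_bits(old_icr, old_imr);
    return rising != 0;
}
```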
The ICR fields are as follows:
| Field | Bit | Description |
|---|---|---|
| Reserved | 31:2 | RsvdZ (Reserved; software must write 0) |
| Inference_complete | 1 | Indicates that an inference request has completed |
| Error | 0 | Indicates that an error condition has been triggered |
The IMR fields are as follows:
| Field | Bit | Description |
|---|---|---|
| Reserved | 31:2 | RsvdZ (Reserved; software must write 0) |
| Inference_complete_mask | 1 | Set to 1 to enable interrupt generation on inference completion |
| Error_mask | 0 | Set to 1 to enable interrupt generation on an error condition |
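A minimal interrupt service routine can be sketched in C, assuming that ICR is at byte address 0x200 and IMR at 0x204 (the Interrupt Control region base plus the offsets above) and that the CSR space is memory-mapped through a `volatile uint32_t *`. Because ICR is RW1C, the handler clears exactly the bits it is about to service by writing them back as ones. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_ADDR          0x200u  /* Interrupt Control base 0x200 + 0x000 */
#define IMR_ADDR          0x204u  /* Interrupt Control base 0x200 + 0x004 */
#define IRQ_ERROR         (1u << 0)
#define IRQ_INF_COMPLETE  (1u << 1)

struct irq_events {
    int error;               /* Error was pending and unmasked */
    int inference_complete;  /* Inference_complete was pending and unmasked */
};

/* Unmask both sources. IMR bits 31:2 are RsvdZ, so they are written as 0. */
static void irq_enable_all(volatile uint32_t *csr)
{
    assert(csr);
    csr[IMR_ADDR / 4u] = IRQ_ERROR | IRQ_INF_COMPLETE;
}

/* Find pending, unmasked events and clear exactly those ICR bits by
 * writing 1s to them (RW1C: bits written as 0 are left untouched). */
static struct irq_events irq_service(volatile uint32_t *csr)
{
    assert(csr);
    uint32_t pending = csr[ICR_ADDR / 4u] & csr[IMR_ADDR / 4u]
                       & (IRQ_ERROR | IRQ_INF_COMPLETE);
    if (pending != 0)
        csr[ICR_ADDR / 4u] = pending;
    struct irq_events ev = {
        .error = (pending & IRQ_ERROR) != 0,
        .inference_complete = (pending & IRQ_INF_COMPLETE) != 0,
    };
    return ev;
}
```

Clearing the bits before acting on them narrows, but does not remove, the window in which two completions merge into a single ICR event. To count completions, Inference_completion_count is the more reliable source.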
DMA Descriptor Queue
The DMA contains a single descriptor FIFO for enqueuing inference requests. A descriptor can require multiple register writes; it is added to the queue when software writes to the desc_input_output_base_addr register.
The desc_cfg_filter_base_addr and desc_cfg_num_words registers retain their values after a descriptor is enqueued.
If you have already enqueued a DMA descriptor and want to enqueue another descriptor with the same desc_cfg_filter_base_addr and desc_cfg_num_words values, write only to the desc_input_output_base_addr register.
If the next descriptor needs different desc_cfg_filter_base_addr or desc_cfg_num_words values, you must write the new values before writing to the desc_input_output_base_addr register.
| Register | Offset | Attribute | Description |
|---|---|---|---|
| desc_cfg_filter_base_addr | 0x000 | RW | Base address pointer for the configuration buffer and the filter buffer. The filters are located at desc_cfg_filter_base_addr + desc_cfg_num_words; this address is encoded in the configuration data that is provided to the filter reader. Must be aligned to a multiple of the DDR word size. |
| desc_cfg_num_words | 0x004 | RW | Length of the configuration buffer minus 2, in configuration words (each 64 bits: 32 bits of instruction and 32 bits of data). |
| desc_input_output_base_addr | 0x008 | RW | Base address pointer for the input feature data and the output inference results (the results are written at an offset from the base address). Must be aligned to a multiple of the DDR word size. Writing to this register enqueues a descriptor into the internal DMA descriptor queue. |
| desc_diagnostics | 0x00C | RO | Debug register; production software should not need to read it. Bit 0: asserted if the descriptor queue overflows; this sticky bit clears only on reset. Bit 1: the descriptor queue is full or almost full. Bit 2: asserted when an unlicensed IP reaches its inference limit; while it is asserted, inference requests are rejected. All other bits are reserved. |
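The enqueue sequence described above can be sketched in C as follows. The sketch assumes 32-bit buffer addresses, a memory-mapped CSR pointer, and that the DMA Descriptor Queue region starts at byte address 0x210. `struct graph_desc`, its fields, and `enqueue_inference` are hypothetical. The sketch does not check for queue overflow, because this section only says that desc_diagnostics flags it for debugging.

```c
#include <assert.h>
#include <stdint.h>

/* Byte addresses: DMA Descriptor Queue base 0x210 + table offsets. */
#define DESC_CFG_FILTER_BASE_ADDR    (0x210u + 0x000u)
#define DESC_CFG_NUM_WORDS           (0x210u + 0x004u)
#define DESC_INPUT_OUTPUT_BASE_ADDR  (0x210u + 0x008u)

/* Hypothetical per-graph parameters. */
struct graph_desc {
    uint32_t cfg_filter_base;  /* config buffer address; filters follow it */
    uint32_t cfg_num_words;    /* config buffer length in 64-bit config words */
    uint32_t ddr_word_bytes;   /* DDR word size used for alignment checks */
};

static void csr_write32(volatile uint32_t *csr, uint32_t byte_addr, uint32_t v)
{
    csr[byte_addr / 4u] = v;
}

/* The two config registers hold their values, so they are rewritten only
 * when the graph changes. The final write enqueues the descriptor. */
static void enqueue_inference(volatile uint32_t *csr, const struct graph_desc *g,
                              int graph_changed, uint32_t io_base)
{
    assert(g->ddr_word_bytes != 0 && g->cfg_num_words >= 2u);
    assert(g->cfg_filter_base % g->ddr_word_bytes == 0);
    assert(io_base % g->ddr_word_bytes == 0);
    if (graph_changed) {
        csr_write32(csr, DESC_CFG_FILTER_BASE_ADDR, g->cfg_filter_base);
        csr_write32(csr, DESC_CFG_NUM_WORDS, g->cfg_num_words - 2u);  /* length - 2 */
    }
    csr_write32(csr, DESC_INPUT_OUTPUT_BASE_ADDR, io_base);  /* enqueues */
}
```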
DMA Control Registers
| Register | Offset | Attribute | Description |
|---|---|---|---|
| Intermediate_ddr_base_address | 0x000 | RW | Base address for the intermediate data in DDR. This address is shared by all graphs and needs to be set only once, at startup. Must be aligned to a multiple of the DDR word size. |
| Inference_completion_count | 0x004 | RO | Number of inference requests completed by the FPGA AI Suite IP. |
| IP_reset | 0x008 | RW | Write any non-zero value to this address to trigger a reset of the FPGA AI Suite IP. The value is automatically cleared upon reset. Reading from this register always returns 0. |
| Activate_streaming | 0x00C | RW | When streaming is enabled in the architecture, writing "1" to this register makes the FPGA AI Suite IP begin queuing descriptors and start listening for streaming inputs. Writing "0" stops queuing descriptors and turns off the input streaming interface. |
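A startup sequence for these registers might look like the following C sketch. It assumes that the DMA Control Registers are the region at base 0x220 in the CSR map, that software does not need to wait after writing IP_reset (this section does not say), and that Inference_completion_count wraps modulo 2^32. All names are hypothetical. Writing Intermediate_ddr_base_address after the reset, rather than before it, avoids depending on whether the reset preserves that register.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: DMA Control Registers region base is 0x220. */
#define DMA_INTERMEDIATE_DDR_BASE  (0x220u + 0x000u)
#define DMA_COMPLETION_COUNT       (0x220u + 0x004u)
#define DMA_IP_RESET               (0x220u + 0x008u)
#define DMA_ACTIVATE_STREAMING     (0x220u + 0x00Cu)

/* One-time bring-up. A real driver may need a delay or poll after the
 * reset write; none is modeled here. */
static void ip_startup(volatile uint32_t *csr, uint32_t intermediate_base,
                       uint32_t ddr_word_bytes, int streaming)
{
    assert(ddr_word_bytes != 0 && intermediate_base % ddr_word_bytes == 0);
    csr[DMA_IP_RESET / 4u] = 1u;  /* any non-zero value triggers a reset */
    csr[DMA_INTERMEDIATE_DDR_BASE / 4u] = intermediate_base;  /* shared by all graphs */
    if (streaming)  /* only when streaming is enabled in the architecture */
        csr[DMA_ACTIVATE_STREAMING / 4u] = 1u;
}

/* Completions since a snapshot of the counter. Unsigned subtraction gives
 * the right answer across one wrap of a 32-bit counter. */
static uint32_t completions_since(volatile const uint32_t *csr, uint32_t snapshot)
{
    return csr[DMA_COMPLETION_COUNT / 4u] - snapshot;
}
```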
Performance Registers
Hardware counters measure the number of clock cycles during which the IP is active. A job is considered active after the first word of its descriptor is read from the descriptor queue, and finished just before the done interrupt is raised and the completion count is updated.
The IP and the supporting host form an elastic pipeline in which multiple jobs can be in flight at once. The counters therefore capture both the overall latency (for example, the time required to process 100 jobs) and the average latency of each of those jobs. The hardware accumulates the total latency of all jobs; software computes the average by dividing that total by the number of jobs.
The counters are 64 bits wide to avoid overflow. Because the hardware does not synchronize reads of the lower and upper 32 bits of a counter, software should not read the counters while the IP is active.
| Register | Offset | Attribute | Description |
|---|---|---|---|
| Total clocks active (lower 32 bits) | 0x000 | RO | Incremented by 1 on each clock cycle in which at least one IP job is active. |
| Total clocks active (upper 32 bits) | 0x004 | RO | Same as above. |
| Total clocks for all jobs (lower 32 bits) | 0x008 | RO | Incremented by N on each clock cycle in which N IP jobs are active. |
| Total clocks for all jobs (upper 32 bits) | 0x00C | RO | Same as above. |
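As a worked example of the averaging described above, the C sketch below combines each counter's two halves and derives two figures: the average per-job latency (total clocks for all jobs divided by the job count) and the average number of jobs in flight while the IP was active (total clocks for all jobs divided by total clocks active). It assumes the region starts at byte address 0x240, that the IP is idle when the function runs, and that the caller supplies the number of jobs the counters cover, for example Inference_completion_count if both start from zero after a reset. The names are hypothetical. Dividing a cycle count by the IP clock frequency, which is not given here, converts it to time.

```c
#include <assert.h>
#include <stdint.h>

#define PERF_BASE                0x240u
#define PERF_CLOCKS_ACTIVE_LO    (PERF_BASE + 0x000u)
#define PERF_CLOCKS_ACTIVE_HI    (PERF_BASE + 0x004u)
#define PERF_CLOCKS_ALL_JOBS_LO  (PERF_BASE + 0x008u)
#define PERF_CLOCKS_ALL_JOBS_HI  (PERF_BASE + 0x00Cu)

/* Combine two 32-bit halves. Call only while the IP is idle: the halves
 * are not latched together. */
static uint64_t read_counter64(volatile const uint32_t *csr,
                               uint32_t lo_addr, uint32_t hi_addr)
{
    uint64_t lo = csr[lo_addr / 4u];
    uint64_t hi = csr[hi_addr / 4u];
    return (hi << 32) | lo;
}

struct perf_summary {
    uint64_t active_clocks;       /* cycles with at least one job active */
    uint64_t all_job_clocks;      /* sum of every job's active cycles */
    double avg_job_clocks;        /* mean latency per job, in cycles */
    double avg_jobs_in_flight;    /* mean concurrency while active */
};

static struct perf_summary perf_summarize(volatile const uint32_t *csr,
                                          uint32_t jobs)
{
    assert(jobs > 0);
    struct perf_summary s;
    s.active_clocks = read_counter64(csr, PERF_CLOCKS_ACTIVE_LO, PERF_CLOCKS_ACTIVE_HI);
    s.all_job_clocks = read_counter64(csr, PERF_CLOCKS_ALL_JOBS_LO, PERF_CLOCKS_ALL_JOBS_HI);
    s.avg_job_clocks = (double)s.all_job_clocks / (double)jobs;
    s.avg_jobs_in_flight = s.active_clocks != 0
        ? (double)s.all_job_clocks / (double)s.active_clocks : 0.0;
    return s;
}
```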
Debug Network Registers
The debug network has the following registers available from the CSR:
| Register | Offset | Attribute | Description |
|---|---|---|---|
| DLA_DMA_CSR_OFFSET_DEBUG_NETWORK_ADDR | 0x000 | RO | Address that the debug network uses to issue a read request. |
| DLA_DMA_CSR_OFFSET_DEBUG_NETWORK_VALID | 0x004 | RO | Indicates that a read response has been received from the debug network. |
| DLA_DMA_CSR_OFFSET_DEBUG_NETWORK_DATA | 0x008 | RO | Data from the debug network. |
DMA License Register
| Register | Offset | Attribute | Description |
|---|---|---|---|
| license_flag | 0x000 | RO | Indicates whether the IP is licensed: |
DMA Transaction Counters
Hardware counters measure the number of data words that the DMA transfers between the IP and the external DDR memory.
Separate counters are provided for input feature reads, filter and bias reads, and output feature writes. The width of each memory word, in bytes, matches the dma/ddr_data_bytes value in the architecture description file.
| Register | Offset | Attribute | Description |
|---|---|---|---|
| Total number of input feature words read by the FPGA AI Suite IP (lower 32 bits) | 0x000 | RO | Incremented by 1 for every input feature word transferred from the external memory to the IP DMA on the AXI4 read bus. |
| Total number of input feature words read by the FPGA AI Suite IP (upper 32 bits) | 0x004 | RO | Same as above. |
| Total number of filter and bias words read by the FPGA AI Suite IP (lower 32 bits) | 0x008 | RO | Incremented by 1 for every filter or bias word transferred from the external memory to the IP DMA on the AXI4 read bus. |
| Total number of filter and bias words read by the FPGA AI Suite IP (upper 32 bits) | 0x00C | RO | Same as above. |
| Total number of output feature words written by the FPGA AI Suite IP (lower 32 bits) | 0x010 | RO | Incremented by 1 for every feature word written to the external memory by the IP DMA on the AXI4 write bus. |
| Total number of output feature words written by the FPGA AI Suite IP (upper 32 bits) | 0x014 | RO | Same as above. |
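To convert these word counts into bytes, multiply each by dma/ddr_data_bytes. The C sketch below assumes the region starts at byte address 0x264, that each upper half sits 4 bytes above its lower half, and that, as with the performance counters, the values should be read while the IP is idle. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TXN_BASE           0x264u
#define TXN_FEATURE_RD_LO  (TXN_BASE + 0x000u)
#define TXN_FILTER_RD_LO   (TXN_BASE + 0x008u)
#define TXN_FEATURE_WR_LO  (TXN_BASE + 0x010u)

struct ddr_traffic {
    uint64_t feature_read_bytes;
    uint64_t filter_bias_read_bytes;
    uint64_t feature_write_bytes;
};

/* ASSUMPTION: the upper 32 bits of each counter are at lo_addr + 4. */
static uint64_t read_pair(volatile const uint32_t *csr, uint32_t lo_addr)
{
    return ((uint64_t)csr[(lo_addr + 4u) / 4u] << 32) | csr[lo_addr / 4u];
}

/* ddr_data_bytes is the dma/ddr_data_bytes value from the .arch file. */
static struct ddr_traffic read_ddr_traffic(volatile const uint32_t *csr,
                                           uint32_t ddr_data_bytes)
{
    assert(ddr_data_bytes > 0);
    struct ddr_traffic t;
    t.feature_read_bytes     = read_pair(csr, TXN_FEATURE_RD_LO) * ddr_data_bytes;
    t.filter_bias_read_bytes = read_pair(csr, TXN_FILTER_RD_LO) * ddr_data_bytes;
    t.feature_write_bytes    = read_pair(csr, TXN_FEATURE_WR_LO) * ddr_data_bytes;
    return t;
}
```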
Model Update Registers
| Register | Offset | Attribute | Description |
|---|---|---|---|
| MODEL_UPDATE_WORD_0 | 0x000 | W | 32-bit chunk of a scratchpad or configuration word, index 0 (least significant) |
| MODEL_UPDATE_WORD_1 | 0x004 | W | Same as above, index 1 |
| MODEL_UPDATE_WORD_2 | 0x008 | W | Same as above, index 2 |
| MODEL_UPDATE_WORD_3 | 0x00C | W | Same as above, index 3 |
| MODEL_UPDATE_WORD_4 | 0x010 | W | Same as above, index 4 |
| MODEL_UPDATE_WORD_5 | 0x014 | W | Same as above, index 5 |
| MODEL_UPDATE_WORD_6 | 0x018 | W | Same as above, index 6 |
| MODEL_UPDATE_WORD_7 | 0x01C | W | Same as above, index 7 |
| MODEL_UPDATE_WORD_8 | 0x020 | W | Same as above, index 8 |
| MODEL_UPDATE_WORD_9 | 0x024 | W | Same as above, index 9 |
| MODEL_UPDATE_WORD_10 | 0x028 | W | Same as above, index 10 |
| MODEL_UPDATE_WORD_11 | 0x02C | W | Same as above, index 11 |
| MODEL_UPDATE_WORD_12 | 0x030 | W | Same as above, index 12 |
| MODEL_UPDATE_WORD_13 | 0x034 | W | Same as above, index 13 |
| MODEL_UPDATE_WORD_14 | 0x038 | W | Same as above, index 14 |
| MODEL_UPDATE_WORD_15 | 0x03C | W | Same as above, index 15 |
| MODEL_UPDATE_WORD_16 | 0x040 | W | Same as above, index 16 |
| MODEL_UPDATE_WORD_17 | 0x044 | W | Same as above, index 17 |
| MODEL_UPDATE_WORD_18 | 0x048 | W | Same as above, index 18 |
| MODEL_UPDATE_WORD_19 | 0x04C | W | Same as above, index 19 |
| MODEL_UPDATE_WORD_20 | 0x050 | W | Same as above, index 20 |
| MODEL_UPDATE_WORD_21 | 0x054 | W | Same as above, index 21 |
| MODEL_UPDATE_WORD_22 | 0x058 | W | Same as above, index 22 |
| MODEL_UPDATE_WORD_23 | 0x05C | W | Same as above, index 23 |
| MODEL_UPDATE_WORD_24 | 0x060 | W | Same as above, index 24 |
| MODEL_UPDATE_WORD_25 | 0x064 | W | Same as above, index 25 |
| MODEL_UPDATE_WORD_26 | 0x068 | W | Same as above, index 26 |
| MODEL_UPDATE_WORD_27 | 0x06C | W | Same as above, index 27 |
| MODEL_UPDATE_WORD_28 | 0x070 | W | Same as above, index 28 |
| MODEL_UPDATE_WORD_29 | 0x074 | W | Same as above, index 29 |
| MODEL_UPDATE_WORD_30 | 0x078 | W | Same as above, index 30 |
| MODEL_UPDATE_WORD_31 | 0x07C | W | Same as above, index 31 |
| MODEL_UPDATE_CONTROL | 0x080 | W | Type of word and target address of the update |
To see how to use these registers to update the DDR-free model on the FPGA device, refer to Updating Hostless DDR-Free MIF Files Through the CSR.
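The sketch below shows only the general shape of a model update write: stage the 32-bit chunks of one word in MODEL_UPDATE_WORD_0 upward, least significant chunk first, then write MODEL_UPDATE_CONTROL. It assumes that the control write is what applies the staged chunks, and it treats the control value as already encoded, because the format (word type and target address) is not described in this section. The referenced procedure is authoritative for the actual sequence, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_UPDATE_BASE     0x300u
#define MODEL_UPDATE_WORDS    32u                       /* WORD_0 .. WORD_31 */
#define MODEL_UPDATE_CONTROL  (MODEL_UPDATE_BASE + 0x080u)

/* ASSUMPTIONS: the control write commits the staged chunks, and `control`
 * is already encoded with the word type and target address. */
static void model_update_word(volatile uint32_t *csr, const uint32_t *chunks,
                              size_t n_chunks, uint32_t control)
{
    assert(chunks != NULL && n_chunks >= 1 && n_chunks <= MODEL_UPDATE_WORDS);
    for (size_t i = 0; i < n_chunks; i++)  /* chunk 0 = least significant */
        csr[(MODEL_UPDATE_BASE + 4u * (uint32_t)i) / 4u] = chunks[i];
    csr[MODEL_UPDATE_CONTROL / 4u] = control;  /* written last */
}
```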