Intel Acceleration Stack for Intel® Xeon® CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual

ID 683193
Date 11/04/2019
Public

1.3.13.1.2. Memory Consistency Explained

CCI-P can re-order requests, whether they target the same address or different addresses. It does not implement logic to identify data hazards between requests to the same address.

Two Writes to the Same VC

Memory may see two writes to the same VC in a different order from their execution, unless the second write request was generated after the first write response was received. This is commonly known as a write after write (WaW) hazard.

The table below shows two writes to the same VC where the second write is issued only after the first write response is received.

Table 34.  Two Writes to Same VC, Only One Outstanding
AFU:
  VH1: Write1 Addr=X, Data=A
  Resp 1
  VH1: Write2 Addr=X, Data=B
  Resp 2
Processor:
  Read1 Addr=X, Data=A
  Read2 Addr=X, Data=B

The AFU writes to address X twice on the same VC, but it sends the second write only after the first write response is received. This ensures that the first write was sent out on the link before the next one goes out. CCI-P guarantees that these writes are seen by the processor in the order in which they were issued. The processor sees Data=A, followed by Data=B, when reading from address X multiple times.

Alternatively, use a WrFence to enforce ordering between writes to the same VC. Note that WrFence has stronger semantics: it stalls processing of all writes after the fence until all previous writes have completed.
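
The effect of waiting for Resp 1 can be illustrated with a toy model. The sketch below is not CCI-P code; the function name `possible_final_values` and the `ordered_pairs` parameter are hypothetical, and the model simply enumerates every commit order that respects the ordering constraints it is given.

```python
from itertools import permutations

def possible_final_values(writes, ordered_pairs=()):
    """Return the set of values address X may hold once all writes commit.

    writes: data values in issue order.
    ordered_pairs: (i, j) index pairs meaning write i is known to commit
    before write j, for example because write j was issued only after the
    response for write i arrived. (Assumption of this toy model.)
    """
    finals = set()
    for order in permutations(range(len(writes))):
        position = {w: k for k, w in enumerate(order)}
        if all(position[i] < position[j] for i, j in ordered_pairs):
            finals.add(writes[order[-1]])  # last committed write wins
    return finals

# Both writes outstanding at once: either one may commit last.
print(sorted(possible_final_values(["A", "B"])))                  # ['A', 'B']
# Write2 issued only after Resp 1: B always commits last.
print(sorted(possible_final_values(["A", "B"], [(0, 1)])))        # ['B']
```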

Two Writes to Different VCs

The table below shows that two writes to different VCs may be committed to memory in a different order than they were issued.

Table 35.  Write Out of Order Commit
AFU:
  VH1: Write1 Addr=X, Data=A
  VL0: Write2 Addr=X, Data=B
Processor:
  Read1 Addr=X, Data=B
  Read2 Addr=X, Data=A

The AFU writes to X twice: Data=A over VH1 and Data=B over VL0. The processor polls on address X and may see updates to X in reverse order; that is, the CPU may see Data=B, followed by Data=A. In summary, the write order seen by the processor may be different from the order in which the AFU completed the writes. Writes to separate channels have no ordering rules; to synchronize across them, broadcast a write fence to VA.

The table below shows the use of WrFence to enforce write ordering.

Table 36.  Use WrFence to Enforce Write Ordering
AFU:
  VH1: Write1 Addr=X, Data=A
  VA: WrFence
  VL0: Write2 Addr=X, Data=B
Processor:
  Read1 Addr=X, Data=A
  Read2 Addr=X, Data=B

This time the AFU adds a VA WrFence between the two writes. The WrFence ensures that the writes issued before the WrFence become visible to the processor before the writes issued after the WrFence. Hence, the processor sees Data=A and then Data=B. The WrFence was issued to VA because the writes to be serialized were sent on different VCs.
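
The difference between the unfenced and fenced write sequences can be sketched as a small enumeration. This is a simplified model under stated assumptions, not an implementation of the FIU: it handles only fences issued to VA, it treats all writes between two fences as freely re-orderable, and the tuple layout and the name `commit_orders` are hypothetical.

```python
from itertools import permutations, product

def commit_orders(stream):
    """Enumerate the commit orders a toy VA-fence model allows.

    stream: ("write", vc, data) or ("fence", "VA") tuples in issue order.
    Writes between two fences may commit in any order; a VA fence orders
    every earlier write before every later one.
    """
    epochs, current = [], []
    for req in stream:
        if req[0] == "fence":
            epochs.append(current)   # close the epoch at the fence
            current = []
        else:
            current.append(req[2])   # keep only the data value
    epochs.append(current)
    per_epoch = [list(permutations(e)) for e in epochs]
    # Concatenate one permutation per epoch, preserving epoch order.
    return {sum(combo, ()) for combo in product(*per_epoch)}

unfenced = [("write", "VH1", "A"), ("write", "VL0", "B")]
fenced = [("write", "VH1", "A"), ("fence", "VA"), ("write", "VL0", "B")]
print(sorted(commit_orders(unfenced)))  # [('A', 'B'), ('B', 'A')]
print(sorted(commit_orders(fenced)))    # [('A', 'B')]
```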

Two Reads from Different VCs

Reads issued to different VCs may complete out of order; the last read response may return old data.

The table below shows how reads from the same address over different VCs may result in re-ordering.

Table 37.  Read Re-Ordering to Same Address, Different VCs
Processor:
  Store Addr=X, Data=A
  Store Addr=X, Data=B
AFU:
  Request              Response
  VH1: Read1 Addr=X    VL0: Resp2 Addr=X, Data=B
  VL0: Read2 Addr=X    VH1: Resp1 Addr=X, Data=A

The processor writes Data=A and then Data=B to address X. The AFU reads address X twice over different VCs: Read1 was sent on VH1 and Read2 on VL0. The FIU may re-order the responses and return data out of order, so the AFU may see Data=B, followed by Data=A. This is different from the processor write order.

Two Reads from the Same VC

Reads to the same VC may complete out of order; the last read response always returns the most recent data. The last read response may correspond to an older read request as shown in the following table.

Note: VA reads behave like two reads from different VCs.

The following table shows how reads from the same address over the same VC may result in re-ordering. However, the AFU sees updates in the same order in which they were written.

Table 38.  Read Re-Ordering to Same Address, Same VC
Processor:
  Store Addr=X, Data=A
  Store Addr=X, Data=B
AFU:
  Request              Response
  VL0: Read1 Addr=X    VL0: Resp2 Addr=X, Data=A
  VL0: Read2 Addr=X    VL0: Resp1 Addr=X, Data=B

The processor writes Data=A and then Data=B to address X. The AFU reads address X twice over the same VC; both Read1 and Read2 are sent to VL0. The FIU may still re-order the read responses, but the CCI-P standard guarantees to return the newest data last; that is, the AFU sees updates to address X in the order in which the processor writes to it.

When using VA, the FIU may return data out of order, because VA requests may be directed to VL0, VH0, or VH1.
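
The contrast between same-VC and different-VC reads can be made concrete with a short enumeration. This is only an illustrative model: it assumes each read observes one of the stored values, it interprets "newest data last" as responses arriving in non-decreasing store order, and the name `response_sequences` is hypothetical.

```python
from itertools import product

def response_sequences(store_order, n_reads, same_vc):
    """Data values the AFU may see, listed in response-arrival order.

    store_order: values in the order the processor stored them.
    same_vc: True models reads on a single VC (newest data arrives last);
    False models reads on different VCs or on VA (no such guarantee).
    """
    age = {value: k for k, value in enumerate(store_order)}
    sequences = set()
    for seq in product(store_order, repeat=n_reads):
        newest_last = all(age[a] <= age[b] for a, b in zip(seq, seq[1:]))
        if newest_last or not same_vc:
            sequences.add(seq)
    return sequences

print(sorted(response_sequences(["A", "B"], 2, same_vc=True)))
# [('A', 'A'), ('A', 'B'), ('B', 'B')]
print(("B", "A") in response_sequences(["A", "B"], 2, same_vc=False))
# True
```

In this model the sequence (B, A), which matches the different-VC table, is possible only when `same_vc` is False.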

Read-After-Write from Same VC

The CCI-P standard does not order read and write requests, even when they are to the same address. The AFU must explicitly resolve such dependencies.

Read-After-Write from Different VCs

The AFU cannot resolve a read-after-write dependency when different VCs are used.

Write-after-Read to Same or Different VCs

CCI-P does not order write-after-read requests, even when they are to the same address. The AFU must explicitly resolve such dependencies by sending the write request only after the read response is received.
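
One way an AFU might track this dependency is with a per-address count of outstanding reads. The sketch below models that bookkeeping in software; the class `WriteAfterReadGate` and its methods are hypothetical, and a real AFU would presumably match responses to requests by a request tag rather than by address.

```python
from collections import Counter

class WriteAfterReadGate:
    """Holds writes to an address until its pending reads have returned."""

    def __init__(self):
        self._pending_reads = Counter()  # addr -> reads awaiting a response

    def read_issued(self, addr):
        self._pending_reads[addr] += 1

    def read_response(self, addr):
        if self._pending_reads[addr] == 0:
            raise ValueError("response without a matching read")
        self._pending_reads[addr] -= 1

    def may_write(self, addr):
        # Safe to write only when no read to this address is outstanding.
        return self._pending_reads[addr] == 0

gate = WriteAfterReadGate()
gate.read_issued(0x40)
print(gate.may_write(0x40))   # False: the read response has not arrived
gate.read_response(0x40)
print(gate.may_write(0x40))   # True: the write may now be sent
```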

Transaction Ordering Example Scenarios

Transactions to the Same Address: More than one outstanding read/write request to an address results in non-deterministic behavior.
  • Example 1: Two writes to the same address X can be completed out of order. The final value at address X is non-deterministic. To enforce ordering, add a WrFence between the write requests. Alternatively, if both writes use the same virtual channel, wait for the response from the first write before issuing the second write.
  • Example 2: Two reads from the same address X may be completed out of order. This is not a data hazard, but an AFU developer should make no ordering assumptions. The second read response received contains the latest data stored at address X, assuming both reads are issued to the same virtual channel.
  • Example 3: Write to address X, followed by read from address X. The result is non-deterministic; that is, the read returns either the new data (data after the write) or the old data (data before the write) at address X. To ensure the latest data is read, wait for the write response to return before issuing the read to address X using the same virtual channel.
  • Example 4: Read followed by write to address X. The result is non-deterministic; that is, the read returns either the new data (data after the write) or the old data (data before the write) at address X.

    Use the read responses to resolve read dependencies: issue the write only after the read response is received.

Transactions to Different Addresses: Read/write requests to different addresses may be completed out of order.
  • Example 1: The AFU writes data to address Z and then wants to notify the software thread by updating the value of a flag at address X.

    To implement this, the AFU must use a write fence between the write to Z and the write to X. The write fence ensures that Z is globally visible before the write to X is processed.

  • Example 2: The AFU reads data starting from address Z and then wants to notify a software thread by updating the value of a flag at address X.

    To implement this, the AFU must perform the read from Z, wait for all the read responses and then perform the write to X.
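
The two notification patterns above can be sketched as follows. This is a minimal software model, not AFU code: the request labels `"write"` and `"wr_fence"` are illustrative placeholders rather than CCI-P encodings, and `notify_after_writes` and `NotifyAfterReads` are hypothetical names.

```python
def notify_after_writes(data_addrs, flag_addr):
    """Example 1: data writes, then a write fence, then the flag write."""
    requests = [("write", addr) for addr in data_addrs]
    requests.append(("wr_fence", None))   # data must be visible before the flag
    requests.append(("write", flag_addr))
    return requests

class NotifyAfterReads:
    """Example 2: hold the flag write until every read response is back."""

    def __init__(self, n_reads):
        self.remaining = n_reads

    def on_read_response(self):
        # Returns True once the last response arrives and the flag
        # write may be issued.
        self.remaining -= 1
        return self.remaining == 0

print(notify_after_writes([0x100], 0x200))
# [('write', 256), ('wr_fence', None), ('write', 512)]
```

If the data and flag writes travel on different VCs, the fence in Example 1 would need to be issued to VA, as described earlier for writes to different VCs.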