Who’s at Fault? A Look at Post-Quantum Cryptography and Fault Injection Attacks

Posted July 15, 2025

By Daniel Dinu, INT31 Offensive Security Researcher

Introduction

Post-Quantum Cryptography (PQC), also known as quantum-resistant cryptography, is a popular topic in many conversations, from security conferences to internal meetings at companies. The term describes those cryptographic algorithms designed to withstand attacks by quantum computers in addition to attacks by classical computers. Motivated by the possible realization of powerful quantum computers that could break existing public-key cryptographic algorithms, such as the Elliptic Curve Digital Signature Algorithm (ECDSA), the National Institute of Standards and Technology (NIST) started a standardization process with a call for proposals for public-key PQC algorithms in December 2016. After a public selection process that spanned three rounds and took about six years, NIST announced that the first four algorithms would be standardized.

Shortly after NIST announced the first four PQC algorithms to be standardized, the National Security Agency (NSA) included the primary algorithms, ML-KEM and ML-DSA, in the Commercial National Security Algorithm Suite 2.0 (CNSA 2.0), the suite of algorithms approved for use in National Security Systems (NSSs), and presented the transition timeline. Meanwhile, NIST added a fifth algorithm to the list of algorithms to be standardized, and others may follow as a result of the ongoing selection process.

With the transition to PQC algorithms in full swing in organizations worldwide, one important aspect is the secure implementation of the new algorithms being deployed. This post discusses the importance of secure PQC implementations, with a focus on fault injection attacks and countermeasures. After reading it, you can determine the best answer to the question in the title for your organization's particular uses of PQC.

Implementation Attacks

Although the PQC algorithms selected by NIST have a strong theoretical basis and an encouraging cryptanalytic history, an attacker may try to break their security through implementation attacks, a class of attacks that exploit the implementation characteristics of a cryptographic algorithm and the physical characteristics of the device that executes the algorithm. The goal of an attacker is to recover a sensitive value, such as a part of the secret key, or to bypass certain operations, such as digital signature verification.

Implementation attacks are commonly classified using two criteria. The first classification splits attacks into active and passive categories, depending on whether the attacker directly interacts with the target or simply observes its behavior. The second one is based on the level of intrusion and has three groups, namely non-invasive, semi-invasive, and invasive attacks. The more invasive an attack, the more expensive it is expected to be. Implementation attacks are considered physical attacks because they require possession of or proximity to the target. However, some of these attacks can be performed remotely as well.

Fault attacks are active, typically non-invasive or semi-invasive, attacks that perturb the execution of a device to modify its normal operation. This may result in modification of the program execution flow, such as skipping an instruction or changing an intermediate value. The latter may produce a different output that can help an attacker learn something about a secret value processed by the device.
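To make the instruction-skip scenario concrete, here is a deliberately minimal C sketch; the function name and failure behavior are hypothetical, not drawn from any real library. If a fault causes the processor to skip the conditional branch, execution falls through to the acceptance path even though the check failed.

```c
#include <stdio.h>

/* Hypothetical verification routine; returns nonzero for an invalid
 * signature. The real check is omitted because only the control flow
 * around it matters for this illustration. */
static int verify_signature(const unsigned char *sig, const unsigned char *msg) {
    (void)sig; (void)msg;
    return -1; /* pretend the signature is invalid */
}

int main(void) {
    unsigned char sig[64] = {0}, msg[32] = {0};
    if (verify_signature(sig, msg) != 0) { /* a skipped branch here ...   */
        puts("signature rejected");
        return 1;
    }
    puts("signature accepted");            /* ... lands the attacker here */
    return 0;
}
```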

ML-DSA

The Module-Lattice Digital Signature Algorithm (ML-DSA) is the primary digital signature algorithm standardized by NIST in FIPS 204 and approved by the NSA for use in NSSs. It is based on CRYSTALS-Dilithium, a lattice-based scheme. The NIST standard includes three parameter sets, which correspond to different security strengths, namely categories 2, 3, and 5, and are referred to as ML-DSA-44, ML-DSA-65, and ML-DSA-87, respectively.

Like any digital signature algorithm, ML-DSA provides three primitives for key generation, signature generation, and signature verification. Key generation is used to create a key pair that consists of a private key and its corresponding public key. The signing operation takes as input a message and the private key and produces a signature. Given a message, a signature, and the public key, the verification operation returns a value indicating whether the signature is valid.
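As a rough illustration of these three primitives, the following C declarations sketch one possible API shape; the names, buffer conventions, and return values are ours for illustration and are not taken from FIPS 204 or any particular implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Key generation: writes a public key to pk and the matching private key to sk. */
int mldsa_keygen(uint8_t *pk, uint8_t *sk);

/* Signing: produces a signature over message m of mlen bytes using the
 * private key sk; the signature length is returned through siglen. */
int mldsa_sign(uint8_t *sig, size_t *siglen,
               const uint8_t *m, size_t mlen, const uint8_t *sk);

/* Verification: returns 0 if sig is a valid signature on m under pk. */
int mldsa_verify(const uint8_t *sig, size_t siglen,
                 const uint8_t *m, size_t mlen, const uint8_t *pk);
```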

All the above operations can be targets for implementation attacks, especially when not properly implemented or protected. In this post, we consider an attacker whose goal is to forge signatures, and we focus on the signature generation algorithm shown below. Note that it is considerably more complex than the signing algorithms of the public-key schemes in use today.

The gist of it is that a signature is a pair (c, z), where z is a vector of polynomials computed as z = y + c·s1; here y is a vector of polynomials, c is a challenge polynomial, and s1 is a secret vector of polynomials (and hence part of the secret key). The coefficients of the polynomials in the secret vector are drawn from a small interval (for example, [-2, 2] for ML-DSA-44). The coefficients of c are from the set {-1, 0, 1}, and a fixed number of them (for example, 39 for ML-DSA-44) are nonzero. The coefficients of the polynomials in y have a much larger range (for example, [-2^17 + 1, 2^17] for ML-DSA-44), so they mask the product c·s1, which would otherwise allow recovery of the secret vector s1 from a signature (c, z). Since knowledge of y allows retrieval of the secret vector s1, y must be treated as a sensitive variable.
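To make the algebra concrete, here is a minimal, self-contained C sketch (not the FIPS 204 code; the toy inputs are invented for illustration) that computes z = y + c·s1 for a single polynomial with schoolbook multiplication in the ring Z_q[X]/(X^N + 1), using the ML-DSA values of N and q:

```c
#include <stdint.h>
#include <stdio.h>

#define N 256
#define Q 8380417

/* r += a*b mod (X^N + 1), coefficients accumulated in 64 bits */
static void poly_mul_acc(int64_t r[N], const int32_t a[N], const int32_t b[N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int k = i + j;
            int64_t p = (int64_t)a[i] * b[j];
            if (k < N) r[k] += p;      /* X^k stays in range */
            else       r[k - N] -= p;  /* X^N = -1 wraps with a sign flip */
        }
}

int main(void) {
    int32_t c[N] = {0}, s1[N] = {0}, y[N] = {0};
    int64_t z[N] = {0};
    c[0] = 1; c[5] = -1;     /* sparse challenge, coefficients in {-1,0,1} */
    s1[0] = 2; s1[1] = -1;   /* small secret coefficients, e.g. in [-2,2] */
    y[0] = 131071;           /* large masking coefficient */

    for (int i = 0; i < N; i++) z[i] = y[i];  /* z = y ...    */
    poly_mul_acc(z, c, s1);                   /* ... + c*s1   */
    for (int i = 0; i < N; i++) z[i] = ((z[i] % Q) + Q) % Q;

    printf("z[0] = %lld\n", (long long)z[0]);
    return 0;
}
```

In a real implementation this product is computed with the Number-Theoretic Transform (NTT) for speed; the schoolbook version is used here only because it makes the wrap-around X^N = -1 explicit.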

A simplified graphical representation of the algorithm is given in the figure below. The public key pk consists of two components, shown as green diamonds, namely the matrix of polynomials A and the vector of polynomials t. The secret key sk consists of a bitstream K and two vectors of polynomials with small coefficients, s1 and s2, each shown as a red hexagon. Finally, the components of a signature σ = (c, z) are shown as gray triangles.

Loop Abort Attack and Countermeasures

The vector of polynomials y is generated using the ExpandMask function (line 11 of the signature generation algorithm). Looking at the two algorithms below (line 5 of ExpandMask and lines 3–5 of BitUnpack), you can see that the coefficients of the polynomials in vector y are generated sequentially.

Assuming the vector of polynomials is initialized with zeros before the coefficients are generated, a fault attacker may try to skip the assignment of the coefficients (a simplified sketch modeled on the reference implementation is shown below). If they succeed, some of the coefficients in y will be set to zero. This attack was described and demonstrated in Loop-Abort Faults on Lattice-Based Fiat–Shamir and Hash-and-Sign Signatures and in Loop-Abort Faults on Lattice-Based Signature Schemes and Key Exchange Protocols. Given one or a few signatures with some or all coefficients of y set to zero, the attacker can recover the value of s1. In the simplest case, if all coefficients of the polynomials in vector y are equal to zero, then z = c·s1 can be solved directly for s1. In the slightly more complicated case, when only the high-degree coefficients of the polynomials in vector y are set to zero, the attacker uses a faulty signature to solve z = ε + c·s1 for s1, where ε is an error term, using lattice reduction techniques.
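The sketch below is illustrative, not the actual reference code: the helper next_masked_coeff() stands in for the SHAKE-based byte decoding done by ExpandMask/BitUnpack, and the abort_at parameter models the effect of a fault on the loop. Because the buffer starts zeroed, an early exit leaves the remaining high-degree coefficients at zero, exactly the condition the loop-abort attack needs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 256
#define GAMMA1 (1 << 17)   /* coefficient bound for ML-DSA-44 */

typedef struct { int32_t coeffs[N]; } poly;

/* Toy stand-in for the SHAKE-based decoding in BitUnpack; returns a value
 * in [-2^17 + 1, 2^17]. Hypothetical, for illustration only. */
static int32_t next_masked_coeff(void) {
    return (int32_t)(rand() % (2 * GAMMA1)) - (GAMMA1 - 1);
}

static void expand_mask_poly(poly *y, int abort_at) {
    memset(y->coeffs, 0, sizeof y->coeffs);   /* buffer starts zeroed */
    for (int i = 0; i < N; i++) {
        if (i == abort_at) break;             /* models a loop-abort fault */
        y->coeffs[i] = next_masked_coeff();
    }
}

int main(void) {
    poly y;
    expand_mask_poly(&y, 4);   /* fault after 4 coefficients ... */
    printf("y[3] = %d, y[4] = %d, y[255] = %d\n",
           y.coeffs[3], y.coeffs[4], y.coeffs[255]); /* ... rest stay zero */
    return 0;
}
```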

To prevent this attack, one can use several approaches. One countermeasure is to check the value of the loop counter at the end of the loop or to add redundancy. Another mitigation is to check the degree or the entropy of the polynomials in vector y. The authors of the papers describing this attack also recommended shuffling. Next, we will see why shuffling is not an effective countermeasure for this operation.
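Below is a minimal sketch of two of these mitigations, assuming an N-coefficient polynomial buffer and an abstract coefficient generator (both our own naming, not from any real library): a redundant loop-counter check after the loop, plus a crude check that the high-degree coefficients of y are not all zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 256

bool fill_poly_checked(int32_t coeffs[N], int32_t (*next)(void)) {
    volatile int i = 0, shadow = 0;   /* redundant counters; volatile makes
                                         them harder to optimize away */
    for (i = 0; i < N; i++) {
        coeffs[i] = next();
        shadow++;
    }
    if (i != N || shadow != N)        /* counter check at the end of the loop */
        return false;                 /* loop abort detected */

    int32_t acc = 0;
    for (int k = N - 8; k < N; k++)   /* crude degree check: after masking, the
                                         top coefficients should not all be 0 */
        acc |= coeffs[k];
    return acc != 0;
}

static int32_t toy_next(void) { return 42; }  /* placeholder generator */

int main(void) {
    int32_t y[N];
    return fill_poly_checked(y, toy_next) ? 0 : 1;
}
```

For ML-DSA-44, each masked coefficient is zero with probability about 2^-18, so the chance that the top eight coefficients are all zero in an unfaulted run is negligible, and the degree check effectively never false-positives.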

A subsequent paper, Loop Aborts Strike Back: Defeating Fault Countermeasures in Lattice Signatures with ILP, showed that an attack is possible even if the shuffling countermeasure is applied. A fault on a shuffled implementation results in sparse polynomials in vector y instead of low-degree polynomials. If one collects enough faulty signatures, they can solve a system of equations for s1 using Integer Linear Programming (ILP).

A memory zeroing fault attack on a hardware implementation can achieve a similar outcome to a loop abort fault attack on a software implementation.

All other previously mentioned countermeasures, except shuffling, are effective against these attacks if properly implemented. In addition to those, one can randomly initialize the content of y or simply use whatever is in memory at that time instead of initializing the vector with zeros. Another mitigation is to store the coefficients of each polynomial and their complements in memory.
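As an illustration of the complement-storage idea (a sketch under our own naming, not code from any particular library): each coefficient is written twice, once as-is and once bitwise inverted, and the pair is re-verified before use, so a memory-zeroing fault breaks the invariant and is detected.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 256

typedef struct {
    int32_t val[N];
    int32_t comp[N];   /* invariant: comp[i] == ~val[i] at all times */
} guarded_poly;

static void guarded_set(guarded_poly *p, int i, int32_t v) {
    p->val[i]  = v;
    p->comp[i] = ~v;
}

/* Returns false if any value/complement pair is inconsistent, e.g. because
 * a memory-zeroing fault cleared part of the buffer. */
static bool guarded_check(const guarded_poly *p) {
    uint32_t diff = 0;
    for (int i = 0; i < N; i++)
        diff |= (uint32_t)(p->val[i] ^ p->comp[i]) ^ 0xFFFFFFFFu;
    return diff == 0;
}

int main(void) {
    guarded_poly p = {{0}, {0}};
    for (int i = 0; i < N; i++) guarded_set(&p, i, 7 - i);
    return guarded_check(&p) ? 0 : 1;
}
```

In practice, the consistency check would run immediately before y is consumed in the computation of z, so that a fault injected between generation and use is also caught.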

Forging Signatures

Although the secret key sk has three components, knowledge of s1 alone is enough to forge signatures (i.e., to create a valid signature for an arbitrary message). Signature forgery algorithms using the recovered secret vector s1 were described in Differential Fault Attacks on Deterministic Lattice Signatures and in Exploiting Determinism in Lattice-based Signatures.

Putting it all together, a successful attack has the following three steps:

  1. Fault the generation of the coefficients of polynomials in vector y (for one or multiple signature generation operations).
  2. Solve the resulting system of equations for secret vector s1.
  3. Forge signatures using the recovered secret vector s1 for any arbitrary message m.

An attacker can repeat the last step multiple times until the corresponding digital certificate is revoked. An implementation of these steps is available at ML-DSA FIA Demo. You can use this code to explore the details of each step and run experiments with different parameters and countermeasures.

Our Approach

Our research team has been studying the security of PQC implementations in the context of the NIST standardization process, and we have contributed to the community in various ways (for example, papers, panels, tutorials, and workshops). All our learnings and insights help influence the design and implementation of Intel products. Our products use various detection and prevention techniques, and we explore opportunities to incorporate the countermeasures mentioned in this post and other protection methods. For example, the Intel® Converged Security and Management Engine (Intel® CSME) benefits from an additional layer of protection provided by the Tunable Replica Circuit (TRC), which can detect and react to certain fault injection attacks. The TRC uses hardware-based sensors to detect circuit-based timing failures resulting from a fault injection attack. This solution was tested and validated before it was first delivered in the 12th Gen Intel® Core™ processor family. For more information, see also the evolution of Intel® Security Engines.

We have evaluated countermeasures against physical side-channel attacks, such as protections for major building blocks like the Secure Hash Algorithm-3 (SHA-3) and polynomial multiplication. We have also conducted analyses of the PQC algorithms to identify the appropriate countermeasures against fault injection and side-channel attacks. Finally, we are continuing our work in this field and look forward to an industry-wide successful and secure deployment of PQC.

Conclusion

This post covered the importance of secure implementations of PQC, a crucial aspect of a successful transition to the new algorithms. We focused on a fault attack that targets the generation of the coefficients of the polynomials in vector y during the signature generation operation. The system of equations derived from the faulty signatures can be solved using lattice reduction techniques or Integer Linear Programming (ILP) to recover the secret vector s1. With s1 in hand, the attacker can forge digital signatures for arbitrary messages. Finally, effective countermeasures were discussed.

We covered only one of the many implementation aspects of the signing operation. Compared to the public-key algorithms currently in use, the PQC algorithms are more complex and consequently expose more attack vectors. Properly implementing and protecting the large number of PQC algorithms selected by NIST for standardization is an additional challenge due to each algorithm's complexity. By now, you should have your own answer to the question in the title. Moving forward, do not overlook the importance of secure implementation, keep an eye on the field, and be prepared to adapt as new attacks are demonstrated.

Share Your Feedback

We want to hear from you. Send comments, questions, and feedback to the INT31 team.

About the Author

Daniel Dinu is an offensive security researcher on the INT31 team within the Intel Product Assurance and Security (IPAS) organization. His primary focus is the security of cryptographic algorithms against implementation attacks, including Post-Quantum Cryptography (PQC). Daniel co-organized a tutorial about implementation attacks and countermeasures for PQC at the Design Automation Conference 2024, has been a contributor at past IEEE International Symposium on Hardware Oriented Security and Trust (HOST) events, and co-authored several publications on the topic.