Encryption/Decryption Ciphers Used for Data at Rest or Data in Motion for AI and HPC Servers

In this post I want to discuss the AES family of block ciphers specifically, with some comparison to stream ciphers. AES block cipher modes are widely used for secure communication in the IPsec and TLS protocols, and for file system and disk encryption. Most importantly, AES is used with authenticated encryption in cloud services for accesses involving data and APIs. An important property of AES-GCM is that both its encryption and its authentication can be parallelized, and it provides confidentiality, data integrity, and authenticity simultaneously.

For data center applications supporting AI servers, AES in an authenticated encryption mode such as AES-GCM, or AES-CTR combined with a separate integrity mechanism, is generally preferred. Some key attributes:

  • Offers encryption (confidentiality) and authentication (integrity), which are components of secure data handling.
  • It can be parallelized and is hardware-accelerated in many modern CPUs, supporting high throughput for large-scale data processing.
  • It is supported in network protocols, storage encryption, and cloud services frequently utilized in data centers.
  • It is efficient for both data at rest and data in transit, making it well suited to AI workloads involving large model files and datasets.

Block ciphers process data in fixed-size blocks (for example 64 or 128 bits at a time), whereas stream ciphers process data at bit or byte granularity. Examples of block ciphers with authentication are AES-GCM and AES-GCM-SIV, whereas an example of a stream cipher with authentication is ChaCha20-Poly1305. In general, block ciphers are slower than stream ciphers in pure software, although hardware AES acceleration narrows or reverses that gap, and they provide strong diffusion within each block. Block ciphers are well suited to storage and secure communication, whereas stream ciphers are fast and designed for real-time or streaming data encryption. I will discuss AES-GCM in this post since it is used extensively and is well suited to AI servers in the data center.

It should be noted that AES-GCM can be replaced with AES-CTR (which turns the block cipher into a stream cipher) plus HMAC for authentication. For storage, another AES mode called AES-XTS is commonly used; it does not provide any authentication.
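The CTR-plus-HMAC composition mentioned above is usually built as encrypt-then-MAC: encrypt first, then compute the HMAC over the ciphertext and everything else that must be authenticated. The following is a minimal sketch of that ordering only. Python's standard library has no AES, so `keystream` is a stand-in built from HMAC-SHA-256 and is an assumption made for illustration, not AES-CTR; the names `seal` and `open_sealed` are hypothetical as well.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for AES-CTR (assumption): PRF(key, nonce || counter) per 32-byte chunk."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _mac(mac_key: bytes, nonce: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    # Length-prefix the associated data so the AAD/ciphertext boundary is unambiguous.
    msg = len(aad).to_bytes(8, "big") + aad + nonce + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes, aad: bytes = b""):
    """Encrypt-then-MAC: XOR with the keystream, then authenticate the ciphertext."""
    ks = keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    return ciphertext, _mac(mac_key, nonce, aad, ciphertext)


def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes,
                ciphertext: bytes, tag: bytes, aad: bytes = b"") -> bytes:
    """Verify the tag in constant time before producing any plaintext."""
    if not hmac.compare_digest(_mac(mac_key, nonce, aad, ciphertext), tag):
        raise ValueError("authentication failed")
    ks = keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


if __name__ == "__main__":
    ek, mk, nonce = b"\x01" * 16, b"\x02" * 32, b"\x03" * 12
    ct, tag = seal(ek, mk, nonce, b"model shard 0", aad=b"header")
    print(open_sealed(ek, mk, nonce, ct, tag, aad=b"header"))
```

The sketch uses separate encryption and MAC keys and the same verify-before-decrypt rule that AES-GCM enforces internally. A real deployment would take AES-CTR from a vetted cryptographic library.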

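Besides block and stream ciphers, post-quantum algorithms are being productized, such as ML-KEM (Module-Lattice-Based Key-Encapsulation Mechanism, standardized in FIPS 203) and HQC (Hamming Quasi-Cyclic, based on error-correcting codes, which NIST selected in 2025 for standardization as an additional KEM). These are key-encapsulation mechanisms that replace public-key key exchange rather than symmetric ciphers. AES, although it falls among the pre-quantum ciphers, is still interesting: Shor's algorithm targets public-key schemes and does not apply to it, Grover's algorithm only halves its effective key length, and the qubits required to mount such an attack are far out in time. Besides, some AES modes can be used for both data in motion and data at rest.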

AES-GCM has an encryption operation (AES in counter mode with GHASH authentication) and a decryption operation (GHASH tag verification followed by counter-mode decryption). It uses the AES block cipher, which has a 128-bit block size, in counter (CTR) mode for encryption, combined with a Galois field multiplication-based hash called GHASH to produce an authentication tag.

AES-GCM Encryption can be described as follows:

Input: Plaintext P split into blocks P[1], …, P[m], Associated Data A (authenticated only), Key K, Initialization Vector IV

Output: Ciphertext C[1…m], Authentication Tag T

Steps:

  • H = AES_Encrypt(K, 0^128)  // Hash subkey: AES encryption of the all-zero block
  • J0 = Generate_IV_Block(IV)  // Initial counter block (for a 96-bit IV: IV || 0^31 || 1)
  • For i = 1 to m:
    • // Increment_Counter(J0, i): adds i to the low 32 bits of J0 to form the ith counter block
    • CTR_i = Increment_Counter(J0, i)
    • // AES_Encrypt(K, block): AES encryption of a 128-bit block with key K
    • S_i = AES_Encrypt(K, CTR_i)
    • // Encrypt by XORing plaintext with keystream
    • C[i] = P[i] XOR S_i
  • // GHASH(H, A, C): a polynomial hash over the finite field GF(2^128) that processes the associated data (A), the ciphertext (C), and their bit lengths to produce an authentication value
  • X = GHASH(H, A, C)
  • // Generate authentication tag
  • T = First_128_bits(AES_Encrypt(K, J0) XOR X)
  • Return (C, T)
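The encryption steps above can be sketched in pure Python. This is an illustrative, slow, non-constant-time sketch and not a production implementation. It assumes AES-128 only and a 96-bit IV, so that J0 = IV || 0^31 || 1. The function names (`aes_gcm_encrypt`, `ghash`, `gf128_mul`, and so on) are chosen for this example; the bit conventions follow NIST SP 800-38D.

```python
def gf8_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ 0x11B if a & 0x80 else a << 1
        b >>= 1
    return r


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


# S-box = multiplicative inverse in GF(2^8) followed by the AES affine transform.
INV = [0] + [next(y for y in range(1, 256) if gf8_mul(x, y) == 1) for x in range(1, 256)]
SBOX = [v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63 for v in INV]


def expand_key(key: bytes) -> list:
    """AES-128 key schedule: returns 11 round keys of 16 bytes each."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # SubWord(RotWord(t))
            t[0] ^= rcon
            rcon = gf8_mul(rcon, 2)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def aes_encrypt_block(round_keys: list, block: bytes) -> bytes:
    """AES_Encrypt(K, block) for one 128-bit block (state is column-major)."""
    s = [b ^ k for b, k in zip(block, round_keys[0])]
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]                              # SubBytes
        s = [s[(i + 4 * (i % 4)) % 16] for i in range(16)]    # ShiftRows
        if rnd < 10:                                          # MixColumns
            s = [gf8_mul(s[c + r], 2) ^ gf8_mul(s[c + (r + 1) % 4], 3)
                 ^ s[c + (r + 2) % 4] ^ s[c + (r + 3) % 4]
                 for c in range(0, 16, 4) for r in range(4)]
        s = [b ^ k for b, k in zip(s, round_keys[rnd])]       # AddRoundKey
    return bytes(s)


def gf128_mul(x: int, y: int) -> int:
    """Multiply in GF(2^128) using GCM's bit order (leftmost bit = x^0)."""
    z, v = 0, y
    for i in range(127, -1, -1):
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ (0xE1 << 120) if v & 1 else v >> 1
    return z


def ghash(h: int, aad: bytes, ct: bytes) -> int:
    """GHASH over zero-padded A, zero-padded C, then the 64-bit bit lengths of both."""
    pad = lambda d: d + bytes(-len(d) % 16)
    data = pad(aad) + pad(ct) + (8 * len(aad)).to_bytes(8, "big") + (8 * len(ct)).to_bytes(8, "big")
    x = 0
    for i in range(0, len(data), 16):
        x = gf128_mul(x ^ int.from_bytes(data[i:i + 16], "big"), h)
    return x


def aes_gcm_encrypt(key: bytes, iv: bytes, plaintext: bytes, aad: bytes = b""):
    if len(key) != 16 or len(iv) != 12:
        raise ValueError("this sketch supports AES-128 keys and 96-bit IVs only")
    rk = expand_key(key)
    h = int.from_bytes(aes_encrypt_block(rk, bytes(16)), "big")   # H = AES_Encrypt(K, 0^128)
    j0 = iv + b"\x00\x00\x00\x01"                                 # initial counter block
    ct = bytearray()
    for i in range(0, len(plaintext), 16):
        ctr = iv + (2 + i // 16).to_bytes(4, "big")               # J0 counter (1) + block index
        s = aes_encrypt_block(rk, ctr)
        ct += bytes(p ^ k for p, k in zip(plaintext[i:i + 16], s))
    tag = int.from_bytes(aes_encrypt_block(rk, j0), "big") ^ ghash(h, aad, bytes(ct))
    return bytes(ct), tag.to_bytes(16, "big")


if __name__ == "__main__":
    ct, tag = aes_gcm_encrypt(bytes(16), bytes(12), bytes(16))
    print(ct.hex(), tag.hex())
```

A convenient check for any implementation is Test Case 2 of the original GCM specification: with an all-zero 128-bit key, an all-zero 96-bit IV, and one all-zero plaintext block, the ciphertext is 0388dace60b6a392f328c2b971b2fe78 and the tag is ab6e47d42cec13bdf53a67b21257bddf.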

AES-GCM Decryption can be described as follows:

Input: Ciphertext C[1…m], Associated Data A, Key K, Initialization Vector IV, Authentication Tag T

Output: Plaintext P[1…m] or an error if authentication fails

Steps:

  • // Use the same hash subkey as in encryption
  • H = AES_Encrypt(K, 0^128)
  • // The IV must be unique per encryption under a given key to maintain security
  • J0 = Generate_IV_Block(IV)
  • // Recompute the authentication tag over A and C
  • X = GHASH(H, A, C)
  • // The tag provides integrity and authenticity; confidentiality comes from the CTR encryption
  • T_computed = First_128_bits(AES_Encrypt(K, J0) XOR X)
  • // Compare in constant time and release no plaintext on mismatch
  • If T_computed != T:
    • Return Authentication_Error
  • For i = 1 to m:
    • CTR_i = Increment_Counter(J0, i)
    • S_i = AES_Encrypt(K, CTR_i)
    • // XOR ciphertext with keystream to get plaintext
    • P[i] = C[i] XOR S_i
  • Return P
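Two points in the decryption steps are easy to miss: GCM only ever calls the block cipher in the forward (encrypt) direction, and the tag must be checked before any plaintext is produced. The sketch below illustrates both by taking block encryption as a callable. `toy_block` is a SHA-256-based stand-in and is an assumption made purely so the example runs with the standard library; it is not AES, and a real implementation would pass an AES block function instead. A 96-bit IV is assumed, and all function names are hypothetical.

```python
import hashlib
import hmac


def gf128_mul(x: int, y: int) -> int:
    """Multiply in GF(2^128) using GCM's bit order (leftmost bit = x^0)."""
    z, v = 0, y
    for i in range(127, -1, -1):
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ (0xE1 << 120) if v & 1 else v >> 1
    return z


def ghash(h: int, aad: bytes, ct: bytes) -> int:
    pad = lambda d: d + bytes(-len(d) % 16)
    data = pad(aad) + pad(ct) + (8 * len(aad)).to_bytes(8, "big") + (8 * len(ct)).to_bytes(8, "big")
    x = 0
    for i in range(0, len(data), 16):
        x = gf128_mul(x ^ int.from_bytes(data[i:i + 16], "big"), h)
    return x


def ctr_xor(encrypt_block, iv: bytes, data: bytes) -> bytes:
    """CTR keystream XOR; the same routine encrypts and decrypts."""
    out = bytearray()
    for i in range(0, len(data), 16):
        s = encrypt_block(iv + (2 + i // 16).to_bytes(4, "big"))
        out += bytes(d ^ k for d, k in zip(data[i:i + 16], s))
    return bytes(out)


def compute_tag(encrypt_block, iv: bytes, aad: bytes, ct: bytes) -> bytes:
    h = int.from_bytes(encrypt_block(bytes(16)), "big")          # H = E(K, 0^128)
    ek_j0 = int.from_bytes(encrypt_block(iv + b"\x00\x00\x00\x01"), "big")
    return (ek_j0 ^ ghash(h, aad, ct)).to_bytes(16, "big")


def gcm_encrypt(encrypt_block, iv: bytes, plaintext: bytes, aad: bytes = b""):
    ct = ctr_xor(encrypt_block, iv, plaintext)
    return ct, compute_tag(encrypt_block, iv, aad, ct)


def gcm_decrypt(encrypt_block, iv: bytes, ct: bytes, tag: bytes, aad: bytes = b"") -> bytes:
    # Verify first, in constant time; no plaintext is computed on mismatch.
    if not hmac.compare_digest(compute_tag(encrypt_block, iv, aad, ct), tag):
        raise ValueError("authentication failed")
    return ctr_xor(encrypt_block, iv, ct)


def toy_block(key: bytes):
    """Stand-in 128-bit block function (assumption, NOT AES): truncated SHA-256."""
    return lambda block: hashlib.sha256(key + block).digest()[:16]


if __name__ == "__main__":
    E = toy_block(b"demo key")
    iv = bytes(range(12))
    ct, tag = gcm_encrypt(E, iv, b"weights for layer 7", aad=b"tensor-header")
    print(gcm_decrypt(E, iv, ct, tag, aad=b"tensor-header"))
```

Because decryption reuses the same forward block function and the same counter sequence as encryption, hardware only needs the AES encrypt path to support both directions of GCM.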

AES-GCM can be implemented in parallel, which is a very useful property. The GHASH function used for authentication chains Galois field (GF(2^128)) multiplications and is therefore sequential as written, but it can be parallelized by dividing the data into groups of blocks, multiplying each block by a precomputed power of the hash subkey H concurrently, and combining (XORing) the partial results to produce the GHASH value that feeds the authentication tag. On the encryption side, AES encrypts successive counter blocks, which are independent of one another. This independence allows multiple blocks to be encrypted simultaneously without waiting for previous blocks to finish.
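The GHASH parallelization described above rests on a simple identity: folding n blocks one at a time gives (X xor B1)·H^n xor B2·H^(n-1) xor ... xor Bn·H, so the n multiplications inside a group are independent once the powers of H are precomputed. The sketch below checks that identity in plain Python. The `lanes` parameter and the function names are assumptions for illustration, and the "parallel" products are simply computed in a list here rather than on real SIMD lanes.

```python
import random


def gf128_mul(x: int, y: int) -> int:
    """Multiply in GF(2^128) using GCM's bit order (leftmost bit = x^0)."""
    z, v = 0, y
    for i in range(127, -1, -1):
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ (0xE1 << 120) if v & 1 else v >> 1
    return z


def ghash_sequential(h: int, blocks: list) -> int:
    """Textbook form: each step depends on the previous one."""
    x = 0
    for b in blocks:
        x = gf128_mul(x ^ b, h)
    return x


def ghash_aggregated(h: int, blocks: list, lanes: int = 4) -> int:
    """Grouped form: up to `lanes` independent multiplications per step."""
    powers = [h]                                  # powers[k - 1] == H^k
    for _ in range(lanes - 1):
        powers.append(gf128_mul(powers[-1], h))
    x = 0
    for start in range(0, len(blocks), lanes):
        group = list(blocks[start:start + lanes])
        group[0] ^= x                             # fold the running value into the first block
        n = len(group)
        # Block j of the group is multiplied by H^(n - j); these products do not
        # depend on each other, so hardware can compute them concurrently.
        products = [gf128_mul(b, powers[n - 1 - j]) for j, b in enumerate(group)]
        x = 0
        for p in products:                        # combine partial results with XOR
            x ^= p
    return x


if __name__ == "__main__":
    rng = random.Random(2024)
    h = rng.getrandbits(128)
    blocks = [rng.getrandbits(128) for _ in range(10)]
    print(ghash_sequential(h, blocks) == ghash_aggregated(h, blocks))
```

Optimized implementations typically use the same grouping with carry-less multiply instructions and often defer the modular reduction until after the XOR, which this sketch does not attempt.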

Options for implementing encryption/decryption using ISA extensions:

x86: AES-GCM can leverage AVX-512 vector instructions, which provide SIMD (Single Instruction, Multiple Data) parallelism. The VAES (Vector AES) instructions, used with AVX-512 registers, process multiple AES blocks in parallel per instruction, improving throughput significantly for bulk encryption or decryption. This vectorized approach exploits the parallelism of AES-GCM's counter mode encryption, so many blocks are processed simultaneously. Carry-less multiply instructions (PCLMULQDQ and its vectorized form VPCLMULQDQ) accelerate the GHASH operation, which is essential for GCM authentication. The original AES-NI instructions, which operate on a single 128-bit block at a time, can be combined with the vector extensions, with some care, in an AES-GCM implementation.

ARMv8: NEON SIMD, optionally with EOR3, can be used to parallelize the AES-CTR and GHASH calculations. The ARMv8 Cryptography Extension instructions accelerate the AES rounds and the GHASH needed for AES-GCM: AESE (single-round AES encryption), AESD (single-round AES decryption), AESMC (the MixColumns transform), and PMULL (the carry-less polynomial multiplication essential for GHASH computation), all operating on 128-bit vector registers. The EOR3 instruction (three-way XOR) is used to optimize Galois field operations and reduce instruction count, whereas the NEON SIMD engine can process multiple 128-bit AES blocks in parallel using vector registers.

RISC-V: I am still exploring RISC-V support, but the RISC-V vector cryptography extensions (the Zvk series) provide vectorized AES instructions to exploit data-level parallelism. Vector instructions process multiple AES blocks simultaneously, enhancing throughput for bulk encryption in counter mode. The vector extensions also include instructions that accelerate the carry-less polynomial multiplication essential for GHASH, enabling efficient and parallelizable authentication.

In the next post I will discuss PQ (post-quantum) mechanisms such as ML-KEM and HQC, which are used in applications such as blockchain servers for post-quantum TLS protocols that are resistant to attacks by quantum computers.