PIGEON: A High Throughput Framework for Private Inference of Neural Networks using Secure Multiparty Computation

Published in the 2025 25th Privacy Enhancing Technologies Symposium (PETS).

Recommended citation: Christopher Harth-Kitzerow, Yongqin Wang, Rachit Rajat, Georg Carle, Murali Annavaram, "PIGEON: A High Throughput Framework for Private Inference of Neural Networks using Secure Multiparty Computation," 2025 25th Privacy Enhancing Technologies Symposium (PETS). https://eprint.iacr.org/2024/1371

Privacy-Preserving Machine Learning (PPML) is one of the most relevant use cases for Secure Multiparty Computation (MPC). While private training of large neural networks such as VGG16 or ResNet50 on state-of-the-art datasets such as ImageNet remains out of reach given the performance overhead of MPC, GPU-based MPC frameworks are starting to achieve practical runtimes for private inference. However, we show that, in contrast to plaintext machine learning, using GPU acceleration for both linear (e.g., convolutions) and non-linear neural network layers (e.g., ReLU) is counterproductive in PPML: while GPUs effectively accelerate linear layers compared to CPU-based MPC implementations, the MPC circuits required to evaluate non-linear layers introduce memory overhead and frequent data movement between the GPU and the CPU to handle network communication. This results in slow ReLU performance and high GPU memory requirements in state-of-the-art GPU-based PPML frameworks, preventing them from scaling to an inference throughput of multiple images per second and to batches of more than eight images on ImageNet.
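To see why non-linear layers dominate, note that in fixed-point MPC frameworks ReLU commonly reduces to extracting the most significant bit (the sign) of each secret-shared value, which must be evaluated as a boolean circuit with communication between the parties. The plaintext sketch below is illustrative only (not PIGEON's code); in MPC, the `msb` step is the expensive part.

```cpp
#include <cstdint>

// Plaintext view of how fixed-point MPC frameworks commonly evaluate
// ReLU: ReLU(x) = x * (1 - msb(x)) over l-bit two's-complement values.
// The final multiplication is cheap arithmetic MPC, but msb(x) must be
// computed as a boolean circuit over the shares, costing communication
// rounds -- and, on a GPU, repeated device-to-host transfers.
int64_t relu_plaintext(int64_t x) {
    uint64_t msb = static_cast<uint64_t>(x) >> 63;  // 1 iff x < 0
    return x * static_cast<int64_t>(1 - msb);
}
```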

To overcome these limitations, we propose PIGEON, an open-source framework for Private Inference of Neural Networks. PIGEON utilizes a novel ABG programming model that, for non-linear layers, switches between Arithmetic vectorization and Bitslicing on the CPU depending on the MPC-specific computation required, while offloading linear layers to the GPU.
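As a rough illustration of the bitslicing mode (a minimal sketch assuming a 3-party replicated boolean sharing; the types and function names below are hypothetical, not PIGEON's API): each 64-bit word packs one bit from 64 independent secret-shared values, so a single word-level CPU instruction evaluates the same boolean gate for 64 inputs in parallel.

```cpp
#include <cstdint>

// Hypothetical bitsliced gates over 3-party replicated boolean shares.
// Each uint64_t lane holds one bit from 64 independent secret values,
// so one word-level CPU instruction evaluates a gate for 64 inputs at
// once -- the "B" (bitslicing) mode of the ABG programming model.
struct BoolShare {
    uint64_t s0;  // this party's first share
    uint64_t s1;  // share replicated with the next party
};

// XOR is linear in boolean sharing: purely local, no communication.
inline BoolShare xor_gate(BoolShare a, BoolShare b) {
    return {a.s0 ^ b.s0, a.s1 ^ b.s1};
}

// AND needs one round of communication; only the local term is shown.
// Each party computes this term, re-randomizes it, and exchanges it
// with its neighbor to complete the replicated sharing of a AND b.
inline uint64_t and_gate_local(BoolShare a, BoolShare b) {
    return (a.s0 & b.s0) ^ (a.s0 & b.s1) ^ (a.s1 & b.s0);
}
```

Arithmetic vectorization, the "A" mode, instead processes many arithmetic shares with SIMD-style operations; the ABG model picks whichever representation suits the MPC primitive at hand.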

Compared to the state-of-the-art PPML framework Piranha, PIGEON improves ReLU throughput by two orders of magnitude, reduces peak GPU memory utilization by one order of magnitude, and scales better to large batch sizes. This translates to one to two orders of magnitude higher throughput for large ImageNet batch sizes (e.g., 192) and more than 70% saturation of a 25 Gbit/s network.