Cracking the KernelCTF Proof of Work in Under 0.23 Seconds with AVX512IFMA for a $51,000 Bounty

Beating the kernelCTF Proof of Work with AVX512IFMA for $51,000 In May 2025, Timothy Herchen, along with his teammates William Liu (FizzBuzz101) and Savy Dicanosa (Syst3mFailure) from the Crusaders of Rust, discovered a use-after-free bug in Linux's packet scheduler. This bugfix patch provided additional details on the vulnerability. William found the bug while fuzzing Linux for his master's thesis. Their goal was to submit this bug to Google's kernelCTF competition for a potential $51,000 bounty. However, winning the bounty required solving the challenge within a tight window, as the first team to submit the flag received the payout, while any subsequent submissions were marked as duplicates. The Submission Process The kernelCTF submission process involves several steps: Connect to the server at 12:00:00 UTC. Solve the proof of work (PoW), a function designed to take a few seconds. Wait for the instance to boot, which takes approximately 2.5 seconds. Upload and run the exploit to secure the flag, optimized to take around 0.55 seconds. Submit the flag to a Google Form, with the timestamp determining the winner. Professional teams had aggressively optimized their submission processes, and in the previous round, the winning team managed to submit the flag 4.5 seconds after the server went live, suggesting a sub-second PoW solve time. This raised the stakes significantly for Timothy and his team. Optimizing the Sloth VDF Solver The proof of work used in the competition is a verifiable delay function (VDF) called "sloth," introduced by Lenstra and Wesolowski in 2015. Timothy's initial effort focused on optimizing the 1280-bit modular squaring operation, translating the function from Python to C++ and leveraging the Mersenne modulus to simplify the computation. This reduced the solve time from 4 seconds to 1.9 seconds on his M1 MacBook Pro. Further optimizations, including static linking of GMP and running on a faster Intel Ice Lake laptop, pushed the time down to 1.4 seconds. However, achieving a sub-second PoW solve time required more advanced techniques. Timothy turned to the AVX512 Integer Fused Multiply-Add (AVX512IFMA) extension, which is designed to speed up big-integer arithmetic. He wrote a custom C++ implementation using AVX512IFMA to handle the modular squaring and the double summation, significantly reducing the number of required multiplications and shuffles. Key Implementations Radix Conversion: Converted the 1280-bit integer to a 52-bit representation, suitable for AVX512IFMA operations. This required padding and careful handling of limb boundaries. Accumulation Optimization: Used fourteen accumulators (seven for low and seven for high halves) to keep the multiply-add instructions in flight, addressing the 4-cycle latency issue. Inline Assembly: Employed inline assembly to force the use of memory broadcast operands, preventing register spilling and improving performance. Unaligned Loads Simulation: Utilized the valignq instruction to simulate unaligned loads, avoiding store-forwarding stalls. Final PoW Solve Time By implementing these optimizations, Timothy and his team reduced the PoW solve time to 0.21 seconds on a rented Ryzen 9950X server. The total submission time, including VM boot and exploit execution, was 3.6 seconds—the fastest submission ever recorded in the kernelCTF competition. Show Time On May 16, 2025, Timothy's friends prepared for the submission. They used a Zen 5 Google Cloud server located in the Netherlands to minimize network latency. Just before 5:00 a.m. PST, they tested the system with a dummy flag, ensuring everything was functioning correctly. At the designated time, they connected to the server, solved the PoW, ran the exploit, retrieved the flag, and submitted it to the Google Form. They succeeded, and their submission was confirmed by the kernelCTF organizers later that day. Industry Insights and Company Profiles The success of Timothy's team highlights the importance of advanced optimization techniques in competitive cybersecurity and technology challenges. Companies like Google offer significant bounties to incentivize such discoveries, as they help identify and fix critical vulnerabilities. The use of AVX512IFMA demonstrates the power of specialized hardware extensions in accelerating computationally intensive tasks, which can be crucial in time-sensitive scenarios. However, on May 28, 2025, kernelCTF organizer koczkatamas announced the removal of the PoW, citing the competitive advantage it gave to those with access to specialized hardware. While this ends the era of PoW-based challenges, it levels the playing field and focuses more on the quality and reliability of exploits. Timothy credits the success to his team's collaborative efforts and the optimization skills taught by his CSE 260 professor, Bryan Chin. His experience in competitive programming and familiarity with AVX512IFMA played a pivotal role in their breakthrough. Conclusion This victory showcases the intersection of advanced computing techniques and cybersecurity. Timothy and his team's ability to optimize the PoW solver using AVX512IFMA not only secured them a significant financial reward but also contributed to the advancement of secure software practices. Their work stands as a testament to the ingenuity and expertise of the cybersecurity community, and it sets a new benchmark for future competitions.

Cracking the KernelCTF Proof of Work in Under 0.23 Seconds with AVX512IFMA for a $51,000 Bounty

Related Links