
C++ low-latency design patterns

Last updated: Jan 01, 2026

Low-Latency Design Patterns - Key Sections for HFT/Quant Dev Interviews

Some interesting points worth diving deeper into, from the paper "C++ Design Patterns for Low-Latency Applications Including High-Frequency Trading".

Section 2: Background

  • 2.3 Design Patterns: Overview of 13 optimization techniques (cache warming, constexpr, SIMD, lock-free, etc.)
  • 2.4 LMAX Disruptor: Lock-free inter-thread communication avoiding context switches
  • 2.6 Cache Analysis: L1/L2/L3 hierarchy, cache hit/miss fundamentals (see the access-pattern sketch after this list)
  • 2.7 Networking: Kernel bypass, FPGAs, colocation strategies
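
A minimal sketch (mine, not from the paper) of why the cache hierarchy matters: the same sum is computed with a sequential (row-major) walk and a strided (column-major) walk over a matrix; the strided version misses the caches far more often and is typically several times slower.

    // Row-major traversal touches memory sequentially (cache-friendly);
    // column-major traversal strides across cache lines and misses often.
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        constexpr int N = 4096;
        std::vector<float> m(static_cast<std::size_t>(N) * N, 1.0f);

        auto time_sum = [&](bool row_major) {
            auto t0 = std::chrono::steady_clock::now();
            float sum = 0.0f;
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    sum += row_major ? m[static_cast<std::size_t>(i) * N + j]
                                     : m[static_cast<std::size_t>(j) * N + i];
            auto t1 = std::chrono::steady_clock::now();
            std::printf("%s sum=%f  %lld us\n", row_major ? "row   " : "column", sum,
                        static_cast<long long>(
                            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()));
        };

        time_sum(true);   // sequential access: mostly L1/L2 hits
        time_sum(false);  // strided access: frequent cache misses
        return 0;
    }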

Section 3: Low-Latency Programming Repository

3.1 Compile-Time Features

  • Cache Warming (90% improvement): Pre-load hot-path data so caches are already warm when a trade signal arrives
  • Compile-time Dispatch (26%): Templates instead of virtual functions (see the sketch after this list)
  • Constexpr (90%): Move computations to compile-time
  • Inlining (20.5%): Eliminate function call overhead
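
A hedged sketch of two of these techniques, compile-time dispatch and constexpr; the handler names and fee numbers are made up for illustration and are not the paper's repository code.

    // Compile-time dispatch: with a virtual base class, h.fee(...) would be an
    // indirect call through the vtable; passing the handler as a template
    // parameter lets the compiler resolve and inline the call at compile time.
    #include <cstdio>

    struct MakerHandler {
        double fee(double notional) const { return notional * -0.0002; }  // rebate
    };
    struct TakerHandler {
        double fee(double notional) const { return notional * 0.0007; }
    };

    template <typename Handler>
    double total_fee(const Handler& h, const double* notionals, int n) {
        double total = 0.0;
        for (int i = 0; i < n; ++i) total += h.fee(notionals[i]);  // static dispatch
        return total;
    }

    // constexpr: fixed-point price scale computed entirely at compile time.
    constexpr long long pow10(int e) { return e == 0 ? 1 : 10 * pow10(e - 1); }
    constexpr long long kPriceScale = pow10(6);
    static_assert(kPriceScale == 1'000'000, "evaluated at compile time");

    int main() {
        const double notionals[] = {10000.0, 2500.0, 40000.0};
        std::printf("maker fees: %f\n", total_fee(MakerHandler{}, notionals, 3));
        std::printf("taker fees: %f\n", total_fee(TakerHandler{}, notionals, 3));
        std::printf("price scale: %lld\n", kPriceScale);
        return 0;
    }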

3.2 Optimization Techniques

  • Loop Unrolling (72%): Reduce loop control overhead
  • Short-circuiting (50%): Early boolean expression termination
  • Slowpath Removal (12%): Separate error handling from hot path
  • Branch Reduction (36%): Minimize branch misprediction penalties
  • Prefetching (23.5%): Hint the CPU about data it will need soon (slowpath removal and prefetching are sketched after this list)
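
A hedged sketch (not the paper's code) combining two of these: slowpath removal, where the rarely-taken error path is pushed into a separate cold, non-inlined function, and prefetching via the GCC/Clang __builtin_prefetch hint. The tick layout is illustrative and [[unlikely]] assumes C++20.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Tick { std::int64_t price; std::int64_t qty; };

    // Cold path: kept out of the hot loop so its code does not pollute the
    // instruction cache and the hot path stays short and branch-light.
    [[gnu::noinline]] [[gnu::cold]] void handle_bad_tick(const Tick& t) {
        std::fprintf(stderr, "bad tick: price=%lld qty=%lld\n",
                     static_cast<long long>(t.price), static_cast<long long>(t.qty));
    }

    std::int64_t notional_sum(const std::vector<Tick>& ticks) {
        std::int64_t sum = 0;
        for (std::size_t i = 0; i < ticks.size(); ++i) {
            // Hint that a tick a few iterations ahead will be needed soon.
            if (i + 8 < ticks.size())
                __builtin_prefetch(&ticks[i + 8], /*rw=*/0, /*locality=*/1);
            const Tick& t = ticks[i];
            if (t.price <= 0 || t.qty <= 0) [[unlikely]] {
                handle_bad_tick(t);
                continue;
            }
            sum += t.price * t.qty;  // hot path: one multiply-accumulate
        }
        return sum;
    }

    int main() {
        std::vector<Tick> ticks(1'000'000, Tick{100'000, 10});
        ticks[12345] = Tick{-1, 5};  // one bad tick exercises the cold path
        std::printf("notional sum = %lld\n",
                    static_cast<long long>(notional_sum(ticks)));
        return 0;
    }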

3.3 Data Handling

  • Signed vs Unsigned (12%): Keep integer signedness consistent so the compiler emits fewer conversion/compare instructions at the assembly level
  • Float/Double Mixing (52%): Avoid implicit float/double conversions (see the sketch after this list)
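
A small illustration (not the paper's benchmark) of the float/double point: a plain 0.5 literal is a double, so every iteration widens the float operand to double and narrows the result back, whereas 0.5f keeps the whole expression in single precision.

    #include <cstdio>
    #include <vector>

    float scale_mixed(const std::vector<float>& xs) {
        float acc = 0.0f;
        for (float x : xs)
            acc += x * 0.5;    // 0.5 is a double: x is widened, the product is
                               // double, then narrowed back to float on +=
        return acc;
    }

    float scale_float(const std::vector<float>& xs) {
        float acc = 0.0f;
        for (float x : xs)
            acc += x * 0.5f;   // stays in single precision end to end
        return acc;
    }

    int main() {
        std::vector<float> xs(1'000'000, 2.0f);
        std::printf("mixed: %f  pure float: %f\n", scale_mixed(xs), scale_float(xs));
        return 0;
    }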

3.4 Concurrency

  • SIMD Instructions (49%): Parallel data processing with AVX2/SSE (see the AVX2 sketch after this list)
  • Lock-Free Programming (63%): Atomic operations, CAS patterns
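
A hedged AVX2 sketch (not the paper's code): a dot product that processes 8 floats per instruction with a scalar tail. It assumes an AVX2-capable x86 CPU and a flag such as -mavx2 with GCC/Clang.

    #include <immintrin.h>
    #include <cstdio>
    #include <vector>

    float dot_avx2(const float* a, const float* b, std::size_t n) {
        __m256 acc = _mm256_setzero_ps();
        std::size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);   // 8 floats from a
            __m256 vb = _mm256_loadu_ps(b + i);   // 8 floats from b
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        alignas(32) float lanes[8];
        _mm256_store_ps(lanes, acc);              // horizontal reduction
        float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3] +
                    lanes[4] + lanes[5] + lanes[6] + lanes[7];
        for (; i < n; ++i) sum += a[i] * b[i];    // scalar tail
        return sum;
    }

    int main() {
        std::vector<float> a(1003, 1.5f), b(1003, 2.0f);
        std::printf("dot = %f (expected %f)\n",
                    dot_avx2(a.data(), b.data(), a.size()), 3.0f * 1003);
        return 0;
    }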

3.5 System Programming

  • Kernel Bypass (7x reduction): DPDK for network I/O directly from user space (rough sketch below)
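
A very rough kernel-bypass sketch using DPDK's C API from C++ (my assumption about the setup, not the paper's code): the NIC is busy-polled directly from user space, so the hot path involves no system calls and no kernel network stack. It assumes a port already bound to a DPDK-compatible driver and omits most configuration and error handling.

    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <cstdio>
    #include <cstring>

    int main(int argc, char** argv) {
        if (rte_eal_init(argc, argv) < 0) {       // claim hugepages/NICs for user space
            std::fprintf(stderr, "EAL init failed\n");
            return 1;
        }
        const uint16_t port = 0;                  // assume the first bound port

        rte_mempool* pool = rte_pktmbuf_pool_create(
            "MBUF_POOL", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

        rte_eth_conf port_conf;
        std::memset(&port_conf, 0, sizeof(port_conf));
        rte_eth_dev_configure(port, 1 /*rx queues*/, 1 /*tx queues*/, &port_conf);
        rte_eth_rx_queue_setup(port, 0, 1024, rte_socket_id(), nullptr, pool);
        rte_eth_tx_queue_setup(port, 0, 1024, rte_socket_id(), nullptr);
        rte_eth_dev_start(port);

        rte_mbuf* bufs[32];
        for (;;) {
            // Busy-poll the NIC from user space: no syscalls, no context switches.
            const uint16_t nb = rte_eth_rx_burst(port, 0, bufs, 32);
            for (uint16_t i = 0; i < nb; ++i) {
                // ...parse market data via rte_pktmbuf_mtod(bufs[i], ...) here...
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }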

Section 4: Pairs Trading Strategy

  • 4.2 Cointegration: Statistical tests (Engle-Granger, ADF) for pair selection
  • 4.3 Methodology: Z-score based signal generation (see the sketch after this list)
  • 4.4 Optimization: Combined techniques achieve 87% latency reduction (517μs → 65μs)
  • 4.5 Results: Links speed to profitability, with a 78% reduction in adverse selection exposure
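
A hedged sketch of the z-score signal logic from 4.3; the window length, entry threshold and hedge ratio below are illustrative assumptions, not the paper's calibrated values (which would come from the cointegration step).

    #include <cmath>
    #include <cstdio>
    #include <deque>

    enum class Signal { None, LongSpread, ShortSpread };

    class ZScoreSignal {
    public:
        ZScoreSignal(double hedge_ratio, std::size_t window, double entry_z)
            : hedge_ratio_(hedge_ratio), window_(window), entry_z_(entry_z) {}

        Signal on_prices(double price_a, double price_b) {
            // Spread of the cointegrated pair: log(A) - beta * log(B).
            double spread = std::log(price_a) - hedge_ratio_ * std::log(price_b);
            spreads_.push_back(spread);
            if (spreads_.size() > window_) spreads_.pop_front();
            if (spreads_.size() < window_) return Signal::None;

            double mean = 0.0;
            for (double s : spreads_) mean += s;
            mean /= spreads_.size();
            double var = 0.0;
            for (double s : spreads_) var += (s - mean) * (s - mean);
            double sd = std::sqrt(var / spreads_.size());
            if (sd == 0.0) return Signal::None;

            double z = (spread - mean) / sd;
            if (z > entry_z_)  return Signal::ShortSpread;  // spread rich: sell A, buy B
            if (z < -entry_z_) return Signal::LongSpread;   // spread cheap: buy A, sell B
            return Signal::None;
        }

    private:
        double hedge_ratio_;
        std::size_t window_;
        double entry_z_;
        std::deque<double> spreads_;
    };

    int main() {
        ZScoreSignal sig(/*hedge_ratio=*/1.0, /*window=*/50, /*entry_z=*/2.0);
        for (int t = 0; t < 60; ++t) {
            double a = (t < 55) ? 100.0 : 103.0;  // toy pair that diverges at t=55
            Signal s = sig.on_prices(a, 100.0);
            if (s != Signal::None)
                std::printf("t=%d signal=%s\n", t,
                            s == Signal::ShortSpread ? "short" : "long");
        }
        return 0;
    }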

Section 5: Disruptor Pattern

  • 5.1 Core Concepts: Ring buffer, sequencer, wait strategies (a simplified ring-buffer sketch follows this list)
  • 5.3 Performance: 38-55% faster than a standard queue, scales better with load
  • 5.4 Results: Avoids lock contention, better cache utilization, predictable memory access
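
A simplified sketch of the core ring-buffer idea: a single-producer/single-consumer queue over a fixed power-of-two ring with atomic sequence counters and busy-spin waiting. The real LMAX Disruptor adds sequencers for multiple producers, batching and pluggable wait strategies, so this is an illustration rather than the paper's implementation.

    #include <atomic>
    #include <cstddef>
    #include <cstdio>
    #include <thread>

    template <typename T, std::size_t SizePow2>
    class SpscRing {
        static_assert((SizePow2 & (SizePow2 - 1)) == 0, "size must be a power of two");
    public:
        bool try_push(const T& v) {
            std::size_t head = head_.load(std::memory_order_relaxed);
            std::size_t tail = tail_.load(std::memory_order_acquire);
            if (head - tail == SizePow2) return false;         // ring full
            buf_[head & (SizePow2 - 1)] = v;
            head_.store(head + 1, std::memory_order_release);  // publish the slot
            return true;
        }
        bool try_pop(T& out) {
            std::size_t tail = tail_.load(std::memory_order_relaxed);
            std::size_t head = head_.load(std::memory_order_acquire);
            if (tail == head) return false;                    // ring empty
            out = buf_[tail & (SizePow2 - 1)];
            tail_.store(tail + 1, std::memory_order_release);  // free the slot
            return true;
        }
    private:
        T buf_[SizePow2];
        alignas(64) std::atomic<std::size_t> head_{0};  // written by producer only
        alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer only
    };

    int main() {
        static SpscRing<int, 1024> ring;
        std::thread consumer([] {
            long long sum = 0;
            int v;
            for (int received = 0; received < 100000; ) {
                if (ring.try_pop(v)) { sum += v; ++received; }  // busy-spin wait
            }
            std::printf("consumer sum = %lld\n", sum);
        });
        for (int i = 0; i < 100000; ) {
            if (ring.try_push(i)) ++i;                          // busy-spin wait
        }
        consumer.join();
        return 0;
    }

Predictable, pre-allocated memory and per-counter cache-line alignment are what give this layout its lock-free, low-contention behaviour, matching the 5.4 observations above.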

Section 6: Evaluation

  • 6.2 Trading Evaluation: Statistical validation (t-tests), cache analysis showing instruction count vs miss rate trade-offs
  • 6.3 Disruptor Evaluation: 2x speed improvement, statistically significant (p < 10⁻²³)

Other notes about Low Latency and/or C++