C++ low-latency design patterns
Low-Latency Design Patterns - Key Sections for HFT/Quant Dev Interviews
Some interesting points worth diving deeper into, from the paper "C++ Design Patterns for Low-latency Applications Including High-frequency Trading"
Section 2: Background
- 2.3 Design Patterns: 13 optimization techniques overview (cache warming, constexpr, SIMD, lock-free, etc.)
- 2.4 LMAX Disruptor: Lock-free inter-thread communication avoiding context switches
- 2.6 Cache Analysis: L1/L2/L3 hierarchy and cache hit/miss fundamentals (see the sketch after this list)
- 2.7 Networking: Kernel bypass, FPGAs, colocation strategies
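To make the cache-hierarchy point in 2.6 concrete, here is a minimal, self-contained sketch (my own illustration, not code from the paper): summing the same matrix row-major vs column-major shows the cost of cache-unfriendly access. The matrix size and timing harness are arbitrary choices.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    constexpr std::size_t N = 4096;
    std::vector<int> m(N * N, 1);  // ~64 MB, far larger than L1/L2/L3

    auto time_sum = [&](bool row_major) {
        long long sum = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                sum += row_major ? m[i * N + j]   // sequential: reuses cache lines
                                 : m[j * N + i];  // strided: frequent cache misses
        auto t1 = std::chrono::steady_clock::now();
        std::printf("%s sum=%lld took %lld ms\n",
                    row_major ? "row-major" : "col-major", sum,
                    (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
        return sum;
    };

    time_sum(true);   // cache-friendly traversal
    time_sum(false);  // same work, same data, much slower
}
```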
Section 3: Low-Latency Programming Repository
3.1 Compile-Time Features
- Cache Warming (90% improvement): Keep exercising the hot path so its code and data are already in cache when a trade signal arrives
- Compile-time Dispatch (26%): Templates vs virtual functions
- Constexpr (90%): Move computations to compile time (see the sketch after this list)
- Inlining (20.5%): Eliminate function call overhead
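A minimal sketch of two of these compile-time techniques (the strategy and fee names are mine, not the paper's): constexpr moves a computation to compile time, and a template parameter replaces a virtual call so dispatch is resolved, and typically inlined, by the compiler.

```cpp
#include <cstdio>

// constexpr: the fee lookup is computed at compile time, so the hot path
// pays nothing for it at run time (values here are illustrative).
constexpr double fee_bps(int tier) { return tier == 0 ? 0.30 : 0.10; }
static_assert(fee_bps(0) == 0.30);  // evaluated by the compiler

// Compile-time dispatch: the strategy is a template parameter, so the call
// is bound statically instead of going through a virtual table.
struct Momentum   { static double signal(double px) { return px > 100.0 ?  1.0 : -1.0; } };
struct MeanRevert { static double signal(double px) { return px > 100.0 ? -1.0 :  1.0; } };

template <typename Strategy>
double run(double px) { return Strategy::signal(px); }

int main() {
    std::printf("fee=%.2f momentum=%.1f meanrev=%.1f\n",
                fee_bps(1), run<Momentum>(101.0), run<MeanRevert>(101.0));
}
```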
3.2 Optimization Techniques
- Loop Unrolling (72%): Reduce loop control overhead
- Short-circuiting (50%): Early boolean expression termination
- Slowpath Removal (12%): Separate error handling from the hot path (see the sketch after this list)
- Branch Reduction (36%): Minimize branch misprediction penalties
- Prefetching (23.5%): Hint CPU about future data needs
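A small sketch of slowpath removal combined with a branch hint, assuming GCC or Clang (the [[gnu::noinline]] attribute and __builtin_expect are compiler-specific); the identifiers are illustrative, not the paper's.

```cpp
#include <cstdint>
#include <cstdio>

// Rare error handling lives in a separate, non-inlined "cold" function so the
// hot path stays small, stays in the instruction cache, and predicts well.
[[gnu::noinline]] void handle_bad_order(std::uint64_t id) {
    std::fprintf(stderr, "rejecting order %llu\n", (unsigned long long)id);
}

inline bool process_order(std::uint64_t id, double qty) {
    if (__builtin_expect(qty <= 0.0, 0)) {  // hint: this branch is unlikely
        handle_bad_order(id);               // cold path, kept off the hot path
        return false;
    }
    // ... hot path: risk checks, order encoding, send ...
    return true;
}

int main() {
    std::printf("%d %d\n", process_order(1, 100.0), process_order(2, -1.0));
}
```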
3.3 Data Handling
- Signed vs Unsigned (12%): Assembly-level optimization
- Float/Double Mixing (52%): Avoid implicit type conversions
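A minimal illustration of the float/double mixing point (function names and values are made up): the double literals in slow_scale force a float-to-double conversion and back on every iteration, while the f-suffixed literals keep the whole loop in single precision.

```cpp
#include <cstdio>

// 0.5 and 1.0 are double literals, so each iteration widens x to double,
// computes in double, then narrows back to float.
float slow_scale(float x, int n) {
    for (int i = 0; i < n; ++i) x = x * 0.5 + 1.0;    // float <-> double churn
    return x;
}

// 0.5f and 1.0f are float literals, so the loop stays in single precision.
float fast_scale(float x, int n) {
    for (int i = 0; i < n; ++i) x = x * 0.5f + 1.0f;
    return x;
}

int main() {
    std::printf("%f %f\n", slow_scale(1024.0f, 10), fast_scale(1024.0f, 10));
}
```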
3.4 Concurrency
- SIMD Instructions (49%): Parallel data processing with AVX2/SSE
- Lock-Free Programming (63%): Atomic operations, CAS patterns
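A sketch of the lock-free CAS pattern (my own example, not the paper's code): several threads maintain a shared session high using compare_exchange instead of a mutex, so no thread ever blocks on a lock.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<double> session_high{0.0};

// Publish a trade price: raise the shared maximum with a CAS retry loop.
void publish(double px) {
    double cur = session_high.load(std::memory_order_relaxed);
    while (px > cur &&
           !session_high.compare_exchange_weak(cur, px,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        // On failure, cur is reloaded with the current value; retry only
        // while px is still higher than what another thread published.
    }
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([t] {
            for (int i = 0; i < 1000; ++i) publish(100.0 + t + i * 0.001);
        });
    for (auto& w : workers) w.join();
    std::printf("session high = %.3f\n", session_high.load());
}
```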
3.5 System Programming
- Kernel Bypass (7x reduction): DPDK for direct network I/O
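A heavily simplified sketch of a kernel-bypass receive loop modeled on DPDK's skeleton example; it assumes a DPDK build environment, hugepages, and a NIC bound to a DPDK-compatible driver, and omits most error handling. Port 0 and the burst size of 32 are arbitrary choices, not values from the paper.

```cpp
#include <cstdio>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

int main(int argc, char** argv) {
    if (rte_eal_init(argc, argv) < 0) return 1;  // initialise the DPDK runtime

    rte_mempool* pool = rte_pktmbuf_pool_create(
        "rx_pool", 8191, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    const uint16_t port = 0;
    rte_eth_conf conf{};  // default (zeroed) port configuration
    rte_eth_dev_configure(port, /*rx queues*/ 1, /*tx queues*/ 1, &conf);
    rte_eth_rx_queue_setup(port, 0, 1024, rte_eth_dev_socket_id(port), nullptr, pool);
    rte_eth_tx_queue_setup(port, 0, 1024, rte_eth_dev_socket_id(port), nullptr);
    rte_eth_dev_start(port);

    rte_mbuf* bufs[32];
    for (;;) {
        // Busy-poll the NIC's RX ring directly from user space: no syscalls,
        // no interrupts, no kernel network stack on the hot path.
        const uint16_t n = rte_eth_rx_burst(port, 0, bufs, 32);
        for (uint16_t i = 0; i < n; ++i) {
            // ... parse market data straight out of bufs[i] ...
            rte_pktmbuf_free(bufs[i]);
        }
    }
}
```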
Section 4: Pairs Trading Strategy
- 4.2 Cointegration: Statistical tests (Engle-Granger, ADF) for pair selection
- 4.3 Methodology: Z-score-based signal generation (see the sketch after this list)
- 4.4 Optimization: Combined techniques achieve 87% latency reduction (517μs → 65μs)
- 4.5 Results: Links speed to profitability, with a 78% reduction in adverse selection exposure
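A toy version of the z-score signal generation in 4.3; the entry threshold of 2.0 and the sample spread series are assumptions for illustration, not the paper's parameters.

```cpp
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

// spread_t = price_A - beta * price_B; trade when the latest spread value is
// far enough (in standard deviations) from the window mean.
int signal(const std::vector<double>& spread, double entry_z = 2.0) {
    const double n = static_cast<double>(spread.size());
    const double mean = std::accumulate(spread.begin(), spread.end(), 0.0) / n;
    double var = 0.0;
    for (double s : spread) var += (s - mean) * (s - mean);
    const double sd = std::sqrt(var / n);
    const double z = (spread.back() - mean) / sd;
    if (z > entry_z)  return -1;  // spread rich: short A, long B
    if (z < -entry_z) return +1;  // spread cheap: long A, short B
    return 0;                     // inside the band: no trade
}

int main() {
    std::vector<double> spread{0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 1.5};
    std::printf("signal = %d\n", signal(spread));
}
```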
Section 5: Disruptor Pattern
- 5.1 Core Concepts: Ring buffer, sequencer, wait strategies (minimal SPSC sketch after this list)
- 5.3 Performance: 38-55% faster than a standard queue, scales better with load
- 5.4 Results: Avoids lock contention, better cache utilization, predictable memory access
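A minimal single-producer/single-consumer ring buffer in the spirit of the Disruptor's core idea: pre-allocated slots claimed by monotonically increasing sequence counters, with a busy-spin wait strategy. The real Disruptor adds multi-producer claiming, batching, and pluggable wait strategies; names and sizes here are mine.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <thread>

template <std::size_t N>  // N must be a power of two
class Ring {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
    std::int64_t buf_[N]{};
    alignas(64) std::atomic<std::int64_t> head_{0};  // next slot to publish
    alignas(64) std::atomic<std::int64_t> tail_{0};  // next slot to consume
public:
    void publish(std::int64_t v) {
        const std::int64_t seq = head_.load(std::memory_order_relaxed);
        while (seq - tail_.load(std::memory_order_acquire) >= (std::int64_t)N) {}  // spin: full
        buf_[seq & (N - 1)] = v;
        head_.store(seq + 1, std::memory_order_release);
    }
    std::int64_t consume() {
        const std::int64_t seq = tail_.load(std::memory_order_relaxed);
        while (head_.load(std::memory_order_acquire) <= seq) {}  // spin: empty
        const std::int64_t v = buf_[seq & (N - 1)];
        tail_.store(seq + 1, std::memory_order_release);
        return v;
    }
};

int main() {
    Ring<1024> ring;
    std::thread producer([&] { for (std::int64_t i = 1; i <= 100000; ++i) ring.publish(i); });
    std::int64_t sum = 0;
    for (int i = 0; i < 100000; ++i) sum += ring.consume();
    producer.join();
    std::printf("sum = %lld\n", (long long)sum);  // expect 100000*100001/2
}
```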
Section 6: Evaluation
- 6.2 Trading Evaluation: Statistical validation via t-tests (see sketch below), plus cache analysis showing instruction-count vs miss-rate trade-offs
- 6.3 Disruptor Evaluation: 2x speed improvement, statistically significant (p < 10⁻²³)
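To show what the t-test validation looks like mechanically, here is a small sketch computing Welch's t-statistic on two latency samples; the sample numbers are invented (loosely echoing the 517μs → 65μs figures above), not the paper's data.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Welch's t-statistic for comparing two independent samples, e.g. baseline
// vs optimised latency measurements.
double welch_t(const std::vector<double>& a, const std::vector<double>& b) {
    auto mean_var = [](const std::vector<double>& x, double& m, double& v) {
        m = 0.0; for (double e : x) m += e; m /= x.size();
        v = 0.0; for (double e : x) v += (e - m) * (e - m); v /= (x.size() - 1);
    };
    double ma, va, mb, vb;
    mean_var(a, ma, va);
    mean_var(b, mb, vb);
    return (ma - mb) / std::sqrt(va / a.size() + vb / b.size());
}

int main() {
    std::vector<double> baseline{517, 520, 515, 522, 518};    // μs, made-up samples
    std::vector<double> optimised{65, 66, 64, 67, 65};        // μs, made-up samples
    std::printf("t = %.2f\n", welch_t(baseline, optimised));  // large |t| => significant
}
```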
Other notes about Low Latency and/or C++
- 🌿What every programmer should know about memory
What every programmer should know about memory
- 🌲Part 1. Thread Pools
Thread pooling with C++20 primitives
- 🌲Part 2. Work Stealing Thread Pools
Work Stealing Thread Pools C++20
- 🌿Developing a deep learning framework
Developing a deep learning framework
- 🌱MPMC Queue
MPMC Queue
- 🌱Atomics
Atomics
- 🌿SPSC Queue
SPSC Thread-Safe Queue
- 🌿Implementing STL's std::shared_ptr
Implementing STL's std::shared_ptr
- 🌿Implementing STL's std::unique_ptr
Implementing STL's std::unique_ptr
- 🌿Implementing STL's std::vector
Implementing STL's std::vector
- 🌿Type Erasure in C++
Type Erasure in C++