
Inside a modern crypto trading hub, the matching engine and pricing feeds run on separate but synchronized hardware threads. The matching engine uses a price-time priority queue, often implemented on FPGAs or on servers with kernel-bypass networking, to process orders in under 100 nanoseconds. Simultaneously, the low-latency pricing feed aggregates data from more than 10 exchanges, normalizes it, and pushes delta updates via multicast UDP. The terminal’s middleware bridges both streams so that the displayed order book reflects the same state as the matching engine, with a skew of less than 50 microseconds.
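To make "price-time priority" concrete, here is a minimal Python sketch of one side of a book kept in a binary heap. It assumes integer prices in ticks, and `PriceTimeQueue` is a hypothetical name; a production engine would use fixed-size, cache-friendly structures or FPGA logic rather than Python objects.

```python
import heapq
import itertools


class PriceTimeQueue:
    """One side of a book ordered by price-time priority (hypothetical sketch)."""

    def __init__(self, side):
        if side not in ("buy", "sell"):
            raise ValueError("side must be 'buy' or 'sell'")
        self.side = side
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrival wins

    def push(self, order_id, price, qty):
        # heapq is a min-heap, so bids store the negated price (highest bid first).
        key = -price if self.side == "buy" else price
        heapq.heappush(self._heap, (key, next(self._arrival), order_id, qty))

    def pop_best(self):
        key, _, order_id, qty = heapq.heappop(self._heap)
        price = -key if self.side == "buy" else key
        return order_id, price, qty

    def __len__(self):
        return len(self._heap)
```

Orders at a better price come out first, and orders at the same price come out in arrival order, which is exactly the price-time rule.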
The design keeps this parallel operation free of race conditions. The feed uses a consolidated tape protocol, timestamping each price change with a hardware clock synchronized to NTP (where microsecond-level accuracy is required, PTP is the more usual choice). The matching engine, meanwhile, locks only the specific price level being modified, allowing concurrent order insertion at different levels. The result is a terminal where a trader sees real-time prices and can send a market order that reaches the matching engine before the next feed update arrives.
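Per-level locking can be sketched as follows. `LevelLockedBook` is a hypothetical name: each price level gets its own lock, and a short-lived registry lock protects only the lookup and creation of levels. Because of CPython's global interpreter lock the threads do not truly run in parallel, so this illustrates the locking structure, not the performance.

```python
import threading
from collections import deque


class LevelLockedBook:
    """Hypothetical sketch: one lock per price level."""

    def __init__(self):
        self._levels = {}                       # price -> (Lock, deque of (order_id, qty))
        self._registry_lock = threading.Lock()  # held only while looking up/creating a level

    def _level(self, price):
        with self._registry_lock:
            if price not in self._levels:
                self._levels[price] = (threading.Lock(), deque())
            return self._levels[price]

    def insert(self, price, order_id, qty):
        lock, queue = self._level(price)
        with lock:  # only this price level is locked; other levels stay available
            queue.append((order_id, qty))

    def depth(self, price):
        with self._registry_lock:
            entry = self._levels.get(price)
        if entry is None:
            return 0
        lock, queue = entry
        with lock:
            return sum(qty for _, qty in queue)
```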
The pricing feed operates as a distributed data pipeline. Each exchange’s WebSocket or FIX feed is parsed by a dedicated core, which converts raw binary or JSON messages into a unified schema containing bid/ask prices, volume, and sequence numbers. The feed then filters out outliers and stale ticks, reducing noise while adding minimal latency. The entire process, from exchange receipt to terminal display, targets under 1 millisecond.
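A minimal sketch of the normalize-then-filter stage follows. The two wire formats (`parse_venue_a` for JSON, `parse_venue_b` for a comma-separated string) and the 2% outlier threshold are hypothetical, since the text does not specify the exchanges' formats or the filter's rules. Here a tick counts as stale if its sequence number is not newer than the last accepted one from the same venue, and as an outlier if its mid price deviates from the median of the other venues' latest mids by more than `max_dev`.

```python
import json
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """Unified schema shared by all venues."""
    venue: str
    seq: int
    bid: float
    ask: float
    volume: float


def parse_venue_a(raw: str) -> Tick:
    # Hypothetical JSON format: {"s": seq, "b": bid, "a": ask, "v": volume}
    m = json.loads(raw)
    return Tick("A", int(m["s"]), float(m["b"]), float(m["a"]), float(m["v"]))


def parse_venue_b(raw: str) -> Tick:
    # Hypothetical comma-separated format: "seq,bid,ask,volume"
    seq, bid, ask, vol = raw.split(",")
    return Tick("B", int(seq), float(bid), float(ask), float(vol))


class TickFilter:
    def __init__(self, max_dev: float = 0.02):
        self.max_dev = max_dev  # max relative deviation from the cross-venue median
        self.last_seq = {}      # venue -> last accepted sequence number
        self.mids = {}          # venue -> latest accepted mid price

    def accept(self, t: Tick) -> bool:
        # Stale: not newer than what was already accepted from this venue.
        if t.seq <= self.last_seq.get(t.venue, -1):
            return False
        mid = (t.bid + t.ask) / 2
        others = [m for v, m in self.mids.items() if v != t.venue]
        # Outlier: too far from the median mid of the other venues.
        if others:
            ref = statistics.median(others)
            if abs(mid - ref) / ref > self.max_dev:
                return False
        self.last_seq[t.venue] = t.seq
        self.mids[t.venue] = mid
        return True
```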
Instead of sending full snapshots, the feed transmits only deltas: changes in price, volume, or order count. The terminal maintains a local copy of the order book and applies these deltas incrementally. A sequence-gap detection mechanism triggers a full snapshot request if packets are dropped. This delta approach cuts bandwidth usage by 90% and keeps the terminal’s cache coherent with the hub’s master book.
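The delta-plus-gap-detection scheme can be sketched like this. `LocalBook`, `Delta`, and the `request_snapshot` callback are hypothetical names; each delta is assumed to carry a per-feed sequence number and the new aggregate quantity at one price level, with 0 meaning the level has been removed.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    seq: int
    side: str    # "bid" or "ask"
    price: int   # price in ticks
    qty: int     # new aggregate quantity at this level; 0 removes the level


class LocalBook:
    def __init__(self, request_snapshot):
        self.bids = {}
        self.asks = {}
        self.seq = None                            # last applied sequence; None until a snapshot loads
        self._request_snapshot = request_snapshot  # hypothetical callback to the hub

    def load_snapshot(self, seq, bids, asks):
        self.bids, self.asks, self.seq = dict(bids), dict(asks), seq

    def apply(self, d: Delta) -> bool:
        """Apply one delta; returns True if the local book is in sync afterwards."""
        if self.seq is None:
            return False                 # still waiting for a snapshot
        if d.seq <= self.seq:
            return True                  # duplicate or late packet, already reflected
        if d.seq != self.seq + 1:
            # Gap: at least one delta was lost, so local state is untrustworthy.
            self.seq = None
            self._request_snapshot()
            return False
        book = self.bids if d.side == "bid" else self.asks
        if d.qty == 0:
            book.pop(d.price, None)
        else:
            book[d.price] = d.qty
        self.seq = d.seq
        return True
```

A real client would also buffer deltas that arrive while the snapshot request is in flight and replay those whose sequence number is greater than the snapshot's.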
The matching engine uses a lock-free ring buffer for incoming orders. Each order is tagged with a microsecond timestamp and a trader ID. The engine scans the opposite side of the book for matches using a binary search on price levels, then executes FIFO within each level. For high-frequency strategies, the engine supports “immediate-or-cancel” (IOC) orders, which never rest on the book: they match against existing liquidity, and any unfilled remainder is cancelled immediately. This spares scalpers from waiting in the queue.
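The matching loop can be sketched in Python under a few assumptions: integer tick prices, limit orders only, and hypothetical names `OrderBook` and `submit`. `bisect` keeps each side's price levels sorted, so the best opposite level sits at one end of the list and levels are inserted and removed via binary search. Each level is a FIFO queue, and an IOC order never rests: whatever does not fill is returned as the cancelled remainder.

```python
import bisect
from collections import deque


class OrderBook:
    """Hypothetical price-time matching sketch with IOC support."""

    def __init__(self):
        self.prices = {"buy": [], "sell": []}  # sorted price levels per side
        self.levels = {"buy": {}, "sell": {}}  # price -> deque of [order_id, qty]

    def _add(self, side, price, order_id, qty):
        if price not in self.levels[side]:
            bisect.insort(self.prices[side], price)  # binary search for the slot
            self.levels[side][price] = deque()
        self.levels[side][price].append([order_id, qty])

    def _remove_level(self, side, price):
        i = bisect.bisect_left(self.prices[side], price)
        del self.prices[side][i]
        del self.levels[side][price]

    def submit(self, side, price, order_id, qty, ioc=False):
        """Match a limit order; returns (fills, remaining qty)."""
        opp = "sell" if side == "buy" else "buy"
        fills = []
        while qty > 0 and self.prices[opp]:
            # Best opposite price: lowest ask for a buy, highest bid for a sell.
            best = self.prices[opp][0] if side == "buy" else self.prices[opp][-1]
            if (side == "buy" and best > price) or (side == "sell" and best < price):
                break
            queue = self.levels[opp][best]
            while qty > 0 and queue:
                resting = queue[0]  # FIFO: oldest order at this level first
                traded = min(qty, resting[1])
                fills.append((resting[0], best, traded))
                resting[1] -= traded
                qty -= traded
                if resting[1] == 0:
                    queue.popleft()
            if not queue:
                self._remove_level(opp, best)
        if qty > 0 and not ioc:
            self._add(side, price, order_id, qty)  # rest the remainder on the book
        return fills, qty  # for IOC, a nonzero qty is the cancelled remainder
```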
To keep reads consistent while the book is changing, the matching engine employs a “read-copy-update” (RCU) mechanism for the order book. When an incoming order modifies a price level, the engine copies the affected level, applies the match to the copy, and then atomically swaps the pointer to publish it. Readers, including the feed publisher, always see either the old or the new version of a level and never a half-applied update, so the feed’s view cannot diverge from the engine’s state, even at 1 million orders per second. The terminal then displays the resulting trade prints with a latency of under 200 microseconds.
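A minimal sketch of the copy-then-swap idea, assuming CPython, where replacing a dict value is a single reference assignment that readers never observe half-done. `RcuBook` and its methods are hypothetical. Each level is an immutable tuple; a writer builds a modified copy of the one affected level and publishes it, while a reader that already holds the old tuple keeps a consistent view.

```python
import threading


class RcuBook:
    """Hypothetical RCU-style sketch: immutable levels, copy then publish."""

    def __init__(self):
        self._levels = {}                     # price -> tuple of (order_id, qty); never mutated
        self._writer_lock = threading.Lock()  # serializes writers; readers take no lock

    def read_level(self, price):
        # Readers grab the current tuple reference; it stays valid and unchanged
        # even if a writer publishes a newer version afterwards.
        return self._levels.get(price, ())

    def add(self, price, order_id, qty):
        with self._writer_lock:
            self._levels[price] = self._levels.get(price, ()) + ((order_id, qty),)

    def fill_front(self, price, qty):
        """Match qty against the oldest order at `price`; returns the traded qty."""
        with self._writer_lock:
            old = self._levels.get(price, ())
            if not old:
                return 0
            order_id, resting = old[0]
            traded = min(qty, resting)
            head = ((order_id, resting - traded),) if resting > traded else ()
            new = head + old[1:]           # copy of the level with the match applied
            self._levels[price] = new      # publish: a single reference assignment
            return traded
```

In C or C++ the publish step would be an atomic pointer store with release ordering, and the old version could be freed only after a grace period in which all pre-existing readers have finished; in this sketch, Python's reference counting plays that role.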
The terminal’s GUI runs on a separate thread that polls both the feed and engine states every 10 milliseconds. It uses double buffering: one buffer is rendered while the other receives updates, and the two are swapped on each refresh. The terminal also provides a “latency gauge” showing the delta between the last feed timestamp and the last engine timestamp, giving traders real-time insight into data freshness. If the delta exceeds 1 millisecond, the terminal highlights the affected prices in red to warn the user of potentially stale data.
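The double-buffering and latency-gauge logic can be sketched as follows. `DoubleBufferedBook`, `flip`, and `latency_gauge` are hypothetical names, and timestamps are assumed to be integer nanoseconds; the 1 ms threshold is the one given in the text. Updates are written into the back buffer, and once per poll `flip` publishes it as the front buffer that the GUI renders.

```python
STALE_THRESHOLD_NS = 1_000_000  # 1 ms


class DoubleBufferedBook:
    def __init__(self):
        self.front = {}  # what the GUI renders this frame
        self.back = {}   # where incoming updates are written

    def apply_update(self, price, qty):
        if qty == 0:
            self.back.pop(price, None)
        else:
            self.back[price] = qty

    def flip(self):
        # Called once per poll: publish the back buffer, then seed a fresh
        # back buffer so the dict being rendered is never mutated.
        self.front, self.back = self.back, dict(self.back)


def latency_gauge(last_feed_ts_ns, last_engine_ts_ns, threshold_ns=STALE_THRESHOLD_NS):
    """Returns (delta_ns, stale); stale=True means highlight affected prices in red."""
    delta = abs(last_feed_ts_ns - last_engine_ts_ns)
    return delta, delta > threshold_ns
```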
For colocated traders, the hub offers a raw feed via RDMA (Remote Direct Memory Access), bypassing the operating system’s network stack. This reduces end-to-end latency to under 10 microseconds. In this scenario, the matching engine can process orders directly from the trader’s FPGA, creating a closed-loop system in which price discovery and execution are tightly coupled.
The hub’s matching engine uses a lock-free ring buffer for incoming orders and a read-copy-update (RCU) mechanism for the order book, so feed updates never observe a partially applied engine operation.
The terminal detects sequence gaps and requests a full order book snapshot from the hub, then resumes delta updates.
Yes, the terminal displays a per-order latency metric, measured from the moment the order leaves the trader’s machine to when it enters the matching engine.
No, the hub normalizes them, but exchanges with slower feeds are marked with a latency flag in the terminal.
Alex K.
This article clarified how the feed and engine sync. I use this hub daily, and now I understand why my IOC orders execute so fast.
Maria L.
The RDMA option is a game-changer. My colocated setup now sees sub-10 microsecond latency. The explanation of RCU was spot on.
James T.
I run a market-making bot. The delta propagation method reduced my bandwidth costs and improved my fill rates. Great technical deep dive.