Institutional Hotpath Latency

This page is the latency and determinism contract for institutional market makers. It separates the exchange hotpath from proxy/network effects and makes the response semantics explicit enough for an unattended quote engine.

Target

For a market maker quote loop, the target is sub-50 ms p99 from exchange ingress to HTTP/BSL ACK on the provisioned low-latency lane. This target is not a guarantee for public-internet round trips, Cloudflare/proxy paths, full responses, or durable/backoffice workflows.

Measure three separate clocks:

Clock	Meaning	How to measure
Client RTT	Client send to client receive. Includes network, TLS, proxy, and client runtime.	Client-side monotonic timer.
Core ACK	Sequencer request start to ACK response body finalization.	`x-mm-perf-total-us` or `bsl.timing.durationsUs.coreAck`.
BSL facade overhead	JSON/BSL response envelope work around the core submit.	`bsl.timing.durationsUs.facadeAugment`.

If client RTT is high but core ACK is low, optimize network path, DNS/direct endpoint choice, TLS reuse, and proxy hops before changing strategy logic.
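The split between client RTT and core ACK can be sketched as follows. This is a minimal illustration, not SDK code: submit is a hypothetical callable that performs the HTTP request and returns the parsed JSON body, and only the bsl.timing.durationsUs.coreAck path is taken from this page.

```python
import time
from typing import Callable


def timed_submit(submit: Callable[[], dict]) -> dict:
    """Time one submit with a monotonic clock and split client RTT from core ACK."""
    start_ns = time.perf_counter_ns()
    body = submit()  # hypothetical: sends the request, returns the parsed JSON body
    client_rtt_us = (time.perf_counter_ns() - start_ns) // 1_000
    core_ack_us = body["bsl"]["timing"]["durationsUs"]["coreAck"]
    return {
        "clientRttUs": client_rtt_us,
        "coreAckUs": core_ack_us,
        # Time spent outside the exchange hotpath: network, TLS, proxy, client runtime.
        "outsideCoreUs": max(client_rtt_us - core_ack_us, 0),
    }
```

A large outsideCoreUs next to a small coreAckUs is the signal to look at the network path before touching strategy logic.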

Route selection

Use the connectivity bundle for the actual endpoints assigned to the account:

GET /api/v1/bsl/connectivity
SC-Auth-Version: 2
SC-Key: <institutional apiKeyId>
SC-Nonce: <monotonic nonce>
SC-Timestamp: <unix ms>
SC-Passphrase: <apiPassphrase>
SC-Signature: <hmac>

Hotpath preference:

Preference	Route	Use
1	BSL Direct TCP/TLS	Lowest overhead native binary session where provisioned.
2	`/api/order-entry/binary`	Direct binary HTTP lane.
3	`/api/v1/bsl/orders/compact`	BSL compact HTTP facade with institutional HMAC.
4	Generic platform/proxy path	Integration compatibility only; benchmark separately.

Keep connections warm. Reusing TLS sessions and a warm credential/signature cache matters more than micro-optimizing one isolated request.

Signing path

Do not call POST /api/v1/trading/actions/hash in the steady-state quote loop. That endpoint is a reference and conformance tool. Production quote engines compute the canonical action hash locally, sign locally, and submit the signed batch in one order-entry request. See Local Action Signing and pin the SDK golden vectors in CI.

The common two-request flow:

POST /api/v1/trading/actions/hash
sign
POST /api/v1/bsl/orders/compact

adds a full network/API round trip before every quote refresh. Keep it for bring-up and signer validation only.

Live stress reports from scripts/e2e/live-order-entry-stress.cjs expose this split under latencyAttribution:

actionBuild.localSigning measures local SDK signing,
actionBuild.nonce measures bootstrap/reservation work,
submitHttp measures the order-entry request RTT,
removedHotpathRoundtrip.endpoint records that /api/v1/trading/actions/hash was intentionally not called in the hotpath.

Use those fields to prove that a post-change run is no longer paying the 200-270 ms server-hash round trip observed in the original integration review.

ACK and response modes

There are two independent knobs:

Header	Controls	Hotpath setting
`X-BSL-Result-Mode`	ACK boundary: `ack`, `durable`, `full`.	`ack` for quote loops.
`X-Senticore-Response-Mode`	Body verbosity: `summary`, `detailed`, `full`.	`detailed` during integration; `summary` only after reconciliation is proven.

ACK boundaries:

Boundary	Server waits for	Client meaning
`accepted` / BSL `ack`	Admission boundary only.	Request entered the low-latency lane; not terminal order state.
`ingress_wal`	Primary ingress WAL append and forced sync.	Safer admission boundary; still not terminal order state.
`applied` / BSL `full`	Engine apply / terminal receipts.	Useful for conformance and slow paths; keep out of live quote refresh loops.
`primary_durable` / `durable`	Durable boundary.	Backoffice or explicit durability workflows, not low-latency quoting.

Do not run a market-maker quote loop with X-BSL-Result-Mode: full unless the account has explicitly passed conformance for that mode and the latency budget includes engine terminal wait time.

Recommended submit headers:

X-BSL-Result-Mode: ack
X-Senticore-Response-Mode: detailed
Idempotency-Key: <strategy-request-key>

Expected successful shape:

{
  "ok": true,
  "seqs": [123],
  "derivedOrderIds": ["0x..."],
  "acceptedActions": 1,
  "responseMode": "detailed",
  "ackMode": "ingress_wal",
  "ackClass": "ingress_wal",
  "durableLsn": 22376075,
  "durablePendingSeqs": 1,
  "bsl": {
    "reconciliation": {
      "ackIsTerminalState": false,
      "bodyIncludesSeqs": true,
      "bodyIncludesDerivedOrderIds": true,
      "missingDerivedOrderIdCount": 0,
      "missingDerivedOrderIdsRecoveredLocally": 0,
      "missingDerivedOrderIdsFinalMissingAfterLocalDerivation": 0,
      "receiptPath": "/api/mm/action_receipts?account={account}&seqs={seqs}",
      "missingDerivedOrderIdsAction": "derive_locally_from_signed_payload_or_reconcile_by_seq",
      "quoteReplaceSummary": {
        "requestedLegs": [
          {
            "derivedOrderId": "0x...",
            "expectedDerivedOrderId": "0x...",
            "derivedOrderIdSource": "response.derivedOrderIds",
            "responseDerivedOrderIdMissing": false
          }
        ]
      }
    },
    "timing": {
      "durationsUs": {
        "total": 4210,
        "coreAck": 3830,
        "facadeAugment": 380,
        "auth": 120,
        "routing": 540,
        "risk": 0,
        "enqueue": 2100,
        "ackResolution": 0,
        "responseEmit": 90
      }
    }
  }
}

coreAck is the number to compare against the exchange-side sub-50 ms target. total in the BSL envelope also includes facade work after the core response; in the example above, 3830 us of coreAck plus 380 us of facadeAugment gives 4210 us total. bsl.reconciliation.ackIsTerminalState is false for the low-latency ACK lane, so the ACK alone never proves final order state. Use seqs[], lastSeq, seqChecksum, local derived order ids, and the receipt/drop-copy paths as the authoritative post-submit reconciliation anchors.
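A quote engine can gate on these fields before it treats a submit as admitted. The sketch below assumes the response shape shown above; the function name and the default 50000 us budget are illustrative choices, not part of the API.

```python
def check_ack(body: dict, budget_us: int = 50_000) -> list[str]:
    """Return the problems found in an order-entry ACK body (empty list means OK)."""
    problems = []
    if not body.get("ok"):
        problems.append("response not ok")
    bsl = body.get("bsl", {})
    # The low-latency lane must report a non-terminal ACK.
    if bsl.get("reconciliation", {}).get("ackIsTerminalState") is not False:
        problems.append("ackIsTerminalState is not false")
    # Without seqs[] there is nothing to reconcile receipts or drop-copy against.
    if not body.get("seqs"):
        problems.append("no seqs[] to reconcile against")
    core_ack = bsl.get("timing", {}).get("durationsUs", {}).get("coreAck")
    if core_ack is None or core_ack > budget_us:
        problems.append(f"coreAck {core_ack} us missing or over {budget_us} us")
    return problems
```

An empty result only means the request was admitted within budget; terminal order state still comes from reconciliation.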

Latency budget

Use this budget when investigating a p99 spike:

Stage	Signal	Typical fix
Client transport	Client-side timer before response headers	Reuse one SDK client per process or strategy shard; keep HTTP/FIX/BSL TCP sessions persistent.
Auth	`x-mm-perf-routing-auth-us`	Warm HMAC credential cache, monotonic `SC-Nonce`, persistent session.
Signature verify	`x-mm-perf-signature-verify-us`	Use self-signed account actions where possible; keep signer sets warm.
Validation/routing	`x-mm-perf-routing-validation-us` and detailed routing headers	Single account, single shard, valid market, fresh timestamps, low fanout.
Risk/precheck	`x-mm-perf-prepare-risk-approval-us`, `x-mm-perf-prepare-risk-apply-us`	Keep quote shapes hotpath-safe; avoid `full` mode in quote loops.
WAL/enqueue	`x-mm-perf-enqueue-us`	Keep batches small, direct lane warm, avoid backlog and forced slow-lane syncs.
ACK wait	`x-mm-perf-ack-resolution-us`	Do not request `applied`, `durable`, or `full` on the quote loop.
Response	`x-mm-perf-response-emit-us`	Use `summary` only after stream/drop-copy reconciliation is proven.

Set MM_DETAILED_PERF_RESPONSE_HEADERS=true only for diagnostics. The basic hotpath headers are always enough for first-pass triage.
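For first-pass triage, ranking the per-stage headers by reported time is often enough. The helper below is a sketch that assumes each x-mm-perf-*-us header carries a plain integer number of microseconds; the function name is hypothetical.

```python
def slowest_stages(headers: dict[str, str], top: int = 3) -> list[tuple[str, int]]:
    """Rank x-mm-perf-*-us response headers by reported microseconds, largest first."""
    stages = []
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if not (key.startswith("x-mm-perf-") and key.endswith("-us")):
            continue
        if key == "x-mm-perf-total-us":
            continue  # the total is the sum being explained, not a stage
        try:
            stages.append((key, int(value)))
        except ValueError:
            continue  # skip malformed values instead of failing the triage
    return sorted(stages, key=lambda stage: stage[1], reverse=True)[:top]
```

The ranking is for attribution only. Detailed headers may describe nested stages, so their values should not be summed.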

Batch shape

For sub-50 ms behavior, keep the quote batch boring:

one engine account per batch,
one market shard where possible,
QuoteReplace / SpotQuoteReplace instead of separate cancel plus place calls,
small batches inside the low-latency limit from GET /api/v1/bsl/limits,
hotpath-safe order types and post-only quote legs where appropriate,
no full result mode on the steady-state quote refresh path.

Read the live limits before the strategy starts:

GET /api/v1/bsl/limits

Important fields:

Field	Meaning
`ackModes.lowLatency`	Current low-latency ACK boundary for the account/lane.
`actions.lowLatencyMaxActions`	Maximum batch size that stays in the low-latency class.
`actions.singleAccountRequired`	Whether a batch may mix engine accounts.
`ingress.lowLatencyMaxInflight`	Push-side concurrency cap for the low-latency lane.
`backlog.ingressWalMaxPendingActions`	WAL/backlog pressure signal for admission throttling.
`nonceModel.actionNonce`	Live action nonce window and recovery rule.
`nonceModel.machineAuthNonce`	`SC-Nonce` replay contract for HMAC/agent auth.
`errorGuidance.nonceCodes`	Machine-readable nonce reject recovery actions.
`errorGuidance.requestCodes`	Queue, risk, duplicate nonce, and reservation guidance.
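A strategy can check each batch against these limits before it submits. The sketch below assumes the dotted field names map to nested JSON objects and that singleAccountRequired set to true means a batch must not mix engine accounts; the function and argument names are hypothetical.

```python
def low_latency_violations(limits: dict, batch_accounts: list[str], inflight: int) -> list[str]:
    """Check one batch against the low-latency limits.

    batch_accounts holds the engine account of every action in the batch;
    inflight is the number of requests already in flight on the lane.
    """
    actions = limits.get("actions", {})
    ingress = limits.get("ingress", {})
    violations = []

    max_actions = actions.get("lowLatencyMaxActions")
    if max_actions is not None and len(batch_accounts) > max_actions:
        violations.append(f"{len(batch_accounts)} actions exceeds lowLatencyMaxActions={max_actions}")

    if actions.get("singleAccountRequired") and len(set(batch_accounts)) > 1:
        violations.append("batch mixes engine accounts")

    max_inflight = ingress.get("lowLatencyMaxInflight")
    if max_inflight is not None and inflight >= max_inflight:
        violations.append(f"inflight={inflight} already at lowLatencyMaxInflight={max_inflight}")

    return violations
```

Missing fields are treated as no constraint here; a stricter client could refuse to start until every expected limit is present.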

Nonce rules

Do not mix nonce families:

Nonce	Owner	Retry rule
`SC-Nonce`	HMAC credential / API key	Monotonic per key. Reuse is an auth replay and must not be retried.
Action `payload.nonce`	Engine account	Windowed admission. Re-sign with a fresh in-window nonce on stale/replay rejects.
`Idempotency-Key`	Client request	Same key and same body may replay the response; same key and different body conflicts.

Changing only Idempotency-Key does not fix an action nonce error because the action nonce is inside the signed payload.

Recommended action nonce flow:

Read GET /api/v1/accounts/:engineAccount/bootstrap?fresh=true.
Start from nonceFloor or nextNonce.
Allocate unused in-window nonces locally.
Persist nonce plus signed payload before submit.
On nonce_below_floor, nonce_replayed, or nonce_outside_window, re-read bootstrap/account state and re-sign.
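The flow above can be sketched as a small local allocator. The window size, the sign callback, and the persist callback are all assumptions for illustration; the real window and recovery rule come from nonceModel.actionNonce in the limits response.

```python
from typing import Any, Callable

# Reject codes named on this page that call for a bootstrap re-read and a re-sign.
RESIGN_CODES = {"nonce_below_floor", "nonce_replayed", "nonce_outside_window"}


class NonceAllocator:
    """Local action-nonce allocator for one engine account (illustrative sketch)."""

    def __init__(self, start: int, window: int, persist: Callable[[int, Any], None]):
        self._floor = start    # nonceFloor or nextNonce from bootstrap
        self._next = start
        self._window = window  # assumed window size, taken from the live limits
        self._persist = persist

    def allocate(self, sign: Callable[[int], Any]) -> tuple[int, Any]:
        """Reserve the next unused nonce, sign with it, and persist before submit."""
        if self._next >= self._floor + self._window:
            raise RuntimeError("local nonce window exhausted; re-read bootstrap")
        nonce = self._next
        self._next += 1
        signed = sign(nonce)
        self._persist(nonce, signed)  # durable record exists before the request leaves
        return nonce, signed

    def reset(self, start: int) -> None:
        """Restart allocation from freshly read bootstrap/account state."""
        self._floor = start
        self._next = start


def needs_resign(reject_code: str) -> bool:
    return reject_code in RESIGN_CODES
```

On a reject where needs_resign returns true, the caller re-reads bootstrap, calls reset with the new starting nonce, and allocates again so the payload is re-signed rather than resent.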

QuoteReplace guarantees

QuoteReplace is the market-maker primitive. One signed parent action contains one or more quote legs. Each replacement child order id is deterministic from the signed parent payload and leg index.

Guarantees to rely on:

Contract	Guarantee
Ordering	A leg's `cancelOrderId` is processed before that leg's replacement order.
Identity	`derivedOrderIds[]` is deterministic and should match SDK local derivation.
Admission	An `ack` response means the group reached the configured admission boundary.
Terminal truth	Final open, filled, canceled, or rejected state comes from drop-copy/private stream/receipts/account reads.

Do not assume every replacement order is resting because the HTTP response is 200. A risk reject, market halt, post-only cross, cancel race, or terminal engine reject can still appear after an admission ACK.
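One way to keep admission and terminal truth apart is to hold every replacement leg in a pending state until a private-stream or drop-copy event resolves it. The tracker below is a sketch under that assumption; the class, method names, and event shape are hypothetical, and the state names follow the terminal states listed above.

```python
TERMINAL_STATES = {"open", "filled", "canceled", "rejected"}


class QuoteLegTracker:
    """Track replacement legs from admission ACK to terminal truth (sketch)."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}

    def on_ack(self, expected_ids: list[str], response_ids: list[str]) -> list[int]:
        """Mark locally derived ids as pending; return leg indexes whose ids mismatch."""
        for order_id in expected_ids:
            self.state.setdefault(order_id, "pending")
        return [
            index
            for index, expected in enumerate(expected_ids)
            if index >= len(response_ids) or response_ids[index] != expected
        ]

    def on_event(self, order_id: str, state: str) -> None:
        """Apply a state reported by drop-copy, private stream, receipts, or account reads."""
        if state not in TERMINAL_STATES:
            raise ValueError(f"unexpected state: {state}")
        self.state[order_id] = state

    def resting(self) -> list[str]:
        """Only legs confirmed open count as resting liquidity."""
        return [order_id for order_id, state in self.state.items() if state == "open"]
```

A non-empty result from on_ack means the response ids disagree with local derivation and the batch should be reconciled by seq before quoting continues.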

Reconciliation

Persist before each submit:

Key	Why
`Idempotency-Key`	Retry correlation.
action nonce	Re-sign/replay diagnosis.
`clientOrderId`	Strategy identity.
expected `derivedOrderIds[]`	Cancel/replace and public-book verification.
response `seqs[]`	Receipt, stream, and replay gap-fill correlation.
market, book, side, price, qty	Local inventory/book model.
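The keys above fit naturally into one journal record per submit. The dataclass below is a sketch; the field names and the JSON-lines encoding are assumptions, and seqs stays empty until the response arrives because it is the only key that comes from the server.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubmitRecord:
    """One journal entry written before submit and updated with seqs[] after the ACK."""

    idempotency_key: str
    action_nonce: int
    client_order_id: str
    expected_derived_order_ids: list[str]
    market: str
    book: str
    side: str
    price: str  # decimal strings avoid float rounding in the local book model
    qty: str
    seqs: list[int] = field(default_factory=list)


def to_journal_line(record: SubmitRecord) -> str:
    """Serialize deterministically so retries with the same body compare equal."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing the line before the request leaves the process means a crash between submit and response still leaves enough state to reconcile by nonce and derived order id.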

Reconciliation order:

Private stream or FIX/BSL drop-copy for terminal execution truth.
/api/v1/bsl/accounts/:account/executions?fromSeq=... for replay/gap-fill.
/api/v1/accounts/:engineAccount/orders and /fills for account-state confirmation.
Public market snapshot/order book only to verify visible resting liquidity.

Public book visibility is not account truth. It can lag, aggregate levels, or omit account ownership. Use it to confirm quote visibility after private state has been reconciled.

Local verification

Use these checks before handing a new endpoint bundle to a market maker:

cargo test -p senticore-sequencing-core bsl_order_entry_returns_contract_envelope_and_mode_headers -- --nocapture --test-threads=1
cargo test -p senticore-sequencing-core bsl_limits_route_exposes_business_line_contract -- --nocapture --test-threads=1
cargo test -p senticore-sequencing-core mm_action_batch_binary_quote_replace_applies_compact_action -- --nocapture --test-threads=1
cargo test -p senticore-sequencing-core external_gateway_ipc_accepts_atomic_group_as_one_batch_message -- --nocapture --test-threads=1

For latency work, run the dedicated exchange-side ACK gate and label the artifact with the route, host, profile, and result mode:

export RUNTIME_DATABASE_URL=postgres://...
export ENGINE_TRUTH_DATABASE_URL=$RUNTIME_DATABASE_URL

python3 backend/sequencer/scripts/run_exchange_h0_perf_baseline.py \
  --profile institutional_mm_hotpath_ack_smoke \
  --allow-dedicated \
  --artifact-dir backend/sequencer/runtime/perf/institutional-mm-hotpath-smoke

python3 backend/sequencer/scripts/run_exchange_h0_perf_baseline.py \
  --profile institutional_mm_hotpath_ack_50ms \
  --allow-dedicated \
  --artifact-dir backend/sequencer/runtime/perf/institutional-mm-hotpath

The smoke profile validates host wiring and artifact schema only. The 50 ms gate fails when exchange-side HTTP ACK p99 is above 50000 us (50 ms). Add --process-timeout-sec <seconds> while debugging a stuck host so the runner writes a bounded failure artifact. Never compare a full or durable ACK run against an ack quote-loop target.

In the JSON artifact, use requestAckExclusiveBreakdownUs for the first-pass "where did the ACK time go?" attribution. It is non-overlapping. The detailed requestBreakdownUs object includes nested routing and enqueue diagnostics and must not be summed as if every field were exclusive.

Read processTotal and processUninstrumented before optimizing the engine. Those fields isolate the exchange process path. A run with low processTotal and high handlerUninstrumented points at the HTTP/Axum/harness wrapper, runtime scheduling, socket, or response-decode pressure rather than direct Engine/WAL/order-book work.

The stable-account benchmark profile uses multiple pre-seeded account slots per phase so load generation stays inside the engine action nonce replay window. That is a harness constraint, not a relaxation of production nonce rules: market-maker clients must still allocate action nonces per engine account and recover from nonce rejects by re-reading bootstrap/account state and re-signing.
