Architecture

Futuristic, CPU-first system with animated schematics. No proprietary internals disclosed.

ClientsSDK / CLI / DesktopGatewayHTTP / gRPCAuth / Quotaskeys • orgsrequestslimitsSchedulerbackpressure • threadsTokenizernormalize → idsDecoderunified, multimodalStream MuxSSE / WSroutetokenizedecodeframesPluginsRetrieval • Tools • GroundingDocument Storeindex / searchToolingHTTP • DB • searchTelemetryevents • traceshooksquerycallemitSSE/WebSocketClientslive tokens
RAG sequence (high-level)
ClientGatewaySchedulerTokenizerDecoderPlugins/v1/chat.completionsroutenormalize → idsretrieval(query)docs → contexttokens streamSSE/WS stream
Tool-use (function call)
ClientGatewayDecoderPluginsrequestdecodetool:{search}result JSONfinal textstream
Generation cache & adaptation (high-level)
Generation Cachetime-step tracesRecent WindowRAM hot spanCold Blocksdisk-backedRehydratetrace → attention keysParity Checksdense matchAdaptive Capacityopt-in growthCoordinatorsync eventson-demandhot pathcold pathcapacity signalvalidation
Deep Dives

Under-the-hood (conceptual)

  • UTF-8 text intake with whitespace & control handling.
  • Deterministic normalization before ID mapping.
  • Streaming-safe: chunked inputs maintain boundaries.