Architecture
π§ Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
Introductionβ
Cortex is a C++ AI engine designed to operate entirely on your local hardware infrastructure. This headless backend platform is also engineered to support TensorRT-LLM, ensuring high-performance machine-learning model execution. It is packaged with a Docker-inspired command-line interface and a Typescript client library.
The following guide details Cortex's core components, providing insights and instructions for those interested in customizing Cortex to meet specific requirements.
Architectureβ
Main Componentsβ
Cortex is architected with several key components:
- Cortex JS: This component acts as the interface layer where requests are received and responses are sent.
- Server: The central processing unit of Cortex, this component coordinates all activities across the system. It manages the data flow and ensures operations are correctly executed.
- Kernel: This component checks the server's hardware configuration. Based on the current hardware setup, it determines whether additional dependencies are required, optimizing the system for performance and compatibility.
- Runtime: This process involves dynamically loading necessary libraries and models based on the server's current needs and processing requests.
- Dynamic Libraries: Consists of inference engines loaded on-demand to enhance Cortex's processing power. These engines are essential for performing specialized computational tasks. Currently, Cortex supports:
Data Structureβ
Cortex is equipped with MySQL and SQLite databases, offering flexible data management options that can be easily adapted to different environments and requirements. It also has filesystem data that can be stored and retrieved using file-based mechanisms.
-
MySQL: This database is used because it is ideal for Cortex environments where scalability, security, and data integrity are critical. MySQL is well-suited for handling large model-size data from the core extensions.
-
SQLite: This database is used for simplicity and minimal setup. It can handle the small model size from the core extensions and any data from the External extensions.
-
File System: Cortex uses a filesystem approach for managing configuration files, such as
model.yaml
files. These files are stored in a structured directory hierarchy.
Providersβ
Cortex uses three different types of providers:
-
Internal Provider: Integral to the CLI, it includes the core binary (
.cpp
) and is compiled directly with the CLI, facilitating all application parts' direct access to core functionalities. -
Core Extensions: These are bundled with the CLI and include additional functionalities like remote engines and API models, facilitating more complex operations and interactions within the same architectural framework.
-
External Extensions: These are designed to be more flexible and are stored externally. They represent potential future expansions or integrations, allowing the architecture to extend its capabilities without modifying the core system.
Key Dependenciesβ
Cortex was developed using NestJS and operates via a Node.js server framework, handling all incoming and outgoing requests. It also has a C++ runtime to handle stateless requests.
Below is a detailed overview of its core architecture components:
-
NestJS Framework: NestJS framework serves as the backbone of the Cortex. This framework facilitates the organization of server-side logic into modules, controllers, and extensions, which are important for maintaining a clean codebase and efficient request handling.
-
Node.js Server: Node.js is the primary runtime for Cortex, which handles the HTTP requests, executes the server-side logic, and manages the responses.
-
C++ Runtime: C++ runtime is important for managing stateless requests. This component can handle intensive tasks that require optimized performance.
Code Structureβ
The repository is organized to separate concerns between domain definitions, business rules, and adapters or implementations.
# Entity Definitionsdomain/ # This is the core directory where the domains are defined. abstracts/ # Abstract base classes for common attributes and methods. models/ # Domain interface definitions, e.g. model, assistant. repositories/ # Extensions abstract and interface# Business Rulesusecases/ # Application logic assistants/ # CRUD logic (invokes dtos, entities). chat/ # Logic for chat functionalities. models/ # Logic for model operations.# Adapters & Implementationsinfrastructure/ # Implementations for Cortex interactions commanders/ # CLI handlers models/ questions/ # CLI installation UX shortcuts/ # CLI chained syntax types/ usecases/ # Invokes UseCases controllers/ # Nest controllers and HTTP routes assistants/ # Invokes UseCases chat/ # Invokes UseCases models/ # Invokes UseCases database/ # Database providers (mysql, sqlite) # Framework specific object definitions dtos/ # DTO definitions (data transfer & validation) entities/ # TypeORM entity definitions (db schema) # Providers providers/cortex # Cortex [server] provider (a core extension) repositories/extensions # Extension provider (core & external extensions)extensions/ # External extensionscommand.module.ts # CLI Commands Listmain.ts # Entrypoint
Runtimeβ
The sequence diagram above outlines the interactions between various components in the Cortex system during runtime, particularly when handling user requests via a CLI. Hereβs a detailed breakdown of the runtime sequence:
- User Request: The user initiates an interaction by requesting βa jokeβ via the Cortex CLI.
- Model Activation:
- The API directs the request to the
Model Controller/Service
. - The service pulls and starts the appropriate model and posts a request to
/completions
to prepare the model for processing.
- The API directs the request to the
- Chat Processing:
- The
Chat Controller/Service
processes the user's request by interacting with the Model Entity and Extension Repository to gather necessary data and logic.
- The
- Data Handling and Response Formation:
- The
Model Entity
andExtension Repository
perform data operations, which may involve calling aProvider
for additional processing. - Data is fetched, stored, and an inference is performed as needed.
- The
- Response Delivery:
- The response is formatted by the
Chat Controller/Service
and streamed back to the user through the API. - The user receives the processed response, completing the cycle of interaction.
- The response is formatted by the
Roadmapβ
Our development roadmap outlines key features and epics we will focus on in the upcoming releases. These enhancements aim to improve functionality, increase efficiency, and expand Cortex's capabilities.
- RAG: Improve response quality and contextual relevance in our AI models.
- Cortex Python Runtime: Provide a scalable Python execution environment for Cortex.