LLMs are now first-class citizens in Compose. The models top-level key lets you declare which models your application needs and wire them into your services, all in the same Compose file.

Basic usage

Declare a model at the top level and reference it from a service:

models:
  smollm:
    model: ai/smollm2

services:
  app:
    image: myapp
    models:
      - smollm

When the stack starts, Compose ensures the model is available locally and connects the app service to it. The container receives endpoint information via environment variables.

Default environment variables

By default, Compose injects two environment variables for each connected model:

  • <MODEL_NAME>_URL: the OpenAI-compatible endpoint URL
  • <MODEL_NAME>_MODEL: the model identifier

For the example above, the container sees:

SMOLLM_URL=http://model-runner.docker.internal/engines/llama.cpp/v1/
SMOLLM_MODEL=ai/smollm2

Your code can then use any standard OpenAI-compatible client to talk to it.
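
For example, here is a minimal Python sketch using the official OpenAI client, pointed at the injected variables (the api_key value is a placeholder, as in the example further down):

import os

from openai import OpenAI

# Point the client at the endpoint Compose injected for the smollm model
client = OpenAI(
    base_url=os.environ["SMOLLM_URL"],
    api_key="not-needed",  # placeholder; the local endpoint is addressed via base_url
)

response = client.chat.completions.create(
    model=os.environ["SMOLLM_MODEL"],
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)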

Customizing the variable names

A common case: your code uses an official OpenAI client library, which reads the endpoint from OPENAI_BASE_URL by default. Override the injected variable names to match:

models:
  smollm:
    model: ai/smollm2

services:
  app:
    image: myapp
    models:
      smollm:
        endpoint_var: OPENAI_BASE_URL
        model_var: OPENAI_MODEL

The container now sees OPENAI_BASE_URL and OPENAI_MODEL. Your existing code works with no changes:

import os

from openai import OpenAI

# Reads OPENAI_BASE_URL from the environment automatically
client = OpenAI(api_key="not-needed")
response = client.chat.completions.create(
    model=os.environ["OPENAI_MODEL"],
    messages=[{"role": "user", "content": "Hello!"}],
)

Multiple models

Declare and connect to multiple models at once:

models:
  chat:
    model: ai/smollm2
  embeddings:
    model: ai/granite-embedding-multilingual

services:
  api:
    image: myapp
    models:
      - chat
      - embeddings

Each model gets its own pair of environment variables.
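
In application code, that typically means one client per model. A sketch with the OpenAI Python client, using the variable names derived from the model names above:

import os

from openai import OpenAI

# One client per connected model, each configured from its own variable pair
chat_client = OpenAI(base_url=os.environ["CHAT_URL"], api_key="not-needed")
embed_client = OpenAI(base_url=os.environ["EMBEDDINGS_URL"], api_key="not-needed")

answer = chat_client.chat.completions.create(
    model=os.environ["CHAT_MODEL"],
    messages=[{"role": "user", "content": "Summarize Compose models in one line."}],
)

vectors = embed_client.embeddings.create(
    model=os.environ["EMBEDDINGS_MODEL"],
    input=["Compose can wire models into services."],
)

print(answer.choices[0].message.content)
print(len(vectors.data[0].embedding), "dimensions")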

A practical RAG-style stack

Combining a chat model and an embedding model with a vector database:

models:
  chat:
    model: ai/qwen2.5

  embeddings:
    model: ai/granite-embedding-multilingual

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"

  api:
    build: ./api
    models:
      - chat
      - embeddings
    environment:
      QDRANT_URL: http://qdrant:6333
    depends_on:
      - qdrant

The API service has everything it needs: a chat model for generation, an embedding model for indexing, and a vector store, all declared in one Compose file.
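
To make the flow concrete, here is a rough sketch of what the api service might do with those pieces. The docs collection, its text payload field, and the qdrant-client calls are illustrative assumptions, not something Compose sets up for you:

import os

from openai import OpenAI
from qdrant_client import QdrantClient

chat = OpenAI(base_url=os.environ["CHAT_URL"], api_key="not-needed")
embed = OpenAI(base_url=os.environ["EMBEDDINGS_URL"], api_key="not-needed")
qdrant = QdrantClient(url=os.environ["QDRANT_URL"])

def answer(question: str) -> str:
    # Embed the question with the embedding model
    vector = embed.embeddings.create(
        model=os.environ["EMBEDDINGS_MODEL"],
        input=question,
    ).data[0].embedding

    # Retrieve the closest documents from the vector store
    # (assumes a "docs" collection with a "text" payload field was indexed beforehand)
    hits = qdrant.search(collection_name="docs", query_vector=vector, limit=3)
    context = "\n".join(hit.payload["text"] for hit in hits)

    # Generate an answer grounded in the retrieved context
    completion = chat.chat.completions.create(
        model=os.environ["CHAT_MODEL"],
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

print(answer("How do I connect a service to a model?"))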

Why this matters

Before models, running an AI app meant juggling separate moving parts: a model server to run, endpoint environment variables to wire up by hand, a model lifecycle to manage, and your application stack on top of it all. With models, everything lives in your compose.yml and starts with a single docker compose up.

I covered this and other AI-related Compose features in detail at Devoxx France 2026; check the talk recording on YouTube once it’s published.

Further reading