Vulnerability-Lookup

PYSEC-2025-53

Vulnerability from pysec - Published: 2025-05-29 17:15 - Updated: 2025-06-26 21:23

Details

vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.

Impacted products

Name	purl
vllm	pkg:pypi/vllm

Aliases

JSON

To clipboard

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm",
        "purl": "pkg:pypi/vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
            }
          ],
          "repo": "https://github.com/vllm-project/vllm",
          "type": "GIT"
        },
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.9.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ],
      "versions": [
        "0.0.1",
        "0.1.0",
        "0.1.1",
        "0.1.2",
        "0.1.3",
        "0.1.4",
        "0.1.5",
        "0.1.6",
        "0.1.7",
        "0.2.0",
        "0.2.1",
        "0.2.1.post1",
        "0.2.2",
        "0.2.3",
        "0.2.4",
        "0.2.5",
        "0.2.6",
        "0.2.7",
        "0.3.0",
        "0.3.1",
        "0.3.2",
        "0.3.3",
        "0.4.0",
        "0.4.0.post1",
        "0.4.1",
        "0.4.2",
        "0.4.3",
        "0.5.0",
        "0.5.0.post1",
        "0.5.1",
        "0.5.2",
        "0.5.3",
        "0.5.3.post1",
        "0.5.4",
        "0.5.5",
        "0.6.0",
        "0.6.1",
        "0.6.1.post1",
        "0.6.1.post2",
        "0.6.2",
        "0.6.3",
        "0.6.3.post1",
        "0.6.4",
        "0.6.4.post1",
        "0.6.5",
        "0.6.6",
        "0.6.6.post1",
        "0.7.0",
        "0.7.1",
        "0.7.2",
        "0.7.3",
        "0.8.0",
        "0.8.1",
        "0.8.2",
        "0.8.3",
        "0.8.4",
        "0.8.5",
        "0.8.5.post1"
      ]
    }
  ],
  "aliases": [
    "CVE-2025-46570",
    "GHSA-4qjh-9fv9-r85r"
  ],
  "details": "vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.",
  "id": "PYSEC-2025-53",
  "modified": "2025-06-26T21:23:06.231251+00:00",
  "published": "2025-05-29T17:15:21+00:00",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://github.com/vllm-project/vllm/pull/17045"
    },
    {
      "type": "ADVISORY",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r"
    },
    {
      "type": "FIX",
      "url": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
    },
    {
      "type": "REPORT",
      "url": "https://github.com/vllm-project/vllm/pull/17045"
    }
  ]
}

GHSA-4QJH-9FV9-R85R

Vulnerability from github – Published: 2025-05-28 18:02 – Updated: 2025-06-27 21:06

Summary

Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

Details

This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.

Description

When a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.

For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.

Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.

Environment

GPU: NVIDIA A100 (40G)
CUDA: 11.8
PyTorch: 2.3.1
OS: Ubuntu 18.04
vLLM: v0.5.1 Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.

Leakage

We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability. roc_curves_combined_block_2

Results

In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results: - With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times. - When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.

Fixes

https://github.com/vllm-project/vllm/pull/17045

Severity ?

2.6 (Low)


                  
                    CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

Show details on source website

JSON

To clipboard

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.9.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2025-46570"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-208"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2025-05-28T18:02:24Z",
    "nvd_published_at": "2025-05-29T17:15:21Z",
    "severity": "LOW"
  },
  "details": "This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.\n\n## Description\nWhen a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.\n\nFor instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim\u0027s input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.\n\nUnlike token-by-token sharing mechanisms, vLLM\u2019s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim\u0027s prompt at the chunk level.\n\n## Environment\n\n- GPU: NVIDIA A100 (40G)\n- CUDA: 11.8\n- PyTorch: 2.3.1\n- OS: Ubuntu 18.04\n- vLLM: v0.5.1\nConfiguration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.\n\n## Leakage\nWe conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.\n\u003cimg src=\"https://github.com/user-attachments/assets/db3491e9-02b7-424c-9b6d-56f553b39f2f\" alt=\"roc_curves_combined_block_2\" width=\"400\"/\u003e\n\n\n## Results\nIn our experiment, we analyzed the response time differences between cache hits and misses in vLLM\u0027s PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:\n- With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times.\n- When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.\n\n## Fixes\n\n* https://github.com/vllm-project/vllm/pull/17045",
  "id": "GHSA-4qjh-9fv9-r85r",
  "modified": "2025-06-27T21:06:46Z",
  "published": "2025-05-28T18:02:24Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-46570"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/17045"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-53.yaml"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N",
      "type": "CVSS_V3"
    }
  ],
  "summary": "Potential Timing Side-Channel Vulnerability in vLLM\u2019s Chunk-Based Prefix Caching"
}

CVE-2025-46570 (GCVE-0-2025-46570)

Vulnerability from cvelistv5 – Published: 2025-05-29 16:32 – Updated: 2025-05-29 18:05

Title

vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel

Summary

Severity ?

2.6 (Low)


                        
                          CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

CWE

CWE-208 - Observable Timing Discrepancy

Assigner

GitHub_M

References

URL

Tags

	https://github.com/vllm-project/vllm/security/adv…	x_refsource_CONFIRM
	https://github.com/vllm-project/vllm/pull/17045	x_refsource_MISC
	https://github.com/vllm-project/vllm/commit/77073…	x_refsource_MISC

Impacted products

	Vendor	Product	Version
	vllm-project	vllm	Affected: < 0.9.0

Show details on NVD website

JSON

To clipboard

{
  "containers": {
    "adp": [
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2025-46570",
                "options": [
                  {
                    "Exploitation": "none"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2025-05-29T18:04:57.706360Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2025-05-29T18:05:10.768Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "product": "vllm",
          "vendor": "vllm-project",
          "versions": [
            {
              "status": "affected",
              "version": "\u003c 0.9.0"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0."
        }
      ],
      "metrics": [
        {
          "cvssV3_1": {
            "attackComplexity": "HIGH",
            "attackVector": "NETWORK",
            "availabilityImpact": "NONE",
            "baseScore": 2.6,
            "baseSeverity": "LOW",
            "confidentialityImpact": "LOW",
            "integrityImpact": "NONE",
            "privilegesRequired": "LOW",
            "scope": "UNCHANGED",
            "userInteraction": "REQUIRED",
            "vectorString": "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N",
            "version": "3.1"
          }
        }
      ],
      "problemTypes": [
        {
          "descriptions": [
            {
              "cweId": "CWE-208",
              "description": "CWE-208: Observable Timing Discrepancy",
              "lang": "en",
              "type": "CWE"
            }
          ]
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2025-05-29T16:32:42.794Z",
        "orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
        "shortName": "GitHub_M"
      },
      "references": [
        {
          "name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r",
          "tags": [
            "x_refsource_CONFIRM"
          ],
          "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r"
        },
        {
          "name": "https://github.com/vllm-project/vllm/pull/17045",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/pull/17045"
        },
        {
          "name": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
        }
      ],
      "source": {
        "advisory": "GHSA-4qjh-9fv9-r85r",
        "discovery": "UNKNOWN"
      },
      "title": "vLLM\u2019s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel"
    }
  },
  "cveMetadata": {
    "assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
    "assignerShortName": "GitHub_M",
    "cveId": "CVE-2025-46570",
    "datePublished": "2025-05-29T16:32:42.794Z",
    "dateReserved": "2025-04-24T21:10:48.175Z",
    "dateUpdated": "2025-05-29T18:05:10.768Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.1"
}

Sightings

Author	Source	Type	Date

Nomenclature

Seen: The vulnerability was mentioned, discussed, or observed by the user.
Confirmed: The vulnerability has been validated from an analyst's perspective.
Published Proof of Concept: A public proof of concept is available for this vulnerability.
Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
Not confirmed: The user expressed doubt about the validity of the vulnerability.
Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.

Detection rules are retrieved from Rulezet.

Action not permitted

PYSEC-2025-53

GHSA-4QJH-9FV9-R85R

Description

Environment

Leakage

Results

Fixes

CVE-2025-46570 (GCVE-0-2025-46570)

Tags

Sightings

Nomenclature