GHSA-8JR5-V98P-W75M

Vulnerability from github – Published: 2026-06-17 14:02 – Updated: 2026-06-18 14:31
Summary
vLLM: image EXIF Rotation & PNG tRNS Transparency Not Normalized, Causing Mismatch Between Model Input and Expectations
Details

Summary

Issue 1: EXIF orientation is not normalized → The model processes the image in its stored orientation rather than the orientation a human viewer sees, which can bias its interpretation.

Issue 2: PNG tRNS transparency is not explicitly flattened before conversion to RGB → Transparent or semi-transparent pixels are rendered as opaque after conversion, so overlay elements that are invisible or subtle to a human viewer become visible to the model and distort the input. (The attack is similar to AlphaDog. vLLM already handles RGBA correctly, but a tRNS chunk lets a PNG carry transparency while staying in a non-RGBA mode such as RGB, so the correct processing path is not taken.)

Issue 3: Pillow loads only the first frame of an APNG or GIF file.
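To make Issue 3 concrete, the following is a minimal sketch of how a caller could ask Pillow how many frames an encoded image contains, so that multi-frame inputs can at least be noticed. The helper names `count_frames` and `make_two_frame_gif` are hypothetical and are not part of vLLM.

```python
import io

from PIL import Image


def count_frames(data: bytes) -> int:
    """Return the number of frames Pillow reports for an encoded image."""
    with Image.open(io.BytesIO(data)) as image:
        # Some single-frame plugins do not define n_frames, so default to 1.
        return getattr(image, "n_frames", 1)


def make_two_frame_gif() -> bytes:
    """Build a tiny two-frame GIF in memory (illustration only)."""
    first = Image.new("RGB", (4, 4), "red")
    second = Image.new("RGB", (4, 4), "blue")
    buffer = io.BytesIO()
    first.save(buffer, format="GIF", save_all=True, append_images=[second])
    return buffer.getvalue()


if __name__ == "__main__":
    print(count_frames(make_two_frame_gif()))
```

What to do with a multi-frame input (reject it, or process every frame) is a policy decision this sketch leaves open.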


Root Cause

  • Rotation: After an image is opened, ImageOps.exif_transpose is not called to normalize the EXIF orientation.
  • Transparency: Only RGBA→RGB is flattened against a background. PNGs that carry a tRNS chunk in P, L, RGB, or other non-RGBA modes take the image.convert("RGB") path, which silently discards or remaps the transparency information.
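The rotation point can be illustrated with a short sketch that applies Pillow's `ImageOps.exif_transpose` immediately after opening an image. This is not the merged vLLM fix; `open_upright` and `make_rotated_jpeg` are hypothetical helpers used only to show the effect of the Orientation tag.

```python
import io

from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112  # standard EXIF Orientation tag


def open_upright(data: bytes) -> Image.Image:
    """Open an encoded image and apply its EXIF orientation, if any."""
    image = Image.open(io.BytesIO(data))
    # exif_transpose returns a copy that is rotated/flipped per the tag.
    return ImageOps.exif_transpose(image)


def make_rotated_jpeg() -> bytes:
    """Build a 4x2 JPEG tagged as needing a 90-degree rotation (illustration only)."""
    image = Image.new("RGB", (4, 2), "white")
    exif = Image.Exif()
    exif[ORIENTATION_TAG] = 6  # viewers rotate the stored pixels by 90 degrees
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", exif=exif)
    return buffer.getvalue()


if __name__ == "__main__":
    data = make_rotated_jpeg()
    print(Image.open(io.BytesIO(data)).size)  # stored size
    print(open_upright(data).size)            # size after normalization
```

Without the `exif_transpose` call, the stored pixel grid is what the model receives, while an EXIF-aware viewer shows the rotated version.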

Affected Code

https://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L77-L84

https://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L37-L43

https://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L26-L34

Current state: ImageOps.exif_transpose is not used. (The rescale_image_size function (https://github.com/vllm-project/vllm/blob/main/vllm/multimodal/image.py#L14) does accept a transpose parameter, but it does not appear to be called anywhere outside the test directory.)

Call order: _convert_image_mode runs first; if the conditions are met, convert_image_mode is called.

Issue: Only the “RGBA → RGB” path is explicitly flattened. P, L, and RGB images with tRNS all fall back to image.convert("RGB"). For PNGs that include tRNS, convert("RGB") directly produces 24-bit RGB without applying the transparency, leading to:

  • P mode: The transparent index becomes an actual RGB color (often black, white, or an undefined background), so transparency is lost.
  • L/LA and RGB + tRNS: convert("RGB") doesn’t composite against a chosen background first, so elements that relied on transparency to be hidden or softened become solid.
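The behavior described above can be reproduced, and one possible remedy sketched, with a few lines of Pillow. This is an illustration under stated assumptions rather than the patch merged upstream: `flatten_to_rgb` is a hypothetical helper, and the white background is an arbitrary choice.

```python
import io

from PIL import Image


def flatten_to_rgb(image, background=(255, 255, 255)):
    """Composite any transparency onto a solid background, then return RGB."""
    carries_alpha = (
        image.mode in ("RGBA", "LA", "PA") or "transparency" in image.info
    )
    if not carries_alpha:
        return image.convert("RGB")
    # Converting to RGBA first turns tRNS data into a real alpha channel.
    rgba = image.convert("RGBA")
    canvas = Image.new("RGB", rgba.size, background)
    canvas.paste(rgba, mask=rgba.getchannel("A"))
    return canvas


def make_rgb_png_with_trns() -> bytes:
    """Build a 2x1 RGB PNG whose tRNS chunk marks pure red as transparent."""
    image = Image.new("RGB", (2, 1), (255, 0, 0))
    image.putpixel((1, 0), (0, 0, 255))
    buffer = io.BytesIO()
    image.save(buffer, format="PNG", transparency=(255, 0, 0))
    return buffer.getvalue()


if __name__ == "__main__":
    image = Image.open(io.BytesIO(make_rgb_png_with_trns()))
    print(image.mode, image.info.get("transparency"))
    print(image.convert("RGB").getpixel((0, 0)))  # transparency ignored
    print(flatten_to_rgb(image).getpixel((0, 0)))  # composited on background
```

The key step is converting to RGBA before compositing, so that P, L, and RGB images with tRNS go through the same flattening as true RGBA images.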

Impact & Scope

  • Impact: Pixels the model sees can diverge from operator expectations (due to orientation or transparency handling), potentially altering downstream reasoning.
  • Scope: The image I/O and mode-conversion paths in vllm/multimodal/image.py. The existing RGBA→RGB flattening is correct; the issues center on missing EXIF normalization and non-RGBA tRNS not being explicitly composited.
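To gauge whether a given PNG falls within this scope, its chunk list can be inspected without decoding any pixels. The sketch below uses only the Python standard library and relies on the PNG chunk layout (4-byte length, 4-byte type, data, 4-byte CRC); the function names are hypothetical, and the check is a heuristic, not a complete validator.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list:
    """List the chunk type names of a PNG stream, in order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    types = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(ctype.decode("ascii", errors="replace"))
        if ctype == b"IEND":
            break
        offset += 12 + length  # length field + type + payload + CRC
    return types


def png_flags(data: bytes) -> dict:
    """Report whether a PNG carries a tRNS chunk or APNG animation control."""
    types = png_chunk_types(data)
    return {"has_trns": "tRNS" in types, "is_apng": "acTL" in types}


def _chunk(ctype: bytes, payload: bytes) -> bytes:
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def build_minimal_png_with_trns() -> bytes:
    """Build a 1x1 8-bit RGB PNG with a tRNS chunk (illustration only)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # color type 2 = RGB
    trns = struct.pack(">HHH", 255, 0, 0)  # pure red is transparent
    idat = zlib.compress(b"\x00\xff\x00\x00")  # filter byte + one red pixel
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"tRNS", trns)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


if __name__ == "__main__":
    print(png_flags(build_minimal_png_with_trns()))
```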

Case

EXIF: http://qiniu.funxingzuo.top/exif_orient_180.jpg
tRNS: http://qiniu.funxingzuo.top/hello.png

Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44974


{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.11.0"
            },
            {
              "last_affected": "0.23.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-12491"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-436"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-06-17T14:02:42Z",
    "nvd_published_at": null,
    "severity": "MODERATE"
  },
  "details": "## Summary\n\nIssue 1: EXIF orientation not normalized \u2192 The image orientation processed by the model differs from how humans view it, introducing interpretation bias.\n\nIssue 2: PNG tRNS not explicitly flattened before converting to RGB \u2192 After conversion, transparent/semi-transparent pixels are rendered unexpectedly, making otherwise subtle overlay elements visible and distorting the input content. (This attack is similar to AlphaDog: RGBA handling is already correct in vLLM, but since tRNS permits RGB images, the correct processing path isn\u2019t taken.)\n\nIssue 3 : Pillow only loads the first frame when loading APNG or GIF files.\n\n---\n\n## Root Cause\n\n* **Rotation**: After opening an image, `ImageOps.exif_transpose` is not called to normalize EXIF orientation.\n* **Transparency**: Only **RGBA\u2192RGB** is flattened with a background; PNGs carrying **`tRNS`** in **`P`/`L`/`RGB + tRNS`** and other non-RGBA modes take the `image.convert(\"RGB\")` path, which implicitly discards/remaps transparency semantics.\n\n---\n\n## Affected Code\n\n\nhttps://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L77-L84\n\nhttps://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L37-L43\n\nhttps://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L26-L34\n\u003e Current state: `ImageOps.exif_transpose` is not used. 
(Although the `rescale_image_size` function ([https://github.com/vllm-project/vllm/blob/main/vllm/multimodal/image.py#L14](https://github.com/vllm-project/vllm/blob/main/vllm/multimodal/image.py#L14)) exists and includes a `transpose` parameter, I\u2019ve found that it doesn\u2019t seem to be called anywhere outside the `test` directory.\uff09\n\n\u003e **Call order**: `_convert_image_mode` runs first; if the conditions are met, `convert_image_mode` is called.\n\u003e \n\u003e **Issue**: Only the \u201cRGBA \u2192 RGB\u201d path is explicitly flattened. `P`, `L`, or `RGB` with `tRNS` all fall back to `image.convert(\"RGB\")`. For PNGs that include `tRNS`, `convert(\"RGB\")` directly produces 24-bit RGB, leading to:\n\u003e \n\u003e * **`P` mode**: The transparent index becomes an actual RGB color (often black, white, or an undefined background), so transparency is lost.\n\u003e * **`L/LA` and `RGB + tRNS`**: `convert(\"RGB\")` doesn\u2019t composite against a chosen background first, so elements that relied on transparency to be hidden or softened become solid.\n\n\n## Impact \u0026 Scope\n\n* **Impact**: Pixels the model sees can diverge from operator expectations (due to orientation or transparency handling), potentially altering downstream reasoning.\n* **Scope**: The image I/O and mode-conversion paths in `vllm/multimodal/image.py`. The existing **RGBA\u2192RGB** flattening is correct; the issues center on **missing EXIF normalization** and **non-RGBA `tRNS` not being explicitly composited**.\n\n## Case\nEXIF\uff1a http://qiniu.funxingzuo.top/exif_orient_180.jpg\ntRNS:  http://qiniu.funxingzuo.top/hello.png\n\n## Fix\n\nA fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44974",
  "id": "GHSA-8jr5-v98p-w75m",
  "modified": "2026-06-18T14:31:43Z",
  "published": "2026-06-17T14:02:42Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-8jr5-v98p-w75m"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2026-12491"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/44974"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/commit/cf1c90672404548aa3bc51f92c4745576a65ee26"
    },
    {
      "type": "WEB",
      "url": "https://access.redhat.com/security/cve/CVE-2026-12491"
    },
    {
      "type": "WEB",
      "url": "https://bugzilla.redhat.com/show_bug.cgi?id=2489786"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:L/A:L",
      "type": "CVSS_V3"
    }
  ],
  "summary": "vLLM: image EXIF Rotation \u0026 PNG tRNS Transparency Not Normalized, Causing Mismatch Between Model Input and Expectations"
}

