Vulnerability-Lookup

GHSA-5MQ8-78GM-PJMQ

Vulnerability from github – Published: 2026-03-06 18:39 – Updated: 2026-03-09 13:15

Summary

defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag

Details

Summary

The _findContentBySchemaText method in src/defuddle.ts interpolates image src and alt attributes directly into an HTML string without escaping:

html += `<img src="${imageSrc}" alt="${imageAlt}">`;

An attacker can use a " in the alt attribute to break out of the attribute context and inject event handlers. This is a separate vulnerability from the sanitization bypass fixed in f154cb7 — the injection happens during string construction, not in the DOM, so _stripUnsafeElements cannot catch it.

Details

When _findContentBySchemaText finds a sibling image outside the matched content element, it reads the image's src and alt attributes via getAttribute() and interpolates them into a template literal. getAttribute('alt') returns the raw attribute value. If the alt contains ", it terminates the alt attribute in the interpolated HTML string, and subsequent content becomes new attributes (including event handlers).

The recently added _stripUnsafeElements() (commit f154cb7) strips on* attributes from DOM elements, but the attribute carrying the payload is named alt, not on*, so it is preserved with its full value. The onload handler does not exist in the original DOM; it is created only when the interpolated string is parsed as HTML.
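As a minimal illustration of the break-out described above, the following sketch reproduces the interpolation pattern with plain strings, with no DOM and no defuddle import. The function name buildImgTagUnsafe is hypothetical and stands in for the single line quoted from _findContentBySchemaText; the alt value is the one used in the PoC.

```typescript
// Hypothetical stand-in for the vulnerable line: attribute values are pasted
// into the markup exactly as received, with no escaping.
function buildImgTagUnsafe(imageSrc: string, imageAlt: string): string {
  return `<img src="${imageSrc}" alt="${imageAlt}">`;
}

// The raw value that getAttribute('alt') would return for the PoC image.
const maliciousAlt = 'pwned" onload="alert(document.cookie)';

const unsafeTag = buildImgTagUnsafe("https://example.com/photo.jpg", maliciousAlt);

// The first double quote inside the alt value closes the attribute early,
// so the remainder is parsed as a new onload attribute.
console.log(unsafeTag);
```

Because the handler only appears once the string is re-parsed as HTML, a sanitizer that walks the original DOM never sees an attribute named onload.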

PoC

Input HTML:

<!DOCTYPE html>
<html>
<head>
<title>PoC</title>
<script type="application/ld+json">
{"@type": "Article", "text": "Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count."}
</script>
</head>
<body>
<article><p>Short.</p></article>
<div class="post-container">
  <p>Extra text to inflate parent word count padding padding padding.</p>
  <div class="post-body">
    Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count.
  </div>
  <img width="800" height="600" src="https://example.com/photo.jpg" alt='pwned" onload="alert(document.cookie)'>
</div>
</body>
</html>

Output:

<img src="https://example.com/photo.jpg" alt="pwned" onload="alert(document.cookie)">

The onload event handler is injected as a separate HTML attribute.

Impact

XSS in any application that renders defuddle's HTML output (browser extensions, web clippers, reader modes). The attack requires crafted HTML with schema.org structured data that triggers the _findContentBySchemaText fallback, combined with a sibling image whose alt attribute contains a quote character followed by an event handler.

Suggested Fix

Use the DOM API instead of string interpolation:

if (imageSrc) {
    const img = this.doc.createElement('img');
    img.setAttribute('src', imageSrc);
    img.setAttribute('alt', imageAlt);
    html += img.outerHTML;
}

This ensures attribute values are properly escaped by the DOM serializer.
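Where the surrounding code has to keep building a string, escaping the values before interpolation is another way to close the hole. The sketch below is an assumption made for illustration, not the patch shipped in 0.9.0; escapeHtmlAttribute and buildImgTagEscaped are hypothetical names that do not appear in defuddle.

```typescript
// Hypothetical helper: encodes the characters that can terminate or confuse
// a double-quoted HTML attribute value. The ampersand is replaced first so
// that entities produced by the later replacements are not double-encoded.
function escapeHtmlAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Same template as the vulnerable line, but every value is escaped first.
function buildImgTagEscaped(imageSrc: string, imageAlt: string): string {
  return `<img src="${escapeHtmlAttribute(imageSrc)}" alt="${escapeHtmlAttribute(imageAlt)}">`;
}

const escapedTag = buildImgTagEscaped(
  "https://example.com/photo.jpg",
  'pwned" onload="alert(document.cookie)'
);

// The payload stays inside the alt value as inert text.
console.log(escapedTag);
```

The DOM-based fix remains the simpler choice when a document object is at hand, since the serializer applies the escaping rules itself and there is no helper to keep in sync with them.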

Severity

2.1 (Low)

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N/E:P

JSON

{
  "affected": [
    {
      "package": {
        "ecosystem": "npm",
        "name": "defuddle"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.9.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-30830"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-79"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-03-06T18:39:35Z",
    "nvd_published_at": "2026-03-07T06:16:11Z",
    "severity": "LOW"
  },
  "details": "### Summary\n\nThe `_findContentBySchemaText` method in `src/defuddle.ts` interpolates image `src` and `alt` attributes directly into an HTML string without escaping:\n\n```typescript\nhtml += `\u003cimg src=\"${imageSrc}\" alt=\"${imageAlt}\"\u003e`;\n```\n\nAn attacker can use a `\"` in the `alt` attribute to break out of the attribute context and inject event handlers. This is a separate vulnerability from the sanitization bypass fixed in f154cb7 \u2014 the injection happens during string construction, not in the DOM, so `_stripUnsafeElements` cannot catch it.\n\n### Details\n\nWhen `_findContentBySchemaText` finds a sibling image outside the matched content element, it reads the image\u0027s `src` and `alt` attributes via `getAttribute()` and interpolates them into a template literal. `getAttribute(\u0027alt\u0027)` returns the raw attribute value. If the alt contains `\"`, it terminates the `alt` attribute in the interpolated HTML string, and subsequent content becomes new attributes (including event handlers).\n\nThe recently added `_stripUnsafeElements()` (commit f154cb7) strips `on*` attributes from DOM elements, but the `alt` attribute\u0027s name is `alt` (not `on*`), so it is preserved with its full value. The `onload` handler is created by the string interpolation, not present in the original DOM.\n\n### PoC\n\nInput HTML:\n\n```html\n\u003c!DOCTYPE html\u003e\n\u003chtml\u003e\n\u003chead\u003e\n\u003ctitle\u003ePoC\u003c/title\u003e\n\u003cscript type=\"application/ld+json\"\u003e\n{\"@type\": \"Article\", \"text\": \"Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count.\"}\n\u003c/script\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003carticle\u003e\u003cp\u003eShort.\u003c/p\u003e\u003c/article\u003e\n\u003cdiv class=\"post-container\"\u003e\n  \u003cp\u003eExtra text to inflate parent word count padding padding padding.\u003c/p\u003e\n  \u003cdiv class=\"post-body\"\u003e\n    Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count.\n  \u003c/div\u003e\n  \u003cimg width=\"800\" height=\"600\" src=\"https://example.com/photo.jpg\" alt=\u0027pwned\" onload=\"alert(document.cookie)\u0027\u003e\n\u003c/div\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\nOutput:\n\n```html\n\u003cimg src=\"https://example.com/photo.jpg\" alt=\"pwned\" onload=\"alert(document.cookie)\"\u003e\n```\n\nThe `onload` event handler is injected as a separate HTML attribute.\n\n### Impact\n\nXSS in any application that renders defuddle\u0027s HTML output (browser extensions, web clippers, reader modes). The attack requires crafted HTML with schema.org structured data that triggers the `_findContentBySchemaText` fallback, combined with a sibling image whose `alt` attribute contains a quote character followed by an event handler.\n\n### Suggested Fix\n\nUse DOM API instead of string interpolation:\n\n```typescript\nif (imageSrc) {\n    const img = this.doc.createElement(\u0027img\u0027);\n    img.setAttribute(\u0027src\u0027, imageSrc);\n    img.setAttribute(\u0027alt\u0027, imageAlt);\n    html += img.outerHTML;\n}\n```\n\nThis ensures attribute values are properly escaped by the DOM serializer.",
  "id": "GHSA-5mq8-78gm-pjmq",
  "modified": "2026-03-09T13:15:41Z",
  "published": "2026-03-06T18:39:35Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/kepano/defuddle/security/advisories/GHSA-5mq8-78gm-pjmq"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2026-30830"
    },
    {
      "type": "WEB",
      "url": "https://github.com/kepano/defuddle/commit/f154cb740ee603431b69638273af737a27156df9"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/kepano/defuddle"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N/E:P",
      "type": "CVSS_V4"
    }
  ],
  "summary": "defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag"
}

CVE-2026-30830 (GCVE-0-2026-30830)

Vulnerability from cvelistv5 – Published: 2026-03-07 05:49 – Updated: 2026-03-10 17:58

Title

Defuddle: XSS via unescaped string interpolation in _findContentBySchemaText image tag

Summary

Defuddle cleans up HTML pages. Prior to version 0.9.0, the _findContentBySchemaText method in src/defuddle.ts interpolates image src and alt attributes directly into an HTML string without escaping. An attacker can use a " in the alt attribute to break out of the attribute context and inject event handlers. This issue has been patched in version 0.9.0.

Severity

2.1 (Low)

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N/E:P

SSVC

Exploitation: poc
Automatable: no
Technical Impact: partial

CISA Coordinator (v2.0.3)

CWE

CWE-79 - Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

Assigner

GitHub_M

References

2 references

URL	Tags
https://github.com/kepano/defuddle/security/advis…	x_refsource_CONFIRM
https://github.com/kepano/defuddle/commit/f154cb7…	x_refsource_MISC

Impacted products

1 product

Vendor	Product	Version
kepano	defuddle	Affected: < 0.9.0
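The affected range above (every release before 0.9.0) can be checked mechanically against an installed version. The following is a minimal sketch under stated assumptions: the helper names parseVersion and isAffected are hypothetical, versions are assumed to be plain dotted numbers with no pre-release or build suffix, and the sample inputs are illustrative rather than a list of real defuddle releases.

```typescript
// Splits "major.minor.patch" into numeric parts.
// Assumption: no pre-release or build metadata is present.
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => Number.parseInt(part, 10));
}

// Returns true when `installed` sorts strictly before `fixed`,
// comparing part by part and treating missing parts as 0.
function isAffected(installed: string, fixed: string = "0.9.0"): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(fixed);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) {
      return x < y;
    }
  }
  return false; // equal to the fixed version, so not affected
}

// Illustrative inputs only.
for (const v of ["0.8.4", "0.9.0", "0.10.0"]) {
  console.log(`${v}: ${isAffected(v) ? "affected" : "not affected"}`);
}
```

Comparing the parts numerically matters here: a plain string comparison would rank 0.10.0 before 0.9.0 and report a patched release as vulnerable.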

JSON

{
  "containers": {
    "adp": [
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2026-30830",
                "options": [
                  {
                    "Exploitation": "poc"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2026-03-10T17:45:32.483501Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2026-03-10T17:58:06.614Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "product": "defuddle",
          "vendor": "kepano",
          "versions": [
            {
              "status": "affected",
              "version": "\u003c 0.9.0"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "Defuddle cleans up HTML pages. Prior to version 0.9.0, the _findContentBySchemaText method in src/defuddle.ts interpolates image src and alt attributes directly into an HTML string without escaping. An attacker can use a \" in the alt attribute to break out of the attribute context and inject event handler. This issue has been patched in version 0.9.0."
        }
      ],
      "metrics": [
        {
          "cvssV4_0": {
            "attackComplexity": "LOW",
            "attackRequirements": "NONE",
            "attackVector": "NETWORK",
            "baseScore": 2.1,
            "baseSeverity": "LOW",
            "privilegesRequired": "NONE",
            "subAvailabilityImpact": "NONE",
            "subConfidentialityImpact": "LOW",
            "subIntegrityImpact": "LOW",
            "userInteraction": "PASSIVE",
            "vectorString": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:N/VI:N/VA:N/SC:L/SI:L/SA:N/E:P",
            "version": "4.0",
            "vulnAvailabilityImpact": "NONE",
            "vulnConfidentialityImpact": "NONE",
            "vulnIntegrityImpact": "NONE"
          }
        }
      ],
      "problemTypes": [
        {
          "descriptions": [
            {
              "cweId": "CWE-79",
              "description": "CWE-79: Improper Neutralization of Input During Web Page Generation (\u0027Cross-site Scripting\u0027)",
              "lang": "en",
              "type": "CWE"
            }
          ]
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2026-03-07T05:49:15.964Z",
        "orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
        "shortName": "GitHub_M"
      },
      "references": [
        {
          "name": "https://github.com/kepano/defuddle/security/advisories/GHSA-5mq8-78gm-pjmq",
          "tags": [
            "x_refsource_CONFIRM"
          ],
          "url": "https://github.com/kepano/defuddle/security/advisories/GHSA-5mq8-78gm-pjmq"
        },
        {
          "name": "https://github.com/kepano/defuddle/commit/f154cb740ee603431b69638273af737a27156df9",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/kepano/defuddle/commit/f154cb740ee603431b69638273af737a27156df9"
        }
      ],
      "source": {
        "advisory": "GHSA-5mq8-78gm-pjmq",
        "discovery": "UNKNOWN"
      },
      "title": "Defuddle: XSS via unescaped string interpolation in _findContentBySchemaText image tag"
    }
  },
  "cveMetadata": {
    "assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
    "assignerShortName": "GitHub_M",
    "cveId": "CVE-2026-30830",
    "datePublished": "2026-03-07T05:49:15.964Z",
    "dateReserved": "2026-03-05T21:06:44.606Z",
    "dateUpdated": "2026-03-10T17:58:06.614Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.2"
}
