GHSA-mm63-c923-gw6c
Vulnerability from GitHub
In the Linux kernel, the following vulnerability has been resolved:
cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction
A hung_task problem shown below was found:
INFO: task kworker/0:0:8 blocked for more than 327 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Workqueue: events cgroup_bpf_release
Call Trace:
 <TASK>
 __schedule+0x5a2/0x2050
 ? find_held_lock+0x33/0x100
 ? wq_worker_sleeping+0x9e/0xe0
 schedule+0x9f/0x180
 schedule_preempt_disabled+0x25/0x50
 __mutex_lock+0x512/0x740
 ? cgroup_bpf_release+0x1e/0x4d0
 ? cgroup_bpf_release+0xcf/0x4d0
 ? process_scheduled_works+0x161/0x8a0
 ? cgroup_bpf_release+0x1e/0x4d0
 ? mutex_lock_nested+0x2b/0x40
 ? __pfx_delay_tsc+0x10/0x10
 mutex_lock_nested+0x2b/0x40
 cgroup_bpf_release+0xcf/0x4d0
 ? process_scheduled_works+0x161/0x8a0
 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0
 ? process_scheduled_works+0x161/0x8a0
 process_scheduled_works+0x23a/0x8a0
 worker_thread+0x231/0x5b0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x14d/0x1c0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x59/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
This issue can be reproduced by the following pressure test:
1. Delete a large number of cpuset cgroups.
2. Set CPUs online and offline repeatedly.
3. Set watchdog_thresh repeatedly.
The scripts can be obtained at the LINK mentioned above the signature.
The reason for this issue is that cgroup_mutex and cpu_hotplug_lock are acquired in different tasks, which may lead to deadlock. The deadlock arises through the following steps:
1. A large number of cpusets are deleted asynchronously, which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE (256). Consequently, all active works are cgroup_bpf_release works, and many more cgroup_bpf_release works are put into the inactive queue. As illustrated in the diagram, there are 256 works in the active queue plus n works in the inactive queue.
2. Setting watchdog_thresh takes cpu_hotplug_lock.read and puts an smp_call_on_cpu work ('sscs.work') into system_wq. However, step 1 has already filled system_wq, so 'sscs.work' lands in the inactive queue. 'sscs.work' has to wait until the works that were queued earlier (the n cgroup_bpf_release works) have executed, so it is blocked for a while.
3. A CPU offline operation requires cpu_hotplug_lock.write, which is blocked by step 2.
4. The cpusets deleted in step 1 put cgroup_release works into cgroup_destroy_wq, and these keep competing for cgroup_mutex. When cgroup_mutex is acquired by the work running css_killed_work_fn, it calls cpuset_css_offline, which needs to acquire cpu_hotplug_lock.read. However, cpuset_css_offline is blocked by step 3.
5. At this moment, the 256 works in the active queue are all cgroup_bpf_release works attempting to acquire cgroup_mutex, so all of them are blocked. Consequently, sscs.work cannot be executed. Ultimately, this situation leaves four processes blocked, forming a deadlock.
system_wq(step1)                WatchDog(step2)                 cpu offline(step3)      cgroup_destroy_wq(step4)
...
2000+ cgroups deleted async
256 actives + n inactives
                                __lockup_detector_reconfigure
                                P(cpu_hotplug_lock.read)
                                put sscs.work into system_wq
256 + n + 1(sscs.work)
sscs.work waits to be executed
                                waiting for sscs.work to finish
                                                                percpu_down_write
                                                                P(cpu_hotplug_lock.write)
                                                                ...blocking...
                                                                                        css_killed_work_fn
                                                                                        P(cgroup_mutex)
                                                                                        cpuset_css_offline
                                                                                        P(cpu_hotplug_lock.read)
                                                                                        ...blocking...
256 cgroup_bpf_release
mutex_lock(&cgroup_mutex);
..blocking...
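For context, the following is a simplified sketch of the pre-fix queueing pattern behind steps 1 and 5. Identifiers are modeled on kernel/bpf/cgroup.c, but this is an illustrative reduction, not the verbatim kernel source.

#include <linux/workqueue.h>
#include <linux/cgroup-defs.h>

/* Work handler: serializes on cgroup_mutex. With thousands of cgroup
 * deletions in flight, up to WQ_DFL_ACTIVE (256) of these occupy every
 * active slot of system_wq while blocked on the mutex, and later items
 * such as sscs.work wait behind them in the inactive queue. */
static void cgroup_bpf_release(struct work_struct *work)
{
        struct cgroup *cgrp = container_of(work, struct cgroup, bpf.release_work);

        mutex_lock(&cgroup_mutex);
        /* ... detach and free the cgroup's effective bpf programs ... */
        mutex_unlock(&cgroup_mutex);
}

/* Refcount release: each deleted cgroup queues its bpf release work on the
 * shared system_wq -- the pre-fix behaviour at issue here. */
static void cgroup_bpf_release_fn(struct percpu_ref *ref)
{
        struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

        INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
        queue_work(system_wq, &cgrp->bpf.release_work);
}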
To fix the problem, place the cgroup_bpf_release works on a dedicated workqueue, which breaks the dependency loop described above. System wqs are for miscellaneous things which shouldn't create a large number of concurrent work items. If something is going to generate > ---truncated---
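The advisory text is truncated above. As a rough illustration of the approach it describes (queueing the destruction work on its own workqueue), a sketch along the following lines is plausible; the identifiers are again modeled on kernel/bpf/cgroup.c and are not taken verbatim from the upstream patch.

static struct workqueue_struct *cgroup_bpf_destroy_wq;

static int __init cgroup_bpf_wq_init(void)
{
        /* max_active = 1: the release works all serialize on cgroup_mutex
         * anyway, so the goal is isolation from system_wq, not parallelism. */
        cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1);
        if (!cgroup_bpf_destroy_wq)
                panic("Failed to alloc workqueue for cgroup bpf destroy.\n");
        return 0;
}
core_initcall(cgroup_bpf_wq_init);

static void cgroup_bpf_release_fn(struct percpu_ref *ref)
{
        struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

        INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
        /* Previously queued on system_wq; the dedicated queue keeps these
         * works from monopolizing system_wq and blocking sscs.work. */
        queue_work(cgroup_bpf_destroy_wq, &cgrp->bpf.release_work);
}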
{ "affected": [], "aliases": [ "CVE-2024-53054" ], "database_specific": { "cwe_ids": [ "CWE-667" ], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2024-11-19T18:15:25Z", "severity": "MODERATE" }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ncgroup/bpf: use a dedicated workqueue for cgroup bpf destruction\n\nA hung_task problem shown below was found:\n\nINFO: task kworker/0:0:8 blocked for more than 327 seconds.\n\"echo 0 \u003e /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\nWorkqueue: events cgroup_bpf_release\nCall Trace:\n \u003cTASK\u003e\n __schedule+0x5a2/0x2050\n ? find_held_lock+0x33/0x100\n ? wq_worker_sleeping+0x9e/0xe0\n schedule+0x9f/0x180\n schedule_preempt_disabled+0x25/0x50\n __mutex_lock+0x512/0x740\n ? cgroup_bpf_release+0x1e/0x4d0\n ? cgroup_bpf_release+0xcf/0x4d0\n ? process_scheduled_works+0x161/0x8a0\n ? cgroup_bpf_release+0x1e/0x4d0\n ? mutex_lock_nested+0x2b/0x40\n ? __pfx_delay_tsc+0x10/0x10\n mutex_lock_nested+0x2b/0x40\n cgroup_bpf_release+0xcf/0x4d0\n ? process_scheduled_works+0x161/0x8a0\n ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0\n ? process_scheduled_works+0x161/0x8a0\n process_scheduled_works+0x23a/0x8a0\n worker_thread+0x231/0x5b0\n ? __pfx_worker_thread+0x10/0x10\n kthread+0x14d/0x1c0\n ? __pfx_kthread+0x10/0x10\n ret_from_fork+0x59/0x70\n ? __pfx_kthread+0x10/0x10\n ret_from_fork_asm+0x1b/0x30\n \u003c/TASK\u003e\n\nThis issue can be reproduced by the following pressuse test:\n1. A large number of cpuset cgroups are deleted.\n2. Set cpu on and off repeatly.\n3. Set watchdog_thresh repeatly.\nThe scripts can be obtained at LINK mentioned above the signature.\n\nThe reason for this issue is cgroup_mutex and cpu_hotplug_lock are\nacquired in different tasks, which may lead to deadlock.\nIt can lead to a deadlock through the following steps:\n1. A large number of cpusets are deleted asynchronously, which puts a\n large number of cgroup_bpf_release works into system_wq. The max_active\n of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are\n cgroup_bpf_release works, and many cgroup_bpf_release works will be put\n into inactive queue. As illustrated in the diagram, there are 256 (in\n the acvtive queue) + n (in the inactive queue) works.\n2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put\n smp_call_on_cpu work into system_wq. However step 1 has already filled\n system_wq, \u0027sscs.work\u0027 is put into inactive queue. \u0027sscs.work\u0027 has\n to wait until the works that were put into the inacvtive queue earlier\n have executed (n cgroup_bpf_release), so it will be blocked for a while.\n3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2.\n4. Cpusets that were deleted at step 1 put cgroup_release works into\n cgroup_destroy_wq. They are competing to get cgroup_mutex all the time.\n When cgroup_metux is acqured by work at css_killed_work_fn, it will\n call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read.\n However, cpuset_css_offline will be blocked for step 3.\n5. At this moment, there are 256 works in active queue that are\n cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as\n a result, all of them are blocked. Consequently, sscs.work can not be\n executed. 
Ultimately, this situation leads to four processes being\n blocked, forming a deadlock.\n\nsystem_wq(step1)\t\tWatchDog(step2)\t\t\tcpu offline(step3)\tcgroup_destroy_wq(step4)\n...\n2000+ cgroups deleted asyn\n256 actives + n inactives\n\t\t\t\t__lockup_detector_reconfigure\n\t\t\t\tP(cpu_hotplug_lock.read)\n\t\t\t\tput sscs.work into system_wq\n256 + n + 1(sscs.work)\nsscs.work wait to be executed\n\t\t\t\twarting sscs.work finish\n\t\t\t\t\t\t\t\tpercpu_down_write\n\t\t\t\t\t\t\t\tP(cpu_hotplug_lock.write)\n\t\t\t\t\t\t\t\t...blocking...\n\t\t\t\t\t\t\t\t\t\t\tcss_killed_work_fn\n\t\t\t\t\t\t\t\t\t\t\tP(cgroup_mutex)\n\t\t\t\t\t\t\t\t\t\t\tcpuset_css_offline\n\t\t\t\t\t\t\t\t\t\t\tP(cpu_hotplug_lock.read)\n\t\t\t\t\t\t\t\t\t\t\t...blocking...\n256 cgroup_bpf_release\nmutex_lock(\u0026cgroup_mutex);\n..blocking...\n\nTo fix the problem, place cgroup_bpf_release works on a dedicated\nworkqueue which can break the loop and solve the problem. System wqs are\nfor misc things which shouldn\u0027t create a large number of concurrent work\nitems. If something is going to generate \u003e\n---truncated---", "id": "GHSA-mm63-c923-gw6c", "modified": "2024-11-22T21:32:13Z", "published": "2024-11-19T18:31:06Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-53054" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/0d86cd70fc6a7ba18becb52ad8334d5ad3eca530" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/117932eea99b729ee5d12783601a4f7f5fd58a23" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/6dab3331523ba73db1345d19e6f586dcd5f6efb4" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/71f14a9f5c7db72fdbc56e667d4ed42a1a760494" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "type": "CVSS_V3" } ] }
Sightings
Author | Source | Type | Date
---|---|---|---
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.