ghsa-mm63-c923-gw6c
Vulnerability from github
Published
2024-11-19 18:31
Modified
2024-11-22 21:32
Details

In the Linux kernel, the following vulnerability has been resolved:

cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction

A hung_task problem shown below was found:

INFO: task kworker/0:0:8 blocked for more than 327 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Workqueue: events cgroup_bpf_release Call Trace: __schedule+0x5a2/0x2050 ? find_held_lock+0x33/0x100 ? wq_worker_sleeping+0x9e/0xe0 schedule+0x9f/0x180 schedule_preempt_disabled+0x25/0x50 __mutex_lock+0x512/0x740 ? cgroup_bpf_release+0x1e/0x4d0 ? cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? cgroup_bpf_release+0x1e/0x4d0 ? mutex_lock_nested+0x2b/0x40 ? __pfx_delay_tsc+0x10/0x10 mutex_lock_nested+0x2b/0x40 cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0 ? process_scheduled_works+0x161/0x8a0 process_scheduled_works+0x23a/0x8a0 worker_thread+0x231/0x5b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x14d/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x59/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30

This issue can be reproduced by the following pressuse test: 1. A large number of cpuset cgroups are deleted. 2. Set cpu on and off repeatly. 3. Set watchdog_thresh repeatly. The scripts can be obtained at LINK mentioned above the signature.

The reason for this issue is cgroup_mutex and cpu_hotplug_lock are acquired in different tasks, which may lead to deadlock. It can lead to a deadlock through the following steps: 1. A large number of cpusets are deleted asynchronously, which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are cgroup_bpf_release works, and many cgroup_bpf_release works will be put into inactive queue. As illustrated in the diagram, there are 256 (in the acvtive queue) + n (in the inactive queue) works. 2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put smp_call_on_cpu work into system_wq. However step 1 has already filled system_wq, 'sscs.work' is put into inactive queue. 'sscs.work' has to wait until the works that were put into the inacvtive queue earlier have executed (n cgroup_bpf_release), so it will be blocked for a while. 3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2. 4. Cpusets that were deleted at step 1 put cgroup_release works into cgroup_destroy_wq. They are competing to get cgroup_mutex all the time. When cgroup_metux is acqured by work at css_killed_work_fn, it will call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read. However, cpuset_css_offline will be blocked for step 3. 5. At this moment, there are 256 works in active queue that are cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as a result, all of them are blocked. Consequently, sscs.work can not be executed. Ultimately, this situation leads to four processes being blocked, forming a deadlock.

system_wq(step1) WatchDog(step2) cpu offline(step3) cgroup_destroy_wq(step4) ... 2000+ cgroups deleted asyn 256 actives + n inactives __lockup_detector_reconfigure P(cpu_hotplug_lock.read) put sscs.work into system_wq 256 + n + 1(sscs.work) sscs.work wait to be executed warting sscs.work finish percpu_down_write P(cpu_hotplug_lock.write) ...blocking... css_killed_work_fn P(cgroup_mutex) cpuset_css_offline P(cpu_hotplug_lock.read) ...blocking... 256 cgroup_bpf_release mutex_lock(&cgroup_mutex); ..blocking...

To fix the problem, place cgroup_bpf_release works on a dedicated workqueue which can break the loop and solve the problem. System wqs are for misc things which shouldn't create a large number of concurrent work items. If something is going to generate > ---truncated---

Show details on source website


{
  "affected": [],
  "aliases": [
    "CVE-2024-53054"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-667"
    ],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-11-19T18:15:25Z",
    "severity": "MODERATE"
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ncgroup/bpf: use a dedicated workqueue for cgroup bpf destruction\n\nA hung_task problem shown below was found:\n\nINFO: task kworker/0:0:8 blocked for more than 327 seconds.\n\"echo 0 \u003e /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\nWorkqueue: events cgroup_bpf_release\nCall Trace:\n \u003cTASK\u003e\n __schedule+0x5a2/0x2050\n ? find_held_lock+0x33/0x100\n ? wq_worker_sleeping+0x9e/0xe0\n schedule+0x9f/0x180\n schedule_preempt_disabled+0x25/0x50\n __mutex_lock+0x512/0x740\n ? cgroup_bpf_release+0x1e/0x4d0\n ? cgroup_bpf_release+0xcf/0x4d0\n ? process_scheduled_works+0x161/0x8a0\n ? cgroup_bpf_release+0x1e/0x4d0\n ? mutex_lock_nested+0x2b/0x40\n ? __pfx_delay_tsc+0x10/0x10\n mutex_lock_nested+0x2b/0x40\n cgroup_bpf_release+0xcf/0x4d0\n ? process_scheduled_works+0x161/0x8a0\n ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0\n ? process_scheduled_works+0x161/0x8a0\n process_scheduled_works+0x23a/0x8a0\n worker_thread+0x231/0x5b0\n ? __pfx_worker_thread+0x10/0x10\n kthread+0x14d/0x1c0\n ? __pfx_kthread+0x10/0x10\n ret_from_fork+0x59/0x70\n ? __pfx_kthread+0x10/0x10\n ret_from_fork_asm+0x1b/0x30\n \u003c/TASK\u003e\n\nThis issue can be reproduced by the following pressuse test:\n1. A large number of cpuset cgroups are deleted.\n2. Set cpu on and off repeatly.\n3. Set watchdog_thresh repeatly.\nThe scripts can be obtained at LINK mentioned above the signature.\n\nThe reason for this issue is cgroup_mutex and cpu_hotplug_lock are\nacquired in different tasks, which may lead to deadlock.\nIt can lead to a deadlock through the following steps:\n1. A large number of cpusets are deleted asynchronously, which puts a\n   large number of cgroup_bpf_release works into system_wq. The max_active\n   of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are\n   cgroup_bpf_release works, and many cgroup_bpf_release works will be put\n   into inactive queue. As illustrated in the diagram, there are 256 (in\n   the acvtive queue) + n (in the inactive queue) works.\n2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put\n   smp_call_on_cpu work into system_wq. However step 1 has already filled\n   system_wq, \u0027sscs.work\u0027 is put into inactive queue. \u0027sscs.work\u0027 has\n   to wait until the works that were put into the inacvtive queue earlier\n   have executed (n cgroup_bpf_release), so it will be blocked for a while.\n3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2.\n4. Cpusets that were deleted at step 1 put cgroup_release works into\n   cgroup_destroy_wq. They are competing to get cgroup_mutex all the time.\n   When cgroup_metux is acqured by work at css_killed_work_fn, it will\n   call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read.\n   However, cpuset_css_offline will be blocked for step 3.\n5. At this moment, there are 256 works in active queue that are\n   cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as\n   a result, all of them are blocked. Consequently, sscs.work can not be\n   executed. Ultimately, this situation leads to four processes being\n   blocked, forming a deadlock.\n\nsystem_wq(step1)\t\tWatchDog(step2)\t\t\tcpu offline(step3)\tcgroup_destroy_wq(step4)\n...\n2000+ cgroups deleted asyn\n256 actives + n inactives\n\t\t\t\t__lockup_detector_reconfigure\n\t\t\t\tP(cpu_hotplug_lock.read)\n\t\t\t\tput sscs.work into system_wq\n256 + n + 1(sscs.work)\nsscs.work wait to be executed\n\t\t\t\twarting sscs.work finish\n\t\t\t\t\t\t\t\tpercpu_down_write\n\t\t\t\t\t\t\t\tP(cpu_hotplug_lock.write)\n\t\t\t\t\t\t\t\t...blocking...\n\t\t\t\t\t\t\t\t\t\t\tcss_killed_work_fn\n\t\t\t\t\t\t\t\t\t\t\tP(cgroup_mutex)\n\t\t\t\t\t\t\t\t\t\t\tcpuset_css_offline\n\t\t\t\t\t\t\t\t\t\t\tP(cpu_hotplug_lock.read)\n\t\t\t\t\t\t\t\t\t\t\t...blocking...\n256 cgroup_bpf_release\nmutex_lock(\u0026cgroup_mutex);\n..blocking...\n\nTo fix the problem, place cgroup_bpf_release works on a dedicated\nworkqueue which can break the loop and solve the problem. System wqs are\nfor misc things which shouldn\u0027t create a large number of concurrent work\nitems. If something is going to generate \u003e\n---truncated---",
  "id": "GHSA-mm63-c923-gw6c",
  "modified": "2024-11-22T21:32:13Z",
  "published": "2024-11-19T18:31:06Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-53054"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/0d86cd70fc6a7ba18becb52ad8334d5ad3eca530"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/117932eea99b729ee5d12783601a4f7f5fd58a23"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/6dab3331523ba73db1345d19e6f586dcd5f6efb4"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/71f14a9f5c7db72fdbc56e667d4ed42a1a760494"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.