GHSA-v5r4-gjfm-m9v2
Vulnerability from GitHub
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Fix lock dependency warning
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));
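The report above describes a cycle in a lock-order graph: two dependencies were already recorded, and a third acquisition would close the loop. The following is a toy model of that check in C, not lockdep's actual implementation. The names `lock_class`, `lock_graph`, `reachable`, and `add_dependency` are hypothetical and exist only for this sketch. It treats each recorded "A held while B acquired" pair as an edge and refuses any new edge whose target can already reach its source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the three in the report. */
enum lock_class { EVICTION_WORK, MMAP_LOCK, SVMS_LOCK, NUM_CLASSES };

/* dep[a][b] is true when b was acquired while a was held ("a --> b"). */
struct lock_graph {
    bool dep[NUM_CLASSES][NUM_CLASSES];
};

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_CLASSES; next++) {
        if (g->dep[from][next] && !seen[next] && reachable(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Try to record "held --> wanted". Returns false, and records nothing, if
 * `held` is already reachable from `wanted`, because the new edge would then
 * close a cycle.
 */
static bool add_dependency(struct lock_graph *g, int held, int wanted)
{
    bool seen[NUM_CLASSES] = { false };

    assert(held >= 0 && held < NUM_CLASSES);
    assert(wanted >= 0 && wanted < NUM_CLASSES);

    if (reachable(g, wanted, held, seen))
        return false;
    g->dep[held][wanted] = true;
    return true;
}
```

Under this model, recording `EVICTION_WORK --> MMAP_LOCK` and then `MMAP_LOCK --> SVMS_LOCK` succeeds, while a later attempt to record `SVMS_LOCK --> EVICTION_WORK` is rejected. That last edge corresponds to flushing the eviction work while `svms->lock` is held.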
I believe this cannot really lead to a deadlock in practice, because svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO refcount is non-0. That means it's impossible that svm_range_bo_release is running concurrently. However, there is no good way to annotate this.
To avoid the problem, take a BO reference in svm_range_schedule_evict_svm_bo instead of in the worker. That way it's impossible for a BO to get freed while eviction work is pending and the cancel_work_sync call in svm_range_bo_release can be eliminated.
v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also removed redundant checks that are already done in amdkfd_fence_enable_signaling.
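The idea of taking the reference at schedule time can be illustrated with a small userspace sketch. This is not the amdkfd code: `toy_bo` and every `toy_*` function are hypothetical stand-ins, C11 atomics replace the kernel's reference-counting helpers, and a boolean flag replaces the real work queue. The sketch only shows why a "reference unless zero" taken by the scheduler keeps the object alive until the worker has run, so the release path never needs to wait for pending work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted buffer object. */
struct toy_bo {
    atomic_int refcount;
    bool work_pending;
    bool released;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool toy_bo_ref_unless_zero(struct toy_bo *bo)
{
    int old = atomic_load(&bo->refcount);

    while (old != 0) {
        /* On failure, `old` is updated to the current count and we retry. */
        if (atomic_compare_exchange_weak(&bo->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; dropping the last one releases the object. */
static void toy_bo_unref(struct toy_bo *bo)
{
    int prev = atomic_fetch_sub(&bo->refcount, 1);

    assert(prev > 0);
    if (prev == 1)
        bo->released = true; /* a real release path would free resources */
}

/* Scheduling side: pin the object before queueing the work. */
static bool toy_schedule_evict(struct toy_bo *bo)
{
    if (!toy_bo_ref_unless_zero(bo))
        return false; /* object is already being torn down; skip the work */
    bo->work_pending = true; /* stands in for queueing a work item */
    return true;
}

/* Worker side: runs under the reference taken at schedule time, then drops it. */
static void toy_evict_worker(struct toy_bo *bo)
{
    bo->work_pending = false;
    toy_bo_unref(bo);
}
```

In this sketch, if the owner drops its reference while the work is still pending, the object survives until `toy_evict_worker` drops the scheduler's reference. Once the count has reached zero, `toy_schedule_evict` refuses to queue anything.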
{
  "affected": [],
  "aliases": [
    "CVE-2024-26628"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-03-06T07:15:13Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/amdkfd: Fix lock dependency warning\n\n======================================================\nWARNING: possible circular locking dependency detected\n6.5.0-kfd-fkuehlin #276 Not tainted\n------------------------------------------------------\nkworker/8:2/2676 is trying to acquire lock:\nffff9435aae95c88 ((work_completion)(\u0026svm_bo-\u003eeviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550\n\nbut task is already holding lock:\nffff9435cd8e1720 (\u0026svms-\u003elock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]\n\nwhich lock already depends on the new lock.\n\nthe existing dependency chain (in reverse order) is:\n\n-\u003e #2 (\u0026svms-\u003elock){+.+.}-{3:3}:\n __mutex_lock+0x97/0xd30\n kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]\n kfd_ioctl+0x1b2/0x5d0 [amdgpu]\n __x64_sys_ioctl+0x86/0xc0\n do_syscall_64+0x39/0x80\n entry_SYSCALL_64_after_hwframe+0x63/0xcd\n\n-\u003e #1 (\u0026mm-\u003emmap_lock){++++}-{3:3}:\n down_read+0x42/0x160\n svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]\n process_one_work+0x27a/0x540\n worker_thread+0x53/0x3e0\n kthread+0xeb/0x120\n ret_from_fork+0x31/0x50\n ret_from_fork_asm+0x11/0x20\n\n-\u003e #0 ((work_completion)(\u0026svm_bo-\u003eeviction_work)){+.+.}-{0:0}:\n __lock_acquire+0x1426/0x2200\n lock_acquire+0xc1/0x2b0\n __flush_work+0x80/0x550\n __cancel_work_timer+0x109/0x190\n svm_range_bo_release+0xdc/0x1c0 [amdgpu]\n svm_range_free+0x175/0x180 [amdgpu]\n svm_range_deferred_list_work+0x15d/0x340 [amdgpu]\n process_one_work+0x27a/0x540\n worker_thread+0x53/0x3e0\n kthread+0xeb/0x120\n ret_from_fork+0x31/0x50\n ret_from_fork_asm+0x11/0x20\n\nother info that might help us debug this:\n\nChain exists of:\n (work_completion)(\u0026svm_bo-\u003eeviction_work) --\u003e \u0026mm-\u003emmap_lock --\u003e \u0026svms-\u003elock\n\n Possible unsafe locking scenario:\n\n CPU0 CPU1\n ---- ----\n lock(\u0026svms-\u003elock);\n lock(\u0026mm-\u003emmap_lock);\n lock(\u0026svms-\u003elock);\n lock((work_completion)(\u0026svm_bo-\u003eeviction_work));\n\nI believe this cannot really lead to a deadlock in practice, because\nsvm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO\nrefcount is non-0. That means it\u0027s impossible that svm_range_bo_release\nis running concurrently. However, there is no good way to annotate this.\n\nTo avoid the problem, take a BO reference in\nsvm_range_schedule_evict_svm_bo instead of in the worker. That way it\u0027s\nimpossible for a BO to get freed while eviction work is pending and the\ncancel_work_sync call in svm_range_bo_release can be eliminated.\n\nv2: Use svm_bo_ref_unless_zero and explained why that\u0027s safe. Also\nremoved redundant checks that are already done in\namdkfd_fence_enable_signaling.",
  "id": "GHSA-v5r4-gjfm-m9v2",
  "modified": "2024-03-06T09:30:29Z",
  "published": "2024-03-06T09:30:29Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-26628"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/28d2d623d2fbddcca5c24600474e92f16ebb3a05"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/47bf0f83fc86df1bf42b385a91aadb910137c5c9"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/7a70663ba02bd4e19aea8d70c979eb3bd03d839d"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/8b25d397162b0316ceda40afaa63ee0c4a97d28b"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/cb96e492d72d143d57db2d2bc143a1cee8741807"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}