GHSA-v5r4-gjfm-m9v2
Vulnerability from GitHub
Published: 2024-03-06 09:30
Modified: 2024-03-06 09:30

Details

In the Linux kernel, the following vulnerability has been resolved:

drm/amdkfd: Fix lock dependency warning

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------


kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));
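To see why lockdep flags this even when no deadlock can actually fire, here is a minimal, self-contained illustration of the underlying pattern (invented names, not the amdkfd code): lockdep models a work item's completion as a pseudo-lock, so flushing or cancelling work while holding a mutex that the work handler's lock chain depends on closes a cycle. The real report has one extra link through mm->mmap_lock, but the mechanism is the same.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/mutex.h>

/* "a" plays the role of svms->lock and "w" the role of
 * svm_bo->eviction_work in the report above. */
static DEFINE_MUTEX(a);
static struct work_struct w;

static void handler(struct work_struct *work)
{
        mutex_lock(&a);         /* records the edge: w's completion -> a */
        mutex_unlock(&a);
}

static void release_path(void)
{
        mutex_lock(&a);
        /* Waiting on w while holding a records a -> w's completion,
         * closing the circular dependency lockdep reports above. */
        cancel_work_sync(&w);
        mutex_unlock(&a);
}

static int __init demo_init(void)
{
        INIT_WORK(&w, handler);
        schedule_work(&w);
        release_path();
        return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");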

I believe this cannot really lead to a deadlock in practice, because svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO refcount is non-0. That means it's impossible that svm_range_bo_release is running concurrently. However, there is no good way to annotate this.
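For context, the worker-side check the author is describing looked roughly like the sketch below. This is a simplified reconstruction, not the actual drivers/gpu/drm/amd/amdkfd/kfd_svm.c source; the field path to the mm_struct and the unref helper name are assumptions.

/* Pre-fix worker (sketch): it only proceeds, and only takes
 * mmap_read_lock, if it can still elevate the BO refcount, so it can
 * never run concurrently with svm_range_bo_release. Lockdep has no
 * way to see that invariant. */
static void svm_range_evict_svm_bo_worker(struct work_struct *work)
{
        struct svm_range_bo *svm_bo;
        struct mm_struct *mm;

        svm_bo = container_of(work, struct svm_range_bo, eviction_work);

        /* Refcount already zero: the BO is being released, bail out. */
        if (!svm_bo_ref_unless_zero(svm_bo))
                return;

        mm = svm_bo->eviction_fence->mm;        /* assumed field path */
        mmap_read_lock(mm);
        /* ... evict the ranges backed by this BO ... */
        mmap_read_unlock(mm);

        svm_range_bo_unref(svm_bo);             /* assumed helper name */
}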

To avoid the problem, take a BO reference in svm_range_schedule_evict_svm_bo instead of in the worker. That way it's impossible for a BO to get freed while eviction work is pending and the cancel_work_sync call in svm_range_bo_release can be eliminated.
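The shape of the fix is to move that reference acquisition to scheduling time. A hedged sketch under the same naming assumptions follows; see the referenced stable commits for the actual patch.

/* Post-fix scheduling path (sketch): the reference is taken before the
 * work is queued, so a BO with pending eviction work can never reach
 * svm_range_bo_release, and the cancel_work_sync there (the #0 edge in
 * the lockdep cycle above) can be removed. */
static int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence)
{
        struct svm_range_bo *svm_bo = fence->svm_bo;

        /* Refcount already zero: the BO is being torn down and there
         * is nothing left to evict. */
        if (!svm_bo_ref_unless_zero(svm_bo))
                return 0;

        /* The worker inherits this reference and drops it when done. */
        schedule_work(&svm_bo->eviction_work);
        return 0;
}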

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also removed redundant checks that are already done in amdkfd_fence_enable_signaling.
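The svm_bo_ref_unless_zero helper mentioned in v2 is a thin wrapper over the kernel's standard kref_get_unless_zero primitive, roughly:

/* Elevate the BO refcount only if it has not already dropped to zero.
 * kref_get_unless_zero makes the check-and-get atomic, which is what
 * makes taking the reference at scheduling time safe. */
static bool svm_bo_ref_unless_zero(struct svm_range_bo *svm_bo)
{
        if (!svm_bo || !kref_get_unless_zero(&svm_bo->kref))
                return false;
        return true;
}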



{
  "affected": [],
  "aliases": [
    "CVE-2024-26628"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-03-06T07:15:13Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/amdkfd: Fix lock dependency warning\n\n======================================================\nWARNING: possible circular locking dependency detected\n6.5.0-kfd-fkuehlin #276 Not tainted\n------------------------------------------------------\nkworker/8:2/2676 is trying to acquire lock:\nffff9435aae95c88 ((work_completion)(\u0026svm_bo-\u003eeviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550\n\nbut task is already holding lock:\nffff9435cd8e1720 (\u0026svms-\u003elock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]\n\nwhich lock already depends on the new lock.\n\nthe existing dependency chain (in reverse order) is:\n\n-\u003e #2 (\u0026svms-\u003elock){+.+.}-{3:3}:\n       __mutex_lock+0x97/0xd30\n       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]\n       kfd_ioctl+0x1b2/0x5d0 [amdgpu]\n       __x64_sys_ioctl+0x86/0xc0\n       do_syscall_64+0x39/0x80\n       entry_SYSCALL_64_after_hwframe+0x63/0xcd\n\n-\u003e #1 (\u0026mm-\u003emmap_lock){++++}-{3:3}:\n       down_read+0x42/0x160\n       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]\n       process_one_work+0x27a/0x540\n       worker_thread+0x53/0x3e0\n       kthread+0xeb/0x120\n       ret_from_fork+0x31/0x50\n       ret_from_fork_asm+0x11/0x20\n\n-\u003e #0 ((work_completion)(\u0026svm_bo-\u003eeviction_work)){+.+.}-{0:0}:\n       __lock_acquire+0x1426/0x2200\n       lock_acquire+0xc1/0x2b0\n       __flush_work+0x80/0x550\n       __cancel_work_timer+0x109/0x190\n       svm_range_bo_release+0xdc/0x1c0 [amdgpu]\n       svm_range_free+0x175/0x180 [amdgpu]\n       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]\n       process_one_work+0x27a/0x540\n       worker_thread+0x53/0x3e0\n       kthread+0xeb/0x120\n       ret_from_fork+0x31/0x50\n       ret_from_fork_asm+0x11/0x20\n\nother info that might help us debug this:\n\nChain exists of:\n  (work_completion)(\u0026svm_bo-\u003eeviction_work) --\u003e \u0026mm-\u003emmap_lock --\u003e \u0026svms-\u003elock\n\n Possible unsafe locking scenario:\n\n       CPU0                    CPU1\n       ----                    ----\n  lock(\u0026svms-\u003elock);\n                               lock(\u0026mm-\u003emmap_lock);\n                               lock(\u0026svms-\u003elock);\n  lock((work_completion)(\u0026svm_bo-\u003eeviction_work));\n\nI believe this cannot really lead to a deadlock in practice, because\nsvm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO\nrefcount is non-0. That means it\u0027s impossible that svm_range_bo_release\nis running concurrently. However, there is no good way to annotate this.\n\nTo avoid the problem, take a BO reference in\nsvm_range_schedule_evict_svm_bo instead of in the worker. That way it\u0027s\nimpossible for a BO to get freed while eviction work is pending and the\ncancel_work_sync call in svm_range_bo_release can be eliminated.\n\nv2: Use svm_bo_ref_unless_zero and explained why that\u0027s safe. Also\nremoved redundant checks that are already done in\namdkfd_fence_enable_signaling.",
  "id": "GHSA-v5r4-gjfm-m9v2",
  "modified": "2024-03-06T09:30:29Z",
  "published": "2024-03-06T09:30:29Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-26628"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/28d2d623d2fbddcca5c24600474e92f16ebb3a05"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/47bf0f83fc86df1bf42b385a91aadb910137c5c9"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/7a70663ba02bd4e19aea8d70c979eb3bd03d839d"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/8b25d397162b0316ceda40afaa63ee0c4a97d28b"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/cb96e492d72d143d57db2d2bc143a1cee8741807"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}

