ghsa-gghh-gpx3-pmx6
Vulnerability from github
Published: 2024-05-01 06:31
Modified: 2024-05-01 06:31
Details

In the Linux kernel, the following vulnerability has been resolved:

drm/i915/vma: Fix UAF on destroy against retire race

Object debugging tools were sporadically reporting illegal attempts to free a still active i915 VMA object when parking a GT believed to be idle.

[161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
[161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
...
[161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
[161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
[161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
[161.360592] RIP: 0010:debug_print_object+0x80/0xb0
...
[161.361347] debug_object_free+0xeb/0x110
[161.361362] i915_active_fini+0x14/0x130 [i915]
[161.361866] release_references+0xfe/0x1f0 [i915]
[161.362543] i915_vma_parked+0x1db/0x380 [i915]
[161.363129] __gt_park+0x121/0x230 [i915]
[161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]

This was tracked down to another thread deactivating the VMA inside the __active_retire() helper, after the VMA's active counter has already been decremented to 0 but before deactivation of the VMA's object is reported to the object debugging tool.

We could prevent that race by serializing i915_active_fini() with __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from being used, e.g. from __i915_vma_retire() called at the end of __active_retire(), after that VMA has already been freed by a concurrent i915_vma_destroy() on return from i915_active_fini(). The issue should therefore be fixed at the VMA level, not in i915_active.

Since __i915_vma_parked() is called from __gt_park() on last put of the GT's wakeref, the issue could be addressed by holding the GT wakeref long enough for __active_retire() to complete before that wakeref is released and the GT parked.
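The "park on last put" behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_gt`, `model_gt_get()` and `model_gt_put()` are hypothetical names invented for illustration, not the i915 structures or the intel_gt_pm API, and the model is single-threaded, so it shows the counting rule rather than the race itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the i915 data structures. */
struct model_gt {
    int wakeref;     /* number of outstanding wakeref holders */
    int park_count;  /* how many times the GT has been parked */
};

/* Take one reference that keeps the modelled GT awake. */
static void model_gt_get(struct model_gt *gt)
{
    gt->wakeref++;
}

/*
 * Drop one reference. Only the last put parks the GT, which stands in
 * for the path that ends in __gt_park() in the trace above.
 * Returns true if this call parked the GT.
 */
static bool model_gt_put(struct model_gt *gt)
{
    assert(gt->wakeref > 0);
    if (--gt->wakeref == 0) {
        gt->park_count++;
        return true;
    }
    return false;
}
```

In this model, an extra reference held until retirement completes means that an earlier put by another holder cannot park the GT, so the park-time cleanup cannot run while retirement is still in progress.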

I believe the issue was introduced by commit d93939730347 ("drm/i915: Remove the vma refcount") which moved a call to i915_active_fini() from a dropped i915_vma_release(), called on last put of the removed VMA kref, to i915_vma_parked() processing path called on last put of a GT wakeref. However, its visibility to the object debugging tool was suppressed by a bug in i915_active that was fixed two weeks later with commit e92eb246feb9 ("drm/i915/active: Fix missing debug object activation").

A VMA associated with a request doesn't acquire a GT wakeref by itself. Instead, it depends on a wakeref held directly by the request's active intel_context for a GT associated with its VM, and indirectly on that intel_context's engine wakeref if the engine belongs to the same GT as the VMA's VM. Those wakerefs are released asynchronously to VMA deactivation.

Fix the issue by getting a wakeref for the VMA's GT when activating it, and putting that wakeref only after the VMA is deactivated. However, exclude the global GTT from that processing path, otherwise the GPU never goes idle. Since __i915_vma_retire() may be called from atomic contexts, use the async variant of the wakeref put. Also, to avoid a circular locking dependency, acquire the wakeref before the VM mutex when both are needed.
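The shape of the fix (take a GT reference on activation, drop it only after deactivation, skip the global GTT) can be sketched as a userspace model. Everything here is an assumption made for illustration: the `model_*` names are hypothetical and do not correspond to the actual patch, and the put is synchronous, whereas the commit message says the real fix uses the async variant because retirement may run in atomic context. Locking is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the i915 data structures. */
struct model_gt {
    int wakeref;   /* outstanding references keeping the GT awake */
    bool parked;   /* set when the last reference is dropped */
};

struct model_vma {
    struct model_gt *gt;
    bool is_ggtt;  /* global GTT VMAs are excluded from the scheme */
    bool active;
};

static void model_gt_get(struct model_gt *gt)
{
    gt->wakeref++;
    gt->parked = false;
}

static void model_gt_put(struct model_gt *gt)
{
    assert(gt->wakeref > 0);
    if (--gt->wakeref == 0)
        gt->parked = true;  /* park-time cleanup would run here */
}

/* On activation, pin the GT unless the VMA lives in the global GTT. */
static void model_vma_activate(struct model_vma *vma)
{
    if (!vma->is_ggtt)
        model_gt_get(vma->gt);
    vma->active = true;
}

/* Drop the GT reference only after the VMA is marked inactive. */
static void model_vma_retire(struct model_vma *vma)
{
    vma->active = false;
    if (!vma->is_ggtt)
        model_gt_put(vma->gt);
}
```

With this ordering, the modelled GT cannot reach the parked state while a non-GGTT VMA is still active, even if every other holder has already dropped its reference. A GGTT VMA takes no reference, so it never prevents the GT from going idle.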

v7: Add inline comments with justifications for:
    - using untracked variants of intel_gt_pm_get/put() (Nirmoy),
    - using async variant of _put(),
    - not getting the wakeref in case of a global GTT,
    - always getting the first wakeref outside vm->mutex.
v6: Since __i915_vma_active/retire() callbacks are not serialized, storing a wakeref tracking handle inside struct i915_vma is not safe, and there is no other good place for that. Use untracked variants of intel_gt_pm_get/put_async().
v5: Replace "tile" with "GT" across commit description (Rodrigo),
  -
---truncated---



{
  "affected": [],
  "aliases": [
    "CVE-2024-26939"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-416"
    ],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-05-01T06:15:09Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/i915/vma: Fix UAF on destroy against retire race\n\nObject debugging tools were sporadically reporting illegal attempts to\nfree a still active i915 VMA object when parking a GT believed to be idle.\n\n[161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]\n[161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0\n...\n[161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1\n[161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022\n[161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]\n[161.360592] RIP: 0010:debug_print_object+0x80/0xb0\n...\n[161.361347] debug_object_free+0xeb/0x110\n[161.361362] i915_active_fini+0x14/0x130 [i915]\n[161.361866] release_references+0xfe/0x1f0 [i915]\n[161.362543] i915_vma_parked+0x1db/0x380 [i915]\n[161.363129] __gt_park+0x121/0x230 [i915]\n[161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]\n\nThat has been tracked down to be happening when another thread is\ndeactivating the VMA inside __active_retire() helper, after the VMA\u0027s\nactive counter has been already decremented to 0, but before deactivation\nof the VMA\u0027s object is reported to the object debugging tool.\n\nWe could prevent from that race by serializing i915_active_fini() with\n__active_retire() via ref-\u003etree_lock, but that wouldn\u0027t stop the VMA from\nbeing used, e.g. from __i915_vma_retire() called at the end of\n__active_retire(), after that VMA has been already freed by a concurrent\ni915_vma_destroy() on return from the i915_active_fini().  
Then, we should\nrather fix the issue at the VMA level, not in i915_active.\n\nSince __i915_vma_parked() is called from __gt_park() on last put of the\nGT\u0027s wakeref, the issue could be addressed by holding the GT wakeref long\nenough for __active_retire() to complete before that wakeref is released\nand the GT parked.\n\nI believe the issue was introduced by commit d93939730347 (\"drm/i915:\nRemove the vma refcount\") which moved a call to i915_active_fini() from\na dropped i915_vma_release(), called on last put of the removed VMA kref,\nto i915_vma_parked() processing path called on last put of a GT wakeref.\nHowever, its visibility to the object debugging tool was suppressed by a\nbug in i915_active that was fixed two weeks later with commit e92eb246feb9\n(\"drm/i915/active: Fix missing debug object activation\").\n\nA VMA associated with a request doesn\u0027t acquire a GT wakeref by itself.\nInstead, it depends on a wakeref held directly by the request\u0027s active\nintel_context for a GT associated with its VM, and indirectly on that\nintel_context\u0027s engine wakeref if the engine belongs to the same GT as the\nVMA\u0027s VM.  Those wakerefs are released asynchronously to VMA deactivation.\n\nFix the issue by getting a wakeref for the VMA\u0027s GT when activating it,\nand putting that wakeref only after the VMA is deactivated.  However,\nexclude global GTT from that processing path, otherwise the GPU never goes\nidle.  Since __i915_vma_retire() may be called from atomic contexts, use\nasync variant of wakeref put.  
Also, to avoid circular locking dependency,\ntake care of acquiring the wakeref before VM mutex when both are needed.\n\nv7: Add inline comments with justifications for:\n    - using untracked variants of intel_gt_pm_get/put() (Nirmoy),\n    - using async variant of _put(),\n    - not getting the wakeref in case of a global GTT,\n    - always getting the first wakeref outside vm-\u003emutex.\nv6: Since __i915_vma_active/retire() callbacks are not serialized, storing\n    a wakeref tracking handle inside struct i915_vma is not safe, and\n    there is no other good place for that.  Use untracked variants of\n    intel_gt_pm_get/put_async().\nv5: Replace \"tile\" with \"GT\" across commit description (Rodrigo),\n  - \n---truncated---",
  "id": "GHSA-gghh-gpx3-pmx6",
  "modified": "2024-05-01T06:31:41Z",
  "published": "2024-05-01T06:31:41Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-26939"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/0e45882ca829b26b915162e8e86dbb1095768e9e"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/59b2626dd8c8a2e13f18054b3530e0c00073d79f"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/5e3eb862df9f972ab677fb19e0d4b9b1be8db7b5"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/704edc9252f4988ae1ad7dafa23d0db8d90d7190"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}

