ghsa-95mq-4jg4-hpq2
Vulnerability from github
Published
2024-05-21 18:31
Modified
2024-05-21 18:31
Details

In the Linux kernel, the following vulnerability has been resolved:

drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...]

To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Show details on source website


{
  "affected": [],
  "aliases": [
    "CVE-2023-52738"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-05-21T16:15:13Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini\n\nCurrently amdgpu calls drm_sched_fini() from the fence driver sw fini\nroutine - such function is expected to be called only after the\nrespective init function - drm_sched_init() - was executed successfully.\n\nHappens that we faced a driver probe failure in the Steam Deck\nrecently, and the function drm_sched_fini() was called even without\nits counter-part had been previously called, causing the following oops:\n\namdgpu: probe of 0000:04:00.0 failed with error -110\nBUG: kernel NULL pointer dereference, address: 0000000000000090\nPGD 0 P4D 0\nOops: 0002 [#1] PREEMPT SMP NOPTI\nCPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338\nHardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022\nRIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]\n[...]\nCall Trace:\n \u003cTASK\u003e\n amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]\n amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]\n amdgpu_driver_release_kms+0x16/0x30 [amdgpu]\n devm_drm_dev_init_release+0x49/0x70\n [...]\n\nTo prevent that, check if the drm_sched was properly initialized for a\ngiven ring before calling its fini counter-part.\n\nNotice ideally we\u0027d use sched.ready for that; such field is set as the latest\nthing on drm_sched_init(). But amdgpu seems to \"override\" the meaning of such\nfield - in the above oops for example, it was a GFX ring causing the crash, and\nthe sched.ready field was set to true in the ring init routine, regardless of\nthe state of the DRM scheduler. Hence, we ended-up using sched.ops as per\nChristian\u0027s suggestion [0], and also removed the no_scheduler check [1].\n\n[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/\n[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/",
  "id": "GHSA-95mq-4jg4-hpq2",
  "modified": "2024-05-21T18:31:19Z",
  "published": "2024-05-21T18:31:19Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2023-52738"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/2bcbbef9cace772f5b7128b11401c515982de34b"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/2e557c8ca2c585bdef591b8503ba83b85f5d0afd"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/5ad7bbf3dba5c4a684338df1f285080f2588b535"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading...

Loading...