cve-2024-35931
Vulnerability from cvelistv5
Published
2024-05-19 10:10
Modified
2024-09-11 17:33
Severity
Summary
drm/amdgpu: Skip do PCI error slot reset during RAS recovery
Impacted products
VendorProduct
LinuxLinux
LinuxLinux
Show details on NVD website


{
  "containers": {
    "adp": [
      {
        "providerMetadata": {
          "dateUpdated": "2024-08-02T03:21:49.006Z",
          "orgId": "af854a3a-2127-422b-91ae-364da2661108",
          "shortName": "CVE"
        },
        "references": [
          {
            "tags": [
              "x_transferred"
            ],
            "url": "https://git.kernel.org/stable/c/395ca1031acf89d8ecb26127c544a71688d96f35"
          },
          {
            "tags": [
              "x_transferred"
            ],
            "url": "https://git.kernel.org/stable/c/601429cca96b4af3be44172c3b64e4228515dbe1"
          }
        ],
        "title": "CVE Program Container"
      },
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2024-35931",
                "options": [
                  {
                    "Exploitation": "none"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2024-09-10T15:41:01.828598Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2024-09-11T17:33:15.513Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "defaultStatus": "unaffected",
          "product": "Linux",
          "programFiles": [
            "drivers/gpu/drm/amd/amdgpu/amdgpu_device.c"
          ],
          "repo": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git",
          "vendor": "Linux",
          "versions": [
            {
              "lessThan": "395ca1031acf",
              "status": "affected",
              "version": "1da177e4c3f4",
              "versionType": "git"
            },
            {
              "lessThan": "601429cca96b",
              "status": "affected",
              "version": "1da177e4c3f4",
              "versionType": "git"
            }
          ]
        },
        {
          "defaultStatus": "affected",
          "product": "Linux",
          "programFiles": [
            "drivers/gpu/drm/amd/amdgpu/amdgpu_device.c"
          ],
          "repo": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git",
          "vendor": "Linux",
          "versions": [
            {
              "lessThanOrEqual": "6.8.*",
              "status": "unaffected",
              "version": "6.8.6",
              "versionType": "custom"
            },
            {
              "lessThanOrEqual": "*",
              "status": "unaffected",
              "version": "6.9",
              "versionType": "original_commit_for_fix"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/amdgpu: Skip do PCI error slot reset during RAS recovery\n\nWhy:\n    The PCI error slot reset maybe triggered after inject ue to UMC multi times, this\n    caused system hang.\n    [  557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume\n    [  557.373718] [drm] PCIE GART of 512M enabled.\n    [  557.373722] [drm] PTB located at 0x0000031FED700000\n    [  557.373788] [drm] VRAM is lost due to GPU reset!\n    [  557.373789] [drm] PSP is resuming...\n    [  557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset\n    [  557.547067] [drm] PCI error: detected callback, state(1)!!\n    [  557.547069] [drm] No support for XGMI hive yet...\n    [  557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter\n    [  557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations\n    [  557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered\n    [  557.610492] [drm] PCI error: slot reset callback!!\n    ...\n    [  560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!\n    [  560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!\n    [  560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI\n    [  560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G           OE     5.15.0-91-generic #101-Ubuntu\n    [  560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023\n    [  560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]\n    [  560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]\n    [  560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff \u003c48\u003e 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00\n    [  560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202\n    [  560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0\n    [  560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010\n    [  560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08\n    [  560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000\n    [  560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000\n    [  560.803889] FS:  0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000\n    [  560.812973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033\n    [  560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0\n    [  560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000\n    [  560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400\n    [  560.843444] PKRU: 55555554\n    [  560.846480] Call Trace:\n    [  560.849225]  \u003cTASK\u003e\n    [  560.851580]  ? show_trace_log_lvl+0x1d6/0x2ea\n    [  560.856488]  ? show_trace_log_lvl+0x1d6/0x2ea\n    [  560.861379]  ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]\n    [  560.867778]  ? show_regs.part.0+0x23/0x29\n    [  560.872293]  ? __die_body.cold+0x8/0xd\n    [  560.876502]  ? die_addr+0x3e/0x60\n    [  560.880238]  ? exc_general_protection+0x1c5/0x410\n    [  560.885532]  ? asm_exc_general_protection+0x27/0x30\n    [  560.891025]  ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]\n    [  560.898323]  amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]\n    [  560.904520]  process_one_work+0x228/0x3d0\nHow:\n    In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected\n    all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure."
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2024-05-29T05:31:27.658Z",
        "orgId": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
        "shortName": "Linux"
      },
      "references": [
        {
          "url": "https://git.kernel.org/stable/c/395ca1031acf89d8ecb26127c544a71688d96f35"
        },
        {
          "url": "https://git.kernel.org/stable/c/601429cca96b4af3be44172c3b64e4228515dbe1"
        }
      ],
      "title": "drm/amdgpu: Skip do PCI error slot reset during RAS recovery",
      "x_generator": {
        "engine": "bippy-a5840b7849dd"
      }
    }
  },
  "cveMetadata": {
    "assignerOrgId": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
    "assignerShortName": "Linux",
    "cveId": "CVE-2024-35931",
    "datePublished": "2024-05-19T10:10:39.706Z",
    "dateReserved": "2024-05-17T13:50:33.129Z",
    "dateUpdated": "2024-09-11T17:33:15.513Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.1",
  "meta": {
    "nvd": "{\"cve\":{\"id\":\"CVE-2024-35931\",\"sourceIdentifier\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\",\"published\":\"2024-05-19T11:15:49.133\",\"lastModified\":\"2024-05-20T13:00:04.957\",\"vulnStatus\":\"Awaiting Analysis\",\"descriptions\":[{\"lang\":\"en\",\"value\":\"In the Linux kernel, the following vulnerability has been resolved:\\n\\ndrm/amdgpu: Skip do PCI error slot reset during RAS recovery\\n\\nWhy:\\n    The PCI error slot reset maybe triggered after inject ue to UMC multi times, this\\n    caused system hang.\\n    [  557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume\\n    [  557.373718] [drm] PCIE GART of 512M enabled.\\n    [  557.373722] [drm] PTB located at 0x0000031FED700000\\n    [  557.373788] [drm] VRAM is lost due to GPU reset!\\n    [  557.373789] [drm] PSP is resuming...\\n    [  557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset\\n    [  557.547067] [drm] PCI error: detected callback, state(1)!!\\n    [  557.547069] [drm] No support for XGMI hive yet...\\n    [  557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter\\n    [  557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations\\n    [  557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered\\n    [  557.610492] [drm] PCI error: slot reset callback!!\\n    ...\\n    [  560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!\\n    [  560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!\\n    [  560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI\\n    [  560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G           OE     5.15.0-91-generic #101-Ubuntu\\n    [  560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023\\n    [  560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]\\n    [  560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]\\n    [  560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff \u003c48\u003e 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00\\n    [  560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202\\n    [  560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0\\n    [  560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010\\n    [  560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08\\n    [  560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000\\n    [  560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000\\n    [  560.803889] FS:  0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000\\n    [  560.812973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033\\n    [  560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0\\n    [  560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000\\n    [  560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400\\n    [  560.843444] PKRU: 55555554\\n    [  560.846480] Call Trace:\\n    [  560.849225]  \u003cTASK\u003e\\n    [  560.851580]  ? show_trace_log_lvl+0x1d6/0x2ea\\n    [  560.856488]  ? show_trace_log_lvl+0x1d6/0x2ea\\n    [  560.861379]  ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]\\n    [  560.867778]  ? show_regs.part.0+0x23/0x29\\n    [  560.872293]  ? __die_body.cold+0x8/0xd\\n    [  560.876502]  ? die_addr+0x3e/0x60\\n    [  560.880238]  ? exc_general_protection+0x1c5/0x410\\n    [  560.885532]  ? asm_exc_general_protection+0x27/0x30\\n    [  560.891025]  ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]\\n    [  560.898323]  amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]\\n    [  560.904520]  process_one_work+0x228/0x3d0\\nHow:\\n    In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected\\n    all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure.\"},{\"lang\":\"es\",\"value\":\"En el kernel de Linux, se resolvi\u00f3 la siguiente vulnerabilidad: drm/amdgpu: Omitir el restablecimiento de la ranura de error PCI durante la recuperaci\u00f3n de RAS Por qu\u00e9: El restablecimiento de la ranura de error PCI puede activarse despu\u00e9s de inyectar ue en UMC varias veces, lo que provoc\u00f3 que el sistema se bloqueara. [ 557.371857] amdgpu 0000:af:00.0: amdgpu: el reinicio de la GPU se realiz\u00f3 correctamente, intentando reanudar [ 557.373718] [drm] PCIE GART de 512M habilitado. [ 557.373722] [drm] PTB ubicado en 0x0000031FED700000 [ 557.373788] [drm] \u00a1La VRAM se pierde debido al reinicio de la GPU! [ 557.373789] [drm] PSP se est\u00e1 reanudando... [ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Estado del dispositivo = 1 pci_status: 0. Salir, resultado = 3, es necesario restablecer [ 557.547067] [drm] Error de PCI: devoluci\u00f3n de llamada detectada , estado(1)!! [ 557.547069] [drm] A\u00fan no hay soporte para XGMI hive... [ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Estado del dispositivo = 1 pci_status: 0. Ingrese [ 557.607763] mlx5_core 0000:55:00.0: espere el valor del contador vital 0x16b5b despu\u00e9s de 1 iteraci\u00f3n [557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Estado del dispositivo = 1 pci_status: 1. Salir, err = 0, resultado = 5, recuperado [557.610492] [drm] Error de PCI: \u00a1devoluci\u00f3n de llamada de restablecimiento de ranura! ... [ 560.689382] amdgpu 0000:3f:00.0: amdgpu: \u00a1El reinicio de GPU (2) se realiz\u00f3 correctamente! [ 560.689546] amdgpu 0000:5a:00.0: amdgpu: \u00a1El reinicio de GPU (2) se realiz\u00f3 correctamente! [ 560.689562] falla de protecci\u00f3n general, probablemente para direcci\u00f3n no can\u00f3nica 0x5f080b54534f611f: 0000 [#1] SMP NOPTI [ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic # 101-Ubuntu [ 560.712057] Nombre de hardware: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 08/11/2023 [ 560.720959] Cola de trabajo: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu] [ 87] QEPD: 0010: amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu] [ 560.736891] C\u00f3digo: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff \u0026lt;48\u0026gt; 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00 [ 560.757967] RSP: 0018:ffa0000032e53d80 00010202 [560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0 [ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010 [ 560.779867] BP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08 [ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 000 [560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000 [ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000 [ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0 [ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 560.835433] DR3: 0000000000000000 DR6: 00000000ffe07f0 DR7: 0000000000000400 [ 560.843444] PKRU: 55555554 [ 560.846480] Seguimiento de llamadas: [ 560.849225]  [ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea [560.856488]? show_trace_log_lvl+0x1d6/0x2ea [560.861379]? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu] [ 560.867778] ? show_regs.part.0+0x23/0x29 [560.872293]? __die_body.cold+0x8/0xd [ 560.876502] ? die_addr+0x3e/0x60 [ 560.880238] ? exc_general_protection+0x1c5/0x410 [560.885532]? asm_exc_general_protection+0x27/0x30 [560.891025]? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu] [ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu] [ 560.904520] Process_one_work+0x228/0x3d0 C\u00f3mo: En la recuperaci\u00f3n de RAS, el restablecimiento del modo 1 se emite desde RAS fatal manejo de errores y esperaba todos los nodos en una colmena que se van a restablecer. no es necesario emitir otro modo-1 durante este procedimiento.\"}],\"metrics\":{},\"references\":[{\"url\":\"https://git.kernel.org/stable/c/395ca1031acf89d8ecb26127c544a71688d96f35\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"},{\"url\":\"https://git.kernel.org/stable/c/601429cca96b4af3be44172c3b64e4228515dbe1\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"}]}}"
  }
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading...

Loading...