ghsa-935g-fc8j-74rf
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mad: Improve handling of timed out WRs of mad agent
The current timeout handler of the MAD agent acquires and releases the mad_agent_priv lock for every timed-out WR. This causes heavy lock contention when a large number of WRs have to be handled inside the timeout handler.
This leads to a soft lockup, with the trace below, in some use cases where the rdma-cm path is used to establish connections between peer nodes.
Trace:
BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u128:3:19767]
CPU: 4 PID: 19767 Comm: kworker/u128:3 Kdump: loaded Tainted: G OE
------- --- 5.14.0-427.13.1.el9_4.x86_64 #1
Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.4.8 11/26/2019
Workqueue: ib_mad1 timeout_sends [ib_core]
RIP: 0010:__do_softirq+0x78/0x2ac
RSP: 0018:ffffb253449e4f98 EFLAGS: 00000246
RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 000000000000001f
RDX: 000000000000001d RSI: 000000003d1879ab RDI: fff363b66fd3a86b
RBP: ffffb253604cbcd8 R08: 0000009065635f3b R09: 0000000000000000
R10: 0000000000000040 R11: ffffb253449e4ff8 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040
FS: 0000000000000000(0000) GS:ffff8caa1fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd9ec9db900 CR3: 0000000891934006 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
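Call Trace:
<IRQ>
? show_trace_log_lvl+0x1c4/0x2df
? show_trace_log_lvl+0x1c4/0x2df
? __irq_exit_rcu+0xa1/0xc0
? watchdog_timer_fn+0x1b2/0x210
? __pfx_watchdog_timer_fn+0x10/0x10
? __hrtimer_run_queues+0x127/0x2c0
? hrtimer_interrupt+0xfc/0x210
? __sysvec_apic_timer_interrupt+0x5c/0x110
? sysvec_apic_timer_interrupt+0x37/0x90
? asm_sysvec_apic_timer_interrupt+0x16/0x20
? __do_softirq+0x78/0x2ac
? __do_softirq+0x60/0x2ac
__irq_exit_rcu+0xa1/0xc0
sysvec_call_function_single+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_call_function_single+0x16/0x20
RIP: 0010:_raw_spin_unlock_irq+0x14/0x30
RSP: 0018:ffffb253604cbd88 EFLAGS: 00000247
RAX: 000000000001960d RBX: 0000000000000002 RCX: ffff8cad2a064800
RDX: 000000008020001b RSI: 0000000000000001 RDI: ffff8cad5d39f66c
RBP: ffff8cad5d39f600 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8caa443e0c00 R11: ffffb253604cbcd8 R12: ffff8cacb8682538
R13: 0000000000000005 R14: ffffb253604cbd90 R15: ffff8cad5d39f66c
cm_process_send_error+0x122/0x1d0 [ib_cm]
timeout_sends+0x1dd/0x270 [ib_core]
process_one_work+0x1e2/0x3b0
? __pfx_worker_thread+0x10/0x10
worker_thread+0x50/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xdd/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>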
The timeout handler is simplified by building a local list of timed-out WRs and invoking the send handler only after the list has been created. The new method acquires and releases the lock once to fetch the list, which reduces lock contention when processing a large number of WRs.
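The difference between the two handler shapes can be illustrated with a small user-space sketch. This is not the kernel code: `CountingLock`, `timeout_per_wr`, `timeout_batched`, and the `(wr_id, deadline)` tuples are hypothetical names introduced here, and the sketch assumes the wait list is kept sorted by deadline. It only shows how many times the lock is taken in each shape.

```python
import threading


class CountingLock:
    """Hypothetical wrapper around threading.Lock that counts acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def timeout_per_wr(wait_list, lock, now, handler):
    """Old shape: the lock is taken and dropped around every timed-out WR."""
    while True:
        with lock:
            # Assumption: wait_list is sorted by deadline, earliest first.
            if not wait_list or wait_list[0][1] > now:
                return
            wr = wait_list.pop(0)
        handler(wr)  # called with the lock released


def timeout_batched(wait_list, lock, now, handler):
    """New shape: one locked pass moves all expired WRs to a local list."""
    with lock:
        split = 0
        while split < len(wait_list) and wait_list[split][1] <= now:
            split += 1
        local_list = wait_list[:split]
        del wait_list[:split]
    for wr in local_list:
        handler(wr)  # called with the lock released


if __name__ == "__main__":
    for fn in (timeout_per_wr, timeout_batched):
        wrs = [(i, 10) for i in range(1000)] + [(1000, 99)]
        lock = CountingLock()
        fn(wrs, lock, now=50, handler=lambda wr: None)
        print(fn.__name__, "lock acquisitions:", lock.acquisitions)
```

With N expired entries the per-WR shape takes the lock N + 1 times, while the batched shape takes it once regardless of N. In both shapes the handler runs with the lock released, so the batched version changes only how often the lock is contended.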
{ "affected": [], "aliases": [ "CVE-2024-50095" ], "database_specific": { "cwe_ids": [], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2024-11-05T17:15:06Z", "severity": "MODERATE" }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nRDMA/mad: Improve handling of timed out WRs of mad agent\n\nCurrent timeout handler of mad agent acquires/releases mad_agent_priv\nlock for every timed out WRs. This causes heavy locking contention\nwhen higher no. of WRs are to be handled inside timeout handler.\n\nThis leads to softlockup with below trace in some use cases where\nrdma-cm path is used to establish connection between peer nodes\n\nTrace:\n-----\n BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u128:3:19767]\n CPU: 4 PID: 19767 Comm: kworker/u128:3 Kdump: loaded Tainted: G OE\n ------- --- 5.14.0-427.13.1.el9_4.x86_64 #1\n Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.4.8 11/26/2019\n Workqueue: ib_mad1 timeout_sends [ib_core]\n RIP: 0010:__do_softirq+0x78/0x2ac\n RSP: 0018:ffffb253449e4f98 EFLAGS: 00000246\n RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 000000000000001f\n RDX: 000000000000001d RSI: 000000003d1879ab RDI: fff363b66fd3a86b\n RBP: ffffb253604cbcd8 R08: 0000009065635f3b R09: 0000000000000000\n R10: 0000000000000040 R11: ffffb253449e4ff8 R12: 0000000000000000\n R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040\n FS: 0000000000000000(0000) GS:ffff8caa1fc80000(0000) knlGS:0000000000000000\n CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033\n CR2: 00007fd9ec9db900 CR3: 0000000891934006 CR4: 00000000007706e0\n DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000\n DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400\n PKRU: 55555554\n Call Trace:\n \u003cIRQ\u003e\n ? show_trace_log_lvl+0x1c4/0x2df\n ? show_trace_log_lvl+0x1c4/0x2df\n ? __irq_exit_rcu+0xa1/0xc0\n ? watchdog_timer_fn+0x1b2/0x210\n ? __pfx_watchdog_timer_fn+0x10/0x10\n ? 
__hrtimer_run_queues+0x127/0x2c0\n ? hrtimer_interrupt+0xfc/0x210\n ? __sysvec_apic_timer_interrupt+0x5c/0x110\n ? sysvec_apic_timer_interrupt+0x37/0x90\n ? asm_sysvec_apic_timer_interrupt+0x16/0x20\n ? __do_softirq+0x78/0x2ac\n ? __do_softirq+0x60/0x2ac\n __irq_exit_rcu+0xa1/0xc0\n sysvec_call_function_single+0x72/0x90\n \u003c/IRQ\u003e\n \u003cTASK\u003e\n asm_sysvec_call_function_single+0x16/0x20\n RIP: 0010:_raw_spin_unlock_irq+0x14/0x30\n RSP: 0018:ffffb253604cbd88 EFLAGS: 00000247\n RAX: 000000000001960d RBX: 0000000000000002 RCX: ffff8cad2a064800\n RDX: 000000008020001b RSI: 0000000000000001 RDI: ffff8cad5d39f66c\n RBP: ffff8cad5d39f600 R08: 0000000000000001 R09: 0000000000000000\n R10: ffff8caa443e0c00 R11: ffffb253604cbcd8 R12: ffff8cacb8682538\n R13: 0000000000000005 R14: ffffb253604cbd90 R15: ffff8cad5d39f66c\n cm_process_send_error+0x122/0x1d0 [ib_cm]\n timeout_sends+0x1dd/0x270 [ib_core]\n process_one_work+0x1e2/0x3b0\n ? __pfx_worker_thread+0x10/0x10\n worker_thread+0x50/0x3a0\n ? __pfx_worker_thread+0x10/0x10\n kthread+0xdd/0x100\n ? __pfx_kthread+0x10/0x10\n ret_from_fork+0x29/0x50\n \u003c/TASK\u003e\n\nSimplified timeout handler by creating local list of timed out WRs\nand invoke send handler post creating the list. The new method acquires/\nreleases lock once to fetch the list and hence helps to reduce locking\ncontetiong when processing higher no. 
of WRs", "id": "GHSA-935g-fc8j-74rf", "modified": "2024-11-12T21:30:50Z", "published": "2024-11-05T18:32:11Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-50095" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/2a777679b8ccd09a9a65ea0716ef10365179caac" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/3e799fa463508abe7a738ce5d0f62a8dfd05262a" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/7022a517bf1ca37ef5a474365bcc5eafd345a13a" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/713adaf0ecfc49405f6e5d9e409d984f628de818" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/a195a42dd25ca4f12489687065d00be64939409f" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/e80eadb3604a92d2d086e956b8b2692b699d4d0a" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "type": "CVSS_V3" } ] }
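The CVSS vector in the record can be turned into a base score with the formulas from the CVSS v3.1 specification. The following is a minimal sketch that handles only vectors with Scope: Unchanged, as in this record; the function names `roundup` and `base_score` are chosen here for illustration and are not part of any library.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch handles Scope: Unchanged only")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```

For the vector above this gives an impact of about 3.60 and an exploitability of about 1.83, which rounds up to 5.5. That falls in the Medium band of CVSS v3.1 (4.0 to 6.9), consistent with the "MODERATE" severity in the record.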
Sightings
| Author | Source | Type | Date |
|---|---|---|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.