ghsa-j3g9-72cw-7wcp
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
net/smc: fix kernel panic caused by race of smc_sock
A crash occurs when smc_cdc_tx_handler() tries to access smc_sock but smc_release() has already freed it.
[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
<...>
[ 4570.711446] Call Trace:
[ 4570.711746]
Though smc_cdc_tx_handler() checked the existence of smc connection, smc_release() may have already dismissed and released the smc socket before smc_cdc_tx_handler() further visits it.
smc_cdc_tx_handler() |smc_release() if (!conn) | | |smc_cdc_tx_dismiss_slots() | smc_cdc_tx_dismisser() | |sock_put(&smc->sk) <- last sock_put, | smc_sock freed bh_lock_sock(&smc->sk) (panic) |
To make sure we won't receive any CDC messages after we free the smc_sock, add a refcount on the smc_connection for inflight CDC message(posted to the QP but haven't received related CQE), and don't release the smc_connection until all the inflight CDC messages haven been done, for both success or failed ones.
Using refcount on CDC messages brings another problem: when the link is going to be destroyed, smcr_link_clear() will reset the QP, which then remove all the pending CQEs related to the QP in the CQ. To make sure all the CQEs will always come back so the refcount on the smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced by smc_ib_modify_qp_error(). And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we need to wait for all pending WQEs done, or we may encounter use-after- free when handling CQEs.
For IB device removal routine, we need to wait for all the QPs on that device been destroyed before we can destroy CQs on the device, or the refcount on smc_connection won't reach 0 and smc_sock cannot be released.
{ "affected": [], "aliases": [ "CVE-2021-46925" ], "database_specific": { "cwe_ids": [ "CWE-362" ], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2024-02-27T10:15:07Z", "severity": "MODERATE" }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nnet/smc: fix kernel panic caused by race of smc_sock\n\nA crash occurs when smc_cdc_tx_handler() tries to access smc_sock\nbut smc_release() has already freed it.\n\n[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88\n[ 4570.696048] #PF: supervisor write access in kernel mode\n[ 4570.696728] #PF: error_code(0x0002) - not-present page\n[ 4570.697401] PGD 0 P4D 0\n[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI\n[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111\n[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0\n[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30\n\u003c...\u003e\n[ 4570.711446] Call Trace:\n[ 4570.711746] \u003cIRQ\u003e\n[ 4570.711992] smc_cdc_tx_handler+0x41/0xc0\n[ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560\n[ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10\n[ 4570.713489] tasklet_action_common.isra.17+0x66/0x140\n[ 4570.714083] __do_softirq+0x123/0x2f4\n[ 4570.714521] irq_exit_rcu+0xc4/0xf0\n[ 4570.714934] common_interrupt+0xba/0xe0\n\nThough smc_cdc_tx_handler() checked the existence of smc connection,\nsmc_release() may have already dismissed and released the smc socket\nbefore smc_cdc_tx_handler() further visits it.\n\nsmc_cdc_tx_handler() |smc_release()\nif (!conn) |\n |\n |smc_cdc_tx_dismiss_slots()\n | smc_cdc_tx_dismisser()\n |\n |sock_put(\u0026smc-\u003esk) \u003c- last sock_put,\n | smc_sock freed\nbh_lock_sock(\u0026smc-\u003esk) (panic) |\n\nTo make sure we won\u0027t receive any CDC messages after we free the\nsmc_sock, add a refcount on the smc_connection for inflight CDC\nmessage(posted to the QP but haven\u0027t received related CQE), and\ndon\u0027t release the smc_connection until all the inflight CDC messages\nhaven been done, for both success or failed ones.\n\nUsing refcount on CDC messages brings another problem: when the link\nis going to be destroyed, smcr_link_clear() will reset the QP, which\nthen remove all the pending CQEs related to the QP in the CQ. To make\nsure all the CQEs will always come back so the refcount on the\nsmc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced\nby smc_ib_modify_qp_error().\nAnd remove the timeout in smc_wr_tx_wait_no_pending_sends() since we\nneed to wait for all pending WQEs done, or we may encounter use-after-\nfree when handling CQEs.\n\nFor IB device removal routine, we need to wait for all the QPs on that\ndevice been destroyed before we can destroy CQs on the device, or\nthe refcount on smc_connection won\u0027t reach 0 and smc_sock cannot be\nreleased.", "id": "GHSA-j3g9-72cw-7wcp", "modified": "2024-04-10T15:30:32Z", "published": "2024-02-27T12:31:09Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2021-46925" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/349d43127dac00c15231e8ffbcaabd70f7b0e544" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/b85f751d71ae8e2a15e9bda98852ea9af35282eb" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/e8a5988a85c719ce7205cb00dcf0716dcf611332" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H", "type": "CVSS_V3" } ] }
Sightings
Author | Source | Type | Date |
---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.