ghsa-wp7p-m23w-r9gj
Vulnerability from GitHub
Published: 2024-06-19 15:30
Modified: 2024-08-29 03:30
Severity: Moderate
Details

In the Linux kernel, the following vulnerability has been resolved:

net/mlx5: Reload only IB representors upon lag disable/enable

On lag disable, the bond IB device is destroyed along with all of its representors, and then the slaves' representors are reloaded.

If the slave IB representor load fails, the eswitch error flow unloads all representors, including the ethernet representors, whose netdevs are then detached and removed from the lag bond. This flow is incorrect, as the lag driver is not responsible for loading or unloading ethernet representors. Furthermore, the flow begins by taking the lag lock to prevent bond changes during the disable flow. When it reaches the detachment of the ethernet representors from lag, the lag lock must be acquired again, which triggers the following deadlock:

Call trace:
__switch_to+0xf4/0x148
__schedule+0x2c8/0x7d0
schedule+0x50/0xe0
schedule_preempt_disabled+0x18/0x28
__mutex_lock.isra.13+0x2b8/0x570
__mutex_lock_slowpath+0x1c/0x28
mutex_lock+0x4c/0x68
mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
mlx5_disable_lag+0x130/0x138 [mlx5_core]
mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
genl_family_rcv_msg_doit.isra.17+0xe8/0x138
genl_rcv_msg+0xe4/0x220
netlink_rcv_skb+0x44/0x108
genl_rcv+0x40/0x58
netlink_unicast+0x198/0x268
netlink_sendmsg+0x1d4/0x418
sock_sendmsg+0x54/0x60
__sys_sendto+0xf4/0x120
__arm64_sys_sendto+0x30/0x40
el0_svc_common+0x8c/0x120
do_el0_svc+0x30/0xa0
el0_svc+0x20/0x30
el0_sync_handler+0x90/0xb8
el0_sync+0x160/0x180

Thus, upon lag enable/disable, load and unload only the IB representors of the slaves, which prevents the deadlock described above.

While at it, refactor the mlx5_esw_offloads_rep_load() function to have a static helper method for its internal logic, in symmetry with the representor unload design.
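The locking problem described above can be illustrated with a small user-space model. This is only a sketch, not the driver code: the `Lag` class, `remove_netdev()`, `disable_lag()`, the `reload_all_reps` flag, and the `uplink0` name are hypothetical stand-ins invented for illustration. A non-reentrant `threading.Lock` plays the role of `ldev->lock`, and a short timeout is used so that the second acquisition reports failure instead of hanging forever as it would in the kernel.

```python
import threading


class Lag:
    """Toy model of a lag device (hypothetical, not the mlx5 structure)."""

    def __init__(self):
        # Non-reentrant lock, standing in for ldev->lock.
        self.lock = threading.Lock()
        self.netdevs = {"uplink0"}

    def remove_netdev(self, name, timeout=0.1):
        """Detach a netdev from the bond; this needs the lag lock.

        Returns False if the lock could not be taken within `timeout`,
        which models the thread blocking forever in mutex_lock().
        """
        if not self.lock.acquire(timeout=timeout):
            return False
        try:
            self.netdevs.discard(name)
            return True
        finally:
            self.lock.release()


def disable_lag(lag, reload_all_reps, ib_load_fails=True):
    """Model the lag disable flow, which runs with the lag lock held."""
    with lag.lock:
        if not ib_load_fails:
            return "ok"
        if reload_all_reps:
            # Old behaviour: the error path also unloads the ethernet
            # representors, which detaches the netdev from lag and
            # therefore tries to take the lag lock a second time.
            if not lag.remove_netdev("uplink0"):
                return "deadlock"
        # Fixed behaviour: only IB representors are touched, so the
        # error path never needs the lag lock again.
        return "ib-reload-failed"


if __name__ == "__main__":
    print(disable_lag(Lag(), reload_all_reps=True))
    print(disable_lag(Lag(), reload_all_reps=False))
```

With `reload_all_reps=True` the model reports the self-deadlock, because the same thread asks for a lock it already holds. With `reload_all_reps=False` the failure is still reported, but the flow returns without touching the lock again.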


{
  "affected": [],
  "aliases": [
    "CVE-2024-38557"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-667"
    ],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-06-19T14:15:15Z",
    "severity": "MODERATE"
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nnet/mlx5: Reload only IB representors upon lag disable/enable\n\nOn lag disable, the bond IB device along with all of its\nrepresentors are destroyed, and then the slaves\u0027 representors get reloaded.\n\nIn case the slave IB representor load fails, the eswitch error flow\nunloads all representors, including ethernet representors, where the\nnetdevs get detached and removed from lag bond. Such flow is inaccurate\nas the lag driver is not responsible for loading/unloading ethernet\nrepresentors. Furthermore, the flow described above begins by holding\nlag lock to prevent bond changes during disable flow. However, when\nreaching the ethernet representors detachment from lag, the lag lock is\nrequired again, triggering the following deadlock:\n\nCall trace:\n__switch_to+0xf4/0x148\n__schedule+0x2c8/0x7d0\nschedule+0x50/0xe0\nschedule_preempt_disabled+0x18/0x28\n__mutex_lock.isra.13+0x2b8/0x570\n__mutex_lock_slowpath+0x1c/0x28\nmutex_lock+0x4c/0x68\nmlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]\nmlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]\nmlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]\nmlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]\nmlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]\nmlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]\nmlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]\nmlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]\nmlx5_disable_lag+0x130/0x138 [mlx5_core]\nmlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev-\u003elock\nmlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]\ndevlink_nl_cmd_eswitch_set_doit+0xdc/0x180\ngenl_family_rcv_msg_doit.isra.17+0xe8/0x138\ngenl_rcv_msg+0xe4/0x220\nnetlink_rcv_skb+0x44/0x108\ngenl_rcv+0x40/0x58\nnetlink_unicast+0x198/0x268\nnetlink_sendmsg+0x1d4/0x418\nsock_sendmsg+0x54/0x60\n__sys_sendto+0xf4/0x120\n__arm64_sys_sendto+0x30/0x40\nel0_svc_common+0x8c/0x120\ndo_el0_svc+0x30/0xa0\nel0_svc+0x20/0x30\nel0_sync_handler+0x90/0xb8\nel0_sync+0x160/0x180\n\nThus, upon lag enable/disable, load and unload only the IB representors\nof the slaves preventing the deadlock mentioned above.\n\nWhile at it, refactor the mlx5_esw_offloads_rep_load() function to have\na static helper method for its internal logic, in symmetry with the\nrepresentor unload design.",
  "id": "GHSA-wp7p-m23w-r9gj",
  "modified": "2024-08-29T03:30:48Z",
  "published": "2024-06-19T15:30:53Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-38557"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/0f06228d4a2dcc1fca5b3ddb0eefa09c05b102c4"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/0f320f28f54b1b269a755be2e3fb3695e0b80b07"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/e93fc8d959e56092e2eca1e5511c2d2f0ad6807a"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/f03c714a0fdd1f93101a929d0e727c28a66383fc"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}
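
The `severity` entry in the record above carries only a CVSS v3.1 vector string, not a numeric score. As a rough illustration of how that vector maps to a number, the sketch below applies the base-score equations and metric weights from the CVSS v3.1 specification. The function names `roundup()`, `base_score()`, and `rating()` are chosen here for illustration and are not part of any advisory tooling; temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Return the CVSS v3.1 base score for a base-metric vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("this sketch only handles CVSS:3.1 vectors")
    m = dict(p.split(":") for p in parts[1:])
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def rating(score):
    """Map a base score to the qualitative rating scale of CVSS v3.1."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    vector = "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"
    score = base_score(vector)
    print(score, rating(score))  # prints: 5.5 Medium
```

For this advisory's vector the impact comes entirely from availability (`A:H`), which fits a deadlock: the result is 5.5, in the Medium band, consistent with the `MODERATE` value in `database_specific`.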

