GHSA-896C-JR9V-XC8G

Vulnerability from github – Published: 2025-10-21 12:31 – Updated: 2025-10-21 12:31
Details

In the Linux kernel, the following vulnerability has been resolved:

blk-iolatency: Fix inflight count imbalances and IO hangs on offline

iolatency needs to track the number of inflight IOs per cgroup. As this tracking can be expensive, it is disabled when no cgroup has iolatency configured for the device. To ensure that the inflight counters stay balanced, iolatency_set_limit() freezes the request_queue while manipulating the enabled counter, which ensures that no IO is in flight and thus all counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled counter is manipulated. iolatency_pd_offline() can also decrement the counter and trigger disabling. Because this disabling happens without freezing the request_queue, it can easily occur while some IOs are in flight, which leaks the counts.
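The leak described above can be illustrated with a toy model. This is only a sketch, not kernel code: the class and function names are hypothetical, and it assumes that both the submission and the completion path skip accounting whenever tracking is disabled.

```python
class ToyIolatency:
    """Toy model of one cgroup's inflight accounting (not kernel code)."""

    def __init__(self):
        self.enabled = False  # stands in for "enabled counter > 0"
        self.inflight = 0     # stands in for the per-cgroup inflight counter

    def submit(self):
        # Accounting only happens while tracking is enabled.
        if self.enabled:
            self.inflight += 1

    def complete(self):
        # Assumption: completions are also ignored while tracking is disabled.
        if self.enabled:
            self.inflight -= 1


def run_unfrozen_toggle(n_ios=19):
    """Disable tracking while n_ios are in flight, then re-enable it."""
    grp = ToyIolatency()
    grp.enabled = True
    for _ in range(n_ios):
        grp.submit()      # IOs issued while tracking is on
    grp.enabled = False   # offline path disables without draining the queue
    for _ in range(n_ios):
        grp.complete()    # these completions are never subtracted
    grp.enabled = True    # tracking is switched on again later
    return grp.inflight


leaked = run_unfrozen_toggle()
print(f"inflight after every IO completed: {leaked}")  # prints 19
```

Every IO in this model has completed, yet the counter still reads 19, because the decrements were skipped while tracking was off.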

This can be easily demonstrated by turning on iolatency on one empty cgroup while IOs are in flight in other cgroups and then removing the cgroup. Note that iolatency must not be enabled elsewhere in the system, so that removing the cgroup disables iolatency for the whole device.

The following loop keeps flipping iolatency on and off on sda:

  echo +io > /sys/fs/cgroup/cgroup.subtree_control
  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done

while a concurrent fio job generates direct random reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

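  # Run as a drgn script; drgn supplies the global `prog`.
  import time

  from drgn import container_of
  from drgn.helpers.linux.block import disk_name
  from drgn.helpers.linux.cgroup import cgroup_path, css_for_each_descendant_pre
  from drgn.helpers.linux.list import hlist_for_each

  while True:
      # Walk every cgroup's blkcg and each of its per-device blkcg_gq entries.
      for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
          for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
              blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
              pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
              if pd.value_() == 0:
                  continue
              iolat = container_of(pd, 'struct iolatency_grp', 'pd')
              inflight = iolat.rq_wait.inflight.counter.value_()
              # Report only groups whose inflight counter is non-zero.
              if inflight:
                  print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                        f'{cgroup_path(css.cgroup).decode("utf-8")}')
      time.sleep(1)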

The monitoring output looks like the following:

  inflight=1 sda /user.slice
  inflight=1 sda /user.slice
  ...
  inflight=14 sda /user.slice
  inflight=13 sda /user.slice
  inflight=17 sda /user.slice
  inflight=15 sda /user.slice
  inflight=18 sda /user.slice
  inflight=17 sda /user.slice
  inflight=20 sda /user.slice
  inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
  inflight=19 sda /user.slice
  inflight=19 sda /user.slice

If a cgroup with a stuck inflight count ends up getting throttled, the throttled IOs will never be issued, because no completion event will arrive to wake them up. The result is an indefinite hang.
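A small model may help show why a leaked count turns into a hang. This is only a sketch under stated assumptions: ToyGroup, max_depth and wait_to_issue are hypothetical names, and the model assumes a throttled IO may proceed only once the inflight counter drops below a depth limit, with the counter dropping only when a real IO completes.

```python
from dataclasses import dataclass


@dataclass
class ToyGroup:
    inflight: int          # counter value, possibly including leaked counts
    real_outstanding: int  # IOs that are really in flight and will complete
    max_depth: int         # hypothetical throttle limit

    def may_issue(self):
        return self.inflight < self.max_depth

    def complete_one(self):
        """Deliver one completion; return False if none can ever arrive."""
        if self.real_outstanding == 0:
            return False
        self.real_outstanding -= 1
        self.inflight -= 1
        return True


def wait_to_issue(grp):
    """Return how many completions the waiter needed, or None if it hangs."""
    wakeups = 0
    while not grp.may_issue():
        if not grp.complete_one():
            return None  # no completion left to wake the waiter
        wakeups += 1
    return wakeups


# Balanced counter: 19 counted, 19 really in flight, limit 16.
print(wait_to_issue(ToyGroup(inflight=19, real_outstanding=19, max_depth=16)))  # 4
# Leaked counter: 19 counted, nothing really in flight.
print(wait_to_issue(ToyGroup(inflight=19, real_outstanding=0, max_depth=16)))   # None
```

In the balanced case the waiter proceeds after four completions. In the leaked case the counter can never fall, so the model reports that the waiter would block forever.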

This patch fixes the bug by unifying enable handling into a work item. The work item is kicked off automatically from iolatency_set_min_lat_nsec(), which is called from both the iolatency_set_limit() and iolatency_pd_offline() paths. Punting to a work item is necessary because iolatency_pd_offline() is called under spinlocks, while freezing a request_queue requires a sleepable context.
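The shape of the fix can be sketched with a toy model as well. This is not the kernel implementation: ToyQueue and its methods are hypothetical, the deque stands in for a workqueue, and "freezing" is modeled simply as draining every outstanding IO before the enabled flag is flipped.

```python
from collections import deque


class ToyQueue:
    """Toy model: enable/disable is deferred to a work item that freezes first."""

    def __init__(self):
        self.enabled = False
        self.enable_cnt = 0   # number of cgroups with a latency target set
        self.inflight = 0     # tracked counter
        self.outstanding = 0  # IOs really in flight
        self.work = deque()   # stand-in for deferred work items

    def submit(self):
        self.outstanding += 1
        if self.enabled:
            self.inflight += 1

    def complete(self):
        self.outstanding -= 1
        if self.enabled:
            self.inflight -= 1

    def set_min_lat(self, old, new):
        # May run in atomic context: only adjust the count and queue the work.
        if not old and new:
            self.enable_cnt += 1
        elif old and not new:
            self.enable_cnt -= 1
        self.work.append(self._enable_work)

    def _freeze(self):
        # Modeled as waiting until every outstanding IO has completed.
        while self.outstanding:
            self.complete()

    def _enable_work(self):
        # Runs later, in a context that is allowed to sleep.
        want = self.enable_cnt > 0
        if want != self.enabled:
            self._freeze()
            self.enabled = want

    def run_pending_work(self):
        while self.work:
            self.work.popleft()()


def demo():
    q = ToyQueue()
    q.set_min_lat(0, 100000)  # a cgroup sets a latency target
    q.run_pending_work()      # tracking is switched on
    for _ in range(19):
        q.submit()
    q.set_min_lat(100000, 0)  # cgroup goes offline with IOs still in flight
    q.run_pending_work()      # work item drains the queue, then disables
    return q.inflight, q.enabled


print(demo())  # (0, False)
```

Because the flag only changes after the queue has drained, every counted submission is matched by a counted completion and the counter returns to zero.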

This also simplifies the code, reducing the line count (comments excluded), and avoids the unnecessary freezes that previously happened whenever a cgroup's latency target was newly set or cleared.


{
  "affected": [],
  "aliases": [
    "CVE-2022-49394"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-02-26T07:01:15Z",
    "severity": "MODERATE"
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nblk-iolatency: Fix inflight count imbalances and IO hangs on offline\n\niolatency needs to track the number of inflight IOs per cgroup. As this\ntracking can be expensive, it is disabled when no cgroup has iolatency\nconfigured for the device. To ensure that the inflight counters stay\nbalanced, iolatency_set_limit() freezes the request_queue while manipulating\nthe enabled counter, which ensures that no IO is in flight and thus all\ncounters are zero.\n\nUnfortunately, iolatency_set_limit() isn\u0027t the only place where the enabled\ncounter is manipulated. iolatency_pd_offline() can also dec the counter and\ntrigger disabling. As this disabling happens without freezing the q, this\ncan easily happen while some IOs are in flight and thus leak the counts.\n\nThis can be easily demonstrated by turning on iolatency on an one empty\ncgroup while IOs are in flight in other cgroups and then removing the\ncgroup. 
Note that iolatency shouldn\u0027t have been enabled elsewhere in the\nsystem to ensure that removing the cgroup disables iolatency for the whole\ndevice.\n\nThe following keeps flipping on and off iolatency on sda:\n\n  echo +io \u003e /sys/fs/cgroup/cgroup.subtree_control\n  while true; do\n      mkdir -p /sys/fs/cgroup/test\n      echo \u00278:0 target=100000\u0027 \u003e /sys/fs/cgroup/test/io.latency\n      sleep 1\n      rmdir /sys/fs/cgroup/test\n      sleep 1\n  done\n\nand there\u0027s concurrent fio generating direct rand reads:\n\n  fio --name test --filename=/dev/sda --direct=1 --rw=randread \\\n      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k\n\nwhile monitoring with the following drgn script:\n\n  while True:\n    for css in css_for_each_descendant_pre(prog[\u0027blkcg_root\u0027].css.address_of_()):\n        for pos in hlist_for_each(container_of(css, \u0027struct blkcg\u0027, \u0027css\u0027).blkg_list):\n            blkg = container_of(pos, \u0027struct blkcg_gq\u0027, \u0027blkcg_node\u0027)\n            pd = blkg.pd[prog[\u0027blkcg_policy_iolatency\u0027].plid]\n            if pd.value_() == 0:\n                continue\n            iolat = container_of(pd, \u0027struct iolatency_grp\u0027, \u0027pd\u0027)\n            inflight = iolat.rq_wait.inflight.counter.value_()\n            if inflight:\n                print(f\u0027inflight={inflight} {disk_name(blkg.q.disk).decode(\"utf-8\")} \u0027\n                      f\u0027{cgroup_path(css.cgroup).decode(\"utf-8\")}\u0027)\n    time.sleep(1)\n\nThe monitoring output looks like the following:\n\n  inflight=1 sda /user.slice\n  inflight=1 sda /user.slice\n  ...\n  inflight=14 sda /user.slice\n  inflight=13 sda /user.slice\n  inflight=17 sda /user.slice\n  inflight=15 sda /user.slice\n  inflight=18 sda /user.slice\n  inflight=17 sda /user.slice\n  inflight=20 sda /user.slice\n  inflight=19 sda /user.slice \u003c- fio stopped, inflight stuck at 19\n  inflight=19 sda /user.slice\n  
inflight=19 sda /user.slice\n\nIf a cgroup with stuck inflight ends up getting throttled, the throttled IOs\nwill never get issued as there\u0027s no completion event to wake it up leading\nto an indefinite hang.\n\nThis patch fixes the bug by unifying enable handling into a work item which\nis automatically kicked off from iolatency_set_min_lat_nsec() which is\ncalled from both iolatency_set_limit() and iolatency_pd_offline() paths.\nPunting to a work item is necessary as iolatency_pd_offline() is called\nunder spinlocks while freezing a request_queue requires a sleepable context.\n\nThis also simplifies the code reducing LOC sans the comments and avoids the\nunnecessary freezes which were happening whenever a cgroup\u0027s latency target\nis newly set or cleared.",
  "id": "GHSA-896c-jr9v-xc8g",
  "modified": "2025-10-21T12:31:25Z",
  "published": "2025-10-21T12:31:25Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2022-49394"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/515d077ee3085ae343b6bea7fd031f9906645f38"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/5b0ff3ebbef791341695b718f8d2870869cf1d01"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/77692c02e1517c54f2fd0535f41aa4286ac9f140"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/8a177a36da6c54c98b8685d4f914cb3637d53c0d"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/968f7a239c590454ffba79c126fbe0e963a0ba78"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/a30acbb5dfb7bcc813ad6a18ca31011ac44e5547"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/d19fa8f252000d141f9199ca32959c50314e1f05"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}
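The CVSS vector in the record above can be turned into a base score with a short calculation. The following is a minimal sketch of the CVSS v3.1 base score formula that handles only Scope: Unchanged vectors such as this one; the function names are illustrative, and the metric weights are the ones published in the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    prefix, *parts = vector.split("/")
    metrics = dict(part.split(":") for part in parts)
    if prefix != "CVSS:3.1" or metrics.get("S") != "U":
        raise ValueError("this sketch only handles CVSS:3.1 vectors with S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

A score of 5.5 falls in the Medium band (4.0 to 6.9) of the CVSS v3.1 rating scale, which is consistent with the MODERATE severity in the record. The only impact component is availability, matching the IO hang described in the details.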

