CVE-2026-23157 (GCVE-0-2026-23157)
Vulnerability from cvelistv5 – Published: 2026-02-14 16:01 – Updated: 2026-02-14 16:01
Title
btrfs: do not strictly require dirty metadata threshold for metadata writepages
Summary
In the Linux kernel, the following vulnerability has been resolved:
btrfs: do not strictly require dirty metadata threshold for metadata writepages
[BUG]
There is an internal report that over 1000 processes are waiting at
the io_schedule_timeout() of balance_dirty_pages(), causing a system
hang and triggering a kernel coredump.
The reported kernel is based on v6.4, but the root problem still
applies to any upstream kernel before v6.18.
[CAUSE]
First, from Jan Kara's analysis of the dirty page balancing behavior:
The cgroup dirty limit was what was actually playing a role here,
because the cgroup had only a small amount of memory, so its dirty
limit was something like 16MB.
Dirty throttling is responsible for enforcing that nobody dirties
(significantly) more memory than the dirty limit allows. Thus when a
task is dirtying pages it periodically enters balance_dirty_pages()
and we let it sleep there to slow down the dirtying.
When the system is already over the dirty limit (either globally or
within the cgroup of the running task), we will not let the task exit
from balance_dirty_pages() until the number of dirty pages drops below
the limit.
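To make that loop concrete, here is a minimal sketch of the idea (not
the actual mm/page-writeback.c implementation, which computes per-wb
and per-cgroup limits, pause durations and ratelimits; the two
accounting helpers are hypothetical stand-ins):

```c
/* Hypothetical accounting helpers, standing in for the real per-wb
 * dirty/limit calculations in mm/page-writeback.c. */
unsigned long wb_dirty_pages(struct bdi_writeback *wb);
unsigned long wb_dirty_limit(struct bdi_writeback *wb);

/*
 * Minimal sketch of the dirty-throttling loop described above. It only
 * shows why a task cannot leave balance_dirty_pages() while its cgroup
 * stays over the dirty limit.
 */
static void balance_dirty_pages_sketch(struct bdi_writeback *wb)
{
	while (wb_dirty_pages(wb) > wb_dirty_limit(wb)) {
		/*
		 * Over the limit: sleep and re-check. If nothing ever
		 * writes the dirty pages back, the task sleeps here
		 * indefinitely -- the hang seen in the report.
		 */
		io_schedule_timeout(HZ / 10);
	}
}
```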
So in this particular case, as already mentioned, there was a cgroup
with a relatively small amount of memory and, as a result, a dirty
limit of about 16MB. A task from that cgroup had dirtied about 28MB
worth of pages in the btrfs btree inode, and these were practically
the only dirty pages in that cgroup.
That means the only way to reduce the dirty pages of that cgroup is
to write back the dirty pages of the btrfs btree inode; only after
that can those processes exit balance_dirty_pages().
Now back to the btrfs part: btree_writepages() is responsible for
writing back dirty btree inode pages.
The problem is that btrfs has an internal threshold: if the btree
inode's dirty bytes are below 32MiB, btree_writepages() will not do
any writeback.
This behavior exists to batch as much metadata as possible, so we
won't write back tree blocks and then later re-COW them again for
another modification.
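For reference, the pre-fix check in fs/btrfs/disk-io.c looks roughly
like this (paraphrased, not quoted from a specific version; helper
names vary between kernels):

```c
/* Paraphrase of the pre-fix btree_writepages() in fs/btrfs/disk-io.c. */
static int btree_writepages(struct address_space *mapping,
			    struct writeback_control *wbc)
{
	struct btrfs_fs_info *fs_info;
	int ret;

	if (wbc->sync_mode == WB_SYNC_NONE) {
		if (wbc->for_kupdate)
			return 0;

		fs_info = inode_to_fs_info(mapping->host);
		/*
		 * Skip background writeback while dirty metadata stays
		 * below BTRFS_DIRTY_METADATA_THRESH (32MiB) to batch tree
		 * blocks and avoid re-COWing them. This is the check that
		 * starves the over-limit cgroup in the scenario above.
		 */
		ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes,
					       BTRFS_DIRTY_METADATA_THRESH,
					       fs_info->dirty_metadata_batch);
		if (ret < 0)
			return 0;
	}
	return btree_write_cache_pages(mapping, wbc);
}
```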
This internal 32MiB threshold is higher than the existing amount of
dirty pages (28MiB), meaning no writeback will happen, causing a
deadlock between btrfs and cgroup:
- Btrfs doesn't want to write back the btree inode until there are
  more dirty pages
- Cgroup/MM doesn't want more dirty pages for the btrfs btree inode
Thus any process touching that btree inode is put to sleep until the
number of dirty pages is reduced.
Many thanks to Jan Kara for the analysis of the root cause.
[ENHANCEMENT]
Since kernel commit b55102826d7d ("btrfs: set AS_KERNEL_FILE on the
btree_inode"), btrfs btree inode pages are only charged to the root
cgroup, which should have a much larger limit than btrfs' 32MiB
threshold, so the problem should not affect newer kernels.
But all current LTS kernels are affected by this problem, and
backporting the whole AS_KERNEL_FILE change may not be a good idea.
Even for newer kernels I still think it's a good idea to get rid of
the internal threshold in btree_writepages(), since in most cases
cgroup/MM has a better view of full system memory usage than btrfs'
fixed threshold.
Internal callers go through btrfs_btree_balance_dirty(), which
already performs the threshold check, so we don't need to bother
them.
But for external callers of btree_writepages(), just respect their
requests and write back whatever they ask for, ignoring the internal
btrfs threshold, to avoid such a deadlock on btree inode dirty page
balancing.
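Put together, the shape of the fix described above is roughly the
following (a sketch based on this description, not the verbatim
upstream patch; the 32MiB batching check is assumed to survive only in
btrfs_btree_balance_dirty()):

```c
/*
 * Sketch of the post-fix behavior: honor external writeback requests
 * unconditionally so balance_dirty_pages() can always make progress;
 * the 32MiB batching threshold is kept only for internal callers via
 * btrfs_btree_balance_dirty().
 */
static int btree_writepages(struct address_space *mapping,
			    struct writeback_control *wbc)
{
	/* Still skip periodic kupdate-style background writeback. */
	if (wbc->sync_mode == WB_SYNC_NONE && wbc->for_kupdate)
		return 0;

	/*
	 * No dirty-metadata threshold here any more: if MM/cgroup asks
	 * for writeback of the btree inode, do it.
	 */
	return btree_write_cache_pages(mapping, wbc);
}
```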
Severity
No CVSS data available.
Assigner
Linux (kernel.org CNA)
References
- https://git.kernel.org/stable/c/629666d20c7dcd740e193ec0631fdff035b1f7d6
- https://git.kernel.org/stable/c/4e159150a9a56d66d247f4b5510bed46fe58aa1c
Impacted products
{
"containers": {
"cna": {
"affected": [
{
"defaultStatus": "unaffected",
"product": "Linux",
"programFiles": [
"fs/btrfs/disk-io.c",
"fs/btrfs/extent_io.c",
"fs/btrfs/extent_io.h"
],
"repo": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git",
"vendor": "Linux",
"versions": [
{
"lessThan": "629666d20c7dcd740e193ec0631fdff035b1f7d6",
"status": "affected",
"version": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
"versionType": "git"
},
{
"lessThan": "4e159150a9a56d66d247f4b5510bed46fe58aa1c",
"status": "affected",
"version": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
"versionType": "git"
}
]
},
{
"defaultStatus": "affected",
"product": "Linux",
"programFiles": [
"fs/btrfs/disk-io.c",
"fs/btrfs/extent_io.c",
"fs/btrfs/extent_io.h"
],
"repo": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git",
"vendor": "Linux",
"versions": [
{
"lessThanOrEqual": "6.18.*",
"status": "unaffected",
"version": "6.18.9",
"versionType": "semver"
},
{
"lessThanOrEqual": "*",
"status": "unaffected",
"version": "6.19",
"versionType": "original_commit_for_fix"
}
]
}
],
"cpeApplicability": [
{
"nodes": [
{
"cpeMatch": [
{
"criteria": "cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*",
"versionEndExcluding": "6.18.9",
"vulnerable": true
},
{
"criteria": "cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*",
"versionEndExcluding": "6.19",
"vulnerable": true
}
],
"negate": false,
"operator": "OR"
}
]
}
],
"descriptions": [
{
"lang": "en",
"value": "In the Linux kernel, the following vulnerability has been resolved:\n\nbtrfs: do not strictly require dirty metadata threshold for metadata writepages\n\n[BUG]\nThere is an internal report that over 1000 processes are\nwaiting at the io_schedule_timeout() of balance_dirty_pages(), causing\na system hang and trigger a kernel coredump.\n\nThe kernel is v6.4 kernel based, but the root problem still applies to\nany upstream kernel before v6.18.\n\n[CAUSE]\nFrom Jan Kara for his wisdom on the dirty page balance behavior first.\n\n This cgroup dirty limit was what was actually playing the role here\n because the cgroup had only a small amount of memory and so the dirty\n limit for it was something like 16MB.\n\n Dirty throttling is responsible for enforcing that nobody can dirty\n (significantly) more dirty memory than there\u0027s dirty limit. Thus when\n a task is dirtying pages it periodically enters into balance_dirty_pages()\n and we let it sleep there to slow down the dirtying.\n\n When the system is over dirty limit already (either globally or within\n a cgroup of the running task), we will not let the task exit from\n balance_dirty_pages() until the number of dirty pages drops below the\n limit.\n\n So in this particular case, as I already mentioned, there was a cgroup\n with relatively small amount of memory and as a result with dirty limit\n set at 16MB. A task from that cgroup has dirtied about 28MB worth of\n pages in btrfs btree inode and these were practically the only dirty\n pages in that cgroup.\n\nSo that means the only way to reduce the dirty pages of that cgroup is\nto writeback the dirty pages of btrfs btree inode, and only after that\nthose processes can exit balance_dirty_pages().\n\nNow back to the btrfs part, btree_writepages() is responsible for\nwriting back dirty btree inode pages.\n\nThe problem here is, there is a btrfs internal threshold that if the\nbtree inode\u0027s dirty bytes are below the 32M threshold, it will not\ndo any writeback.\n\nThis behavior is to batch as much metadata as possible so we won\u0027t write\nback those tree blocks and then later re-COW them again for another\nmodification.\n\nThis internal 32MiB is higher than the existing dirty page size (28MiB),\nmeaning no writeback will happen, causing a deadlock between btrfs and\ncgroup:\n\n- Btrfs doesn\u0027t want to write back btree inode until more dirty pages\n\n- Cgroup/MM doesn\u0027t want more dirty pages for btrfs btree inode\n Thus any process touching that btree inode is put into sleep until\n the number of dirty pages is reduced.\n\nThanks Jan Kara a lot for the analysis of the root cause.\n\n[ENHANCEMENT]\nSince kernel commit b55102826d7d (\"btrfs: set AS_KERNEL_FILE on the\nbtree_inode\"), btrfs btree inode pages will only be charged to the root\ncgroup which should have a much larger limit than btrfs\u0027 32MiB\nthreshold.\nSo it should not affect newer kernels.\n\nBut for all current LTS kernels, they are all affected by this problem,\nand backporting the whole AS_KERNEL_FILE may not be a good idea.\n\nEven for newer kernels I still think it\u0027s a good idea to get\nrid of the internal threshold at btree_writepages(), since for most cases\ncgroup/MM has a better view of full system memory usage than btrfs\u0027 fixed\nthreshold.\n\nFor internal callers using btrfs_btree_balance_dirty() since that\nfunction is already doing internal threshold check, we don\u0027t need to\nbother them.\n\nBut for external callers of btree_writepages(), just respect their\nrequests and 
write back whatever they want, ignoring the internal\nbtrfs threshold to avoid such deadlock on btree inode dirty page\nbalancing."
}
],
"providerMetadata": {
"dateUpdated": "2026-02-14T16:01:23.874Z",
"orgId": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"shortName": "Linux"
},
"references": [
{
"url": "https://git.kernel.org/stable/c/629666d20c7dcd740e193ec0631fdff035b1f7d6"
},
{
"url": "https://git.kernel.org/stable/c/4e159150a9a56d66d247f4b5510bed46fe58aa1c"
}
],
"title": "btrfs: do not strictly require dirty metadata threshold for metadata writepages",
"x_generator": {
"engine": "bippy-1.2.0"
}
}
},
"cveMetadata": {
"assignerOrgId": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"assignerShortName": "Linux",
"cveId": "CVE-2026-23157",
"datePublished": "2026-02-14T16:01:23.874Z",
"dateReserved": "2026-01-13T15:37:45.978Z",
"dateUpdated": "2026-02-14T16:01:23.874Z",
"state": "PUBLISHED"
},
"dataType": "CVE_RECORD",
"dataVersion": "5.2",
"vulnerability-lookup:meta": {
"nvd": "{\"cve\":{\"id\":\"CVE-2026-23157\",\"sourceIdentifier\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\",\"published\":\"2026-02-14T16:15:55.863\",\"lastModified\":\"2026-02-14T16:15:55.863\",\"vulnStatus\":\"Received\",\"cveTags\":[],\"descriptions\":[{\"lang\":\"en\",\"value\":\"In the Linux kernel, the following vulnerability has been resolved:\\n\\nbtrfs: do not strictly require dirty metadata threshold for metadata writepages\\n\\n[BUG]\\nThere is an internal report that over 1000 processes are\\nwaiting at the io_schedule_timeout() of balance_dirty_pages(), causing\\na system hang and trigger a kernel coredump.\\n\\nThe kernel is v6.4 kernel based, but the root problem still applies to\\nany upstream kernel before v6.18.\\n\\n[CAUSE]\\nFrom Jan Kara for his wisdom on the dirty page balance behavior first.\\n\\n This cgroup dirty limit was what was actually playing the role here\\n because the cgroup had only a small amount of memory and so the dirty\\n limit for it was something like 16MB.\\n\\n Dirty throttling is responsible for enforcing that nobody can dirty\\n (significantly) more dirty memory than there\u0027s dirty limit. Thus when\\n a task is dirtying pages it periodically enters into balance_dirty_pages()\\n and we let it sleep there to slow down the dirtying.\\n\\n When the system is over dirty limit already (either globally or within\\n a cgroup of the running task), we will not let the task exit from\\n balance_dirty_pages() until the number of dirty pages drops below the\\n limit.\\n\\n So in this particular case, as I already mentioned, there was a cgroup\\n with relatively small amount of memory and as a result with dirty limit\\n set at 16MB. A task from that cgroup has dirtied about 28MB worth of\\n pages in btrfs btree inode and these were practically the only dirty\\n pages in that cgroup.\\n\\nSo that means the only way to reduce the dirty pages of that cgroup is\\nto writeback the dirty pages of btrfs btree inode, and only after that\\nthose processes can exit balance_dirty_pages().\\n\\nNow back to the btrfs part, btree_writepages() is responsible for\\nwriting back dirty btree inode pages.\\n\\nThe problem here is, there is a btrfs internal threshold that if the\\nbtree inode\u0027s dirty bytes are below the 32M threshold, it will not\\ndo any writeback.\\n\\nThis behavior is to batch as much metadata as possible so we won\u0027t write\\nback those tree blocks and then later re-COW them again for another\\nmodification.\\n\\nThis internal 32MiB is higher than the existing dirty page size (28MiB),\\nmeaning no writeback will happen, causing a deadlock between btrfs and\\ncgroup:\\n\\n- Btrfs doesn\u0027t want to write back btree inode until more dirty pages\\n\\n- Cgroup/MM doesn\u0027t want more dirty pages for btrfs btree inode\\n Thus any process touching that btree inode is put into sleep until\\n the number of dirty pages is reduced.\\n\\nThanks Jan Kara a lot for the analysis of the root cause.\\n\\n[ENHANCEMENT]\\nSince kernel commit b55102826d7d (\\\"btrfs: set AS_KERNEL_FILE on the\\nbtree_inode\\\"), btrfs btree inode pages will only be charged to the root\\ncgroup which should have a much larger limit than btrfs\u0027 32MiB\\nthreshold.\\nSo it should not affect newer kernels.\\n\\nBut for all current LTS kernels, they are all affected by this problem,\\nand backporting the whole AS_KERNEL_FILE may not be a good idea.\\n\\nEven for newer kernels I still think it\u0027s a good idea to get\\nrid of the internal threshold at btree_writepages(), since 
for most cases\\ncgroup/MM has a better view of full system memory usage than btrfs\u0027 fixed\\nthreshold.\\n\\nFor internal callers using btrfs_btree_balance_dirty() since that\\nfunction is already doing internal threshold check, we don\u0027t need to\\nbother them.\\n\\nBut for external callers of btree_writepages(), just respect their\\nrequests and write back whatever they want, ignoring the internal\\nbtrfs threshold to avoid such deadlock on btree inode dirty page\\nbalancing.\"}],\"metrics\":{},\"references\":[{\"url\":\"https://git.kernel.org/stable/c/4e159150a9a56d66d247f4b5510bed46fe58aa1c\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"},{\"url\":\"https://git.kernel.org/stable/c/629666d20c7dcd740e193ec0631fdff035b1f7d6\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"}]}}"
}
}