FKIE_CVE-2025-40303

Vulnerability from fkie_nvd - Published: 2025-12-08 01:16 - Updated: 2025-12-08 18:26
Severity ?
Summary
In the Linux kernel, the following vulnerability has been resolved: btrfs: ensure no dirty metadata is written back for an fs with errors [BUG] During development of a minor feature (make sure all btrfs_bio::end_io() is called in task context), I noticed a crash in generic/388, where metadata writes triggered new works after btrfs_stop_all_workers(). It turns out that it can even happen without any code modification, just using RAID5 for metadata and the same workload from generic/388 is going to trigger the use-after-free. [CAUSE] If btrfs hits an error, the fs is marked as error, no new transaction is allowed thus metadata is in a frozen state. But there are some metadata modifications before that error, and they are still in the btree inode page cache. Since there will be no real transaction commit, all those dirty folios are just kept as is in the page cache, and they can not be invalidated by invalidate_inode_pages2() call inside close_ctree(), because they are dirty. And finally after btrfs_stop_all_workers(), we call iput() on btree inode, which triggers writeback of those dirty metadata. And if the fs is using RAID56 metadata, this will trigger RMW and queue new works into rmw_workers, which is already stopped, causing warning from queue_work() and use-after-free. [FIX] Add a special handling for write_one_eb(), that if the fs is already in an error state, immediately mark the bbio as failure, instead of really submitting them. Then during close_ctree(), iput() will just discard all those dirty tree blocks without really writing them back, thus no more new jobs for already stopped-and-freed workqueues. The extra discard in write_one_eb() also acts as an extra safenet. E.g. the transaction abort is triggered by some extent/free space tree corruptions, and since extent/free space tree is already corrupted some tree blocks may be allocated where they shouldn't be (overwriting existing tree blocks). In that case writing them back will further corrupting the fs.
Impacted products
Vendor Product Version

{
  "cveTags": [],
  "descriptions": [
    {
      "lang": "en",
      "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nbtrfs: ensure no dirty metadata is written back for an fs with errors\n\n[BUG]\nDuring development of a minor feature (make sure all btrfs_bio::end_io()\nis called in task context), I noticed a crash in generic/388, where\nmetadata writes triggered new works after btrfs_stop_all_workers().\n\nIt turns out that it can even happen without any code modification, just\nusing RAID5 for metadata and the same workload from generic/388 is going\nto trigger the use-after-free.\n\n[CAUSE]\nIf btrfs hits an error, the fs is marked as error, no new\ntransaction is allowed thus metadata is in a frozen state.\n\nBut there are some metadata modifications before that error, and they are\nstill in the btree inode page cache.\n\nSince there will be no real transaction commit, all those dirty folios\nare just kept as is in the page cache, and they can not be invalidated\nby invalidate_inode_pages2() call inside close_ctree(), because they are\ndirty.\n\nAnd finally after btrfs_stop_all_workers(), we call iput() on btree\ninode, which triggers writeback of those dirty metadata.\n\nAnd if the fs is using RAID56 metadata, this will trigger RMW and queue\nnew works into rmw_workers, which is already stopped, causing warning\nfrom queue_work() and use-after-free.\n\n[FIX]\nAdd a special handling for write_one_eb(), that if the fs is already in\nan error state, immediately mark the bbio as failure, instead of really\nsubmitting them.\n\nThen during close_ctree(), iput() will just discard all those dirty\ntree blocks without really writing them back, thus no more new jobs for\nalready stopped-and-freed workqueues.\n\nThe extra discard in write_one_eb() also acts as an extra safenet.\nE.g. the transaction abort is triggered by some extent/free space\ntree corruptions, and since extent/free space tree is already corrupted\nsome tree blocks may be allocated where they shouldn\u0027t be (overwriting\nexisting tree blocks). In that case writing them back will further\ncorrupting the fs."
    }
  ],
  "id": "CVE-2025-40303",
  "lastModified": "2025-12-08T18:26:49.133",
  "metrics": {},
  "published": "2025-12-08T01:16:02.440",
  "references": [
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/066ee13f05fbd82ada01883e51f0695172f98dff"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/2618849f31e7cf51fadd4a5242458501a6d5b315"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/54a5b5a15588e3b0b294df31474d08a2678d4291"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/e2b3859067bf012d53c49b3f885fef40624a2c83"
    }
  ],
  "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
  "vulnStatus": "Awaiting Analysis"
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or observed by the user.
  • Confirmed: The vulnerability has been validated from an analyst's perspective.
  • Published Proof of Concept: A public proof of concept is available for this vulnerability.
  • Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
  • Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
  • Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
  • Not confirmed: The user expressed doubt about the validity of the vulnerability.
  • Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.


Loading…

Detection rules are retrieved from Rulezet.

Loading…

Loading…