<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="/static/style.xsl" type="text/xsl"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>Most recent sightings.</title>
    <link>https://vulnerability.circl.lu</link>
    <description>Contains only the 10 most recent sightings.</description>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <generator>python-feedgen</generator>
    <language>en</language>
    <lastBuildDate>Wed, 24 Jun 2026 09:34:09 +0000</lastBuildDate>
    <item>
      <title>e677ffcf-bc9f-42b2-95c9-9cba5b49dbdc</title>
      <link>https://vulnerability.circl.lu/sighting/e677ffcf-bc9f-42b2-95c9-9cba5b49dbdc/export</link>
      <description>{"uuid": "e677ffcf-bc9f-42b2-95c9-9cba5b49dbdc", "vulnerability_lookup_origin": "1a89b78e-f703-45f3-bb86-59eb712668bd", "author": "9f56dd64-161d-43a6-b9c3-555944290a09", "vulnerability": "CVE-2026-24887", "type": "seen", "source": "https://gist.github.com/NikosRig/b4330ceb780fe22bf3c14f38d7d90795", "content": "# When is an AI agent's approval prompt a security boundary?\n\nI reported three approval-bypass findings to an open-source AI agent. Between\nthe day I submitted and the day they replied, the project rewrote its security\npolicy \u2014 in a way that reclassified my findings out of scope \u2014 and then closed\nthem citing the new text. This is a writeup of what happened and the genuine\nquestion underneath it, because I don't think the answer is obvious and I think\nthe industry hasn't settled it.\n\nI'll start by conceding the other side, because it's strong.\n\n## The vendor is not wrong about the hard part\n\nThe project is Hermes Agent (Nous Research). Like most agents with shell access,\nit screens commands against a denylist and prompts the operator before running\nanything that looks destructive. Their current position is that this gate is an\n*in-process heuristic, not a security boundary* \u2014 that shell is Turing-complete,\na denylist over shell strings is structurally incomplete, and the real boundary\nfor adversarial input is OS-level isolation (run it in a container).\n\nThat is correct. You cannot regex your way to a complete boundary over shell,\nand \"run untrusted workloads in a sandbox\" is the right posture. I'm not\ndisputing any of that, and any framing of this story that ignores it is unfair.\n\n## The three findings (mechanism only \u2014 two are still live)\n\n1. **Smart-approval prompt injection.** In the optional \"smart\" mode, a second\n   LLM judges flagged commands. 
The untrusted command was interpolated into the\n   reviewer's prompt with no separation between data and instructions, and the\n   verdict was parsed with a loose substring match. Injected text could talk the\n   reviewer into approving.\n\n2. **Startup-hook code execution.** Any `.py` file in the agent's hooks\n   directory is executed at gateway startup \u2014 no registration, no hash, no\n   signature. A prompt-injected model can write that file via a normal tool call\n   that triggers no approval, yielding code execution on the next restart.\n\n3. **Approval-gate parsing bypass.** The detector matches regex against the raw\n   command string, not parsed shell tokens. Equivalent rewrites \u2014 quoted command\n   names, variable indirection, alternate shell binaries, octal `chmod` prefixes,\n   versioned interpreter names \u2014 run the same dangerous action and bypass the\n   prompt entirely.\n\nI retested all three against the current release in a clean Docker build before\nwriting this. Finding #1 was meaningfully hardened in June (the live bypass rate\ndropped from 6/8 to 1/8). **Findings #2 and #3 still reproduce on the current\nversion.** I'm deliberately not publishing weaponized exploits for the two that\nare still live.\n\nThese matter for one specific, common deployment: the default local backend,\nexposed to untrusted input (a messaging gateway, web content, MCP output),\nwithout a sandbox. In that configuration the prompt is the thing the operator is\ncounting on, and it can be skipped.\n\n## The part I think is worth discussing\n\nTwo things, both verifiable.\n\n**The timeline.** The version of SECURITY.md live the day I reported called the\napproval system \"a core security boundary\" and explicitly placed in scope\n\"prompt injection ... 
that results in a concrete bypass of the approval system.\"\nSix days later the policy was rewritten (\"rewrite policy around OS-level\nisolation as the boundary\"); the approval gate became a non-boundary heuristic\nand the clause that put my findings in scope was removed. My reports were then\nclosed as out of scope, citing the new sections \u2014 without acknowledging that the\npolicy had changed since submission. The commits are public:\n`401aadb5b`, `0d1cbc2dd`.\n\nI don't claim malice. The original policy was two weeks old and may have\nover-claimed; the rewrite reads like a genuine clarification. But the procedure\n\u2014 change the scope while reports are open, close under the new text, don't flag\nthe change \u2014 is the part that sits wrong with me, independent of whether the new\nthreat model is right.\n\n**The industry inconsistency.** Finding #3 is in the same class as CVE-2026-24887\nin Claude Code: \"an error in command parsing\" that lets untrusted input \"bypass\nthe confirmation prompt.\" Anthropic rated it **8.8 HIGH** and shipped a fix in\n2.0.72. Anthropic also recommends sandboxing Claude Code \u2014 the same posture\nHermes invokes \u2014 and still treated a confirmation-prompt bypass as a real,\nhigh-severity vulnerability. \"The sandbox is the real boundary\" and \"a prompt\nbypass is a vulnerability\" are evidently not mutually exclusive; a direct peer\nholds both. And Hermes themselves shipped a `fix(security)` commit for finding\n#1 \u2014 the very class they'd closed as out of scope.\n\n## The actual question\n\nTwo serious projects looked at the same class of bug and reached opposite\nconclusions about whether it's a vulnerability at all \u2014 and the line between\nthem is drawn in policy, not in code. 
As we hand agents real shell access, \"is\nthe human-confirmation step a security control or a convenience?\" stops being\nphilosophical: it decides whether bypasses get fixed, get CVEs, or get closed.\nI think it's a control whose bypass matters in the default deployment. Reasonable\npeople disagree. I'd like to hear how others draw the line.\n\nEverything above is verifiable from public git history and public vulnerability\ndatabases. I'm happy to answer questions.\n", "creation_timestamp": "2026-06-23T21:00:12.000000Z"}</description>
      <guid isPermaLink="false">https://vulnerability.circl.lu/sighting/e677ffcf-bc9f-42b2-95c9-9cba5b49dbdc/export</guid>
      <pubDate>Tue, 23 Jun 2026 21:00:12 +0000</pubDate>
    </item>
  </channel>
</rss>
