{"uuid": "5ffe9e99-bbac-4707-8c46-0b13024e4f9d", "vulnerability_lookup_origin": "1a89b78e-f703-45f3-bb86-59eb712668bd", "author": "9f56dd64-161d-43a6-b9c3-555944290a09", "vulnerability": "CVE-2020-0601", "type": "seen", "source": "https://gist.github.com/secdev02/ac6bc3af4d025d816247d10d62183f4e", "content": "# WireGuard ECC & Encryption Deep Audit\n**Scope:** `crypto/zinc/curve25519/`, `crypto/zinc/poly1305/`, `crypto/zinc/chacha20poly1305.c`, `noise.c`  \n**Focus:** Curve parameter injection (CurveBall class), field arithmetic, key validation, AEAD correctness  \n**Date:** 2026-06-23\n\n---\n\n## Executive Summary\n\n| # | File | Finding | Severity |\n|---|------|---------|----------|\n| ECC-1 | curve25519.c | CurveBall class \u2014 NOT applicable (positive finding) | N/A |\n| ECC-2 | curve25519.c | Torsion/low-order input points \u2014 caught by output check | Informational |\n| ECC-3 | noise.c | Peer public key stored without upfront validation | Low |\n| ECC-4 | chacha20poly1305.c | In-place decryption before MAC verification (deviates from RFC 8439/RFC 5116 verify-before-release) | Low/Medium |\n| ECC-5 | chacha20poly1305.c | sg_inplace MAC tag pointer: ssize_t + size_t mixing | Low |\n| ECC-6 | curve25519-hacl64.c | fdifference adds 8p: correct but undocumented | Informational |\n| ECC-7 | noise.c | static_private dead variable with memzero_explicit | Informational |\n| ECC-8 | curve25519-hacl64.c | format_fcontract_trim single-pass reduction | Informational (correct) |\n| ECC-9 | poly1305-donna64.c | Poly1305 r-clamping verified correct | Informational (correct) |\n\n**No critical bugs found.** The Curve25519 field arithmetic comes from formally verified implementations (HACL* in `curve25519-hacl64.c`, fiat-crypto in `curve25519-fiat32.c`). The most significant finding is the in-place decrypt-before-MAC pattern (ECC-4), which is architecturally non-standard but safely contained within WireGuard's queue model. 
The CurveBall attack class is architecturally impossible against this codebase.\n\n---\n\n## Finding ECC-1: CurveBall (CVE-2020-0601) Class \u2014 Not Applicable\n\n**Verdict:** Immune by design.\n\nCVE-2020-0601 exploited Windows CryptoAPI accepting certificates whose explicit ECC parameters specified an attacker-chosen generator for a named curve. Given a trusted CA public key `Q`, an attacker could choose a generator `G'` and a private value `d'` with `Q = d' * G'` (the simplest choice is `G' = Q`, `d' = 1`) and then sign certificates that validated as if issued under `Q`. The attack requires that curve parameters, specifically the base point, be configurable or network-supplied.\n\n**WireGuard's architecture eliminates this class entirely.**\n\nThe basepoint for key generation is a compile-time constant in `curve25519.c`:\n\n```c\nbool curve25519_generate_public(u8 pub[CURVE25519_KEY_SIZE],\n                                const u8 secret[CURVE25519_KEY_SIZE])\n{\n    static const u8 basepoint[CURVE25519_KEY_SIZE] __aligned(32) = { 9 };\n    ...\n}\n```\n\nThis is the standard Curve25519 base point `u = 9` (RFC 7748 section 4.1, used by X25519 in section 6.1). It cannot be changed at runtime.\n\nThe curve arithmetic itself (`curve25519-hacl64.c`, `curve25519-fiat32.c`) has all parameters embedded as numeric literals in the field operations. The prime `p = 2^255 - 19` and the ladder constant appear as:\n\n- `0x7ffffffffffedLLU` \u2014 limb 0 of p in 51-bit representation (verified: `p & (2^51-1) = 2^51-19`)\n- `0x7ffffffffffffLLU` \u2014 limbs 1 through 4 of p (all equal to `2^51-1`)\n- The Montgomery ladder constant `scalar = 121665` \u2014 this is `(A-2)/4` where `A = 486662`, the Bernstein optimization for Curve25519 doubling\n\nNone of these are read from configuration, netlink attributes, or incoming packets. There is no surface for curve parameter injection.\n\n---\n\n## Finding ECC-2: Torsion Point Inputs (Small Subgroup Attack)\n\n**Severity:** Informational (correctly mitigated)\n\nCurve25519 has cofactor 8, so its subgroup of small-order points (orders 1, 2, 4 and 8) has 8 elements. 
An attacker could submit a low-order u-coordinate (or a non-canonical encoding of one) as their peer public key:\n\n```\nu = 0\nu = 1\nu = 325606250916557431795983626356110631294008115727848805560023387167927233504\nu = 39382357235489614581723060781553021112529911719440698176882885853963445705823\nu = p-1, p, ...\n```\n\nMultiplying any of these by a clamped scalar (a multiple of 8) always yields the identity point, which X25519 encodes as the all-zero u-coordinate.\n\n**WireGuard's defence \u2014 the null-point return check in `curve25519.c`:**\n\n```c\nbool curve25519(u8 mypublic[CURVE25519_KEY_SIZE],\n                const u8 secret[CURVE25519_KEY_SIZE],\n                const u8 basepoint[CURVE25519_KEY_SIZE])\n{\n    if (!curve25519_arch(mypublic, secret, basepoint))\n        curve25519_generic(mypublic, secret, basepoint);\n    return crypto_memneq(mypublic, null_point, CURVE25519_KEY_SIZE);\n}\n```\n\n`curve25519()` returns false if the output is all-zero. `mix_dh()` in `noise.c` propagates this correctly:\n\n```c\nstatic bool __must_check mix_dh(...)\n{\n    if (unlikely(!curve25519(dh_calculation, private, public)))\n        return false;  // handshake aborted\n    ...\n}\n```\n\nThe check is on the output, not the input. This matches RFC 7748 section 6.1, which says both parties MAY check, without leaking extra information about the shared secret, whether it is the all-zero value and abort if so. An output check covers every low-order input, including non-canonical encodings such as `u = p`, without maintaining a separate list of rejected input points.\n\n**Clamping provides structural defence:** `curve25519_clamp_secret` clears the 3 low bits of the scalar (ensuring a multiple of 8), clears bit 255 and sets bit 254. Since any torsion-group point has order dividing 8, `clamp(s) * torsion_point = 8k * torsion_point = identity`. 
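
A minimal userspace sketch of the two defences described above, assuming plain C99 rather than kernel code. `clamp_scalar` (a hypothetical name) applies the RFC 7748 decodeScalar25519 bit operations that `curve25519_clamp_secret` is described as performing, and `is_all_zero_ct` (also hypothetical) stands in for the `crypto_memneq(mypublic, null_point, ...)` output check. Neither is a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RFC 7748 scalar clamping on a 32-byte
 * little-endian X25519 private key. */
static void clamp_scalar(uint8_t k[32])
{
	k[0] &= 248;  /* clear bits 0..2: the scalar becomes a multiple of 8 */
	k[31] &= 127; /* clear bit 255 */
	k[31] |= 64;  /* set bit 254 */
}

/* Hypothetical helper: returns 1 if buf is all zero, 0 otherwise.
 * Every byte is ORed into the accumulator, so the loop never exits
 * early and its running time does not depend on the data. */
static int is_all_zero_ct(const uint8_t *buf, size_t len)
{
	uint8_t acc = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < len; i++)
		acc |= buf[i];
	/* acc is in 0..255: (acc - 1) >> 8 has bit 0 set only when acc == 0 */
	return (int)((((unsigned int)acc - 1u) >> 8) & 1u);
}
```

Running a real X25519 implementation on any of the low-order u-coordinates listed above with a clamped scalar yields 32 zero bytes; `is_all_zero_ct` then returns 1, and a caller would abort the handshake the way `mix_dh()` does when `curve25519()` returns false. The accumulate-then-test loop replaces an early-exit comparison so that timing does not reveal where the first nonzero byte sits.
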
The null-point check catches the result.\n\n---\n\n## Finding ECC-3: Peer Public Key Stored Without Immediate Validation\n\n**Severity:** Low\n\nIn `noise.c`, `wg_noise_handshake_init()`:\n\n```c\nvoid wg_noise_handshake_init(..., const u8 peer_public_key[NOISE_PUBLIC_KEY_LEN], ...)\n{\n    memset(handshake, 0, sizeof(*handshake));\n    memcpy(handshake->remote_static, peer_public_key, NOISE_PUBLIC_KEY_LEN);\n    ...\n    wg_noise_precompute_static_static(peer);  // validation is deferred here\n}\n```\n\nThe key is stored first, then validated indirectly by `wg_noise_precompute_static_static`:\n\n```c\nvoid wg_noise_precompute_static_static(struct wg_peer *peer)\n{\n    if (!peer->handshake.static_identity->has_identity ||\n        !curve25519(peer->handshake.precomputed_static_static,\n                    peer->handshake.static_identity->static_private,\n                    peer->handshake.remote_static))\n        memset(peer->handshake.precomputed_static_static, 0, NOISE_PUBLIC_KEY_LEN);\n}\n```\n\nIf the public key is a torsion point (DH output = 0), `curve25519()` returns false and `precomputed_static_static` is zeroed. Then during the handshake, `mix_precomputed_dh()` rejects it:\n\n```c\nstatic bool __must_check mix_precomputed_dh(...)\n{\n    static u8 zero_point[NOISE_PUBLIC_KEY_LEN];\n    if (unlikely(!crypto_memneq(precomputed, zero_point, NOISE_PUBLIC_KEY_LEN)))\n        return false;\n    ...\n}\n```\n\n**The chain is correct but indirect.** Any 32-byte value can be stored in `remote_static`, including low-order points, which produce an all-zero DH output for every clamped private key. Rejection is deferred to `precompute_static_static` (peer creation) and `mix_precomputed_dh` (handshake time).\n\n**Timing concern:** `wg_noise_precompute_static_static` is also called after local key rotation. 
During the window between installing the new local private key and the precompute completing, `remote_static` is unchanged but `precomputed_static_static` may still hold a value derived from the previous private key.\n\n**Recommended pattern:** Validate the DH output at the netlink layer before accepting a new peer key, and return an error to userspace if it produces zero.\n\n---\n\n## Finding ECC-4: In-Place Decryption Before MAC Verification\n\n**Severity:** Low/Medium (departs from the verify-before-release model of RFC 8439 and RFC 5116; contained by WireGuard's queue model)\n\nRFC 8439 section 2.8 defines a message as authentic if and only if the computed Poly1305 tag matches the received one, and the RFC 5116 AEAD interface returns either the plaintext or a failure indication, never both. The usual reading is that receivers must not act on decrypted data before the tag verifies: an attacker who can submit forged ciphertexts and observe the unverified decryption learns keystream for that nonce and, in some contexts, the plaintext of genuine messages.\n\n**WireGuard's `chacha20poly1305_decrypt_sg_inplace()` does the opposite:**\n\n```c\nsg_miter_start(&miter, src, sg_nents(src), SG_MITER_TO_SG | SG_MITER_ATOMIC);\nfor (sl = src_len; sl > 0 && sg_miter_next(&miter); sl -= miter.length) {\n    u8 *addr = miter.addr;\n    size_t length = min_t(size_t, sl, miter.length);\n\n    poly1305_update(&poly1305_state, addr, length, ...);   // 1. MAC over ciphertext\n\n    // 2. Decrypt IN PLACE \u2014 overwrites buffer before MAC is checked\n    //    (l is derived from length; that computation is elided)\n    chacha20(&chacha20_state, addr, addr, l, simd_context);\n    ...\n}\n// 3. MAC only checked AFTER all decryption has already occurred\npoly1305_final(&poly1305_state, b.computed_mac, simd_context);\nret = !crypto_memneq(b.computed_mac, ...);\n```\n\nPlaintext is written to the skb's backing memory on every loop iteration, before `poly1305_final` confirms the tag is valid.\n\n**Why WireGuard's model contains this:**\n\nThe skb follows this path after decryption:\n\n1. `decrypt_packet()` decrypts in place, returns true/false\n2. `wg_packet_decrypt_worker()` sets `PACKET_STATE_CRYPTED` (success) or `PACKET_STATE_DEAD` (failure) via `atomic_set_release()`\n3. 
`wg_packet_rx_poll()` reads the state with `atomic_read_acquire()`, an acquire load that pairs with the release store in step 2\n4. Only `PACKET_STATE_CRYPTED` packets reach `wg_packet_consume_data_done()` and the networking stack\n\nThe acquire/release pair provides happens-before ordering: any thread observing `PACKET_STATE_CRYPTED` is guaranteed to see the completed, authenticated decryption. Packets that fail the tag check are dropped, so no unauthenticated plaintext reaches the networking stack or userspace.\n\n**Residual risk:** The plaintext bytes live in `skb->data` before authentication completes. If a kernel panic, debugging facility (e.g., `/proc/kcore`, KGDB), or future code path reads `skb->data` in that window, it would observe unauthenticated plaintext. This is not a current exploit path, but it is a robustness concern.\n\n**Why the design is this way:** The encrypt variant must write ciphertext in a single pass for performance, and the decrypt variant mirrors that structure. A separate scratch buffer would eliminate the issue but requires an extra allocation per packet, which is unacceptable for kernel networking at line rate. A two-pass design (MAC the whole ciphertext, verify, then decrypt) would avoid the allocation but reads every packet twice.\n\n---\n\n## Finding ECC-5: sg_inplace MAC Tag Pointer \u2014 ssize_t + size_t Mixing\n\n**Severity:** Low (no underflow in practice; type mixing is unsafe-looking)\n\nIn the fast path of `chacha20poly1305_decrypt_sg_inplace()`:\n\n```c\n// sl is ssize_t (signed), miter.length is size_t (unsigned)\nif (likely(sl <= -POLY1305_MAC_SIZE)) {\n    poly1305_final(&poly1305_state, b.computed_mac, simd_context);\n    ret = !crypto_memneq(b.computed_mac,\n                         miter.addr + miter.length + sl,   // parsed as (addr + length) + sl\n                         POLY1305_MAC_SIZE);\n}\n```\n\nAs written, C groups the expression as `(miter.addr + miter.length) + sl`, so `sl` is added to a pointer as a signed offset and no unsigned conversion occurs. The hazard appears if the expression is ever regrouped as `miter.addr + (miter.length + sl)`: `sl` is then converted to `size_t`, and if `|sl| > miter.length` the sum wraps to a huge positive number. 
That would send the pointer far outside the buffer.\n\n**Why it does not wrap in practice:** The condition `sl <= -POLY1305_MAC_SIZE` means the loop consumed all ciphertext AND the current segment extended at least 16 bytes past the ciphertext end into the auth tag. Because `sl` was still positive before the final `sl -= miter.length`, `|sl| < miter.length`; combined with the condition, `16 <= |sl| < miter.length`, so `miter.length + sl > 0` and the whole tag lies inside the current segment. No underflow.\n\n**The guarantee is implicit** in the loop invariant, not in the type system.\n\n**Safer form:**\n\n```c\n// sl < 0 here: the last -sl bytes of this segment follow the ciphertext, and the tag starts there\nu8 *tag_ptr = miter.addr + miter.length - (size_t)(-(sl));\nret = !crypto_memneq(b.computed_mac, tag_ptr, POLY1305_MAC_SIZE);\n```\n\n---\n\n## Finding ECC-6: fdifference Adds 8p, Not 2p (Undocumented)\n\n**Severity:** Informational (correct, but undocumented)\n\nIn `curve25519-hacl64.c`, `fdifference()` computes `b - a` by first adding a large multiple of the prime to `b`:\n\n```c\ntmp[0] = b0 + 0x3fffffffffff68LLU;\ntmp[1] = b1 + 0x3ffffffffffff8LLU;\n...\na[i] = tmp[i] - a[i];  // = (b + correction) - a\n```\n\nThe correction constant is exactly **8p**: limb 0 is `8 * 0x7ffffffffffed = 0x3fffffffffff68` and limbs 1 through 4 are `8 * 0x7ffffffffffff = 0x3ffffffffffff8`, i.e. `8 * (2^255 - 19)` in 51-bit limbs. The reason for 8p rather than the intuitively expected 2p is that intermediate 51-bit limbs in the HACL* representation can exceed their nominal bounds after `fsum` and `fmul`, so a larger correction is needed to guarantee non-negative results.\n\nNo bug here, but the comment is absent. An auditor who tries to verify the constant against 2p or 4p will fail to match it and may waste time or flag it incorrectly. **A comment is needed:**\n\n```c\n/* Add 8p before subtracting a to ensure a non-negative result.\n * 8p in 51-bit limb form: [0x3fffffffffff68, 0x3ffffffffffff8, ...] */\n```\n\n---\n\n## Finding ECC-7: Dead Variable `static_private` in consume_response\n\n**Severity:** Informational (misleading to auditors)\n\nIn `wg_noise_handshake_consume_response()`:\n\n```c\nu8 static_private[NOISE_PUBLIC_KEY_LEN];   // declared, never written\n// ... 
(never assigned) ...\nmemzero_explicit(static_private, NOISE_PUBLIC_KEY_LEN);   // zeroes garbage stack bytes\n```\n\n`static_private` is never populated. The `memzero_explicit` zeroes uninitialized stack memory. This appears to be a refactor artifact: an earlier version of the response-processing path used a local copy of the static private key for an `se` DH step that was later replaced by the precomputed value path.\n\nSecurity impact: zero. Auditor impact: a reviewer seeing `memzero_explicit` will assume the variable held a live private key and look for where it was populated, which wastes review time. The variable and its cleanup call should be removed.\n\n---\n\n## Finding ECC-8: format_fcontract_trim Single-Pass Reduction (Verified Correct)\n\n**Severity:** Informational\n\n`format_fcontract_trim` performs a single conditional subtraction of `p` to canonicalize the output:\n\n```c\nu64 mask0 = u64_gte_mask(a0, 0x7ffffffffffedLLU);   // a[0] >= p[0]?\nu64 mask1 = u64_eq_mask(a1, 0x7ffffffffffffLLU);    // a[1] == p[1]?\n...\nu64 mask = mask0 & mask1 & mask2 & mask3 & mask4;   // value >= p?\n// subtract p once, conditionally\n```\n\nOne reduction is sufficient: the two carry passes (`format_fcontract_first_carry_full` and `format_fcontract_second_carry_full`) fold the top bit back via `modulo_carry_top` (multiply by 19), leaving the value in `[0, 2^255)`. Since `p = 2^255 - 19`, this range is `[0, p + 18]`. One subtraction of `p` reduces to `[0, p-1]`. Correct.\n\nThe constant-time comparisons `u64_gte_mask` and `u64_eq_mask` were both verified. `u64_gte_mask(a, b)` computes `q = (a ^ b) | ((a - b) ^ b)`; the top bit of `a ^ q` is 1 exactly when `a < b` (unsigned), so `((a ^ q) >> 63) - 1` is all-ones when `a >= b` and 0 otherwise. `u64_eq_mask` similarly derives its mask from the top bit of `x | -x` with `x = a ^ b`. No branches. 
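
The comparison trick and the masked subtraction can be checked in a small userspace sketch, assuming C99 and `uint64_t` limbs. The two helpers are reimplemented from the formulas in the text rather than copied from the kernel; `trim_mod_p`, `trim_demo` and the `P_LIMB*` macros are hypothetical names, and the input to `trim_mod_p` is assumed to be fully carried (every limb below 2^51), as the two carry passes guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define P_LIMB0 0x7ffffffffffedULL /* limb 0 of p = 2^255 - 19 (radix 2^51) */
#define P_LIMBN 0x7ffffffffffffULL /* limbs 1..4 of p */

/* All-ones if a == b, else 0. */
static uint64_t u64_eq_mask(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;                  /* zero iff a == b */
	uint64_t top = (x | (~x + 1)) >> 63; /* 1 iff x != 0 */
	return top - 1;                      /* 0 -> all ones, 1 -> 0 */
}

/* All-ones if a >= b (unsigned), else 0. */
static uint64_t u64_gte_mask(uint64_t a, uint64_t b)
{
	uint64_t q = (a ^ b) | ((a - b) ^ b);
	uint64_t lt = (a ^ q) >> 63;         /* 1 iff a < b */
	return lt - 1;
}

/* Hypothetical: canonicalize a fully carried 5-limb element in
 * [0, 2^255) by subtracting p once when the value is >= p.
 * No data-dependent branches. */
static void trim_mod_p(uint64_t a[5])
{
	/* value >= p iff limbs 1..4 are all 2^51 - 1 and limb 0 >= 2^51 - 19 */
	uint64_t mask = u64_gte_mask(a[0], P_LIMB0);

	for (int i = 1; i < 5; i++)
		mask &= u64_eq_mask(a[i], P_LIMBN);
	a[0] -= mask & P_LIMB0;
	for (int i = 1; i < 5; i++)
		a[i] -= mask & P_LIMBN;
}

/* Usage: p itself must reduce to 0, and p - 1 must be left unchanged. */
void trim_demo(void)
{
	uint64_t x[5] = { P_LIMB0, P_LIMBN, P_LIMBN, P_LIMBN, P_LIMBN };
	uint64_t y[5] = { P_LIMB0 - 1, P_LIMBN, P_LIMBN, P_LIMBN, P_LIMBN };

	trim_mod_p(x);
	trim_mod_p(y);
	for (int i = 0; i < 5; i++)
		assert(x[i] == 0);
	assert(y[0] == P_LIMB0 - 1 && y[4] == P_LIMBN);
}
```

Returning masks rather than booleans lets the per-limb results be ANDed together and applied with `&`, so the reduction never branches on the value being reduced.
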
Both comparison helpers are correct.\n\n---\n\n## Finding ECC-9: Poly1305 r-Clamping Verified Correct\n\n**Severity:** Informational\n\n`poly1305-donna64.c` clamps `r`, the multiplier half of the Poly1305 one-time key (its first 16 bytes), at initialization per RFC 8439 section 2.5:\n\n```c\nst->r[0] = t0 & 0xffc0fffffffULL;\nst->r[1] = ((t0 >> 44) | (t1 << 20)) & 0xfffffc0ffffULL;\nst->r[2] = ((t1 >> 24)) & 0x00ffffffc0fULL;\n```\n\nHere `t0` and `t1` are the low and high 64-bit little-endian words of `r`, which is split into radix-2^44 limbs of 44, 44 and 40 bits. The masks clear the bits that RFC 8439 requires to be zero: recombined as `r[0] + (r[1] << 44) + (r[2] << 88)`, they equal the RFC clamp mask `0x0ffffffc0ffffffc0ffffffc0fffffff`. Correct.\n\n---\n\n## Constant-Time Analysis\n\n| Operation | Mechanism | Status |\n|-----------|-----------|--------|\n| Montgomery ladder bit swap | Masked XOR swap: `x = swap & (ai ^ bi)` | Constant-time |\n| Field element comparison (`u64_eq_mask`, `u64_gte_mask`) | Arithmetic, no branches | Constant-time |\n| Poly1305 tag comparison | `crypto_memneq` (kernel) | Constant-time |\n| Field multiplication (hacl64) | `__uint128_t` wide multiplies, no data-dependent branches | Constant-time |\n| Canonical reduction (format_fcontract_trim) | Masked conditional subtract | Constant-time |\n| Scalar clamping | Bitwise AND/OR | Constant-time |\n| Ladder iteration count | Fixed 256 ladder steps (32 bytes x 8 bits), independent of the scalar | Constant-time |\n\n**Not audited here:** `curve25519-x86_64.c` and `curve25519-arm.S` \u2014 these require separate assembly-level review for data-dependent branches or cache-timing patterns.\n\n---\n\n## Key Call Chain\n\n```\nnetlink: set_peer()\n  \u2514\u2500 wg_peer_create() \u2192 wg_noise_handshake_init()\n       \u2514\u2500 wg_noise_precompute_static_static()\n            \u2514\u2500 curve25519(ss, local_priv, remote_pub)    [validates output only]\n\nnoise: consume_initiation()\n  \u251c\u2500 message_ephemeral(e, src->unencrypted_ephemeral)    [no explicit point check on e]\n  \u251c\u2500 mix_dh(ck, key, local_priv, e)                     [curve25519(); zero output check]\n  \u251c\u2500 message_decrypt(s, 
src->encrypted_static)           [AEAD authenticated]\n  \u251c\u2500 mix_precomputed_dh(ck, key, precomputed_ss)         [zero-check on precomputed]\n  \u2514\u2500 message_decrypt(t, src->encrypted_timestamp)        [AEAD authenticated]\n\nreceive: wg_packet_decrypt_worker()\n  \u2514\u2500 decrypt_packet()\n       \u2514\u2500 chacha20poly1305_decrypt_sg_inplace()          [decrypt THEN verify \u2014 ECC-4]\n```\n\n---\n\n## Suggested Next Audit Areas\n\n- `curve25519-x86_64.c` / `curve25519-arm.S` \u2014 assembly paths for timing side-channels and branch-on-secret-bit patterns\n- `noise.c` / `blake2s.c` \u2014 HKDF parameter size arithmetic in `kdf()` (built on BLAKE2s HMAC), particularly the `first_len/second_len/third_len` bounds checked by `WARN_ON`\n- `cookie.c` \u2014 `xchacha20poly1305_encrypt` nonce derivation; confirm the XChaCha HChaCha20 subkey extraction is correct and the 192-bit nonce provides adequate birthday-bound security\n- `peerlookup.c` \u2014 `wg_pubkey_hashtable_lookup()` timing: lookup time that depends on hash-bucket occupancy could leak peer existence via a timing oracle", "creation_timestamp": "2026-06-23T20:43:49.000000Z"}