GHSA-GF3V-FWQG-4VH7
Vulnerability from github – Published: 2026-02-11 15:13 – Updated: 2026-02-12 14:19

Description
The RecursiveUrlLoader class in @langchain/community is a web crawler that recursively follows links from a starting URL. Its preventOutside option (enabled by default) is intended to restrict crawling to the same site as the base URL.
The implementation used String.startsWith() to compare URLs, which does not perform semantic URL validation. An attacker who controls content on a crawled page could include links to domains that share a string prefix with the target (e.g., https://example.com.attacker.com passes a startsWith check against https://example.com), causing the crawler to follow links to attacker-controlled or internal infrastructure.
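To make the prefix-matching pitfall concrete, the snippet below contrasts the two checks; it is an illustration only, not code from the library:

```typescript
// Illustration of the prefix-matching pitfall (not library code).
const baseUrl = "https://example.com";
const link = "https://example.com.attacker.com/collect";

// Naive prefix check: passes even though the link points at a different site.
console.log(link.startsWith(baseUrl)); // true

// Strict origin comparison: scheme, hostname, and port are compared as a unit.
console.log(new URL(link).origin === new URL(baseUrl).origin); // false
```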
Additionally, the crawler performed no validation against private or reserved IP addresses. A crawled page could include links targeting cloud metadata services (169.254.169.254), localhost, or RFC 1918 addresses, and the crawler would fetch them without restriction.
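For context, a typical invocation looks roughly like the sketch below; the import path and option names follow the loader's public documentation and may differ between versions. Any page reached during the crawl can contribute the next round of links, which is how attacker-controlled content enters the picture.

```typescript
import { RecursiveUrlLoader } from "@langchain/community/document_loaders/web/recursive_url";

// Crawl up to two levels deep, intending to stay on the same site.
// In affected versions, a crafted link on any crawled page (for example,
// to https://example.com.attacker.com or http://169.254.169.254/) could
// still be followed despite preventOutside being enabled.
const loader = new RecursiveUrlLoader("https://example.com/docs/", {
  maxDepth: 2,
  preventOutside: true, // the default, shown here for emphasis
});

const docs = await loader.load();
console.log(`Loaded ${docs.length} pages`);
```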
Impact
An attacker who can influence the content of a page being crawled (e.g., by placing a link on a public-facing page, forum, or user-generated content) could cause the crawler to:
- Fetch cloud instance metadata (AWS, GCP, Azure), potentially exposing IAM credentials and session tokens
- Access internal services on private networks (10.x, 172.16.x, 192.168.x)
- Connect to localhost services
- Exfiltrate response data via attacker-controlled redirect chains
This is exploitable in any environment where RecursiveUrlLoader runs on infrastructure with access to cloud metadata or internal services — which includes most cloud-hosted deployments.
Resolution
Two changes were made:

1. Origin comparison replaced. The startsWith check was replaced with a strict origin comparison using the URL API (new URL(link).origin === new URL(baseUrl).origin). This correctly validates scheme, hostname, and port as a unit, preventing subdomain-based bypasses.

2. SSRF validation added to all fetch operations. A new URL validation module (@langchain/core/utils/ssrf) was introduced and applied before every outbound fetch in the crawler (a simplified sketch of this kind of check appears below). This blocks requests to:
   - Cloud metadata endpoints: 169.254.169.254, 169.254.170.2, 100.100.100.200, metadata.google.internal, and related hostnames
   - Private IP ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16
   - IPv6 equivalents: ::1, fc00::/7, fe80::/10
   - Non-HTTP/HTTPS schemes (file:, ftp:, javascript:, etc.)
Cloud metadata endpoints are unconditionally blocked and cannot be overridden.
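The sketch below shows the general shape of such a pre-fetch check. It is a simplified, hypothetical illustration, not the actual @langchain/core/utils/ssrf module, and it deliberately ignores harder cases such as DNS rebinding, IPv4-mapped IPv6 addresses, and alternate IP encodings:

```typescript
// Hypothetical sketch of a pre-fetch SSRF check (not the actual
// @langchain/core/utils/ssrf module).
const BLOCKED_HOSTS = new Set([
  "169.254.169.254",          // AWS/GCP/Azure instance metadata
  "169.254.170.2",            // ECS task metadata
  "100.100.100.200",          // Alibaba Cloud metadata
  "metadata.google.internal", // GCP metadata hostname
  "localhost",
]);

const PRIVATE_V4 = [
  /^10\./,                      // 10.0.0.0/8
  /^172\.(1[6-9]|2\d|3[01])\./, // 172.16.0.0/12
  /^192\.168\./,                // 192.168.0.0/16
  /^127\./,                     // 127.0.0.0/8
  /^169\.254\./,                // 169.254.0.0/16
];

function isAllowedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // reject anything that does not parse as a URL
  }
  // Block non-HTTP/HTTPS schemes (file:, ftp:, javascript:, ...).
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  // WHATWG URL keeps IPv6 literals bracketed in .hostname; strip the brackets.
  const host = url.hostname.toLowerCase().replace(/^\[|\]$/g, "");

  if (BLOCKED_HOSTS.has(host)) return false;                // metadata endpoints, localhost
  if (PRIVATE_V4.some((re) => re.test(host))) return false; // private/reserved IPv4 ranges

  // IPv6 literals contain ":" once the brackets are gone.
  if (host.includes(":")) {
    if (host === "::1") return false;         // loopback
    if (/^f[cd]/.test(host)) return false;    // fc00::/7 unique-local
    if (/^fe[89ab]/.test(host)) return false; // fe80::/10 link-local
  }
  return true;
}
```

A check of this shape would be applied to every candidate link before fetching; link-based checks alone do not cover hostnames that resolve to private addresses, so upgrading to the fixed release remains the recommended path.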
Workarounds
Users who cannot upgrade immediately should avoid using RecursiveUrlLoader on untrusted or user-influenced content, or should run the crawler in a network environment without access to cloud metadata or internal services.
{
"affected": [
{
"database_specific": {
"last_known_affected_version_range": "\u003c= 1.1.13"
},
"package": {
"ecosystem": "npm",
"name": "@langchain/community"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"fixed": "1.1.14"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-26019"
],
"database_specific": {
"cwe_ids": [
"CWE-918"
],
"github_reviewed": true,
"github_reviewed_at": "2026-02-11T15:13:20Z",
"nvd_published_at": "2026-02-11T22:15:51Z",
"severity": "MODERATE"
},
"details": "## Description\n\nThe `RecursiveUrlLoader` class in `@langchain/community` is a web crawler that recursively follows links from a starting URL. Its `preventOutside` option (enabled by default) is intended to restrict crawling to the same site as the base URL.\n\nThe implementation used `String.startsWith()` to compare URLs, which does not perform semantic URL validation. An attacker who controls content on a crawled page could include links to domains that share a string prefix with the target (e.g., `https://example.com.attacker.com` passes a `startsWith` check against `https://example.com`), causing the crawler to follow links to attacker-controlled or internal infrastructure.\n\nAdditionally, the crawler performed no validation against private or reserved IP addresses. A crawled page could include links targeting cloud metadata services (`169.254.169.254`), localhost, or RFC 1918 addresses, and the crawler would fetch them without restriction.\n\n## Impact\n\nAn attacker who can influence the content of a page being crawled (e.g., by placing a link on a public-facing page, forum, or user-generated content) could cause the crawler to:\n\n- Fetch cloud instance metadata (AWS, GCP, Azure), potentially exposing IAM credentials and session tokens\n- Access internal services on private networks (`10.x`, `172.16.x`, `192.168.x`)\n- Connect to localhost services\n- Exfiltrate response data via attacker-controlled redirect chains\n\nThis is exploitable in any environment where `RecursiveUrlLoader` runs on infrastructure with access to cloud metadata or internal services \u2014 which includes most cloud-hosted deployments.\n\n## Resolution\n\nTwo changes were made:\n\n1. **Origin comparison replaced.** The `startsWith` check was replaced with a strict origin comparison using the URL API (`new URL(link).origin === new URL(baseUrl).origin`). This correctly validates scheme, hostname, and port as a unit, preventing subdomain-based bypasses.\n\n2. **SSRF validation added to all fetch operations.** A new URL validation module (`@langchain/core/utils/ssrf`) was introduced and applied before every outbound fetch in the crawler. This blocks requests to:\n - **Cloud metadata endpoints:** `169.254.169.254`, `169.254.170.2`, `100.100.100.200`, `metadata.google.internal`, and related hostnames\n - **Private IP ranges:** `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `127.0.0.0/8`, `169.254.0.0/16`\n - **IPv6 equivalents:** `::1`, `fc00::/7`, `fe80::/10`\n - **Non-HTTP/HTTPS schemes** (`file:`, `ftp:`, `javascript:`, etc.)\n\nCloud metadata endpoints are unconditionally blocked and cannot be overridden.\n\n## Workarounds\n\nUsers who cannot upgrade immediately should avoid using `RecursiveUrlLoader` on untrusted or user-influenced content, or should run the crawler in a network environment without access to cloud metadata or internal services.",
"id": "GHSA-gf3v-fwqg-4vh7",
"modified": "2026-02-12T14:19:06Z",
"published": "2026-02-11T15:13:20Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/langchain-ai/langchainjs/security/advisories/GHSA-gf3v-fwqg-4vh7"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2026-26019"
},
{
"type": "WEB",
"url": "https://github.com/langchain-ai/langchainjs/pull/9990"
},
{
"type": "WEB",
"url": "https://github.com/langchain-ai/langchainjs/commit/d5e3db0d01ab321ec70a875805b2f74aefdadf9d"
},
{
"type": "PACKAGE",
"url": "https://github.com/langchain-ai/langchainjs"
},
{
"type": "WEB",
"url": "https://github.com/langchain-ai/langchainjs/releases/tag/%40langchain%2Fcommunity%401.1.14"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:L/I:N/A:N",
"type": "CVSS_V3"
}
],
"summary": "@langchain/community affected by SSRF Bypass in RecursiveUrlLoader via insufficient URL origin validation"
}
Sightings
| Author | Source | Type | Date |
|---|---|---|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or observed by the user.
- Confirmed: The vulnerability has been validated from an analyst's perspective.
- Published Proof of Concept: A public proof of concept is available for this vulnerability.
- Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
- Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
- Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
- Not confirmed: The user expressed doubt about the validity of the vulnerability.
- Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.