153 changes: 153 additions & 0 deletions go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp
@@ -0,0 +1,153 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>

<overview>
<p>
The Go package <code>golang.org/x/net/idna</code> implements UTS-46 IDNA
processing. On the <code>Lookup</code> and <code>Display</code> profiles
(and any profile constructed via <code>idna.New(idna.MapForLookup(), ...)</code>),
both <code>(*Profile).ToASCII</code> and <code>(*Profile).ToUnicode</code>
apply an NFKC-based character map that folds <strong>100 distinct
non-ASCII Unicode digit codepoints</strong> to their ASCII equivalents.
These 100 codepoints fall into the following ranges:
</p>
<ul>
<li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
<li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
<li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
<li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
<li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
<li>Mathematical Alphanumeric Symbols digits, spanning bold,
double-struck, sans-serif, sans-serif-bold, and monospace styles
(U+1D7CE..U+1D7FF): 50 codepoints</li>
<li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>
</ul>
<p>
Devanagari digits (U+0966..U+096F) are <strong>not</strong> in scope:
empirical testing against <code>golang.org/x/net/idna v0.53.0</code>
confirms they do not fold to ASCII via UTS-46. The
<code>Registration</code> profile is structurally covered by the rule
but disallows every fold codepoint at the rune-validation stage, so a
caller that respects the returned <code>error</code> never sees a
smuggled literal from that profile in practice.
</p>
<p>
The library contains no IP-literal detection. A caller that applies UTS-46
mapping to an attacker-controlled host string and passes the result to a
network sink without re-parsing it as an IP literal can receive a valid
ASCII IPv4 literal as the "domain name" output. Any downstream allowlist
check, SSRF guard, NoProxy match, or TLS-SNI router that does not re-check
the post-IDNA result is bypassed. The anti-pattern also applies to callers
that run <code>net.ParseIP</code> only before the IDNA call and assume that
is sufficient: the smuggled host contains non-ASCII digits, so the pre-IDNA
check classifies it as a non-IP hostname, and the post-IDNA value (now a
numeric literal) reaches the sink unguarded.
</p>
<p>
IPv6 is out of scope: <code>:</code> is a UTS-46 disallowed character;
bare-IPv6 inputs are rejected by IDNA rune-validation before any
digit-fold mapping runs.
</p>
<p>
Sinks where the smuggled literal becomes exploitable include
<code>net.JoinHostPort</code>, <code>net.Dial</code>,
<code>(*http.Request).URL.Host</code>, <code>(*tls.Config).ServerName</code>,
<code>(*http.Cookie).Domain</code>, and any HTTP client request URL
constructed from the mapped value.
</p>
</overview>

<recommendation>
<p>
Do one of the following:
</p>
<ol>
<li>
Use a strict IDNA profile option that returns an error if the mapped
output parses as an IP literal, if your IDNA library exposes one.
</li>
<li>
Apply the explicit safe pattern: after the IDNA mapping call, strip
trailing dots from the result and parse it. Reject the host if
<code>net.ParseIP</code> returns a non-<code>nil</code> address, or if
<code>netip.ParseAddr</code> returns a <code>nil</code> error (the two
parsers signal success differently: <code>net.ParseIP</code> returns
<code>nil</code> on failure, whereas <code>netip.ParseAddr</code>
returns <code>err == nil</code> on success). The trailing-dot strip is
required because <code>"0.¹.0.0."</code> maps to <code>"0.1.0.0."</code>,
which <code>net.ParseIP</code> on its own does not parse, yet which is
still an IP literal for routing purposes; stripping first exposes the
literal so the parser sees it.
</li>
</ol>
</recommendation>

<example>
<p>
Vulnerable pattern. The host string is mapped through the IDNA profile
and reaches a network sink with no post-IDNA IP-literal recheck:
</p>

<sample src="IdnaIpLiteralSmuggleBad.go"/>

<p>
Safe pattern. Post-IDNA trailing-dot strip followed by
<code>net.ParseIP</code> recheck:
</p>

<sample src="IdnaIpLiteralSmuggleGood.go"/>

<p>
The safe pattern accepts three trailing-dot strip forms. They are
<strong>not</strong> equivalent in coverage:
</p>
<ul>
<li><code>strings.TrimRight(ace, ".")</code>: strict form. Strips
all trailing dots, so the multi-dot residue produced when UTS-46
maps the fullwidth dot U+FF0E or the ideographic dot U+3002 next
to ASCII dots is fully removed.</li>
<li><code>strings.TrimSuffix(ace, ".")</code>: lenient form. Strips
only one trailing dot. Sufficient for the canonical
<code>"0.1.0.0."</code> shape but leaves residue if multiple
trailing dots were produced by mapping.</li>
<li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
manual single-dot slice. Behaves identically to
<code>TrimSuffix</code> in coverage and inherits the same
multi-dot-residue limitation.</li>
</ul>
<p>
Callers whose threat model includes the multi-trailing-dot variant
should prefer <code>strings.TrimRight</code>. After the strip, parse
with <code>netip.ParseAddr</code> (preferred) or <code>net.ParseIP</code>
and reject if the value parses as an IP literal (<code>err == nil</code>
for the former, non-<code>nil</code> return for the latter).
</p>
</example>

<references>

<li>
Unicode Technical Standard #46 (IDNA Compatibility Processing):
<a href="https://www.unicode.org/reports/tr46/">https://www.unicode.org/reports/tr46/</a>
</li>
<li>
<code>golang.org/x/net/idna</code> package documentation:
<a href="https://pkg.go.dev/golang.org/x/net/idna">https://pkg.go.dev/golang.org/x/net/idna</a>
</li>
<li>
WHATWG URL Standard, <code>ends_in_a_number</code> host parser check
(prior art for IP-literal detection in URL parsers):
<a href="https://url.spec.whatwg.org/#ends-in-a-number-checker">https://url.spec.whatwg.org/#ends-in-a-number-checker</a>
</li>
<li>
CWE-918: Server-Side Request Forgery (SSRF):
<a href="https://cwe.mitre.org/data/definitions/918.html">https://cwe.mitre.org/data/definitions/918.html</a>
</li>
<li>
CWE-20: Improper Input Validation:
<a href="https://cwe.mitre.org/data/definitions/20.html">https://cwe.mitre.org/data/definitions/20.html</a>
</li>

</references>
</qhelp>
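The post-IDNA recheck described in the recommendation above can be sketched as follows. This is a minimal illustration, not the contents of `IdnaIpLiteralSmuggleGood.go` (which this diff does not show). It assumes the mapped host string, called `ace` here, has already been returned by the IDNA call; that call is omitted so the sketch needs only the standard library. `rejectIPLiteral` and `errIPLiteral` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"strings"
)

// errIPLiteral is a hypothetical sentinel error used only in this sketch.
var errIPLiteral = errors.New("mapped host is an IP literal")

// rejectIPLiteral is a hypothetical helper. It takes the string returned by
// the IDNA mapping call, strips every trailing dot (the strict TrimRight
// form), and reports an error if the remainder parses as an IP address.
func rejectIPLiteral(ace string) error {
	trimmed := strings.TrimRight(ace, ".")
	// netip.ParseAddr signals a successful parse with err == nil.
	if _, err := netip.ParseAddr(trimmed); err == nil {
		return errIPLiteral
	}
	return nil
}

func main() {
	hosts := []string{"0.1.0.0", "0.1.0.0.", "0.1.0.0..", "example.com."}
	for _, h := range hosts {
		fmt.Printf("%q rejected=%v\n", h, rejectIPLiteral(h) != nil)
	}
}
```

In this sketch the first three inputs are rejected and `example.com.` is accepted. A caller would run the helper on the mapped value before handing it to any of the sinks listed in the overview.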
36 changes: 36 additions & 0 deletions go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql
@@ -0,0 +1,36 @@
/**
* @name IDNA digit-fold IP-literal smuggling via UTS-46 NFKC mapping
* @description An untrusted hostname flows through
* `(*golang.org/x/net/idna.Profile).ToASCII` or `.ToUnicode`
* on a digit-folding profile (which folds 100 non-ASCII
* Unicode digit codepoints to ASCII via UTS-46 NFKC) and
* reaches a security-relevant hostname sink without a
* post-IDNA IP-literal recheck. A caller that omits the
* recheck (or only runs `net.ParseIP` BEFORE the mapping
* call) will accept a smuggled IPv4 literal such as
* `"0.¹.0.0"` (which maps to `"0.1.0.0"`). Scope is IPv4
* only because IPv6 colons are rejected by IDNA
* rune-validation before UTS-46 mapping runs.
* @id go/idna-ip-literal-smuggle
* @kind path-problem
* @problem.severity warning
* @security-severity 8.1
* @precision high
* @tags security
* experimental
* external/cwe/cwe-918
* external/cwe/cwe-020
* @requires codeql/go-all >= 0.6.0
*/

import go
import IdnaIpLiteralSmuggle
import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink,
  "Untrusted hostname from $@ flows through a `golang.org/x/net/idna` mapping call (which performs UTS-46 NFKC digit folding) and reaches this hostname sink without a post-IDNA `net.ParseIP` (or `netip.ParseAddr`) recheck on the trailing-dot-stripped value.",
  source.getNode(), "this user-controlled value"
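Two claims in the query description can be cross-checked with the standard library alone: the 100-codepoint total, and the fact that a `net.ParseIP` check run before the mapping call does not see `"0.¹.0.0"` as an IP literal. The sketch below transcribes the ranges listed in the qhelp overview; it does not call the IDNA package, so it verifies only the arithmetic, not the folding behaviour. `digitRange`, `foldRanges`, `countFoldCodepoints`, and `preIDNACheckSeesIP` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
)

// digitRange is a hypothetical type: an inclusive range of codepoints.
type digitRange struct {
	name   string
	lo, hi rune
}

// foldRanges transcribes the ranges from the qhelp overview. Non-contiguous
// groups are split into separate entries.
var foldRanges = []digitRange{
	{"Latin-1 superscripts two and three", 0x00B2, 0x00B3},
	{"Latin-1 superscript one", 0x00B9, 0x00B9},
	{"superscript zero", 0x2070, 0x2070},
	{"superscripts four to nine", 0x2074, 0x2079},
	{"subscripts", 0x2080, 0x2089},
	{"circled one to nine", 0x2460, 0x2468},
	{"circled zero", 0x24EA, 0x24EA},
	{"fullwidth digits", 0xFF10, 0xFF19},
	{"mathematical alphanumeric digits", 0x1D7CE, 0x1D7FF},
	{"segmented digits", 0x1FBF0, 0x1FBF9},
}

// countFoldCodepoints sums the sizes of all listed ranges.
func countFoldCodepoints() int {
	total := 0
	for _, r := range foldRanges {
		total += int(r.hi-r.lo) + 1
	}
	return total
}

// preIDNACheckSeesIP models a guard that runs net.ParseIP on the raw,
// unmapped host string.
func preIDNACheckSeesIP(host string) bool {
	return net.ParseIP(host) != nil
}

func main() {
	fmt.Println(countFoldCodepoints())               // 100
	fmt.Println(preIDNACheckSeesIP("0.\u00b9.0.0")) // false: raw host is not an IP literal
	fmt.Println(preIDNACheckSeesIP("0.1.0.0"))      // true: the mapped form is
}
```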