Commit 92e52e2

Go: add experimental query for IDNA digit-fold IP-literal smuggling
Adds go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/. The query detects callers that pass an untrusted hostname through golang.org/x/net/idna (UTS-46 NFKC mapping folds 100 non-ASCII Unicode digit codepoints to ASCII) into a network sink without rechecking the post-IDNA value as an IP literal after a trailing-dot trim.

Without the recheck, an input such as "0.[U+00B9].0.0" maps to "0.1.0.0" and slips past any pre-IDNA net.ParseIP guard, smuggling an IPv4 literal into net.JoinHostPort, net.Dial, http.Request.URL.Host, tls.Config.ServerName, http.Cookie.Domain, and the DNS resolver entry points.

Implementation uses TaintTracking::GlobalWithState with two flow states (TPreIdna, TPostIdna). The IDNA mapping call is a state transition step. The barrier is a trailing-dot trim followed by net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix in TPostIdna; a bare ParseIP without the prior trim does not sanitize because "0.1.0.0." is rejected by ParseIP yet remains a valid IP literal for routing. IPv6 is out of scope because the colon is a UTS-46 disallowed rune.

Tests: 23 unique sink alerts on the positive fixture (which includes the canonical canonicalAddr-shape wrapper), no false positives on the negative fixture (safe pattern, Punycode profile, non-IDNA hostname use). codeql test run reports 1 of 1 PASSED.
1 parent 154d213 commit 92e52e2

10 files changed

Lines changed: 1327 additions & 0 deletions

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+<!DOCTYPE qhelp PUBLIC
+"-//Semmle//qhelp//EN"
+"qhelp.dtd">
+<qhelp>
+
+<overview>
+<p>
+The Go module <code>golang.org/x/net/idna</code> implements UTS-46 IDNA
+processing. With the <code>Lookup</code> profile, or a profile built with the
+<code>MapForLookup</code> option, <code>(*Profile).ToASCII</code> applies an NFKC-based character map
+that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong>
+to their ASCII equivalents. They fall into the following 7 ranges:
+</p>
+<ul>
+<li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
+<li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
+<li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
+<li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
+<li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
+<li>Mathematical bold, double-struck, sans-serif, sans-serif bold, and monospace digits
+(U+1D7CE..U+1D7FF): 50 codepoints</li>
+<li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>
+</ul>
+<p>
+The library performs no IP-literal detection. A caller that applies UTS-46
+mapping to an attacker-controlled host string and passes the result to a
+network sink without rechecking it against an IP-literal parser can receive a
+valid ASCII IPv4 literal back as the "domain name" output. Any downstream
+allowlist check, SSRF guard, NoProxy match, or TLS-SNI router that does
+not recheck the post-IDNA result is bypassed. The anti-pattern also
+applies to callers that run <code>net.ParseIP</code> only before IDNA and
+treat that as sufficient: the smuggled host is not ASCII, so the pre-IDNA
+check classifies it as a non-IP hostname, and the post-IDNA value (now a numeric
+literal) reaches the sink unguarded.
+</p>
+<p>
+IPv6 is out of scope: <code>:</code> is a UTS-46 disallowed character, so
+bare IPv6 inputs are rejected by IDNA rune-validation before any
+digit-fold mapping runs.
+</p>
+<p>
+Sinks where the smuggled literal becomes exploitable include
+<code>net.JoinHostPort</code>, <code>net.Dial</code>,
+<code>(*http.Request).URL.Host</code>, <code>(*tls.Config).ServerName</code>,
+<code>(*http.Cookie).Domain</code>, and any HTTP client request URL
+constructed from the mapped value.
+</p>
+</overview>
+
+<recommendation>
+<p>
+Do one of the following:
+</p>
+<ol>
+<li>
+Use a strict IDNA profile option that returns an error if the mapped
+output parses as an IP literal, if your IDNA library exposes one.
+</li>
+<li>
+Apply the explicit safe pattern: after <code>idna.ToASCII</code>, trim
+any trailing dot and call <code>net.ParseIP</code> (or
+<code>netip.ParseAddr</code>) on the result, then reject the host if parsing succeeds. The
+trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to
+<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its
+own yet is still an IP literal for routing purposes.
+</li>
+</ol>
+</recommendation>
+
+<example>
+<p>
+Vulnerable pattern. <code>net.ParseIP</code> is called only before
+<code>idna.ToASCII</code>, so the smuggled literal slips through:
+</p>
+
+<sample src="IdnaIpLiteralSmuggleBad.go"/>
+
+<p>
+Safe pattern. Post-IDNA trailing-dot trim followed by
+<code>net.ParseIP</code> recheck:
+</p>
+
+<sample src="IdnaIpLiteralSmuggleGood.go"/>
+
+<p>
+The safe pattern accepts three trailing-dot trim forms, which are not fully equivalent:
+</p>
+<ul>
+<li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles
+the fullwidth and ideographic dot variants that produce multiple
+trailing ASCII dots after UTS-46 mapping.</li>
+<li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form.
+Sufficient for most inputs but incomplete for the multi-dot
+variant.</li>
+<li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
+manual slice form. Equivalent to <code>TrimSuffix</code> in
+effect.</li>
+</ul>
+<p>
+After trimming, call <code>netip.ParseAddr</code> (preferred) or
+<code>net.ParseIP</code> on the result and reject the host if it parses as an IP literal.
+</p>
+</example>
+
+<references>
+
+<li>
+Unicode Technical Standard #46 (IDNA Compatibility Processing):
+<a href="https://www.unicode.org/reports/tr46/">https://www.unicode.org/reports/tr46/</a>
+</li>
+<li>
+<code>golang.org/x/net/idna</code> package documentation:
+<a href="https://pkg.go.dev/golang.org/x/net/idna">https://pkg.go.dev/golang.org/x/net/idna</a>
+</li>
+<li>
+WHATWG URL Standard, <code>ends_in_a_number</code> host parser check
+(prior art for IP-literal detection in URL parsers):
+<a href="https://url.spec.whatwg.org/#ends-in-a-number-checker">https://url.spec.whatwg.org/#ends-in-a-number-checker</a>
+</li>
+<li>
+CWE-918: Server-Side Request Forgery (SSRF):
+<a href="https://cwe.mitre.org/data/definitions/918.html">https://cwe.mitre.org/data/definitions/918.html</a>
+</li>
+<li>
+CWE-020: Improper Input Validation:
+<a href="https://cwe.mitre.org/data/definitions/20.html">https://cwe.mitre.org/data/definitions/20.html</a>
+</li>
+
+</references>
+</qhelp>
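
The difference between the trim forms listed in the query help can be seen in a minimal sketch of the post-IDNA recheck. It takes the already-mapped string as input and uses only the standard library. The helper names `isIPLiteralAfterTrim` and `isIPLiteralSingleTrim` are hypothetical and are not taken from the query or its fixtures.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// isIPLiteralAfterTrim reports whether ace (the hostname after IDNA mapping)
// parses as an IP literal once ALL trailing dots are removed.
// Hypothetical helper, shown only to illustrate the multi-dot trim form.
func isIPLiteralAfterTrim(ace string) bool {
	_, err := netip.ParseAddr(strings.TrimRight(ace, "."))
	return err == nil
}

// isIPLiteralSingleTrim is the single-dot variant: it removes at most one
// trailing dot before parsing, so it misses inputs with several trailing dots.
func isIPLiteralSingleTrim(ace string) bool {
	_, err := netip.ParseAddr(strings.TrimSuffix(ace, "."))
	return err == nil
}

func main() {
	inputs := []string{"0.1.0.0", "0.1.0.0.", "0.1.0.0..", "example.com."}
	for _, ace := range inputs {
		fmt.Printf("%q multi=%v single=%v\n",
			ace, isIPLiteralAfterTrim(ace), isIPLiteralSingleTrim(ace))
	}
}
```

A caller would reject the host whenever the check returns true. Only the `TrimRight` variant flags the input with two trailing dots, which is why the help text describes the single-dot forms as incomplete.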
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+/**
+ * @name IDNA digit-fold IP-literal smuggling via UTS-46 NFKC mapping
+ * @description An untrusted hostname flows through `golang.org/x/net/idna`
+ *              mapping (which folds 100 non-ASCII Unicode digit codepoints to
+ *              ASCII via UTS-46 NFKC) and reaches a security-relevant
+ *              hostname sink without a post-IDNA IP-literal recheck. A
+ *              caller that calls `net.ParseIP` only BEFORE `idna.ToASCII`
+ *              will accept a smuggled IPv4 literal such as `"0.¹.0.0"`
+ *              (which maps to `"0.1.0.0"`). Scope is IPv4 only because
+ *              IPv6 colons are rejected by IDNA rune-validation before
+ *              UTS-46 mapping runs.
+ * @id go/idna-ip-literal-smuggle
+ * @kind path-problem
+ * @problem.severity warning
+ * @security-severity 8.1
+ * @precision high
+ * @tags security
+ *       experimental
+ *       external/cwe/cwe-918
+ *       external/cwe/cwe-020
+ * @requires codeql/go-all >= 0.6.0
+ */
+
+import go
+import IdnaIpLiteralSmuggle
+import Flow::PathGraph
+
+from
+  Flow::PathNode source,
+  Flow::PathNode sink
+where Flow::flowPath(source, sink)
+select sink.getNode(), source, sink,
+  "Untrusted hostname from $@ flows through `idna.ToASCII` (which performs UTS-46 NFKC digit folding) and reaches this hostname sink without a post-IDNA `net.ParseIP` recheck (after a trailing-dot trim).",
+  source.getNode(), "this user-controlled value"
