Commit 6392c52

docs: align IDNA digit-fold qhelp/qll/ql with implemented model
Fix several drift issues between the prose and the predicates:

- qhelp now lists the seven Unicode-block ranges that account for the 100 fold codepoints, replacing the prior "8 families" claim that did not match the bullet list. Adds an explicit note that Devanagari digits do not fold (verified empirically against golang.org/x/net/idna v0.53.0) and that Registration disallows every fold codepoint at rune validation.
- qhelp recommendation corrects netip.ParseAddr semantics (parsed via err == nil, not via a non-nil return) and rewrites the trim-form list to state that TrimSuffix and the manual slice are not equivalent to TrimRight for multi-trailing-dot inputs.
- qhelp example caption matches the actual sample, which shows the no-recheck shape, not a pre-IDNA ParseIP shape.
- qll module docstring now covers both ToASCII and ToUnicode on the digit-folding profiles, lists the actual idna.New(MapForLookup) construction shape, and explicitly states that the package-level idna.ToASCII helper and the Punycode profile are excluded.
- ql @description and select message reflect that the model covers both ToASCII and ToUnicode and that the no-recheck case is the primary anti-pattern, not just the pre-IDNA-ParseIP case.
1 parent 92e52e2 commit 6392c52
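
The "seven ranges, 100 codepoints" claim in the message can be checked arithmetically. The sketch below transcribes the ranges from the qhelp bullet list by hand, splitting non-contiguous families into separate rows; the `foldRange` type and `countFoldCodepoints` function are hypothetical names for illustration, and nothing here is read from `golang.org/x/net/idna` itself.

```go
package main

import "fmt"

// foldRange is a hypothetical helper type for this sketch: an inclusive
// codepoint range with a label taken from the qhelp bullet list.
type foldRange struct {
	name   string
	lo, hi rune
}

// foldRanges is transcribed by hand from the qhelp overview. It is an
// assumption of this sketch, not a table exported by the idna package.
var foldRanges = []foldRange{
	{"Latin-1 superscripts two and three", 0x00B2, 0x00B3},
	{"Latin-1 superscript one", 0x00B9, 0x00B9},
	{"superscript zero", 0x2070, 0x2070},
	{"superscripts four to nine", 0x2074, 0x2079},
	{"subscripts", 0x2080, 0x2089},
	{"circled one to nine", 0x2460, 0x2468},
	{"circled zero", 0x24EA, 0x24EA},
	{"fullwidth digits", 0xFF10, 0xFF19},
	{"Mathematical Alphanumeric Symbols digits", 0x1D7CE, 0x1D7FF},
	{"segmented digits", 0x1FBF0, 0x1FBF9},
}

// countFoldCodepoints sums the sizes of the inclusive ranges above.
func countFoldCodepoints() int {
	total := 0
	for _, r := range foldRanges {
		total += int(r.hi-r.lo) + 1
	}
	return total
}

func main() {
	for _, r := range foldRanges {
		fmt.Printf("%-42s %3d\n", r.name, int(r.hi-r.lo)+1)
	}
	fmt.Println("total:", countFoldCodepoints()) // total: 100
}
```

The per-family sizes (3, 7, 10, 10, 10, 50, 10) match the bullet counts in the qhelp and sum to 100.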

3 files changed

Lines changed: 82 additions & 51 deletions

go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp

Lines changed: 48 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -6,22 +6,34 @@
66
<overview>
77
<p>
88
The Go module <code>golang.org/x/net/idna</code> implements UTS-46 IDNA
9-
processing. During the <code>Lookup</code> and <code>MapForLookup</code>
10-
profiles, <code>(*Profile).ToASCII</code> applies an NFKC-based character map
11-
that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong>
12-
across 8 families to their ASCII equivalents. The 8 families are:
9+
processing. On the <code>Lookup</code> and <code>Display</code> profiles
10+
(and any profile constructed via <code>idna.New(idna.MapForLookup(), ...)</code>),
11+
both <code>(*Profile).ToASCII</code> and <code>(*Profile).ToUnicode</code>
12+
apply an NFKC-based character map that folds <strong>100 distinct
13+
non-ASCII Unicode digit codepoints</strong> to their ASCII equivalents.
14+
The 100 codepoints partition into the following Unicode-block ranges:
1315
</p>
1416
<ul>
1517
<li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
1618
<li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
1719
<li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
1820
<li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
1921
<li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
20-
<li>Mathematical bold, sans-serif, double-struck, and monospace digits
22+
<li>Mathematical Alphanumeric Symbols digits, spanning bold,
23+
double-struck, sans-serif, sans-serif-bold, and monospace styles
2124
(U+1D7CE..U+1D7FF): 50 codepoints</li>
2225
<li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>
2326
</ul>
2427
<p>
28+
Devanagari digits (U+0966..U+096F) are <strong>not</strong> in scope:
29+
empirical testing against <code>golang.org/x/net/idna v0.53.0</code>
30+
confirms they do not fold to ASCII via UTS-46. The
31+
<code>Registration</code> profile is structurally covered by the rule
32+
but disallows every fold codepoint at the rune-validation stage, so a
33+
caller that respects the returned <code>error</code> never sees a
34+
smuggled literal from that profile in practice.
35+
</p>
36+
<p>
2537
The library contains no IP-literal detection. A caller that applies UTS-46
2638
mapping to an attacker-controlled host string and consumes the result in a
2739
network sink without rechecking against IP-literal parsers receives a
@@ -57,48 +69,59 @@ Use a strict IDNA profile option that returns an error if the mapped
5769
output parses as an IP literal, if your IDNA library exposes one.
5870
</li>
5971
<li>
60-
Apply the explicit safe pattern: after <code>idna.ToASCII</code>, trim a
61-
single trailing dot and call <code>net.ParseIP</code> (or
62-
<code>netip.ParseAddr</code>) on the result, then reject on non-nil. The
63-
trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to
64-
<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its
65-
own yet is still an IP literal for routing purposes.
72+
Apply the explicit safe pattern: after the IDNA mapping call, strip
73+
trailing dots from the result and parse it. Reject if
74+
<code>net.ParseIP</code> returns a non-<code>nil</code> address, or if
75+
<code>netip.ParseAddr</code> returns no error (note the inverted
76+
convention: <code>netip.ParseAddr</code> reports a successfully parsed
77+
address via <code>err == nil</code>, not via a non-zero return). The
78+
trailing-dot strip is required because <code>"0.¹.0.0."</code> maps to
79+
<code>"0.1.0.0."</code>, which a bare <code>net.ParseIP</code> rejects
80+
on its own yet is still an IP literal for routing purposes; the strip
81+
exposes the literal so the parser sees it.
6682
</li>
6783
</ol>
6884
</recommendation>
6985

7086
<example>
7187
<p>
72-
Vulnerable pattern. <code>net.ParseIP</code> is called only before
73-
<code>idna.ToASCII</code>, so the smuggled literal slips through:
88+
Vulnerable pattern. The host string is mapped through the IDNA profile
89+
and reaches a network sink with no post-IDNA IP-literal recheck:
7490
</p>
7591

7692
<sample src="IdnaIpLiteralSmuggleBad.go"/>
7793

7894
<p>
79-
Safe pattern. Post-IDNA trailing-dot trim followed by
95+
Safe pattern. Post-IDNA trailing-dot strip followed by
8096
<code>net.ParseIP</code> recheck:
8197
</p>
8298

8399
<sample src="IdnaIpLiteralSmuggleGood.go"/>
84100

85101
<p>
86-
The safe pattern accepts three equivalent trailing-dot trim forms:
102+
The safe pattern accepts three trailing-dot strip forms. They are
103+
<strong>not</strong> equivalent in coverage:
87104
</p>
88105
<ul>
89-
<li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles
90-
the fullwidth and ideographic dot variants that produce multiple
91-
trailing ASCII dots after UTS-46 mapping.</li>
92-
<li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form.
93-
Sufficient for most inputs but incomplete for the multi-dot
94-
variant.</li>
106+
<li><code>strings.TrimRight(ace, ".")</code>: strict form. Strips
107+
all trailing dots, so the multi-dot residue produced when UTS-46
108+
maps the fullwidth dot U+FF0E or the ideographic dot U+3002 next
109+
to ASCII dots is fully removed.</li>
110+
<li><code>strings.TrimSuffix(ace, ".")</code>: lenient form. Strips
111+
only one trailing dot. Sufficient for the canonical
112+
<code>"0.1.0.0."</code> shape but leaves residue if multiple
113+
trailing dots were produced by mapping.</li>
95114
<li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
96-
manual slice form. Equivalent to <code>TrimSuffix</code> in
97-
effect.</li>
115+
manual single-dot slice. Behaves identically to
116+
<code>TrimSuffix</code> in coverage and inherits the same
117+
multi-dot-residue limitation.</li>
98118
</ul>
99119
<p>
100-
After trimming, call <code>netip.ParseAddr</code> (preferred) or
101-
<code>net.ParseIP</code> on the result and reject if it parses as an IP literal.
120+
Callers whose threat model includes the multi-trailing-dot variant
121+
should prefer <code>strings.TrimRight</code>. After the strip, parse
122+
with <code>netip.ParseAddr</code> (preferred) or <code>net.ParseIP</code>
123+
and reject if the value parses as an IP literal (<code>err == nil</code>
124+
for the former, non-<code>nil</code> return for the latter).
102125
</p>
103126
</example>
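
The recheck described in the recommendation and the strip-form comparison above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `IdnaIpLiteralSmuggleGood.go`; the helper names `isIPLiteralAfterStrip` and `isIPLiteralSingleStrip` are hypothetical, and the input strings stand for values already returned by the IDNA mapping call.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// isIPLiteralAfterStrip is a hypothetical helper: it strips every trailing
// dot (the strict form) and reports whether either parser accepts the rest.
// net.ParseIP signals success with a non-nil result; netip.ParseAddr signals
// success with err == nil.
func isIPLiteralAfterStrip(ace string) bool {
	host := strings.TrimRight(ace, ".")
	if net.ParseIP(host) != nil {
		return true
	}
	_, err := netip.ParseAddr(host)
	return err == nil
}

// isIPLiteralSingleStrip is the lenient variant: it removes at most one
// trailing dot, so a second trailing dot still hides the literal.
func isIPLiteralSingleStrip(ace string) bool {
	host := strings.TrimSuffix(ace, ".")
	_, err := netip.ParseAddr(host)
	return err == nil
}

func main() {
	inputs := []string{"0.1.0.0", "0.1.0.0.", "0.1.0.0..", "example.com."}
	for _, ace := range inputs {
		fmt.Printf("%-14q strict=%v single=%v\n",
			ace, isIPLiteralAfterStrip(ace), isIPLiteralSingleStrip(ace))
	}
}
```

For `"0.1.0.0.."` the strict helper reports an IP literal while the single-strip helper does not, which is the coverage gap the list above describes.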
104127

go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql

Lines changed: 12 additions & 10 deletions
@@ -1,14 +1,16 @@
11
/**
22
* @name IDNA digit-fold IP-literal smuggling via UTS-46 NFKC mapping
3-
* @description An untrusted hostname flows through `golang.org/x/net/idna`
4-
* mapping (which folds 100 non-ASCII Unicode digit codepoints to
5-
* ASCII via UTS-46 NFKC) and reaches a security-relevant
6-
* hostname sink without a post-IDNA IP-literal recheck. A
7-
* caller that calls `net.ParseIP` only BEFORE `idna.ToASCII`
8-
* will accept a smuggled IPv4 literal such as `"0.¹.0.0"`
9-
* (which maps to `"0.1.0.0"`). Scope is IPv4 only because
10-
* IPv6 colons are rejected by IDNA rune-validation before
11-
* UTS-46 mapping runs.
3+
* @description An untrusted hostname flows through
4+
* `(*golang.org/x/net/idna.Profile).ToASCII` or `.ToUnicode`
5+
* on a digit-folding profile (which folds 100 non-ASCII
6+
* Unicode digit codepoints to ASCII via UTS-46 NFKC) and
7+
* reaches a security-relevant hostname sink without a
8+
* post-IDNA IP-literal recheck. A caller that omits the
9+
* recheck (or only runs `net.ParseIP` BEFORE the mapping
10+
* call) will accept a smuggled IPv4 literal such as
11+
* `"0.¹.0.0"` (which maps to `"0.1.0.0"`). Scope is IPv4
12+
* only because IPv6 colons are rejected by IDNA
13+
* rune-validation before UTS-46 mapping runs.
1214
* @id go/idna-ip-literal-smuggle
1315
* @kind path-problem
1416
* @problem.severity warning
@@ -30,5 +32,5 @@ from
3032
Flow::PathNode sink
3133
where Flow::flowPath(source, sink)
3234
select sink.getNode(), source, sink,
33-
"Untrusted hostname from $@ flows through `idna.ToASCII` (which performs UTS-46 NFKC digit folding) and reaches this hostname sink without a post-IDNA `net.ParseIP` recheck (after a trailing-dot trim).",
35+
"Untrusted hostname from $@ flows through a `golang.org/x/net/idna` mapping call (which performs UTS-46 NFKC digit folding) and reaches this hostname sink without a post-IDNA `net.ParseIP` (or `netip.ParseAddr`) recheck on the trailing-dot-stripped value.",
3436
source.getNode(), "this user-controlled value"
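
The anti-pattern named in the `@description` can be illustrated without the IDNA dependency. In the sketch below, `standInFold` is a hypothetical stand-in for the mapping step that folds only the three Latin-1 superscript digits; it is not the library's mapping, and `preCheckOnly` and `withRecheck` are illustrative names rather than code from the query's test samples.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// standInFold is a hypothetical stand-in for the digit-folding step. It
// covers only U+00B9, U+00B2, and U+00B3, which is enough for the example.
var standInFold = strings.NewReplacer("¹", "1", "²", "2", "³", "3")

// preCheckOnly rejects IP literals before the mapping step only. The bool
// result reports whether the host was accepted.
func preCheckOnly(host string) (string, bool) {
	if net.ParseIP(host) != nil {
		return "", false
	}
	return standInFold.Replace(host), true
}

// withRecheck maps first, then rejects if the dot-stripped result parses
// as an IP literal.
func withRecheck(host string) (string, bool) {
	mapped := standInFold.Replace(host)
	if net.ParseIP(strings.TrimRight(mapped, ".")) != nil {
		return "", false
	}
	return mapped, true
}

func main() {
	for _, host := range []string{"0.¹.0.0", "0.¹.0.0.", "example.com"} {
		out1, ok1 := preCheckOnly(host)
		out2, ok2 := withRecheck(host)
		fmt.Printf("%q: preCheckOnly=(%q,%v) withRecheck=(%q,%v)\n",
			host, out1, ok1, out2, ok2)
	}
}
```

With the stand-in mapping, `preCheckOnly` accepts `"0.¹.0.0"` and hands back `"0.1.0.0"`, while `withRecheck` rejects it; an ordinary hostname passes both.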

go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qll

Lines changed: 22 additions & 16 deletions
@@ -5,18 +5,19 @@
55
* Background
66
* ----------
77
* `golang.org/x/net/idna` applies UTS-46 NFKC mapping inside
8-
* `(*Profile).ToASCII`, which folds 100 non-ASCII Unicode digit codepoints
9-
* to their ASCII equivalents. The 100 codepoints span 8 families: Latin-1
10-
* superscripts, mathematical superscripts and subscripts, circled digits,
11-
* fullwidth digits, mathematical bold and sans-serif and double-struck
12-
* and monospace digits, and segmented digits. A caller that runs
13-
* `net.ParseIP` BEFORE `idna.ToASCII` will reject non-ASCII inputs as
14-
* non-IP, pass them to the IDNA library, and then receive a valid ASCII
15-
* IPv4 literal back as the "domain name" output. The post-IDNA result
16-
* silently bypasses any downstream IP-literal guard because the caller
17-
* never re-checks. Scope is IPv4 only. IPv6 colons are rejected by IDNA
18-
* rune-validation before UTS-46 mapping runs, so no IPv6 smuggle path
19-
* exists.
8+
* `(*Profile).ToASCII` and `(*Profile).ToUnicode`, which fold 100
9+
* non-ASCII Unicode digit codepoints to their ASCII equivalents. The
10+
* 100 codepoints span Latin-1 superscripts, mathematical superscripts
11+
* and subscripts, circled digits, fullwidth digits, the Mathematical
12+
* Alphanumeric Symbols block (bold, double-struck, sans-serif,
13+
* sans-serif-bold, and monospace digit styles), and segmented digits.
14+
* Devanagari digits are not in scope; they pass through Punycode rather
15+
* than fold to ASCII. A caller that omits a post-IDNA IP-literal
16+
* recheck (or that only checks BEFORE the IDNA call) will accept a
17+
* smuggled IPv4 literal back as the "domain name" output and pass it
18+
* to a downstream allowlist, SSRF guard, or routing decision unguarded.
19+
* Scope is IPv4 only. IPv6 colons are rejected by IDNA rune-validation
20+
* before UTS-46 mapping runs, so no IPv6 smuggle path exists.
2021
*
2122
* Modeling
2223
* --------
@@ -28,10 +29,15 @@
2829
* - `TPreIdna` : untrusted hostname before IDNA mapping
2930
* - `TPostIdna` : mapped output flowing toward a security-relevant sink
3031
*
31-
* `(*idna.Profile).ToASCII` (and the package-level `idna.ToASCII`,
32-
* `Lookup.ToASCII`, `MapForLookup().ToASCII`) is modeled as a
33-
* state-transition additional flow step that flips
34-
* `TPreIdna -> TPostIdna`.
32+
* `(*idna.Profile).ToASCII` and `(*idna.Profile).ToUnicode` on the
33+
* digit-folding profiles (`Lookup`, `Display`, `Registration`, and any
34+
* profile constructed via `idna.New(idna.MapForLookup(), ...)`) are
35+
* modeled as state-transition additional flow steps that flip
36+
* `TPreIdna -> TPostIdna`. The package-level `idna.ToASCII` helper is
37+
* intentionally NOT modeled because it dispatches to
38+
* `Punycode.process(...)`, which has a nil UTS-46 mapping and so
39+
* cannot produce the digit-fold smuggle. The `Punycode` profile is
40+
* excluded for the same reason.
3541
*
3642
* The barrier is `net.ParseIP`, `net.ParseCIDR`, `netip.ParseAddr`, or
3743
* `netip.ParsePrefix` consumed in `TPostIdna`. The safe pattern requires
