6 | 6 | <overview> |
7 | 7 | <p> |
8 | 8 | The Go module <code>golang.org/x/net/idna</code> implements UTS-46 IDNA |
9 | | -processing. During the <code>Lookup</code> and <code>MapForLookup</code> |
10 | | -profiles, <code>(*Profile).ToASCII</code> applies an NFKC-based character map |
11 | | -that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong> |
12 | | -across 8 families to their ASCII equivalents. The 8 families are: |
| 9 | +processing. On the <code>Lookup</code> and <code>Display</code> profiles |
| 10 | +(and any profile constructed via <code>idna.New(idna.MapForLookup(), ...)</code>), |
| 11 | +both <code>(*Profile).ToASCII</code> and <code>(*Profile).ToUnicode</code> |
| 12 | +apply an NFKC-based character map that folds <strong>100 distinct |
| 13 | +non-ASCII Unicode digit codepoints</strong> to their ASCII equivalents. |
| 14 | +The 100 codepoints partition into the following ranges: |
13 | 15 | </p> |
14 | 16 | <ul> |
15 | 17 | <li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li> |
16 | 18 | <li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li> |
17 | 19 | <li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li> |
18 | 20 | <li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li> |
19 | 21 | <li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li> |
20 | | - <li>Mathematical bold, sans-serif, double-struck, and monospace digits |
| 22 | + <li>Mathematical Alphanumeric Symbols digits, spanning bold, |
| 23 | + double-struck, sans-serif, sans-serif-bold, and monospace styles |
21 | 24 | (U+1D7CE..U+1D7FF): 50 codepoints</li> |
22 | 25 | <li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li> |
23 | 26 | </ul> |
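As a cross-check on the counts above, the ranges can be transcribed into a small Go lookup table. This is a minimal sketch. The names `foldRange`, `foldDigitRanges`, `isFoldDigit`, `containsFoldDigit`, and `countFoldDigits` are hypothetical helpers and are not part of `golang.org/x/net/idna`. The table only restates the list above and is not derived from the library's mapping tables. A pre-mapping filter like this covers only these digits, so it is not a substitute for the post-IDNA recheck in the recommendation.

```go
package main

import "fmt"

// foldRange is a closed interval [lo, hi] of codepoints.
type foldRange struct {
	lo, hi rune
}

// foldDigitRanges restates the ranges listed above (hypothetical table,
// transcribed by hand rather than generated from idna's tables).
var foldDigitRanges = []foldRange{
	{0x00B2, 0x00B3}, {0x00B9, 0x00B9}, // Latin-1 superscripts
	{0x2070, 0x2070}, {0x2074, 0x2079}, // superscripts
	{0x2080, 0x2089},                   // subscripts
	{0x2460, 0x2468}, {0x24EA, 0x24EA}, // circled digits
	{0xFF10, 0xFF19},                   // fullwidth digits
	{0x1D7CE, 0x1D7FF},                 // Mathematical Alphanumeric Symbols digits
	{0x1FBF0, 0x1FBF9},                 // segmented digits
}

// isFoldDigit reports whether r falls inside one of the listed ranges.
func isFoldDigit(r rune) bool {
	for _, fr := range foldDigitRanges {
		if r >= fr.lo && r <= fr.hi {
			return true
		}
	}
	return false
}

// containsFoldDigit reports whether any rune of s is a listed fold digit.
func containsFoldDigit(s string) bool {
	for _, r := range s {
		if isFoldDigit(r) {
			return true
		}
	}
	return false
}

// countFoldDigits sums the sizes of all ranges in the table.
func countFoldDigits() int {
	n := 0
	for _, fr := range foldDigitRanges {
		n += int(fr.hi-fr.lo) + 1
	}
	return n
}

func main() {
	fmt.Println(countFoldDigits())                 // 100
	fmt.Println(containsFoldDigit("0.\u00b9.0.0.")) // true (U+00B9 SUPERSCRIPT ONE)
	fmt.Println(containsFoldDigit("0.1.0.0."))      // false
}
```

Summing the table gives 3 + 7 + 10 + 10 + 10 + 50 + 10 = 100, which matches the overview's total.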
24 | 27 | <p> |
| 28 | +Devanagari digits (U+0966..U+096F) are <strong>not</strong> in scope: |
| 29 | +empirical testing against <code>golang.org/x/net/idna v0.53.0</code> |
| 30 | +confirms they do not fold to ASCII via UTS-46. The |
| 31 | +<code>Registration</code> profile is structurally covered by the rule |
| 32 | +but disallows every fold codepoint at the rune-validation stage, so a |
| 33 | +caller that respects the returned <code>error</code> never sees a |
| 34 | +smuggled literal from that profile in practice. |
| 35 | +</p> |
| 36 | +<p> |
25 | 37 | The library contains no IP-literal detection. A caller that applies UTS-46 |
26 | 38 | mapping to an attacker-controlled host string and consumes the result in a |
27 | 39 | network sink without rechecking against IP-literal parsers receives a |
@@ -57,48 +69,59 @@ Use a strict IDNA profile option that returns an error if the mapped |
57 | 69 | output parses as an IP literal, if your IDNA library exposes one. |
58 | 70 | </li> |
59 | 71 | <li> |
60 | | -Apply the explicit safe pattern: after <code>idna.ToASCII</code>, trim a |
61 | | -single trailing dot and call <code>net.ParseIP</code> (or |
62 | | -<code>netip.ParseAddr</code>) on the result, then reject on non-nil. The |
63 | | -trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to |
64 | | -<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its |
65 | | -own yet is still an IP literal for routing purposes. |
| 72 | +Apply the explicit safe pattern: after the IDNA mapping call, strip |
| 73 | +trailing dots from the result and parse what remains; reject the host |
| 74 | +if it parses as an IP literal. The two parsers signal success |
| 75 | +differently: <code>net.ParseIP</code> returns a non-<code>nil</code> |
| 76 | +<code>net.IP</code>, while <code>netip.ParseAddr</code> returns a |
| 77 | +<code>nil</code> <code>error</code>. The trailing-dot strip is required |
| 78 | +because <code>"0.¹.0.0."</code> maps to <code>"0.1.0.0."</code>, which |
| 79 | +a bare <code>net.ParseIP</code> rejects on its own even though it is |
| 80 | +still an IP literal for routing purposes; stripping the dot exposes the |
| 81 | +literal to the parser. |
66 | 82 | </li> |
67 | 83 | </ol> |
68 | 84 | </recommendation> |
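The explicit safe pattern can be sketched with the standard library alone. It assumes the input `ace` is the string a mapping call such as `idna.Lookup.ToASCII(host)` already returned. That call is left out so the sketch builds without `golang.org/x/net`. The helper names `isIPLiteralNet` and `isIPLiteralNetip` are hypothetical. The sketch shows the opposite success signals of the two parsers.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// isIPLiteralNet strips every trailing dot from an already-mapped host and
// reports whether net.ParseIP accepts the remainder. net.ParseIP signals
// success with a non-nil result.
func isIPLiteralNet(ace string) bool {
	return net.ParseIP(strings.TrimRight(ace, ".")) != nil
}

// isIPLiteralNetip does the same with netip.ParseAddr, which signals
// success with a nil error rather than a non-nil value.
func isIPLiteralNetip(ace string) bool {
	_, err := netip.ParseAddr(strings.TrimRight(ace, "."))
	return err == nil
}

func main() {
	// Assumed output of the mapping step for "0.¹.0.0.", per the text above.
	ace := "0.1.0.0."

	fmt.Println(net.ParseIP(ace) != nil)          // false: the bare parse misses it
	fmt.Println(isIPLiteralNet(ace))              // true
	fmt.Println(isIPLiteralNetip(ace))            // true
	fmt.Println(isIPLiteralNetip("example.com.")) // false
}
```

A caller would reject the host whenever either helper returns `true`. One difference between the parsers: `netip.ParseAddr` also accepts an IPv6 zone suffix such as `fe80::1%eth0`, which `net.ParseIP` rejects. The `netip` variant therefore flags a slightly wider set of literals.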
69 | 85 | |
70 | 86 | <example> |
71 | 87 | <p> |
72 | | -Vulnerable pattern. <code>net.ParseIP</code> is called only before |
73 | | -<code>idna.ToASCII</code>, so the smuggled literal slips through: |
| 88 | +Vulnerable pattern. The host string is mapped through the IDNA profile |
| 89 | +and reaches a network sink with no post-IDNA IP-literal recheck: |
74 | 90 | </p> |
75 | 91 | |
76 | 92 | <sample src="IdnaIpLiteralSmuggleBad.go"/> |
77 | 93 | |
78 | 94 | <p> |
79 | | -Safe pattern. Post-IDNA trailing-dot trim followed by |
| 95 | +Safe pattern. Post-IDNA trailing-dot strip followed by |
80 | 96 | <code>net.ParseIP</code> recheck: |
81 | 97 | </p> |
82 | 98 | |
83 | 99 | <sample src="IdnaIpLiteralSmuggleGood.go"/> |
84 | 100 | |
85 | 101 | <p> |
86 | | -The safe pattern accepts three equivalent trailing-dot trim forms: |
| 102 | +The safe pattern accepts three trailing-dot strip forms. They are |
| 103 | +<strong>not</strong> equivalent in coverage: |
87 | 104 | </p> |
88 | 105 | <ul> |
89 | | - <li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles |
90 | | - the fullwidth and ideographic dot variants that produce multiple |
91 | | - trailing ASCII dots after UTS-46 mapping.</li> |
92 | | - <li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form. |
93 | | - Sufficient for most inputs but incomplete for the multi-dot |
94 | | - variant.</li> |
| 106 | + <li><code>strings.TrimRight(ace, ".")</code>: strict form. Strips |
| 107 | + all trailing dots, so it fully removes the multi-dot residue left |
| 108 | + when UTS-46 maps a fullwidth (U+FF0E), ideographic (U+3002), or |
| 109 | + halfwidth ideographic (U+FF61) full stop sitting next to an ASCII dot.</li> |
| 110 | + <li><code>strings.TrimSuffix(ace, ".")</code>: lenient form. Strips |
| 111 | + only one trailing dot. Sufficient for the canonical |
| 112 | + <code>"0.1.0.0."</code> shape but leaves residue if multiple |
| 113 | + trailing dots were produced by mapping.</li> |
95 | 114 | <li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>: |
96 | | - manual slice form. Equivalent to <code>TrimSuffix</code> in |
97 | | - effect.</li> |
| 115 | + manual single-dot slice. Behaves identically to |
| 116 | + <code>TrimSuffix</code> in coverage and inherits the same |
| 117 | + multi-dot-residue limitation.</li> |
98 | 118 | </ul> |
99 | 119 | <p> |
100 | | -After trimming, call <code>netip.ParseAddr</code> (preferred) or |
101 | | -<code>net.ParseIP</code> on the result and reject if it parses as an IP literal. |
| 120 | +Callers whose threat model includes the multi-trailing-dot variant |
| 121 | +should prefer <code>strings.TrimRight</code>. After the strip, parse |
| 122 | +with <code>netip.ParseAddr</code> (preferred) or <code>net.ParseIP</code> |
| 123 | +and reject if the value parses as an IP literal (<code>err == nil</code> |
| 124 | +for the former, non-<code>nil</code> return for the latter). |
102 | 125 | </p> |
103 | 126 | </example> |
104 | 127 | |