Go: add experimental query for IDNA digit-fold IP-literal smuggling#21784
astrogilda wants to merge 3 commits into github:main
Adds `go/idna-ip-literal-smuggle` under `go/ql/src/experimental/CWE-918/`. The query detects callers that pass an untrusted hostname through `golang.org/x/net/idna` (UTS-46 NFKC mapping folds 100 non-ASCII Unicode digit codepoints to ASCII) into a network sink without rechecking the post-IDNA value as an IP literal after a trailing-dot trim. Without the recheck, an input such as "0.[U+00B9].0.0" maps to "0.1.0.0" and slips past any pre-IDNA `net.ParseIP` guard, smuggling an IPv4 literal into `net.JoinHostPort`, `net.Dial`, `http.Request.URL.Host`, `tls.Config.ServerName`, `http.Cookie.Domain`, and the DNS resolver entry points.

Implementation uses `TaintTracking::GlobalWithState` with two flow states (`TPreIdna`, `TPostIdna`). The IDNA mapping call is a state transition step. The barrier is a trailing-dot trim followed by `net.ParseIP` / `net.ParseCIDR` / `netip.ParseAddr` / `netip.ParsePrefix` in `TPostIdna`; a bare `ParseIP` without the prior trim does not sanitize because "0.1.0.0." is rejected by `ParseIP` yet remains a valid IP literal for routing. IPv6 is out of scope because the colon is a UTS-46 disallowed rune.

Tests: 23 unique sink alerts on the positive fixture (which includes the canonical `canonicalAddr`-shape wrapper), no false positives on the negative fixture (safe pattern, Punycode profile, non-IDNA hostname use). `codeql test run` reports 1 of 1 PASSED.
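To make the bypass concrete, here is a minimal sketch that stays within the standard library. `foldDigits` is a hypothetical stand-in for the UTS-46 mapping step of `golang.org/x/net/idna` and covers only three Latin-1 superscripts; `isIPLiteral` is likewise an illustrative helper, not part of the PR.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// foldDigits is a hypothetical stand-in for the UTS-46 mapping step.
// It folds only three Latin-1 superscripts; the real mapping is far larger.
func foldDigits(host string) string {
	return strings.NewReplacer("¹", "1", "²", "2", "³", "3").Replace(host)
}

// isIPLiteral reports whether host parses as an IP address once any
// trailing dots have been trimmed.
func isIPLiteral(host string) bool {
	return net.ParseIP(strings.TrimRight(host, ".")) != nil
}

func main() {
	input := "0.¹.0.0"

	// A guard that runs before the mapping sees a non-ASCII label and passes it.
	fmt.Println(isIPLiteral(input)) // false

	// The same guard applied after the mapping catches the literal.
	mapped := foldDigits(input)
	fmt.Println(mapped, isIPLiteral(mapped)) // 0.1.0.0 true
}
```

The point of the sketch is ordering: the identical check gives opposite answers depending on whether it runs before or after the mapping.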
Pull request overview
Adds a new experimental Go security query for detecting IDNA UTS-46 digit-fold cases where untrusted hostnames become IPv4 literals after IDNA normalization and then flow into network-relevant sinks without a post-IDNA IP recheck.
Changes:
- Adds a new stateful taint-tracking library/query under `go/ql/src/experimental/CWE-918/` plus qhelp and example code.
- Adds positive and negative inline-expectation fixtures covering sinks, barriers, and a witness-binding regression case.
- Adds a dedicated Go test module and expected output for the new experimental query.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/positives.go` | Positive inline-expectation cases for vulnerable IDNA-to-sink flows. |
| `go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/negatives.go` | Negative and regression fixtures for safe post-IDNA rechecks and non-sink cases. |
| `go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/go.mod` | Standalone module/dependency setup for the new Go test fixture. |
| `go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.qlref` | Connects the test directory to the experimental query and inline postprocessors. |
| `go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.expected` | Generated expected output baseline for the inline-expectation test run. |
| `go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleGood.go` | Safe example used by the query help documentation. |
| `go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleBad.go` | Vulnerable example used by the query help documentation. |
| `go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qll` | Core stateful taint-tracking model for IDNA transitions, barriers, and sinks. |
| `go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql` | Query entry point and result metadata for the new path-problem query. |
| `go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp` | End-user documentation, remediation guidance, and examples for the query. |

// Compliant: post-IDNA TrimSuffix + netip.ParseAddr recheck before (*Resolver).LookupIPAddr.
func compliantResolverLookupIPAddr(req *http.Request) {
	host := req.Header.Get("X-HOST-IPADDR-OK")
	ace, err := idna.Lookup.ToASCII(host)
	if err != nil {
		return
	}
	candidate := strings.TrimSuffix(ace, ".")
	if _, parseErr := netip.ParseAddr(candidate); parseErr == nil {
		return
	}
	r := &net.Resolver{}
	r.LookupIPAddr(context.Background(), ace) // OK: post-IDNA recheck barrier

// --- Class 8: Devanagari digits (U+0966..U+096F) ---
// UTS-46 NFKC folds Devanagari digits to ASCII equivalents.
// "१.0.0.0" (U+0967 DEVANAGARI ONE) -> "1.0.0.0"
func smuggleDevanagariDigit(req *http.Request) {
	host := req.Header.Get("X-HOST-DEVANAGARI") // $ Source
	ace, _ := idna.Lookup.ToASCII(host)
	http.Get("https://" + ace + "/") // $ Alert
}

// --- Class 8 second positive: Devanagari digit U+0969 -> "3",
// (*net.Resolver).LookupIPAddr sink ---
// "१.0.0.३" -> "1.0.0.3"
func smuggleDevanagariResolverLookupIPAddr(req *http.Request) {
	host := req.Header.Get("X-HOST-DEVANAGARI-TWO") // $ Source
	ace, _ := idna.Lookup.ToASCII(host)
	r := &net.Resolver{}
	r.LookupIPAddr(context.Background(), ace) // $ Alert
}

 * `(*idna.Profile).ToASCII` (and the package-level `idna.ToASCII`,
 * `Lookup.ToASCII`, `MapForLookup().ToASCII`) is modeled as a

	ace, _ := idna.Punycode.ToASCII(host)
	http.Get("https://" + ace + "/") // OK: no digit-fold profile
}

Vulnerable pattern. <code>net.ParseIP</code> is called only before
<code>idna.ToASCII</code>, so the smuggled literal slips through:

that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong>
across 8 families to their ASCII equivalents. The 8 families are:
</p>
<ul>
<li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
<li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
<li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
<li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
<li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
<li>Mathematical bold, sans-serif, double-struck, and monospace digits
(U+1D7CE..U+1D7FF): 50 codepoints</li>
<li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>

single trailing dot and call <code>net.ParseIP</code> (or
<code>netip.ParseAddr</code>) on the result, then reject on non-nil. The
trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to
<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its
own yet is still an IP literal for routing purposes.

The safe pattern accepts three equivalent trailing-dot trim forms:
</p>
<ul>
<li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles
the fullwidth and ideographic dot variants that produce multiple
trailing ASCII dots after UTS-46 mapping.</li>
<li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form.
Sufficient for most inputs but incomplete for the multi-dot
variant.</li>
<li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
manual slice form. Equivalent to <code>TrimSuffix</code> in
effect.</li>

// True-negative: caller uses idna.Display for human rendering only; the
// output never reaches a network sink in this function.
func displayOnly(req *http.Request) {
	host := req.Header.Get("X-HOST-DISPLAY")
	disp, _ := idna.Display.ToUnicode(host)
	_ = disp // OK: never reaches a sink

 * @description An untrusted hostname flows through `golang.org/x/net/idna`
 *              mapping (which folds 100 non-ASCII Unicode digit codepoints to
 *              ASCII via UTS-46 NFKC) and reaches a security-relevant
 *              hostname sink without a post-IDNA IP-literal recheck. A
 *              caller that calls `net.ParseIP` only BEFORE `idna.ToASCII`
 *              will accept a smuggled IPv4 literal such as `"0.¹.0.0"`
 *              (which maps to `"0.1.0.0"`). Scope is IPv4 only because
 *              IPv6 colons are rejected by IDNA rune-validation before

Fix several drift issues between the prose and the predicates:
- qhelp now lists the seven Unicode-block ranges that account for the 100 fold codepoints, replacing the prior "8 families" claim that did not match the bullet list. Adds an explicit note that Devanagari digits do not fold (verified empirically against golang.org/x/net/idna v0.53.0) and that Registration disallows every fold codepoint at rune validation.
- qhelp recommendation corrects netip.ParseAddr semantics (parsed via err == nil, not via a non-nil return) and rewrites the trim-form list to state that TrimSuffix and the manual slice are not equivalent to TrimRight for multi-trailing-dot inputs.
- qhelp example caption matches the actual sample, which shows the no-recheck shape, not a pre-IDNA ParseIP shape.
- qll module docstring now covers both ToASCII and ToUnicode on the digit-folding profiles, lists the actual idna.New(MapForLookup) construction shape, and explicitly states that the package-level idna.ToASCII helper and the Punycode profile are excluded.
- ql @description and select message reflect that the model covers both ToASCII and ToUnicode and that the no-recheck case is the primary anti-pattern, not just the pre-IDNA-ParseIP case.
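The 100-codepoint total can be cross-checked against the ranges given in the qhelp bullet list quoted in the review. The sketch below only does the arithmetic over those ranges; the type and variable names are illustrative, and it does not call the IDNA library to confirm that each codepoint actually folds.

```go
package main

import "fmt"

// cpRange is an inclusive range of Unicode codepoints.
type cpRange struct {
	name   string
	lo, hi rune
}

// foldRanges transcribes the ranges from the qhelp bullet list.
var foldRanges = []cpRange{
	{"Latin-1 superscripts two and three", 0x00B2, 0x00B3},
	{"Latin-1 superscript one", 0x00B9, 0x00B9},
	{"superscript zero", 0x2070, 0x2070},
	{"superscripts four to nine", 0x2074, 0x2079},
	{"subscripts", 0x2080, 0x2089},
	{"circled digits one to nine", 0x2460, 0x2468},
	{"circled digit zero", 0x24EA, 0x24EA},
	{"fullwidth digits", 0xFF10, 0xFF19},
	{"mathematical alphanumeric digits", 0x1D7CE, 0x1D7FF},
	{"segmented digits", 0x1FBF0, 0x1FBF9},
}

// countFoldCodepoints sums the sizes of all listed ranges.
func countFoldCodepoints() int {
	total := 0
	for _, r := range foldRanges {
		total += int(r.hi-r.lo) + 1
	}
	return total
}

func main() {
	fmt.Println(countFoldCodepoints()) // 100
}
```

The per-family sizes are 3 + 7 + 10 + 10 + 10 + 50 + 10, which is seven families, consistent with the corrected count.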
Three fixture changes plus a baseline refresh:
- Drop the two Devanagari positives. Empirical testing against golang.org/x/net/idna v0.53.0 confirms that U+0966..U+096F do not fold to ASCII via UTS-46; they pass through Punycode (xn--*) on all four profiles. Keeping the cases in would be misleading because the query fires structurally without the runtime smuggle ever existing.
- Add two positives covering Profile.ToUnicode on Latin-1 and Math superscript inputs. The library runs validateAndMap before the encode-vs-decode branch, so ToUnicode produces the same digit-folded ASCII output as ToASCII for the in-scope codepoints. The earlier fixture only exercised ToASCII despite the model handling both.
- Add three negatives. Two pin the ParseCIDR and ParsePrefix branches of the recheck-input predicate, which had no sink-reaching coverage before. The third pins the documented exclusion of the package-level idna.ToASCII helper against future broadening of the call matcher.

Baseline refreshed via codeql test run --learn after the fixture changes shifted line numbers and the new select message text replaced the old one.
Pushed two follow-ups for the review pass. docs: align qhelp/qll/ql with implemented model (
test: tighten fixture coverage and refresh baseline (
Baseline refreshed via One precision gap I noticed during the round but did not pull into this commit, for transparency: the |
Summary
- Adds `go/idna-ip-literal-smuggle` under `go/ql/src/experimental/CWE-918/` that detects the UTS-46 IDNA digit-fold IP-literal smuggling anti-pattern.
- Uses stateful taint tracking (`TaintTracking::GlobalWithState`) with two flow states (`TPreIdna`, `TPostIdna`) and a barrier that requires a trailing-dot trim followed by `net.ParseIP` / `netip.ParseAddr` to clear post-IDNA flow.

Motivation

`golang.org/x/net/idna` applies UTS-46 NFKC mapping inside `(*Profile).ToASCII` and `(*Profile).ToUnicode` for the `Lookup`, `Display`, `Registration`, and `MapForLookup`-derived profiles. The mapping folds 100 distinct non-ASCII Unicode digit codepoints partitioned into seven Unicode-block ranges (Latin-1 superscripts, mathematical superscripts, mathematical subscripts, circled digits, fullwidth digits, the Mathematical Alphanumeric Symbols block covering bold, double-struck, sans-serif, sans-serif-bold, and monospace digit styles, and segmented digits) to their ASCII equivalents. Devanagari digits (U+0966..U+096F) are not in scope: empirically verified against `golang.org/x/net/idna v0.53.0`, they pass through Punycode rather than fold to ASCII. The library contains no IP-literal detection, so a caller that runs `net.ParseIP` only before `idna.ToASCII` will accept a smuggled IPv4 literal such as `"0.¹.0.0"` (which maps to `"0.1.0.0"`) into a downstream network sink that assumed the value was a domain name.

Single-state taint tracking is structurally insufficient: a pre-IDNA `net.ParseIP` barrier must NOT block flow that transitions through the IDNA call. The query therefore uses two flow states. Sources start in `TPreIdna`. The IDNA mapping call is modeled as a state-transition step that flips `TPreIdna -> TPostIdna`. Sinks are flagged only in `TPostIdna`. The barrier is `net.ParseIP` / `net.ParseCIDR` / `netip.ParseAddr` / `netip.ParsePrefix`, but only when the value reaching the parser was produced by a trailing-dot trim (`strings.TrimRight(_, ".")`, `strings.TrimSuffix(_, ".")`, or the manual `if strings.HasSuffix(out, ".") { out = out[:len(out)-1] }` slice form) of the IDNA output. The trim is required because `"0.¹.0.0."` maps to `"0.1.0.0."`, which `net.ParseIP` rejects on its own yet is still an IP literal for routing purposes.

The detection-strategy design choice (CodeQL inter-procedural taint as the primary recall vehicle, with Semgrep OSS as a high-precision direct-call secondary sweep) is documented in the upstream rules repo at `docs/research/v0.1-detection-strategy.md`. The short version: CodeQL `TaintTracking::GlobalWithState` is inter-procedural by default and propagates through one-deep wrappers like `idnaASCII` without an explicit `isAdditionalFlowStep`. That matters because the only two production callsites of any UTS-46-mapping profile across `golang/go`, `kubernetes/kubernetes`, and `prometheus/prometheus` (a 660 MB sweep) are wrapped by such a helper. A blanket structural rule of the form "ToASCII without recheck in same block" was considered and rejected: an empirical sweep across 31 production callsites in 19 repositories shows the rule would fire on PSL walkers, registrar pipelines, and TLS-SNI normalization sites where the mapped value never reaches a network-routing decision driven by attacker input, and the false-positive density is too high to ship as a default.

Scope is IPv4 only. IPv6 colons are rejected by IDNA rune-validation before UTS-46 mapping runs, so no IPv6 smuggle path exists.
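The two-state reasoning can be illustrated with a toy simulation in Go. This is not the CodeQL model: the state and step names are hypothetical, and a tainted value's path is flattened into a linear list of steps purely to show why a pre-IDNA check must not clear the flow while a post-IDNA trim-then-parse does.

```go
package main

import "fmt"

type state int

const (
	preIdna  state = iota // tainted, not yet mapped
	postIdna              // tainted and mapped
	cleared               // rechecked after mapping; flow stops
)

type step int

const (
	parseIP       step = iota // IP-literal check with no preceding trim
	idnaMap                   // UTS-46 mapping call
	trimThenParse             // trailing-dot trim followed by an IP parse
	sink                      // network sink
)

// alerts reports whether a value that starts in preIdna reaches a sink
// while in postIdna.
func alerts(path []step) bool {
	st := preIdna
	for _, s := range path {
		switch s {
		case idnaMap:
			if st == preIdna {
				st = postIdna
			}
		case trimThenParse:
			if st == postIdna {
				st = cleared
			}
		case parseIP:
			// A bare parse never changes the state in this toy model.
		case sink:
			if st == postIdna {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(alerts([]step{parseIP, idnaMap, sink}))       // true
	fmt.Println(alerts([]step{idnaMap, parseIP, sink}))       // true
	fmt.Println(alerts([]step{idnaMap, trimThenParse, sink})) // false
	fmt.Println(alerts([]step{sink}))                         // false
}
```

A single-state model would have to choose between letting the first path through unflagged or flagging the last one; tracking the state separately avoids that choice.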
Verification

Compile and unit-test commands (run from a working tree of `github/codeql`, with the staged files in place):

Local results from CodeQL CLI 2.25.3:
- `query compile`: clean (1/1 success, 17 s).
- `test run`: positive fixture produces 23 unique sink alerts plus 1 intended-positive on the adversarial-witness binding case; negative fixture produces 0 alerts. Inline-expectations match.

The positive fixture exercises the canonical wrapper shape modeled on `golang.org/x/net/http/httpproxy.canonicalAddr` (a one-deep `idnaASCII` helper consumed by both `net.JoinHostPort` and `net.Dial`-class sinks). The query fires twice on this shape, once per sink. That is the empirical evidence that `TaintTracking::GlobalWithState` propagates through the wrapper with no additional-flow-step model required.

The negative fixture covers the safe pattern (post-IDNA `strings.TrimRight` / `strings.TrimSuffix` / manual `HasSuffix` slice followed by `net.ParseIP` or `netip.ParseAddr`), the Punycode profile (which has a nil mapping function and is correctly excluded from the `idnaMappingCall` predicate), and unrelated hostname use that never transits IDNA.

Disposition by the upstream Go security team

The Go security team reviewed the underlying digit-fold behavior and declined to treat it as a library bug on 2026-04-29: their position is that UTS-46 is correctly implemented and the recheck obligation belongs to the caller. This query implements the caller-side detection that follows from that disposition. It is not a substitute for a library fix; it is the alert that surfaces the contract on the call site where the recheck is missing.
Source artefacts

The query, library, qhelp, examples, and test fixtures originate from the upstream rules repository `astrogilda/idna-ip-literal-smuggle-rules` (`codeql/` subdirectory). The upstream repo also carries the corpus-eval scripts, the v0.1.x detection-strategy synthesis under `docs/research/`, and complementary Semgrep and gopatch rules for the same anti-pattern.

CLA

I will sign the CLA on first PR comment per the registry workflow.
References
- `golang.org/x/net/idna` package documentation: https://pkg.go.dev/golang.org/x/net/idna
- `ends-in-a-number` host parser check: https://url.spec.whatwg.org/#ends-in-a-number-checker

Cross-language analogs of the same caller-side IP-literal-validation gap that this query addresses for Go IDNA:
- `net.ParseIP` accepted IPv4 octets with leading zeros, allowing SSRF allowlist bypass via octal interpretation downstream.
- `idna` (Python) UTS-46 / Punycode handling input that drove disproportionate parser cost; same library family, different failure mode but illustrates the recurring pattern of UTS-46 input requiring caller-side validation that the library does not provide.