Skip to content

Swift: add String taint steps#11185

Merged
karimhamdanali merged 8 commits intogithub:mainfrom
karimhamdanali:swift-string-taint-steps
Nov 30, 2022
Merged

Swift: add String taint steps#11185
karimhamdanali merged 8 commits intogithub:mainfrom
karimhamdanali:swift-string-taint-steps

Conversation

@karimhamdanali
Copy link
Copy Markdown
Contributor

If a String is tainted, then all its fields should be tainted including computed properties such as utf8. This support follows a similar solution to what we do with URLs using TaintInheritingContent.

@karimhamdanali karimhamdanali requested a review from a team as a code owner November 9, 2022 13:59
@github-actions github-actions Bot added the Swift label Nov 9, 2022
this.getField().getEnclosingDecl().(ExtensionDecl).getExtendedTypeDecl().getFullName() =
"String"
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be a sensible approach, but I will note a few rough edges in the case of String: String.count, String.isEmpty, String.first and String.last. These things can't contain a lot of data (especially String.isEmpty which is a Bool) so it would be very difficult for an attacker to utilize a theoretical taint path through any of them. In CPP in particular we've found taint flow through strlen (equivalent to String.count) usually produces many more false positive results than useful ones.

I'm not sure what the right solution should be. We could try to exclude the ones we don't want from this TaintInheritingContent, or switch to an explicit modelling approach. We could potentially block all taint through 'small' types everywhere, solving the problem for good (but would this limit some uses of taint?). Or we could stick to our principles that these are taint flows and expect queries to sanitize on an ad-hoc basis if they tend to find false positive results?

Copy link
Copy Markdown
Contributor

@d10c d10c Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, taint flow through small types is irrelevant for most "code injection" type queries. On the other hand, there was Heartbleed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, there have been plenty of real bugs related to tainted integers (malicious image width/height bugs spring to mind as a second example) - and as you suggest our recent focus on code injection queries may be biasing my viewpoint.

I remain somewhat nervous of taint flow through strlen / .count. In CPP we've seen a lot of code like this:

char *buffer = (char *)malloc(sizeof(char) * strlen(inputString));

which is usually a reasonable thing to do, even if inputString is tainted. There were other similar patterns, and in the end we blocked taint through strlen. This particular example ought to be uncommon in Swift, because you don't need to explicitly allocate like this when you're using a proper String class. I suppose we could take a wait-and-see approach on whether flow through .count is actually a problem (or a positive) on Swift?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be worth asking other language teams about this. Experience from C/C++ might not apply very well to this judgment call, but experience from Java/C#/JavaScript/Python and related languages might be a better indication of which direction which should go for Swift.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, we usually (we aren't 100% consistent) don't consider things like length to be propagating taint, because there are few opportunities in a memory-safe language where a tainted integer could be a problem. Although IMHO, I think we should have those models as well, because some queries actually look for tainted integers. Other queries not interested in that (like code injection) can always add a type-based sanitizer that filters out uninteresting types.

In Swift, where memory management can be a thing, I think modeling these kind of methods makes even more sense. Maybe String.count should be a specific sanitizer for overflow-like vulnerabilities instead of not modeling it at all?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we opt for leaving .count et al as taint propagating steps, maybe we could have a library predicate like isConversionToSmallType(DataFlow::Node), so that queries that don't want this behavior can sanitize it out without reinventing the wheel.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we keep all those fields/members as potential sources of taint until proven otherwise. As folks have mentioned here, even integers can be sources of taint in some cases. I'm wondering though if there's a way to implement @d10c suggestion without the query writer doing too much work here.

Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this. We should keep an eye on how flow through String.count works out in practice. And I've started a DCA run to check any impact on performance.

@geoffw0
Copy link
Copy Markdown
Contributor

geoffw0 commented Nov 15, 2022

The DCA run failed spectacularly. Before we start another, it might be a good idea to merge latest main into this PR so that everything is up-to-date.

@MathiasVP
Copy link
Copy Markdown
Contributor

Indeed, merging in main (or anything that includes #11181) should fix this.

@karimhamdanali
Copy link
Copy Markdown
Contributor Author

Sounds good. Let me do that then, because my won DCA run also failed on all levels! I'll start another DCA run once I merge/rebase with main.

@karimhamdanali karimhamdanali force-pushed the swift-string-taint-steps branch from 44ed216 to fc04766 Compare November 23, 2022 12:18
@karimhamdanali karimhamdanali requested a review from a team as a code owner November 23, 2022 12:18
@karimhamdanali karimhamdanali force-pushed the swift-string-taint-steps branch from fc04766 to 9bca57d Compare November 29, 2022 09:36
@karimhamdanali karimhamdanali force-pushed the swift-string-taint-steps branch from 9bca57d to f0d9dab Compare November 29, 2022 10:13
geoffw0
geoffw0 previously approved these changes Nov 30, 2022
Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My DCA run succeeded and I don't see anything concerning in the results. 🎉

Now that we have support for taint through fields of String, we can now detect certain flows that we previously marked as [NOT DETECTED]. This commit updates the expected output of CWE-079 (and the in-code annotation of the accompanying test case) to reflect that update.
Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants