Skip to content

Swift: Model withUnsafeBytes and similar closure methods#13827

Merged
MathiasVP merged 23 commits intogithub:mainfrom
geoffw0:closuremodels
Aug 25, 2023
Merged

Swift: Model withUnsafeBytes and similar closure methods#13827
MathiasVP merged 23 commits intogithub:mainfrom
geoffw0:closuremodels

Conversation

@geoffw0
Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 commented Jul 27, 2023

(draft PR because this is currently sitting atop an early version of #13741 that is necessary to get some of the results; please ignore those commits, they will be rebased away once that PR is merged)

Model withUnsafeBytes and some similar closure methods.

Limitations:

  • I've used .ArrayElement to access content when we know the Collection is an Array. I've used two models - one with .ArrayElement and one with no content specifier - when the object could be an Array or another type of Collection. I've used no content specifier for pointers (UnsafePointer etc) since we don't have content for any of those (yet). Thus, some level of conflation (e.g. between ptr and ptr[0]) is inevitable at this stage. We should be able to improve accuracy as we implement further types of content flow.
  • another limitation is that certain methods and fields used in some of the tests (e.g. .baseAddress) are not yet modelled. There is an in-progress issue for these.
  • I'm not sure why taint from array elements isn't always flowing into closures, e.g. in int.swift line 19. It works if the qualifier is a tainted array, but not if it's an array with tainted content as in the test (despite there being models for both cases).
    • update: there's a fix in another PR that very likely addresses has addressed this.

TODO:

@geoffw0 geoffw0 marked this pull request as ready for review August 4, 2023 08:47
@geoffw0 geoffw0 requested a review from a team as a code owner August 4, 2023 08:47
@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 8, 2023

With the content flow improvements in main now, I think this is ready to review and merge. @rdmarsh2 would you mind taking a look?

@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 8, 2023

(DCA LGTM)

@rdmarsh2
Copy link
Copy Markdown
Contributor

Should those really all be taint steps? I wonder if we could do value flow if we flowed through the UnsafeByte pointers as CollectionContent

@MathiasVP
Copy link
Copy Markdown
Contributor

Examples such as this (from https://developer.apple.com/documentation/swift/array/withunsafebufferpointer(_:)) certainly makes it seem like we should be using some kind of Content:

let numbers = [1, 2, 3, 4, 5]
let sum = numbers.withUnsafeBufferPointer { buffer -> Int in
    var result = 0
    for i in stride(from: buffer.startIndex, to: buffer.endIndex, by: 2) {
        result += buffer[i]
    }
    return result
}

it sounds like withUnsafeBufferPointer should be using ArrayContent, though

@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 14, 2023

Should those really all be taint steps? I wonder if we could do value flow if we flowed through the UnsafeByte pointers as CollectionContent

Oh I see what you mean - though the container itself is not the same (it's a collection outside, but a pointer/buffer inside the closure), the pointed-to elements are identical to those in the original container (at least if the pointer / buffer is typed - so the exception would be any cases that convert to a so-called "Raw" pointer, which should just be taint flow).

Examples such as this ... certainly makes it seem like we should be using some kind of Content:

I think this PR was affected by not having CollectionElement and possibly the missing the MAD content flow bug fix at the time I wrote it. I'll update...

@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 21, 2023

Updated:

  • merged in a more recent main
  • models are now using CollectionElement where appropriate
  • models are now using value flow where appropriate (content -> content flow in all but the with*Bytes methods)
  • fixed issues where the test was written incorrectly (without accounting for content)
  • allow subscripted reads from collection content (arrays can end up with collection content)

There's quite a lot going on in this PR now. I'll run DCA again to be safe.

@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 21, 2023

Just pushed a fix for the UnsafeJsEval test failure. The fix isn't perfect, because we have an ad-hoc model of .baseAddress that doesn't do content flow quite right (it's difficult to get it right). I have created a follow-up issue to come back to that.

@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 22, 2023

The DCA repeat-run looks fine. The PR checks all pass now. I would ideally like to merge this PR soon and before it gets any bigger...

Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's quite a lot of nice to have things as follow-ups, but they're all planned as future work. So in the spirit of not having this PR linger for any longer I'll just merge it now :shipit:

Comment on lines +806 to +811
(
subscript.getBase().getType() instanceof ArrayType and
c.isSingleton(any(Content::ArrayContent ac))
or
c.isSingleton(any(Content::CollectionContent ac))
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that a future PR can either collapse ArrayContent and CollectionContent into a single IPA branch, or that we can ensure that a given read/store step always reads a single Content. But obviously, that's not for this PR to handle. So this LGTM for now

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100%.

Comment on lines +230 to +232
* - elements of collections, such as `Set<Element>`.
* - elements of buffers, such as `UnsafeBufferPointer<Element>`.
* - the pointee of a pointer, such as `UnsafePointer<Pointee>`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a small performance win to split up these three things into separate Contents in the future. But we can delay that for now, obviously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsafeBufferPointer is in fact yet another type of Collection, so it falls into the existing discussion about merging or splitting collection content types properly.

UnsafePointer appears to not be a collection though. I'll create an issue for giving it its own content type.

(
isSink(node)
or
isAdditionalFlowStep(node, _)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... Why do we need implicit read steps at the additional flow steps, and not just at the sink? I assume this is related to the ad-hoc model you had to provide for .baseAddress?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that was why, and there's an issue for turning that ad-hoc model into a general one and getting rid of the remaining special cases here.

Comment on lines +34 to +35
c.getAReadContent() instanceof DataFlow::Content::ArrayContent or
c.getAReadContent() instanceof DataFlow::Content::CollectionContent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, I really hope we can merge these eventually. We're straining the dataflow library quite a lot with these duplications (i.e., if we have a path that should have an access path like [a, CollectionContent, b, CollectionContent] we're really forcing the dataflow library to have:

[Field[a], CollectionContent, Field[b], CollectionContent]
[Field[a], CollectionContent, Field[b], ArrayContent]
[Field[a], ArrayContent, Field[b], CollectionContent]
[Field[a], ArrayContent, Field[b], ArrayContent]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@MathiasVP MathiasVP merged commit 2fd627b into github:main Aug 25, 2023
@geoffw0
Copy link
Copy Markdown
Contributor Author

geoffw0 commented Aug 29, 2023

Thanks for reviewing and merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants