Skip to content

Swift: add DataFlow::Content for arrays#13741

Merged
MathiasVP merged 21 commits intogithub:mainfrom
rdmarsh2:rdmarsh2/swift/array-content-flow
Aug 2, 2023
Merged

Swift: add DataFlow::Content for arrays#13741
MathiasVP merged 21 commits intogithub:mainfrom
rdmarsh2:rdmarsh2/swift/array-content-flow

Conversation

@rdmarsh2
Copy link
Copy Markdown
Contributor

No description provided.

@github-actions github-actions Bot added the Swift label Jul 13, 2023
@rdmarsh2
Copy link
Copy Markdown
Contributor Author

Some more checking to do about Array member functions before I'm ready to merge this, but it seems to work so far.

@rdmarsh2 rdmarsh2 marked this pull request as ready for review July 18, 2023 14:08
@rdmarsh2 rdmarsh2 requested a review from a team as a code owner July 18, 2023 14:08
Comment thread swift/ql/lib/codeql/swift/dataflow/internal/DataFlowPrivate.qll Outdated
Comment thread swift/ql/lib/codeql/swift/dataflow/internal/DataFlowPrivate.qll Outdated
Comment thread swift/ql/lib/codeql/swift/dataflow/internal/DataFlowPrivate.qll Outdated
@MathiasVP
Copy link
Copy Markdown
Contributor

Also this needs a DCA run. But probably not before the above fixes have been implemented.

@MathiasVP
Copy link
Copy Markdown
Contributor

Oh, one more thing: We need to add ArrayContent to forceHighPrecision like Java does here: https://github.com/github/codeql/blob/main/java/ql/lib/semmle/code/java/dataflow/internal/DataFlowPrivate.qll#L391C3-L391C28. This will prevent an access-path blowup on certain evil databases.

Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not looking at this sooner.

This looks great. Thank you for including models-as-data support (ArrayElement), there are a number of models that should benefit from using this in the near future.


var matrix = [[source()]]
sink(arg: matrix[0])
sink(arg: matrix[0][0]) // $ flow=645
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does assignment to a multi-dimensional array work as well?

var matrix2 = [[1]]
matrix2[0][0] = source()
sink(arg: matrix2[0][0])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't, and now that I look at the cases I based this off of I'm not sure we handle nested fields or tuples properly either...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. Then we should probably create an issue for this (in all content types) rather than trying to address it for just arrays here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After adding some tests it seems to work for nested fields... I'm not sure why, though.

Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because of "reverse reads": if readStep(n2.(PostUpdateNode).getPreUpdateNode(), f, n1.(PostUpdateNode).getPreUpdateNode()) then the dataflow library generates storeStep(n1, f, n2). So if this doesn't work for array content then it's either because the post-update/pre-update isn't working, or because the readStep isn't working.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's actually that there's a step through an InOutExpr between the store and read step:

#  650|     getElement(11): [AssignExpr]  ... = ...
#  650|       getDest(): [SubscriptExpr] ...[...]
#  650|         getBase(): [InOutExpr] &...
#  650|           getSubExpr(): [SubscriptExpr] ...[...]
#  650|             getBase(): [InOutExpr] &...
#  650|               getSubExpr(): [DeclRefExpr] matrix2

var arr3 = [1]
var arr4 = arr2 + arr3
sink(arg: arr3[0])
sink(arg: arr4[0]) // $ MISSING: flow=642
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a model of Array.+(_:_:) in ArraySummaries fix this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should.

sink(arg: arr4[0]) // $ MISSING: flow=642

var arr5 = Array(repeating: source(), count: 2)
sink(arg: arr5[0]) // $ MISSING: flow=654
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again it's possible a model of the initializer is all we need here. It can wait as follow-up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@geoffw0
Copy link
Copy Markdown
Contributor

geoffw0 commented Jul 19, 2023

DCA shows a 6.6x analysis slowdown, which hopefully @MathiasVP s suggestion will fix. We can tolerate some slowdown for an improvement like this.

@geoffw0
Copy link
Copy Markdown
Contributor

geoffw0 commented Jul 20, 2023

New DCA run LGTM. Not a hint of performance issues. Good work!

Comment on lines +483 to +484
component.getName() = "ArrayElement" and
content instanceof Content::ArrayContent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be pulled into its own predicate to follow the structure of parseField?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a considerably simpler case than parseField, I'd be happy either way TBH.

any(Argument arg | modifiable(arg)).getExpr(), any(MemberRefExpr ref).getBase(),
any(ApplyExpr apply).getQualifier(), any(TupleElementExpr te).getSubExpr()
any(ApplyExpr apply).getQualifier(), any(TupleElementExpr te).getSubExpr(),
any(SubscriptExpr se).getBase(), any(InOutExpr ioe).getSubExpr()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a better fix maybe be to hide InOutExpr from the AST entirely?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe - we might break some other things by doing that... I can try it though

Comment thread swift/ql/lib/codeql/swift/dataflow/internal/DataFlowPrivate.qll Outdated
Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment. We should probably also have a new DCA run to double check the impact of hiding InOutExprs from the main AST.

Comment thread swift/ql/lib/codeql/swift/controlflow/internal/ControlFlowGraphImpl.qll Outdated
Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Would be good to wait for a final 👍 from @MathiasVP as well.

@MathiasVP
Copy link
Copy Markdown
Contributor

MathiasVP commented Jul 31, 2023

LGTM. Would be good to wait for a final 👍 from @MathiasVP as well.

Would really like to have #13741 (comment) addressed as well before we merge. And also another DCA run afterwards.

@rdmarsh2
Copy link
Copy Markdown
Contributor Author

DCA looks clean

Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to merge (I keep forgetting I don't have array content already on main...)

Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP MathiasVP merged commit 89aa86a into github:main Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants