Data flow: Performance improvements#14255
Conversation
0c6b2eb to
3499f59
Compare
3499f59 to
ab182bc
Compare
ab182bc to
975243f
Compare
12073f7 to
bce66d8
Compare
|
Python evaluation looks quite acceptable (no real change). 👍 |
|
|
||
| pragma[nomagic] | ||
| private predicate dataFlowTakenCallEdgeIn0( | ||
| DataFlowCall call, DataFlowCallable c, ParamNodeEx p, FlowState state, Cc innercc, |
There was a problem hiding this comment.
Might as well be CcCall innercc for clarity, right?
| } | ||
|
|
||
| pragma[nomagic] | ||
| private predicate fwdFlowOut( |
There was a problem hiding this comment.
The introduction of this projection seems superfluous. It's only used once in fwdFlow0 and there its occurrence is as a trivial disjunct, so doing the projection there instead as we did before will be exactly the same work, except avoiding the additional materialisation.
There was a problem hiding this comment.
Oh, wait, the wider version is now inlined. Hmm, but that still shouldn't matter, right? Dropping this predicate will just postpone the projection and dedup to fwdFlow0 where the cost will be the same, but will save this materialisation.
There was a problem hiding this comment.
I think it should be OK to inline; only reason I nomagiced it is because it is sometimes convenient to be able to spot in the timing logs whether it is, say, flow out that is expensive. But I'll inline it.
| fwdFlowOutCand(call, ret, innercc, inner, out, apa, allowsFieldFlow) and | ||
| FwdTypeFlow::typeFlowValidEdgeOut(call, inner) and |
There was a problem hiding this comment.
Right, so just double-checking: You're choosing to make this non-linear join smaller by re-joining with fwdFlowIntoRet in a subsequent and additional non-linear join? It's a bit counter-intuitive at first, but presumably that's worth it?
| } | ||
| /** | ||
| * Exposes the inlined predicate `fwdFlowIn`, which is used to calculate both | ||
| * flow in and flow through. |
There was a problem hiding this comment.
So this means that the contents of this module is calculated in both contexts where one is essentially is a subset of the other. Hmm. All for the purpose of being able to project slightly earlier. This trade-off sounds dubious.
There was a problem hiding this comment.
This is the change that matters most to DB that I used to construct the performance numbers mentioned in the PR description. I reverified this by nomagicing the inlined fwdFlowIn, which results in
4m56s | 426 | 13.7s @ 223 | DataFlowImpl#248dabc3::MakeImpl#DataFlowImplSpecific#21008cd7::RubyDataFlow#::Impl#DataFlow#167ac380::DataFlowMake#DataFlowImplSpecific#21008cd7::RubyDataFlow#::Global#XSS#e59174e9::OrmTracking::Config#::C#::MkStage#Stage2#::Stage#Stage3Param#::FwdFlowIn#FwdFlowInNoRestriction#::fwdFlowIn#13#fffffffffffff@ae640x39
aschackmull
left a comment
There was a problem hiding this comment.
There are som obvious improvements here. Primarily I see the pruning of the call-context-dependent dispatch relation with the call edges from the prior stage as a likely important improvement. But there's also some changes that are harder to judge whether they are improvements or not. The overall picture appears to be a clear speedup, and the semantics appear to be correct, so I'm happy to merge as-is and then we can always iterate on the more dubious details in follow-up work.
4cf8eaf to
4fa93a0
Compare
| */ | ||
| private module FwdFlowIn<FwdFlowInInputSig I> { | ||
| pragma[nomagic] | ||
| private predicate callEdgeArgParamRestricted( |
Check warning
Code scanning / CodeQL
Candidate predicate not marked as `nomagic`
fwdFlowInpredicate into two predicates (using a parameterized module); one which is used to calculate flow in and one which is used to calculate flow through (fwdFlowIsEntered). This allows us to get rid of a lot of columns in the first predicate, specifically we don't need to record the argument that flows into the parameter. The second predicate does need to record the argument, however, we can restrict this to arguments/parameters that may actually flow through.fwdFlowIntoArg,fwdFlowIntoRet,dataFlowTakenCallEdgeIn0,fwdFlow1Param,dataFlowTakenCallEdgeOut0, andfwdFlow1Out.pathIntoCallable0by joining with a pruned dispatch relation.Timings before
Timings after