C++: add overflow detection to new range analysis#12599
C++: add overflow detection to new range analysis#12599MathiasVP merged 16 commits intogithub:mainfrom
Conversation
MathiasVP
left a comment
There was a problem hiding this comment.
I've done a first pass over the PR, but I think I'd like another round before I claim to fully understand what's going on. In the meantime, we should:
- Fix the merge conflicts. 700a566 makes me think some of the changes have already been merged into
main. - Check the performance on this. I suggest you follow the approach I had here: #11928 (comment)
| predicate typeBounds(SemType t, float lb, float ub) { | ||
| exists(SemIntegerType integralType, float limit | | ||
| integralType = t and limit = 2.pow(8 * integralType.getByteSize()) | ||
| | | ||
| if integralType instanceof SemBooleanType | ||
| then lb = 0 and ub = 1 | ||
| else | ||
| if integralType.isSigned() | ||
| then ( | ||
| lb = -(limit / 2) and ub = (limit / 2) - 1 | ||
| ) else ( | ||
| lb = 0 and ub = limit - 1 | ||
| ) | ||
| ) | ||
| or | ||
| // This covers all floating point types. The range is (-Inf, +Inf). | ||
| t instanceof SemFloatingPointType and lb = -(1.0 / 0.0) and ub = 1.0 / 0.0 | ||
| } |
There was a problem hiding this comment.
This looked super scary until I realized it was copy/pasted from SimpleRangeAnalysis.
| import cpp | ||
| import experimental.semmle.code.cpp.semantic.analysis.ModulusAnalysis | ||
| import experimental.semmle.code.cpp.semantic.Semantic | ||
| //import experimental.semmle.code.cpp.semantic.Semantic |
There was a problem hiding this comment.
Why is this commented out? Should we just delete it?
| signature module OverflowSig<DeltaSig D> { | ||
| predicate semExprDoesntOverflow(boolean positively, SemExpr expr); | ||
| } |
There was a problem hiding this comment.
I think this module signature (and its one predicate) could use some QLDoc.
There was a problem hiding this comment.
Nitpicking: could we also write the predicate as semExprDoesNotOverflow. It took me a while to parse Doesnt 😅
| typeBounds(expr.getSemType(), lb, ub) and | ||
| ConstantStage::initialBounded(expr, any(ConstantBounds::SemZeroBound b), delta, positively.booleanNot(), _, _, _) | ||
| | | ||
| positively = true and delta < ub | ||
| or | ||
| positively = false and delta > lb | ||
| ) |
There was a problem hiding this comment.
Just checking my understanding here: We're finding the minimum and maximum values from the the of expr, and then we're comparing those values with the bounds we can obtain from the constant stage.
I'm not sure I understand the logic, though. If we're trying to deduce that expr doesn't overflow positively, we're checking whether delta (which is the lower bound of expr) is less than the maximum value of the type? Shouldn't this be the other way around? Or have I confused myself?
| /* | ||
| predicate semExprOverflow(float delta, boolean upper, SemExpr expr) { | ||
| exists(float lb, float ub | typeBounds(expr.getSemType(), lb, ub) | | ||
| upper = false and delta < lb | ||
| or | ||
| upper = true and delta > ub | ||
| ) | ||
| } | ||
| */ |
| * due to a conversion. | ||
| */ | ||
| predicate exprMightOverflowPositively(Expr expr) { none() } | ||
| predicate exprMightOverflowPositively(Expr expr) { upperBound(expr) > exprMaxVal(expr) } |
There was a problem hiding this comment.
I was confused about this for a second: we only generate bounds for expressions that won't overflow (due to the disjunction in the bounded predicate), but here it looks like we're looking for expressions that actually do have an overflowing upper bound. However, this works because upperBound is defined to only use the constant stage of range-analysis which doesn't (and can't) have that disjunction that rules out overflowing expressions. Is this correct?
Could we write this explicitly in a comment here, or add QLDoc to upperBound and lowerBound to make this clear?
| int total = 0; | ||
| for (i = 0; i < 2; i = i+1) { | ||
| range(i); // $ range=<=1 range=>=0 | ||
| range(i); // $ range=<=1 MISSING: range=>=0 |
There was a problem hiding this comment.
Are we losing these results because we're now claiming that i+1 on line 41 overflows?
In general: Should we add an InlineExpectationsTest that highlights all the places where we're claiming an overflow might occur. This would give me a lot more confidence in what's actually going on here.
3c49432 to
204dbee
Compare
| positively = false and | ||
| ( | ||
| expr.getOpcode() instanceof Opcode::Negate or | ||
| expr.getOpcode() instanceof Opcode::SubOne | ||
| ) | ||
| or | ||
| positively = false and | ||
| expr.(SemDivExpr).getSemType() instanceof SemFloatingPointType |
There was a problem hiding this comment.
| positively = false and | |
| ( | |
| expr.getOpcode() instanceof Opcode::Negate or | |
| expr.getOpcode() instanceof Opcode::SubOne | |
| ) | |
| or | |
| positively = false and | |
| expr.(SemDivExpr).getSemType() instanceof SemFloatingPointType | |
| positively = false and | |
| ( | |
| expr.getOpcode() instanceof Opcode::Negate or | |
| expr.getOpcode() instanceof Opcode::SubOne or | |
| expr.(SemDivExpr).getSemType() instanceof SemFloatingPointType | |
| ) |
MathiasVP
left a comment
There was a problem hiding this comment.
Thanks for pushing through with this, Robert! I have a few comments, but it's mostly just to help my understanding.
Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>
|
DCA is showing a really bad join in |
Before:
```
Tuple counts for RangeAnalysisStage#38d7ce80::RangeStage#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisImpl#edd69a76::ConstantBounds#FloatDelta#0eab55d1::FloatOverflow#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::potentiallyOverflowingExpr#2#ff/2@36ed7auu after 42.1s:
365 ~0% {2} r1 = JOIN num#SemanticOpcode#e6f455a5::TNegate#f WITH SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs ON FIRST 1 OUTPUT false, Rhs.1 'expr'
0 ~0% {2} r2 = JOIN num#SemanticOpcode#e6f455a5::TSubOne#f WITH SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs ON FIRST 1 OUTPUT false, Rhs.1 'expr'
365 ~0% {2} r3 = r1 UNION r2
0 ~0% {2} r4 = JOIN num#SemanticOpcode#e6f455a5::TAddOne#f WITH project#SemanticExpr#91573b9a::SemKnownExpr#class#fff_10#join_rhs ON FIRST 1 OUTPUT true, Rhs.1 'expr'
2 ~0% {2} r5 = JOIN m#RangeAnalysisStage#38d7ce80::RangeStage#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisImpl#edd69a76::ConstantBounds#FloatDelta#0eab55d1::FloatOverflow#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::boundedPhiInp1#6#ffbfff WITH num#SemanticOpcode#e6f455a5::TMul#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'positively'
22026 ~0% {2} r6 = JOIN r5 WITH SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'positively', Rhs.1 'expr'
2 ~0% {2} r7 = JOIN m#RangeAnalysisStage#38d7ce80::RangeStage#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisImpl#edd69a76::ConstantBounds#FloatDelta#0eab55d1::FloatOverflow#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::boundedPhiInp1#6#ffbfff WITH num#SemanticOpcode#e6f455a5::TShiftLeft#f CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 'positively'
1978 ~0% {2} r8 = JOIN r7 WITH SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'positively', Rhs.1 'expr'
24004 ~0% {2} r9 = r6 UNION r8
24004 ~0% {2} r10 = r4 UNION r9
24369 ~0% {2} r11 = r3 UNION r10
2726 ~1% {2} r12 = JOIN project#SemanticExpr#91573b9a::SemDivExpr#fffff WITH project#SemanticExpr#91573b9a::SemKnownExpr#class#fff#2 ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'expr'
1900 ~2% {2} r13 = JOIN r12 WITH SemanticType#3725723c::SemFloatingPointType#ff ON FIRST 1 OUTPUT false, Lhs.1 'expr'
4500 ~0% {1} r14 = JOIN SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs WITH num#SemanticOpcode#e6f455a5::TAdd#f ON FIRST 1 OUTPUT Lhs.1 'expr'
0 ~0% {1} r15 = JOIN SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs WITH num#SemanticOpcode#e6f455a5::TPointerAdd#f ON FIRST 1 OUTPUT Lhs.1 'expr'
4500 ~0% {1} r16 = r14 UNION r15
4000 ~0% {2} r17 = JOIN r16 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff#2 ON FIRST 1 OUTPUT Rhs.1, Lhs.0 'expr'
7000 ~0% {2} r18 = JOIN r17 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'expr'
1264 ~0% {2} r19 = JOIN r18 WITH num#Sign#2ecc774b::TNeg#f ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'expr'
188324151 ~0% {2} r20 = JOIN r19 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'expr', Rhs.1
1000 ~0% {2} r21 = JOIN r20 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff ON FIRST 2 OUTPUT false, Lhs.0 'expr'
2900 ~0% {2} r22 = r13 UNION r21
3259 ~2% {2} r23 = JOIN r18 WITH num#Sign#2ecc774b::TPos#f ON FIRST 1 OUTPUT Lhs.0, Lhs.1 'expr'
1521124720 ~0% {2} r24 = JOIN r23 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1 'expr', Rhs.1
3000 ~2% {2} r25 = JOIN r24 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff ON FIRST 2 OUTPUT true, Lhs.0 'expr'
```
(I stopped evaluation midway.)
After:
```ql
Evaluated relational algebra for predicate RangeAnalysisStage#38d7ce80::RangeStage#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisImpl#edd69a76::ConstantBounds#FloatDelta#0eab55d1::FloatOverflow#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::potentiallyOverflowingExpr#2#ff@dc3a0712 with tuple counts:
26269 ~2% {2} r1 = _SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs_m#RangeAnalysisStage#38d7ce80::R__#shared UNION _SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs_SemanticType#3725723c::SemFloati__#shared
26269 ~2% {2} r2 = _num#SemanticOpcode#e6f455a5::TAddOne#f_project#SemanticExpr#91573b9a::SemKnownExpr#class#fff_10#joi__#shared UNION r1
41333 ~1% {2} r3 = JOIN _SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs_num#SemanticOpcode#e6f455a5::TAd__#shared WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
5806 ~2% {2} r4 = JOIN r3 WITH num#Sign#2ecc774b::TNeg#f ON FIRST 1 OUTPUT Lhs.1, Lhs.0
5806 ~1% {3} r5 = JOIN r4 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff#2 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0
3612 ~0% {2} r6 = JOIN r5 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 2 OUTPUT false, Lhs.2
18476 ~1% {2} r7 = JOIN r3 WITH num#Sign#2ecc774b::TPos#f ON FIRST 1 OUTPUT Lhs.1, Lhs.0
18476 ~1% {3} r8 = JOIN r7 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff#2 ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0
18444 ~2% {2} r9 = JOIN r8 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 2 OUTPUT true, Lhs.2
22056 ~0% {2} r10 = r6 UNION r9
24137 ~2% {2} r11 = JOIN _SemanticExpr#91573b9a::SemExpr::getOpcode#0#dispred#fb_10#join_rhs_num#SemanticOpcode#e6f455a5::TPo__#shared WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
16966 ~2% {1} r12 = JOIN r11 WITH num#Sign#2ecc774b::TPos#f ON FIRST 1 OUTPUT Lhs.1
16966 ~4% {2} r13 = JOIN r12 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff ON FIRST 1 OUTPUT Rhs.1, Lhs.0
24917 ~1% {2} r14 = JOIN r13 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
2781 ~0% {2} r15 = JOIN r14 WITH num#Sign#2ecc774b::TNeg#f ON FIRST 1 OUTPUT false, Lhs.1
2817 ~0% {1} r16 = JOIN r11 WITH num#Sign#2ecc774b::TNeg#f ON FIRST 1 OUTPUT Lhs.1
2817 ~0% {2} r17 = JOIN r16 WITH project#SemanticExpr#91573b9a::SemBinaryExpr#fffff ON FIRST 1 OUTPUT Rhs.1, Lhs.0
6922 ~0% {2} r18 = JOIN r17 WITH SignAnalysisCommon#4b1623af::SignAnalysis#FloatDelta#0eab55d1::FloatDelta#RangeUtils#6da26777::RangeUtil#FloatDelta#0eab55d1::FloatDelta#RangeAnalysisConstantSpecific#878f81e8::CppLangImplConstant##::semExprSign#1#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1
2765 ~1% {2} r19 = JOIN r18 WITH num#Sign#2ecc774b::TPos#f ON FIRST 1 OUTPUT true, Lhs.1
5546 ~0% {2} r20 = r15 UNION r19
27602 ~0% {2} r21 = r10 UNION r20
53871 ~0% {2} r22 = r2 UNION r21
return r22
```
|
@rdmarsh2 I've pushed a join-order fix, and I'll restart DCA. |
No description provided.