Skip to content

C++: Fix indirect taint#14571

Merged
jketema merged 4 commits intogithub:mainfrom
MathiasVP:fix-indirect-taint
Oct 24, 2023
Merged

C++: Fix indirect taint#14571
jketema merged 4 commits intogithub:mainfrom
MathiasVP:fix-indirect-taint

Conversation

@MathiasVP
Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP commented Oct 23, 2023

The motivation for this PR was that we failed to track this flow when the source is whatever indirect_source() points to:

void test_indirect_taint() {
  std::map<int, int*> m;
  int* p = indirect_source();
  m[1] = p;
  int* q = m[1];
  sink(q); // $ MISSING: ast ir
}

The fix was in our interpretation of models (in ModelUtil.qll) so that the model for operator[] correctly carried indirect flow from *p to m (this causes pointer/pointee conflation but since these are taint models that's fine).

Finally, fixing the interpretation of models revealed a missing flow step for smart pointers. Specifically, we didn't model that operator-> and get performed a load.

…truction()' on line 81 wasn't necessarily a 'CallInstruction' (it could be a conversion).
on smart pointers perform a load.
@MathiasVP MathiasVP requested a review from a team as a code owner October 23, 2023 19:13
@github-actions github-actions Bot added the C++ label Oct 23, 2023
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Oct 23, 2023
@MathiasVP MathiasVP force-pushed the fix-indirect-taint branch 2 times, most recently from d79ffe7 to 67ed12c Compare October 24, 2023 07:50
@MathiasVP MathiasVP requested a review from jketema October 24, 2023 07:50
@MathiasVP
Copy link
Copy Markdown
Contributor Author

MathiasVP commented Oct 24, 2023

@jketema I've assigned you as a reviewer since it fixes a problem you've observed. Let me know if you need any assistance on the reviewing side 🙇!

(the force-push was just me messing around - it's the same code that was tested on DCA)

Comment on lines +49 to +51
result.(IndirectArgumentOutNode).getIndirectionIndex() = indirectionIndex - 1 and
result.(IndirectArgumentOutNode).getCallInstruction() = call and
output.isParameterDerefOrQualifierObject(index, indirectionIndex)
output.isParameterDerefOrQualifierObject(index, indirectionIndex - 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the subtractions needed here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not quite happy with that. The indirection index of IndirectArgumentOutNode is defined to be [1 .. n] (up to some number of indirections), but in the caller of this predicate (callOutput/3) the not exists(getIndirectReturnOutNode(call, indirectionIndex + d)) predicate expects the indirection index to be defined [0 .. n - 1] (because that's the indirection index range of IndirectReturnOutNode since it can be both a "direct return" with indirection index being 0, or an indirect return with indirection index being >= 1).

So while it's very difficult to reason about this indirection index, it's very easy to see when they're wrong since any other offsets added here breaks some of our tests 😂. I'd like to clean up this so that we don't need these subtractions at some point, but I'd rather not add too much more onto this PR.

Comment thread cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/ModelUtil.qll Outdated
Comment on lines -88 to -92
// The side effect of a call on the value pointed to by an argument or qualifier
exists(Operand operand, int indirectionIndex |
Ssa::outNodeHasAddressAndIndex(n, operand, indirectionIndex) and
Ssa::outNodeHasAddressAndIndex(result, operand, indirectionIndex + d)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this no longer needed?

Copy link
Copy Markdown
Contributor Author

@MathiasVP MathiasVP Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled by the result = callOutputWithIndirectionIndex(call, output, indirectionIndex + d) line because callOutputWithIndirectionIndex has a case for IndirectArgumentOutNode. So when we have:

n = callOutputWithIndirectionIndex(call, output, indirectionIndex) and
result = callOutputWithIndirectionIndex(call, output, indirectionIndex + d)

in callOutput, we get an equivalent condition.

Does that make sense?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially. My main problem is, that it's rather hard for me to map the old code on the new code, because it uses different predicates.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not an easy commit to review as it's not a clear line-for-line mapping 😢. The main point of the commit is to get an indirection index out of callOutput/2 (which has been renamed to callOutputWithIndirectionIndex) so that we can compute indirectionIndex + d on line 92 to "increment" the number of indirections of the resulting node by d.

@jketema jketema merged commit ba67217 into github:main Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants