Skip to content

C++: include stack-allocated arrays in off-by-one query#13116

Merged
rdmarsh2 merged 7 commits intogithub:mainfrom
rdmarsh2:rdmarsh2/cpp/cobo-array-vars
Jun 26, 2023
Merged

C++: include stack-allocated arrays in off-by-one query#13116
rdmarsh2 merged 7 commits intogithub:mainfrom
rdmarsh2:rdmarsh2/cpp/cobo-array-vars

Conversation

@rdmarsh2
Copy link
Copy Markdown
Contributor

Depends on #13045. Only a96cc6d is new.

Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a96cc6d looks good to me (with the exception of a small comment which will likely be resolved in the base PR anyway).

module ArrayAddressToDerefConfig implements DataFlow::StateConfigSig {
newtype FlowState =
additional TArray(Field f) or
additional TArray(Variable v) or
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in #13045, we can probably restrict these variables to be the ones satisfying pointerArithOverflow(_, v, _, _, _).

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/cpp/cobo-array-vars branch from a96cc6d to 6e230e1 Compare May 26, 2023 17:05
Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM once:

  • We have a happy DCA run, and
  • We've addressed the two Omittable 'exists' variable alerts. I think they highlight the possibility of a nice cleanup.

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/cpp/cobo-array-vars branch from d991181 to c9c93ca Compare June 1, 2023 16:52
@MathiasVP MathiasVP force-pushed the rdmarsh2/cpp/cobo-array-vars branch from 61b07b0 to e32f7d8 Compare June 24, 2023 23:36
@MathiasVP
Copy link
Copy Markdown
Contributor

MathiasVP commented Jun 26, 2023

I'm looking at results now, and I'll update this message as I go through them.

Done. See analysis below. The TLDR is: Samate is very happy with these (on main this query finds 0 Samate results, and with this PR we find 556 TPs), and all the other projects have a few that I'd classify as TPs, but this also very much exposes the need for some kind of barrier in this query (which we should do as a follow-up).

Samate

All the new results are classified as TPs in Samate 🎉.

PHP

Lots of new results (the query had 0 results on this project before), and now has 1952 results. The majority of results are in an unrolled implementation of md5, which I believe are all FPs. However, there are some results in this file that looks suspicious. For example here:

char trypath[MAXPATHLEN];
/* ... */
if (filename_length > (MAXPATHLEN - 2) || len > MAXPATHLEN || len + 1 + filename_length + 1 >= MAXPATHLEN) {
  break;
}
memcpy(trypath, ptr, len);
trypath[len] = '/';

ImageMagick

As PHP, we didn't have any results on this project before, but now we have ~3k results. The ones I've checked so far as been FPs like:

status=FunctionImage(image,PolynomialFunction,2,coefficients,exception);

(notice the 2!). This calls

MagickExport MagickBooleanType FunctionImage(Image *image,
  const MagickFunction function,const size_t number_parameters,
  const double *parameters,ExceptionInfo *exception)

where we flow to

q[i]=ApplyFunction(q[i],function,number_parameters,parameters, exception);

which calls

static Quantum ApplyFunction(Quantum pixel,const MagickFunction function,
  const size_t number_parameters,const double *parameters,
  ExceptionInfo *exception)

which has code that looks like:

width=(number_parameters >= 1) ? parameters[0] : 1.0;
center=(number_parameters >= 2) ? parameters[1] : 0.5;
range=(number_parameters >= 3) ? parameters[2] : 1.0; // <--- We raise an alert here.
bias=(number_parameters >= 4) ? parameters[3] : 0.5;

But this is infeasible because number_parameters is equal to 2 (which we got from the very outermost call). So this FP will be super difficult to get rid of properly since we don't our call contexts are at most 1 level deep.

Wireshark

We went from 10 to 450 results. Most of these new results actually looks like TPs to me. For example, this snippet looks kinda suspicious:

if( length < sizeof name + 1 )
{
  name[ length++ ] = (gchar) c;
}

irregardless of what kind of array name is (it's a gchar name[ TEXT_BUFFER_SIZE ]; in my database).

Similarly, this alert also looks super suspicious:

static guint cid_adjust[MAX_CID];
/* ... */
for (i = 0; i < MAX_CID; i++) {
  /* ... */
}
/* ... */
cid_index = i;
/* ... */
cid_adjust[cid_index] += cid_vernier[cid_index]; // <-- alert here.

OpenJDK

Lots of new results. Before we had 10 and now we have 290. There are some FPs related to (once again) us not realizing that a path that crosses a function boundary is infeasible. For example here:

double y[2];
/* ... */
n  =  __kernel_rem_pio2(tx,y,e0,nx,2,two_over_pi);

which flows to:

static int __kernel_rem_pio2(double *x, double *y, int e0, int nx, int prec, const int *ipio2) {

which flows to:

  switch(prec) {
  case 0:
    ...
    break;
  case 1:
  case 2:
    ...
    break;
  case 3:
    ...
    y[2] =  fw; // <-- alert here
  }

so detecting that this is safe would just involve a call context of depth 1 (which the dataflow library already supports). But we haven't implemented the required predicate for C/C++: https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll#L812

Vim

We go from ~70 results to ~380 results. The majority of them are code like this alert that start here:

typval_T argv[2];
/* ... */
if (eval_expr_typval(expr, argv, 1, &rettv) == FAIL)

which flows to:

    int
eval_expr_typval(typval_T *expr, typval_T *argv, int argc, typval_T *rettv)
{

which passes down argc and argv a few calls down, where we reach argvars (and where we know, but the query doesn't, that argcount is 2):

    int
call_internal_method(
	char_u	    *name,
	int	    argcount,
	typval_T    *argvars,
	typval_T    *rettv,
	typval_T    *basetv)
{

which finally flows to

for (i = 2; i < argcount; ++i)
	    argv[i + 1] = argvars[i];
    }
    else if (global_functions[fi].f_argtype == FEARG_4)
    {
	// base value goes fourth
	argv[0] = argvars[0];
	argv[1] = argvars[1];
	argv[2] = argvars[2]; // <-- alert here.

but that's infeasible because argcount is 2, so the loop body isn't entered.

Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve this so that @rdmarsh2 can merge the PR once he's looked at my changes (i.e., e32f7d8 and onwards) and the analysis I did of the DCA results.

@rdmarsh2
Copy link
Copy Markdown
Contributor Author

I'm a little surprised e32f7d8 is a performance win, but DCA is definitely happier afterwards. LGTM

@rdmarsh2 rdmarsh2 merged commit 757f40c into github:main Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants