Skip to content

Refactor local query evaluation to prepare for debug adapter#2244

Merged
dbartol merged 23 commits intomainfrom
dbartol/debug-context
Mar 30, 2023
Merged

Refactor local query evaluation to prepare for debug adapter#2244
dbartol merged 23 commits intomainfrom
dbartol/debug-context

Conversation

@dbartol
Copy link
Copy Markdown
Contributor

@dbartol dbartol commented Mar 27, 2023

Review tip: This PR will probably make more sense if you step through the code. I suggest that you do so after you've read the PR description, but before you try to make sense of the diffs. Set a breakpoint at compileAndRunQuery(), select the Run query against a single database... command, and follow along.

This PR refactors our existing code path for local query evaluation so that it can be used by the CodeQL Debug Adapter, which will be added in an upcoming PR.

Our existing code path for local query evaluation goes through several layers of code, but there is no clear separation of concerns between those layers. As one extreme example, the LocalQueryInfo object, which handles the UI presentation of the query evaluation in the query history view, is passed all the way down the same function that directly invokes the query server. The same goes for the InitialQueryInfo object, which has a couple properties that only apply to the UI. The QueryEvaluationInfo object, also passed all the way down, mixed some very handy functions for determining the paths to various files in the query's output directory with a bunch of properties and functions used in the UI, and even serialized to disk for query history persistence. The same goes for DatabaseItem.

The separate code paths to handle the new query server vs. the legacy query server has quite a bit of duplicated and almost-duplicated code, which, due to the abovementioned comingling of concerns, was difficult to factor out.

The comingling of UI and non-UI responsibilities also makes it more difficult for the code paths that invoke the query server but do not populate the query history (example: AST generation) to evaluate a query, because they have to create extra UI-related state that will never be used.

The changes in this PR thoroughly refactor the whole code path to separate concerns. Specifically, the QueryRunner abstract class and everything below it are now focused on the mechanics of the actual evaluation via the query server, with no connection to the UI other than a progress callback and a Logger inteface. Everything else, including query history UI, timestamp file generation, and evaluator log summarization, has been hoisted out of QueryRunner and its implementations.

Details

The two implementations of QueryRunner now focus solely on evaluating the query. In particular, the implementation of compileAndRunQueryAgainstDatabaseCore() for the new query server does nothing but set up the message to send to the query server, send the message, and then return the results. The implementation for the legacy query server does much more work, but only because the legacy query server interface requires it to. The evaluation is initialized by calling createQueryRun() on QueryRunner. This holds information that will be needed even before the evaluation completes, like the path to the evaluation's output directory. The actual evaluation is invoked by calling evaluate() on the CoreQueryRun object, which will evantually return the evaluation results.

The interface to QueryRunner now defines its own minimal interfaces to specify the arguments to the evaluation (query, database, additional search paths) and the results of the evaluation, rather than reusing the UI-related types from the higher layers.

I left the layout and functionality of most of the UI-related types (LocalQueryInfo, etc.) alone, because those are serialized directly to disk and deserialized from JSON by slapping a new prototype on the raw JSON object. We could do better than this, but improving this wasn't necessary for this PR.

I factored all the properties of QueryEvaluationInfo that just computed output file paths out into a new base class QueryOutputDir. This new class is now used throughout the lower-level code to find paths, without depending on other state that is only necessary for results interpretation.

The top-level command handling for the various Run query... commands had recently been factored out into a separate local-queries.ts file, instead of being a bunch of ugly nested functions within the extension initialization code (thanks, @koesie10!). I've converted that file to a full on LocalQueries service, in the pattern used by most of our other components. All of the UI-related code that used to be scattered throughout the query evaluation code path is now here. All of the UI setup that's done before query evaluation starts (picking an output directory, creating a cancellation token source, etc.) is done via createLocalQueryRun(), and all of the post-evaluation updates are done by calling complete() on the LocalQueryRun object. As the comment in the code indicates, this looks a little clunky as-is, but it will be necessary when we try to hook it up to the debug adapter.

The diffs in local-queries.ts look scarier than they are. I didn't actually change any of the commands except to put them into the new LocalQueries class. They all just go through compileAndRunQuery(), which is where the real changes are.

The AST and Go To Reference code now talks directly to the QueryRunner without having to manufacture the various UI-related state that it doesn't need.

I separated the existing Logger interface into BaseLogger, with just the log() function, and Logger, which adds the show() function. Code that only generates log messages needs only BaseLogger. The distinction isn't critical in this PR, but becomes more important when I add the debug adapter, which will need a BaseLogger that routes the messages over the Debug Adapter Protocol, but has no way to implement the show() function.

@dbartol dbartol added code cleanup Not directly user-visible code improvements Complexity: High Requires careful design or review. tech-debt labels Mar 27, 2023
@dbartol dbartol marked this pull request as ready for review March 27, 2023 17:44
@dbartol dbartol requested a review from a team as a code owner March 27, 2023 17:44
@dbartol dbartol requested a review from robertbrignull March 28, 2023 14:47
Copy link
Copy Markdown
Member

@koesie10 koesie10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only done a partial review of the code by looking at the diff, but haven't matched the old code to the new code. These are just some things that I spotted while looking through it.

}

public getCommands(): LocalQueryCommands {
const runQuery = async (uri: Uri | undefined) =>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these functions to be methods on the LocalQueries class (like in other classes where we use .bind(this))? That would reduce the nesting in this method which makes this easier to read.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened as separate PR, #2245, to avoid making this PR's diffs even more confusing.

Comment on lines +124 to +128
evaluate: async (
progress: ProgressCallback,
token: CancellationToken,
logger: BaseLogger,
): Promise<CoreCompletedQuery> => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it actually be possible to make this a method on this class instead of having it returned as part of the interface? That would make it easier to deal with this result in other places and makes it easier to follow where this class is actually used (since it's still used implicitly when calling evaluate).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluate() needs to use several of the parameters already passed to createQueryRun(), so one way or another createQueryRun() has to return a new object to hold that state. I could make CoreQueryRun into a class, and make the state capture explicit. Would that make things more clear overall? Or are you suggesting that evaluate() be a member of QueryRunner but take a CoreQueryRun as a parameter?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also was going to make a comment asking whether the evaluate method needs to live on this interface. It's a bit of a shame that we then have to explicitly omit it when defining the CoreCompletedQuery interface.

I note there's only one implementation, but I see what you mean about it needing access to state available in createQueryRun. But in the two places where we call evaluate, we also called createQueryRun earlier in that same method, so the state would still be there and could be passed into evaluate as well / instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's not apparent in this PR, my motivation for having CoreQueryRun as an object with an evaluate() function, rather than having two different functions on QueryRunner that require several common parameters, was that the debug adapter needs to initialize an evaluation in one function, but then later await it in another function. A CoreQueryRun object seemed like the natural way to encapsulate everything the debug adapter would need to later evaluate the query. I'd rather not have to have the debugger capture multiple parameters from the createQueryRun() method just to pass them to a later evaluateQueryRun() method.

The real problem here is that I want a function that evaluates a query, which is asynchronous, and so would normally return a Promise<CoreQueryResults>. However, just starting the evaluation produces some properties, like outputDir, that I'll need to use before the promise completes. I could just have a single runQuery() function on QueryRunner, and have it return something like Promise<CoreQueryResult> & CoreQueryRun (note that CoreQueryRun is not inside the angle brackets). I'm pretty sure I've seen this done somewhere before. I'm not sure if other developers would find it more or less odd than the separate evaluate() function I have now.

I suppose I could just replace the evaluate property in CoreQueryRun with results: Promise<CoreQueryResults>. That seems less confusing than anything I suggest above, and also less confusing than what I originally wrote.

}
}

export class QueryEvaluationInfo extends QueryOutputDir {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make the output dirs a property on this class instead of having it extend it? It seems like the query evaluation info isn't really an instance of the output dir, but does use the output dir.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally I would do just that, but see the comment I just added for why we can't in this case.

break;
case messages.QueryResultType.TIMEOUT:
// This is the only legacy result type that doesn't exist for the new query server. Format the
// messasge here, and let the later code treat is as `OTHER_ERROR`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// messasge here, and let the later code treat is as `OTHER_ERROR`.
// message here, and let the later code treat is as `OTHER_ERROR`.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant to s/is/it/, right? Fixed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, haha, the VS Code PR message UI doesn't show the diff between the original text and your suggestion, so I totally missed the typo you were fixing, and fixed another typo on the same line instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (both)

Comment on lines 448 to 450
return {
query: query.queryEvalInfo,
message,
result,
successful: result.resultType === messages.QueryResultType.SUCCESS,
logFileLocation: result.logFileLocation,
...translateLegacyResult(result),
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would return translateLegacyResult(result); also work here without the object spreading?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, weird, must be an artifact of how my changes evolved. Fixed.

* This function must be called when the evaluation completes, whether the evaluation was
* successful or not.
* */
public async complete(results: CoreQueryResults): Promise<void> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think another name would cover the intended functionality of this method better? Something like saveAndShowQueryResults or similar? complete seems quite abstract and generic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually going for "abstract and generic":) My idea was that code that invokes a local query run just knows that it has to call complete() when it's done, and it's up to complete() to do whatever it is that the UI needs to do when the evaluation completes. Today, that's "save and show query results", but it's up to local-queries.ts to decide what to do. Think of it like an onLocalRunCompleted event, where the handler chooses to mark the local query item as completed and show the results view.

): Promise<LocalQueryRun> {
await createTimestampFile(outputDir.querySaveDir);

if (this.queryRunner.customLogDirectory) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have this check both in this place and in the legacy query server? Is there a way to get to the legacy query server without going through this code path?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've removed redundant one in the legacy code path.

Copy link
Copy Markdown
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done one pass through, but it's a big and complex PR so I haven't really understood everything. I've made some surface-level comments on things I noticed.

Overall I think it looks like a very good direction and I don't have any huge concerns. Decoupling the UI code from the backend execution code will be a massive help. It's just quite complex so it's hard to see the full effect of the changes.

);
void extLogger.log(
`Running contextual query ${query}; results will be stored in ${queryStorageDir}`,
`Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`,
`Running contextual query ${query}; results will be stored in ${queryRun.outputDir.querySaveDir}`,

Otherwise I think this will print the classic [Object object]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines +181 to +187
const message = results.message
? redactableError`${results.message}`
: redactableError`Failed to run query`;
void extLogger.log(message.fullMessage);
void showAndLogExceptionWithTelemetry(
redactableError`Failed to run query: ${message}`,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const message = results.message
? redactableError`${results.message}`
: redactableError`Failed to run query`;
void extLogger.log(message.fullMessage);
void showAndLogExceptionWithTelemetry(
redactableError`Failed to run query: ${message}`,
);
const message = results.message
? redactableError`Failed to run query: ${results.message}`
: redactableError`Failed to run query`;
void showAndLogExceptionWithTelemetry(message);

To avoid an error message of "Failed to run query: Failed to run query". Also, the showAndLogExceptionWithTelemetry method does log to extLogger internally, so logging the full message separately shouldn't be needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +212 to +214
public readonly queryHistoryManager: QueryHistoryManager,
private readonly databaseManager: DatabaseManager,
public readonly cliServer: CodeQLCliServer,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's ideal that LocalQueries and LocalQueryRun each have public members that each other access. I think it shows that things could be more encapsulated, or we're storing references that should instead be passed in when needed. What do you think? Unfortunately I haven't dug deep enough to have any concrete suggestions at this time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

I've just added the two services that LocalQueryRun needs as constructor parameters to LocalQueryRun, instead of having them exposed from LocalQueries. LocalQueryRun still keeps a reference to the LocalQueries service itself, so that it can call showResultsForCompletedQuery(), but that's just a public function on LocalQueries that is also used by other services, so I don't think that's a problem.

I've also made queryInfo private on LocalQueryRun. LocalQueries was only using it on the path where evaluation throws an exception instead of returning an "unsuccessful" result object. I've encapsulated that in the new fail() function on LocalQueryRun, analogous to the existing complete() function. I've also added a comment about how we have two different error paths for some reason.

Comment thread extensions/ql-vscode/src/run-queries-shared.ts Outdated
}
let jsonSummary: string | undefined = undefined;
let summarySymbols: string | undefined = undefined;
if (isCanary()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but I'm surprised this feature is still behind a canary flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still under the canary flag because we're not ready to support it for external users. It's been working fine, but performance debugging is still an internal-only feature for now.


/**
* Calls the appropriate CLI command to generate a human-readable log summary.
* @param qs The query server client.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param qs The query server client.
* @param cliServer The cli server client.

import { extLogger, Logger } from "../common";
import * as messages from "../pure/legacy-messages";
import { InitialQueryInfo, LocalQueryInfo } from "../query-results";
import * as newMessages from "../pure/new-messages";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of importing from newMessages here, can you put the shared messages in pure/messages-shared?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only symbols we reference via newMessages are the QueryResultType status codes. We translate between the legacy ones and the new ones, and expose only the new ones via the QueryRunner.

dbItem.name
}): their target languages are different. Please select a different database and try again.`,
query.queryPath,
)} cannot be run against the selected database: their target languages are different. Please select a different database and try again.`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Would be nice to continue using the database name here, but not sure if that value is available with this refactoring.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were the new query server, I probably would have put more effort into that. Since it's just for the legacy server, it didn't seem worthwhile.

Comment on lines +494 to +497
// It's odd that we have two different ways for a query evaluation to fail: by throwing an
// exception, and by returning a result with a failure code. This is how the code worked
// before the refactoring, so it's been preserved, but we should probably figure out how
// to unify both error handling paths.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because it depends on where the error happens. If the error happens in the query server, then we get a result with a failure code. It can also happen that the error happens in the IDE, in which case, we need to handle the thrown exception.

It would be nice to figure out a way to unify.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably easy enough to unify, but I didn't want to attempt that in this PR.

progress,
token,
item.databaseItem,
// If possible, only show databases with the right language (otherwise show all databases).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment incorrect? It looks like we show an error if there are no suitable databases found. I'm not sure we want to show all databases.

Copy link
Copy Markdown
Contributor Author

@dbartol dbartol Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the findLanguage() call below can return undefined if the user cancels the prompt to select a language, and in that case the filtering by language (and the error message) won't happen.

(And this is not new code)

Comment on lines +565 to +566
`The following databases were skipped:\n${skippedDatabases.join(
"\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The popup box is not very large. I don't think it can handle more than 3-4 lines. Have you tried it when there are lots of skipped DBs? Maybe better to concatenate by a ,.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not new code, although the diff view might make it look like it is. I only changed compiledAndRunQuery() and its callees.

Comment thread extensions/ql-vscode/src/queryRunner.ts
Comment thread extensions/ql-vscode/src/queryRunner.ts Outdated
generateEvalLog: boolean,
additionalPacks: string[],
queryStorageDir: string,
id: string | undefined,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: is this cleaner than what you have? And then you can remove actualId below.

Suggested change
id: string | undefined,
id: string = `${basename(query.queryPath)}-${nanoid()}`,

Copy link
Copy Markdown
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, this is a good refactoring. I added a bunch of comments. I see no major issues with this.

Dave Bartolomeo and others added 2 commits March 28, 2023 18:24
Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>
Copy link
Copy Markdown
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment reformatting, but looks good otherwise. I have not tried this out locally, so I assume this works for you.

Comment thread extensions/ql-vscode/src/run-queries-shared.ts Outdated
Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>
@dbartol dbartol enabled auto-merge March 30, 2023 14:10
@dbartol dbartol merged commit dd9f6bf into main Mar 30, 2023
@dbartol dbartol deleted the dbartol/debug-context branch March 30, 2023 14:23
@koesie10 koesie10 mentioned this pull request Apr 3, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code cleanup Not directly user-visible code improvements Complexity: High Requires careful design or review. tech-debt

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants