Refactor local query evaluation to prepare for debug adapter#2244
Refactor local query evaluation to prepare for debug adapter#2244
Conversation
koesie10
left a comment
There was a problem hiding this comment.
I've only done a partial review of the code by looking at the diff, but haven't matched the old code to the new code. These are just some things that I spotted while looking through it.
| } | ||
|
|
||
| public getCommands(): LocalQueryCommands { | ||
| const runQuery = async (uri: Uri | undefined) => |
There was a problem hiding this comment.
Can we move these functions to be methods on the LocalQueries class (like in other classes where we use .bind(this))? That would reduce the nesting in this method which makes this easier to read.
There was a problem hiding this comment.
Opened as separate PR, #2245, to avoid making this PR's diffs even more confusing.
| evaluate: async ( | ||
| progress: ProgressCallback, | ||
| token: CancellationToken, | ||
| logger: BaseLogger, | ||
| ): Promise<CoreCompletedQuery> => { |
There was a problem hiding this comment.
Would it actually be possible to make this a method on this class instead of having it returned as part of the interface? That would make it easier to deal with this result in other places and makes it easier to follow where this class is actually used (since it's still used implicitly when calling evaluate).
There was a problem hiding this comment.
evaluate() needs to use several of the parameters already passed to createQueryRun(), so one way or another createQueryRun() has to return a new object to hold that state. I could make CoreQueryRun into a class, and make the state capture explicit. Would that make things more clear overall? Or are you suggesting that evaluate() be a member of QueryRunner but take a CoreQueryRun as a parameter?
There was a problem hiding this comment.
I also was going to make a comment asking whether the evaluate method needs to live on this interface. It's a bit of a shame that we then have to explicitly omit it when defining the CoreCompletedQuery interface.
I note there's only one implementation, but I see what you mean about it needing access to state available in createQueryRun. But in the two places where we call evaluate, we also called createQueryRun earlier in that same method, so the state would still be there and could be passed into evaluate as well / instead.
There was a problem hiding this comment.
While it's not apparent in this PR, my motivation for having CoreQueryRun as an object with an evaluate() function, rather than having two different functions on QueryRunner that require several common parameters, was that the debug adapter needs to initialize an evaluation in one function, but then later await it in another function. A CoreQueryRun object seemed like the natural way to encapsulate everything the debug adapter would need to later evaluate the query. I'd rather not have to have the debugger capture multiple parameters from the createQueryRun() method just to pass them to a later evaluateQueryRun() method.
The real problem here is that I want a function that evaluates a query, which is asynchronous, and so would normally return a Promise<CoreQueryResults>. However, just starting the evaluation produces some properties, like outputDir, that I'll need to use before the promise completes. I could just have a single runQuery() function on QueryRunner, and have it return something like Promise<CoreQueryResult> & CoreQueryRun (note that CoreQueryRun is not inside the angle brackets). I'm pretty sure I've seen this done somewhere before. I'm not sure if other developers would find it more or less odd than the separate evaluate() function I have now.
I suppose I could just replace the evaluate property in CoreQueryRun with results: Promise<CoreQueryResults>. That seems less confusing than anything I suggest above, and also less confusing than what I originally wrote.
| } | ||
| } | ||
|
|
||
| export class QueryEvaluationInfo extends QueryOutputDir { |
There was a problem hiding this comment.
Could we make the output dirs a property on this class instead of having it extend it? It seems like the query evaluation info isn't really an instance of the output dir, but does use the output dir.
There was a problem hiding this comment.
Normally I would do just that, but see the comment I just added for why we can't in this case.
| break; | ||
| case messages.QueryResultType.TIMEOUT: | ||
| // This is the only legacy result type that doesn't exist for the new query server. Format the | ||
| // messasge here, and let the later code treat is as `OTHER_ERROR`. |
There was a problem hiding this comment.
| // messasge here, and let the later code treat is as `OTHER_ERROR`. | |
| // message here, and let the later code treat is as `OTHER_ERROR`. |
There was a problem hiding this comment.
You meant to s/is/it/, right? Fixed.
There was a problem hiding this comment.
Oh, haha, the VS Code PR message UI doesn't show the diff between the original text and your suggestion, so I totally missed the typo you were fixing, and fixed another typo on the same line instead.
| return { | ||
| query: query.queryEvalInfo, | ||
| message, | ||
| result, | ||
| successful: result.resultType === messages.QueryResultType.SUCCESS, | ||
| logFileLocation: result.logFileLocation, | ||
| ...translateLegacyResult(result), | ||
| }; |
There was a problem hiding this comment.
Would return translateLegacyResult(result); also work here without the object spreading?
There was a problem hiding this comment.
Oh, weird, must be an artifact of how my changes evolved. Fixed.
| * This function must be called when the evaluation completes, whether the evaluation was | ||
| * successful or not. | ||
| * */ | ||
| public async complete(results: CoreQueryResults): Promise<void> { |
There was a problem hiding this comment.
Do you think another name would cover the intended functionality of this method better? Something like saveAndShowQueryResults or similar? complete seems quite abstract and generic.
There was a problem hiding this comment.
I was actually going for "abstract and generic":) My idea was that code that invokes a local query run just knows that it has to call complete() when it's done, and it's up to complete() to do whatever it is that the UI needs to do when the evaluation completes. Today, that's "save and show query results", but it's up to local-queries.ts to decide what to do. Think of it like an onLocalRunCompleted event, where the handler chooses to mark the local query item as completed and show the results view.
| ): Promise<LocalQueryRun> { | ||
| await createTimestampFile(outputDir.querySaveDir); | ||
|
|
||
| if (this.queryRunner.customLogDirectory) { |
There was a problem hiding this comment.
Do we need to have this check both in this place and in the legacy query server? Is there a way to get to the legacy query server without going through this code path?
There was a problem hiding this comment.
Good catch. I've removed redundant one in the legacy code path.
robertbrignull
left a comment
There was a problem hiding this comment.
I've done one pass through, but it's a big and complex PR so I haven't really understood everything. I've made some surface-level comments on things I noticed.
Overall I think it looks like a very good direction and I don't have any huge concerns. Decoupling the UI code from the backend execution code will be a massive help. It's just quite complex so it's hard to see the full effect of the changes.
| ); | ||
| void extLogger.log( | ||
| `Running contextual query ${query}; results will be stored in ${queryStorageDir}`, | ||
| `Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`, |
There was a problem hiding this comment.
| `Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`, | |
| `Running contextual query ${query}; results will be stored in ${queryRun.outputDir.querySaveDir}`, |
Otherwise I think this will print the classic [Object object]
| const message = results.message | ||
| ? redactableError`${results.message}` | ||
| : redactableError`Failed to run query`; | ||
| void extLogger.log(message.fullMessage); | ||
| void showAndLogExceptionWithTelemetry( | ||
| redactableError`Failed to run query: ${message}`, | ||
| ); |
There was a problem hiding this comment.
| const message = results.message | |
| ? redactableError`${results.message}` | |
| : redactableError`Failed to run query`; | |
| void extLogger.log(message.fullMessage); | |
| void showAndLogExceptionWithTelemetry( | |
| redactableError`Failed to run query: ${message}`, | |
| ); | |
| const message = results.message | |
| ? redactableError`Failed to run query: ${results.message}` | |
| : redactableError`Failed to run query`; | |
| void showAndLogExceptionWithTelemetry(message); |
To avoid an error message of "Failed to run query: Failed to run query". Also, the showAndLogExceptionWithTelemetry method does log to extLogger internally, so logging the full message separately shouldn't be needed.
| public readonly queryHistoryManager: QueryHistoryManager, | ||
| private readonly databaseManager: DatabaseManager, | ||
| public readonly cliServer: CodeQLCliServer, |
There was a problem hiding this comment.
I don't think it's ideal that LocalQueries and LocalQueryRun each have public members that each other access. I think it shows that things could be more encapsulated, or we're storing references that should instead be passed in when needed. What do you think? Unfortunately I haven't dug deep enough to have any concrete suggestions at this time.
There was a problem hiding this comment.
I agree.
I've just added the two services that LocalQueryRun needs as constructor parameters to LocalQueryRun, instead of having them exposed from LocalQueries. LocalQueryRun still keeps a reference to the LocalQueries service itself, so that it can call showResultsForCompletedQuery(), but that's just a public function on LocalQueries that is also used by other services, so I don't think that's a problem.
I've also made queryInfo private on LocalQueryRun. LocalQueries was only using it on the path where evaluation throws an exception instead of returning an "unsuccessful" result object. I've encapsulated that in the new fail() function on LocalQueryRun, analogous to the existing complete() function. I've also added a comment about how we have two different error paths for some reason.
| } | ||
| let jsonSummary: string | undefined = undefined; | ||
| let summarySymbols: string | undefined = undefined; | ||
| if (isCanary()) { |
There was a problem hiding this comment.
Unrelated to this PR, but I'm surprised this feature is still behind a canary flag.
There was a problem hiding this comment.
It's still under the canary flag because we're not ready to support it for external users. It's been working fine, but performance debugging is still an internal-only feature for now.
|
|
||
| /** | ||
| * Calls the appropriate CLI command to generate a human-readable log summary. | ||
| * @param qs The query server client. |
There was a problem hiding this comment.
| * @param qs The query server client. | |
| * @param cliServer The cli server client. |
| import { extLogger, Logger } from "../common"; | ||
| import * as messages from "../pure/legacy-messages"; | ||
| import { InitialQueryInfo, LocalQueryInfo } from "../query-results"; | ||
| import * as newMessages from "../pure/new-messages"; |
There was a problem hiding this comment.
Instead of importing from newMessages here, can you put the shared messages in pure/messages-shared?
There was a problem hiding this comment.
The only symbols we reference via newMessages are the QueryResultType status codes. We translate between the legacy ones and the new ones, and expose only the new ones via the QueryRunner.
| dbItem.name | ||
| }): their target languages are different. Please select a different database and try again.`, | ||
| query.queryPath, | ||
| )} cannot be run against the selected database: their target languages are different. Please select a different database and try again.`, |
There was a problem hiding this comment.
Minor: Would be nice to continue using the database name here, but not sure if that value is available with this refactoring.
There was a problem hiding this comment.
If this were the new query server, I probably would have put more effort into that. Since it's just for the legacy server, it didn't seem worthwhile.
| // It's odd that we have two different ways for a query evaluation to fail: by throwing an | ||
| // exception, and by returning a result with a failure code. This is how the code worked | ||
| // before the refactoring, so it's been preserved, but we should probably figure out how | ||
| // to unify both error handling paths. |
There was a problem hiding this comment.
I think this is because it depends on where the error happens. If the error happens in the query server, then we get a result with a failure code. It can also happen that the error happens in the IDE, in which case, we need to handle the thrown exception.
It would be nice to figure out a way to unify.
There was a problem hiding this comment.
I think it's probably easy enough to unify, but I didn't want to attempt that in this PR.
| progress, | ||
| token, | ||
| item.databaseItem, | ||
| // If possible, only show databases with the right language (otherwise show all databases). |
There was a problem hiding this comment.
Is this comment incorrect? It looks like we show an error if there are no suitable databases found. I'm not sure we want to show all databases.
There was a problem hiding this comment.
I think the findLanguage() call below can return undefined if the user cancels the prompt to select a language, and in that case the filtering by language (and the error message) won't happen.
(And this is not new code)
| `The following databases were skipped:\n${skippedDatabases.join( | ||
| "\n", |
There was a problem hiding this comment.
The popup box is not very large. I don't think it can handle more than 3-4 lines. Have you tried it when there are lots of skipped DBs? Maybe better to concatenate by a ,.
There was a problem hiding this comment.
This is not new code, although the diff view might make it look like it is. I only changed compiledAndRunQuery() and its callees.
| generateEvalLog: boolean, | ||
| additionalPacks: string[], | ||
| queryStorageDir: string, | ||
| id: string | undefined, |
There was a problem hiding this comment.
Minor: is this cleaner than what you have? And then you can remove actualId below.
| id: string | undefined, | |
| id: string = `${basename(query.queryPath)}-${nanoid()}`, |
aeisenberg
left a comment
There was a problem hiding this comment.
At a high level, this is a good refactoring. I added a bunch of comments. I see no major issues with this.
Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>
aeisenberg
left a comment
There was a problem hiding this comment.
Minor comment reformatting, but looks good otherwise. I have not tried this out locally, so I assume this works for you.
Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>
Review tip: This PR will probably make more sense if you step through the code. I suggest that you do so after you've read the PR description, but before you try to make sense of the diffs. Set a breakpoint at
compileAndRunQuery(), select theRun query against a single database...command, and follow along.This PR refactors our existing code path for local query evaluation so that it can be used by the CodeQL Debug Adapter, which will be added in an upcoming PR.
Our existing code path for local query evaluation goes through several layers of code, but there is no clear separation of concerns between those layers. As one extreme example, the
LocalQueryInfoobject, which handles the UI presentation of the query evaluation in the query history view, is passed all the way down the same function that directly invokes the query server. The same goes for theInitialQueryInfoobject, which has a couple properties that only apply to the UI. TheQueryEvaluationInfoobject, also passed all the way down, mixed some very handy functions for determining the paths to various files in the query's output directory with a bunch of properties and functions used in the UI, and even serialized to disk for query history persistence. The same goes forDatabaseItem.The separate code paths to handle the new query server vs. the legacy query server has quite a bit of duplicated and almost-duplicated code, which, due to the abovementioned comingling of concerns, was difficult to factor out.
The comingling of UI and non-UI responsibilities also makes it more difficult for the code paths that invoke the query server but do not populate the query history (example: AST generation) to evaluate a query, because they have to create extra UI-related state that will never be used.
The changes in this PR thoroughly refactor the whole code path to separate concerns. Specifically, the
QueryRunnerabstract class and everything below it are now focused on the mechanics of the actual evaluation via the query server, with no connection to the UI other than a progress callback and aLoggerinteface. Everything else, including query history UI, timestamp file generation, and evaluator log summarization, has been hoisted out ofQueryRunnerand its implementations.Details
The two implementations of
QueryRunnernow focus solely on evaluating the query. In particular, the implementation ofcompileAndRunQueryAgainstDatabaseCore()for the new query server does nothing but set up the message to send to the query server, send the message, and then return the results. The implementation for the legacy query server does much more work, but only because the legacy query server interface requires it to. The evaluation is initialized by callingcreateQueryRun()onQueryRunner. This holds information that will be needed even before the evaluation completes, like the path to the evaluation's output directory. The actual evaluation is invoked by callingevaluate()on theCoreQueryRunobject, which will evantually return the evaluation results.The interface to
QueryRunnernow defines its own minimal interfaces to specify the arguments to the evaluation (query, database, additional search paths) and the results of the evaluation, rather than reusing the UI-related types from the higher layers.I left the layout and functionality of most of the UI-related types (
LocalQueryInfo, etc.) alone, because those are serialized directly to disk and deserialized from JSON by slapping a new prototype on the raw JSON object. We could do better than this, but improving this wasn't necessary for this PR.I factored all the properties of
QueryEvaluationInfothat just computed output file paths out into a new base classQueryOutputDir. This new class is now used throughout the lower-level code to find paths, without depending on other state that is only necessary for results interpretation.The top-level command handling for the various
Run query...commands had recently been factored out into a separatelocal-queries.tsfile, instead of being a bunch of ugly nested functions within the extension initialization code (thanks, @koesie10!). I've converted that file to a full onLocalQueriesservice, in the pattern used by most of our other components. All of the UI-related code that used to be scattered throughout the query evaluation code path is now here. All of the UI setup that's done before query evaluation starts (picking an output directory, creating a cancellation token source, etc.) is done viacreateLocalQueryRun(), and all of the post-evaluation updates are done by callingcomplete()on theLocalQueryRunobject. As the comment in the code indicates, this looks a little clunky as-is, but it will be necessary when we try to hook it up to the debug adapter.The diffs in
local-queries.tslook scarier than they are. I didn't actually change any of the commands except to put them into the newLocalQueriesclass. They all just go throughcompileAndRunQuery(), which is where the real changes are.The AST and Go To Reference code now talks directly to the
QueryRunnerwithout having to manufacture the various UI-related state that it doesn't need.I separated the existing
Loggerinterface intoBaseLogger, with just thelog()function, andLogger, which adds theshow()function. Code that only generates log messages needs onlyBaseLogger. The distinction isn't critical in this PR, but becomes more important when I add the debug adapter, which will need aBaseLoggerthat routes the messages over the Debug Adapter Protocol, but has no way to implement theshow()function.