Refactor local query evaluation to prepare for debug adapter by dbartol · Pull Request #2244 · github/vscode-codeql

dbartol · 2023-03-27T16:08:46Z

Review tip: This PR will probably make more sense if you step through the code. I suggest that you do so after you've read the PR description, but before you try to make sense of the diffs. Set a breakpoint at compileAndRunQuery(), select the Run query against a single database... command, and follow along.

This PR refactors our existing code path for local query evaluation so that it can be used by the CodeQL Debug Adapter, which will be added in an upcoming PR.

Our existing code path for local query evaluation goes through several layers of code, but there is no clear separation of concerns between those layers. As one extreme example, the LocalQueryInfo object, which handles the UI presentation of the query evaluation in the query history view, is passed all the way down the same function that directly invokes the query server. The same goes for the InitialQueryInfo object, which has a couple properties that only apply to the UI. The QueryEvaluationInfo object, also passed all the way down, mixed some very handy functions for determining the paths to various files in the query's output directory with a bunch of properties and functions used in the UI, and even serialized to disk for query history persistence. The same goes for DatabaseItem.

The separate code paths to handle the new query server vs. the legacy query server has quite a bit of duplicated and almost-duplicated code, which, due to the abovementioned comingling of concerns, was difficult to factor out.

The comingling of UI and non-UI responsibilities also makes it more difficult for the code paths that invoke the query server but do not populate the query history (example: AST generation) to evaluate a query, because they have to create extra UI-related state that will never be used.

The changes in this PR thoroughly refactor the whole code path to separate concerns. Specifically, the QueryRunner abstract class and everything below it are now focused on the mechanics of the actual evaluation via the query server, with no connection to the UI other than a progress callback and a Logger inteface. Everything else, including query history UI, timestamp file generation, and evaluator log summarization, has been hoisted out of QueryRunner and its implementations.

Details

The two implementations of QueryRunner now focus solely on evaluating the query. In particular, the implementation of compileAndRunQueryAgainstDatabaseCore() for the new query server does nothing but set up the message to send to the query server, send the message, and then return the results. The implementation for the legacy query server does much more work, but only because the legacy query server interface requires it to. The evaluation is initialized by calling createQueryRun() on QueryRunner. This holds information that will be needed even before the evaluation completes, like the path to the evaluation's output directory. The actual evaluation is invoked by calling evaluate() on the CoreQueryRun object, which will evantually return the evaluation results.

The interface to QueryRunner now defines its own minimal interfaces to specify the arguments to the evaluation (query, database, additional search paths) and the results of the evaluation, rather than reusing the UI-related types from the higher layers.

I left the layout and functionality of most of the UI-related types (LocalQueryInfo, etc.) alone, because those are serialized directly to disk and deserialized from JSON by slapping a new prototype on the raw JSON object. We could do better than this, but improving this wasn't necessary for this PR.

I factored all the properties of QueryEvaluationInfo that just computed output file paths out into a new base class QueryOutputDir. This new class is now used throughout the lower-level code to find paths, without depending on other state that is only necessary for results interpretation.

The top-level command handling for the various Run query... commands had recently been factored out into a separate local-queries.ts file, instead of being a bunch of ugly nested functions within the extension initialization code (thanks, @koesie10!). I've converted that file to a full on LocalQueries service, in the pattern used by most of our other components. All of the UI-related code that used to be scattered throughout the query evaluation code path is now here. All of the UI setup that's done before query evaluation starts (picking an output directory, creating a cancellation token source, etc.) is done via createLocalQueryRun(), and all of the post-evaluation updates are done by calling complete() on the LocalQueryRun object. As the comment in the code indicates, this looks a little clunky as-is, but it will be necessary when we try to hook it up to the debug adapter.

The diffs in local-queries.ts look scarier than they are. I didn't actually change any of the commands except to put them into the new LocalQueries class. They all just go through compileAndRunQuery(), which is where the real changes are.

The AST and Go To Reference code now talks directly to the QueryRunner without having to manufacture the various UI-related state that it doesn't need.

I separated the existing Logger interface into BaseLogger, with just the log() function, and Logger, which adds the show() function. Code that only generates log messages needs only BaseLogger. The distinction isn't critical in this PR, but becomes more important when I add the debug adapter, which will need a BaseLogger that routes the messages over the Debug Adapter Protocol, but has no way to implement the show() function.

koesie10

I've only done a partial review of the code by looking at the diff, but haven't matched the old code to the new code. These are just some things that I spotted while looking through it.

koesie10 · 2023-03-28T09:38:53Z

+  }
+
+  public getCommands(): LocalQueryCommands {
+    const runQuery = async (uri: Uri | undefined) =>


Can we move these functions to be methods on the LocalQueries class (like in other classes where we use .bind(this))? That would reduce the nesting in this method which makes this easier to read.

Opened as separate PR, #2245, to avoid making this PR's diffs even more confusing.

koesie10 · 2023-03-28T10:11:37Z

+      evaluate: async (
+        progress: ProgressCallback,
+        token: CancellationToken,
+        logger: BaseLogger,
+      ): Promise<CoreCompletedQuery> => {


Would it actually be possible to make this a method on this class instead of having it returned as part of the interface? That would make it easier to deal with this result in other places and makes it easier to follow where this class is actually used (since it's still used implicitly when calling evaluate).

evaluate() needs to use several of the parameters already passed to createQueryRun(), so one way or another createQueryRun() has to return a new object to hold that state. I could make CoreQueryRun into a class, and make the state capture explicit. Would that make things more clear overall? Or are you suggesting that evaluate() be a member of QueryRunner but take a CoreQueryRun as a parameter?

I also was going to make a comment asking whether the evaluate method needs to live on this interface. It's a bit of a shame that we then have to explicitly omit it when defining the CoreCompletedQuery interface.

I note there's only one implementation, but I see what you mean about it needing access to state available in createQueryRun. But in the two places where we call evaluate, we also called createQueryRun earlier in that same method, so the state would still be there and could be passed into evaluate as well / instead.

While it's not apparent in this PR, my motivation for having CoreQueryRun as an object with an evaluate() function, rather than having two different functions on QueryRunner that require several common parameters, was that the debug adapter needs to initialize an evaluation in one function, but then later await it in another function. A CoreQueryRun object seemed like the natural way to encapsulate everything the debug adapter would need to later evaluate the query. I'd rather not have to have the debugger capture multiple parameters from the createQueryRun() method just to pass them to a later evaluateQueryRun() method.

The real problem here is that I want a function that evaluates a query, which is asynchronous, and so would normally return a Promise<CoreQueryResults>. However, just starting the evaluation produces some properties, like outputDir, that I'll need to use before the promise completes. I could just have a single runQuery() function on QueryRunner, and have it return something like Promise<CoreQueryResult> & CoreQueryRun (note that CoreQueryRun is not inside the angle brackets). I'm pretty sure I've seen this done somewhere before. I'm not sure if other developers would find it more or less odd than the separate evaluate() function I have now.

I suppose I could just replace the evaluate property in CoreQueryRun with results: Promise<CoreQueryResults>. That seems less confusing than anything I suggest above, and also less confusing than what I originally wrote.

koesie10 · 2023-03-28T10:13:18Z

+  }
+}
+
+export class QueryEvaluationInfo extends QueryOutputDir {


Could we make the output dirs a property on this class instead of having it extend it? It seems like the query evaluation info isn't really an instance of the output dir, but does use the output dir.

Normally I would do just that, but see the comment I just added for why we can't in this case.

koesie10 · 2023-03-28T12:08:29Z

+      break;
+    case messages.QueryResultType.TIMEOUT:
+      // This is the only legacy result type that doesn't exist for the new query server. Format the
+      // messasge here, and let the later code treat is as `OTHER_ERROR`.


Suggested change

// messasge here, and let the later code treat is as `OTHER_ERROR`.

// message here, and let the later code treat is as `OTHER_ERROR`.

You meant to s/is/it/, right? Fixed.

Oh, haha, the VS Code PR message UI doesn't show the diff between the original text and your suggestion, so I totally missed the typo you were fixing, and fixed another typo on the same line instead.

Fixed (both)

koesie10 · 2023-03-28T12:11:35Z

      return {
-        query: query.queryEvalInfo,
-        message,
-        result,
-        successful: result.resultType === messages.QueryResultType.SUCCESS,
-        logFileLocation: result.logFileLocation,
+        ...translateLegacyResult(result),
      };


Would return translateLegacyResult(result); also work here without the object spreading?

Oh, weird, must be an artifact of how my changes evolved. Fixed.

koesie10 · 2023-03-28T12:13:38Z

+   * This function must be called when the evaluation completes, whether the evaluation was
+   * successful or not.
+   * */
+  public async complete(results: CoreQueryResults): Promise<void> {


Do you think another name would cover the intended functionality of this method better? Something like saveAndShowQueryResults or similar? complete seems quite abstract and generic.

I was actually going for "abstract and generic":) My idea was that code that invokes a local query run just knows that it has to call complete() when it's done, and it's up to complete() to do whatever it is that the UI needs to do when the evaluation completes. Today, that's "save and show query results", but it's up to local-queries.ts to decide what to do. Think of it like an onLocalRunCompleted event, where the handler chooses to mark the local query item as completed and show the results view.

koesie10 · 2023-03-28T12:16:47Z

+  ): Promise<LocalQueryRun> {
+    await createTimestampFile(outputDir.querySaveDir);
+
+    if (this.queryRunner.customLogDirectory) {


Do we need to have this check both in this place and in the legacy query server? Is there a way to get to the legacy query server without going through this code path?

Good catch. I've removed redundant one in the legacy code path.

…nInfo`

robertbrignull

I've done one pass through, but it's a big and complex PR so I haven't really understood everything. I've made some surface-level comments on things I noticed.

Overall I think it looks like a very good direction and I don't have any huge concerns. Decoupling the UI code from the backend execution code will be a massive help. It's just quite complex so it's hard to see the full effect of the changes.

robertbrignull · 2023-03-28T10:34:37Z

  );
  void extLogger.log(
-    `Running contextual query ${query}; results will be stored in ${queryStorageDir}`,
+    `Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`,


Suggested change

`Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`,

`Running contextual query ${query}; results will be stored in ${queryRun.outputDir.querySaveDir}`,

Otherwise I think this will print the classic [Object object]

robertbrignull · 2023-03-28T15:48:58Z

+      const message = results.message
+        ? redactableError`${results.message}`
+        : redactableError`Failed to run query`;
+      void extLogger.log(message.fullMessage);
+      void showAndLogExceptionWithTelemetry(
+        redactableError`Failed to run query: ${message}`,
+      );


Suggested change

const message = results.message

? redactableError`${results.message}`

: redactableError`Failed to run query`;

void extLogger.log(message.fullMessage);

void showAndLogExceptionWithTelemetry(

redactableError`Failed to run query: ${message}`,

);

const message = results.message

? redactableError`Failed to run query: ${results.message}`

: redactableError`Failed to run query`;

void showAndLogExceptionWithTelemetry(message);

To avoid an error message of "Failed to run query: Failed to run query". Also, the showAndLogExceptionWithTelemetry method does log to extLogger internally, so logging the full message separately shouldn't be needed.

robertbrignull · 2023-03-28T15:51:01Z

+    public readonly queryHistoryManager: QueryHistoryManager,
+    private readonly databaseManager: DatabaseManager,
+    public readonly cliServer: CodeQLCliServer,


I don't think it's ideal that LocalQueries and LocalQueryRun each have public members that each other access. I think it shows that things could be more encapsulated, or we're storing references that should instead be passed in when needed. What do you think? Unfortunately I haven't dug deep enough to have any concrete suggestions at this time.

I agree.

I've just added the two services that LocalQueryRun needs as constructor parameters to LocalQueryRun, instead of having them exposed from LocalQueries. LocalQueryRun still keeps a reference to the LocalQueries service itself, so that it can call showResultsForCompletedQuery(), but that's just a public function on LocalQueries that is also used by other services, so I don't think that's a problem.

I've also made queryInfo private on LocalQueryRun. LocalQueries was only using it on the path where evaluation throws an exception instead of returning an "unsuccessful" result object. I've encapsulated that in the new fail() function on LocalQueryRun, analogous to the existing complete() function. I've also added a comment about how we have two different error paths for some reason.

aeisenberg · 2023-03-28T20:59:34Z

+  }
+  let jsonSummary: string | undefined = undefined;
+  let summarySymbols: string | undefined = undefined;
+  if (isCanary()) {


Unrelated to this PR, but I'm surprised this feature is still behind a canary flag.

It's still under the canary flag because we're not ready to support it for external users. It's been working fine, but performance debugging is still an internal-only feature for now.

aeisenberg · 2023-03-28T21:00:32Z

+
+/**
+ * Calls the appropriate CLI command to generate a human-readable log summary.
+ * @param qs The query server client.


Suggested change

* @param qs The query server client.

* @param cliServer The cli server client.

aeisenberg · 2023-03-28T21:10:04Z

+import { extLogger, Logger } from "../common";
 import * as messages from "../pure/legacy-messages";
-import { InitialQueryInfo, LocalQueryInfo } from "../query-results";
+import * as newMessages from "../pure/new-messages";


Instead of importing from newMessages here, can you put the shared messages in pure/messages-shared?

The only symbols we reference via newMessages are the QueryResultType status codes. We translate between the legacy ones and the new ones, and expose only the new ones via the QueryRunner.

aeisenberg · 2023-03-28T21:25:22Z

-        dbItem.name
-      }): their target languages are different. Please select a different database and try again.`,
+        query.queryPath,
+      )} cannot be run against the selected database: their target languages are different. Please select a different database and try again.`,


Minor: Would be nice to continue using the database name here, but not sure if that value is available with this refactoring.

If this were the new query server, I probably would have put more effort into that. Since it's just for the legacy server, it didn't seem worthwhile.

aeisenberg · 2023-03-28T21:37:55Z

+        // It's odd that we have two different ways for a query evaluation to fail: by throwing an
+        // exception, and by returning a result with a failure code. This is how the code worked
+        // before the refactoring, so it's been preserved, but we should probably figure out how
+        // to unify both error handling paths.


I think this is because it depends on where the error happens. If the error happens in the query server, then we get a result with a failure code. It can also happen that the error happens in the IDE, in which case, we need to handle the thrown exception.

It would be nice to figure out a way to unify.

I think it's probably easy enough to unify, but I didn't want to attempt that in this PR.

aeisenberg · 2023-03-28T21:40:55Z

-          progress,
-          token,
-          item.databaseItem,
+    // If possible, only show databases with the right language (otherwise show all databases).


Is this comment incorrect? It looks like we show an error if there are no suitable databases found. I'm not sure we want to show all databases.

I think the findLanguage() call below can return undefined if the user cancels the prompt to select a language, and in that case the filtering by language (and the error message) won't happen.

(And this is not new code)

aeisenberg · 2023-03-28T21:42:41Z

+          `The following databases were skipped:\n${skippedDatabases.join(
+            "\n",


The popup box is not very large. I don't think it can handle more than 3-4 lines. Have you tried it when there are lots of skipped DBs? Maybe better to concatenate by a ,.

This is not new code, although the diff view might make it look like it is. I only changed compiledAndRunQuery() and its callees.

aeisenberg · 2023-03-28T21:47:43Z

+    generateEvalLog: boolean,
+    additionalPacks: string[],
+    queryStorageDir: string,
+    id: string | undefined,


Minor: is this cleaner than what you have? And then you can remove actualId below.

Suggested change

id: string | undefined,

id: string = `${basename(query.queryPath)}-${nanoid()}`,

aeisenberg

At a high level, this is a good refactoring. I added a bunch of comments. I see no major issues with this.

Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>

…ode-codeql into dbartol/debug-context

aeisenberg

Minor comment reformatting, but looks good otherwise. I have not tried this out locally, so I assume this works for you.

Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>

Dave Bartolomeo added 7 commits March 24, 2023 17:27

Refactor query evaluation code to separate UI concerns

f99f465

Move UI-related code from QueryRunner into LocalQueries

d004f20

Fix test code

4019e6c

Add comments

543bada

Use LocalQueryRun in LocalQueries

348e923

Merge remote-tracking branch 'origin/main' into dbartol/debug-context

4ddf1c2

Fix test code

8c3fbb8

dbartol added code cleanup Not directly user-visible code improvements Complexity: High Requires careful design or review. tech-debt labels Mar 27, 2023

Fix test expectation

1acf312

dbartol marked this pull request as ready for review March 27, 2023 17:44

dbartol requested a review from a team as a code owner March 27, 2023 17:44

dbartol requested a review from robertbrignull March 28, 2023 14:47

koesie10 reviewed Mar 28, 2023

View reviewed changes

Dave Bartolomeo added 3 commits March 28, 2023 11:26

Add comment about QueryOutputDir as a base class of `QueryEvaluatio…

5251f06

…nInfo`

Remove unnecessary spread operator

bf0af78

Fix typo

fb7a0c8

robertbrignull reviewed Mar 28, 2023

View reviewed changes

Fix integration tests on Windows

b358f11

dbartol mentioned this pull request Mar 28, 2023

Move LocalQueries commands to be member functions #2245

Merged

Dave Bartolomeo added 5 commits March 28, 2023 12:52

Fix the _other_ typo

258cb55

Remove redundant check for custom log directory

2cb329b

Fix feedback

39e7caa

Better encapsulation of LocalQueryRun

f247687

Mock CancellationToken slightly better

a00db34

aeisenberg reviewed Mar 28, 2023

View reviewed changes

Dave Bartolomeo and others added 2 commits March 28, 2023 18:24

Update extensions/ql-vscode/src/run-queries-shared.ts

24ecd6f

Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>

Fix bad param reference

a8cd9c2

Dave Bartolomeo added 2 commits March 28, 2023 18:51

Use default value for id parameter of createQueryRun()

d84baed

Merge branch 'dbartol/debug-context' of https://github.com/github/vsc…

c712b29

…ode-codeql into dbartol/debug-context

aeisenberg approved these changes Mar 29, 2023

View reviewed changes

Comment thread extensions/ql-vscode/src/run-queries-shared.ts Outdated

Update extensions/ql-vscode/src/run-queries-shared.ts

41e0702

Co-authored-by: Andrew Eisenberg <aeisenberg@github.com>

aeisenberg approved these changes Mar 29, 2023

View reviewed changes

Merge remote-tracking branch 'origin/main' into dbartol/debug-context

110f5f9

dbartol enabled auto-merge March 30, 2023 14:10

dbartol merged commit dd9f6bf into main Mar 30, 2023

dbartol deleted the dbartol/debug-context branch March 30, 2023 14:23

koesie10 mentioned this pull request Apr 3, 2023

Add initial data extensions editor #2257

Closed

3 tasks

	// messasge here, and let the later code treat is as `OTHER_ERROR`.
	// message here, and let the later code treat is as `OTHER_ERROR`.

	`Running contextual query ${query}; results will be stored in ${queryRun.outputDir}`,
	`Running contextual query ${query}; results will be stored in ${queryRun.outputDir.querySaveDir}`,

	* @param qs The query server client.
	* @param cliServer The cli server client.

		`The following databases were skipped:\n${skippedDatabases.join(
		"\n",

	id: string \| undefined,
	id: string = `${basename(query.queryPath)}-${nanoid()}`,

Conversation

dbartol commented Mar 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Details

Uh oh!

koesie10 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robertbrignull left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dbartol commented Mar 27, 2023 •

edited

Loading

dbartol Mar 28, 2023 •

edited

Loading