Add retries of CLI integration tests by koesie10 · Pull Request #1801 · github/vscode-codeql

koesie10 · 2022-11-28T10:34:56Z

This will patch jest-runner-vscode to retry tests. See the separate commits for more information about each step.

Checklist

CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
Issues have been created for any UI or other user-facing changes made by this pull request.
[Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

This will patch `jest-runner-vscode` to retry tests. This is a temporary test to see if this will help with the flakiness of the CLI integration tests. The biggest problem with this is that it will launch multiple VSCode instances on every retry: - First try (not a retry): 1 instance - Second try: 2 instances - Third try: 3 instances - etc. I'm not sure why this is happening and can't really narrow it down to a specific cause. Even if I change the `runVSCode` call for the retry by a simple `cp.spawn` call, it still launches multiple instances.

Multiple VSCode instances were being launched when a second instance of VSCode was being spawned with the same user data directory. This is probably because VSCode restores the windows from the previous session, even when `-n`/`--new-window` is passed. This fixes it by patching `jest-runner-vscode` to always create a new temporary user data directory, rather than re-using the same one for all test suites.

This will update the `jest-runner-vscode` patch to retry tests that fail due to no test result being returned from the test runner. This will also add some retries to the `minimal-workspace` and `no-workspace` tests to help with flakiness.

elenatanasoiu · 2022-11-28T18:03:33Z

    ...baseConfig.extensionTestsEnv,
    INTEGRATION_TEST_MODE: "true",
  },
+  retries: 3,


Is there any guarantee that 3 retries will resolve the flakiness?

I know it's a lot to ask after so much effort doing the jest migration, but are we able to pinpoint what the failure reasons are at the moment for windows?

aeisenberg

In my experience with mocha, running the CLI tests on retries didn't work well. In fact, the retries usually masked the original problem. For the CLI tests, if there was a failure on the first try, the retries would always fail due to some state being left around (though I never dug too deeply into what it was). Perhaps the --new-window command will work, but please make sure that running a retry will have the possibility of being successful.

One suggestion is to inject a known error into CLI integration tests, push it up to CI and make sure that error (and only that error) happens for each retry.

aeisenberg · 2022-11-28T18:15:31Z

+            fs_1.default.rm(tempUserDir, { recursive: true }, () => {
+                promiseResolve();
+            });
+        };


Reading a diff of a diff is making my brain hurt. 😅

aeisenberg · 2022-11-28T18:17:19Z

@@ -11,7 +11,6 @@ const config = {
  launchArgs: [
    "--disable-gpu",
    "--extensions-dir=" + path.join(rootDir, ".vscode-test", "extensions"),


Should we be disabling potentially conflicting extensions here as well? There was a list of them when running with mocha. Not sure if these extensions are being explicitly disabled any more.

The extensions are being disabled in the configuration file specific to the cli-integration tests:

vscode-codeql/extensions/ql-vscode/src/vscode-tests/cli-integration/jest-runner-vscode.config.js

Lines 14 to 19 in 7ba9027

"--disable-extension",

"eamodio.gitlens",

"--disable-extension",

"github.codespaces",

"--disable-extension",

"github.copilot",

We are also disabling all extensions in the configuration for the no-workspace and minimal-workspace tests:

vscode-codeql/extensions/ql-vscode/src/vscode-tests/no-workspace/jest-runner-vscode.config.js

Line 6 in 7ba9027

launchArgs: [...(baseConfig.launchArgs ?? []), "--disable-extensions"],

This should match the previous Mocha setup in terms of which extensions are enabled/disabled.

koesie10 · 2022-11-29T08:00:35Z

Thanks for the reviews!

I've found three common errors on Windows, for which retries do seem to work:

"VSCode exited with exit code 1": I don't really know why, but VSCode occasionally seems to exit without running all tests. Usually, restarting VSCode and trying again works.
"No test result returned": Again, not sure why this happens, but sometimes VSCode exits without reporting all test results. Usually, restarting VSCode and trying again works.
Timeouts on the legacy-query.test.ts. For this one, retrying also usually works. I'm hesitant to increase the timeout since it's already set to 60 seconds per test/hook, so increasing it further might just prolong the time it takes until failure.

I'm not 100% confident that 3 retries will always be enough for resolving the flakiness, but this should at least reduce it to a somewhat reasonable level. Before this PR, we would sometimes get 4 failing CLI integration tests on Windows in a row, which is really frustrating to deal with and takes a lot of time just to get the tests to pass.

elenatanasoiu · 2022-11-29T14:59:33Z

Thanks for writing up the error cases. From what you've described, I think there's two possibilities:

We're not loading a dependency.
We're loading too many dependencies and they get in the way of each other.

Given that the tests do eventually pass when VSCode is restarted, I'm inclined to think it's the second option. Something hangs when VSCode is loaded and gets in the way of tests being run successfully. When you restart, this process is killed so it unblocks everything else.

Are you able to make the exit message more verbose when it does quit unexpectedly? E.g. spit out the entire log?

Also, is it always the same version of windows that fails?

koesie10 · 2022-11-29T15:10:32Z

We're loading too many dependencies and they get in the way of each other.

This is probably part of the problem. We are not bundling the extension (yet), so VSCode needs to load thousands of files to even load the extension. IO on Windows is a lot slower than on Linux, so this can take hundreds of milliseconds to seconds in the worst cases. In subsequent retries, it might be that the filesystem cache is simply "hot" and it takes less time to load all files.

Are you able to make the exit message more verbose when it does quit unexpectedly? E.g. spit out the entire log?

I don't think it's actually related to our tests, running tests in VSCode just seems to be somewhat unstable on Windows. We are already logging all information we have, since we only receive the exit code of the process and the extension host logs are already being logged. See for example this run where you can see that there are some logs of the extension host, but nothing specifically saying why VSCode exited.

You can see in that run that we're also running two instances of VSCode. That should also be fixed by this PR, but won't fix the flakiness.

Also, is it always the same version of windows that fails?

No, I don't think we can narrow it down to a specific CLI version.

elenatanasoiu · 2022-11-29T15:44:23Z

No, I don't think we can narrow it down to a specific CLI version.

Ah, I see. I thought it was a version of windows specifically used for GitHub Actions. But it's the CLI version. Gotcha.

Bundling the extension does sounds promising.

elenatanasoiu · 2022-11-29T15:45:31Z

@@ -41,7 +41,7 @@ index 8657ace..4d35409 100644
 -export default function runVSCode({ vscodePath, args, jestArgs, env, tests, globalConfig, filterOutput, onStart, onResult, onFailure, ipc, quiet, }: RunVSCodeOptions): Promise<void>;
 +export default function runVSCode(options: RunVSCodeOptions): Promise<void>;


On adding changes to jest-runner-vscode, was there a reason to prefer a patch directly in the project over forking the package?

The original idea was to upstream the changes (there's a PR open for the earlier patch), but it doesn't seem likely that the maintainer is actively checking PRs. I think we can definitely fork the package as a future improvement, although that makes it somewhat harder to make these sorts of changes quickly. We would also need to change the package to publish to our own npm package, and I'm not sure how involved it would be to publish it under the @github scope.

I'll create an issue for investigating whether this is something we can do, although I think it's better to keep it in the project for now so the Jest migration can be merged into main.

elenatanasoiu · 2022-11-29T15:47:54Z

It looks like running tests before Jest takes around 12-14 mins while running tests with Jest + retries takes 14 mins so there doesn't seem to be much of a time difference there.

elenatanasoiu

So just to recap:

We have a feature branch which contains all of the changes we've made to migrate us to Jest. Without merging that, we're somewhat blocked from writing new tests as they'll need converting. This PR is an attempt to unblock the feature branch so it can be merged.
We discovered that for some CLI versions, the migrated tests fail on windows.
We also discovered that multiple instances of VSCode were being started, even after specifically providing the --new-window or -n option. This PR is creating a fresh folder every time we run our tests and start a new VSCode instance. This will stop VSCode from trying to restore previous instances.
As far as I can tell, jest-runner-vscode helps us set up different sets of folder tests which require different setups for the VSCode instance. Perhaps a solution would be to aim for having a single set of tests in the future with a single VSCode instance?
We're able to add a retry option to jest-runner-vscode but the project is currently not actively maintained (last human PR was merged in December 2021). To get around this we've added the retries patch straight into node_modules.

Given that this work will also unblock us from switching to ESBuild, which in turn would make this extension a lot easier to load during tests, I'm feeling optimistic that we can revert the retry patch in the future.

aeisenberg

Thanks for the hard work on this. I know fixing these kinds of flaky tests is tedious and time consuming.

@elenatanasoiu are there specific CLI tests that are failing on specific versions of the CLI?

elenatanasoiu · 2022-11-29T17:29:31Z

They seem random:

Also some ubuntu ones seem to flake, but I'm unsure that's related:

koesie10 added 3 commits November 28, 2022 11:32

koesie10 force-pushed the jest-migration/retry-tests branch from e30ceab to 7ba9027 Compare November 28, 2022 12:06

koesie10 added the secexp label Nov 28, 2022

koesie10 requested a review from a team November 28, 2022 12:36

koesie10 marked this pull request as ready for review November 28, 2022 12:36

koesie10 requested a review from a team as a code owner November 28, 2022 12:36

elenatanasoiu reviewed Nov 28, 2022

View reviewed changes

aeisenberg reviewed Nov 28, 2022

View reviewed changes

koesie10 mentioned this pull request Nov 29, 2022

Fix loading results in second opened view #1806

Merged

3 tasks

elenatanasoiu reviewed Nov 29, 2022

View reviewed changes

elenatanasoiu approved these changes Nov 29, 2022

View reviewed changes

aeisenberg approved these changes Nov 29, 2022

View reviewed changes

koesie10 merged commit 6dc1f9f into jest-migration/integration-tests Nov 30, 2022

koesie10 deleted the jest-migration/retry-tests branch November 30, 2022 07:49

koesie10 mentioned this pull request Feb 13, 2023

Improve test setup #2058

Merged

3 tasks

	"--disable-extension",
	"eamodio.gitlens",
	"--disable-extension",
	"github.codespaces",
	"--disable-extension",
	"github.copilot",

		@@ -41,7 +41,7 @@ index 8657ace..4d35409 100644
		-export default function runVSCode({ vscodePath, args, jestArgs, env, tests, globalConfig, filterOutput, onStart, onResult, onFailure, ipc, quiet, }: RunVSCodeOptions): Promise<void>;
		+export default function runVSCode(options: RunVSCodeOptions): Promise<void>;

Conversation

koesie10 commented Nov 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

elenatanasoiu Nov 28, 2022

Choose a reason for hiding this comment

Uh oh!

aeisenberg left a comment

Choose a reason for hiding this comment

Uh oh!

aeisenberg Nov 28, 2022

Choose a reason for hiding this comment

Uh oh!

aeisenberg Nov 28, 2022

Choose a reason for hiding this comment

Uh oh!

koesie10 Nov 29, 2022

Choose a reason for hiding this comment

Uh oh!

koesie10 commented Nov 29, 2022

Uh oh!

elenatanasoiu commented Nov 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

koesie10 commented Nov 29, 2022

Uh oh!

elenatanasoiu commented Nov 29, 2022

Uh oh!

elenatanasoiu Nov 29, 2022

Choose a reason for hiding this comment

Uh oh!

koesie10 Nov 29, 2022

Choose a reason for hiding this comment

Uh oh!

elenatanasoiu commented Nov 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elenatanasoiu left a comment

Choose a reason for hiding this comment

Uh oh!

aeisenberg left a comment

Choose a reason for hiding this comment

Uh oh!

elenatanasoiu commented Nov 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

koesie10 commented Nov 28, 2022 •

edited

Loading

elenatanasoiu commented Nov 29, 2022 •

edited

Loading

elenatanasoiu commented Nov 29, 2022 •

edited

Loading

elenatanasoiu commented Nov 29, 2022 •

edited

Loading