Skip to content

Add retries of CLI integration tests#1801

Merged
koesie10 merged 3 commits intojest-migration/integration-testsfrom
jest-migration/retry-tests
Nov 30, 2022
Merged

Add retries of CLI integration tests#1801
koesie10 merged 3 commits intojest-migration/integration-testsfrom
jest-migration/retry-tests

Conversation

@koesie10
Copy link
Copy Markdown
Member

@koesie10 koesie10 commented Nov 28, 2022

This will patch jest-runner-vscode to retry tests. See the separate commits for more information about each step.

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

This will patch `jest-runner-vscode` to retry tests. This is a temporary
test to see if this will help with the flakiness of the CLI integration
tests.

The biggest problem with this is that it will launch multiple VSCode
instances on every retry:
- First try (not a retry): 1 instance
- Second try: 2 instances
- Third try: 3 instances
- etc.

I'm not sure why this is happening and can't really narrow it down to a
specific cause. Even if I change the `runVSCode` call for the retry by
a simple `cp.spawn` call, it still launches multiple instances.
Multiple VSCode instances were being launched when a second instance of
VSCode was being spawned with the same user data directory. This is
probably because VSCode restores the windows from the previous session,
even when `-n`/`--new-window` is passed.

This fixes it by patching `jest-runner-vscode` to always create a new
temporary user data directory, rather than re-using the same one for
all test suites.
This will update the `jest-runner-vscode` patch to retry tests that fail
due to no test result being returned from the test runner.

This will also add some retries to the `minimal-workspace` and
`no-workspace` tests to help with flakiness.
@koesie10 koesie10 force-pushed the jest-migration/retry-tests branch from e30ceab to 7ba9027 Compare November 28, 2022 12:06
@koesie10 koesie10 requested a review from a team November 28, 2022 12:36
@koesie10 koesie10 marked this pull request as ready for review November 28, 2022 12:36
@koesie10 koesie10 requested a review from a team as a code owner November 28, 2022 12:36
...baseConfig.extensionTestsEnv,
INTEGRATION_TEST_MODE: "true",
},
retries: 3,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any guarantee that 3 retries will resolve the flakiness?

I know it's a lot to ask after so much effort doing the jest migration, but are we able to pinpoint what the failure reasons are at the moment for windows?

Copy link
Copy Markdown
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience with mocha, running the CLI tests on retries didn't work well. In fact, the retries usually masked the original problem. For the CLI tests, if there was a failure on the first try, the retries would always fail due to some state being left around (though I never dug too deeply into what it was). Perhaps the --new-window command will work, but please make sure that running a retry will have the possibility of being successful.

One suggestion is to inject a known error into CLI integration tests, push it up to CI and make sure that error (and only that error) happens for each retry.

+ fs_1.default.rm(tempUserDir, { recursive: true }, () => {
+ promiseResolve();
+ });
+ };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading a diff of a diff is making my brain hurt. 😅

@@ -11,7 +11,6 @@ const config = {
launchArgs: [
"--disable-gpu",
"--extensions-dir=" + path.join(rootDir, ".vscode-test", "extensions"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be disabling potentially conflicting extensions here as well? There was a list of them when running with mocha. Not sure if these extensions are being explicitly disabled any more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extensions are being disabled in the configuration file specific to the cli-integration tests:

"--disable-extension",
"eamodio.gitlens",
"--disable-extension",
"github.codespaces",
"--disable-extension",
"github.copilot",

We are also disabling all extensions in the configuration for the no-workspace and minimal-workspace tests:

launchArgs: [...(baseConfig.launchArgs ?? []), "--disable-extensions"],

This should match the previous Mocha setup in terms of which extensions are enabled/disabled.

@koesie10
Copy link
Copy Markdown
Member Author

Thanks for the reviews!

I've found three common errors on Windows, for which retries do seem to work:

  • "VSCode exited with exit code 1": I don't really know why, but VSCode occasionally seems to exit without running all tests. Usually, restarting VSCode and trying again works.
  • "No test result returned": Again, not sure why this happens, but sometimes VSCode exits without reporting all test results. Usually, restarting VSCode and trying again works.
  • Timeouts on the legacy-query.test.ts. For this one, retrying also usually works. I'm hesitant to increase the timeout since it's already set to 60 seconds per test/hook, so increasing it further might just prolong the time it takes until failure.

I'm not 100% confident that 3 retries will always be enough for resolving the flakiness, but this should at least reduce it to a somewhat reasonable level. Before this PR, we would sometimes get 4 failing CLI integration tests on Windows in a row, which is really frustrating to deal with and takes a lot of time just to get the tests to pass.

@elenatanasoiu
Copy link
Copy Markdown
Contributor

elenatanasoiu commented Nov 29, 2022

Thanks for writing up the error cases. From what you've described, I think there's two possibilities:

  1. We're not loading a dependency.
  2. We're loading too many dependencies and they get in the way of each other.

Given that the tests do eventually pass when VSCode is restarted, I'm inclined to think it's the second option. Something hangs when VSCode is loaded and gets in the way of tests being run successfully. When you restart, this process is killed so it unblocks everything else.

Are you able to make the exit message more verbose when it does quit unexpectedly? E.g. spit out the entire log?

Also, is it always the same version of windows that fails?

@koesie10
Copy link
Copy Markdown
Member Author

We're loading too many dependencies and they get in the way of each other.

This is probably part of the problem. We are not bundling the extension (yet), so VSCode needs to load thousands of files to even load the extension. IO on Windows is a lot slower than on Linux, so this can take hundreds of milliseconds to seconds in the worst cases. In subsequent retries, it might be that the filesystem cache is simply "hot" and it takes less time to load all files.

Are you able to make the exit message more verbose when it does quit unexpectedly? E.g. spit out the entire log?

I don't think it's actually related to our tests, running tests in VSCode just seems to be somewhat unstable on Windows. We are already logging all information we have, since we only receive the exit code of the process and the extension host logs are already being logged. See for example this run where you can see that there are some logs of the extension host, but nothing specifically saying why VSCode exited.

You can see in that run that we're also running two instances of VSCode. That should also be fixed by this PR, but won't fix the flakiness.

Also, is it always the same version of windows that fails?

No, I don't think we can narrow it down to a specific CLI version.

@elenatanasoiu
Copy link
Copy Markdown
Contributor

No, I don't think we can narrow it down to a specific CLI version.

Ah, I see. I thought it was a version of windows specifically used for GitHub Actions. But it's the CLI version. Gotcha.

Bundling the extension does sounds promising.

@@ -41,7 +41,7 @@ index 8657ace..4d35409 100644
-export default function runVSCode({ vscodePath, args, jestArgs, env, tests, globalConfig, filterOutput, onStart, onResult, onFailure, ipc, quiet, }: RunVSCodeOptions): Promise<void>;
+export default function runVSCode(options: RunVSCodeOptions): Promise<void>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On adding changes to jest-runner-vscode, was there a reason to prefer a patch directly in the project over forking the package?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea was to upstream the changes (there's a PR open for the earlier patch), but it doesn't seem likely that the maintainer is actively checking PRs. I think we can definitely fork the package as a future improvement, although that makes it somewhat harder to make these sorts of changes quickly. We would also need to change the package to publish to our own npm package, and I'm not sure how involved it would be to publish it under the @github scope.

I'll create an issue for investigating whether this is something we can do, although I think it's better to keep it in the project for now so the Jest migration can be merged into main.

@elenatanasoiu
Copy link
Copy Markdown
Contributor

elenatanasoiu commented Nov 29, 2022

It looks like running tests before Jest takes around 12-14 mins while running tests with Jest + retries takes 14 mins so there doesn't seem to be much of a time difference there.

Copy link
Copy Markdown
Contributor

@elenatanasoiu elenatanasoiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to recap:

  1. We have a feature branch which contains all of the changes we've made to migrate us to Jest. Without merging that, we're somewhat blocked from writing new tests as they'll need converting. This PR is an attempt to unblock the feature branch so it can be merged.
  2. We discovered that for some CLI versions, the migrated tests fail on windows.
  3. We also discovered that multiple instances of VSCode were being started, even after specifically providing the --new-window or -n option. This PR is creating a fresh folder every time we run our tests and start a new VSCode instance. This will stop VSCode from trying to restore previous instances.
  4. As far as I can tell, jest-runner-vscode helps us set up different sets of folder tests which require different setups for the VSCode instance. Perhaps a solution would be to aim for having a single set of tests in the future with a single VSCode instance?
  5. We're able to add a retry option to jest-runner-vscode but the project is currently not actively maintained (last human PR was merged in December 2021). To get around this we've added the retries patch straight into node_modules.

Given that this work will also unblock us from switching to ESBuild, which in turn would make this extension a lot easier to load during tests, I'm feeling optimistic that we can revert the retry patch in the future.

Copy link
Copy Markdown
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hard work on this. I know fixing these kinds of flaky tests is tedious and time consuming.

@elenatanasoiu are there specific CLI tests that are failing on specific versions of the CLI?

@elenatanasoiu
Copy link
Copy Markdown
Contributor

elenatanasoiu commented Nov 29, 2022

@koesie10 koesie10 merged commit 6dc1f9f into jest-migration/integration-tests Nov 30, 2022
@koesie10 koesie10 deleted the jest-migration/retry-tests branch November 30, 2022 07:49
@koesie10 koesie10 mentioned this pull request Feb 13, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants