Helix testing in ASP.NET Core

Helix is the distributed test platform that we use to run tests. We build a helix payload that contains the publish directory of every test project that we want to test, then send a job with this payload to a set of queues for the various OS combinations that we want to test, for example: Windows.10.Amd64.ClientRS4.VS2017.Open, OSX.1100.Amd64.Open, Ubuntu.1804.Amd64.Open. Helix takes care of unzipping, running the job, and reporting results.

For more info about helix see: SDK, JobSender

Running helix tests locally

To run Helix tests for one particular test project:

.\eng\scripts\RunHelix.ps1 -Project path\mytestproject.csproj

This will restore and then publish the test project, including some bootstrapping scripts that install the correct dotnet runtime/SDK before running the test assembly on the helix machine(s), and then upload the job to helix.

Overview of the helix usage in our pipelines

  • Required queues: Windows10, OSX, Ubuntu1804
  • Full queue matrix: Windows[10, 11], Ubuntu[1804, 2004], Debian11, Redhat7, Arm64 (Win10, Debian11)
  • The queues are defined in Helix.Common.props

aspnetcore-ci runs non-quarantined tests against the required helix queues as a required PR check and on all builds on all branches.

aspnetcore-helix-matrix runs non-quarantined tests against all queues twice a day only on public main.

aspnetcore-quarantined-pr runs only quarantined tests against the required queues on PRs and on main every 4 hours.

aspnetcore-quarantined-tests runs only quarantined tests against all queues only on public main once a day at 11 PM.

You can always manually queue pipeline runs by clicking on the link to the pipeline -> Run Pipeline -> select your branch/tag and commit.

Checkin process expectations

  • The normal PR process uses aspnetcore-ci to ensure that the required queues are green.
  • If your changes are likely to have cross-platform impact that would affect more than the required queues, you should kick off a manual aspnetcore-helix-matrix pipeline run against your branch before merging your PR. Even though aspnetcore-helix-matrix is not a required checkin gate, if your changes break this pipeline, you must either immediately revert your changes or quarantine the test. It's never OK to leave this pipeline in a broken state.

How do I look at the results of a helix run on Azure Pipelines?

The easiest way to look at a test failure is via the Tests tab in Azure DevOps (azdo), which should show a summary of the errors and have attachments to the relevant console logs.

You can also drill down into the helix web APIs if you take the HelixJobId from the Debug tab of a failing test, together with the HelixWorkItemName, and go to https://helix.dot.net/api/2019-06-17/jobs/<jobId>/workitems/<workitemname>, which will show you more URLs you can drill into for more info.
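As a convenience, the URL pattern above can be assembled with a small helper. This is a minimal sketch, not part of the repo tooling: the function names are hypothetical, and the `/console` suffix is taken from the failure-log URLs that appear in the build log further down this page.

```python
from urllib.parse import quote

# API version segment used by the work item URLs in this document.
HELIX_API_VERSION = "2019-06-17"


def work_item_url(job_id, work_item_name):
    """Build the work item details URL from a HelixJobId and HelixWorkItemName."""
    return (
        f"https://helix.dot.net/api/{HELIX_API_VERSION}"
        f"/jobs/{quote(job_id, safe='')}"
        f"/workitems/{quote(work_item_name, safe='')}"
    )


def console_log_url(job_id, work_item_name):
    """Build the console log URL (hypothetical helper; suffix copied from the log sample)."""
    return work_item_url(job_id, work_item_name) + "/console"


if __name__ == "__main__":
    # Values copied from the sample build log in this document.
    job = "53e2ca23-9efd-4299-8a8f-d9271265aeaa"
    item = "Microsoft.AspNetCore.Identity.Test--net8.0"
    print(work_item_url(job, item))
    print(console_log_url(job, item))
```

Percent-encoding the two path segments only matters if a work item name ever contains characters that are unsafe in a URL; the names shown on this page pass through unchanged.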

An example of how to get the helix payload to inspect the contents of a test job more completely:

There's also a link embedded in the build.cmd log of the Tests: Helix x64 job on Azure Pipelines, near the bottom right that will look something like this:

  Sending Job to Ubuntu.1804.Amd64.Open...
  Sent Helix Job; see work items at https://helix.dot.net/api/jobs/c1b425c8-0fef-4cba-9dee-29344d7a61b8/workitems?api-version=2019-06-17
  Sending Job to Windows.11.Amd64.ClientPre.Open...
  Sent Helix Job; see work items at https://helix.dot.net/api/jobs/1fc117ce-d52a-4ea4-8896-3c289fdf8e17/workitems?api-version=2019-06-17
  Sending Job to OSX.1014.Amd64.Open...
  Sent Helix Job; see work items at https://helix.dot.net/api/jobs/53e2ca23-9efd-4299-8a8f-d9271265aeaa/workitems?api-version=2019-06-17
  Waiting for completion of job 1fc117ce-d52a-4ea4-8896-3c289fdf8e17 on Windows.11.Amd64.ClientPre.Open
  Waiting for completion of job c1b425c8-0fef-4cba-9dee-29344d7a61b8 on Ubuntu.1804.Amd64.Open
  Waiting for completion of job 53e2ca23-9efd-4299-8a8f-d9271265aeaa on OSX.1014.Amd64.Open
  Job 53e2ca23-9efd-4299-8a8f-d9271265aeaa on OSX.1014.Amd64.Open is completed with 139 finished work items.
  Job c1b425c8-0fef-4cba-9dee-29344d7a61b8 on Ubuntu.1804.Amd64.Open is completed with 138 finished work items.
  Job 1fc117ce-d52a-4ea4-8896-3c289fdf8e17 on Windows.11.Amd64.ClientPre.Open is completed with 170 finished work items.
  Stopping Azure Pipelines Test Run Ubuntu.1804.Amd64.Open
  Stopping Azure Pipelines Test Run Windows.11.Amd64.ClientPre.Open
  Stopping Azure Pipelines Test Run OSX.1014.Amd64.Open
D:\a\_work\1\s\.packages\microsoft.dotnet.helix.sdk\7.0.0-beta.21559.3\tools\Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(78,5): error : Work item Microsoft.AspNetCore.Identity.Test--net8.0 in job 53e2ca23-9efd-4299-8a8f-d9271265aeaa has failed. [D:\a\_work\1\s\eng\helix\helix.proj]
D:\a\_work\1\s\.packages\microsoft.dotnet.helix.sdk\7.0.0-beta.21559.3\tools\Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(78,5): error : Failure log: https://helix.dot.net/api/2019-06-17/jobs/53e2ca23-9efd-4299-8a8f-d9271265aeaa/workitems/Microsoft.AspNetCore.Identity.Test--net8.0/console [D:\a\_work\1\s\eng\helix\helix.proj]
##[error].packages\microsoft.dotnet.helix.sdk\7.0.0-beta.21559.3\tools\Microsoft.DotNet.Helix.Sdk.MultiQueue.targets(78,5): error : (NETCORE_ENGINEERING_TELEMETRY=Test) Work item Microsoft.AspNetCore.Identity.Test--net8.0 in job 53e2ca23-9efd-4299-8a8f-d9271265aeaa has failed.
Failure log: https://helix.dot.net/api/2019-06-17/jobs/53e2ca23-9efd-4299-8a8f-d9271265aeaa/workitems/Microsoft.AspNetCore.Identity.Test--net8.0/console
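When a log like the one above is long, it can help to pull out just the failed work items and the queue each one ran on. The following is a minimal sketch that assumes the log keeps the exact "Waiting for completion of job ... on ..." and "Work item ... in job ... has failed" wording shown in the sample; the function name and regular expressions are hypothetical and not part of the Helix SDK.

```python
import re

# Patterns based on the wording of the sample log in this document.
JOB_RE = re.compile(
    r"Waiting for completion of job (?P<job>[0-9a-f-]{36}) on (?P<queue>\S+)"
)
FAILED_RE = re.compile(
    r"Work item (?P<item>\S+) in job (?P<job>[0-9a-f-]{36}) has failed"
)


def failed_work_items(log_text):
    """Return sorted, de-duplicated (queue, work item, job id) tuples."""
    # Map each job id to the queue it was sent to.
    queue_by_job = {m["job"]: m["queue"] for m in JOB_RE.finditer(log_text)}
    # The same failure is reported on more than one line, so collect into a set.
    failures = {
        (queue_by_job.get(m["job"], "unknown queue"), m["item"], m["job"])
        for m in FAILED_RE.finditer(log_text)
    }
    return sorted(failures)


SAMPLE_LOG = """
  Waiting for completion of job 53e2ca23-9efd-4299-8a8f-d9271265aeaa on OSX.1014.Amd64.Open
  error : Work item Microsoft.AspNetCore.Identity.Test--net8.0 in job 53e2ca23-9efd-4299-8a8f-d9271265aeaa has failed.
  error : (NETCORE_ENGINEERING_TELEMETRY=Test) Work item Microsoft.AspNetCore.Identity.Test--net8.0 in job 53e2ca23-9efd-4299-8a8f-d9271265aeaa has failed.
"""

if __name__ == "__main__":
    for queue, item, job in failed_work_items(SAMPLE_LOG):
        print(f"{queue}: {item} (job {job})")
```

In the sample, the failing Microsoft.AspNetCore.Identity.Test--net8.0 work item belongs to the job that was sent to OSX.1014.Amd64.Open, so the failure is specific to that queue.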

The https://helix.dot.net/ home page displays information about the available public queues (nothing about the related BYOC pools and queues or the internal Helix queues).

Some superficial information about both BYOC and Helix agents is available at https://github.com/dotnet/arcade/blob/8ca46105193bd25c95af49bc6cd3604aaefec980/Documentation/AzureDevOps/AzureDevOpsOnboarding.md#agent-queues

More detailed and always up-to-date information about all of the agents is available at https://helix.dot.net/api/2018-03-14/info/queues

What do I do if a test fails?

You can simulate how most tests run locally:

dotnet publish
cd <the publish directory>
dotnet vstest My.Tests.dll

Differences from running tests locally

Most tests that don't just work on helix automatically are ones that depend on the source code being accessible. The helix payloads only contain whatever is in the publish directories, so anything else that a test depends on needs to be included in the payload.

This can be accomplished by using the HelixContent item like so:

<ItemGroup>
  <HelixContent Include="$(RepoRoot)src\KeepMe.js"/>
  <HelixContent Include="$(RepoRoot)src\Project\**"/>
</ItemGroup>

By default, these files will be included in the root directory. To include these files in a different directory, you can use either the Link or LinkBase attributes to set the included path.

<ItemGroup>
  <HelixContent Include="$(RepoRoot)src\KeepMe.js" Link="$(MSBuildThisFileDirectory)\myassets\KeepMe.js"/>
  <HelixContent Include="$(RepoRoot)src\Project\**" LinkBase="$(MSBuildThisFileDirectory)\myassets"/>
</ItemGroup>

How to skip tests on helix

There are two main ways to opt out of helix:

  • Skipping the entire test project via <BuildHelixPayload>false</BuildHelixPayload> in csproj (the default value for this is IsTestProject).
  • Skipping an individual test via [SkipOnHelix("url to github issue")].

Make sure to file an issue for any skipped tests and include a link to it in a comment next to either of these.

Process for updating helix matrix

The goal is to balance cost/flakiness against having some coverage of supported distros:

  • At the start of each product version, we pick a set of queues/versions/arches to run based on popularity, perceived risk, and how long is left in the support window for that OS version.
  • Whenever a new OS is coming online, we ask CTI to do a run on it. If there is support for it in helix, we also submit a PR that adds it to our helix-matrix to check for failures; if there aren't any, we don't merge the PR.
  • Link to OS support calendar
  • Link to current list of queues

Example of adding a new docker image to helix

Investigating helix run time issues

Kusto has all of the helix job data. Using a particular job ID, the following query gives you a breakdown of the test projects that take the longest. Ideally, to take advantage of the largest fan-out, we want smaller test projects, since the longest-running test project gates the completion of the entire helix test job.

https://dataexplorer.azure.com/clusters/engsrvprod/databases/engineeringdata

WorkItems
| where JobName == "bc108374-750c-4084-853e-bc5b9b0d553e"
| where Name != JobName
| extend RunTime = Finished-Started
| top 20 by RunTime desc  
| project FriendlyName, RunTime
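To make the query's logic concrete, here is a minimal Python sketch of the same steps applied to in-memory rows. The column names come from the query; the sample rows and test project names are hypothetical, and this does not talk to Kusto.

```python
from datetime import datetime, timedelta


def slowest_work_items(rows, job_name, top=20):
    """Mirror the query: filter by job, skip rows where Name == JobName,
    compute RunTime = Finished - Started, and keep the longest `top` rows."""
    timed = [
        (row["FriendlyName"], row["Finished"] - row["Started"])
        for row in rows
        if row["JobName"] == job_name and row["Name"] != job_name
    ]
    return sorted(timed, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical sample data; real rows come from the WorkItems table.
JOB = "bc108374-750c-4084-853e-bc5b9b0d553e"
START = datetime(2024, 1, 1, 12, 0, 0)
ROWS = [
    {"JobName": JOB, "Name": JOB, "FriendlyName": JOB,
     "Started": START, "Finished": START + timedelta(minutes=30)},
    {"JobName": JOB, "Name": "item-a", "FriendlyName": "Hypothetical.Small.Tests",
     "Started": START, "Finished": START + timedelta(minutes=4)},
    {"JobName": JOB, "Name": "item-b", "FriendlyName": "Hypothetical.Large.Tests",
     "Started": START, "Finished": START + timedelta(minutes=25)},
]

if __name__ == "__main__":
    for name, run_time in slowest_work_items(ROWS, JOB):
        print(name, run_time)
```

With this made-up data, the 25-minute project sets the lower bound on how quickly the whole job can finish even if every work item runs in parallel, which is the case that splitting large test projects is meant to address.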
