
Testing Android and iOS apps on OSS CI using Nova reusable mobile workflow

Huy Do edited this page Apr 19, 2024 · 14 revisions

With the advent of new tools like ExecuTorch, it's now possible to run LLM inference locally for models such as llama2 on mobile devices. While it isn't hard to experiment with this new capability, test it out on your own devices, and see some results, it takes more effort to automate this process and make it a part of the CI on various PyTorch-family repositories. To solve this challenge, the PyTorch Dev Infra team is launching a new Nova reusable mobile workflow to do the heavy lifting for you when it comes to testing your mobile apps.

With this new reusable workflow, devs can now:

  1. Utilize our mobile infrastructure built on top of AWS Device Farm. It offers a wide variety of popular Android and iOS devices from phones to tablets.
  2. Write and run tests remotely on those devices just as you would run them locally on your own phones.
  3. Go beyond the emulator to stress test and benchmark your local LLM inference solutions on actual devices. This helps accurately answer questions about performance (how many tokens can be generated per second) and about how much memory and power are needed.
  4. Debug hard-to-reproduce issues on devices that you don't have.
  5. Gather the results and share them via the familiar GitHub CI UX.

Quick Start

Let's say you are integrating a new ExecuTorch backend which improves llama2 inference performance. You have already run some prompts to confirm that the tokens per second (TPS) is higher than what's reported in https://github.com/pytorch/executorch/tree/main/examples/models/llama2#performance. The result looks good on your phones, so the next step is to confirm the value on CI. To do that, you will need a few things:

  1. Decide on the group of devices on which you want to run the test. Taking Android as an example, suppose you want to run it on similar Samsung Galaxy S2x devices. In this case, this group of devices has already been created in our infra under the ARN arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa.
  2. Build the app that you want to test. It would be in the .apk format for Android and .ipa format for iOS.
  3. Prepare the test to run. We are supporting two types of tests at the moment:
    1. Instrumented tests on Android https://developer.android.com/training/testing/instrumented-tests
    2. and XCTest on iOS https://developer.apple.com/documentation/xctest
  4. Prepare an optional zip archive of any data files you want to copy to the remote devices. For example, this could contain the exported models themselves.
    1. On Android, the archive will be extracted to the /sdcard/ directory.
    2. On iOS, the files will be in the application sandbox.
  5. Take a look at the default test specification and customize one if necessary.
  6. Finally, for new projects, please reach out to PyTorch Dev Infra so that we can help with some onboarding tasks:
    1. Set up the authentication from your repository to our infra, for example https://github.com/pytorch-labs/pytorch-gha-infra/pull/373
    2. Set up the project on AWS Device Farm
    3. Optionally, set up custom pools of devices to suit your needs.
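
The optional data archive from step 4 can be assembled with a few lines of Python. This is only a minimal sketch: the helper name build_extra_data_archive is hypothetical, and storing every file at the archive root is a layout choice made for this illustration, not a requirement stated by the workflow.

```python
import zipfile
from pathlib import Path


def build_extra_data_archive(files, archive_path):
    """Pack the given files into a flat zip archive and return the stored names.

    Hypothetical helper for illustration; it is not part of the Nova workflow.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # Store each file at the archive root (a layout choice for this sketch).
            zf.write(f, arcname=f.name)
    # Re-open the archive to report what was actually written.
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()
```

For example, build_extra_data_archive(["xnnpack_llama2.pte", "tokenizer.bin"], "extra-data.zip") would produce an archive containing the exported model and its tokenizer, assuming both files exist in the current directory.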

Test specification

Once these items are ready, the next step is to take a look at the test specification that codifies how the test is run. You can probably just use the default test spec that is provided, but knowing the spec comes in handy if you need to customize it. Here are some examples:

  1. The Android test spec for ExecuTorch Llama app can be found at https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml. It prepares the folder /data/local/tmp/llama/ and copies the exported model xnnpack_llama2.pte together with the tokenizer tokenizer.bin there before running the test. $DEVICEFARM_DEVICE_UDID is an env variable set by AWS Device Farm to point to the target device, and the output will be written to $DEVICEFARM_LOG_DIR/instrument.log.
...
  test:
    commands:
      # By default, the following ADB command is used by Device Farm to run your Instrumentation test.
      # Please refer to Android's documentation for more options on running instrumentation tests with adb:
      # https://developer.android.com/studio/test/command-line#run-tests-with-adb
      - echo "Starting the Instrumentation test"
      - |
        adb -s $DEVICEFARM_DEVICE_UDID shell "am instrument -r -w --no-window-animation \
        $DEVICEFARM_TEST_PACKAGE_NAME/$DEVICEFARM_TEST_PACKAGE_RUNNER 2>&1 || echo \": -1\"" |
        tee $DEVICEFARM_LOG_DIR/instrument.log
...
  2. The generic iOS test spec used by the ExecuTorch iOS demo app is at https://ossci-assets.s3.amazonaws.com/default-ios-device-farm-appium-test-spec.yml, where it just invokes xcodebuild test-without-building on the target device.
  test:
    commands:
      - xcodebuild test-without-building -destination id=$DEVICEFARM_DEVICE_UDID -xctestrun $DEVICEFARM_TEST_PACKAGE_PATH/*.xctestrun  -derivedDataPath $DEVICEFARM_LOG_DIR

Note that if you have a custom test spec, you'll need to upload it somewhere downloadable by the workflow. In the examples above, the specs are hosted in the ossci-assets S3 bucket owned by PyTorch Dev Infra, but other direct links are fine too.

Example workflows

Let's bring everything together and go through an actual example in https://github.com/pytorch/executorch/blob/main/.github/workflows/android.yml where the comments highlight important components of the new workflow.

name: Android

on:
  ...

jobs:
  # Build all the demo apps 
  test-demo-android:
    name: test-demo-android
    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
    strategy:
      matrix:
        include:
          - build-tool: buck2
    with:
      runner: linux.12xlarge
      docker-image: executorch-ubuntu-22.04-clang12-android
      submodules: 'true'
      ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
      timeout: 90
      # The apps are built using Nova reusable GH action, so we set the upload-artifact parameter here to make them available as artifacts on GitHub
      upload-artifact: android-apps
      script: |
        set -eux

        ... Building the apps ...
        
        # In Nova workflow, all the files under artifacts-to-be-uploaded folder will be uploaded
        mkdir -p artifacts-to-be-uploaded
        # Copy the app and its test suite to the folder that will be uploaded
        cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/debug/*.apk artifacts-to-be-uploaded/
        cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/androidTest/debug/*.apk artifacts-to-be-uploaded/
        # Also copy the static libraries
        cp cmake-out-android/lib/*.a artifacts-to-be-uploaded/

  # Upload the app and its test suite to gha-artifacts S3 bucket so that they can be downloaded by the subsequent test job
  upload-artifacts:
    needs: test-demo-android
    runs-on: linux.2xlarge
    steps:
      - name: Download the artifacts
        uses: actions/download-artifact@v3
        with:
          # The name here needs to match the name of the upload-artifact parameter
          name: android-apps
          path: ${{ runner.temp }}/artifacts/

      - name: Verify the artifacts
        shell: bash
        working-directory: ${{ runner.temp }}/artifacts/
        run: |
          ls -lah ./

      - name: Upload the artifacts to S3
        uses: seemethere/upload-artifact-s3@v5
        with:
          s3-bucket: gha-artifacts
          s3-prefix: |
            ${{ github.repository }}/${{ github.run_id }}/artifact
          retention-days: 14
          if-no-files-found: ignore
          path: ${{ runner.temp }}/artifacts/

  # Run the test on remote Android devices
  test-llama-app:
    needs: upload-artifacts
    permissions:
      id-token: write
      contents: read
    uses: pytorch/test-infra/.github/workflows/mobile_job.yml@main
    with:
      device-type: android
      runner: ubuntu-latest
      test-infra-ref: ''
      # This is the ARN of ExecuTorch project on AWS
      project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6

      # This is the custom Android device pool that only includes Samsung Galaxy S2x
      device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa

      # Uploaded to S3 from the previous job, the name of the app comes from the project itself
      android-app-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/app-debug.apk
      android-test-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/app-debug-androidTest.apk

      # The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml. A link
      # to download the spec also works here.
      test-spec: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/abd86868-fa63-467e-a5c7-218194665a77
      
      # The exported llama2 model and its tokenizer.  The archive can be downloaded from https://ossci-assets.s3.amazonaws.com/executorch-android-llama2-7b.zip.
      # A link to download the archive also works here, but keep in mind that some large exported models like llama2 7B are
      # a few GB in size, so it would be faster to upload them to AWS beforehand and reuse the existing resource if possible
      extra-data: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/bd15825b-ddab-4e47-9fef-a9c8935778dd

In this example, pytorch/test-infra/.github/workflows/mobile_job.yml does the heavy lifting. It can be tweaked with the following parameters:

  • device-type: either android or ios
  • project-arn: this value is fixed for each project; please reach out to PyTorch Dev Infra if you need to get one. There are currently two available projects:
    • arn:aws:devicefarm:us-west-2:308535385114:project:b531574a-fb82-40ae-b687-8f0b81341ae0 for PyTorch core.
    • and arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6 for ExecuTorch.
  • device-pool-arn: this is the pool of remote devices to run the test. By default, 5 random popular devices will be selected for the test. Please also reach out to PyTorch Dev Infra if you need something more specific. Please note that the app itself can limit the set of potential devices that can be used. For example, having IPHONEOS_DEPLOYMENT_TARGET set to 17 will exclude all devices with older iOS versions.
  • test-spec: this is the test specification that defines how the test is run. adb commands for Android and xcodebuild commands for iOS can be used here.
  • extra-data: the archive with any extra data to copy over to the device before the test is run.
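
Because these ARNs are long and easy to mix up, a quick sanity check before committing a workflow can help. The sketch below assumes, based only on the ARNs shown on this page, that a Device Farm resource ID has the form <project-id>/<resource-id>; the helper names parse_devicefarm_arn and belongs_to_project are hypothetical.

```python
from typing import NamedTuple


class DeviceFarmArn(NamedTuple):
    region: str
    account: str
    resource_type: str
    resource_id: str


def parse_devicefarm_arn(arn: str) -> DeviceFarmArn:
    """Split arn:aws:devicefarm:<region>:<account>:<type>:<id> into its fields."""
    parts = arn.split(":", 6)
    if len(parts) != 7 or parts[:3] != ["arn", "aws", "devicefarm"]:
        raise ValueError(f"not a Device Farm ARN: {arn!r}")
    _, _, _, region, account, resource_type, resource_id = parts
    return DeviceFarmArn(region, account, resource_type, resource_id)


def belongs_to_project(resource_arn: str, project_arn: str) -> bool:
    """Check whether a device pool or upload ARN points into the given project."""
    project = parse_devicefarm_arn(project_arn)
    resource = parse_devicefarm_arn(resource_arn)
    if project.resource_type != "project":
        raise ValueError("project_arn must be a project ARN")
    # Assumption drawn from the ARNs on this page: the part of the resource ID
    # before the first "/" is the ID of the owning project.
    return (
        resource.region == project.region
        and resource.account == project.account
        and resource.resource_id.split("/", 1)[0] == project.resource_id
    )
```

With the values from the example workflow, the Samsung Galaxy S2x device pool ARN passes this check against the ExecuTorch project ARN and fails against the PyTorch core one.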

Some parameters are platform-specific. For Android, we have:

  • android-app-archive: the link to the Android app APK archive to run. It also accepts an existing ARN if the app has already been uploaded to AWS.
  • android-test-archive: the link to the Android instrumentation tests APK archive or an existing ARN. The test archive is built with ./gradlew assembleAndroidTest if Gradle is used.
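
The archive links in the example workflow follow from the s3-bucket and s3-prefix values of the upload job. As a rough sketch of that naming scheme, the hypothetical helper below rebuilds such a link; the run ID in the usage note is a made-up placeholder.

```python
from urllib.parse import quote


def artifact_url(repository: str, run_id: int, filename: str,
                 bucket: str = "gha-artifacts") -> str:
    """Build the download link for a file uploaded under <repo>/<run_id>/artifact.

    Mirrors the s3-prefix used in the example workflow; hypothetical helper.
    """
    # Percent-encode the file name in case it contains spaces or other
    # characters that are not safe in a URL path.
    return (
        f"https://{bucket}.s3.amazonaws.com/"
        f"{repository}/{run_id}/artifact/{quote(filename)}"
    )
```

For instance, artifact_url("pytorch/executorch", 123456789, "app-debug.apk") yields the kind of link passed to android-app-archive, where the file name itself comes from the project's build output.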

For iOS, there are two equivalent parameters for the app and the test suite. In this case, they need to be built for a generic iOS device and not the simulator, for example xcodebuild build-for-testing -project <PATH_TO>.xcodeproj -scheme <THE_TEST_SUITE_TO_BUILD> -destination platform="iOS".

  • ios-ipa-archive: the link to the iOS app archive to run or an existing ARN if the app has already been uploaded to AWS.
  • ios-xctestrun-zip: the link to the iOS xctestrun zip archive or an existing ARN of the archive.
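
Since the archive parameters accept either a download link or an existing ARN, it can be useful to check which form a value takes before wiring it into a workflow. The following is an illustrative sketch only; it does not claim to reproduce how mobile_job.yml tells the two apart, and resource_kind is a hypothetical name.

```python
from urllib.parse import urlparse


def resource_kind(value: str) -> str:
    """Return "arn" for a Device Farm ARN and "url" for an http(s) link."""
    if value.startswith("arn:aws:devicefarm:"):
        return "arn"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Anything else (local path, empty string, typo) is rejected early.
    raise ValueError(f"expected a download link or a Device Farm ARN, got {value!r}")
```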

Voila! You now have a workflow to run the tests on remote mobile devices.

How to get the test results

As a reusable GitHub workflow, it relies on the GitHub UX to bring the test results back to devs via the console log. HUD support is minimal at the moment, but could be extended in the future. Here are some hands-on examples to illustrate how to read the console log.

Getting the token-per-second on Samsung S22 phones

Let's pick a concrete example: the test-llama-app job. In its output, the most important step is Run Android tests on devices, which runs llama2 inference on 4 different S22 devices in parallel. The test itself consists of 3 steps: Setup Suite, Tests Suite, and Teardown Suite. Each step returns its own .logcat file for manual inspection. The most important output is Test_spec_output.txt, where the test output is recorded. For simplicity, all lines starting with the prefix [PyTorch] are printed to the console. In our example, the test observes a TPS of 6.74 running llama2 inference on a Samsung Galaxy S22 5G.

Samsung Galaxy S22 5G PASSED with stats {'total': 3, 'passed': 3, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
  Setup Suite PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
    Setup Test PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
      Saving FILE Logcat.logcat (DEVICE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00000_00000_00000_Logcat.logcat
      Saving FILE TCP_dump_log.txt (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00000_00000_00001_TCP_dump_log.txt
      Saving LOG ListArtifactType.log.json (MESSAGE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00000_00000_LOG_ListArtifactType.log.json
  Tests Suite PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
    Tests PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
      Saving FILE Test_spec_output.txt (TESTSPEC_OUTPUT) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00000_Test_spec_output.txt
        [PyTorch] junit.framework.AssertionFailedError: The observed TPS 6.7432113 is less than the expected TPS 10.0
      Saving FILE Customer_Artifacts.zip (CUSTOMER_ARTIFACT) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00001_Customer_Artifacts.zip
      Saving FILE Customer_Artifacts_Log.txt (CUSTOMER_ARTIFACT_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00002_Customer_Artifacts_Log.txt
      Saving FILE Test_spec_shell_script.sh (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00003_Test_spec_shell_script.sh
      Saving FILE Test_spec_file.yml (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00004_Test_spec_file.yml
      Saving FILE Video.mp4 (VIDEO) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00005_Video.mp4
      Saving FILE Logcat.logcat (DEVICE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00006_Logcat.logcat
      Saving FILE TCP_dump_log.txt (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_00008_TCP_dump_log.txt
      Saving LOG ListArtifactType.log.json (MESSAGE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00001_00000_LOG_ListArtifactType.log.json
  Teardown Suite PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
    Teardown Test PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
      Saving FILE Logcat.logcat (DEVICE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00002_00000_00001_Logcat.logcat
      Saving FILE TCP_dump_log.txt (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00002_00000_00003_TCP_dump_log.txt
      Saving LOG ListArtifactType.log.json (MESSAGE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8734286527/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_9a5cdbc0-561c-4005-9bc8-6003fb40d76a_00002_00002_00000_LOG_ListArtifactType.log.json
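
To track the TPS over time, the [PyTorch] lines can be pulled out of the console log with a small script. This is a minimal sketch that assumes the exact message wording seen in the sample log above; other tests may print different messages, and extract_tps is a hypothetical helper.

```python
import re

# Matches the assertion message from the sample log, for example:
# [PyTorch] ... The observed TPS 6.7432113 is less than the expected TPS 10.0
TPS_PATTERN = re.compile(
    r"\[PyTorch\].*observed TPS (?P<observed>\d+(?:\.\d+)?)"
    r" is less than the expected TPS (?P<expected>\d+(?:\.\d+)?)"
)


def extract_tps(console_log: str):
    """Return a list of (observed, expected) TPS pairs found in the log."""
    results = []
    for line in console_log.splitlines():
        match = TPS_PATTERN.search(line)
        if match:
            results.append((float(match["observed"]), float(match["expected"])))
    return results
```

Applied to the log above, this returns a single pair with an observed TPS of 6.7432113 and an expected TPS of 10.0.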

Note that all other artifacts from AWS Device Farm, such as the screen recording, are also available.
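
The download links for these artifacts can be collected from the console log as well. The sketch below assumes the "Saving FILE/LOG <name> (<TYPE>) at <url>" line format shown in the sample log; list_artifacts is a hypothetical helper.

```python
import re

# Matches lines such as:
# Saving FILE Test_spec_output.txt (TESTSPEC_OUTPUT) at https://...
SAVING_PATTERN = re.compile(
    r"Saving (?:FILE|LOG) (?P<name>\S+) \((?P<type>[A-Z_]+)\) at (?P<url>https://\S+)"
)


def list_artifacts(console_log: str, artifact_type=None):
    """Return (name, type, url) tuples, optionally filtered by artifact type."""
    artifacts = []
    for match in SAVING_PATTERN.finditer(console_log):
        if artifact_type is None or match["type"] == artifact_type:
            artifacts.append((match["name"], match["type"], match["url"]))
    return artifacts
```

For example, list_artifacts(log, "TESTSPEC_OUTPUT") would return only the Test_spec_output.txt entries, and list_artifacts(log, "VIDEO") the screen recordings.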

Testing different ExecuTorch backends on iOS

Another example is testing different ExecuTorch backends on iOS. The structure of the log is the same, with the 3 steps Setup Suite, Tests Suite, and Teardown Suite, as shown in the log below for an iPhone 11.

Apple iPhone 11 PASSED with stats {'total': 3, 'passed': 3, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
  Setup Suite PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
    Setup Test PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
      Saving FILE Syslog.syslog (DEVICE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00000_00000_00000_Syslog.syslog
      Saving FILE TCP_dump_log.txt (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00000_00000_00001_TCP_dump_log.txt
      Saving LOG ListArtifactType.log.json (MESSAGE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00000_00000_LOG_ListArtifactType.log.json
  Tests Suite PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
    Tests PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
      Saving FILE Test_spec_output.txt (TESTSPEC_OUTPUT) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00000_Test_spec_output.txt
      Saving FILE Test_spec_shell_script.sh (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00001_Test_spec_shell_script.sh
      Saving FILE Test_spec_file.yml (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00002_Test_spec_file.yml
      Saving FILE Customer_Artifacts.zip (CUSTOMER_ARTIFACT) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00003_Customer_Artifacts.zip
      Saving FILE Customer_Artifacts_Log.txt (CUSTOMER_ARTIFACT_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00004_Customer_Artifacts_Log.txt
      Saving FILE Video.mp4 (VIDEO) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00005_Video.mp4
      Saving FILE Syslog.syslog (DEVICE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00006_Syslog.syslog
      Saving FILE TCP_dump_log.txt (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_00008_TCP_dump_log.txt
      Saving LOG ListArtifactType.log.json (MESSAGE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00001_00000_LOG_ListArtifactType.log.json
  Teardown Suite PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
    Teardown Test PASSED with stats {'total': 1, 'passed': 1, 'failed': 0, 'warned': 0, 'errored': 0, 'stopped': 0, 'skipped': 0}
      Saving FILE Webkit_Log.webkitlog (WEBKIT_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00002_00000_00000_Webkit_Log.webkitlog
      Saving FILE Syslog.syslog (DEVICE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00002_00000_00002_Syslog.syslog
      Saving FILE TCP_dump_log.txt (RAW_FILE) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00002_00000_00004_TCP_dump_log.txt
      Saving LOG ListArtifactType.log.json (MESSAGE_LOG) at https://gha-artifacts.s3.amazonaws.com/device_farm/8641760330/1/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_e67a66d2-4e66-4bb2-a6f1-bc8d3b24a529_00002_00002_00000_LOG_ListArtifactType.log.json

From there, we can download the Test_spec_output.txt file to get the output of the xcodebuild test-without-building command, where we can see that the tests are passing for the CoreML, MPS, portable, and XNNPACK backends.

Testing started
Test suite 'All tests' started on 'PDX000194454 - DeviceFarm (715)'
Test suite '<bundle>' started on 'PDX000194454 - DeviceFarm (715)'
Test suite 'MobileNetClassifierTest' started on 'PDX000194454 - DeviceFarm (715)'
Test case 'MobileNetClassifierTest.testV3WithCoreMLBackend()' passed on 'PDX000194454 - DeviceFarm (715)' (0.931 seconds)
Test case 'MobileNetClassifierTest.testV3WithMPSBackend()' passed on 'PDX000194454 - DeviceFarm (715)' (2.860 seconds)
Test case 'MobileNetClassifierTest.testV3WithPortableBackend()' passed on 'PDX000194454 - DeviceFarm (715)' (1.226 seconds)
Test case 'MobileNetClassifierTest.testV3WithXNNPACKBackend()' passed on 'PDX000194454 - DeviceFarm (715)' (0.116 seconds)
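
The per-backend results can be summarized from this output with a short script. This is a minimal sketch based on the "Test case ... passed on ... (N seconds)" lines above; the "failed" variant of the line is an assumption, since the sample only shows passing tests, and parse_test_cases is a hypothetical helper.

```python
import re

# Matches lines such as:
# Test case 'Suite.testName()' passed on 'device name' (0.931 seconds)
TEST_CASE_PATTERN = re.compile(
    r"Test case '(?P<name>[^']+)' (?P<status>passed|failed)"
    r" on '[^']*' \((?P<seconds>\d+\.\d+) seconds\)"
)


def parse_test_cases(output: str):
    """Map each test case name to a (status, duration in seconds) tuple."""
    return {
        match["name"]: (match["status"], float(match["seconds"]))
        for match in TEST_CASE_PATTERN.finditer(output)
    }
```

Running it on the output above gives one entry per backend test, which makes it straightforward to compare durations across the CoreML, MPS, portable, and XNNPACK backends.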