Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race involving non-Sendable class may be induced when concurrently modified with task groups and actor isolation control #78360

Open
slice opened this issue Dec 24, 2024 · 3 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. crash Bug: A crash, i.e., an abnormal termination of software triage needed This issue needs more specific labels

Comments

@slice
Copy link

slice commented Dec 24, 2024

Description

By combining task groups, actor isolation control (isolated (any Actor)?), and a small trampoline function, a data race may be induced at runtime through concurrent access and modification of a Dictionary. Swift isn't supposed to let this code compile. When certain elements of this test case are removed, then an error is correctly emitted.

Reproduction

// xcrun swiftc -v -swift-version 6 -O Issue.swift && ./Issue

import Dispatch

@_optimize(none)
func blackHole(_: some Any) {}

func trampoline(
  _ operation: () async -> Void,
  isolation: isolated (any Actor)? = #isolation
) async {
  await operation()
}

final class State {
  var dict = [Int: Bool]()

  func spawnModifyingTasks() async {
    let absurd = 0...1_000_000
    await withTaskGroup(of: Bool.self) { group in
      for _ in absurd {
        group.addTask { .random() }
      }

      for await bool in group {
        dict[Int.random(in: absurd)] = bool
      }
    }
  }

  func continuouslyModify(isolation: isolated (any Actor)? = #isolation) async {
    await trampoline {
      await spawnModifyingTasks()
    }
  }
}

let state = State()

// concurrent modification
Task {
  await state.continuouslyModify()
}

// concurrent access
Task {
  while true { blackHole(state.dict.count) }
}

dispatchMain()

Stack dump

(lldb) r
Process 54454 launched: '/Users/skip/src/scraps/Contents' (arm64)
Process 54454 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8000000000000010)
    frame #0: 0x0000000100002674 Contents`closure #2 in  [inlined] generic specialization <Swift.Int, Swift.Bool> of Swift.Dictionary.count.getter : Swift.Int at Contents.swift:0 [opt]
   41   }
   42  
   43   // concurrent access
-> 44   Task {
   45     while true { blackHole(state.dict.count) }
   46   }
   47  
Note: this address is compiler-generated code in function generic specialization <Swift.Int, Swift.Bool> of Swift.Dictionary.count.getter : Swift.Int that has no source code associated with it.
Target 0: (Contents) stopped.
warning: Contents was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8000000000000010)
  * frame #0: 0x0000000100002674 Contents`closure #2 in  [inlined] generic specialization <Swift.Int, Swift.Bool> of Swift.Dictionary.count.getter : Swift.Int at Contents.swift:0 [opt]
    frame #1: 0x0000000100002674 Contents`closure #2 in  at Contents.swift:45:37 [opt]
    frame #2: 0x0000000100003948 Contents`(1) await resume partial function for partial apply forwarder for closure #1 () async -> () in Contents
    frame #3: 0x0000000100003138 Contents`specialized thunk for @escaping @isolated(any) @callee_guaranteed @async () -> (@out A) at <compiler-generated>:0 [opt]
    frame #4: 0x0000000100003948 Contents`(1) await resume partial function for partial apply forwarder for closure #1 () async -> () in Contents

Expected behavior

The code fails to compile with a concurrency-related diagnostic.

Environment

swift-driver version: 1.115.1 Apple Swift version 6.0.3 (swiftlang-6.0.3.1.10 clang-1600.0.30.1)
Target: arm64-apple-macosx15.0

Additional information

Removing the await trampoline { … } call correctly prevents compilation:

DataRace.playground/Contents.swift:30:11: error: sending 'self' risks causing data races
28 | 
29 |   func continuouslyModify(isolation: isolated (any Actor)? = #isolation) async {
30 |     await spawnModifyingTasks()
   |           |- error: sending 'self' risks causing data races
   |           `- note: sending 'isolation'-isolated 'self' to nonisolated instance method 'spawnModifyingTasks()' risks causing data races between nonisolated and 'isolation'-isolated uses
31 |   }
32 | }

However, even without the trampoline function, you can still induce a runtime crash by forwarding the actor isolation:

// […]

final class State {
  var dict = [Int: Bool]()

  func spawnModifyingTasks(isolation: isolated (any Actor)?) async {
    let absurd = 0...1_000_000
    await withTaskGroup(of: Bool.self) { group in
      for _ in absurd {
        group.addTask { .random() }
      }

      for await bool in group {
        dict[Int.random(in: absurd)] = bool
      }
    }
  }

  func continuouslyModify(isolation: isolated (any Actor)? = #isolation) async {
    await spawnModifyingTasks(isolation: isolation)
  }
}

// […]

This resultant binary compiled from this code also segfaults.

The trampoline function isn't as contrived as it seems because in the real code where the crash was occurring, trampoline is actually a helper function that provides latency measurements by trivially wrapping async closures.

This issue may very well be a duplicate of #76003 or #76016, but it might help with triage.

@slice slice added bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. crash Bug: A crash, i.e., an abnormal termination of software triage needed This issue needs more specific labels labels Dec 24, 2024
@slice
Copy link
Author

slice commented Dec 24, 2024

I can also reproduce the crash with the latest 6.1 snapshot toolchain as of 2024-12-24 (🎄):

Apple Swift version 6.1-dev (LLVM dd0f3b29e584617, Swift 12fb6cb92508551)
Target: arm64-apple-macosx15.0

@mattmassicotte
Copy link
Contributor

Looks at least somewhat related to #77301

@jamieQ
Copy link
Contributor

jamieQ commented Dec 27, 2024

yeah, i suspect the underlying issue here is likely similar or the same as in #76003 – the following reduction (which notably doesn't use any explicit isolation inheritance) will also trigger thread sanitizer:

final class NS {
  var state: [Int] = [1, 2, 3]

  func read() async {
    while true {
      try? await Task.sleep(for: .seconds(1))
      _ = self.state.last
    }
  }

  func mutate() async {
    while true {
      try? await Task.sleep(for: .seconds(1))
      let last = state.popLast()!
      state.insert(last, at: 0)
    }
  }
}

@MainActor
func runOnMain() async {
  let ns = NS()

  // read
  Task {
    await ns.read()
  }

  // write
  Task {
    await ns.mutate()
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. crash Bug: A crash, i.e., an abnormal termination of software triage needed This issue needs more specific labels
Projects
None yet
Development

No branches or pull requests

3 participants