Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WISE Bug]: Race Condition in Simulation #60

Open
3 tasks
boyter opened this issue Nov 23, 2024 · 2 comments
Open
3 tasks

[WISE Bug]: Race Condition in Simulation #60

boyter opened this issue Nov 23, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@boyter
Copy link

boyter commented Nov 23, 2024

Contact Details

bboyte01@gmail,com

What happened?

There is a race condition in the following code inside WISE_Project/cpp/Scenario.cpp lines 774-810

HRESULT Project::Scenario::Simulation_Step()
{
	if (!m_childArray.size())
		return m_scenario->Simulation_Step();

	HRESULT hr = S_OK;
	ULONG complete_count = 0;
	int si;
	int thread_id = -1;
#pragma omp parallel for num_threads(CWorkerThreadPool::NumberIdealProcessors()) firstprivate(thread_id)
	for (si = 0; si < m_childArray.size(); si++)
	{
		if (thread_id == -1)
		{
			thread_id = omp_get_thread_num();
            CWorkerThread::native_handle_type thread = CWorkerThreadPool::GetCurrentThread();
			CWorkerThreadPool::SetThreadAffinityToMask(thread, thread_id);
		}
		HRESULT hr1 = m_childArray[si]->Simulation_Step();
		if (FAILED(hr1))
			hr = hr1;
		if ((hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_EXTENTS) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_ASSET) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_FI90) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_FI95) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_FI100) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_RH) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_PRECIP) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_AREA) ||
			(hr1 == SUCCESS_SCENARIO_SIMULATION_COMPLETE_STOPCONDITION_BURNDISTANCE))
			complete_count++;
	}
	if (complete_count == m_childArray.size())
		return SUCCESS_SCENARIO_SIMULATION_COMPLETE;
	return hr;
}

My C++ is poor at best, but complete_count++; is not thread safe, and is executed in #pragma omp parallel

Since the final check if (complete_count == m_childArray.size()) relies on this being the size of the array there is the potential for this to fail, despite actually being successful where multiple threads write to the ULONG.

I have not been able to replicate this so far so no log output, but I can see this happening at scale.

Version

(Ubuntu 2020) v1.0.0-beta

What component are you seeing the problem on?

No response

Relevant log output

No response

Approvals Process

  • Testing For Issue
  • Executive Approval
  • Merge
@boyter boyter added the bug Something isn't working label Nov 23, 2024
@RobBryce
Copy link
Contributor

Integer increments are atomic operations so are thread safe. However, I'll change it to std::atomic<> to help with cache issues on multi-CPU machines. Good catch!

@boyter
Copy link
Author

boyter commented Nov 24, 2024

My C++ is pretty weak these days but last time I checked increments like this are not guaranteed either by the spec or the compiler to be a single operation. So while they can compile down to increment, they can also be load/increment/store. Again that depends on your compiler settings and if the target architecture supports it.

I shall defer to you on this though, its likely it was setup to support this. Nice to see adding atomic though just in case, and it also makes it easier to see it is expected :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants