\label{chapter:conclusion}
Because of hard physical limits, computer manufacturers have turned to providing more and more cores on a single machine. This shift is driving one of the biggest changes in software development: software must be written in a concurrent and parallel way in order to exploit the benefits of multi-core machines.
Building efficient and reliable concurrent software remains a challenging task. First, concurrency requires programmers to reason in a way that humans find unnatural and difficult. Second, existing languages and tools are inadequate for detecting or preventing concurrency errors.
\section{Contributions}
This thesis provides tools and runtime systems that improve the performance of multithreaded programs and make them easier to reason about and debug. We present a novel processes-as-threads replacement library, the \sheriff{} framework, which provides per-thread memory protection and isolation at page granularity. First, based on this framework, we provide \dthreads{} to ensure deterministic execution of multithreaded programs, even in the presence of race conditions. \dthreads{} outperforms the previous state-of-the-art runtime system (CoreDet) by a factor of 3, and serves as the basis for subsequent deterministic multithreading systems. Second, we present two tools based on the \Sheriff{} framework, \SheriffDetect{} and \SheriffProtect{}, to deal with false sharing in multithreaded programs, a notorious performance problem. \SheriffDetect{} is the first tool to correctly and precisely identify false sharing problems inside parallel applications. \SheriffProtect{} is the first general system to automatically mitigate false sharing problems without programmer intervention. Finally, we present another tool, \predator{}, which improves detection effectiveness by revealing read-write false sharing and overcomes a general limitation of false sharing detection: existing tools can detect only false sharing that is observed in a given execution, whereas \Predator{} can predict potential false sharing that does not manifest in a given execution but may appear, and greatly degrade application performance, in a slightly different execution environment. \Predator{} is the first false sharing tool that is able to automatically and precisely uncover false sharing problems in real applications, including MySQL and the Boost library.
\section{Future Work}
\dthreads{} performs synchronizations inside serial phases, which makes it susceptible to delays caused by load imbalance among threads. One direction of future work is therefore to reduce the waiting time caused by load imbalance. We also observed that the overhead of \dthreads{} depends on the number of synchronizations: with fewer synchronizations, \dthreads{} achieves much better performance because it can amortize its overhead better. Another direction of future work is to design programs with \dthreads{}'s mechanism in mind, by providing an extended set of APIs so that users can write programs with less load imbalance and fewer synchronizations, which could yield better performance.
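To make these two costs concrete, consider a deliberately simplified model (an illustrative assumption, not a measured result of this thesis). Suppose the synchronizations divide an execution into $n$ phases, thread $i$ performs $w_{i,k}$ units of work in phase $k$, and each serial phase has a fixed cost $s$. If every thread must wait for the slowest thread before the serial phase begins, the running time is approximately
\[
T \;\approx\; \sum_{k=1}^{n} \Bigl( \max_{i} w_{i,k} + s \Bigr) \;=\; \sum_{k=1}^{n} \max_{i} w_{i,k} \;+\; n\,s .
\]
Under this model, the first term grows with the load imbalance within each phase and the second grows linearly with the number of synchronizations, so reducing either one is expected to lower the total running time.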
This thesis also presents a set of tools to detect false sharing problems inside multithreaded programs. However, false sharing can exist throughout the entire software stack, including hypervisors, operating systems, and applications that use different threading libraries or are written in other languages. In the future, we would like to extend the detection mechanism of \predator{} to the entire software stack. We can also leverage memory trace information to suggest fixes, in order to help programmers eliminate false sharing.
\SheriffProtect{} introduces some performance overhead when a parallel program does not suffer from false sharing. It would be helpful if this protection mechanism could leverage the output of detection: the mechanism would be enabled to boost performance only when an application actually exhibits false sharing. Further, isolation could be applied only to specific objects in order to reduce the performance overhead even more.