% !TEX root = thesis.tex
\startfirstchapter{Introduction}
The software industry, most visible through some of its biggest companies such as Microsoft, Google, IBM, Dell, Apple, Oracle, and SAP, represents several hundred billion US dollars of profit a year.
For example, according to the US Census, the software industry in the USA produced a total revenue of 103.7 billion USD in 2002\footnote{http://www.census.gov/prod/ec02/ec0251i06.pdf last visited May 10th, 2012}.
Like many engineering companies, those in the software industry strive to optimize their engineering processes to produce software of higher quality in less time.
Software engineering researchers all over the world have dedicated countless hours to improving the way software is developed.
Several fields contribute indirectly without aiming directly at increasing productivity, such as developing better programming languages~\cite{conf:prog:lang}, smarter compilers~\cite{cong:comp:constr}, and better education in algorithms and data structures~\cite{conf:sigcse}.
Other fields are more directly interested in productivity, among them research in software processes~\cite{conf:icssp}, effort estimation~\cite{molkken:isese:2003,boehm:analse:2000}, and software failure prediction~\cite{conf:promise}.
The vast body of knowledge accumulated to improve the software engineering process is strongly biased towards analyzing the technical side: supporting coding activities (e.g.~\cite{bassil:iwpc:2001,mens:tse:2004}) and analyzing source code to improve quality~\cite{zimmermann:oopsla:2005,nagappan:icse:2006}.
Since producing source code is the main objective of software developers, optimizing the coding aspect~\cite{bassil:iwpc:2001,mens:tse:2004} as well as analyzing the produced code for issues~\cite{nagappan:icse:2005,schroeter:isese:2006} is an obvious choice.
Others have focused on the people who produce the code, studying their behavior around coding activities~\cite{latoza:icse:2006}, how they communicate~\cite{ko:icse:2007,gopal:2002:comacm}, and how their relations relate to productivity~\cite{gopal:2002:comacm} and quality~\cite{abreu:iwpse:2009,wolf:icse:2009}.
As in the former case, there is much merit in focusing on the developer: in the end, she implements the features a software product consists of, and she inevitably introduces errors into the code base.
Both avenues, studying the human aspect and studying the technical aspect, yielded many useful results.
For example, on the human side, the organizational distance between developers is a good predictor of failures at the file level~\cite{nagappan:icse:2008}, and on the technical side, similar changes made close together in time are a good failure predictor~\cite{kim:icse:2007}.
Yet, to truly optimize the software engineering process, a more holistic view is needed that marries the technical and social aspects.
One way to marry these two aspects, which as Conway stated influence each other~\cite{conway:datamination:1968}, is to use the concept of socio-technical congruence in software engineering, first formalized by Cataldo et al.~\cite{cataldo:cscw:2006}.
They proposed overlaying networks constructed from social (who communicates with whom) and technical (whose code depends on whose code) dependencies to get an overview of a project's social and technical interdependencies and to derive insight from the mismatch between those two networks.
To be more precise, socio-technical congruence, as defined by Cataldo et al.~\cite{cataldo:cscw:2006}, is a measure of how much the technical dependencies in the product are matched by social interactions among the developers affected by these technical dependencies.
This directly follows from Conway's observation~\cite{conway:datamination:1968} that the communication structure of any given organization dictates the underlying technical dependencies.
In software engineering, that roughly translates into the idea that the communication flow within software teams needs to follow the module dependencies described by the software architecture.
This idea shows great promise when applied to software repositories such as versioning archives and issue trackers, or to other recorded communication.
Cataldo et al.~\cite{cataldo:cscw:2006,cataldo:esem:2008} as well as other researchers~\cite{valetto:msr:2007,ehrlich:stc:2008} found that the better the technical dependencies are satisfied by social interaction, the better the productivity and, to some extent, the software quality~\cite{kwan:tse:2011,bird:issre:2009,kwan:stc:2009}.
The ability to extract useful socio-technical measures from archives in an automated fashion enables the application to any software project that captures development data electronically.
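As a rough illustration of such a measure (a simplified sketch in illustrative notation, not necessarily the exact formulation of Cataldo et al.~\cite{cataldo:cscw:2006}), let $D$ denote the set of developer pairs that share a technical dependency and $C$ the set of developer pairs that communicated with each other.
Congruence can then be read as the fraction of technically dependent pairs that also communicated, and the gaps as the dependent pairs that did not:
\[
\mathit{congruence} = \frac{|D \cap C|}{|D|}, \qquad \mathit{gaps} = D \setminus C .
\]
For a hypothetical project with four technically dependent pairs, of which three communicated, this yields a congruence of $3/4 = 0.75$ and a single gap.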
However, we see three major issues with the concept of socio-technical congruence as it is currently used:
\begin{itemize}
\item The socio-technical congruence measure itself does not give much indication of how to improve the overall situation other than to suggest that people talk to each other when they share a technical dependency.
\item The idea of achieving high congruence is based on the notion that it is important to communicate along all technical dependencies, which is not necessarily true.
\item The analysis of socio-technical congruence can only be done post-mortem, which, although valuable in a retrospective, does not help improve productivity or quality in an ongoing project.
\end{itemize}
% item 2
The issue of imbalance between technical and social relationships between developers is related to the problem of not knowing how to improve the socio-technical congruence other than by pointing out the technical relationships between developers that did not communicate with each other.
Given enough resources and time, every technical dependency can be satisfied, but this runs the risk of decreasing productivity by introducing too many interruptions.
% item 3
Over-communication of technical dependencies might arise from the underlying assumption that every technical dependency warrants communication between the dependent developers.
We are not solely referring to the ability of developers to read environment traces~\cite{bolici:stc:2009} but also to the fact that some changes are either not meant to be communicated or that the system architecture was designed to accommodate certain changes (think of optimizations) that should not affect other developers.
% item 4
To fully leverage the concept of socio-technical congruence it is important to act on it.
The current concept is only shown to relate to performance and quality post-mortem.
To truly unlock the potential of the socio-technical congruence concept, it needs to be extended so that it can make on-demand recommendations to improve congruence between the social and technical relations.
% how do we address the issues
With this thesis we intend to address these issues in two ways:
\begin{description}
% item 3
\item[What technical dependencies need to be met with communication?]
Although recommending that every developer talk to every other developer about their work seems to be the easiest way to achieve a perfect socio-technical congruence index, it could, as mentioned earlier, decrease productivity due to the heavy overhead caused by constant communication.
To address this issue, we not only seek out which technical dependencies exist among developers, but go one step further and try to find those technical dependencies that, when not accompanied by communication, are the most harmful to the project.
Instead of focusing on recommending changes to the source code to remove technical dependencies we focus on improving the communication among developers.
Changes to the technical dependencies would imply partly re-architecting the product, which is both time-intensive and risky with respect to introducing unforeseeable complications, and thus economically undesirable.
Additionally, as customers rarely derive any tangible benefits from re-architecting a product, there is little willingness to pay for this type of work.
% item 4
\item[How to make socio-technical congruence actionable?] Although socio-technical congruence can be continuously computed and the previously mentioned strategies can be applied in real time, they all take a more project-centred perspective.
To support developers in engaging in communication when necessary, they need to be informed of potential issues with respect to socio-technical congruence as they arise.
Building on the concept of proximity proposed by Blincoe et al.~\cite{blincoe:cscw:2012}, we study in depth the development interactions of a student project and the relation between issues and their fine-grained real-time code dependencies.
\end{description}
As Murphy et al.~\cite{murphy:rsse:2010} pointed out, users of automated recommendation systems need to trust the system; otherwise they will ignore it.
This is especially true when continuously reporting information to developers and trying to steer them into another direction.
Therefore, we investigate what the daily focus of developers is when it comes to communication, to gauge whether the level of recommendation provided by most methods derived from or related to socio-technical congruence might bear fruit.
% some of the findings as a teaser
Socio-technical congruence forms a great basis to leverage several digitally recorded data treasures to generate useful and actionable information.
Patterns of developer pairs showed that some pairs of developers who shared a technical dependency but did not talk to each other harmed the quality of the project, although most such pairs are not statistically related to quality.
Furthermore, we found in a student project that certain issues experienced during development can be traced back to code dependencies that could have been detected in real time.
% the two top level research questions
Instead of using the outcome metric traditionally associated with socio-technical congruence, performance, we focus on build outcome as a metric for software quality.
Build outcome is rarely considered when studying software quality, as it is a coarse measure that often indicates multiple issues rather than a single specific one, as is the case, for instance, with pre- and post-release failures.
Nonetheless, build success is fundamental in creating a product that can be shipped to a customer.
Often a successful build not only indicates that all test cases deemed important passed; towards the end of the release cycle, it is often also the only indicator of customer acceptance with respect to requested features and their stability.
Hence, build success is of utmost importance to a business as it forms the very product the business hopes to sell.
Therefore the two guiding research questions we address in this thesis to investigate whether socio-technical congruence can be used to generate actionable knowledge that can increase build success are:
\begin{description}
\item[RQ 1:] Does Socio-Technical Congruence influence build success?
\item[RQ 2:] Can Socio-Technical Networks be leveraged to generate recommendations to improve build success?
\end{description}
% methodology overview
We are using a mixed methods approach to explore these two research questions.
For \textbf{RQ~1} we employ data mining techniques by studying the artifacts such as task discussions and source code changes of a large industrial software project.
The second research question (\textbf{RQ~2}) requires both quantitative and qualitative analysis methods.
To find statistically relevant recommendations we employ data mining techniques, but to explore the usefulness and acceptance of such recommendations we make use of questionnaires, interviews, and observational studies.
\section{Thesis Contribution}
% contribution(s)
The contribution of this thesis lies in showing that socio-technical congruence can be used to create preventative recommendations and that developers are generally accepting of recommendations of this form, as well as in formulating a step-wise approach to derive such recommendations.
This step-wise approach selects a proper scope and outcome metric and, through data mining techniques such as frequent pattern mining, generates actionable insights.
It consists of five steps:
\begin{enumerate}
\item Define scope of interest.
\item Define outcome metric.
\item Build social networks according to the scope in real time.
\item Build technical networks according to the scope in real time.
\item Generate actionable insights.
\end{enumerate}
Throughout this thesis we explore the feasibility of this approach by providing examples showing how to apply it as well as the acceptance of the outcome by software developers.
\section{Thesis Overview}
This thesis is divided into three parts.
In part one, we motivate the more detailed research by going over related work in Chapter~\ref{chap:bg}.
Before we delve into presenting our overarching methodology with explanations of frequently used constructs and analysis methods in Chapter~\ref{chap:meth}, we will present IBM Rational Team Concert (RTC) as well as some key factors of the development team as much of our research was done in collaboration with the RTC development team (Chapter~\ref{chap:rtc}).
Part two comprises Chapters~\ref{chap:soc-net} and~\ref{chap:stc-net2}, which build the foundation for our approach, formulated in Chapter~\ref{chap:approach}, by investigating the relationship between social networks and build success as well as between socio-technical networks, specifically gaps, and build success.
Knowing that the social network might lend itself to manipulations with positive effects with respect to build success, as informed by the respective socio-technical network, we study the development history of the IBM Rational Team Concert development team for recurring patterns of developer pairs that do not coordinate and their statistical relationship to build success (Chapter~\ref{chap:stc-net}).
We continue by presenting a study in Chapter~\ref{chap:talk} investigating whether the recommendations resulting from those patterns are of use to developers and when the best time to present such recommendations is.
Before concluding this thesis with a discussion of how our approach to leveraging socio-technical congruence is supported by the evidence uncovered through our studies (Chapter~\ref{chap:disc}), we present a study in Chapter~\ref{chap:actionable} that showed evidence that recommendations generated by our approach could have prevented build failures.