stc-fse.tex

% !TEX root = thesis.tex
\startchapter{Leveraging Socio-Technical Networks}
\label{chap:stc-net}
\vspace{-6pt}
The first step towards exploring the usefulness of our approach is to test whether we can generate recommendations that break patterns related to build failure.
In order to do this, we focus on the following research question:

\begin{description}
  \item[RQ 2.1:] Can Socio-Technical-Networks be manipulated to increase build success? 
\end{description}

Intuitively, the idea is that two developers that share a technical dependency should talk, but in some cases when the technical relationship is trivial, instead of preventing failures such a recommendation might only create additional communication issues.
Thus, we explore to what extent the history of a project can be used to highlight the cases where developers that shared an unmet coordination need are found more often to failed builds.

In this chapter, we start by briefly presenting the methodology that is relevant to exploring our research question (cf. Section~\ref{sec:data} and~\ref{ch8:gaps}).
Then, we present the analysis and results we obtained in Section~\ref{ch:fse:result2} followed by a discussion of the results Sections~\ref{ch8:dis}.
In conclusion of this chapter, we offer an answer to our research question (cf. Section~\ref{sec:8:conclusions}).

\vspace{-5pt}
\section{Socio-Technical Coordination}
\label{sec:data}
We build our socio-technical networks according to the approach outlined in Chapter~\ref{chap:approach}:

\begin{enumerate}
\item Define scope of interest
\item Define outcome metric
\item Build social networks
\item Build technical networks
\end{enumerate}

As our scope, we select the software build and as outcome metric we use the build outcome.
We construct the social and technical networks from the Rational Team Concert development team repository as outlined in Chapter~\ref{chap:bg}.


\begin{table}[t]
\centering
\caption{Contingency table for technical pair (Adam, Bart) in relation to build
success or failure}
\begin{tabular}{rcc}
\toprule
 & successful & failed  \\
 \midrule
(Adam, Bart) & 3 & 13 \\
$\neg$ (Adam, Bart) & 224 & 86\\
\midrule
total&227&99\\\bottomrule
\end{tabular}
\label{tab:contingencytable}
\end{table}


\begin{table}[t]
\centering
\caption{Twenty most frequent \emph{technical pairs} that are failure-related}
\begin{tabular}{@{\hspace{.2cm}}ccc@{\hspace{.75cm}}c@{\hspace{.2cm}}}
\toprule
Pair & \#successful & \#failed & $p_x$\\
\midrule
(Cody, Daisy)	&	0&	12&	1		\\ %user11137.user4105.T
(Adam, Daisy)	&	1&	14&	0.9697	\\ %user11137.user2943.T
(Bart, Eve)	&	2&	11&	0.9265	\\ %user3493.user2435.T
(Adam, Bart)	&	3&	13&	0.9085	\\ %user3493.user2943.T
(Bart, Cody)	&	3&	13&	0.9085	\\ %user3493.user4105.T
(Adam, Eve)	&	4&	16&	0.9016	\\ %user2943.user2435.T
(Daisy, Ina)	&	3&	12&	0.9016	\\ %user11137.user13877.T
(Cody, Fred)	&	3&	10&	0.8843	\\ %user1976.user4105.T
(Bart, Herb)	&	3&	10&	0.8843	\\ %user3493.user6339.T
(Cody, Eve)	&	5&	15&	0.8730	\\ %user4105.user2435.T
(Adam, Jim)	&	4&	11&	0.8631	\\ %user2943.user9017.T
(Herb, Paul)	&	5&	12&	0.8462	\\ %user6339.user13875.T
(Cody, Fred)	&	5&	11&	0.8345	\\ %user11137.user1976.T
(Mike, Rob)	&	6&	13&	0.8324	\\ %user10979.user3385.T
(Adam, Fred)	&	6&	13&	0.8324	\\ %user2943.user1976.T
(Daisy, Fred)	&	8&	13&	0.7884	\\ %user3493.user1976.T
(Gill, Eve)		&	7&	10&	0.7661	\\ %user1264.user2435.T
(Daisy, Ina)	&	7&	10&	0.7661	\\ %user3493.user13873.T
(Fred, Ina)	&	8&	10&	0.7413	\\ %user1976.user13877.T
(Herb, Eve)	&	8&	10&	0.7413	\\ %user6339.user2435.T
\bottomrule
\end{tabular}
\label{tab:badtechpairs}
%}\hspace{1.3cm}
\end{table}
%

\begin{table}[t]
%\subfloat[The twenty corresponding \emph{socio-technical pairs}, which are not statistically related to failed builds.]{
\centering
\caption{The 20 most frequent statistically failure related technical pairs and the corresponding socio-technical pairs}
\begin{tabular}{@{\hspace{.2cm}}ccc@{\hspace{.75cm}}c@{\hspace{.2cm}}}
\toprule
Pair & \#successful & \#failed & $p_x$ \\
\midrule
(Cody, Daisy)	&	---&	---&	---\\
(Adam, Daisy)	&	---&	---&	---\\
(Bart, Eve)	&	1&	4&	0.9016\\
(Adam, Bart)	&	---&	---&	---\\
(Bart, Cody)	&	---&	---&	---\\
(Adam, Eve)	&	---&	---&	---\\
(Daisy, Ina)	&	---&	---&	---\\
(Cody, Fred)	&	1&	0&	0\\
(Bart, Herb)	&	1&	2&	0.8209\\
(Cody, Eve)	&	0&	3&	1\\
(Adam, Jim)	&	0&	1&	1\\
(Herb, Paul)	&	1&	0&	0\\
(Cody, Fred)	&	---&	---&	---\\
(Mike, Rob)	&	---&	---&	---\\
(Adam, Fred)	&	---&	---&	---\\
(Daisy, Fred)	&	---&	---&	---\\
(Gill, Eve)		&	---&	---&	---\\
(Daisy, Ina)	&	1&	0&	0\\
(Fred, Ina)	&	0&	2&	1\\
(Herb, Eve)	&	---&	---&	---\\
\bottomrule
\end{tabular}
%\caption{The twenty corresponding \emph{socio-technical pairs}, which are not statistically related to failed builds.}
%}
\label{tab:stechpairs}
%\label{tab:pairs}
\end{table}
%\addtocounter{table}{-1}


\section{Analysis of Socio-Technical Gaps}
\label{ch8:gaps}
The lack of communication between two developers that share a
technical dependency is referred to in the literature as a
socio-technical gap~\cite{valetto:msr:2007}. Because research suggests negative influence of such gaps, we are interested in analyzing pairs of developers that share a technical edge (implying coordination need), but not a social edge (implying
unmet coordination need), in socio-technical networks. We refer to these pairs of
developers as \emph{technical pairs} (there is a gap), and to those who do
share a socio-technical edge (there is no gap) as \emph{socio-technical pairs}. 

We analyze the technical pairs in relation to build failure. 
Our analysis proceeds in four steps:

\begin{enumerate}
\item Identify all technical pairs from the socio-technical networks.
\item For each technical pair count occurrences in socio-technical networks of
builds that failed.
\item For each technical pair count occurrences in socio-technical networks of
successful builds.
\item Determine if the pair is significantly related to success or failure.
\end{enumerate}

For example, in Table~\ref{tab:contingencytable} we illustrate the analysis of
the technical pair (Adam, Bart). This pair appears in three successful builds and in
13 failed builds. Thus, it does not appear in 224 successful builds, which is the total number of successful builds minus the number of successful builds the pair appeared in, and it is absent in 86 failed builds.
A Fischer Exact Value test yields significance at a confidence level of $\alpha = .05$ with a p-value of $4.273\cdot10^{-5}$.

Note that we adjust the p-values of the Fischer Exact Value test to account for multiple hypotheses testing using the Bonferroni adjustment.
The adjustment is necessary because we deal with 961 technical pairs that need to be tested. 

To enable us to discuss the findings, and to look at whether closing socio-technical gaps
are needed to avoid build failure, or which of these gaps are more important to
close, we perform two additional analyses. 
First, we analyze whether the
socio-technical pairs also appear to be build failure-related or not, by
following the same steps as above for socio-technical pairs. 
%
Secondly, we prioritize the developer pairs using the coefficient $p_x$,
which represents the normalized likelihood of a build
to fail in the presence of the specific pair:

\begin{equation}
p_x\text{=}\frac{ \text{pair}_{\text{failed}} / \text{total}_{\text{failed}} }
                     { \text{pair}_{\text{failed}} / \text{total}_{\text{failed}} + \text{pair}_{\text{success}} / \text{total}_{\text{successs}}}
\end{equation}

The coefficient is comprised of four counts: (1) pair$_{\text{failed}}$, the number of failed builds where the pair occurred; (2) total$_{\text{failed}}$, the number of failed builds; (3) pair$_{\text{success}}$, the number of successful builds where the pair occurred; (4) total$_{\text{success}}$, the number of successful builds.
A value closer to one means the developer pair is strongly related to build
failure.
We are using the percentages of failed and successful pairs in this equation to avoid an imbalance between successful and failed builds.
In our study the number of successful builds was twice as large as the number of failed builds and thus the occurrences in successful builds might be skew the coefficient towards a lower failure likelihood.


%\addtocounter{table}{1}
\begin{table}[t!]
\centering
\caption{Logistic regression only showing the technical pairs from Table~\ref{tab:badtechpairs}, the intercept, and the confounding variables, the model reaches an AIC of 706 with all shown features being significant at $\alpha=0.001$ level (indicated by ***).}
\begin{tabular}{cccc}
\toprule
Feature & Coefficient & p-value & \\
\midrule
(Intercept)            &  7.897e+74 &  $<$ 2e-16 & ***\\
\\
(Cody, Daisy)  &  	-5.669e+75  &  $<$ 2e-16 & ***\\
(Adam, Daisy)  &   -9.846e+75  &   $<$ 2e-16 &***\\
(Bart, Eve)  	&   -1.258e+75  &$<$ 2e-16 &***\\
(Adam, Bart)  	&   -1.605e+76  & $<$ 2e-16 &***\\
(Bart, Cody)  	&   -3.419e+76  & $<$ 2e-16 &***\\
(Adam, Eve)  	&   -2.610e+76  & $<$ 2e-16 &***\\
(Daisy, Ina)  	&  	-8.105e+74 	 &  $<$ 2e-16 &***\\
(Cody, Fred)  	&   -5.348e+76  &$<$ 2e-16 &***\\
(Bart, Herb)  	&   -2.977e+76  &$<$ 2e-16 &***\\
(Cody, Eve)  	&   -2.315e+76  & $<$ 2e-16 &***\\
(Adam, Jim)  	&   -2.724e+76    & $<$ 2e-16 &***\\
(Herb, Paul)  	&   -1.636e+76      &$<$ 2e-16 &***\\
(Cody, Fred)  	&   -1.645e+74     & $<$ 2e-16 &***\\
(Mike, Rob)  	&   -1.327e+75    & $<$ 2e-16 &***\\
(Adam, Fred)  	&   -5.250e+76     & $<$ 2e-16 &***\\
(Daisy, Fred)  &   -2.455e+75      &$<$ 2e-16 &***\\
(Gill, Eve)	  	&   -7.162e+75    & $<$ 2e-16 &***\\
(Daisy, Ina)  	&   -5.325e+74      &$<$ 2e-16 &***\\
(Fred, Ina)  	&   -2.777e+75     & $<$ 2e-16 &***\\
(Herb, Eve)  	&   -1.799e+75     & $<$ 2e-16 &***\\
\\
\#Change Sets per Build     & \phantom{-}6.480e+60 &   $<$ 2e-16 &***\\
\#Files changed per Build            &-4.530e+60 &  $<$ 2e-16 &***\\
{\#Developers contributed per Build}  &   \phantom{-}3.386e+61 & $<$ 2e-16 &***\\
\#Work Items per Build    &  -3.690e+61   & $<$ 2e-16 &***\\
\bottomrule
\end{tabular}
\label{tab:regression}
\end{table}


\section{Results}
\label{ch:fse:result2}
We found a total of 2,872 developer pairs in all the constructed
socio-technical networks, out of which 961 were technical pairs. %. The Fischer Exact Value tests show While a Fischer Exact Value test determined 120 technical pairs that are significantly correlated with build failure, none are statistically related with successful builds. Due to space constraints we display only twenty pairs in Table~\ref{tab:badtechpairs}.
We choose to present the twenty that are most frequent across failed builds.

We rank the failure relating \emph{technical} pairs (cf. Table~\ref{tab:badtechpairs})
by the coefficient $p_{x}$. This coefficient indicates the strength of
relationship between the developer pair and build failure. For instance, the
developer pair (Adam, Bart) appears in 13 failed builds and in 3
successful builds. This means that pair$_{failed}$ = 13 and pair$_{success}$ = 3
with total$_{failed}$= 99 and total$_{success}$= 227 result in $p_x$= 0.9016.
Besides that, we report the number of successful builds the pair was observed with
(\#successful), as well as the number of failed builds the pairs was observed with
(\#failed). The $p_x$ values are all above 0.74, which implies that the likelihood
of failure is at least 74\% in all builds in which these developer pairs are
involved. 

We then checked for the 120 pairs whether the corresponding \emph{socio-technical} pairs are related to failure.
Only 23 of the 120 technical pairs had an existing corresponding socio-technical pair of which none were statistically related to build failure. 
In Table~\ref{tab:stechpairs} we show the socio-technical pairs that match the 20 technical pairs shown in Table~\ref{tab:badtechpairs} as well as the same information as in Table~\ref{tab:badtechpairs}.
If the corresponding socio-technical pair existed we computed the same statistics as for the technical pairs, but for those that existed we could not find statistical significance.
Note that we use fictitious names for confidentiality reasons.

The failure-related technical pairs span 48 out of the total 99 failed builds in
the project. Figure~\ref{fig:builddistribution} shows their distribution
 across the 48 failed builds. The histogram
illustrates that there are few builds that have a large number of failure related
builds (e.g. 4 with 18 or more pairs) but most builds only show a small number of
pairs (15 out of 48 failed builds have 4 or less). 
%Not only does 
This distribution of technical pairs indicates that the developer
pairs we found  did not concentrate in a small number of builds. 
In addition, it validates the assumption that it is
worthwhile to seek insights about developer coordination in failed builds.
%Moreover, this enables us to explain why two thirds of the builds failed.


\begin{figure}[t]
\centering
\vspace{-1cm}
\includegraphics[width=\columnwidth]{figures/builddistribution}
\vspace{-.75cm}
\caption{Histogram of how many builds have a certain number of failure-related technical pairs.}
\label{fig:builddistribution}
\end{figure}

\vspace{-9pt}
\section{Discussion}
\vspace{-8pt}
\label{ch8:dis}
These results show that there is a strong relationship between certain technical
developer pairs and increased likelihood of a build failure.
Out of the total of 120 technical pairs that increase the likelihood of a
build to fail, only 23 had an existing
corresponding socio-technical pair. Of these, none were statistically
related to build failure. This means that 97 pairs of developers that had a
technical dependency did not communicate with each other and
consequently increased the likelihood of a build failure. Our results not only
corroborate the past findings~\cite{cataldo:cscw:2006,cataldo:esem:2008} that socio-technical gaps
have a negative effect in software development, but more importantly, they indicate
that the analysis presented in this paper is able to identify the specific
socio-technical gaps, namely the actual developer pairs where the gaps occur
and increase the likelihood of build failure. 

A preliminary analysis of developer membership to teams shows that most
of the technical pairs related to build failure consist of developers belonging to
different teams. Nagappan et al.~\cite{nagappan:icse:2008} found that using the
organizational distance between people predicts failures. They reasoned that this
is due to the lack of awareness that people separated by organizational distance
work on. Although the Rational Team Concert team strongly emphasizes communication
regardless of team boundaries, it still seems that organizational distance has
an influence on its communication behaviour.


Although the analysis of technical pairs in relation to build failures
showed that they have a negative influence on the build result, it does not take
into account possible confounding variables. The question is whether there are developer pairs still affect the build outcome in the presence of confounding variables, such as number of developers involved in a build, number of work items
related to a build, number of change sets per build, and number of files changed.
We tested this using a logistical regression model including both developer pairs and confounding variables.

Overall, the logistic regression confirms the effect of the developer pairs
even in the presence of confounding variables, such as the number of files changed,
and numbers of developers (AIC of 706). 
Table~\ref{tab:regression} shows an excerpt from the complete logistical
regression that demonstrates that the twenty pairs previously identified are still
significant under the influence of the four confounding variables. Because we have 2872 developer pairs, space constraints prevent us from reporting the entire regression model. We chose to report on the coefficients for the twenty pairs that we reported in Table~\ref{tab:regression}, as well as the coefficients for the four control features.

All features shown in Table~\ref{tab:regression} are significant at the $\alpha=.001$ level (indicated by the p-value and ***).
We checked for all the other technical pairs that were reported significant using the approach described in the previous section and found that they are all significant in the regression model as well.
Moreover, the four control variables of the number of developers per build, number
of work items per build, number of change sets per build, and number of files changed per build are also significant.

Since our model failed builds with 0 and successful builds with 1, a negative coefficient means that the feature increases the chances of a failure.
All pairs reported in Table~\ref{tab:regression}, and the technical pairs that we identified to be related to build failure, have a negative coefficient.
 

\vspace{-5pt}
\section{Summary}
\label{sec:8:conclusions}
\vspace{-5pt}
We end this chapter by returning to the initial research question we set out to answer:
\begin{description}
  \item[RQ 2.1:] Can Socio-Technical-Networks be manipulated to increase build success? 
\end{description}

The results we presented in Section~\ref{ch:fse:result2} show that there are developer pairs with unmet coordination needs that negatively influence build success.
Furthermore, we demonstrated that the corresponding developer pair that fulfills its coordination need is less likely to be associated with build failure and thus theoretically decreasing the likelihood of build failure.
To put it in perspective, if any of the twenty most harmful patterns is present in a build, the build has at least  a 74\% chance to fail.

Besides the technical side of the approach to generate statistically relevant recommendations as Murphy and Murphy-Hill~\cite{murphy:rsse:2010} pointed out, developers need to accept such recommendations for them to be ultimately useful.