forked from schadr/ChatToSucceed
-
Notifications
You must be signed in to change notification settings - Fork 0
/
stc-fse.tex
946 lines (847 loc) · 53.1 KB
/
stc-fse.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
% !TEX root = thesis.tex
\startchapter{Socio-Technical Congruence and Failure}
\label{chap:stc-net}
The first step we take towards exploring the usefulness of our approach is to test whether we can generate recommendations that break patterns statistically related to build failure.
Thus we focus on the following research question:
\begin{description}
\item[RQ 2.1:] Can Socio-Technical-Networks be manipulated to increase build success?
\end{description}
Intuitively, the idea is that two developers that share a technical dependency should talk, but in some cases when the technical relationship is trivial instead of preventing failures such a recommendation might only create additional communication overhead.
Thus we explore to what extend the history of a project can be used and highlight those cases where developer that shared an unmet coordination need can be brought more often into relation with failed builds than with successful builds.
In the remainder of this chapter, we start with detailing the back ground relevant to studying communication in relationship to software builds or to integration issues in software development in a broader sense (Section~\ref{ch8:bg}).
Subsequently, we briefly go over the methodology that is relevant to exploring our research question (Section~\ref{sec:data} and~\ref{ch8:gaps}).
Then, we go over the analysis and results we obtained in Section~\ref{ch:fse:result2} followed by a discussion of the results and threats to validity in Sections~\ref{ch8:dis} and~\ref{ch8:threat} respectively.
We conclude this chapter with offering an answer to our research question and leading into the subsequent Chapter~\ref{chap:talk} (Section~\ref{sec:8:conclusions}).
\section{Background}
\label{ch8:bg}
In this section we discuss some background related to improving coordination among software developers.
We start with some motivation of the problem and continue with presenting work performed by other researchers.
\subsection{Motivation}
We hypothesize that with the ever growing size of software teams the lack of
effective coordination is the main source of integration failures. With the
ever growing complexity and sophistication of large software projects,
error-free integrations are not only important but difficult to achieve. The
development work that precedes integrations involves significant coordination of
developers that work in teams and need to rely on the code of others and its
stability. But often code is everything but stable, further contributing to
developers' need to coordinate to keep up with code changes that impact their work. This problem is
amplified in software builds where an entire team needs to integrate their work
and on which the development of new features depends. Not only do
failed builds destabilize the product~\cite{cusumano1997} but they also demotivate
software developers~\cite{holck2004}.
Despite their importance, keeping integrations builds error-free can be a very time consuming
process. A lightweight approach that can determine whether the build contains
failures before invoking the build process is thus very valuable to developers.
This lightweight approach could determine a builds outcome in minutes rather than hours or days. Having a faster way to
assess the quality of a build helps developers to continue working with newest
builds while being aware of its quality. Previous
research~\cite{wolf:icse:2009,hassan:ase:2006} trained predictive models to assess the quality of software builds without the need of invoking large test
suits. Although this research
reaches a high degree of accuracy in their predictions, knowing that a
build will fail does not necessarily help developers to actually prevent
the build from failing.
The goal of this research is to find a way to create actionable knowledge that
developers can act upon to avoid
integration failure.
Maintaining proper communication and awareness of work others perform is
important in any kind of project. Specifically in software engineering many studies found
that factors such as geographical and organizational distance have an impact on
communication and even effect software quality~\cite{nagappan:icse:2008}. In our
study we uncover the existence of pairs of
developers, that, if technically dependent in a build but not discussing their
dependencies, have a negative influence on the success on builds. This
actionable knowledge can be integrated in real-time recommender systems that
indicate, based on project historical data, which developer pairs tend to be
failure related. Developers and management can then devise strategies to
prevent the failure before build time.
\subsection{Related Work}
\label{sec:relwork}
In order to manage changes and maintain quality, developers must coordinate. In
software development, coordination is largely achieved through communicating with
people who depend on the work that you do \cite{kraut:1995coordination}. The
software engineering literature is recognizing the role of communication as
something that should be nurtured not eliminated and recent
collaborative software development environments aim to support developers'
social interactions along with artifact creation activities~\cite{nakakoji2010:rdc}.
Ehrlich et al.~\cite{ehrlich:icgse:2006} investgiated how social networks can be
used to leverage knowledge in distributed teams. Backstrom et
al.~\cite{backstrom:kdd:2006} took a more general approach and investigated the
evolution of large social networks and the information they hold. Chung et
al.~\cite{chung:cpr:07} reported in recent work about behavior of individuals
while performing knowledge intensive tasks. There have been a number of studies
that investigated communication structures to identify good
coordination practices
(e.g.~\cite{hinds:cscw:2006,hossain:cscw:2006,bird:fse:2008,hinds:hicss:2008}). In contrast to studies of the general development process, Marczak studied social
networks to identify best practices for requirements management
processes~\cite{marczak:re:2008}.
Inspired by Conways Law~\cite{conway:datamination:1968}, Cataldo et
al.~\cite{cataldo:cscw:2006,cataldo:esem:2008} formulated a coefficient that
measures the alignment of the social and technical networks defining the term of
socio-technical congruence. They observed that higher socio-technical congruence
leads to higher developer
productivity~\cite{cataldo:cscw:2006,cataldo:esem:2008}. Others used this
notion and coefficient to further investigate the effect of congruence
(e.g.~\cite{valetto:msr:2007}). Prior to Cataldo et
al.~\cite{cataldo:cscw:2006,cataldo:esem:2008} proposal,
Ducheneaut~\cite{ducheneaut:cscw:2005} investigated the evolution of social and
technical relationships of open source project participants to see how those
participants become a part of the community.
Recent studies started to relate the social with the technical
dimensions of software development to build predictive models. Pinzger et
al.~\cite{pinzger:fse:2008} successfully used social networks connecting
developers via code artifacts to predict failures. Meneely et
al.~\cite{meneely:fse:2008} used similar networks but excluded the code artifacts
and connected the developers directly. Two studies at Microsoft looked into the
geographical~\cite{bird:acm:2009} and organizational~\cite{nagappan:icse:2008}
distance between people that worked on the same binary and the relation to the
failure proneness of said binary. They found that the organizational distance is
a very powerful predictor of failure proneness of binaries whereas the
investigation of geographical distance has little to no effect. A recent
study~\cite{bird:issre:2009} combines the work of Pinzger et
al.~\cite{pinzger:fse:2008} and
Zimmermann~\cite{zimmermann:icse:2008} by creating
socio-technical networks that capture developer contributions and binary
interdependencies. They found this combination to be a more powerful predictor
that works for different software project and even prevails across multiple
revisions of a project.
%\begin{table}[t]
%\centering
%\begin{tabular}{rrccc}
%\toprule
%& & Successful & Failed & Total\\
%\midrule
%&min &1&1&1\\
%\#work items & avg & 16.68&26.52&19.63\\
%& max & 111&109&111\\
%\midrule
%& min & 1&1&1\\
%\#change sets & avg & 26.71&46.27&32.57\\
%& max & 227&194&227\\
%\midrule
%& min & 1&1&1\\
%\#Developers & avg & 19.62&28&22.16\\
%& max &64&71&71\\
%\bottomrule
%\end{tabular}
%\caption{Statistics on Rational Team Concert data: change sets, work items,
%and developers over successful (227), failed (99) and total builds (328).}
%\label{tab:jazzbuildinfo}
%\end{table}
\section{Socio-technical coordination}
\label{sec:data}
Following the approach we build our socio-technical networks according to the approach outlined in Chapter~\ref{chap:approach}:
\begin{enumerate}
\item Define scope of interest.
\item Define outcome metric.
\item Build social networks according to the scope in real time.
\item Build technical networks according to the scope in real time.
\end{enumerate}
As our scope we select the software build and as outcome metric whether a build failed or succeeded.
We construct the social and technical networks from the Rational Team Concert data as outline in Chapter~\ref{chap:bg}.
%\begin{figure}[t]
%\centering
%\subfloat[Evaluation results from the Support Vector Machine.] {
%\includegraphics[width=.5\columnwidth]{figures/precission-recall}
%\label{fig:prediction-svm}
%}
%\subfloat[Evaluation results from the Logistic Regression.] {
%\includegraphics[width=.5\columnwidth]{figures/precision-recall-logreg}
%\label{fig:prediction-logreg}
%}
%\caption{Plotting the precision (green downward pointing triangles) and recall (black hollow circles) of the support vector machine (left) and the logistic regression (right).}
%\label{fig:prediction}
%\end{figure}
%\section{Build Failure Prediction}
%\label{sec:prediction}
%To answer our first research question, we build a predictive model that uses the
%constructed socio-technical networks as input to predict whether a build succeeds
%or fails. Since we are interested in the practical application of this model we
%diverge from the standard evaluation tactic and use a more practical relevant
%approach. After we present the results of our prediction model we discuss their
%implications.
%
%\subsection{Model Evaluation}
%We train several prediction models, such as logistic regression, support vector
%machines, decision trees, and a bayesian classifiers, using features we
%extract from the constructed socio-technical networks. Each feature represents a pair of
%connected developers in the network and the type of the edge that they are
%connected with (i.e. social, technical or socio-technical).
%
%To accomplish a more practical evaluation, we order all our social networks by the time the build they were constructed from was tested.
%Having the networks ordered according to build time we evaluate our prediction model in the following five steps:
%
%\begin{enumerate}
%\item Get the first $n$ networks and perform a principle component analysis on the extracted features.
%\item Select the principal components that explain the most variance until 95\% of the total variance of the training set can be explained.
%\item Train the model using the principal components of the first $n$ networks.
%\item Test the model on network $n+1$ after transforming it to the determined principal components.
%\item Increase $n$ by 1 and repeat until $n+1$ is the size of complete data set.
%\end{enumerate}
%
%This evaluation technique is meant to simulate the actual usage of the prediction model.
%In software practice builds come in one by one.
%This means that whenever a build has been verified the model can be extended using the social network from the newest build for training.
%This method of evaluation is closer to the actual usage in the field than
%random splits or cross validation.
%
%We used two coefficients to assess the models quality at any given time: recall (Eq.~\ref{eq:recall}) and precision (Eq.~\ref{eq:precision}).
%The recall of a model describes the percentage of how many failed builds where predicted correctly.
%This translates into the formula:
%
%\begin{equation}
%\label{eq:recall}
%\text{recall}=\frac{\text{Correctly \emph{as failed} Predicted Builds}}{\text{All Failed Builds}}
%\end{equation}
%
%Precision on the other hand describes the percentage of how many of the \emph{as failed} predicted builds are actually failed builds. Thus we can express precision using the following formula:
%
%\begin{equation}
%\label{eq:precision}
%\text{precision}=\frac{\text{Correctly \emph{as failed} Predicted Builds}}{\text{All \emph{as failed} Predicted Builds}}
%\end{equation}
%
%Both recall and precision lie in the interval from 0 to 1, with 1 being best and 0 being worst.
%We compute the recall and precision for each iteration by accumulating the prediction results of all previous prediction results.
%For example if we went through 20 iterations we create a contingency table from the predicted results for the last 20 builds.
%Each prediction can have one of four outcomes: (1) correctly predicted \emph{as failed}, (2) correctly predicted as succeeded, (3) falsely predicted \emph{as failed}, and (4) falsely predicted as succeeded.
%Adding those numbers till the most recent prediction enables us to compute precision and recall.
%
%
%The current standard evaluation for failure prediction models in software engineering is to take the data set and generate a number of random splits (e.g.~\cite{zimmermann:icse:2008,schroeter:isese:2006,nagappan:icse:2008}).
%A random split partitions the data into a training and a testing data set, where the model is trained with the training data and then evaluated using the testing set.
%Other also used cross validation which creates $n$ partitions and tests with each partition while training with the remaining $n-1$ partitions (e.g.~\cite{wolf:icse:2009}).
%
%Although random splits and cross validation allow for a random combination, they completely ignore the explicit order.
%This leads to the problem that random splits and cross validation allow features that might emerge later and should not have been available to be used for training.
%For example, if a developer joins the project after it started, she cannot have been present in any of the networks previous to her date of joining the project.
%This means that the model at first cannot use any information about her connections to others.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%
%\subsection{Results}
%Of the prediction models we evaluated, we present the results for the support
%vector machine and the logistical regression. The support vector machine produced
%the best results whereas the logistical regression serves as comparison to the
%support vector machine results as well as indicating the reliability of
%the regression analysis presented later in Section~\ref{sec:pattern}.
%
%Figure~\ref{fig:prediction} shows the recall and precision values for the
%support vector machine and the logistical regression models. The green downward
%pointing triangles represent the precision of the model for each iteration, note that we started training each model with at least three data points. The black circles represent the recall of a model for each
%iteration.
%
%Both Subfigures in Figure~\ref{fig:prediction} can be divided into three
%sections. The first section is comprised of the first 70-80 iterations where,
%by a manual investigation, we observe that the support vector machine predicts almost everything to be successful in contrast to the unstable logistic regression.
%
%The middle section is characterized by a peek efficiency between 100 and 150
%iterations in both prediction models. Before that peak both models under-perform, where the support vector machine suffers more than the logistic regression from a small data set.
%
%In the last segment after 150-180 iteration the precision and recall values
%stabilize over both models. In contrast to the logistic regression the support vector machine obtains a higher precision and a lower recall with a slight upward trend.
%The support vector machine ended with a precision of $.68$ (median: $.67$) and a recall of $.49$ (median: $.45$) whereas the logistic regression obtained a precision and recall value of $.43$ (median: $.45$) and $.67$ (median: $.67$) respectively.
%
%
%
%
%
%
%
%
%
%
%
%
%\subsection{TODO Discussion}
%Although the overall performance of the models is not yet practical due to low
%precision and recall, these results are interesting. On the one hand, we consider
%answering our first research question with yes: we can use developer pairs to
%predict build failure. We consider the model to be sufficient because we, as Wolf
%et al.~\cite{wolf:icse:2009}, outperform a random guess which would have
%resulted in both recall and precision of less than $.33$, which is the percentage of failed
%builds. To our knowledge there have been only two studies that focused on
%predicting build outcomes: by Hassan et al.~\cite{hassan:ase:2006} and by Wolf et
%al.~\cite{wolf:icse:2009}. Our approach places itself between the results of both
%studies with our study being better than results obtained by Wolf et
%al.~\cite{wolf:icse:2009} but being worse than Hassans et
%al.~\cite{hassan:ase:2006} approach.
%
%Despite of outperforming the random guess the precision of our models is of concern because
%it indicates the rate at which we falsely report a build to fail. With a median
% precision of $.67$, every third of the \emph{as failed} build predicted by the support
% vector machine would be a false positive. Besides the low trust a developer
% can develop in the model, it also reports less than half of the failed builds
% as failed.
%
%We provide some possible explanations here. In the first part of the
%Figures~\ref{fig:prediction-svm} and~\ref{fig:prediction-logreg}, we observed
%that all models perform poorly as long as they have less than 100 builds to train
%on. After investigating the first 100 builds we found that new
%developers are continually appearing in the development pairs. This means that
%prediction models need to make predictions without having enough knowledge to
%train on, thus resorting to predict the build to be the most likely outcome, to be OK.
%
%A peak in the interval of 100-150 iterations occurred in both models for both recall and precision.
%Within this interval the team changed and new people joined the project and people where reallocated to work on new functionality which meant creating new dependencies and leaving old dependencies behind.
%This change in dependencies confused the model in a way that it did not perform as well.
%Both models stabilize over time with the support vector machine exhibiting a slight upward trend.
%
%Since the goal of this research is to find a way to create actionable knowledge
%to avoid build failure, building a prediction model was the first step to show
%that developer pairs have an effect on the actual prediction. The results from
%the prediction models is first evidence that developer relationships have an
%influence on the build. Next we continue with perusing our second research
%question that examined the relationship between particular developer
%pairs (i.e. technical pairs) and build results. This will help us
%investigate how we might prevent builds from failing by changing the
%nature of developer relations in these pairs.
\begin{table}[t]
\centering%\vspace{1cm}
\begin{tabular}{rcc}
\toprule
& successful & failed \\
\midrule
(Adam, Bart) & 3 & 13 \\
$\neg$ (Adam, Bart) & 224 & 86\\
\midrule
total&227&99\\\bottomrule
%User3493, User2943 & 3 & 13
\end{tabular}
\caption{Contingency table for technical pair (Adam, Bart) in relation to build
success or failure}
\label{tab:contingencytable}
\end{table}
%\addtocounter{table}{1}
\begin{table}[t]
\centering
%\subfloat[Twenty most frequent \emph{technical pairs} that are failure-related.]{
\begin{tabular}{@{\hspace{.2cm}}ccc@{\hspace{.75cm}}c@{\hspace{.2cm}}}
\toprule
Pair & \#successful & \#failed & $p_x$\\
\midrule
%Cody-Daisy& 0 & 12 & 1.0000 \\
%Adam-Ina & 0 & \phantom{1}8 & 1.0000 \\
%Adam-Kim& 0 & \phantom{1}8 & 1.0000 \\
%Adam-Nina & 0 & \phantom{1}6 & 1.0000 \\
%Fred-Gina& 0 & \phantom{1}6 & 1.0000 \\
%Gina-Oliver & 0 & \phantom{1}6 & 1.0000 \\
%Adam-Daisy& 1 & 14 & 0.9720\\%67 \\
%Bart-Daisy& 1 & \phantom{1}9 & 0.9572\\%127 \\
%Adam-Lisa& 1 & \phantom{1}8 & 0.9521\\%204 \\
%Bart-Eve & 2 & 11 & 0.9318\\%403 \\
%\textbf{Adam}-\textbf{Bart}& \textbf{3} & \textbf{13} & \textbf{0.9150}\\%485 \\
%Bart-Cody & 3 & 13 & 0.9150\\%485 \\
%Adam-Eve & 4 & 16 & 0.9086\\%162 \\
%Daisy-Ina & 3 & 12 & 0.9086\\%162 \\
%Cody-Fred& 3 & 10 & 0.8923\\%077 \\
%Bart-Herb & 3 & 10 & 0.8923\\%077 \\
%Cody-Eve & 5 & 15 & 0.8817\\%568 \\
%Adam-Jim & 4 & 11 & 0.8723\\%792 \\
%Herb-Paul & 5 & 12 & 0.8564\\%397 \\
%Mike-Rob& 6 & 13 & 0.8434\\%004\\
%Adam-Fred & 6 & 13 & 0.8434\\%004\\
%
%User11137, User4105 & 0 & 12 & 1.0000 \\
%User2943, User13877 & 0 & 8 & 1.0000 \\
%User7438, User2943 & 0 & 8 & 1.0000 \\
%User2943, User2810 & 0 & 6 & 1.0000 \\
%User8645, User1976 & 0 & 6 & 1.0000 \\
%User8645, User2267 & 0 & 6 & 1.0000 \\
%User11137, User2943 & 1 & 14 & 0.9675\\%908 \\
%User11137, User3493 & 1 & 9 & 0.9504\\%773 \\
%User6012, User2943 & 1 & 8 & 0.9446\\%298 \\
%User3493, User2435 & 2 & 11 & 0.9214\\%387 \\
%User3493, User2943 & 3 & 13 & 0.9023\\%53 \\
%User3493, User4105 & 3 & 13 & 0.9023\\%53 \\
%User2943, User2435 & 4 & 16 & 0.8950\\%695 \\
%User11137, User13877 & 3 & 12 & 0.8950\\%695 \\
%User1976, User4105 & 3 & 10 & 0.8766\\%716 \\
%User3493, User6339 & 3 & 10 & 0.8766\\%716 \\
%User4105, User2435 & 5 & 15 & 0.8648\\%208 \\
%User2943, User9017 & 4 & 11 & 0.8543\\%22 \\
%User6339, User13875 & 5 & 12 & 0.8365\\%498 \\
%User10979, User3385 & 6 & 13 & 0.8220\\%793\\
%User2943, User1976 & 6 & 13 & 0.8220\\%793 \\
%
(Cody, Daisy) & 0& 12& 1 \\ %user11137.user4105.T
(Adam, Daisy) & 1& 14& 0.9697 \\ %user11137.user2943.T
(Bart, Eve) & 2& 11& 0.9265 \\ %user3493.user2435.T
(Adam, Bart) & 3& 13& 0.9085 \\ %user3493.user2943.T
(Bart, Cody) & 3& 13& 0.9085 \\ %user3493.user4105.T
(Adam, Eve) & 4& 16& 0.9016 \\ %user2943.user2435.T
(Daisy, Ina) & 3& 12& 0.9016 \\ %user11137.user13877.T
(Cody, Fred) & 3& 10& 0.8843 \\ %user1976.user4105.T
(Bart, Herb) & 3& 10& 0.8843 \\ %user3493.user6339.T
(Cody, Eve) & 5& 15& 0.8730 \\ %user4105.user2435.T
(Adam, Jim) & 4& 11& 0.8631 \\ %user2943.user9017.T
(Herb, Paul) & 5& 12& 0.8462 \\ %user6339.user13875.T
(Cody, Fred) & 5& 11& 0.8345 \\ %user11137.user1976.T
(Mike, Rob) & 6& 13& 0.8324 \\ %user10979.user3385.T
(Adam, Fred) & 6& 13& 0.8324 \\ %user2943.user1976.T
(Daisy, Fred) & 8& 13& 0.7884 \\ %user3493.user1976.T
(Gill, Eve) & 7& 10& 0.7661 \\ %user1264.user2435.T
(Daisy, Ina) & 7& 10& 0.7661 \\ %user3493.user13873.T
(Fred, Ina) & 8& 10& 0.7413 \\ %user1976.user13877.T
(Herb, Eve) & 8& 10& 0.7413 \\ %user6339.user2435.T
\bottomrule
\end{tabular}
\caption{Twenty most frequent \emph{technical pairs} that are failure-related.}
\label{tab:badtechpairs}
%}\hspace{1.3cm}
\end{table}
%
\begin{table}[t]
%\subfloat[The twenty corresponding \emph{socio-technical pairs}, which are not statistically related to failed builds.]{
\centering
\begin{tabular}{@{\hspace{.2cm}}ccc@{\hspace{.75cm}}c@{\hspace{.2cm}}}
\toprule
Pair & \#successful & \#failed & $p_x$ \\
\midrule
(Cody, Daisy) & ---& ---& ---\\
(Adam, Daisy) & ---& ---& ---\\
(Bart, Eve) & 1& 4& 0.9016\\
(Adam, Bart) & ---& ---& ---\\
(Bart, Cody) & ---& ---& ---\\
(Adam, Eve) & ---& ---& ---\\
(Daisy, Ina) & ---& ---& ---\\
(Cody, Fred) & 1& 0& 0\\
(Bart, Herb) & 1& 2& 0.8209\\
(Cody, Eve) & 0& 3& 1\\
(Adam, Jim) & 0& 1& 1\\
(Herb, Paul) & 1& 0& 0\\
(Cody, Fred) & ---& ---& ---\\
(Mike, Rob) & ---& ---& ---\\
(Adam, Fred) & ---& ---& ---\\
(Daisy, Fred) & ---& ---& ---\\
(Gill, Eve) & ---& ---& ---\\
(Daisy, Ina) & 1& 0& 0\\
(Fred, Ina) & 0& 2& 1\\
(Herb, Eve) & ---& ---& ---\\
\bottomrule
\end{tabular}
%\caption{The twenty corresponding \emph{socio-technical pairs}, which are not statistically related to failed builds.}
\label{tab:stechpairs}
%}
\caption{The 20 most frequent statistically failure related technical pairs and the corresponding socio-technical pairs.}
%\label{tab:pairs}
\end{table}
%\addtocounter{table}{-1}
% \section{Pattern Analysis}
%\section{Which Pairs Induce Failure?}
%\label{sec:pattern}
%In this section we answer evaluate the last step of our approach to see whether we can generate recommendations that bear a statistical relationship to build failure.
%Thus, we first explain our analysis approach followed by the results obtained and a
%short discussion of the results.
\section{Analysis of Socio-Technical Gaps}
\label{ch8:gaps}
The lack of communication between two developers that share a
technical dependency is referred to in the literature as a
socio-technical gap~\cite{valetto:msr:2007}. Because research suggests negative influence of such gaps, we are interested in analyzing pairs of developers that share a technical edge (implying coordination need) but no social edge (implying
unmet coordination need) in socio-technical networks. We refer to these pairs of
developers as \emph{technical pairs} (there is a gap), and to those that do
share a socio-technical edge (there is no gap) as \emph{socio-technical pairs}.
To answer our second research question, we analyze the
technical pairs in relation to build
failure. Our analysis proceeds in four steps:
\begin{enumerate}
\item Identify all technical pairs from the socio-technical networks.
\item For each technical pair count occurrences in socio-technical networks of
failed builds.
\item For each technical pair count occurrences in socio-technical networks of
successful builds.
\item Determine if the pair is significantly related to success or failure.
\end{enumerate}
For example, in Table~\ref{tab:contingencytable} we illustrate the analysis of
the technical pair (Adam, Bart). This pair appears in 3 successful builds and in
13 failed builds. Thus it does not appear in 224 successful builds, which is the total number of successful builds minus the number of successful builds the pair appeared in, and it is absent in 86 failed builds.
A Fischer Exact Value test yields significance at a confidence level of $\alpha = .05$ with a p-value of $4.273\cdot10^{-5}$.
Note that we adjust the p-values of the Fischer Exact Value test to account for multiple hypothesis testing using the Bonferroni adjustment.
The adjustment is necessary because we deal with 961 technical pairs that need to be tested.
To enable us to discuss the findings as to whether closing socio-technical gaps
are needed to avoid build failure, or which of these gaps are more important to
close, we perform two additional analyses.
First we analyze whether the
socio-technical pairs also appear to be build failure-related or not, by
following the same steps as above for socio-technical pairs.
%
Secondly, we prioritize the developer pairs using the coefficient $p_x$,
which represents the normalized likelihood of a build
to fail in the presence of the specific pair:
\begin{equation}
p_x\text{=}\frac{ \text{pair}_{failed} / \text{total}_{failed} }
{ \text{pair}_{failed} / \text{total}_{failed} + \text{pair}_{success} / \text{total}_{successs}}
\end{equation}
The coefficient is comprised of four counts: (1) pair$_{failed}$, the number of failed builds where the pair occurred; (2) total$_{failed}$, the number of failed builds; (3) pair$_{success}$, the number of successful builds where the pair occurred; (4) total$_{success}$, the number of successful builds.
A value closer to one means that the developer pair is strongly related to build
failure.
%\addtocounter{table}{1}
\begin{table}[t!]
\centering
\begin{tabular}{cccc}
\toprule
Feature & Coefficient & p-value & \\
\midrule
%(Intercept) & 7.897e+74 & 3.743e+09 & 2.110e+65 & <2e-16 & ***\\
%\\
%user11137.user4105.T & -5.669e+75 & 2.421e+10 &-2.342e+65 & <2e-16 & ***\\
%user11137.user2943.T & -9.846e+75 & 7.788e+09 &-1.264e+66 & <2e-16 &***\\
%user3493.user2435.T & -1.258e+75 & 3.477e+10 &3.619e+64 &<2e-16 &***\\
%user3493.user2943.T & -1.605e+76 & 5.427e+10 &2.958e+65 & <2e-16 &***\\
%user3493.user4105.T & -3.419e+76 & 3.837e+10 &-8.910e+65 & <2e-16 &***\\
%user2943.user2435.T & -2.610e+76 & 2.966e+10 &8.801e+65 & <2e-16 &***\\
%user11137.user13877.T & -8.105e+74 & 3.036e+10 &-2.669e+64 & <2e-16 &***\\
%user1976.user4105.T & -5.348e+76 & 2.359e+10 &2.267e+66 &<2e-16 &***\\
%user3493.user6339.T & -2.977e+76 &1.028e+11 &-2.895e+65 &<2e-16 &***\\
%user4105.user2435.T & -2.315e+76 & 1.618e+10 &-1.431e+66 & <2e-16 &***\\
%user2943.user9017.T & -2.724e+76 &2.621e+10 &1.039e+66 & <2e-16 &***\\
%user6339.user13875.T & -1.636e+76 & 4.081e+08 &-4.010e+67 &<2e-16 &***\\
%user11137.user1976.T & -1.645e+74 &4.024e+09 &-4.087e+64 & <2e-16 &***\\
%user10979.user3385.T & -1.327e+75 &3.668e+09 &3.619e+65 & <2e-16 &***\\
%user2943.user1976.T & -5.250e+76 &1.269e+10 &-4.136e+66 & <2e-16 &***\\
%user3493.user1976.T & -2.455e+75 & 3.523e+10 &-6.970e+64 &<2e-16 &***\\
%user1264.user2435.T & -7.162e+75 &3.589e+09 &1.996e+66 & <2e-16 &***\\
%user3493.user13873.T & -5.325e+74 & 3.464e+10 &-1.537e+64 &<2e-16 &***\\
%user1976.user13877.T & -2.777e+75 & 7.334e+08 &3.786e+66 & <2e-16 &***\\
%user6339.user2435.T & -1.799e+75 & 1.584e+09 &1.136e+66 & <2e-16 &***\\
%\\
%\#Change Sets per Build & \phantom{-}6.480e+60 & 8.539e+06 & 7.589e+53 & <2e-16 &***\\
%\#Files changed per Build &-4.530e+60 & 3.072e+06 &-1.475e+54 & <2e-16 &***\\
%{\small \#Developers contributed per Build} & \phantom{-}3.386e+61 & 2.687e+07 & 1.260e+54 & <2e-16 &***\\
%\#Work Items per Build & -3.690e+61 & 1.859e+07 &-1.984e+54 & <2e-16 &***\\
%
(Intercept) & 7.897e+74 & $<$ 2e-16 & ***\\
\\
(Cody, Daisy) & -5.669e+75 & $<$ 2e-16 & ***\\
(Adam, Daisy) & -9.846e+75 & $<$ 2e-16 &***\\
(Bart, Eve) & -1.258e+75 &$<$ 2e-16 &***\\
(Adam, Bart) & -1.605e+76 & $<$ 2e-16 &***\\
(Bart, Cody) & -3.419e+76 & $<$ 2e-16 &***\\
(Adam, Eve) & -2.610e+76 & $<$ 2e-16 &***\\
(Daisy, Ina) & -8.105e+74 & $<$ 2e-16 &***\\
(Cody, Fred) & -5.348e+76 &$<$ 2e-16 &***\\
(Bart, Herb) & -2.977e+76 &$<$ 2e-16 &***\\
(Cody, Eve) & -2.315e+76 & $<$ 2e-16 &***\\
(Adam, Jim) & -2.724e+76 & $<$ 2e-16 &***\\
(Herb, Paul) & -1.636e+76 &$<$ 2e-16 &***\\
(Cody, Fred) & -1.645e+74 & $<$ 2e-16 &***\\
(Mike, Rob) & -1.327e+75 & $<$ 2e-16 &***\\
(Adam, Fred) & -5.250e+76 & $<$ 2e-16 &***\\
(Daisy, Fred) & -2.455e+75 &$<$ 2e-16 &***\\
(Gill, Eve) & -7.162e+75 & $<$ 2e-16 &***\\
(Daisy, Ina) & -5.325e+74 &$<$ 2e-16 &***\\
(Fred, Ina) & -2.777e+75 & $<$ 2e-16 &***\\
(Herb, Eve) & -1.799e+75 & $<$ 2e-16 &***\\
\\
\#Change Sets per Build & \phantom{-}6.480e+60 & $<$ 2e-16 &***\\
\#Files changed per Build &-4.530e+60 & $<$ 2e-16 &***\\
{\#Developers contributed per Build} & \phantom{-}3.386e+61 & $<$ 2e-16 &***\\
\#Work Items per Build & -3.690e+61 & $<$ 2e-16 &***\\
%
%
%
%user6012.user2943.T -1.198e+76 1.415e+10 -8.466e+65 <2e-16 ***\\
%user11137.user3493.T 4.917e+76 2.255e+10 2.180e+66 <2e-16 ***\\
%user2943.user13877.T -2.086e+76 3.598e+10 -5.796e+65 <2e-16 ***\\
%user8645.user1976.T -1.172e+75 4.535e+09 -2.585e+65 <2e-16 ***\\
%user8645.user2267.T 1.358e+76 2.934e+10 4.628e+65 <2e-16 ***\\
%user7438.user2943.T 1.562e+75 2.562e+10 6.096e+64 <2e-16 ***\\
%user10761.user9609.T -1.244e+68 1.972e+08 -6.307e+59 <2e-16 ***\\
%user11208.user9017.T -7.661e+73 6.520e+07 -1.175e+66 <2e-16 ***\\
%user11137.user8543.T -7.938e+74 1.813e+08 -4.378e+66 <2e-16 ***\\
%user11281.user8543.T 1.520e+75 3.323e+09 4.573e+65 <2e-16 ***\\
%user3818.user8543.T -1.655e+75 2.732e+10 -6.058e+64 <2e-16 ***\\
%user13877.user8543.T 1.802e+74 3.352e+09 5.377e+64 <2e-16 ***\\
%user9017.user13871.T -3.613e+74 6.052e+09 -5.970e+64 <2e-16 ***\\
%user8645.user11281.T -6.742e+73 1.058e+08 -6.371e+65 <2e-16 ***\\
%user2983.user9017.T -6.303e+73 8.157e+09 -7.727e+63 <2e-16 ***\\
%user10979.user13875.T 1.507e+75 1.803e+10 8.355e+64 <2e-16 ***\\
%user9017.user13874.T -2.791e+76 1.140e+11 -2.450e+65 <2e-16 ***\\
%user1264.user13874.T -8.654e+75 2.838e+10 -3.049e+65 <2e-16 ***\\
%user3493.user9017.T -2.786e+75 4.274e+09 -6.519e+65 <2e-16 ***\\
%user4105.user13874.T 1.206e+76 9.252e+10 1.303e+65 <2e-16 ***\\
%user2943.user13871.T -6.665e+75 3.166e+10 -2.105e+65 <2e-16 ***\\
%user9017.user1976.T -1.910e+63 3.850e+09 -4.960e+53 <2e-16 ***\\
%user10979.user2435.T 6.269e+63 3.579e+09 1.752e+54 <2e-16 ***\\
%user6639.user6339.T 3.075e+64 2.102e+10 1.463e+54 <2e-16 ***\\
%user6012.user9017.T -5.463e+63 3.603e+09 -1.516e+54 <2e-16 ***\\
%user6339.user4105.T 5.169e+63 3.628e+09 1.425e+54 <2e-16 ***\\
%user9172.user2435.T -1.212e+63 1.626e+09 -7.453e+53 <2e-16 ***\\
%user7438.user8543.T 5.042e+62 1.641e+09 3.073e+53 <2e-16 ***\\
%user11137.user13871.T 6.266e+62 5.193e+08 1.207e+54 <2e-16 ***\\
%user10979.user11281.T -5.506e+63 3.719e+09 -1.480e+54 <2e-16 ***\\
%user11137.user6012.T 2.332e+63 1.532e+09 1.522e+54 <2e-16 ***\\
%user11137.user13874.T -1.913e+64 1.393e+10 -1.373e+54 <2e-16 ***\\
%user1264.user4105.T 1.212e+63 1.420e+09 8.533e+53 <2e-16 ***\\
%user8645.user6096.T 6.633e+63 4.523e+09 1.467e+54 <2e-16 ***\\
%user11281.user9609.T 5.157e+62 7.503e+07 6.873e+54 <2e-16 ***\\
%user13871.user4163.C 8.406e+61 1.717e+08 4.897e+53 <2e-16 ***\\
%user11840.user9172.C -1.672e+61 4.148e+08 -4.031e+52 <2e-16 ***\\
%user11840.user9983.C 2.743e+62 2.898e+08 9.463e+53 <2e-16 ***\\
%user6727.user9983.C 1.659e+63 1.698e+09 9.771e+53 <2e-16 ***\\
%user11208.user6727.C -2.455e+63 1.596e+09 -1.538e+54 <2e-16 ***\\
%user3057.user13873.C 1.983e+62 1.260e+08 1.574e+54 <2e-16 ***\\
%user11137.user11208.C -9.030e+61 2.237e+08 -4.036e+53 <2e-16 ***\\
%user3982.user6012.C -2.567e+62 3.356e+08 -7.648e+53 <2e-16 ***\\
%user3818.user10979.C 1.868e+62 1.228e+08 1.522e+54 <2e-16 ***\\
%user6012.user9172.C 1.353e+62 1.089e+08 1.242e+54 <2e-16 ***\\
%user13877.user13875.C -1.013e+63 6.884e+08 -1.472e+54 <2e-16 ***\\
%user11208.user7438.C 8.180e+61 8.746e+07 9.353e+53 <2e-16 ***\\
%user10979.user6096.C -5.291e+63 3.648e+09 -1.450e+54 <2e-16 ***\\
%user6096.user7395.C -3.135e+61 1.839e+08 -1.705e+53 <2e-16 ***\\
%user4105.user5275.C 2.725e+63 9.779e+08 2.787e+54 <2e-16 ***\\
%user7146.user4163.C 1.171e+63 5.291e+08 2.214e+54 <2e-16 ***\\
%user11208.user2267.C 3.459e+61 3.525e+08 9.813e+52 <2e-16 ***\\
%user7224.user4163.C 2.375e+61 1.379e+08 1.723e+53 <2e-16 ***\\
%user13877.user2435.C 2.353e+63 1.838e+09 1.280e+54 <2e-16 ***\\
%user6012.user7146.C -6.092e+62 2.884e+08 -2.112e+54 <2e-16 ***\\
%user2983.user7224.C 2.961e+61 1.340e+08 2.209e+53 <2e-16 ***\\
%user13235.user1976.C 7.225e+63 5.490e+09 1.316e+54 <2e-16 ***\\
%user11208.user7372.C 5.809e+63 3.983e+09 1.458e+54 <2e-16 ***\\
%user13235.user13874.C 5.605e+63 3.718e+09 1.507e+54 <2e-16 ***\\
%user3057.user13874.C 5.278e+63 3.817e+09 1.383e+54 <2e-16 ***\\
%user11208.user6012.C 4.458e+61 1.043e+08 4.273e+53 <2e-16 ***\\
%user7002.user2020.C -5.853e+63 3.978e+09 -1.471e+54 <2e-16 ***\\
%user2744.user11208.C 7.773e+62 7.092e+08 1.096e+54 <2e-16 ***\\
%user9172.user5275.C 1.397e+64 9.128e+09 1.530e+54 <2e-16 ***\\
%user6677.user7224.C 2.952e+61 1.316e+08 2.244e+53 <2e-16 ***\\
%user3818.user6639.C 5.573e+61 2.800e+08 1.991e+53 <2e-16 ***\\
%user7002.user4105.C -2.017e+63 9.206e+08 -2.191e+54 <2e-16 ***\\
%user12149.user2943.C -1.655e+64 9.635e+09 -1.718e+54 <2e-16 ***\\
%user7438.user7372.C -9.768e+61 2.375e+08 -4.113e+53 <2e-16 ***\\
%user4955.user2306.C -4.290e+62 4.334e+08 -9.898e+53 <2e-16 ***\\
%user11137.user7438.C -8.275e+61 7.426e+08 -1.114e+53 <2e-16 ***\\
%user11208.user7002.C 1.709e+62 8.605e+07 1.986e+54 <2e-16 ***\\
%user3057.user1976.C 3.156e+60 4.486e+08 7.034e+51 <2e-16 ***\\
%user3982.user4163.C -1.497e+63 7.778e+08 -1.925e+54 <2e-16 ***\\
%user10761.user4105.C 5.332e+62 3.087e+08 1.727e+54 <2e-16 ***\\
%user7438.user6639.C -4.700e+62 1.319e+08 -3.564e+54 <2e-16 ***\\
%user10761.user6339.C 8.421e+61 1.068e+08 7.887e+53 <2e-16 ***\\
%user7438.user2267.C 5.206e+62 8.470e+07 6.146e+54 <2e-16 ***\\
%user11840.user2267.C -1.418e+62 3.676e+08 -3.858e+53 <2e-16 ***\\
%user3057.user4163.C 5.427e+63 3.853e+09 1.409e+54 <2e-16 ***\\
%user11208.user6639.C 6.044e+62 6.502e+08 9.295e+53 <2e-16 ***\\
%user6012.user4163.C 4.410e+61 9.535e+07 4.625e+53 <2e-16 ***\\
%user6677.user6379.C 1.691e+61 1.481e+08 1.142e+53 <2e-16 ***\\
%user10761.user3639.C -1.635e+62 1.878e+08 -8.709e+53 <2e-16 ***\\
%user9655.user10979.C -1.393e+61 8.225e+07 -1.693e+53 <2e-16 ***\\
%user4955.user9986.TC 5.594e+61 8.556e+07 6.538e+53 <2e-16 ***\\
%user4105.user1567.C 1.454e+62 9.109e+07 1.596e+54 <2e-16 ***\\
%user9609.user6012.T 5.290e+61 1.824e+08 2.900e+53 <2e-16 ***\\
%user11281.user2267.T -5.499e+63 3.725e+09 -1.476e+54 <2e-16 ***\\
%user2983.user8860.C 3.596e+62 1.531e+08 2.349e+54 <2e-16 ***\\
%user3672.user13875.C -6.807e+61 1.880e+08 -3.621e+53 <2e-16 ***\\
%user2452.user9983.C -3.179e+62 3.396e+08 -9.359e+53 <2e-16 ***\\
%user3756.user7372.C 7.300e+61 5.508e+07 1.325e+54 <2e-16 ***\\
%user3539.user13877.C 4.104e+62 1.941e+08 2.114e+54 <2e-16 ***\\
%user2460.user6103.C 8.697e+61 3.716e+08 2.340e+53 <2e-16 ***\\
%user6021.user9017.C 1.049e+62 7.032e+07 1.491e+54 <2e-16 ***\\
%user6901.user2038.T 2.862e+61 9.837e+07 2.909e+53 <2e-16 ***\\
%user6677.user5963.C 9.236e+60 7.371e+07 1.253e+53 <2e-16 ***\\
%user10761.user6677.C -4.323e+62 4.038e+08 -1.071e+54 <2e-16 ***\\
%user13874.user11208.C -5.371e+63 3.581e+09 -1.500e+54 <2e-16 ***\\
%user3982.user2943.C -6.644e+61 2.197e+08 -3.024e+53 <2e-16 ***\\
%user3057.user7286.C -2.138e+60 1.090e+08 -1.962e+52 <2e-16 ***\\
%user7111.user11623.C 1.263e+63 5.111e+08 2.471e+54 <2e-16 ***\\
%user9983.user6677.C 3.487e+62 3.489e+08 9.995e+53 <2e-16 ***\\
%user7438.user6677.TC -5.068e+63 3.650e+09 -1.388e+54 <2e-16 ***\\
%user11281.user7438.C 2.607e+61 9.562e+07 2.726e+53 <2e-16 ***\\
%user4686.user6391.C 3.592e+62 1.623e+08 2.214e+54 <2e-16 ***\\
%user10303.user4105.C 9.159e+61 7.919e+07 1.157e+54 <2e-16 ***\\
%user4955.user3818.C 1.993e+62 1.600e+08 1.246e+54 <2e-16 ***\\
%user13871.user6391.C -1.917e+62 2.296e+08 -8.351e+53 <2e-16 ***\\
%user11137.user3818.T -1.882e+62 1.920e+08 -9.800e+53 <2e-16 ***\\
%user9609.user11281.TC -1.094e+62 1.955e+08 -5.596e+53 <2e-16 ***\\
%user6677.user6639.C -4.807e+63 3.646e+09 -1.319e+54 <2e-16 ***\\
%user7689.user7286.C 2.220e+60 7.506e+07 2.958e+52 <2e-16 ***\\
%user7438.user7689.T -1.661e+62 1.214e+08 -1.368e+54 <2e-16 ***\\
%user6677.user2810.C -2.554e+60 6.772e+07 -3.771e+52 <2e-16 ***\\
%user1077.user13871.C -1.468e+61 8.322e+07 -1.764e+53 <2e-16 ***\\
%user11208.user13871.T -6.376e+61 1.523e+08 -4.187e+53 <2e-16 ***\\
%user6639.user11281.C -1.021e+62 8.631e+07 -1.183e+54 <2e-16 ***\\
%user6677.user6727.C -8.135e+61 1.169e+08 -6.959e+53 <2e-16 ***\\
%user10761.user1264.C -9.014e+60 2.694e+08 -3.346e+52 <2e-16 ***\\
%user2452.user2983.C 6.650e+61 9.949e+07 6.685e+53 <2e-16 ***\\
%user11281.user7438.TC 2.085e+62 7.164e+07 2.910e+54 <2e-16 ***\\
%user1264.user2943.C -1.568e+62 1.711e+08 -9.162e+53 <2e-16 ***\\
%user6339.user7689.T 3.002e+62 1.879e+08 1.597e+54 <2e-16 ***\\
%user10517.user11840.C -5.510e+63 3.837e+09 -1.436e+54 <2e-16 ***\\
%user7438.user6677.C 2.349e+61 7.951e+07 2.955e+53 <2e-16 ***\\
%user3057.user6021.C -5.461e+61 5.719e+07 -9.548e+53 <2e-16 ***\\
%user11137.user2038.C -7.344e+61 1.607e+08 -4.570e+53 <2e-16 ***\\
%user6727.user2983.C 3.770e+61 1.191e+08 3.165e+53 <2e-16 ***\\
%user11281.user10849.C -4.757e+61 1.248e+08 -3.812e+53 <2e-16 ***\\
%user6021.user5275.C 1.601e+62 1.424e+08 1.124e+54 <2e-16 ***\\
%user6339.user11281.TC 5.532e+63 3.717e+09 1.488e+54 <2e-16 ***\\
%user3982.user9172.C 2.099e+62 1.581e+08 1.328e+54 <2e-16 ***\\
%user7146.user3818.C 2.601e+60 8.291e+07 3.137e+52 <2e-16 ***\\
%user11208.user8543.T -4.465e+61 8.689e+07 -5.139e+53 <2e-16 ***\\
%user10979.user3539.C 1.514e+61 8.288e+07 1.826e+53 <2e-16 ***\\
%user8645.user11281.C 1.372e+62 4.692e+07 2.924e+54 <2e-16 ***\\
%user11281.user7002.C -1.453e+61 4.487e+07 -3.238e+53 <2e-16 ***\\
%user3982.user7146.C 1.576e+62 1.342e+08 1.174e+54 <2e-16 ***\\
%user9983.user7689.C -5.070e+63 3.610e+09 -1.404e+54 <2e-16 ***\\
%user2452.user6215.C 3.324e+61 6.875e+07 4.836e+53 <2e-16 ***\\
%user11281.user9609.C 1.806e+61 6.370e+07 2.835e+53 <2e-16 ***\\
%user11281.user2983.C -2.505e+61 8.608e+07 -2.910e+53 <2e-16 ***\\
%user11208.user11840.C 6.661e+61 1.573e+08 4.236e+53 <2e-16 ***\\
%user7224.user9609.C 6.556e+61 1.029e+08 6.374e+53 <2e-16 ***\\
%user13875.user13871.C -1.769e+62 2.469e+08 -7.166e+53 <2e-16 ***\\
%user11281.user1227.C 1.188e+61 7.815e+07 1.520e+53 <2e-16 ***\\
%user6037.user11281.C 5.378e+61 3.069e+07 1.752e+54 <2e-16 ***\\
%user2435.user6012.T 2.009e+62 8.135e+07 2.470e+54 <2e-16 ***\\
%user11281.user3818.C 1.975e+61 2.844e+07 6.942e+53 <2e-16 ***\\
%user11281.user6677.C 9.267e+61 8.526e+07 1.087e+54 <2e-16 ***\\
%user6339.user9017.C 5.143e+61 4.972e+07 1.035e+54 <2e-16 ***\\
%user2038.user3756.C 1.465e+58 9.491e+07 1.544e+50 <2e-16 ***\\
%user8860.user10623.C 2.737e+61 7.109e+07 3.850e+53 <2e-16 ***\\
%user7438.user7146.C -5.663e+60 9.529e+07 -5.943e+52 <2e-16 ***\\
%user10433.user5564.C -2.747e+61 1.538e+08 -1.787e+53 <2e-16 ***\\
%user6677.user11281.T -2.151e+62 7.131e+07 -3.016e+54 <2e-16 ***\\
%user11281.user3057.C 2.576e+61 4.683e+07 5.502e+53 <2e-16 ***\\
%user9609.user6677.C 2.740e+60 9.492e+07 2.887e+52 <2e-16 ***\\
%user6677.user10899.TC 8.748e+60 1.528e+08 5.724e+52 <2e-16 ***\\
%
%
\bottomrule
\end{tabular}
\caption{Logistic regression only showing the technical pairs from Table~\ref{tab:badtechpairs}, the intercept, and the confounding variables, the model reaches an AIC of 7006 with all shown features being significant at $\alpha=0.001$ level (indicated by ***).}
\label{tab:regression}
\end{table}
\section{Results}
\label{ch:fse:result2}
We found a total of 2872 developer pairs in all the constructed
socio-technical networks, out of which 961 were technical pairs. %. The Fischer Exact Value tests show While a Fischer Exact Value test determined 120 technical pairs that are significantly correlated with build failure, none are statistically related with successful builds. Due to space constraints we display only twenty pairs in Table~\ref{tab:badtechpairs}.
We choose to present the twenty that are most frequent across failed builds.
We rank the failure relating \emph{technical} pairs (see Tables~\ref{tab:badtechpairs})
by the coefficient $p_{x}$. This coefficient indicates the strength of
relationship between the developer pair and build failure. For instance, the
developer pair (Adam, Bart), appears in 13 failed builds and in 3
successful builds. This means that pair$_{failed}$ = 13 and pair$_{success}$ = 3
with total$_{failed}$= 99 and total$_{success}$= 227 result in $p_x$= 0.9016.
Besides that we report the number of successful builds the pair was observed with
(\#successful) as well as the number of failed builds the pairs was observed with
(\#failed). The $p_x$ values are all above 0.74, implying that the likelihood
of failure is at least 74\% in all builds in which these developers pairs are
involved.
We then checked for the 120 pairs whether the corresponding \emph{socio-technical} pairs are related to failure.
Only 23 of the 120 technical pairs had an existing corresponding socio-technical pair of which none were statistically related to build failure.
In Table~\ref{tab:stechpairs} we show the socio-technical pairs that match the 20 technical pairs shown in Table~\ref{tab:badtechpairs} as well as the same information as in Table~\ref{tab:badtechpairs}.
If the corresponding socio-technical pair existed we computed the same statistics as for the technical pairs, but for those that existed we could not find statistical significance.
Note that we use fictitious names for confidentiality reasons.
The failure-related technical pairs span 48 out of the total 99 failed builds in
the project. Figure~\ref{fig:builddistribution} shows their distribution
across the 48 failed builds. The histogram
illustrates that there are few builds that have a large number of failure related
builds, e.g. 4 with 18 or more pairs, but most builds only show a small number of
pairs (15 out of 48 failed builds have 4 or less).
%Not only does
This distribution of technical pairs indicate that the developer
pairs we found did not concentrate in a small number of builds.
In addition, it validates the assumption that it is
worthwhile seeking insights about developer coordination in failed builds.
%Moreover, this enables us to explain why two thirds of the builds failed.
\begin{figure}[t]
\centering
\vspace{-1cm}
\includegraphics[width=\columnwidth]{figures/builddistribution}
\vspace{-.75cm}
\caption{Histogram plotting how many builds have a certain number of failure-reated technical pairs.}
\label{fig:builddistribution}
\end{figure}
\section{Discussion}
\label{ch8:dis}
These results show that there is a strong relationship between certain technical
developer pairs and increased likelihood of a build failure.
Out of the total of 120 technical pairs that increase the likelihood of a
build to fail, only 23 had an existing
corresponding socio-technical pair. Of these, none were statistically
related to build failure. This means that 97 pairs of developers that had a
technical dependency did not communicate with each other and
consequently increased the likelihood of a build failure. Our results not only
corroborate past findings~\cite{cataldo:cscw:2006,cataldo:esem:2008} that socio-technical gaps
have a negative effect in software development. More importantly, they indicate
that the analysis presented in this paper is able to identify the specific
socio-technical gaps, namely the actual developer pairs where the gaps occur
and that increase the likelihood of build failure.
A preliminary analysis of developers membership to teams shows that most
of the technical pairs related to build failure consist of developer belonging to
different teams. Naggappan et al.~\cite{nagappan:icse:2008} found that using the
organizational distance between people predicts failures. They reasoned that this
is due to the lack of awareness what people separated by organizational distance
work on. Although the Rational Team Concert team strongly emphasizes communication
regardless of team boundaries, it still seems that organizational distance has
an influence on its communication behaviour.
Although the analysis of technical pairs in relation to build failures
showed that they have a negative influence on the build result, it does not take
into account possible confounding variables. The question is if there developer pairs still effect the build outcome in the presence of confounding variables, such as number of developers involved in a build, number of work items
related to a build, number of change sets per build, and number of files changed.
We tested this using a logistical regression model including both developer pairs and confounding variables.
Overall the logistic regression confirms the effect of the developer pairs
even in the presence of confounding variables, such as number of files changed
and numbers of developers (AIC of 7006).
Table~\ref{tab:regression} shows an excerpt from the complete logistical
regression which shows that the twenty pairs previously identified are still
significant under the influence of the four confounding variables. Because we have 2872 developer pairs, space constraints prevent us from reporting the entire regression model. We chose to report on the coefficients for the twenty pairs that we reported in Table~\ref{tab:regression} as well as the coefficients for the four control features.
All features shown in Table~\ref{tab:regression} are significant at the $\alpha=.001$ level (indicated by the p-value and ***).
We checked for all the other technical pairs that were reported significant using the approach described in the previous section and found that they are all significant in the regression model as well.
Moreover, the four control variables of number of developers per build, number
of work items per build, number of change sets per build, and number of files changed per build are also significant.
Since we model failed builds with 0 and successful builds with 1, a negative coefficient means that the feature increases the chances of a failure.
All pairs reported in Table~\ref{tab:regression} and the technical pairs that we identified to be related to build failure have a negative coefficient.
\section{Threats To Validity}
\label{ch8:threat}
During our study we identified two main threats.
One threat covers issues that arise from the underlying data we used.
The other threat deals with possible problems from the conceptualization of
constructs in our study.
\subsection{Data}
We performed all our analysis on one set of data, the Rational Team Concert
repository. This limits the generalizability of our findings, due to the fact
that we only made the observations within one project. The
project size and the project properties -- incorporating open
source practices such as open development and encouraging community involvement
-- make us believe that our findings still hold value.
Furthermore we only investigated three months of the project's lifetime. This
might lead to smaller significance of our results. However, since the three
months are directly before a major release of the project, this dataset contains
the most viable data for our analysis. In those three months a lack of necessary
coordination is the most harmful to the project.
Another threat that is inevitable in studies of software engineers is
the possible lack of recorded communication. This and the possibility of people
coordinating without communicating, such as reading each others source
code~\cite{bolici:stc:2009}, are mitigated in Rational Team Concert by its development process.
The Rational Team Concert team's development process demands that the developers
coordinate using workitem discussion.
\subsection{Conceptualization}
Our conceptualization of the three edges we use to construct the
socio-technical networks might introduce inaccuracy in our findings.
First, social edges are extracted from workitem discussions. We assumed that
every developer commenting on or subscribed to a work item reads all comments of that
work item. This assumption might not always be correct. By manual inspection of
a selected number of work items, however, we found that developers who
commented on a work item are aware of the other comments, confirming our assumption.
Second, the technical edges are not problematic by themselves, but they are not
complete, since there are more technical relationships between developers that
can be examined. For example, two developers can be connected if one developer
changes someone else's code. This however does not invalidate our findings, it
just suggests that there is room for improvement, which we should address
next.
Third, socio-technical edges on the other hand may suffer from the combination of
social and technical edges. For example, it is not necessarily true that the
discussion of two developers in a technical dependency is always about their
technical dependency.
In our study however, since the changes to
source code files we use to extract technical dependencies are attached to workitem discussion, we are
confident that they addressed the changes at least indirectly.
\section{Conclusion}
\label{sec:8:conclusions}
We end this chapter by bringing it back to the initial research questions we set out to answer:
\begin{description}
\item[RQ 2.1:] Can Socio-Technical-Networks be manipulated to increase build success?
\end{description}
The results we presented in Section~\ref{ch:fse:result2} show that there are developer pairs with unmet coordination needs that negatively influence build success.
Furthermore, we showed that the corresponding developer pair that fulfills its coordination need is less likely to be associated with build failure and thus theoretically decreasing the likelihood of build failure.
To put it in perspective if any of twenty most harmful patterns is present in a build the build had at least an 74\% chance to fail.
Besides the technical side of the approach to generate statistically relevant recommendations as Murphy and Murphy-Hill~\cite{murphy:rsse:2010} pointed out, developers need to accept such recommendation for them to be ultimately useful.