% !TEX root = thesis.tex
\startchapter{Socio-Technical Congruence and Failure}
\label{chap:stc-net2}
Knowing that social networks affect build success raises the next question: which parts of the social network should be changed to increase the likelihood that a build succeeds?
For this reason we turn to the concept of socio-technical congruence, which postulates that developers should communicate when their work intersects.
Thus, in this chapter we explore the effect of socio-technical networks on build success:
\begin{description}
\item[RQ 1.2:] Do socio-technical networks influence build success?
\end{description}
Although socio-technical congruence has only been studied in connection with productivity, intuitively there should also be a connection to software quality, such as build success.
For example, imagine two developers modifying classes that share call and data dependencies, with one developer making changes that violate assumptions the other developer relies on when using the modified code.
This might introduce an error that could have been prevented had both developers discussed their work.
Thus, we hypothesize that socio-technical congruence relates to software quality as well as to productivity, and that it might be used to suggest improvements in the social network by identifying developers who should communicate.
In the remainder of this chapter, we start by detailing the background relevant to studying coordination, its influence, and how coordination needs are identified by other researchers (Section~\ref{sec:background}).
Next, we discuss how we calculate socio-technical congruence and highlight some modifications that allow us to explore whether differences in the magnitude of technical dependencies and the amount of communication play a role, which may point to improvements in the social interactions among software developers (Section~\ref{sec:congruence}).
Subsequently, we briefly go over the parts of the methodology relevant to our research question that were not covered in Chapter~\ref{chap:meth} (Section~\ref{sec:methodology}).
Then, we go over the analysis and results we obtained in Section~\ref{sec:results} followed by a discussion of the results and their implications in Section~\ref{sec:discussion}.
We conclude this chapter by offering an answer to our research question and leading into the subsequent Chapter~\ref{chap:approach} (Section~\ref{sec:conclusion}).
\section{Background}
\label{sec:background}
Before we explore the research question using data we obtained from the Rational Team Concert team, we discuss related work investigating socio-technical congruence as well as how the subject of team coordination has been approached by other researchers.
\subsection{Motivation}
Coordinating the efforts of individuals working together in a team is
necessary to build software systems. The complexity of current systems requires contributions
from tens or hundreds of people who may span multiple offices, cities, or even continents.
To build such systems, we need to ensure that the team is not only capable of
developing components of a system, but also has the governance to be able to integrate the
interdependent parts into a whole.
We describe a case study of socio-technical coordination and its effect on software builds in the IBM
Rational Team Concert (RTC) software product.
Our approach to investigating coordination is to examine the
alignment between the technical dimension of work and the social relationships
between team members. This alignment is called socio-technical
congruence~\cite{cataldo:cscw:2006}. High socio-technical congruence
has been shown to be a predictor of coordination success
\cite{cataldo:cscw:2006,ehrlich:stc:2008}.
Mismatches between the social and technical dimensions, or gaps,
have also been observed to increase resolution times for software activities.
The objective of this study is to investigate the effects of socio-technical congruence on high-coordination software development activities during a project.
We seek to discover the relationship between congruence and the probability that a regularly scheduled software build will be successful.
We conduct a case study of a large software project at IBM. A build result, which can be \emph{error} or \emph{OK}, indicates the relative health of the project up to that build. To
measure socio-technical congruence we apply two different measures: a
previously-published congruence approach \cite{cataldo:cscw:2006},
and a weighted congruence approach that provides details about the size of a
gap between two individuals~\cite{kwan2009:weighted}. We
also examine RTC's processes and tools to identify any
explanations of the relationship between congruence and builds that
we find.
\subsection{The Need for Coordination}
Software is extremely complex because of the sheer number of dependencies~\cite{sawyer2004:teams}.
Large software projects have a large number of components that interoperate with one another.
The difficulty arises when changes must be made to the software, because a change in one component of the software often requires changes in dependent components~\cite{desouza:2008}. Because a single person's knowledge of a system is specialized as well as limited, that person often is unable to make the appropriate modifications in dependent components when a component is changed.
Coordination is defined as ``integrating or linking together different parts of an organization to accomplish a collective set of tasks''~\cite{vandeven1976}. In order to manage changes and maintain quality, developers must coordinate, and in software development, coordination is largely achieved by communicating with people who depend on the work that you do \cite{kraut:1995coordination}.
A successful software build can be viewed as the outcome of good coordination because the build requires the correct compilation of multiple, dependent files of source code.
A failed build, on the other hand, demotivates software developers \cite{holck2004,damian:icgse:2007} and destabilizes the product \cite{cusumano1997}.
While a failed build is not necessarily a disaster, it slows down work significantly while developers scramble to repair the issues.
A build result thus serves as an indicator of the health of the software project up until that point in time.
Thus, a developer should coordinate closely with individuals whose technical dependencies affect their work in order to effectively build software. This brings forth the idea of aligning the technical structure and the social interactions \cite{herbsleb2007:fose}, leading us to the foundation of socio-technical congruence.
\subsection{Coordination in Software Teams}
Research in software-engineering coordination has examined interactions among
software developers \cite{carter2004,marczak:re:2008}, how they acquire
knowledge \cite{ehrlich:icgse:2006,nakakoji2010:rdc}, and
how they cope with issues including geographical
separation~\cite{espinosa2007:team_knowledge,herbsleb2003:speed}.
The ability to coordinate has
been shown to influence customer satisfaction \cite{kraut:1995coordination} and to improve the capability to produce quality work~\cite{faraj2000}.
Software developers spend much of their time
communicating~\cite{perry94}. Because developers face
problems when integrating different components from heterogeneous environments~\cite{redmiles2007:continuous},
developers engage in direct or indirect
communication, either to coordinate their activities, or to acquire knowledge of
a particular aspect of the software~\cite{nakakoji2010:rdc}.
Herbsleb et al. examined the influence of coordination on integrating software
modules through interviews~\cite{herbsleb1999:architectures}, and found that
processes, as well as the willingness to communicate directly, helped teams
integrate software. De Souza et al.~\cite{desouza2007:awarenessnetwork} found that implicit
communication is important to avoid collaboration breakdowns and delays. Ko et al.~\cite{ko:icse:2007} found that developers were identified as the main source of knowledge about code issues.
Wolf et al.~\cite{wolf:icse:2009} used properties of social networks to predict the outcome of integrating the software parts within teams.
This prior work establishes the fact that developers communicate heavily about technical matters.
Coordinating software teams becomes more difficult as the distance between people increases \cite{herbsleb:icse:2001}.
Studies of Microsoft~\cite{bird2009:dds_quality,nagappan:icse:2008}
show that the distance between people who work together on a
program determines the program's failure proneness.
Differences in time zones can affect the number of defects in software projects \cite{cataldo2009:quality}.
Although distance has been identified as a challenge, advances in collaborative
development environments are enabling people to overcome challenges of distance.
One study of early RTC development
shows that the task completion time is not as strongly affected by distance as in previous studies~\cite{Nguyen:2008Distance}. Technologies that support distributed collaboration include topic recommendations~\cite{carter2004} and instant messaging~\cite{niinimaki2008}. Processes are adapting to the fast pace of software development: the Eclipse way~\cite{frost:ieeesoftware:2007} emphasizes placing milestones at fixed intervals and community involvement.
\subsection{Socio-technical Congruence}
Socio-technical congruence is defined as the match between the coordination needs established by the technical domain and the actual coordination activities carried out by project members. Socio-technical congruence in software engineering was brought to attention by Cataldo, et al.~\cite{cataldo:cscw:2006}, though the concept has been explored in engineering \cite{browning2001} and management science \cite{henderson1990}. A coordination need indicates that two persons should be coordinating based on the technical dependencies on the project. A coordination need is determined by analysing the assignments of people to a technical entity such as a source code module, and the technical dependencies among the technical entities.
Socio-technical congruence states that if there is a coordination need between two people, these people should be coordinating.
For example, if two people work on different, but dependent components of the project, then those persons should be coordinating with each other.
If two individuals have a coordination need, but do not coordinate, then there is a gap between these two individuals. A gap suggests the existence of a coordination problem. One of the goals of socio-technical congruence is to minimize the number of gaps, either by maintaining good coordination between individuals who have a coordination need, or by reducing the number of technical dependencies in the project and therefore reducing the coordination needs~\cite{sarma2008:measuring_stc}.
What socio-technical congruence offers is an approach to measure the coordination quality~\cite{cataldo:cscw:2006}. We can use this measurement to identify the effect of socio-technical coordination on software build quality.
\subsection{Effects of Socio-technical Congruence}
Current research suggests that attaining a high level of socio-technical congruence is beneficial to an organization.
Evidence shows that higher congruence leads to faster completion of modification requests~\cite{cataldo:cscw:2006}.
The presence of gaps increases the number of code changes \cite{ehrlich:stc:2008}, and a lack of coordination connections across system and organizational boundaries has a negative effect on performance~\cite{sosa2004:manage}.
Socio-technical gaps have been found to be an issue not only because they lower
the congruence and thus lower productivity~\cite{cataldo:cscw:2006}, but because they are especially problematic in the context of distributed development~\cite{ehrlich:stc:2008}. Thus, researchers have proposed remedial actions when socio-technical congruence gaps are discovered~\cite{valetto2007:value}.
Examples of actions include closing a gap by augmenting coordination and eliminating the gap by refactoring software.
The usefulness of socio-technical congruence depends on the conceptualizations of
the social and the technical dimensions. Communication is believed to help people coordinate. However, it is not the only way to describe the social dimension.
For instance, Cataldo et al.~\cite{cataldo:cscw:2006} evaluated congruence in the context of software development using different representations of actual coordination, including geographical proximity, IRC communication, and issue-tracking comments; these factors correlate with the resolution time of modification requests. There are also variations in the way technical dependencies can be handled. Cataldo et al.~\cite{cataldo:esem:2008} used differing ways to measure architectural dependencies and found that the congruence values computed using a ``files changed together'' dependency are more reliable than those computed using call graph dependencies~\cite{deSouza2004:thwarts_collaboration}. Gokpinar et al.~\cite{gokpinar2010} applied a congruence technique and discovered that a higher coordination deficit leads to a larger number of filed incident reports, implying reduced quality.
Socio-technical congruence has been explored outside of the software development field, particularly in engineering and management disciplines \cite{henderson1990,sosa2004:manage,gokpinar2010,sosa2008}. Sosa \cite{sosa2008} described a formal technique to compute socio-technical congruence and identified ``potentially unattended technical interactions'', which are technical dependencies in modules that are not monitored. Gokpinar \cite{gokpinar2010}, independently of our work, developed a weighted socio-technical congruence technique that he applies to the automotive industry.
\section{Calculating Congruence}
\label{sec:congruence}
In this section we describe how we derive the socio-technical congruence index used by Cataldo et al.~\cite{cataldo:cscw:2006} from our technical networks, and we compare our approach to the original.
\subsection{Technical Entities and Social Relationships}
A technical entity is an entity in a project that can be worked on by a person. Examples of a technical entity include a source code file, a compiled binary, a requirement, a task, or a bug. Socio-technical congruence has focused on the source code file as the technical entity \cite{cataldo:cscw:2006, ehrlich:stc:2008}, although work has also examined socio-technical congruence using a requirement~\cite{damian2010:rdc,marczak2009:crossfunctional} or a task~\cite{wolf:ieee:2009} as the technical entity. The choice of technical entity depends on the context of the study.
At the core of socio-technical congruence is the concept of a technical dependency. A technical dependency is a type of dependency between two technical entities. Examples of technical dependencies are in Box \ref{ph:technicalunits}.
\begin{placeholder}[t]
\begin{itemize}
\item Requirements that depend on each other~\cite{marczak:re:2008,marczak2009:crossfunctional}
\item Source code modules changed together in a change set~\cite{cataldo:cscw:2006,cataldo:esem:2008}
\item Source code that has a call-graph dependency~\cite{deSouza2004:thwarts_collaboration}
\item Tasks that depend on other tasks \cite{wolf:ieee:2009}
\end{itemize}
\caption{Examples of technical dependencies}
\label{ph:technicalunits}
\end{placeholder}
A coordination need is a relationship that indicates that two people should be coordinating, based on the assignment of each person to a technical entity, and the technical dependencies between the entities.
Social relationships are identified through actual coordination, which indicates how people in the organization are actually coordinating. Note that, despite the nomenclature ``coordination'' used in previous work \cite{cataldo:cscw:2006}, the actual coordination matrix does not need to represent coordination at all, but merely a relationship of interest between two people in an organization. Generally, one would want to choose relationships that are of interest to the performance of the organization, hence favouring the selection of relationships such as ``communication''. Examples of actual coordination appear in Box \ref{ph:relationships}.
\subsection{Calculating Socio-Technical Congruence}
\label{sec:stc}
In Chapter~\ref{chap:meth} we described socio-technical networks and how we conceptualize them in this thesis.
If we reformulate this network in the terms originally used by Cataldo et al.~\cite{cataldo:cscw:2006}, the matrix representation of the technical dependencies among software developers becomes the coordination needs matrix $CN$, and the social network in matrix representation becomes the actual coordination matrix $AC$.
Thus we calculate the socio-technical congruence index as follows:
\[ \text{congruence} = \frac{\text{Diff}(CN, AC)} {|CN|} \]
The main difference from the original formula lies solely in our more direct derivation of the coordination needs matrix: instead of deriving coordination needs from task relationships, which are themselves derived from source code dependencies, we use the source code dependencies to relate software developers to each other directly.
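As a minimal sketch of this calculation, consider a hypothetical team of three developers (the matrices below are illustrative and not taken from the RTC data). We assume binary, symmetric matrices and read $\text{Diff}(CN, AC)$ as the number of coordination needs that are matched by actual coordination and $|CN|$ as the number of coordination needs, in the sense of Cataldo et al.~\cite{cataldo:cscw:2006}:
\[ CN = \left( \begin{array}{ccc} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{array} \right), \qquad AC = \left( \begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right) \]
Here $|CN| = 6$ and $\text{Diff}(CN, AC) = 2$, so $\text{congruence} = 2/6 \approx 0.33$. The four unmatched entries, that is, the developer pairs $(1,3)$ and $(2,3)$ counted in both directions, are gaps.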
\begin{placeholder}[t]
\begin{itemize}
\item Communication---A communicates with B~\cite{cataldo:cscw:2006, ehrlich:stc:2008, cataldo:esem:2008,damian2007:collaboration}.
\item Location---A is in the same location as B~\cite{cataldo:cscw:2006, ehrlich:stc:2008}.
\item Team structure---A is in the same team as B~\cite{cataldo:cscw:2006}.
\end{itemize}
\caption{Examples of actual coordination}
\label{ph:relationships}
\end{placeholder}
\section{Analysis Methods}
\label{sec:methodology}
Logistic regression is ideal to test the relationship between multiple variables and a binary outcome, which in our study is a build result being either ``OK'' or ``Error''. The presence of many data entities in this project means that we must consider confounding variables in addition to the socio-technical congruence when determining its effects on the probability of build success. Informally, logistic regression identifies the amount of ``influence'' that a variable has in the probability that a build will be successful.
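For reference, the model can be sketched in its standard textbook form. The symbols $p$ (probability of an OK build), $x_i$ (predictors), $\beta_i$ (coefficients), and $k$ (number of predictors) are introduced here only for illustration:
\[ \ln \frac{p}{1 - p} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i, \qquad p = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)}} \]
A positive $\beta_i$ means that the log-odds of an OK build rise as $x_i$ increases while the other predictors are held fixed. An interaction effect enters this form as an additional predictor given by the product of two variables.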
The two main variables we are interested in are, as mentioned above, the socio-technical congruence index and the ratio between gaps and coordination needs, where a gap is a technical dependency among developers that is not accompanied by a corresponding social dependency.
We show the relationship between a variable and the build success probability by plotting the y-axis as the probability. We use probability because we feel that it is more intuitive than odds ratios or logistic functions. If there is a relationship between a variable and the probability of build success, then we should see that as the variable's value increases, the probability also increases. In the probability figures, the solid line is the expected value, and the dashed lines indicate the 95\% confidence intervals.
We run two different logistic regression models: one using weighted congruence, and one using unweighted congruence. We include the following variables: number of authors contributing to the build, number of files in the build, number of work items per build, the congruence, the build type, and the date of the build. We centre and scale each numeric variable.
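The text does not state the scaling constant. As a sketch, we assume the usual standardization by the sample mean $\bar{x}$ and the sample standard deviation $s_x$:
\[ z = \frac{x - \bar{x}}{s_x} \]
Under that assumption, the adjusted values in the caption of Figure~\ref{fig:unweighted_congruence_typeci_age} and the means in Table~\ref{tab:summary} imply approximate standard deviations. For authors, $s_x \approx (17 - 18.62) / (-0.156) \approx 10.4$; for files, $s_x \approx (131 - 342.3) / (-0.352) \approx 600$; for work items, $s_x \approx (34 - 48.3) / (-0.399) \approx 35.8$. These figures are back-calculated from rounded values and are only indicative.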
Because we were concerned about possible interactions affecting our results, we included first-order interaction effects and used backward stepwise elimination to remove variables to keep AIC (Akaike's Information Criterion) low.
\section{Results}
\label{sec:results}
In the RTC repository, we analyzed 191 builds; of these builds, 60 were error builds, and 131 were OK builds. Table \ref{tab:summary} displays summary statistics per build.
Figure \ref{fig:hist_unweighted_congruence} displays histograms for unweighted congruence, and Figure \ref{fig:hist_unweighted_congruence} shows histograms for weighted congruence. The histograms compare the frequencies for each type of congruence for all builds, the OK builds, and the error builds only. There are some minor differences between unweighted and weighted congruence values; weighted congruence, for instance, largely reduces the number of ``fully'' congruent situations where congruence is 1.
The congruence values are low on average. The unweighted congruence has a mean value of 0.331, and the weighted measure has a mean value of 0.196, meaning that about one-third and one-fifth of the coordination needs are satisfied by actual coordination, respectively. Over 75\% of the builds have a weighted congruence value of less than 0.25.
\begin{figure}[t]
\centering
\subfloat[All builds]{
\includegraphics[width=.3\columnwidth]{figures/hist_unweighted}
\label{subfig:hist_nonweighted}
}
\subfloat[OK builds]{
\includegraphics[width=.3\columnwidth]{figures/hist_unweighted_ok}
\label{subfig:hist_nonweighted_ok}
}
\subfloat[Error builds]{
\includegraphics[width=.3\columnwidth]{figures/hist_unweighted_err}
\label{subfig:hist_weighted_err}
}
\caption{Distribution of Congruence Values}
\label{fig:hist_unweighted_congruence}
\end{figure}
\begin{table}[t]
\centering
\begin{tabular}{lrrrr}
\toprule
& Min & Median & Max & Mean\\\midrule
Authors & 2 & 17 & 44 & 18.62\\
Files & 5 & 131 & 3101 & 342.3 \\
Change Sets & 4 & 34 & 226 & 54.2\\
Work items & 4 & 34 & 182 & 48.3 \\
Build date range (days) & 0 & 345 & 361 & 319.2 \\
Congruence & 0 & 0.21 & 1 & 0.331 \\
%Gap size & -0.083 & 0.190 & 1.00 & 0.317 \\
\bottomrule
\end{tabular}
\caption{Summary statistics}
\label{tab:summary}
\end{table}
\begin{table}[t]
\begin{center}
\begin{tabular}{lrrrrr}
\toprule
& 2. & 3. & 4. & 5. & 6. \\
\midrule
1. Congruence & -0.27 & -0.22 & -0.33 & 0.08 & 0.19 \\
2. Authors & --& 0.41 & 0.76 & 0.10 & -0.30 \\
3. Files & & --& 0.37 & 0.08 & -0.20 \\
4. Change sets & & & --& 0.02 & -0.38 \\
5. Work items & & & & --& 0.04 \\
6. Build date & & & & & -- \\
\bottomrule
\end{tabular}
\end{center}
\caption{Pairwise Correlation of Variables per Build}
\label{tab:pairwise}
\end{table}
We calculated pairwise correlations between the variables weighted congruence, unweighted congruence, number of authors, number of files, number of change sets, number of work items, and build date (Table \ref{tab:pairwise}). To avoid multicollinearity problems in our data, we chose to remove change sets from our logistic regression analysis because, due to the enforced processes in RTC, we know that there is exactly one author per change set, and thus there are at least as many change sets as authors per build.
To assess the fit of the logistic regression models, we use the Nagelkerke pseudo-$R^2$ and AIC. $R^2$ shows the proportion of variability explained by the model, and AIC is a measure of how well the model fits the data. Ideally, $R^2$ is high and AIC is low. Our current model contains 19 variables and has an $R^2$ of 0.581. In Table \ref{tab:models} we compare our model with a model containing every first-order interaction effect (27 variables) and a model that contains the 7 main effects only (Table \ref{tab:logr_maineffects}). We found that 19 variables is optimal and that removing further variables lowered the $R^2$ value while raising the AIC.
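As a consistency check, the reported fit statistics can be reproduced from the model likelihood ratios. This sketch assumes the standard definitions of the Nagelkerke pseudo-$R^2$ and of AIC, and an intercept-only null model fitted to the 131 OK and 60 error builds. The symbols $LR$ (model likelihood ratio statistic), $L_0$ (null likelihood), $n$ (number of builds), and $k$ (number of estimated coefficients including the intercept) are introduced here for illustration:
\[ R^2_N = \frac{1 - e^{-LR/n}}{1 - e^{2 \ln L_0 / n}}, \qquad \text{AIC} = -2 \ln L_0 - LR + 2k \]
with $n = 191$ and $\ln L_0 = 131 \ln(131/191) + 60 \ln(60/191) \approx -118.87$. For the model with 19 variables ($LR = 101.92$, $k = 20$) this gives $R^2_N \approx 0.4135 / 0.7120 \approx 0.581$ and $\text{AIC} \approx 237.75 - 101.92 + 40 \approx 175.8$. For the main-effects model ($LR = 40.59$, $k = 8$) it gives $R^2_N \approx 0.1915 / 0.7120 \approx 0.269$ and $\text{AIC} \approx 237.75 - 40.59 + 16 \approx 213.2$. Both results agree with the values reported in Table~\ref{tab:models}.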
\begin{table}[t]
\begin{center}
\begin{tabular}{l@{\hspace{30pt}}r@{\hspace{30pt}}rr}
\toprule
Model & Variables & AIC & $R^2$ \\ \midrule
Every interaction & 27 & 188.6 & 0.595 \\
Main effects only & 7 & 213.2 & 0.269 \\
\textbf{Our model} & \textbf{19} & \textbf{175.8} & \textbf{0.581} \\
\bottomrule
\end{tabular}
\end{center}
\caption{Model comparison}
\label{tab:models}
\end{table}
\subsection{Effects of Congruence on Build Result}
\label{sec:congruence_effect_build_result}
\begin{table}[t]
\begin{center}
\small
\begin{tabular}{l@{\hspace{15pt}}rrr}
\toprule
Variable & Coef. & S.E. & \emph{p} \\
\midrule
Intercept & -0.5459 & 0.4663 & 0.2417 \\
\textbf{Congruence} & \textbf{6.3410} & \textbf{1.6262} & \textbf{**0.0001} \\
\textbf{Authors} & \textbf{-1.9759} & \textbf{0.5310} & \textbf{**0.0002} \\
\textbf{Files} & \textbf{-1.0734} & \textbf{0.4561} & \textbf{*0.0186} \\
Work~items & -0.1456 & 0.2355 & 0.5363 \\
\textbf{Build type=I} & \textbf{2.1533} & \textbf{1.0526} & \textbf{*0.0408} \\
Build type=N & 4.6833 & 200.7587 & 0.9814 \\
Build date & -0.6560 & 0.6709 & 0.3282 \\
\textbf{Congruence * Build type=I} & \textbf{-9.2151} & \textbf{2.5572} & \textbf{**0.0003} \\
Congruence * Build type=N & -7.7308 & 91.8053 & 0.9329 \\
\textbf{Congruence * Build date} & \textbf{-5.1266} & \textbf{1.9290} & \textbf{**0.0079} \\
Authors * Build type=I & 1.2688 & 0.7028 & 0.0710 \\
Authors * Build type=N & 105.4123 & 535.8792 & 0.8441 \\
Authors * Build date & -0.6061 & 0.3616 & 0.0937 \\
Authors * Files & 0.7663 & 0.4289 & 0.0740 \\
Files * Build type=I & 1.0920 & 1.1838 & 0.3563 \\
Files * Build type=N & -37.9274 & 199.2314 & 0.8490 \\
\textbf{Work~items * Build date} & \textbf{0.8040} & \textbf{0.3003} & \textbf{**0.0074} \\
\textbf{Build type=I * Build date} & \textbf{2.6442} & \textbf{0.7678} & \textbf{**0.0006} \\
Build type=N * Build date & 84.7252 & 344.8129 & 0.8059 \\
\bottomrule
Model likelihood ratio & 101.92 & & $R^2=0.581$ \\
& \multicolumn{3}{c}{191 observations} \\
\multicolumn{1}{l}{ } & \multicolumn{3}{l}{\scriptsize{Build type is set to continuous}} \\
\multicolumn{1}{l}{\scriptsize{*$p < 0.05$; **$p < 0.01$}} & \multicolumn{3}{l}{\scriptsize{Nagelkerke is used as the pseudo-$R^2$ measure}}
\end{tabular}
\end{center}
\caption{Logistic Regression models predicting build success probability with main and interaction effects}
\label{tab:logr}
\end{table}
\begin{table}[t]
\begin{center}
\begin{tabular}{l@{\hspace{15pt}}rr r}
\toprule
Variable & Coef. & S.E. & \emph{p} \\
\midrule
Intercept & 0.5265 & 0.3040 & 0.0833 \\
Congruence & 0.9371 & 0.6807 & 0.1686 \\
\textbf{Authors} & \textbf{-0.5702} & \textbf{0.2003} & \textbf{**0.0044} \\
\textbf{Files} & \textbf{-0.6398} & \textbf{0.2477} & \textbf{**0.0098} \\
Work~items & -0.1755 & 0.1713 & 0.3055 \\
Build type=I & 0.1693 & 0.4269 & 0.6917 \\
Build type=N & 0.2133 & 0.7791 & 0.7842 \\
Build date & -0.1331 & 0.1821 & 0.4649 \\
\bottomrule
Model likelihood ratio & 40.59 & & $R^2=0.269$ \\
& \multicolumn{3}{c}{191 observations} \\
\multicolumn{1}{l}{ } & \multicolumn{3}{l}{\scriptsize{Build type is set to continuous}} \\
\multicolumn{1}{l}{\scriptsize{*$p < 0.05$; **$p < 0.01$}} & \multicolumn{3}{l}{\scriptsize{Nagelkerke is used as the pseudo-$R^2$ measure}}
\end{tabular}
\end{center}
\caption{Logistic Regression models predicting build success probability with main effects only}
\label{tab:logr_maineffects}
\end{table}
The results of the logistic regression indicate that the following effects are significant for both the unweighted and weighted congruence models: the congruence~$\times$~build type effect, the congruence~$\times$~build date interaction effect, the number of work~items~$\times$~build date interaction effect, and the build date~$\times$~build type effect. In addition, the number of authors and the number of files are significant main effects, although their coefficients are lower than those of the interaction effects involving congruence. We also identify unweighted congruence as a significant main effect in the unweighted congruence model (see Table~\ref{tab:logr}).
In the next section we discuss the main effects and interaction effects involving congruence that affect the probability of build success. We discuss the remaining effects, including the authors, files, work~items~$\times$~date interaction effect and the date~$\times$~nightly~build effect, in Section \ref{sec:otherfactors}.
\subsubsection{Effects of interactions involving congruence}
\label{sec:congruenceinteractions}
The type~$\times$~congruence interaction effect, the date~$\times$~congruence interaction, and the type $\times$ date effect are each significant in our model (Table \ref{tab:logr}). We plot in Figure~\ref{fig:unweighted_congruence_typeci_age} the effects of congruence vs. probability of build success at the 10\% date quantile (2008-01-25), at the 25\% date quantile (2008-05-14), the 50\% date quantile (2008-06-07), and the latest build (2008-06-26).
\begin{figure}[t!]
\centering
\subfloat[ \small{2008-01-25} ]{
\includegraphics[width=.4\columnwidth]{figures/prob_unweighted_age_typeci_q010}
\label{subfig:prob_unweighted_age_typeci_q010}
}
\subfloat[ \small{2008-05-14} ]{
\includegraphics[width=.4\columnwidth]{figures/prob_unweighted_age_typeci_q025}
\label{subfig:prob_unweighted_age_typeci_q025}
}
\subfloat[ \small{2008-06-07} ]{
\includegraphics[width=.4\columnwidth]{figures/prob_unweighted_age_typeci_q050}
\label{subfig:prob_unweighted_age_typeci_q050}
}
\subfloat[ \small{2008-06-26} ]{
\includegraphics[width=.4\columnwidth]{figures/prob_unweighted_age_typeci_q100}
\label{subfig:prob_unweighted_age_typeci_q100}
}
\caption{Estimated probability of build success for \emph{congruence} and \emph{continuous builds C} or \emph{integration builds I} over time, adjusted to authors $\approx$ -0.156 (17 authors), files $\approx$ -0.352 (131 files), work~items $\approx$ -0.399 (34 work items)}
\label{fig:unweighted_congruence_typeci_age}
\end{figure}
The congruence model (Table \ref{tab:logr}) shows that the effect of congruence on continuous builds is significant, and that increasing congruence also increases the probability that a continuous build will succeed.
For integration builds (Figure \ref{fig:unweighted_congruence_typeci_age}, in black), an increase in congruence decreases build success probability, with the exception of the 2008-01-25 build (Figure \ref{subfig:prob_unweighted_age_typeci_q010}). In our 2008-01-25 build, we see that low congruence leads to low build success probability, whereas high congruence leads to high build success probability. As the project ages, this trend reverses and congruence is clearly inversely related to build success probability (Figure \ref{subfig:prob_unweighted_age_typeci_q100}).
The effect of congruence is opposite for continuous builds and integration builds. Based on Figure \ref{subfig:prob_unweighted_age_typeci_q100}, increasing congruence significantly improves the continuous build success rate but significantly decreases the integration build success rate.
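The opposite directions can be read off the form of the model. As a sketch, we write only the terms that involve congruence; the symbols $c$ (congruence), $t$ (build type), $d$ (build date), and the $\beta$ coefficients are our shorthand here rather than notation from Table \ref{tab:logr}, and the coding $t = 1$ for integration builds and $t = 0$ for continuous builds is an assumption made for illustration:
\[
\mathrm{logit}(p) = \dots + \beta_{c}\, c + \beta_{ct}\, c\, t + \beta_{cd}\, c\, d ,
\qquad
\frac{\partial\, \mathrm{logit}(p)}{\partial c} = \beta_{c} + \beta_{ct}\, t + \beta_{cd}\, d .
\]
Under these assumptions, the slope of congruence is $\beta_{c} + \beta_{cd}\, d$ for continuous builds and $\beta_{c} + \beta_{ct} + \beta_{cd}\, d$ for integration builds. A sufficiently negative $\beta_{ct}$ can therefore reverse the sign of the congruence effect for integration builds, and $\beta_{cd}$ allows the effect to change as the project ages.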
\subsection{Effect of Gap Ratio on Build Result}
\label{sec:gapsizeresult}
We build logistic regression models based on the model in Table \ref{tab:logr} using the gap ratio measurement (the percentage of unmet coordination needs). In the interest of saving space, we report only the odds ratios. We retain every significant interaction from our previous weighted congruence logistic regression in Table \ref{tab:logr}.
The effect of gap ratio on build result is significant (Table \ref{tab:oddsratio_gapsize}). Increasing the gap ratio significantly increases the odds that an OK build will occur, which is the opposite of what we hypothesized (Figure \ref{fig:prob_gapsize_a}): when the gap ratio is large, the build success probability increases.
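As a reading aid for Table \ref{tab:oddsratio_gapsize}, we recall the standard relation between a logistic regression coefficient and its odds ratio. The symbols $\beta_j$ and $x_j$ are our shorthand here, and we assume that each predictor is expressed in the units used when fitting the model and that all other terms are held fixed:
\[
\mathrm{OR}_j = e^{\beta_j},
\qquad
\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j} .
\]
Under this reading, the gap ratio odds ratio of 8.71 corresponds to a coefficient of $\ln 8.71 \approx 2.16$, that is, a one-unit increase in the gap ratio multiplies the odds of an OK build by roughly 8.7.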
\begin{table}[t]
\begin{center}
\begin{tabular}{lrr}
\toprule
& Model\\
\midrule
Intercept & 1.32 \\
Authors & 0.60 \\
Files & 0.63 \\
Work~items & 0.85 \\
Build type=I & 1.31 \\
% Weighted cong & 8.41 & - \\
Gap ratio & 8.71 \\
Build date & 0.59 \\
Authors * Build date & 0.74 \\
Work~items * Build date & 1.83 \\
Build type=I * Build date & 2.52 \\
% Build type=I * Weighted cong & 0.01 & - \\
% Weighted cong * Build date & 0.00 & - \\
\bottomrule
\end{tabular}
\caption{Odds Ratios for the Gap Ratio Model}
\label{tab:oddsratio_gapsize}
\end{center}
\end{table}
\begin{figure}[t]
\centering
\includegraphics[width=.5\columnwidth]{figures/boxplot_meangapsize}
\caption{Gap Ratio per Build}
\label{fig:gapsizes}
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=.5\columnwidth]{figures/prob_gapsize_g1}
\caption{Effect of gap ratio on build success probability. }
\label{fig:prob_gapsize_a}
\end{figure}
\subsection{Social and Technical Factors in RTC Affecting Build Success and Congruence}
\label{sec:otherfactors}
In light of our results, we examine not only the significant work~items~$\times$~date interaction found in Section \ref{sec:congruence_effect_build_result}, but also other social and technical factors that may affect congruence
and build success probability, in order to find explanations for the interactions between socio-technical congruence and build success probability in RTC.
Specifically, we examine the effect of build date on work items, coordination around fully-congruent and
incongruent builds, and the effects of commenting behaviour on builds.
\subsubsection{Other Effects on Build Success}
\label{sec:effectauthors}
\begin{description}
\item[Authors] As the number of authors involved in a build increases, the probability that the build succeeds decreases. The build success probability is significantly lower once more than 15 authors are involved in the build (Figure \ref{subfig:prob_weighted_authors_age_q100}). When over 30 authors are involved in the build, the estimated build success probability falls under 10\%.
\item[Files] As the number of files involved in a build increases, the probability that the build will succeed decreases (Figure \ref{subfig:prob_weighted_files_age_q100}).
\item[Build Date and Work items] The work~items~$\times$~date interaction is significant. Early in the project, as the number of work items increases, the probability of build success decreases (Figure \ref{subfig:prob_weighted_workitems_age_q010}). As the project ages, this trend reverses and as the number of work items increases, the probability of build success increases as well (Figure \ref{subfig:prob_weighted_workitems_age_q100}). According to the coefficients in Table \ref{tab:logr}, this effect on build success probability is not as strong as the authors main effect or the files main effect.
\end{description}
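The estimated probabilities reported above are obtained from the model's linear predictor through the inverse logit. As a reading aid (the symbol $\eta$ for the linear predictor is our shorthand here):
\[
p = \frac{1}{1 + e^{-\eta}},
\qquad
p < 0.10 \iff \eta < \ln\frac{0.10}{0.90} = -\ln 9 \approx -2.20 .
\]
An estimated success probability below 10\%, as seen for builds involving more than 30 authors, thus corresponds to odds of success below 1 to 9.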
\begin{figure}[t]
\centering
\subfloat[ \small{Authors} ] {
\includegraphics[width=.45\columnwidth]{figures/prob_weighted_authors_age_q100}
\label{subfig:prob_weighted_authors_age_q100}
}
\subfloat[ \small{Files} ] {
\includegraphics[width=.45\columnwidth]{figures/prob_weighted_files_age_q100}
\label{subfig:prob_weighted_files_age_q100}
}
\caption{Estimated probability of build success for \emph{authors} and \emph{files}, congruence. Adjusted to work~items $\approx$ -0.399 (34), authors $\approx$ -0.156 (17), files $\approx$ -0.352 (131), congruence $\approx$ 0.1446, type = cont, date=2008-06-26}
\label{fig:weighted_congruence_authors_age}
\end{figure}
\begin{figure}[t]
\centering
\subfloat[ \small{2008-01-25} ]{
\includegraphics[width=.45\columnwidth]{figures/prob_weighted_workitems_x_age_q010}
\label{subfig:prob_weighted_workitems_age_q010}
}
\subfloat[ \small{2008-06-26} ]{
\includegraphics[width=.45\columnwidth]{figures/prob_weighted_workitems_x_age_q100}
\label{subfig:prob_weighted_workitems_age_q100}
}
\caption{Estimated probability of build success for \emph{work items} and \emph{date}, congruence. Adjusted to authors $\approx$ -0.156 (17), files $\approx$ -0.352 (131), congruence $\approx$ 0.1446, type = cont}
\label{fig:weighted_congruence_workitems_age}
\end{figure}
\subsubsection{Examining Extreme Congruence Values}
\label{sec:extremecongruence}
We are interested in the differences between high-congruence builds and low-congruence builds.
We further this investigation by looking at builds that have extreme values of congruence: zero, where absolutely no coordination needs are satisfied with communication, and one, where every coordination need is satisfied with communication.
We chose to investigate the extreme cases to see if there were differences in the way people coordinated in fully-congruent builds, and in incongruent builds.
Table \ref{tab:congruence_extremes} shows the number of OK and error builds that occurred when congruence was equal to one, and equal to zero. The weighted builds with full congruence are a subset of the unweighted builds with full congruence.
\begin{table}[t]
\centering
\begin{tabular}{llrr}
\toprule
& & \multicolumn{2}{c}{Congruence} \\\midrule
& & 1 & 0 \\\midrule
\multirow{2}{*}{Unweighted} & OK & 26 & 30 \\
& ERR & 2 & 2 \\%\midrule
%\multirow{2}{*}{Weighted} & OK & 6 & 30 \\
% & ERR & 1 & 2 \\
\bottomrule
\end{tabular}
\caption{Number of Builds with Congruence Values 0 and 1}
\label{tab:congruence_extremes}
\end{table}
To determine whether the presence of comments affected the builds, we examined the number of comments on work item--change set pairs in builds with extreme unweighted congruence values. Our results are shown in Table \ref{tab:changeset_commenters}. For pairs with a congruence of one, the build success rate is higher when comments are present (68\%) than when they are absent (49\%); for pairs with a congruence of zero, the success rate is 69\% in both cases. Pairs with no comments are in the minority.
Of note is the high number of comments on work items that have zero congruence. This indicates that individuals who have no technical relationship to the work item are commenting on the work item.
\begin{table}[t]
\centering
\begin{tabular}{ll@{\hspace{40pt}}rr@{\hspace{40pt}}rr}
\toprule
& & \multicolumn{2}{@{\hspace{-40pt}}c}{Num. of Pairs} & \multicolumn{2}{c}{Success rate} \\
Congruence & & 1 & 0 & 1 & 0 \\\midrule
\multirow{2}{*}{No comments} & OK & 42 & 143 & \multirow{2}{*}{49\%} & \multirow{2}{*}{69\%} \\
& Error & 43 & 64 & & \\\midrule
\multirow{2}{*}{Comments} & OK & 610 & 445 & \multirow{2}{*}{68\%} & \multirow{2}{*}{69\%} \\
& Error & 290 & 199 & & \\\midrule
\multirow{2}{*}{Total} & OK & 652 & 588 & \multirow{2}{*}{66\%} & \multirow{2}{*}{69\%} \\
& Error & 333 & 263 & &\\\bottomrule
\end{tabular}
\caption{Number of work item--change set pairs with and without comments, and build success rates, for congruence 0 and 1}
\label{tab:changeset_commenters}
\end{table}
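For reference, each success rate in Table \ref{tab:changeset_commenters} is the share of OK pairs among all pairs in the corresponding cell. For example, for pairs without comments:
\[
\frac{42}{42 + 43} \approx 49\% \quad \mbox{(congruence 1)},
\qquad
\frac{143}{143 + 64} \approx 69\% \quad \mbox{(congruence 0)} .
\]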
We manually inspected the work items with extreme congruence values, reading the comments for any differences in the content discussed. Unfortunately, we found no obvious differences between comments made in a build with a congruence of zero and comments made in a build with a congruence of one. In both kinds of builds, individuals discussed technical implementation details, provided updates to colleagues, or requested assistance from colleagues. We are unable to discover root causes of failure without a deeper examination of the technical changes and more knowledge of the RTC context.
\section{Discussion}
\label{sec:discussion}
The concepts illustrated in Conway's Law, as well as previous empirical work on socio-technical congruence, lead us to expect that team members must coordinate according to the coordination needs suggested by technical dependencies in order to build software effectively.
In this case study, we applied socio-technical congruence to study coordination and its relationship to build success probability in RTC. We also applied a modified weighted congruence measurement to study how the size of a coordination gap affects build success probability, and investigated which social and technical factors in RTC affect congruence and builds.
Overall, we found that the average congruence across builds was very low---only 20--30\% of the coordination needs in the project were fulfilled with actual coordination. Even in the cases where there is zero congruence, the build result was an OK build in over 90\% of the observed cases.
We found an interaction effect between congruence and build type on build success probability (Section \ref{sec:congruenceinteractions}). Increasing congruence improves the chance of build success in continuous builds but can actually decrease build success probability in integration builds (Figure \ref{fig:unweighted_congruence_typeci_age}). High unweighted congruence significantly improves continuous build success probability, and both unweighted and weighted congruence significantly reduce integration build success probability.
The gap ratio is a representation of whether enough coordination
occurred to fulfill multiple coordination needs. If two developers have multiple dependencies on each other, one would expect them to
coordinate more often as well.
We hypothesized that a small gap ratio would increase the probability of a successful build and that a large gap ratio would decrease it. Instead, we found that as the gap ratio increases, the build success probability also increases (Figure \ref{fig:prob_gapsize_a}).
Below we discuss the reasons for these observed results based on our knowledge of RTC.
\subsection{Strong Awareness Helps Coordination}
The overall congruence for the majority of builds is low: over 75\% of
builds have a congruence of less than 0.25 (Section \ref{sec:congruence_effect_build_result}).
Despite low congruence, the RTC team is able to successfully build its software in many situations.
When we examined extreme congruence values, we observed 85\% build success probability when weighted congruence is 1 and 93\% build success when weighted congruence is 0 (Section \ref{sec:extremecongruence}).
If socio-technical congruence is a measure of coordination quality in software, and builds rely on coordination quality to be successful, then there must be reasons why builds can succeed even when the congruence is zero.
First, because RTC is a highly-distributed project, the product under development uses a modular design \cite{maccormack2006} and thus is affected less by dependencies. Second, team members in RTC do not conduct all of their coordination through \emph{explicit communication} even though work item inspection and discussion with developers indicate that the RTC corporate culture focuses on the work item as their base for communication. Rather, they use the \emph{shared workspace} that incorporates cues from the environment and from peers in order to address technical issues. Both of these effects may contribute to congruence being lower than expected.
\subsubsection{RTC supports Explicit Communication}
The RTC team members use the RTC environment extensively to communicate with each other. We were informed that RTC team members rarely use private email, and our inspection of the mailing list reveals that its primary purpose is for announcements such as server outages rather than for discussing technical work.
This leaves the RTC work item comment system and instant messaging as methods for communication, as well as the phone and internal face-to-face meetings.
We learned that while face-to-face interaction is efficient for solving local issues, it does not benefit remote teams, and the RTC team as a whole encourages every team member to record face-to-face discussions as comments for the purpose of archiving and sharing information.
However, explicit coordination has a cost. There is evidence that involving too many authors in the same build reduces build success probability when using a weighted congruence conceptualization (Figure \ref{subfig:prob_weighted_authors_age_q100}); the effect for unweighted congruence is similar. The overhead required to coordinate many people may interfere with the team's ability to build the project successfully, suggesting that there is a limit beyond which a developer is overloaded with information.
\subsubsection{RTC is a Shared Workspace}
The RTC client software helps a developer acquire and maintain \emph{environmental awareness} of what is going on in the project by providing access to a shared workspace. Much of the work is centred around the RTC technical entities, which include plans, source code, work items, and comments.
RTC's awareness mechanisms feature a developer-centred dashboard that reports changes to the workspace, built-in traceability, user notifications, regularly-generated reports, and an optional web browser interface. For example, when a change set is created, it is attached to a work item, thus ensuring that people who are involved with the work item receive notification of this change set. These automatic notifications cut down the amount of explicit communication and allow people to coordinate implicitly.
Coordinating using the workspace is well-known in the computer-supported cooperative work domain \cite{schmidt1996}. Open-source developers, in particular, coordinate around source code~\cite{bolici:stc:2009} and mailing lists \cite{gutwin2004:awareness,mockus2002:opensource} because there is little opportunity for face-to-face interaction. RTC shares many characteristics with open-source development, such as a distributed team and a transparent development process.
In light of these results, we believe that, using our conceptualization, the RTC team requires a congruence of only 0.2--0.3 for their tasks to be completed.
Much of the need for explicit, point-to-point communication is mitigated by implicit communication and the use of the workspace to coordinate.
We expect that the remaining congruence is covered through the RTC workspace, and through face-to-face communication, instant messenger, and phone communication. Though our congruence value appears low for the RTC team, the coordination in reality may be higher. Future studies should keep in mind that congruence may be lower than expected because of conceptualizations that cannot include every type of coordination in a project.
\subsection{Coordination and Geographic Distribution}
As RTC is a distributed team, geographic distribution has an effect on team performance, though the RTC environment helps mitigate some of these effects \cite{Nguyen:2008Distance}.
We learned from the RTC project that continuous and nightly builds should involve mainly a co-located team, and that integration builds involve multiple components from RTC teams in different locations. Our results suggest that congruence best benefits builds that occur within co-located teams; however, the design of our study does not allow us to draw a firm conclusion about the influence of both co-location and congruence on build success probability.
It appears that involving too many individuals when coordinating the activities of various teams may harm build success due to information overload~\cite{damian:icgse:2007}, especially when the team members are distributed. To counter this effect, development leaders and build managers who have an overall view of the project are well suited to coordinate teams to ensure build success \cite{hinds:cscw:2006}.
\subsection{Project Maturity and Build Success}
We found that early builds exhibited a different type of relationship between congruence and build success probability than later builds (Section \ref{sec:congruenceinteractions}). Over the course of the study, we observed 13 internal milestones; the last milestone in our observed builds was a public beta release for end users.
Build success probability decreased significantly over time for continuous builds and stayed roughly the same for integration builds (Figure \ref{fig:unweighted_congruence_typeci_age}).
However, the early builds in the project behaved contrary to later builds in the project (Figures \ref{subfig:prob_unweighted_age_typeci_q010} and \ref{subfig:prob_unweighted_age_typeci_q100}). Early in its lifetime, the RTC software is in a state of change: integration builds are not a priority, and features are being added to the project. This means that dependencies, as well as the expertise among team members, are changing rapidly, making it difficult to solidify coordination needs.
In addition to interactions between congruence and type, and congruence and date, we observed a significant interaction effect between build date and work items.
We found that early in the project (Figure \ref{subfig:prob_weighted_workitems_age_q010}), builds with large numbers of work items have a high probability of failing, but late in the project (Figure \ref{subfig:prob_weighted_workitems_age_q100}), these builds succeed. Because the latest builds were focused on a public release, a build linked to numerous work items may indicate that a bug is highly problematic or a feature is highly desired, and that it therefore received more attention.
\subsection{Threats to Validity}
\label{sec:threats}
Socio-technical congruence can be difficult to compare between studies. Conceptualizations, such as technical dependencies and actual coordination, vary from project to project. Some project-specific conceptualizations we used are the build as a measure of success, the fact that only files that are touched by more than one individual qualified as having dependencies on each other, and the work item comments connecting people in a clique.
Because of the context-sensitive nature of socio-technical congruence, especially with respect to the construction of communication and dependencies, it is difficult to apply socio-technical congruence as a benchmark for coordination. Having an understanding of the context of the project is extremely important when interpreting results obtained from socio-technical congruence calculations.
Our coordination needs are likely overestimated due to the modular nature of RTC.
Because RTC is a distributed project that follows a transparent development process, it uses a modular design \cite{maccormack2006}, and a change to a file in a change set may not necessarily depend on other changes within the same file.
We observed evidence in our study that the dependencies among change sets, as well as among the attached work items, may not be as strong as we originally believed.
Our conceptualization of actual coordination underestimates the amount of communication that truly occurs in the project.
We relied on repository data in RTC and were unable to conceptualize forms of communication such as instant messenger, phone calls, and face-to-face interactions.
Due to geographical distance and time zone differences, the RTC team's primary mode of collaboration is the work item comment system, but we made a number of assumptions about commenting behaviour.
First, we assume that everyone involved in the work item reads every comment. Second, we do not take into consideration any additional coordination that may occur from a silent onlooker reading the comments. We believe intuitively that the first effect is greater than the second.
Another threat to validity is that our data does not cover every build executed in the lifetime of RTC. RTC does not keep a full archive of build results, and as a consequence, we do not have a full population from which to draw data. The threats are particularly high for early data points. The large confidence intervals for some of the builds, namely nightly builds, reflect this lack of data; thus, we hesitate to draw conclusions based on early builds and nightly builds.
Finally, this study is a single case study and the results are not generalizable, nor can they be directly compared to existing studies due to the differences in the projects under examination. However, we believe that this study advances the theoretical and the empirical examination of socio-technical congruence, and raises a number of questions that are worthwhile for future study.
\section{Conclusion}
\label{sec:conclusion}
We end this chapter by bringing it back to the initial research question we set out to answer:
\begin{description}
\item[RQ 1.2:] Do Socio-Technical Networks influence build success?
\end{description}
We conducted two investigations: (1) an investigation of the influence of the socio-technical congruence index on build success and (2) an investigation of gaps uncovered by socio-technical networks and their influence on build success.
Both avenues of investigation revealed an influence on build success, even in the presence of other measures of build size.
These findings, especially that gaps within socio-technical networks influence build success, open the door to formulating an approach that leverages the concept of socio-technical congruence to improve communication among software developers.
Thus in the next chapter we will detail an approach that we propose to improve developer communication using the concept of socio-technical congruence.