ABSTRACT The purpose of this research study is to perform a comparison and contrast analysis on three F. Davis Cardwell published longitudinal case ...
Case studies are useful for research because they explore the problem in real-world conditions with all of the complexity inherent therein, whereas experiments are usually structured to explore the problem in a strictly controlled environment. One problem caused by this complexity is that it becomes difficult to draw conclusions from the results of case studies, because the inevitable differences induced by real-world conditions introduce confounding variables into the results. This difficulty is exacerbated when exploring empirical processes. Because empirical processes are rarely executed in precisely the same way, comparing results across case studies of empirical processes raises questions of internal validity: did the changes we observe occur because of deliberate changes to the process, or because of incidental changes to the process caused by real-world conditions, which were not the intended subject of the study? The goal of the evaluation framework is to minimize the impact of these confounding factors on the results of the studies, in order to make the conclusions drawn from them more practically useful.
The XP evaluation framework [XP-EF] is composed of three parts: XP Context Factors [XP-cf], XP Adherence Metrics [XP-am] and XP Outcome Measures [XP-om].
Figure 14. The XP Evaluation Framework (Williams, Krebs, & Layman, 2004, p. 2)

The goal of the XP-cf component is to quantify the ways in which software development projects can differ. XP context factors are organized into seven categories:
1. Software classification: Software that is developed to solve a particular type of problem often has different characteristics from software that is developed to solve another type of problem, e.g., office productivity software has different characteristics compared to software designed to supervise an industrial process. This category captures and assigns a weight to the type of development project being evaluated.
THE PERFORMANCE OF AGILE METHODS:
COMPARISON TO TRADITIONAL DEVELOPMENT METHODS
2. Sociological: Captures and assigns a weight to personnel factors of the development team. Factors such as level of education, experience with the problem domain, team size, and attrition rate can affect the outcome of a project.
3. Project-specific: Captures particulars of the specific project being evaluated, e.g., project size, project budget, and project schedule.
4. Ergonomic: Captures and assigns weights to the specific development environment. For example, agile methods stress collaboration, which may be impeded by a development environment consisting of private cubicles and offices.
5. Technological: Captures and assigns weights to differing processes and tools used during development. Information regarding specific fault prevention methods, reusable code libraries, and project management factors is recorded here.
6. Geographical: Captures and assigns weights to geographical factors affecting the development process. For example, agile methods usually stress face-to-face communication and pair programming, which is the practice of having two developers use one computer, where one produces code and the other checks the code for faults; these aspects of agile development may be hindered by distributed teams or language barriers.
7. Developmental: Captures the natural development method for the project under evaluation.
Based on the work of Boehm and Turner (2003), this category plots five project factors (team size, criticality, personnel understanding, dynamism, and culture) onto a polar chart, as shown in Figure 15. When the five data points are connected, the resulting shape indicates the project's optimal method. Shapes that cluster near the center of the graph are suggestive of an agile development method, while shapes that cluster toward the edges of the graph are suggestive of a plan-driven method. Shapes where some portions cluster toward the edges while other portions cluster near the center suggest the usefulness of a hybrid method containing aspects of both traditional and agile methods. (Williams, Krebs, & Layman, 2004, pp. 2 – 15)

Figure 15. Example Developmental Factors Polar Chart
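As an illustrative sketch (not part of the XP-EF itself), the classification step described above can be expressed in a few lines of Python. Here each of the five developmental factors is assumed to be normalized to the interval [0, 1], where 0 represents the center of the polar chart (the agile home ground) and 1 the edge (the plan-driven home ground); the threshold values and dictionary keys are assumptions chosen for the example.

```python
# Hypothetical classification of a project from five normalized developmental
# factor scores: 0.0 = center of the polar chart (agile), 1.0 = edge (plan-driven).
def classify_project(factors, agile_max=0.4, plan_min=0.7):
    """Return "agile", "plan-driven", or "hybrid" for a dict of factor scores."""
    scores = list(factors.values())
    if all(s <= agile_max for s in scores):
        return "agile"          # all points cluster near the center
    if all(s >= plan_min for s in scores):
        return "plan-driven"    # all points cluster toward the edges
    return "hybrid"             # mixed shape: some center, some edge

project = {
    "team_size": 0.2,       # small team, scores near the center
    "criticality": 0.3,
    "personnel": 0.25,
    "dynamism": 0.1,        # highly dynamic requirements
    "culture": 0.35,
}
print(classify_project(project))  # prints "agile"
```

A project whose scores straddle both thresholds (for example, a small team building highly critical software) would fall into the "hybrid" category, mirroring the mixed shapes described above.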
The goal of the XP-am component is to quantify the thoroughness with which agile practices are implemented within the specific software development project under evaluation. It is not unusual for software projects using agile methods to implement only a subset of agile practices. Another common situation is that a project implements all recognized agile practices, but does not implement them consistently. The XP-am component is divided into two parts: objective measures, such as test lines-of-code/source lines-of-code, iteration length, and pairing frequency, and subjective measures, such as survey responses to questions concerning the perceived frequency of use of agile practices during the project under evaluation. (Williams, Krebs, & Layman, 2004, pp. 15 – 19) The Shodan Adherence Survey [see Appendix C] is an anonymous survey used within the XP-EF to gather XP adherence information from team members. (Boehm & Turner, 2003) Additionally, the XP-EF uses semi-structured interviews of the development team members to elicit supplemental qualitative data related to the project under evaluation.
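The objective XP-am measures named above are simple ratios, and a minimal sketch of how they might be computed follows; the function and parameter names are hypothetical and are not drawn from the XP-EF templates.

```python
# Hedged sketch of two objective XP-am measures: the test-to-source code ratio
# and the pairing frequency (fraction of development time spent pair programming).
def adherence_metrics(test_loc, source_loc, paired_hours, total_hours):
    return {
        "test_ratio": test_loc / source_loc,              # test LOC per source LOC
        "pairing_frequency": paired_hours / total_hours,  # share of time spent pairing
    }

m = adherence_metrics(test_loc=12_000, source_loc=40_000,
                      paired_hours=300, total_hours=500)
print(m)  # {'test_ratio': 0.3, 'pairing_frequency': 0.6}
```

In the XP-EF these objective values are combined with the subjective survey and interview data, so a high test ratio alone would not establish thorough adherence.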
The goal of the XP-om component is to quantify the outcome of the specific software development project under evaluation. The XP-om asks five questions about the outcome of the development
project being evaluated:
1. Does the pre-release quality improve when a team uses XP practices?
2. Does the post-release quality improve when a team uses XP practices?
3. Does programmer productivity improve when a team uses XP practices?
4. Does customer satisfaction improve when a team uses XP practices?
5. Does team morale improve when a team uses XP practices?
Of these measures, both pre- and post-release quality and programmer productivity are quantitative, based on the quantification of the data throughout the evaluation framework. Customer satisfaction and team morale are qualitative measures, based on data gathered during interviews with the development team. (Williams, Krebs, & Layman, 2004, pp. 19 – 22)
FINDINGS FROM THE CASE STUDIES

The combined XP-om results for the studies are included in Appendix D. The XP-am results for the three central case studies are combined and presented in Appendix E.
The first of the three central studies took place between 2001 and 2004 as a cooperative research effort between Sabre Airline Solutions and North Carolina State University. The research team studied the third and ninth releases of a scriptable graphical user interface environment used by external customers to develop customized end-user business software. The third release, hereafter referred to as “Sabre-A Old,” was developed by a 10-person team over an 18-month period beginning in 2001.
The ninth release, hereafter referred to as “Sabre-A New,” was developed by a 10-person team consisting of some of the original team members from Sabre-A Old, beginning in 2003. The Sabre-A Old release was developed using a Waterfall software development method, whereas the Sabre-A New release was developed using XP. In the 18-month period between the releases, the development team became first familiar with, and then proficient in, XP practices. The “A” suffix is used to identify these projects because both the project size and team size place the projects within the category of projects that are considered characteristically agile. (Layman et al., 2004a, pp. 1 – 3)

The second of the three central studies took place between 2003 and 2004 as a cooperative research effort between Sabre Airline Solutions and North Carolina State University. The research team studied the 13th release of a large web application that was combined with a back-end batch component; the combined project consisted of over a million lines of executable code. Development on this release lasted for approximately five months. The team for this project consisted of 15 developers, one dedicated tester, and several specialists. This release will hereafter be referred to as “Sabre-P.” At the time of this project, the development team had approximately 20 months of experience in the XP
practices used. The results from this project are compared to two sets of published industry averages: those documented by Capers Jones, and those compiled by the Bangalore SPIN group. (Layman et al., 2004b, p. 8) These two sources were chosen because of their availability, and also because the data contained therein is similar enough to the XP-EF format to allow meaningful comparison. The “P” suffix is used to identify this project because the project size and team size place the project within the category of projects that are considered characteristically plan-driven. (Layman et al., 2004b, pp.
1 – 3)

The last of the three central studies took place between 2001 and 2004 as a year-long cooperative research effort between IBM and North Carolina State University. The research team studied the second and third releases of a proprietary software component developed under contract to another IBM organization, where the component eventually became part of a software package marketed to external customers. The second release, hereafter referred to as “IBM Old,” was developed by an 11-person team. The third release, hereafter referred to as “IBM New,” was developed by a seven-person team consisting of some of the original team members from IBM Old. The IBM Old release was developed using a Waterfall software development method with some informal small-team practices that resemble XP. The IBM New release was developed using a “safe” subset of XP practices that was appropriate for the corporate culture in place at the time of the study. The results of the developmental evaluation of these projects place them within the category of projects that are best served by a hybrid process containing both plan-driven and agile practices. (Williams, Krebs, et al., 2004, pp. 1 – 5)

According to the original published studies, the Sabre-A team experienced a 65% improvement in pre-release quality, a 35% improvement in post-release quality, and a 50% improvement in productivity as a result of the implementation of XP practices. (Layman et al., 2004a, p. 8) The Sabre-P team reported an improvement in total defect density, similar pre-release quality, and similar productivity when compared to industry averages compiled by the Bangalore SPIN group. When compared to industry averages compiled by Jones, the Sabre-P team reported an improvement in total defect density, similar pre-release quality, and higher productivity as a result of the implementation of XP practices, although defect removal efficiency fell. (Layman et al., 2004b, p.
8)

The IBM team reported that pre-release quality doubled and post-release quality improved by 39%. Productivity was measured three ways: as a function of lines of code, as a function of user stories, and through the calculation of the Putnam Productivity Parameter. (Putnam & Myers, 1992) Using lines of code, productivity improved by 70%; using user stories, it improved by 34%; and using the Putnam Productivity calculation, it improved by 91% as a result of the implementation of XP practices. (Williams, Krebs, et al., 2004, p. 8)
COST COMPARISONS

The performance metric most directly affecting development costs for a software development project is productivity. Simply put, productivity is a measure of efficiency: how much useful output is produced by a process in proportion to the inputs to the process. If a process can produce more output from the same level of input, or the same level of output from less input, then it is said to be more productive. An input into a process represents a business cost, whether it is raw materials or developer labor; anything that uses these assets more effectively reduces costs.
(Farnham, 2010) Dillman (2003) points out that the focus of the business world is to do things better, faster, and cheaper, which corresponds to the traditional project management “triple constraint” of scope, schedule, and cost. (Schwalbe, 2011, p. 8) It is important to recognize that these three facets are interrelated: for any project, it is possible to alter two of them, but an induced change in the third is the cost of altering the first two. All else being equal, “if you want it faster and cheaper, it will cost you quality. If you want it faster and better, it will cost you money. If you want it cheaper and better it will cost you time.” (Dillman, 2003, p. 6)
Productivity for the cases studied is measured as a function of thousands of lines of executable code [KLOEC] divided by person-months [PM]. The person-month is a measure of the effort needed to complete a task, e.g., if it takes two people three months to complete a task, the person-month metric for the task is equal to the product of 2 and 3, i.e., 6.
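The productivity measure and the person-month arithmetic above can be sketched directly; the numbers below are illustrative only and do not come from the case studies.

```python
# Productivity as used here: thousands of lines of executable code (KLOEC)
# divided by person-months (PM).
def person_months(people, months):
    """Effort: two people for three months yields 2 * 3 = 6 PM."""
    return people * months

def productivity(kloec, pm):
    """KLOEC produced per person-month of effort."""
    return kloec / pm

pm = person_months(2, 3)          # prints 6
print(pm)
print(productivity(30.0, pm))     # 30 KLOEC over 6 PM -> 5.0 KLOEC/PM
```

Because PM appears in the denominator, a team that delivers the same code in less effort (or more code in the same effort) shows higher productivity, which is exactly the cost-reduction argument made in the cost comparison above.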