Modelling Software Fault Dependency Using Lag Function

Received: 14/12/2006 Accepted: 27/3/2007

Omar Shatnawi*
* Assistant Professor, Department of Computer Science, Al al-Bayt University.

1. Introduction:

The importance of modelling and analysing software failure occurrence, or the fault detection (removal) phenomenon, is well recognized, and many studies have addressed this problem. An important objective of most of these investigations has been to develop analytical models of the fault detection phenomenon. These models are used to compute quantities of interest such as the number of faults detected/removed, the number of remaining faults and the software reliability function. Such quantities are useful for planning in both the development and the operational phases of software systems.

A fault in software leads to an output that differs from the specifications and requirements. The testing phase aims to identify and remove the faults introduced while implementing the requirements, design specifications, etc. During this phase, a tool is needed to monitor testing progress by quantifying reliability measures of the software system. Examples include reliability growth, the number of remaining faults and the mean time between failures. A software reliability growth model (SRGM) provides a mathematical relationship between the number of faults removed and the testing time (CPU time or calendar time), and it has been used as a tool for this purpose.
A number of SRGMs have been developed in the literature under particular sets of assumptions and testing environments. Many of them are based on the non-homogeneous Poisson process (NHPP) (Musa et al., 1987; Xie, 1991; Kapur et al., 1999; Pham, 2000). More SRGMs continue to be proposed to capture the variability of growth curves and to explain its cause. The parameters of an SRGM are estimated by fitting the model to historical software reliability data collected during testing. Models based on more realistic assumptions fit the data better, so further models are developed to depict the testing environment more closely.

One class of NHPP models assumes that all faults present in the software are of the same type. Another class assumes that they are of different types. Some models in the latter class categorize faults by their level of difficulty and the time taken for their removal (Kapur et al., 1995 (2004)). Others categorize them by the means of fault identification (Ohba, 1984; Kapur and Garg, 1992). Ohba (1984) categorized faults as independent and dependent. Dependent faults can be removed only after some faults lying on the same execution path have been detected. Kapur and Garg (1992) assumed that additional faults could be removed while the code is checked to identify the cause of a failure.

Different categories of faults clearly exist in software. In this paper, we develop an SRGM by modifying the above assumption. We assume that two types of faults exist in software, namely leading faults and dependent faults. Leading faults are those that cause failures, while dependent faults are detected when the leading faults are modified. The model is developed as a two-stage process, a type of modelling first carried out by Kapur and Younes (1995). We propose a more general model by incorporating a time-dependent lag function into the second stage, i.e., into the modelling of the dependent fault detection process.
The proposed SRGM is validated on actual software reliability data with respect to goodness-of-fit and predictive-validity criteria. Its performance is compared with that of some existing models.

SRGMs are generally classified into two groups. The first group contains models that use machine execution (i.e., CPU) time or calendar time as the unit of the fault detection/removal period; these are called continuous time models. The second group contains models that use the number of test occasions/cases as the unit of the fault detection period. These are called discrete time models, since the unit of the fault detection period is countable. A large number of models have been developed in the first group, while there are fewer in the second (Kapur et al., 1999). Discrete time models are important in software reliability, yet little effort has been made in this direction. In this paper, we also propose a discrete SRGM for the situation described above, in which the assumptions are stated with respect to test cases instead of time.

The rest of this paper is organized as follows. Section 2 derives the proposed model. Section 3 describes the methods used for parameter estimation. Section 4 presents the criteria used to validate and evaluate the proposed model. It also applies the model to actual software reliability data through data analyses and model comparisons. A discrete version of the proposed model is briefly presented in Section 5. We conclude the paper in Section 6.
Notations Used:
N(t): number of faults removed in the time interval (0, t]
m(t): mean value function of the NHPP, i.e., the expected number of faults removed in the time interval (0, t]
a: number of faults in the software at the start of testing
b, c: constants of proportionality
d: constant for the rate of increase in delay
p: proportion of leading faults in the software
m1(t): expected number of leading faults detected in the time interval (0, t]
m2(t): expected number of dependent faults detected in the time interval (0, t]
n: number of test cases

2. Software Reliability Growth Modelling:

2.1. Model Development:

Goel and Okumoto (1979) first introduced an NHPP model to represent the fault detection process. They assumed that the numbers of faults detected (failures observed) in non-overlapping time intervals of the testing phase are independent of each other. In other words, the counting process {N(t), t ≥ 0} has independent increments, and N(t) has a Poisson distribution with a time-dependent mean value function m(t):

$$ \Pr\{N(t) = k\} = \frac{[m(t)]^{k}\, e^{-m(t)}}{k!}, \qquad k = 0, 1, 2, \ldots \tag{1} $$

They proposed a very simple form of m(t), based on the following assumptions:
A1. A software system is subject to failure due to faults present in the system.
A2. On a failure, the fault causing it is immediately removed, and no new faults are introduced during the process.

The mean value function of the Goel-Okumoto (G-O) model is exponential:

$$ m(t) = a\left(1 - e^{-bt}\right) \tag{2} $$

The G-O model is still used because of its simplicity. However, S-shaped reliability growth curves are observed more often than exponential ones in real software development projects. Hence many flexible SRGMs have been proposed for this purpose (Yamada et al., 1983; Ohba, 1984; Bittanti et al., 1988; Kapur and Garg, 1992; Zhang and Pham, 2000; Kapur et al., 2004). A model is flexible in this sense if it can describe both exponential and S-shaped growth curves. Most of these models are NHPP models that retain A1 and A2 as basic assumptions.
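Equations (1) and (2) can be evaluated directly. The Python sketch below assumes the standard Poisson form of (1) and the exponential G-O mean value function of (2). The function names `go_mean` and `nhpp_pmf` and the parameter values are hypothetical, chosen only for illustration.

```python
import math


def go_mean(t: float, a: float, b: float) -> float:
    """G-O mean value function, equation (2): m(t) = a(1 - e^{-bt})."""
    return a * (1.0 - math.exp(-b * t))


def nhpp_pmf(k: int, m: float) -> float:
    """Pr{N(t) = k} from equation (1), given the mean value m = m(t)."""
    if m <= 0.0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so that m**k and k! do not overflow for large k.
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


if __name__ == "__main__":
    a, b = 100.0, 0.05  # hypothetical: 100 initial faults, rate 0.05 per unit time
    for t in (10.0, 50.0, 200.0):
        m = go_mean(t, a, b)
        print(f"t={t:6.1f}  expected removed={m:6.2f}  expected remaining={a - m:6.2f}")
    m20 = go_mean(20.0, a, b)
    print("Pr{N(20) = 60} =", nhpp_pmf(60, m20))
```

The expected number of remaining faults, a - m(t), decays exponentially. This is the shape the flexible models above generalize.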
Apart from fitting reliability growth curves, an SRGM should have parameters that can be interpreted in terms of the software testing phenomenon. A popular model of this kind is due to Ohba (1984). Ohba proposed that the fault removal rate increases with time and assumed the presence of two types of faults in the software. The distinctive assumptions of the model can be summarized as follows:
B1. The fault detection rate is proportional to the current fault content in the software, and the proportionality factor increases linearly with each additional fault removed.
B2. Faults present in the software are of two types: mutually independent and mutually dependent. Mutually independent faults lie on different execution paths, and mutually dependent faults lie on the same program execution path. Faults of the second type are detectable if and only if faults of the first type have already been detected.

The model can be summarized in the following differential equation:

$$ \frac{dm(t)}{dt} = b(t)\,\bigl[a - m(t)\bigr] \tag{3} $$

where

$$ b(t) = b\left[r + (1 - r)\,\frac{m(t)}{a}\right], \qquad 0 < r \le 1, $$

and r is the proportion of independent faults. Solving (3) with the initial condition m(0) = 0 gives

$$ m(t) = \frac{a\left(1 - e^{-bt}\right)}{1 + \frac{1 - r}{r}\, e^{-bt}} \tag{4} $$

Depending on the value of r, this SRGM can describe both exponential and S-shaped growth curves; for r = 1 it reduces to the G-O model. The SRGMs proposed by Bittanti et al. (1988) and Kapur and Garg (1992) have a similar form but are developed under different sets of assumptions. Bittanti et al. (1988) proposed an SRGM exploiting the fault detection (exposure) rate during the initial and final time epochs of testing. Kapur and Garg (1992), in contrast, describe a fault removal phenomenon in which some of the remaining faults may also be detected (removed) while a fault is being detected. Although they do not name these two types of faults explicitly, faults can clearly be categorized according to the way they are detected.
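As a consistency check on (3) and (4), the Python sketch below integrates the differential equation numerically with the classical fourth-order Runge-Kutta method and compares the result with the closed form. The helper names and parameter values are hypothetical. The closed form assumes the reconstructed b(t) above with 0 < r ≤ 1.

```python
import math


def ohba_rate(m: float, a: float, b: float, r: float) -> float:
    """Right-hand side of (3): dm/dt = b [r + (1 - r) m/a] (a - m)."""
    return b * (r + (1.0 - r) * m / a) * (a - m)


def ohba_mean(t: float, a: float, b: float, r: float) -> float:
    """Closed form (4): m(t) = a(1 - e^{-bt}) / (1 + ((1 - r)/r) e^{-bt})."""
    e = math.exp(-b * t)
    return a * (1.0 - e) / (1.0 + (1.0 - r) / r * e)


def rk4(f, y0: float, t_end: float, steps: int) -> float:
    """Classical fourth-order Runge-Kutta for an autonomous ODE dy/dt = f(y)."""
    h, y = t_end / steps, y0
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


if __name__ == "__main__":
    a, b = 100.0, 0.1  # hypothetical parameter values
    for r in (1.0, 0.5, 0.05):  # r = 1 gives the G-O curve; small r gives an S shape
        numeric = rk4(lambda m, r=r: ohba_rate(m, a, b, r), 0.0, 30.0, 3000)
        print(f"r={r:4.2f}  RK4 m(30)={numeric:8.4f}  closed form={ohba_mean(30.0, a, b, r):8.4f}")
```

Lowering r shifts more of the detection into the later, dependent-fault term. This delays growth early in testing and produces the S shape.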
Equation (3) can be rewritten as follows:

$$ \frac{dm(t)}{dt} = rb\,\bigl[a - m(t)\bigr] + (1 - r)\,b\,\frac{m(t)}{a}\,\bigl[a - m(t)\bigr] \tag{5} $$

The first term on the right-hand side represents the detection of independent faults, and the second represents the detection of dependent faults. As soon as a failure occurs, effort is made to remove its cause, which is an independent fault. During this process, many more faults lying on the same execution path are detected, and these are the dependent faults. Because these two types of faults are present, Ohba (1984) claims that the more faults we detect, the more undetected faults become detectable. In other words, both independent and dependent faults can cause the detection of more dependent faults. But since effort is aimed primarily at identifying independent faults, the detection of dependent faults depends only on the causes of failure.

These faults, which correspond to the independent faults of Ohba (1984), are more realistically termed leading faults, as they lead to the detection of more faults. Ohba's proposition can therefore be rephrased: the more leading faults we detect, the more undetected dependent faults become detectable. Moreover, there is a definite time lag between the detection of leading faults and that of the corresponding dependent faults. As leading fault detection is independent, we argue that the two detection processes should be modelled separately to understand fault dependency correctly. Kapur and Younes (1995) first proposed an SRGM formulating the two detection processes in two stages. In this paper, we extend it further to include a time-dependent lag function.

2.2. Proposed Model Formulation:

As pointed out earlier, it is also important to depict correctly the time lag that exists between the two fault removal processes. We therefore propose a more general model that accounts for this time lag. The model is based on the following assumptions, in addition to A1 and A2:
C1. Faults present in the software are of two types: leading faults and dependent faults.
C2. The intensity of leading fault detection is proportional to the number of leading faults remaining in the software.
C3. The intensity of dependent fault detection is proportional to the number of dependent faults remaining in the software and to the ratio of leading faults removed to the total number of leading faults.

From assumption C2 we have the following differential equation:

$$ \frac{dm_1(t)}{dt} = b\,\bigl[ap - m_1(t)\bigr] \tag{6} $$

Solving (6) with the initial condition m1(0) = 0, we get

$$ m_1(t) = ap\left(1 - e^{-bt}\right) \tag{7} $$

The leading fault removal curve is exponential and similar to the G-O model. Dependent faults are detected on the detection of leading faults, but there is a definite time lag between them. The following differential equation is based on this reasoning and on assumption C3, with φ(t) denoting the lag function:

$$ \frac{dm_2(t)}{dt} = c\,\bigl[a(1 - p) - m_2(t)\bigr]\,\frac{m_1\bigl(t - \phi(t)\bigr)}{ap} \tag{8} $$

To make the model more general, different time-dependent forms of the lag function φ(t) can be considered, depending on the testing environment. As the number of dependent faults decreases and the chance of checking the same path for leading faults increases, the time lag also increases. Hence we assume an increasing form of φ(t) (Xie and Zhao, 1992):

$$ \phi(t) = \frac{1}{b}\,\ln(1 + dt) \tag{9} $$

With this form, m1(t - φ(t)) = ap[1 - (1 + dt)e^{-bt}]. Substituting it into equation (8) and solving with the initial condition m2(0) = 0, we get

$$ m_2(t) = a(1 - p)\left\{1 - \exp\left[-ct + \frac{c}{b}\left(1 - e^{-bt}\right) + \frac{cd}{b^{2}}\left(1 - (1 + bt)\,e^{-bt}\right)\right]\right\} \tag{10} $$

The proposed model is the superposition of the NHPPs with the mean value functions given in equations (7) and (10). Thus, the mean value function of the superposed NHPP representing fault detection in the software takes the following form.
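Before that form is written out, the two stages can be checked numerically. The Python sketch below integrates the dependent-fault equation (8) with the lag (9) and compares the result with the closed form (10). It also sums m1 and m2 to obtain the superposed mean value function. Several assumptions apply. The lag φ(t) = ln(1 + dt)/b is taken as the form of (9). The parameter values and function names are hypothetical. The sketch assumes 0 < p < 1 and 0 ≤ d ≤ b, which keeps t - φ(t) ≥ 0 so that the lagged argument of m1 is never negative.

```python
import math


def m1(t: float, a: float, b: float, p: float) -> float:
    """Leading faults, equation (7): m1(t) = a p (1 - e^{-bt})."""
    return a * p * (1.0 - math.exp(-b * t))


def lag(t: float, b: float, d: float) -> float:
    """Assumed increasing lag function (9): phi(t) = ln(1 + d t) / b."""
    return math.log1p(d * t) / b


def dm2_dt(t: float, m2: float, a: float, b: float, c: float, d: float, p: float) -> float:
    """Right-hand side of (8): c [a(1 - p) - m2] m1(t - phi(t)) / (a p)."""
    return c * (a * (1.0 - p) - m2) * m1(t - lag(t, b, d), a, b, p) / (a * p)


def m2_closed(t: float, a: float, b: float, c: float, d: float, p: float) -> float:
    """Closed form (10) obtained by integrating (8) with the lag (9)."""
    e = math.exp(-b * t)
    exponent = -c * t + (c / b) * (1.0 - e) + (c * d / b**2) * (1.0 - (1.0 + b * t) * e)
    return a * (1.0 - p) * (1.0 - math.exp(exponent))


def m_total(t: float, a: float, b: float, c: float, d: float, p: float) -> float:
    """Superposed mean value function m(t) = m1(t) + m2(t)."""
    return m1(t, a, b, p) + m2_closed(t, a, b, c, d, p)


def rk4_m2(t_end: float, steps: int, a: float, b: float, c: float, d: float, p: float) -> float:
    """Integrate (8) from m2(0) = 0 with classical RK4 (the ODE is non-autonomous)."""
    def f(t: float, y: float) -> float:
        return dm2_dt(t, y, a, b, c, d, p)

    h, t, y = t_end / steps, 0.0, 0.0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
        k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return y


if __name__ == "__main__":
    a, b, c, d, p = 100.0, 0.1, 0.15, 0.05, 0.6  # hypothetical values with d <= b
    for t in (5.0, 20.0, 60.0):
        print(f"t={t:5.1f}  m1={m1(t, a, b, p):7.3f}  m2 (RK4)={rk4_m2(t, 4000, a, b, c, d, p):7.3f}  "
              f"m2 (closed)={m2_closed(t, a, b, c, d, p):7.3f}  m={m_total(t, a, b, c, d, p):7.3f}")
```

Setting d = 0 removes the lag (φ ≡ 0), and a larger d delays dependent fault detection, so fewer dependent faults are expected by any given time. Setting p = 1 leaves only the exponential leading-fault curve.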
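$$ m(t) = m_1(t) + m_2(t) = ap\left(1 - e^{-bt}\right) + a(1 - p)\left\{1 - \exp\left[-ct + \frac{c}{b}\left(1 - e^{-bt}\right) + \frac{cd}{b^{2}}\left(1 - (1 + bt)\,e^{-bt}\right)\right]\right\} \tag{11} $$

Many interesting results emerge from the proposed SRGM (11). When p = 1, it reduces to the purely exponential model of Goel and Okumoto (1979), indicating the absence of dependent faults. For 0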