A Characterization of the Simple Failure-Biasing Method for Simulations of Highly Reliable Markovian Systems MARVIN K. NAKAYAMA T. J. Watson Research Simple biasing fadure estimates vian the systems. We article, derive we leading aspect most and to failure technique gradients biasing yields is balanced, used to m simulations bounded it may reduce the of highly relative not provide paths as well, Categories with error variance reliable of Marko- for the performance bounded it the relative error that those more that efficiently when when the but necessary the is also for using bounded simple One error, secondary measure not paths conditions estimator; performance the failure established. sufficient measure error. when relative some and method relative system estimators to obtain performance than bounded of the error be examined for the fadure-biasing with structure derivative by example imply simple relative shows must the derivatives bounded failure We also show lts on the for is that to system a derivative and condition condition do not of when measure sufficient conditions estimator failure a characterization a similar to estimate simple them can be estimated of the hkely a derivative possible fadure system provide Furthermore, the and of a performance measure interesting for the a necessary performance only simple when estimators biasing. importance-sampling is unbalanced. this produces an measures Although estimate system In is of performance measure Center I.e., it is when using biasing. and Subject Descriptors: C.4 [Computer Systems Organization]: Performance of Systems—reLLabl lLty, aL,aLlabLhty, and [email protected]; G.3 [Mathematics of Computing]: Probability and Statistics—probabdLstzc algorithms (zncludmg ilfonte C’arlo); statlstzcal computzng; and Modeling]: Simulation Output Analysis 1.6.6 [Simulation General Terms: Additional Algorithms, Key Words systems, importance Experimentation, and Phrases: sampling, Reliability, Balanced hkelihood failure ratios, Theory biasing, simple gradient failure estimation, highly rehable biasing 1. INTRODUCTION There is systems an increasing and demand communications is thus of utmost Author’s address: importance IBM for systems, systems, that to be able Research Divmion, such to predict T. J Watson as have high transaction-processing levels during Research of reliability. the Center, design Yorktown It stage the Heights, NY the copies are 10598. Permission not made of the to copy without or distributed publication Association specific its for Computing date of thm commercial appear, Machinery. and material 1s granted advantage, the ACM notice is ,gwen To copy otherwise, that provided copyright copying or to republish, that notice and the title is by permission requires of the a fee and/or permission. @ 1994 ACM ACM and fee all or part for direct 1049-3301/94/0100-0052 Transactions on Modeling $03.50 and Computer Simulation, Vol 4, No, 1, January 1994, Pages 52-88 Simple Failure-Biasing reliability of a system, acceptable system, level. and with and state of models practical, and standard tion so the One for arises of any such an is dynamics The (i.e., so as to make frequently. effective event simulated under mates, we must can identify to the using critical typically are simulation. use large methods of any rareness see this probability of interest she of reduction sampling; is known of the not However, variance reduc- system failures. techniques to be used systems. behind the at an model numerical the variance reliable idea perform Due the 53 will or to of basic underlying he without because importance the This it, . a stochastic tradeoffs. resort simulation for system of systems, must proposed method overview. types inefficient actual constructs design designer (i.e., need simulations the analyze of these simulation is to different the techniques) Hence, in explore that a designer methods components spaces to ensure Typically, Method Glynn and approach distributions) (in our as a “change case, to system the failures) since the [1989] change governing of measure” a different probability multiply our estimator Iglehart is the system occur system more is now measure. To obtain unbiased estiby a correction factor called the likelihood ratio. The proper choice of importance-sampling distribution can lead to a large variance reduction. However, selecting an appropriate change of measure is simulated, not we may increase to determine For reliable simple Thus, available failure it depends the thrust on biasing there as (Lewis system being is chosen, in this area is distributions. systems, known the distribution of the research importance-sampling Markovian techniques include since importance-sampling the variance. appropriate highly sampling straightforward and if an unsuitable and is a class failure-biasing Bohm of importanceschemes. [1984], Goyal These et al. [1992], Shahabuddin et al. [1988]), bias2 failure biasing (Goyal et al. [1992]), balanced failure biasing (Shahabuddin [1994] and Goyal et al. [1992]), and failure distance biasing (Carrasco [1991; 1992]). The basic idea behind each of these methods transitions, is as follows. the probability From any state of choosing having a failure both repair transition and over failure a repair transition is very small under the original dynamics of the system. Thus, in failure biasing, we increase the total probability of a failure transition to 8, where O << 8 < 1 and 1 – 8. The various decrease the failure-biasing total probability schemes differ of a repair in the way transition they to allocate 8 to the individual failure transitions. In this article we provide a mathematical analysis of the simple failure-biasing method, from which we will gain insight into other techniques. Shahabuddin [1994] was the first to study mathematically the asymptotic properties of performance measure estimates obtained using failure-biasing methods. In particular, Shahabuddin proved that when using balanced fail- ure biasing, certain performance measure estimates always have bounded relative error (i.e., for a fixed number of samples, the expected width of the relative confidence interval for the performance measure estimate remains bounded as the failure rates of the components tend to zero). Shahabuddin also showed that if the system being simulated is balanced (i.e., all of the failure rates of the components are of the same order of magnitude), then ACM Transactions on Modeling and Computer Simulation, Vol. 4, No. 1, January 1994. Marwn K. Nakayama 54 . using simple failure biasing performance measure ple that showing unbalanced, failure rates Since tors if simple then the vanish. balanced that have balanced failure example and will estimate. give rise failure relative biasing always relative error biasing is a more that is applied may failure yields without method. simple which biasing However, estimadoes not, we show biasing is as the measure failure failure for the an exam- bound performance simple error devised to a system increase and robust when relative Shahabuddin biasing error bounded empirically to bounded Furthermore, both by rise to does give bounded relative error, it sometimes yields a smaller variance than balanced failure biasing. Thus, in certain contexts, simple failure biasing may be more appropriate than balanced failure biasing, and this is an important part of our motivation for now studying simple failure biasing. It was not previously established exactly when simple failure biasing leads to bounded article relative we develop system that error for the a necessary characterizes when error for the performance condition is that in order performance and sufficient simple failure of paths needed to be considered estimate. In on the structure biasing measure estimate. One to determine if simple bounded relative error, we must examine not system failure but also some secondary paths number measure condition yields this of the bounded relative interesting aspect of this failure biasing will give only the most likely paths to to failure as well. (The total is finite.) This is in contrast to what others have methodology. Shahabuddin encountered when analyzing the balanced failure-biasing For example, by considering the most likely paths to failure, [1990, Section 2.5.1] derived bounds on the variance of the performance measure Nakayama theorem using [1991] some current when failure sense setting using the idea and used it to establish balanced In estimate formalized balanced of “most results about failure likely paths derivative biasing. Also, to failure” estimators into a obtained biasing. the need illustrates to the examine secondary difference between paths the to failure in our importance-sampling distributions we consider here and those used to simulate large-deviationstype rare events, such as buffer overflows and excessive backlogs in queuing networks. Specifically, in the large-deviations context, the optimal (i. e., minimum variance subject to certain constraints) importance-sampling distribution is selected solely with regard to the most likely path to failure; e.g., see Cottrell et al. [1983] and Parekh and Walrand necessary and sufficient condition for simple must take into account importance-sampling this article. We also consider the components the scheme estimating using secondary for the paths types derivatives the likelihood [1989]. failure ratio On the other hand, biasing shows that to failure of reliability with respect method. (For when designing systems to the failure details our we an studied in rates on the of likeli- hood ratio method of derivative estimation, see Glynn [1986], Reiman and Weiss [1989], and Rubinstein [1989].) We establish a necessary and sufficient condition that characterizes when simple failure biasing will give bounded relative error for the estimate of a derivative. The condition for the perforACM Transactions on Modeling and Computer Simulation, Vol. 4, No 1, January 1994 Simple Failure-Biasing Method . 55 mance measure differs from that for the derivatives. In fact, we show by an example that the condition for one derivative may hold whereas the condition for the performance measure does not. Thus, by using simple failure biasing, we may actually mance measure derivatives obtain better estimates for a derivative than for the perforitself. However, we prove that if we can estilmate all of the with performance bounded measure. balanced failure proved that with relative This biasing balanced with respect with the same relative to any error, situation then contrasts we that can also which do so for arises when the using to estimate derivatives since Nakayama [1991] failure biasing, we can always estimate derivatives component error failure rate with bounded as the performance relative measure error and estilmate. We also establish a number of other conditions which are used to further analyze the derivatives. Finally, we apply all of our conditicms to examine several examples to give a characterization of the simple failure-biasing method. The rest of the article mathematical the basic model concepts failure-biasing simple examples to motivate some tables Section 5 contains as follows. reliable of importance and balanced 4 we present article. is organized of a highly In Section Markovian sampling and explicitly failure-biasing methods. why simple which the we study summarize analysis 2 we describe system. Section describes We then failure all of the the simple examine biasing. results of estimating the the 3 reviews some In Section shown in this performance mea- sure using the simple failure-biasing method. In Section 6 we examine estimators of the derivatives using both standard simulation and simple failure biasing. We also consider several examples in order to complete our characterization of simple failure biasing. results from simulating a large computing and some placed 2. directions for future research Section 7 presents system. We state in Section 8. All some empirical our conclusions of the proofs are in the Appendix. MATHEMATICAL The following MODEL AND NOTATION is a modification of a model originally developed by Shahabud- din [1994] to study performance measures of highly reliable systems and later adapted by Nakayama [1991] to analyze derivatives. We assume that the system is composed of C different types of components, labeled 1,..., C, with n, components of type i. We let N E z:= ~ nl, which is the total number of components of all types in the system. As time evolvles, components randomly fail and are sent to a repair facility to be repaired, where the type of service A state each type discipline for the failed x of the system and we let S be Our analysis state space. operational S = U U F, keeps any information components track about is arbitrary. of the number the queuing of failed components at the repair facility, of and the state space of all such states, where we assume that IS I < ~. in the following sections is independent of the actual form of the We define n,(x) to be the number of components of type i to be when the system is in state x. We decompose the state space as where U is the set of states for which the system is considered ACM Transactions on Modeling and Computer Simulation, Vol. 4, No. 1, January 1994 56 Marvin K. Nakayama . operational, state and with all states. The system starts operational, and we assume that of one component ously. More precisely, E {y let S,(x) states y which state set of failed if x @ U and y = E with n,(y) for the possibility of component assume that We allow failure F is the components x, with x. Suppose causes consider nj(y) E S: have for all failure other one more component i fails of type of type when simultane- state x = S, and which is the set of of each type failed as in < n~(x) a component O, the y = U. i.e., the to fail i and type for all j, n,(y) < as many components i, then propagation; components some component at least at least some > n,(x) in state O = U. We also n,(x)}, i failed the system in state y than is in state in x, and let x, i) be a probability mass function on S,(x). Then the system immedip( y; %, i). Thus, nj( x) – nJ( y) ately enters state y ● S,(x) with probability components of type j, 1 < j < C, fail on the transition caused by the failure of i triggered the the component of type i. In this case, we say that component P(.; transition (x, y). This situation may occur, for example, system, where the failure of a processor contaminates the the disks to fail simultaneously. Similarly, repair on component nent we allow for more than one is made is replaced the possibility component up of a number when enough that at the repair a time. This of subcomponents, in a computing clata and causes facility may completes occur if and the entire of the subcomponents fail. However, some compo- we do not allow for some components to complete repair and other components to fail at the same instant. (For example, this may happen when the re]pairperson fixes some component but breaks something else component in the system.) We define a transition (x, y) to be a failure when replacing the if n,l( y) transition repaired for < nj( x) < nh(x) for some k. We use the notation “y ~ x“ l,..., C, with nk(y) y) is a failure transition. Furthermore, we define (x, y ) to be a repair allj= if (x, if nj(y) transition > nj(x) We use the notation When the system distributed transition ,C, with “y < x“ if ( x, y) is a repair is in state ~, the components lifetimes (x, for all j = 1,... with y) with rate rate A,(x) >0. nk(y) > n,k(x) transition. of type i have Also, the W( x, y) z O. We model system for some k. exponentially makes the evolution a repair of the system as a continuous-time Markov chain Y = {Y(s): s > O} on the state space S. The infinitesimal generator matrix Q = {q( x, y): x, y E S} of Y is given by X~.lnk(X)Ak(X)p(Y; q(x, Y) = [ for x #y, and q(x, We denote X = {X. {l’(x, Y): ACM Transactions X, the (1) otherwise embedded : n > O}. The on Modeling >X o x) = E y+. Y = S’}, ify ify<x q(x, q(x) by Z, k) W(X, y) where y). We let = –q(x, discrete-time transition F’(x, and Computer matrix y) = q(x, Simulation, x). Markov of X y)/q(x) chain is then for Vol. 4, No. 1, January (DTMC) given x #y, 1994. of by and Y P = I’(x, Simple Failure-Biasing x) = O. We possible define r = {(x, transitions We assume that components our (i.e., the repair rates). (High high redundancies.) assume that where ~,(x) where is the set of highly reliable system failure is composed rates are of highly much smaller reliable than the > 0 and we allow rates b,(x) P(”; ~> of the components > 1 are of type independent x, i) to depend ~) =P.(Y; ~, i) = of i are E, and b,(x) on e; i.e., for all (x, is integer y) E r such y)6d’(’’y’, C,(x, dl(x, y) > 0 is integer valued, c~(x, y) > 0, and ZY~ sL(z) P, (“x, Y, that repair rates, W( x, y), are independent of c. We will the behavior of the system as E -+ O. examine We say that order the of magnitude 1) and assume the system pe( y; without f(e)/E~ express larly, - x, i) are independent loss of generality 0 as E - f(E) that f(e) of E. In this we use the f(e) ~ is O( E~) = @(E~) if we can c # O is independent express f(~) we can is not balanced, d, a function notation as E ~ O, where if we can situation, 1. If the system b = same = b for some of E. Simi- = cez + o(~z) as E -+ O, of E. Finally, we say .f( E) = Q( Ed ) if ~ > d is independent flE) = CEZ + O(EZ) as E - 0, where c + O and ~< d is where c + O and we can express independent = O(E~) A,( x, E) are of the x E S, b,(x) For some constant 0. Also, = Ced + o(E~) we say if all values is balanced (i.e., if for all 1 < i s C and all we call it unbalanced. We now make some definitions. if y) > O}, which 1. We assume = b > y c S, P(x, 57 reliability for the system can also be achieved by having We model this by introducing a rarity parameter E and P(Y; i) x, . can make. component the failure valued. Also, that y > x, y): the system Method of e. We define and so q(0) = @(e~o). For any s(x) - x = S, we define min {b, (x): nt(x)~, (x) > O}. I<isc Note that S(x) is the exponent of the order of magnitude of the total failure rate out of state x, and so s(x) = d if ZY ~, q(x, y) = @(e~). Furthermore, s(0) = bo. For any (x, y) = r, we define b(x, y) ‘{ — minl<,<c{b,(x) o which is the (x, Y). Thus, y)>lify>x. + d,(x, y):nl(x)~,(x)p,(y; x, z) > O} ify ‘- exponent for ACM if y xx any of the (x, TransactIons order y) E r, on Modeling of magnitude b(x, and y) = Computer d of the if q(x, Simulation, rate y) Vol. <x’ of a transition = @(ed) 4, No. and 1, January b(x, 1994. 58 . Marvin To prove considered. K Nakayama our results, we need to assume (Al) The DTMC (A2) For each state x E S with andy ~ x. x + O, there (A3) For each state z G F such that Assumption from any X is irreducible some structure A2 states state that for exists y G S such that (O, z) G r, (x, than This is at least one repair x # O, q(x) = @(l). A3 guarantees that transitions to a failed state have transition E~“, which is the magnitude ensures that system failures when E is small. From these assumptions certain form. For transition This P(x, y) which rates elements of the possible implies that P(x, hold, will all y) = then work take the system from that are much smaller of the largest transition are rare events for the the any (x, y) G r q(O, z) = O(E ~“). failure transitions (x, y) with x + O have transition probability (3( e ~(” y “). Assumption A2 is a critical assumption. If it does not none of the failure-biasing methods discussed in the introduction well; see Juneja and Shahabuddin [1992]. Assumption state O directly being over the state space S. there x # O. Hence, on the model rate from embedded transition matrix state O. DTMC have a y) = r, = [ (3(1) ifx+Oandysx @( Eb(’’y)) ifx+Oandy>x, @(e ifx=Oandy>x b(x, Y)- b”) (2) as c-O. For hitting any set time of states to the A set A c S, for the define ~~ = inf{n DTMC. We >0: X. concentrate ~ A}, which is the on estimating which is of interest for several reasons. First, we can express the mean time to failure as MTTF = </y, where < = -E[ ~~~~ -1 l/q( X~ )1 and 7~,n = min{n > 0: X. e {O, F}}; see Goyal et al. [ 1992]. Also, let U(t) be the unreliathat the syst;i.e., U t ) is the probability bility of the system for time horizon tem fails before time t.Then Shahabuddin and Nakayama [1992] prove that (1 – e-g(() )yf)/U(t) ~ 1 as E -0, where t = @( E-’f)with rt >0. Shahabuddin [19941 showed that if Assumptions A1–A3 hold, then there exists some constant r > 1 (which depends on the model being considered’) such that [email protected](Er). (3) We define A={(xO, (x,, which ACM is the Transactmns . . ..). n>l, xO=O= x,+l)=rfor O<i set of sample on Modehng paths and Computer O,xm~F,x,@{O,FlfOrlSi <7z7 <n}, of the embedded S]mulatlon, Vol. DTMC 4, No. 1, January for which 1994. rr < 70. Simple Failure-Biasing Furthermore, we define A~=((xO,..., x~)=A:n which is the (under the original >1, set of sample P{(XO,..., paths measure) for which of the order A= where failure” X,~) Method . 59 =(xO,... , Xn)} = @( E~)), ~~ < To and have probability Em. Note that U~=,A~, the r is as defined in (3). We call and any path ( XO, . . . . x.) = An (4) A, the set of “most likely with m > r a “secondary paths path to to failure.” For any sample path (xO,..., x.) @ A, we define n–1 Xn) = f(xo,..., ~ l{X k.+1 ~xh}s(xk), k=O which failure failure is the sum of the exponents of the orders of magnitude of the total rate out of each state x~ along the path such that ( x~, x~. ~) is a transition. The reason for introducing f( X., . . . . Xn ) will become more apparent in the always true, later sections. Note that f(x~,..., as s(x) > 0 for all since xl > XO = O is (5) x = U. In a balanced system, f(xo, ..., x~ ) is the number x.). we use the following (Xh)P(X~; xk.~, additional Let 71 =inf{k which failure and xn)>bo of failure transitions along the path (x 0, ..., For our results on derivative estimation, notation. s(0) = b. > O:X~ >Xh–l, rzl (X~.l)A, is the first failure transition which of a component of type i. Nakayama constants r, > r and i) >01, may have been triggered [1991] showed that there by a exists 7, > r such that P{T, < ‘TF < To} = @(Er’) (6) and P{T~ < min{~,, If P{~~ < min{~l, I-o}} = O, then in Nakayama [1991] that Xn) forsomeo is the set of sample ACM it was established 7,} = r. (8) the set A’={(xO,..., which (7) we set ?, = ~. Additionally min{r~, Also we define To}} = @(Er). Transactions f= A:n > 1, n~(x~)~~(xk)P(x~+l; s k < n such that paths ~k+l of the embedded on Modeling and Computer ‘k, Z“)>0 > xk}, DTMC Simulation, for which Vol. 4, No. ~, < ~~ < To. 1, January 1994 60 Marvin K. Nakayama . Similarly, which we define is the min(~o, set of sample ~,). Furthermore, AZ=”. n., of the embedded A’ n A ~ and Zn = u“ n. ~, A’n, where and~ A’n paths let A’~ ~ r, and DTMC = 1 for which n A ~, and r~ < note 7, are as defined that in (6) and (7), respectively. 3. BACKGROUND Let (Q, @ AND MOTIVATION be the probability the probability measure Letting 1{.} denote the which ion. leads to the We first generated estimate following collect using space on which on (Q, indicator approach n i.i.d. to estimating samples the original probability measure 1 of l{r~ as e -+ O. We define relative given normal level better so it error interval more One such RE number to be the expected 1 – q!I/2 quantile to estimate is reduction importance y using technique to sampling, such that = where ACM L(O) Transactions ~ L(oJ)P’(0) OJEL!I = P( 0)/P’( on Modeling co), which and Computer n and a of a stan- for n fixed, (Q, =) for which note that for all P is of samples set of events P‘ measure >0 are = @(E’) of our estimator for a fixed difficult method Ih the point measure now describe. Recall that A is the (measurable) 1{7~ < TO}(w) = 1. Consider another probability P’(I?) simulatthe we form probability @(eZ’) – as system failures become rare. we must utilize some variance estimator. (l). To}], ,1 ZV be the we see that becomes y by standard P. Then the original 1 – @. Letting distribution, ~ ~ O, and simulation Hence, the relative P denote gh~l~k, 72 = @(e’) of its confidence confidence dard as width = < 7.} under (T2 = y– and let ~1, . . . . ~~ of 1{~~ < 1-0}, where ?(n) and the variance X is defined, 9) induced by the Q-matrix given in function, we can write y = E[ 1{7~ < B c A, B •~, = E’[l{TF is known Simulation, standard obtain which a we w for which defined on P(B) >0. Then, < TO}L], as the Vol. 4, No. Radon-Nikodym 1, January 1994. deriva- Simple Failure-Biasing or simply tive, the the likelihood probability Glynn measure and Iglehart Equation follows. using ratio, P‘. [1989] (9) forms Generate basis i.i.d. the probability E‘ is the expectation See Hammersley for more the n and and of imQort~nce P‘. . operator Handscomb 61 under [1964] and details. samples measure Method (11, [email protected], -Ll),... Then , (In, our new point which we apply L.) of (l{Tf estimate is as < 7.}, L) In ?(n) and the variable of 1{~~ < ~O}L under (7 The goal is ensuring to that ;k~lfkik? = choose so that the variance measure when E’[ l{TF< TO}L2] << E[l{~F using importance than that for standard simulation. Now we describe the simple failure-biasing sampling method repair transition biasing is 0(1), as was is to increase 8< 1, of a failure failure transitions shown the total of importance transition from in proportion in (2). The basic probability any state We do not alter idea sampling. behind to ti, where transition to in proportion the transition probabilities x ● F. In some sense the simple failure-biasing smaller of any failure probability of a simple 8 = @(1) with x, and then allocate 8 to the to their original probabilities. decrease the total probability of a repair 1 – 8 to the individual repair transitions probabilities. < TO}], thus is much Under the original probability measure P, the probability and the transition from some state x ● U, x # O, is 0(e), failure P‘ is E’[l{TF < ~o}L2]– y2. ‘2 = P‘ the probability individual Also, we 1 – 8 and to their from state method allocate original O or from is a natural way of implementing importance sampling since it preserves the underlying structure of the system. A more precise description of simple failure biasing follows. For any state x = S, define F(x) = Z,> ~ P(x, z) and R(x) = Zz. . P(x, z). Note x, and construct that R(x) is the total the transition transition (i) For F(x) is the total matrix (x, y) probability probability matrix P using failure transition transition biasing from from from x. We the original algorithm. @ r, For (x, y) = r and (a) With x = O, y) = o; x = U, P’(x, (b) With a failure a repair P‘ for simple the ensuing P’(x, (ii) of taking of taking y)= P(x, y); x # O, i3P(x, P’(x, y) = ify>x y)/F(x) (1 – i3)P(x, y)/R(x) Transactions on Modeling (lo) ; otherwise [ o ACM if y <x and Computer Simulation, Vol. 4, No. 1, January 1994. 62 Marvin . (iii) K. Nakayama y) ● r and For (x, x 6 F, P’(.z, y)= P(x, y). Extensive empirical work suggests that 8 should be chosen such that 0.5<8 < 0.9; see Lewis and Bohm [ 1984], Goyal et al. [ 1992], and Shahabuddin et al. [1988] for further are the same (ii’) details. as above, but For (x, y) = r and (a) With x = O, (For balanced With y) = l/rzF(x); ify>x t5/n F(x) y) = (1– [ ble n~(x) from failure (i) and (iii) x + O, P’(x, where items, to x ● U, P’(x, (b) failure-biasing (ii) is changed = Z.Y. state . 1{( x, x. See (x, y) 8) P(x, y)\R(x) ify<x o ; otherwise ● r} y) is the Shahabuddin number [ 1994] for of failure further transitions details on possibalanced biasing.) Note that for = r x G U, with @(e bf.,Y)-.t. )) if y >~ P’(x,’y)= { (11) ify<x @(l) = @( El{ Y~l}(b(l.Y)-s(z))) under Let simple failure biasing. cr’z be the variance of 1{~~ < ~O}L under simple failure (4), we obtain the following representation of the second will aid in our analysis th~oughout the article: E’[l{TF biasing. moment term Using which < 70}L2] = E[l{TF < TO}L] (12) =: ~=r(~O,.. z ,Z.)-~~ L(xo,..., ,X,, xn)P{(xo, ,Xn)}. )=(xo, n>o We define RE’ = 6% (13) Z,b Y’ which is the Shahabuddin the proof. ACM TransactIons relative [1994] on error proved Modeling and of our simple the following Computer Simulation, failure-biasing result; Vol estimator see Shahabuddin 4, No. 1, January 1994. of y. Then [1994] for Simple Failure-Biasing PROPOSITION simple biasing, Shahabuddin RE’ showed biasing remains failure mance return measure estimate. Example to Example 1 and Example Example 1. bounded by example simple may simple failure biasing not for others.) the system If 1 (Shahabuddin). failure leads Consider that not lead Method is balanced, then when using if the system is not balanced, then relative error for the perfor- 1 below also demonstrates 2 below later to gain insight a system relative which 63 as e + 0. to bounded to bounded . error has this. (We will into why the for certain three types models but of Components (i.e., C = 3), where the first two component types have a redundancy of two (i.e., nl = n2 = 2), and the third type of component has a redundancy of one (i.e., e (i.e., n~ = 1). Also, b ~ = bz = 1), the and components the of type component of 1 and type 2 have 3 has failure failure rate ●2 rate There is a single (i.e., b~ = 2). Thus, b. = 1, and the system is unbalanced. repairperson who repairs components at rate 1 using a processor-sharing discipline. In our situation, it is sufficient to define the state of the system to be x = (xl, Xz, X3), where x, is the number of failed components of type i. Initially, all components are operational, and the system is considered to be operational if operational. We assume diagram ties, and of this and lumped only model Figure if there there with 2 is the all of the states the same with is at least one component of each type is no failure propagation. Figure 1 is a state arcs having the original transition probabili- when simple using X3 = 1 into the single failure biasing. state ( xl, We have X2, 1) = F for all x ~ and x ~ since the system fails as soon as the component of type 3 fails. Also, we have omitted the arcs from state ( xl, Xz, 1) since these repair transitions do not play It is easy to see that a role in our analysis. y = 8(~), and so r = 1. Now we examine the variance of the estimator of y obtained using simple failure biasing. Consider ((0, O, O), (O, O, 1)) = A. From (12), the second moment satisfies E’[l{TF < the path ~o}L2] z (.XO,..., O,. ... x.)} ... X,F)=(X , XJP{(XO,. L(xo,... )=A=A n>o > Thus, L((o, o, o), (o, o, l)) u ‘2 = E’[1{~~ 1)) – W 62 ) = (0(e). < ~o}L2] It then – y2 > -L((O, O, O), (O, O, l)) P((O, follows 20 RE’=z as e when 0. Hence, the system ACM L>— *yG simple o, o), (o, o, 1)) P((O, K failure that for a fixed number O, O), (0, 0, of samples n, (3(E1/2) @(.) biasing = &@(’-’/2) may +Cc not give bounded relative error is unbalanced. Transactions on Modeling and Computer Simulation, Vol. 4, No. 1, January 1994. 64 . Marvin K. Nakayama I Fig. 1. Now we Nakayama Transition diagram examine [ 1991]. Example now change another for Examples 1 and example, 2 with which original was transition probabilities, previously 2. Consider the same system as in Example the failure rate of the third type of component analyzed in 1 except that we to ~ 3 (i.e., b3 = 3). Thus, the system is still unbalanced. Figures 1 and 2 are state diagrams of this model with the arcs having the original transition probabilities and the simple failure-biasing transition probabilities, respectively. It is easy to see that y = ~ + O( e), and so r = 1. Now consider the path contributed @(~) to the second moment in ((O, O, O), (O, O, 1)) ● A, which Example 1, thereby giving the unbounded relative error. Note that now L((o, o,o), (0,0, l)) P((O, O,O), (0,0, 1)) = ; and in fact, we can show that E’[1{~~ W e 2). (Recall that 8 is the parameter ACM TransactIons on Modeling and Computer + 0(E2) = @(eZ), < ~O}L2] = ((12 + 8)/46)~2 + o(~2) = used in failure biasing.) It then follows Simulation, Vol 4. No. 1, ,January 1994 Simple Failure-Bias!ng that Transition diagram a’2 = ((12 + 8)/(48) for Examples – l)e2 1 and 2 under + o(~2) which remains bounded as ~ ~ O. Nakayama [1991] showed that the using balanced failure biasing is . simple failure Transactions on Modeling and \ biasing. and variance of the estimator of -y when Thus, although the variances obtained using simple and balanced biasing are of the same asymptotic order of magnitude, the variance using simple failure biasing has a smaller constant for the lowest-order in its asymptotic expansion. ACM 65 )\ I ( ‘ .DvH //(’} Fig. 2. Method Computer Simulation, Vol. 4, No. 1, January failure when term 1994. 66 Marvin K. Nakayama . The importance certain of Example models, balanced failure simple failure biasing, biasing. relative Also, error several even questions: 2 is twofold. failure biasing thereby it shows if the why simple being does simple why failure demonstrates smaller that variances we want biasing considered failure it yield motivating that model First, may to study may give is unbalanced. biasing work well for than simple bounded This raises for Example 2 but not for Example 1? For a given model can we determine if simple failure biasing will give bounded relative error? In the following sections we will develop conditions on the structure of the model to determine when simple failure biasing will and will not give bounded RE’. 4. SUMMARY In this when OF THE RESULTS article the bounded we define relative derivative conditions performance measure error by using estimation, we impose which, or its for a given derivatives simple failure some model, can biasing. additional (For being (co’) f(xo,..., 1, (Ci) r, =r (Ci’) If with results on the restriction is that we do not allow components, and so b,(x) ===b, for below, are in terms of the asymptotic given system the structure see Section 6. L One failure rates for the conditions, characterize be estimated on system; state-dependent all x E S.) The structure of the considered. x~)>2r —rn+b or b, =bO Ofor all (x O,. ... )cA~, or ?, – 2b0 <r, r<rrzrz S2r — used to –2b,. r, – b, s FL – bO, then f(xo,..., xn) 2r, –m+b0 forall( 2 2r, [ S2r,–1 –m+3b0–2b, forall( If r, – b, >7, f(x(,,..., xO, . . ..)~A~n.rl<m<m xo, . . ..x. )~~n, F,<rn S2r, -2 b,+2bO+l +2 b,–2b0–l – bo, then xn) 2FL–m–bo+2b1 forall(xO, > 27, –m+b [ (CSi) The ri<m<27, o x~)~E~,FL5m527,— forall(xo,..., 1 r, = r or b, = bo. conditions which are characterize simple failure condition to show a result ACM . . ..x~)~ALn. Transactions on Modeling and labeled with a prime (i.e., “‘“) are ones biasing. If there is no prime, then we use the about standard simulation. Also, if a condition is Computer Simulation, Vol 4, No 1, January 1994 Simple Failure-Biasing labeled the with a “O,” then performance condition rate is used Basically, estimate. to characterize of component under it is an assumption measure type failure the needed it to establish is labeled derivative with with . 67 a result an on “i ,“ then respect to the the failure i. we use Conditions simple If Method CO’ and biasing of certain C i‘ to ensure sample paths that the probabilities are not too small. As we shall see, these conditions are necessary and sufficient for achieving bounded relative error. The first condition of C i and C Si states that a component of a failure transition on a most likely path to failure; the type i can trigger second part stipulates that the components of type i have one of the largest condition. For more details and failure rates. The last part of C i is a technical insights on the conditions, We let with RE, respect simulation, Sections Table article denote to the and see Sections the relative failure rate RE~ denote gives of the estimator of component the same when 6.2 and 6.3 for the precise I summarizes the results and 5 and 6.3. error a fairly type using particular holds row and column the if and column only headings, heading if RE’ and that characterization biasing method. The interpretation of Table I column heading states some condition. We use “bounded.” For each off-diagonal entry, there is ‘<e “ which means “if and only if”; “ + ,“ which whi~h means “does not imply.” The symbol gives implies simple definitions. and examples complete of the derivative i when where is bounded; failure biasing. simple See in this failure- is as follows. Each row and “bald” as an abbreviation for one of the following symbols: means “implies”; and “ * ,“ the relationship between the “ means for “ * RE~ being of y standard are presented of the the “ * similarly using the the row heading .“ For bounded for example, CO’ one value of i does not imply that RE’ is bounded; and RE~ bounded for all k implies that CO’ holds. Table II gives the justification for all of the entries in Table I. If an entry in Table I is either”= refers to a theorem “ or”= ,“ then the corresponding entry in Table II either holds trivially. If an entry in Table I is “ * ,“ or the result then the corresponding ample. entry in Table Some of the most interesting First, the Condition CO’ is performance measure to have II refers to the appropriate results contained a necessary and bounded relative in Table sufficient error counterex- I are the following. condition for the when using simple failure biasing. Since CO’ imposes restrictions on all sample paths ( XO, ”.., x.) G Am with r s m < 2 r — 1, we see that secondary paths to failure (i.e., role in determining the variance (.xO,..., x~)e A~, m > r) play an important of the estimator; see Theorem 2 for further holds for the derivatives and Condition C i‘; bounded remains for some bounded details. A similar observation see Theorem 5. Also, if REj is component type i, it does not necessarily follow that RE’ as ~ ~ O. In other words, it is possible to estimate a derivative more efficiently than the performance measure when using simple failure biasing; see Example 3. However, if we can estimate all derivatives with bounded relative error when using also do so for the performance measure; ACM Transactions on Modeling and simple failure biasing, see Theorem 7. Computer Simulation, Vol 4, No. then we can 1, January 1994. 68 Marvin K. Nakayama . Table ==+ System balanced co’ II c, =+ 11+ REh/RE bdd bdd Vk RE; c%’ RE~ bdd bdd Vk co’ co’ RE’, RE’ c s, c, bdd a ==+ ==? ==+ =+ ==$ a * =+ 1 * * + =+ + + =& * * * 1; I* I =+ =+ + =!4 + + I=?l=#l+l*l *-m A + + + + + + - RE~ bdd Vk =+ == ==+ + + + .==’ G + =+ + + * + =?+ +, +’ =+ =$! == Cs, + ==’ ==’ ==+ ==+’ + ==’ * + co’, c% + == ==+’ =+ ==$ + + =+ + + RE’, RE; bdd + ==’ ===$’ + + =+ ==+ + + .=+ Table Cond,tion Systcm co’ R2 - R19 R19 R14 R14 R14 R19 R19 R19 R19 RI 4 R14 R14 R19 R4 R15 R1O R1o R115 R15 R1o R1o R15 I Rll I R16 R16 RIO R16 R16 R4 1 RI R9 co’, Cs, co’, c% R13 R. 13 RE’, RE; bdd R13 Rl: Trivml. R4: Theorem R7: R1O: Ii Rvt * R17 R7 ) R7 I R19 I R19 R19 R19 R1O I R1o I R5 I— 3 R4 R1 R19 R15 + m R2, RI ! RI< R19 R] 4 I R16 I Rlf H R16 I R16 II RI E n R9 1 R15 R8 R15 R20 R20 R15 R20 R19 R5 Rl R15 R19 R9 Proposition R5: Theorem 5. Theorem 7. R8: Theorem 8, R9: Example 1, Example 1 with t = 3. Rll: Example 1 with k = 3. R12. Example 2. R3: Theorem 2. R6: Theorem 6. RO R17 1 RI 7 R19 i+% R2[ R19 1 R13: Example 2 with a = 1. R14: Example 2 with t = 3, R15 Example 2 with I = 1 and k = 3 R16: Example 3 with t = 1. R17: Example 3 with i = 3. R18: Example 4, R19: Example 4 with R20: Example 5 with 5. ESTIMATING THE PERFORMANCE i = 3 USING SIMPLE MEASURE FAILURE BIASING We now investigate accomplish which failure ACM this, we do by biasing. Transactions the we first showing Consider on Modellng estimation must its y) Computer y using a better connection (X, and of gain E r to with Simulation, simple failure understanding the likelihood x G U Vol. and 4, N(). of ratio biasing. ~( X., under x # O. Using 1, January 1994 H R17 R1 R2 : R2, ?.5 R19 R16 R19 4. t = 3. I I + R17 RI 1. ==+ +~ R17 R19 RI R11R31R1 R3 I R1 R17 R18 ~~ R2, RI R19 R1o R17 ~~~k R6 R3 C* RE~ bdd RE’ bd; R6 RE, /RE bdd REj bdd Vk Ct’ R6 SW R9 bdd Vk R4 R12 R17 RE, /RE bdd R4 RE’ bdd bdd Vk RE, \RE +’ I RI R12 C*’ of Table =+ * R2 co’ REk/RE Justification bdd Sy, tem balanced u II c, RE’ b.danmd U RE,/RE RE; bdd co’, I 1 of Results * 1=+1=+1 & c,’ Summary bdd II balanced \ [ c, RE’ co’ System Condition II I To . . . . x.), simple (2) and Simple Failure-Biasing Method . 69 (11), we obtain P(x, y) P’(x, y) if y > x. If y < x, then when y x x. Since simple failure @( EMx>Y)) both P(z, transition = @(Esf’J) @( Eb(x.Y)-9(.K)) = y) and P(x, y) P’(x, y) P’(x, y) are @(l), and thus, = @(l) probabilities out of state O are not changed in biasing, for all (0, y) = r. It then P(o, y) P’(o, y) follows that =1 for ( XO, ..., x.) G A, the likelihood ratio satisfies L(xO,.. ., Xn)= ‘–1 JJ P(xk, k=o xk+l) P’(xk, = (ij(eml{n+l~ 2,) S(X, )) xk+l) (14) since both s(0) = b. the initial failure using simple From (12) much the sum which failure biasing. we see that the simple consider and transition failure biasing in the likelihood reduces (xO, . . . , x~) = A. The definition has no effect ratio the smaller is the second L(xO, of f( XO, ..., x.) on the likelihood key when to determine how moment. ..., x.) More is, includes ratio the specifically, less the path ( XO,... , x.) will contribute to the second moment. From (14) the order of by /lx O,..., x.). AS we see in the magnitude of L(xO, ..., x.) is dictated following theorem, CO’ establishes the orders of magnitude for the likelihood ratio in terms of the /lx O, . . . . x~ ) for each path (xo,..., x.) that are needed to achieve bounded THEOREM 2. We now restrictions give relative RE’ some on paths error. remains bounded insight of order into as this c m with E + O if and result. Note only that if CO’ holds. CO’ only r s m s 2 r — 1. The condition imposes ensures that the contribution of each of these paths to the second moment is not for the variance required to greater than e z‘, which is the order of magnitude achieve bounded relative error. It turns out that none of the paths of order e 2‘ or higher will interfere with ensuring a second moment of order ● z‘, and proof of Theorem paths of order e‘ - 1 or lower are not possible. The complete 2 is given in the Appendix. One interesting point about Theorem 2 is: it shows that to determine if simple mance failure failure biasing will result in bounded relative error for the performeasure estimate, we need to check not only the most likely paths to (i.e., (XO, . . . . xn) G A,) but also some secondary paths (i.e., of paths in each IAm 1, m > r, m > r) as well. The number (Xo>..., x~)GAnl, ACM Transactions m Modeling and Computer Simulation, Vol. 4, No. 1, January 1994 70 Marvin K. Nakayama . is finite that (see Lemma 9(ii) in the Appendix), and so the total number of paths must be considered is finite since CO’ only restricts paths in A ~ for r s m < 2r — 1. Furthermore, when m > 2r, we have 2r — m + bO < 6., and by (5), the condition f( XO, . . . . x.) > 2r – m + 60 used in CO’ becomes vacuous when m > 2 r. We now examine Condition CO’ more closely. To do so, we will use the following result, PROPOSITION (i) (ii) f’(xo,.. .Mxo, Now whose 3. proof Forall( is in the Appendix. xO,... – b. + Z~j~l{xh+l consider Proposition L(xo, Since ~, .,xn)<m+bo, xl) any path >xk}b(x~, (xO, . . . , x,) 2r–m+b0 by j)~A~, (xO, ..., 3(i). Thus, . . ..xn x.) ● Am, <f(xO, from xh+l) m =m > r. If CO’ is valid, . . ..x~)<r +bO then (15) (14), we see that )=Q(~~) and ● A ~, it follows that L(xO,..., )= O(~z~n)-n). @(Em) P’{(xO,.. ., X,F)=(XO,. =Q(em) ... x.)} and @(Em) P’{(XO,... ,XTF)=XO,O,XJ},XJ} = o(ez’-~), or equivalently, P’{(XO,..., X,F)=(XO, XJ}=O(l)O(l) and P’((XO, .... XTF)= (Xo,.. .,xn)} =Q(E-2’). Hence, Condition CO’ ensures that a sample path (XO, . . . . x.) = Am, r < m < 2 r – 1, has probability under simple failure biasing at most of the order ●‘n 2‘. In particular, this means that all of the most likely paths to failure of order 1 under the original measure (i.e., ( XO, . . . , x.) ● A,) have probability under simple failure biasing. For example, if r = 2, then CO’ ensures that under simple failure biasing, (1) all paths in A ~ have probability of order 1 and (2) all paths in A ~ have probability at most of the order ●z. Also, it follows from (15) and Proposition 3(ii) that for ( .xO, . . . . x.) ● A, k=O k=O If CO’ holds. b(x, y) > s(x) Recall that for any (x, y) G r with >1. Thus, if CO’ is valid, then b(xh, ACM on Transactions Modeling and Computer Slmulatlon, Vol. 4, No. y > x, we have that Xk+l) = S(xh) for all k 1, January 1994 Simple Failure-Biasing such that occurs Xk + ~ > Xh. In other along measure). must any of the If simple failure be of the largest words, most biasing order consider likely a failure paths gives to . transition failure bounded of magnitude Method the error, then relative of any failure (x, (under transition 71 y) that original P( x, y) from state x. In general, However, checking as shown relative error Condition in Proposition when using anced. Developing verified Now may be difficult, we use Theorem Example that we estimated Al for a given general failure biasing sufficient and we have 2 to reexamine model sufficient may condition is that conditions the which be tedious. for bounded system can is bal- be easily not done so. Examples 1 and 2. 1 (continued). Recall that r = 1, and so Theorem 2 establishes only have to consider the paths in A ~ to determine if y can be with bounded relative error using simple failure biasing. Note that = {((0,0,0), ((o, and simple more CO’ 1, a simple fl(O, (1,0,0), o, o), O, O), (0,0, (2,0,0 )), ((0,0,0), (O, 1,0), (0,2,0)), 1))}, (O, O, 1)) = 1 < 2r – m + bO = 2 since m = 1. Thus, CO’ does not hold. Example ascertain that 2 (continued). if simple Since r = 1, we only need to examine A ~ to biasing will result in bounded relative error. Note failure Al = {(((),0,0), (1,0,0), (2,0,0)), ((0,0,0), (0, 1,0)> (0,2,0))}, and f((O, O, O), (1, O, O), (2, O, O)) = ~((0, O, O), (1, O, O), (2, O, O)) = 2> — m + bO = 2 since m = 1. Thus, CO’ holds. 6. ESTIMATING In this section to component DERIVATIVES we discuss failure the estimation rates. We first of the derivatives analyze estimating of y with these using 4 which relative System Structure To obtain results more structure. on derivative estimates, we will The main differences are that state-dependent failure propagation. We now respect quantities standard simulation. Then we examine Condition C i‘ from Section characterizes when simple failure biasing will give rise to bounded error for the derivative estimates. 6.1 Additional 2r rates and (2) we assume that the system (1) we no longer allow restrict the generality has for of failure The following modification was developed by Nakayama [1991]. assume that components of type i have failure rate A, which is independent of x = S; i.e., A,(x) = A, for all x ● S. We assume that O < A, < ~,, where ~, <00. Furthermore, when parameterizing by the rarity parameter b’, where ~, > 0, b, > L and b, is integer e, we assume that A, = A,(E) = ~, ● valued. Note that b, does not depend on the state x. Also, we no longer allow Oforalll<i<Candx, y=S. the P(Y; x, i) to depend on ~; i.e., dt(x, y)= In this situation, we have that 60 = min{b, :1< i < C}. ACM Transactions on Modeling and Computer Simulation, Vol. 4, No. 1, January 1994. 72 Marvin K. Nakayama . As in Nakayama simplify [1991], we make (A4’) VP(Y; (A5) If there exists a component then there exists another Z, Z) >0 p(y; O, j) and P(Y; +P(Y; Assumption A4 stipulates can trigger of the also implies Z, ~) > Q then to b, = b,. that if the failure two component for (x, types y) 6 r, y) = of either from state are of the same where C(X, Hence, the -Y) 0, d(x, > transition than a sum magnitude ify>x p(x, ify<x y) [1991] one of the O to state A5 largest component y with type transition able when that Assumption we having a different are holds p( y; O, j) ensure there A4 is failure to and W(X, of a single to determine Al; one of the see the y) the proof > 0. term order of of Lemma = O. Assumption is no cancellation must rates condition systems. type type It be A5 is a technical having from state some other which causes is not unreason- should j in Assumption in the calculations i a transition there failure This reliability component component cause then largest probability. large some can probability, if the ~ > 0, consist is used respect there whose considering A5 if positive with assumption details. that j also This failure form: value, transitions with further rates some b~ = bO and that for failure failure of y stipulates types the otherwise Assumption derivative 11 of Nakayama Assumption for as before. of the ●-order. C(. X,y)E~(~’y) y) > 1 are integer rates of two different .x to y, then q( x, y ) has a certain { o same assumptions type i such that b, = bO and p(y; O, i) >0, component type j + i such that bJ = b ~ and a transition that q(x, rather additional O, i). of components rates the following the proofs. be A5 noted satisfies assumption of certain the used to expressions arising in the derivatives with respect to Al; see the proof of Lemma Nakayama [1991] for further details. In the situation when there is no failure propagation, Assumption 11 of A4 is automatically satisfied, and Assumption A5 reduces to requiring that there are two different component types which both have failure rates of the order ● ~0; i.e., there exists i and j such that i # j and 6, = bj = bO. (Assumption A5 holds under this weaker condition for the following reason. Suppose that when component type i fails in state O, the system then enters state y, and suppose that when component type j # i fails in state O, the system then enters state z #y. Then P(Y; O, i) = 1 and P(X; O, i) = O for all x #y, and p(z; O, j) = 1 and P(X; O, j) = O for all x #z. Thus, P(Y; O, i) = 1 #p(y; O, j) = O, and so A5 holds.) If we allow for failure propagation but with the restriction that any given failure transition can be triggered by the failure of only one type of component (i.e., for (x, y ) = r, there is at most one component type i for which p( y; x, i) > O), then again it is easy to verify that (1) Assumption A4 holds and (2) Assumption A5 reduces to requiring that there exist component types i and j such that i + j and b, = b] = bO. ACM Transactions on Modeling and Computer Simulation, Vol 4, No, 1, January 1994, Simple Failure-Biasing Estimating 6.2 We now tive Derivatives analyze of y with standard likelihood respect simulation. obtained The in the following A(AI, Using ..., ratio results section rate estimators of the of component in this when . 73 Simulation derivative to the failure using simple failure employ the notation tion Using Standard Method section type will we consider partial deriva- i obtained be compared estimating using to those the derivatives biasing. Throughout the rest of the article, we will d, A( Al, ..., &) = ( ~/6’A, )A( Al, ..., Ac ) for some func- AC). the likelihood ratio method of estimating derivatives, we obtain where See Nakayama further et al. [1990], Glynn [1986], and Reiman and Weiss [1989] for details. We estimate d, y using ~staAndard Generate n i.i.d. samples (11, D,,l),..., probability measure P. We then form si~ul~tion in the following (1,, D,.) of (1{~~ < 7.}, the point estimate manner. D,) using the k=l Nakayama [1991] established that C7ty= @(e for all ~ sufficiently small, mln{rL-bj,Ft-be}) (16) and cr21 = where r, and 7, are cannot say whether <7, –2boorri–2b, Var[l{~F < as defined ~O}D, ] = @(e in (6) and min{r, –2b L,7, –2b0) (7), respectively. (17) ), In general, we r, – bi s 7, – 60 or r, – b, > Pi – b. or whether r, – 2 b, >?, – 2 b.. Also, the expression for d, y is independent of whether we use standard simulation or importance sampling. However, the variance of the estimator depends on the simulation method being employed. Let RE, denote the relative error of the estimator of d, y obtained using standard simulation, i.e., Corollaries 4–6 of Nakayama be estimated with the same [1991] relative established accuracy that certain derivatives can as the performance measure itself when using standard simulation. (In particular, it was shown that this is the case if C Si holds.) We now want to further characterize these derivatives. To do so, recall Condition C i defined previously in Section 4. Then we have the following result, whose proof is given in the Appendix. THEOREM 4. ACM REJRE Transactions remains on Modeling bounded and as e -0 Computer Simulation, if and only Vol. 4, No. if Ci 1, January holds. 1994. 74 Marvin K. Nakayama . 6.3 Estimating Using Simple Failure Biasing Derivatives We can also use simple failure (7Ly = E[l{’TF Thus, we estimate d, y using biasing < to estimate derivatives. TO}D1]= E’[l{TF < s~mple failure biasing Note that TO}D,LI. in the following manner. Generate n i.i.d. samples (11, D, ~, Ll), . . . ,(i~, D, ~, t.) of (1{~~ < ~0], D,, L) using the probability measure P‘. We then form the point estimate REl’ Let simple denote failure the biasing, relative error of the estimator of d, y obtained using i.e., ~qqrl RE: = d,y ‘ of I{TF < rO}D, L under the probability UL‘2 is the variance next result establishes that Condition C i‘ from Section where The sary and sufficient condition have bounded relative error RE~ remains THEOREM 5. The basic idea orders of the proof of magnitude of the bounded as of Theorem summands E + O if and of D~ depends r[ ) occurs. We then derive bounds To take advantage of this, we express E’[1{~~ only 5 is as follows. of D,, and Xh+l) satisfies 7F < min(~o, situations. measure P‘. 4 is a neces- for the derivative estimator with respect when using simple failure biasing. whether or not the transition ( Xh, i ) > 0. Thus, the order of magnitude or (18) Zti if Ci’ We first note that to Al to holds. examine it nL(Xk_l)ALp(Xk; on whether for the depends on Xk-l, ~~ < ~~ < To D~ for these two < ~O}D:L2] = EII{Tr (19) < rO}D:L] = E[l{~, s TF < 70}D~L] + E[l{~~ < min(70, 7,)} D~L]. We separately analyze each expectation on the right-hand side of ( 19) and show that Condition C i‘ characterizes when simple failure biasing will give bounded relative error by using the bounds developed on D, and some other bounds (Lemma 9 in the Appendix). The complete proof is given in the Appendix. We now examine Condition C i‘ more closely. First suppose that r, – b, < F, and note that C i‘ only imposes restrictions on paths in AZ having probaprobability of order em with r, < m < 2 r, – 1 and paths in ~ having ~t < m < 2rl — 2 bl + 2 bO — 1. These conditions enbility of order ● m with sure that the contribution of each of these paths to the second moment is not ●‘’2 b,, which is the order of magnitude for the variance greater than required to achieve bounded relative error. It turns out that none of the paths of higher order will interfere with ensuring a second moment of order e ‘L 2 hL, – b., ACM TransactIons on Modeling and Computer Simulation, Vol 4, No. 1, January 1994 Simple Failure-Biasing and paths rl –b, of lower >7, In general, the checking Condition following result relative failure biasing error for that the is sufficient conditions have not done so. THEOREM biasing, are not possible. If 6. the the shows that is the using need simple failure for the failure biasing measure error 75 occurs when balanced. all of the when may then be tedious. condition using Developing verified when more using for simple general be difficult, some examples, failure biasing. to ensure using additional is utilized, and may sufficient and simple we failure as e -0. when biasing model estimators be easily sufficient relative a given is balanced, bounded CO’ is not bounded strates situation a simple derivative can system for that system which RE~ remains C i‘ shows We now use Condition C i‘ to examine gain a better understanding of simple with A similar . –bo. However, bounded order Method that differs failure C i‘. the the from relative always biasing. Thus, somewhat in which derivatives which we will first example we can estimate simple Condition from Our that errors remain derivatives This demon- situation when of the bounded; when balanced performance see Nakayama [1991]. Example 2 (continued). We previously showed that CO’ holds. Now consider component type 3. Note that A: = ~, At = {((O, O, O), (O, O, l))} and q = {((0,0,0), (1,0,0), (2,0,0)), ((0,0,0), 1,0), (o, (0,2,0))}, and so r~ = 2 and Fa = 1. (Similarly, we can show that rl = 71 = r2 = F2 = 1.) ((O, O, O), (O, O, 1)) ● A~. Note Thus, r3 – b~ = – 1 s 73 – bO = O. Consider that f((O, does not O, O), (O, O, 1)) = 1< hold. The next follows (Also, we example from can shows Theorems may be able to estimate 2r~ – m + bO = 3 since show that 2 and that Ci’ C3 does not m = 2. Hence, does not necessarily 5 that when a derivative with using bounded imply simple CO’. Thus, failure relative biasing, error but 3. Consider a system which has four types it we not the performance measure itself. In other words, it is possible to obtain estimates for a derivative than for the performance measure. Example C3’ hold.) of components better (i.e., C = 4), where each of the first two types has redundancy one (i.e., nl = nz = 1);the third type has redundancy two (i.e., nj = 2); and the fourth type has redundancy failure rate of type 1 and 2 have one (i.e., n4 = 1).Also, the components so b ~ = bz = 1. The component of type 3 has failure rate ●, and Ez, and the fourth b4 = 4. Thus at rate type of component bO = 1. There 1 using has failure is a single a processor-sharing rate repairperson discipline. e 4; and so b ~ = 2 and who It is then repairs components sufficient to define the state of the system to be z = ( x ~, Xz, X3, xl), where xl is the number of failed components of type i. Initially, all components are operational, and the system is considered to be failed if and only if either all components of types 1, 2, and 3 are failed or the component of type 4 is failed. When the ACM Transactions on Modeling and Computer Simulation, Vol. 4, No 1, January 1994 76 Marvin K. Nakayama . component of type 1 fails in state (O, O, 0, O), it causes the component of type 2 and one of the components of type 3 to fail, i.e., P((l, 1, 1, O); (O, O, 0, O), 1) = 1. Figure 3 is a state diagram of the model of this system with the original states transition with .X3 since have do the system omitted not probabilities X4 = 1 into the play a role {(1, 1, 2, 0), (xl, Note that Al = fails arcs in as the state ( x ~, X2, our analysis. X3, on the arcs. We have lumped all of the ( x 1, X2, X3, 1) ~ F for all xl, X2, and state as soon from Xj, given the single The component of type X3, 1) since set of failed 4 fails. these repair states is Also, we transitions given by F = 1)}. f), A2 = {(a, h, 1)} As = {(a, m), (a, h, g,k, 1), (a, h,f, j, 1), (a, h, g, h, 1), (a, h, f, h, 1)}. We can show that y = (5/6)e2 + 0(6Z), and so r = 2. Also, rl r~ = 2, r4 = 3, 71 = 3, F2 = 2, 73 = 3, and Fb = 2. Now consider path (a, m) = As. Observe and so CO’ does not hold. that m) = 1 < 2r f(a, = 2, r-z = 3, the sample – m + bO = 2 since m = 2, Now consider component type 3. We have that r3 – ba = O <2 = 73 – bO. Also, A; = A ~, and f(a, h, 1)= 3 > 2r~ – m + bO = 3 since m = 2. FurtherA: = As – {(a, m)}, and we can establish that all (XO, . . . . x.) = Al. Additionally, 2r~ – m + bO for ~(xO,..., more, – 1 = 1, and +2b0 (x~,..., xn) so the condition is vacuous =x: since 2rL – 2b, + 2b0 – 1.Therefore, The previous example the performance one of the estimate achieve measure derivatives. imposed there C3’ that bounded However, C3’ on ~(xo, . . . . x.) of m for which for 7L < m < holds. demonstrated with by is no value x.) >2 = 2r~ – 2b~ 73 = 3> the we may relative following error result not be able to estimate even if we can do so for shows that if we can all of the derivatives with bounded relative error, then we can also the same for the performance measure. See the Appendix for the proof. THEOREM If 7. Ci’ holds for all component types i = 1, ..., C, then CO’ holds. The next example shows that being able to estimate a derivative with the same relative error as the performance measure using standard simulation does not necessarily imply that we can use simple failure biasing to estimate the derivative with bounded relative error. (In fact, even if we also assume that the performance measure can be estimated with bounded relative error when using simple failure biasing, this still does not guarantee that the derivative estimate using simple failure biasing will have bounded relative error. See Example 5 in the Appendix.) Example Thus, Cl, ACM 1 (continued). Note that rl = r2 = r~ = 1 and 71 = F2 = 73 = 1. C2, and C3 hold. TransactIons on Modeling and Computer Simulation, Vol 4, No. 1, January 1994, Simple Failure-Biasing il,o~’ Fig. 3. Now Transition consider diagram component for Example type 3 with 3. Observe original that Method I= transition . fll,l,l, 77 o) \ probabilities. r~ – 63 = – I < ~3 – bo = 0. Consider the path ((O, O, O), (O, O, 1)) @ At. We have that fl(o, 0, 0), (o, O, 1)) = 1< 2r3 – m + bO = 2 since m = 1. Thus, C3’ does not hold. We now want to establish a simple sufficient condition the derivative with respect to A, can be estimated by using simple failure biasing. The next result from Section THEOREM 4 implies 8. If To gain a better of y with respect the desired CO’ and CSi conclusion; hold, understanding to A, to be then largest asymptotic magnitude if lim inf. types j. Then Corollary 4 of Nakayama s, has a largest ACM Transactions asymptotic on Modeling Ci’ that error C Si for the proof. holds. CSi, we define the sensitivity = A, - d, y. We define s, to have a Is,( c)/s,( E) I >0 for all component ~ ~ [1991] establishes that magnitude and ensures see the Appendix of Condition s, = s,(~) that with bounded relative shows that Condition Computer if and only if C Si holds. Simulation, Vol. 4, No 1. January (20) 1994 Marvin K, Nakayama 78 . Thus, if(1) we can estimate the performance measure with bounded error using simple failure biasing and (2) the sensitivity with has a largest asymptotic magnitude, then simple failure biasing bounded relative error for the estimate of d, y. The following example shows that being able to estimate relative respect to A, will result in both y and cl,y with bounded relative error by using simple failure biasing does not necessarily imply that we can estimate y and d, y with the same relative error when using standard simulation. Example 4. C = 3), where n3 = 2). Consider a system each component Also, the which type components has three types has a redundancy of type 1 and of components of two (i.e., 2 have bl = bz = 1), and the component of type 3 has failure bO == 1.There is a single repairperson who repairs using a processor-sharing discipline. It is then sufficient Thus failure nl (i.e., = nz rate = e (i.e., rate ●3 (i.e., b~ = 3). components at rate 1 to define the state of the system to be x = ( x ~, Xz, X3), where x, is the number of failed components of type i. Initially, all components are operational, and the system is considered to be operational if and only if there is at least one component of each type operational. We assume there is no failure propagation. It is easy to see that y = @(e), and so r = 1. Also, rl = rz = FI = Fz = Is =landr~ =4. Note that Al = {((O, O, O), (1, O, O), (2, O, 0)), ((O, O, O), (O, 1, O), (O, 2, O))}. It is easy to check that CO’ holds, and so Theorem 2 implies that RE’ remains bounded as 6 + O. Now consider component type Wecaneasily (x0,..., show that 3. We have ~(xo,..., )= A~withr3=4<m <m< /1$0,... X~)~27s7m +bO=3=m < m s 5 = 273 + 4bo – 1. Thus, r~ > r, and so C3 does not hold. C2’ 7. hold.) However, EXPERIMENTAL We now present large computing x.) >273 5=2 b~>bo, r3 – b~ = 1>73 F~+2b~– forall(x C3’ that holds. 2bo–l, andalso o,o, x.., x.)~z~ (We also and73–2bo can – bO = O. = 7 – m for all – m – bO + 2b3 that with show that 73=l Cl’ and =–l>r~–2ba=–2, RESULTS some experimental system to show that article hold when simulating et al. [ 1990], and Nakayama results from the asymptotic actual systems. [ 1991] previously running results simulations developed of a in this Goyal et al. [1992], Nakayama examined the same system but with different failure rates for some components. We used the SAVE package [ 1987] to obtain all of our results. The system is composed of two types of processors, A and B, each having redundancy 2; two types of disk controllers, each having redundancy 2; and six sets of disk clusters, each having four disks, When a processor of one type fails, it causes a processor of another type to fail also with probability 0.01. Each type of component can fail in one of two modes which occur with equal probability. The repair rates of the all mode-1 failures are 1 per hour, and the repair rates of all mode-2 failures are 1/2 per hour. There is a single repairperson who fixes failed components in random-order service. The data is replicated on each of the disk clusters in such a way that all of the data in ACM TransactIons on Modehng and Computer Shmulat,on, Vol 4, No 1, January 1994 Simple Failure-Biasing a cluster type is accessible of processor processors of a given its corresponding at least even type disk with 1/2000 per 1/12000 Thus hour. per hour, the system gives results simulation, obtained all simulation is either is operational disks all the other first disks cluster have have failure note of the simple with using failure results a component that biasing, failure respect a numerical from the processors be able to estimate error interval) failure 1/6000 to pAfr and value magnitude than of the of type still not produced note that with per hour. that failure biasing. events, respect with agrees of the 99% to pAfr to Theorem and d 1 fr our rate of all of Condition we should about the same 9970 confidence is the case, Also, is much standard is worse estimate (see of the sensitivity Theorem 5). Finally, with balanced in simulation than 4 and (20). Using the larger that simple of the failure estimates for both y and its sensitivity the system is unbalanced (see Theorems failure-biasing acceptable this respect d 1 fr, to with an half-width failure of the and indeed respect with to pAfr width We where estimate) the largest respect with method, We give the relative relative measure, sensitivity with This the d 1 fr. Table with 2 and respect to d 1 fr failure biasing reasonably stable estimates for all the quantities. We should also the confidence intervals for the estimates of -y and its sensitivity respect balanced of the we obtained good to pAfr even though 8). The simple rate d 1 fr = (nonsimulation) one million A have with by sensitivity sensitivity. biasing, respect is measured as the performance estimate is = 1/1000 rate rate and balanced simulating or repair. the sensitivity (as numerical other if(1) data have failure the component types, and so it satisfies the second stipulation Ci. Theorem 4 then states that when using standard simulation, relative The one of if and only All of the disk controllers in Each through (2) all (i.e., confidence interval half-width divided by the point confidence interval for all simulation estimates. First is failed. clusters and 79 controllers. of the system is given in Figure 4. processors of type A and B are pAfr obtained standard event on the disk is operational y and its sensitivities the cluster of disk . is unbalanced. We estimated III type respectively. and in the types The system of each The disks of the access the data accessible. A block diagram The failure rates of the per hour, one controllers. one processor and 1/2000 if one of the is paired Method to pAfr failure are slightly biasing, thus better for simple demonstrating that failure simple biasing failure yield better results than balanced failure biasing. For empirical analysis of estimates of performance measures see Goyal et al. [1992] and Nakayama et al. [1990]. 8. CONCLUSIONS AND DIRECTIONS than biasing for can a more thorough and sensitivities, FOR FUTURE RESEARCH We have performed an extensive study of the simple failure-biasing method. In particular, we established necessary and sufficient conditions for obtaining performance measure and derivative estimators that have bounded relative error. One interesting aspect of these conditions is that it shows that in order to guarantee bounded ACM relative Transactions error, on Modeling we need and Computer to consider Simulation, not Vol. 4, No only the most 1, January 1994 80 Marvin K, Nakayama . Proce Disk Control Disk Fig, 4. Table III. 1 Cluster Estimates Block Dtsk Cluster diagram Measure 0.9421 0.5060 “ &T 0.9945 X 10-3 X Relative x 10-3 –. 5770 x 10-5 . &T –.1013 * likely paths failure. Also, bounded to system we showed relative error 99% 0.9361 Confidence 10-3 Biasing Failure Biasing X 10-3 0.9471 x 10-3 0.5000 x 10-3 x 10-4 –.5329 imply those 0.4992 for the X 10-3 + 10 6% —.6030 X 10-5 * 141.1% failure but also secondary paths that the conditions for a derivative do not k 3.6% & 6.3% 106.2% Intervals Balanced + 3.1% + 40.5% dljr system Failure 14.6% 0.5477 x 10-3 Disk Cl~ster 6 Simple Simulation + PAfT with 4 computing Standard Result 7 Disk Cluster of a large of y and Sensitivities Numerical Performance 3 X 10-5 + 46 3% leading to system estimator to have performance measure by constructing an example demonstrating that it is possible to estimate a derivative more efficiently than the performance measure when using simple failure biasing. easily implementable m.th.d. One area for future research is to develop which will determine if simple failure biasing will or will not work well (i.e., if CO’ holds) for a given model. A simple sufficient condition for this is to check if the system is balanced. However, as we have seen, this condition is not necessary. Also, further work is required to characterize other importancesampling schemes used for analyzing highly reliable systems. Finally, as we have shown in Example 2, when simple failure biasing yields bounded relative error, it may produce smaller confidence intervals then balanced failure biasing. Thus, it may be more appropriate to use simple failure biasing in certain contexts instead of balanced failure biasing, and additional research should be carried out to identify classes of models for which this is ACM TransactIons on Modeling and Computer Slmulatlon, Vol 4, No, 1, January 1994 Simple Failure-Biasing the case. Currently, the only is most is experimentation. appropriate method available Method for determining . which 81 method APPENDIX First we prove PROOF tion Proposition 3. OF PROPOSITION for rn definition in terms we 3. of must To the have prove b(xk, our Xk+l). result, we first Consider derive any (xo, a representa- ..., Xn) G Am. 13Y that n—1 P{(xo, . . .. x.,) = (Xo, . . ..xn)} = lqP(xk, Xk+l) = o(Em). k=O The Also, probability (xo,... of the initial transition (X., x ~) is 0( e 6(‘0’”) that Xk + () for k > 0. Hence, , Xn ) G Am implies of any failure repair transition transition (x~, - ~“ ) by Eq. (2). the probability x~ + ~), k > 0, is @(~ b(X~’xk+‘)) by Eq. (2). Each has probability of order 1. Thus, since b( x, y) = O if y < x, n–1 13(xo, X1) – b. + ~ Xk+l) =m, ., x.), note b(xh, (21) k=l or equivalently, which establishes part (ii). To convert Eq. (22) into a bound for any (x, y) = r with on flxo,.. y > x. It then follows that b(x, y) > s(x) that n–1 s(xo) b. + – ~ l{xk+l >Xk}S(Xh) <m, k=l which To implies prove LEMMA (i) (ii) ..., Theorem x.) ❑ < m + bO. 2, we will Consider 9. (xo, need the next E Am, where . . . . x.) lA~[ . . . . XTF) = (Xo, . . . . x.)} = tl(~~) and m > r. Then where independent m. of Parts Theorem of (i), 1 (xo, . . . . x~) (ii), and of ACM Transactions the and and upper Nakayama on Modeling and P{(XO, . . . . X,r) sufficiently small, where of (X., . . . . x~ ) and m L(xn, ..., x.) = @(e~{xO’’Xn)-~O) for &l G > g euffi.i.mtly .srnall, PROOF. proof n >0 < lS/(n+l)~ Xn)} < a~ ‘~m for all ● >0 (xO,..., are constants which are independent (iv) lemma. n<(m+l)~ F’{(XO, (iii) ~(xo, L(xn ,. ... v a~d bound [1991]. and Computer in The x.) < q a.: (iii) first Simulation, zqmcf(xo’ constants are of (iii) ~ ‘“’x”)–bo which established part = a and in CZP. the follows Vol. 4, No. 1, January 1994. 82 Marvin K. Nakayama . immediately (iv) in from Eq. To prove the .>xn)e (xO,.. the definition we have in part shown the first part of validity of the upper bound (iv), first note that for A~, ~ < a/3”Em k=O (iii). From P’(x, Eq. y) n–l ‘–1 P(xk, .%k+~) ., Xn) = L(xo,.. by part of A ~. Also, (14). P’(xh, (11), = <(x, we ~k+l) can k=O (23) “~~k, ~k+l) express > X}(b(x, y)el(~ 1 ~ y)–s(x)) + o(El{y> X)( b(x..Y)– s(x))) ((x, y) is independent of ●. Define (’ = min{[(x, y) :(x, y) = r} and <. = min{l, [’/2}. Note that P’(x, y) z ~. e’{y ~ x}(~(Z,.V–Sf~l) for all suffl. ciently small ● > 0. Thus, for all sufficiently small ~ > 0, where n–1 JJP’(xk, x,+,) 2 k=O (24) from part First (i). We now examine note that since b(x, of ● in Eq. (24). the exponent y) = O if y < x, n–1 ~ l{xk+~ ‘~~}(b(~k, ~k+~) ‘S(Xk)) k=O n–1 = ~ n–1 b(X~,Xk+~) - ~ k=O l{~k+~ ~Xk}S(X~) k=O n–1 = ~b(x~, xk+~)-f(xo,..., xn)=~t+ bf(~~,~, ~n), ~n) k=o by Eq. (21). It then follows from Eq. (24) that for all sufficiently small e >0, n–1 H “(xh, Xh+l) > ~(~+’%m+ho-f(zu, ,.,), k=O Thus, Eq. (23) and the proof Now implies that is completed we can prove by letting Theorem v = cz/~,N and ❑ q = fl/~N. 2. PROOF OF THEOREM 2. Suppose CO’ holds. Since y = W e r ), it is clear from Eq. (13) that we need to prove that l?’[1{~~ < ~o}L2 ] = 0( ~2r). Recall our expression for the second moment given in Eq. (12), and consider some ACM TransactIons on Modehng and Computer Simulation, Vol 4, No 1, January 1994 Simple Failure-Biasing ● A. x.) (x~,..., L(XO,. where r < m +’{(XO,..., ... < 2r X,F) – 1. Using Lemma Method . 83 9, we see that (Xo,....xn)} = @(Em+f(xO’ “’x”)-~”), = and so CO’ implies xn)P{(xo, _L(xo,..., . . ..x.F) = (Xo,. ...xn)} = o(Em+2r-m) E2’). = 0( Hence 2r–1 xx m=r L(XO,..., ....xTF xn)P{(xo, (25) since lA~l < ~ by Lemma 9. Now consider (xo,..., x.) ~ Am, n > 0, with L(xo,... for all ~ >0 that xn)} 0(62’) = implies )=(x”,..., (Xo, .,. .Xn)=Am n>O , Xn) < sufficiently small Ez since ~,. ... )eAm – 1.By Lemma 9, ‘X”)–bo< qm f( XO,..., x.) , XJP{(XO,. L(xo,... m=2r(~ Vq%f(’o’ m > 2r ... > b.. Therefore, X,F)=(XO,. Lemma 9 . . . Xn)} Am n>O m=2r (xo, . , xn)s ~=z~ Am 71>0 = valslN (qPlslNE)m= 0(62’) i m=2r for all sufficiently E[ 1{~~ < 70}L] small e since = 0( ● 2r ), thereby qp lSl~ showing < ~. Using RE’ Now suppose that CO’ does not hold. Then Am such that r<m<2r– land f(yo,..., Lemma -L(Y( 9 implies Eq. (25), remains there exists bobl– E[l{TF ..., yh ) ● Thus,s, that l,...,y~)~{(xo,..., x,F)=(yo, yk)}yk)} = 0( — then that as ~ -0. some (y., yk)<Zm+ = @( Ef(YO>. ,Yk)–bo)~(E77L) = o(E2r–m+~O-l–~,,)@(Em) It we have bounded follows < ~o}Ll E2’–1). that = L(xo,..., ~ xn)P{(xo, . . ..x.F)=(xo, . . . . Xn)} (XO,..., )=A=A n>o >L(yo, We now prove ACM . . ..yk )P{(xo,..., Theorem 4. TransactIons on Modeling and X,, )=(yo, Computer Simulation, . . ..yk )} =Q(e2r-1), Vol. 4, No. 1, January 1994. 84 Marvin K. Nakayama . PROOF OF THEOREM Corollary as 4. Recall 4 of Nakayama 6 ~ O if either r, = r b, > bO, and 7, – 2b0 <r, 7, — 2b0 <rL – 2b, implies RE that [1991] = @( 6-’12 ). First establishes that suppose CZ holds. remains bounded RE,\RE r, > r, or b, = bo. Hence, we may assume that – 2b,, and so CT,z= @(e F’-2b” ) by Eq. (17). Also, that 7, – bO = ~, – 2b0 + bO < ~, – 2b, + bO < r, – b, since b, > bO, and so dly = @(eFI-b~ ). Furthermore, r-, > r implies = @(c-’/2). Thus, RE, ~, = r byEq. (8), and so RE, = @( Etr-2bO)/2-fr-bOl) remains Now bounded suppose 7, —2b O>r, as ~ ~ O. that Ci does — 2 b,. It then not follows hold, i.e., assume r, > r, ~, = r by Eq. (8) and that that /RE and b, > bo, a,z = @( e’- ‘b) r, – b, < F, – b. = r – bo. Then by Eq. (17). First suppose d,y= @(~rJ-~L), and so RE~ = @(~~rt–Zb,J/Z–(r,–~,) ) = @(~-rL ‘2). Hence, since r, > r, we have that r, – b, > F, – bO = r – bo. Then RE,/RE - ~ as ● ~ O. Now suppose d*[email protected](e’-~’ (r – bo) ), RE,/RE that We now Our RE, = @( E[rI-2b~/2-( – (r – bO) = –r/2 and < (F, – 2bO)/2 first cc as ● + prove lemma 10. that (r, – 7, = r. Thus, we Zb,)iz _ again see •1 Theorem 5 by first establishing the of the [1991] Note since describes D,; see Nakayama LEMMA O. ’-b Ol). forms some summands preliminary in the results. expression for for the proof. Suppose y) ● I’ (x, x ● U. Then with for all e sufficiently small, (i) ~,~($, Y)/~(x, pendent of 31P(x, Y)/P(x, (ii) e, ifeitherx (iii) c7,P(x, pendent The next LEMMA there (i) (ii) y) ll,(xo, = P(x, + O(E-bZ), x, i) where p(x, y) where # O andy Conside~ bounds ( x ~,. ... ~ which sufficiently . . . . x,) = + O is inde- > O; + o(l), y) < xorx establishes a constant E >0 (xO,..., y)E-bL p(y; p(x, >x y) <0 is independent with p(y; x, i) = O; y)/P(x, y) = p(x, y)e-~” + O(~-~O), where p(x, of c, ifx = O andy >0 with p(y; O, i) = O. 11. Dl(xo, = P(X, > x with # O andy lemma exists for all Y) e, ify y) <0 of is inde- on D,. XJlm, m, where is independent n > 0 and of ( XO, . . . . x.) m > r. and E such Then that small, @(E-b) and Ill, (xo, . . . , x~)l < (m + l)@b, if @(E-h”) and ID, (xO, . . . , Xn)l < (m + l)~~-b[l if G A’~; x.) . . . . Xn~= (XO,. ... X~)G~’~. PROOF. Nakayama other results, The first [1991]. note part of Similarly, (i) was we established can prove in the the first proof part of n–l xn)l < & c7tP(xk, P(xh, on To show 13 of the Xk+l) x,+,) by the triangle inequality. First suppose that ( XO, . . . . x.) path (xO, ..., x.) has at least one transition that satisfies TransactIons Lemma that lDL(xo,..., ACM of (ii). Modeling and Computer Slmulatlon, Vol 4, No 1, January = A’n. Thus, the conditions 1994 the of Simple Failure-Biasing Lemma 10(i). Lemma 10 then (x, y) ~ r Since b, > bo, we have implies with that that Method . and 1 = o(~-~). e-~” = O(e-bL) d, P( x, y)/P( 85 x, y) = 0( c-b I) for all transitions x = U. Now define p, = max{lp(x, y)l:(x, y) = r}. small c > 0. It follows 2/3,6 - bI for all sufficiently for all sufficiently small e, Then from 16’, P(X, y)/P(x, y)l < Lemma 9(i) that n–1 Xn)l< ~ ID1(XO,..., E‘b, = 2nP,~–bL 2p. < 2(m + l) Np, e–bl, k=O thereby Now proving suppose (i), where @ = 2Np,. that (xo,... , x.) e ~~. x,) satisfies either (%),..., since 1 = O(~–b O), we have all sufficiently small condition (ii) that ldl P(xh, e > 0. Then Then each of the transitions of or (iii) of Lemma 10. Consequently, xk+l)/F’(xk, xk+l )1 s 2p, e-~o for Lemma 9(i) implies that n–1 ID, (XO,..., Xn)l < ~ 2p, e–~0 = 2np, ~–b0 < 2(m + l)Np, E–bO k=O for all sufficiently Now small ~, which we are in a position PROOF bo. Then that to OF THEOREM Eq. (16) 5. our to prove Assume implies establish completes that result, d,y Theorem Ci’ holds. = @(er-b’). we must ❑ the proof. 5. First suppose Thus, show it that is that r, clear from 13’[ l{TF – < b, s Eq. To}~~~21 7, – (18) = o(E2(r’-b’)). Recall 70}D~L]. (xO,..., our decomposition in Eq. (19). First Consider any path (xo, . . . . x.) that Xn) G A’~ for some m > r,, and so Lemmas D12(X0, . . .. XJL(XO . . . . . xn)P{(xo, Now consider Ci’ implies that ~(xo,..., ● sufficiently with E[ 1{~1 < r~ < ~, < ~~ < To. Thus, 9 and . . ..x.F) = 11 imply that (Xo,.. ..xn)} (x.,..., x~) ~ AC~, where rl s m < 2r, – 1. Condition Xn) > .2r, – m + 6., and so it follows from Eq. (27) . . . . XJP{(XO, . . ..xn)zixo Iy’(xo, for all a path that we work satisfies small. . . .. X.F)=G CO,..., x,)} Consequently, 2r, – 1 D:(xo, Ex m=r, (x “, . . ..xn)L(xo. XP{(XO, . . .. X.F)=(XO, ACM xn), xn) ... xn)=li; n>O Transactions on Modeling and ..., Xn)} = o(E2’’-2)’) Computer Simulation, Vol. 4, No. 1, January 1994. Marvin K, Nakayama 86 . since IA ~ I < ~ for employed all to establish m by Lemma 9. Using Eq. (26), we can show arguments similar to those that CX. Dt2(xo, . . ..xn)L(xo . . . . ..xn) Zx m=2rL (-to,.. .xn)=ALm n>O XP{(XO, using Lemmas Similarly, 9 and ..., . .. XnXn)} = o(E2’’-2)’) Ci’ that 11. Thus, we can show Therefore, xTF)=(xo, if r, – b, <7, that implies – bo, then E[l{I-~ < To}D~L] = 0(e2rL-2~L) from Eqs. (28) and (29). Hence, we have proven that Ci’ ensures RE~ remains bounded as ~ ~ O when r, – bl < F, – b.. Also, we can establish using similar arguments that C z‘ implies that 71 — bo. By appropriately modifying if does CO’ conditions not hold, PROOF Theorem OF THEOREM b. = 1. Thus, 6. prove bounded as ~ ~ used to show that RE,’ * that ~ as O when RE’ c ~ r, – b, > + cc as e + O if any O of the ❑ 6. Since s(x) remains the proof we can of C i‘ is violated. We now prove and RE,’ the = 1 for all system is balanced, x = S. Now b, consider = 1 for all any path 1 s i s C (x O,... , x,) ● Am. Note that (xo,.. ., x,) has exactly m + 1 failure transitions (including the initial transition), and so f( X., . . . . x.) = m + 1.To verify that Ci’ holds, first suppose that r, – b, <7, – 6., and so r, <7,. If (xO, ..., x.) = A[~, then wemusthave 2r, – m + 1 that f(xO,..., )=m+l>2r, similarly show If r,, andsof(xo, . . ..x~)=m+ (xO, . . . , Xn) G ~n, then rm+3b0b2b, that We now prove PROOF m> holds. C i‘ holds Theorem OF THEOREM l>2r, –m+b O= m > FL > r,, and So =2rlrm+l+l holds. we can ❑ if r, – b, > F, – b.. 7. 7. Suppose that CO’ does not hold. Then there exists some (y O,. ... yh)=A~ with r<m<2r–l such that f(yo, . . ..yk)<2r– m + bO. Now there must exist some component type i such that (y., . . . . yk G A’m , and so A1~ # ~. It then follows that r, s m. r, – b, <7, First suppose Eq. (8), we have that r, > r. Thus, – bO. From that r, < m < 2r, — 1. Also, we have that rL < m and m < 2r — 1 imply f(yo,..., Now yk) < Zr–m + bo SZr, –m suppose r, – b, > F, – bo. Note + bo, andso Ci’ does not hold. that Eq. (8) implies that r, > r and 2b, –2bo – 1> 2r– 1 since 27, + 2b1 –2bo – 1> 2r+ f(yo, . . ..yk)<2m+m+ r,<m<27, +2 b, —2bo —1. Also, + 2bt – b. Now ACM since we prove Transactions b, > bo. Therefore, Theorem on Modeling Ci’ does not hold. b, >bO. bO<27t ❑ 8. and Computer Slmulatlon, Vol. 4, No 1, January 1994 Hence, —m ) Simple Failure-Biasing PROOF OF THEOREM 7L2r=rL, we Xn) (xO,..., 2r 8. have G Azm with – 1. AIso, Assume that CO’ and ri r, < – CO’ bi n-t < < Zr, r, = r imply holds. F, – b. — 1. since that Now suppose rL – b, >7, O>2ri– + 2bi bounded as standard simulation the performance simple failure Example we have > 2r is used, r. Since that any r m < < – m + bO = 2r, – with 7, <m <2r, – 2bi + 2b0 2r, – 2bZ + 2b0 – 1< 2r – 1. CO’ gives and that ri > r, and = A’~ with us that b,>bo. so Fi = r and r, < m <27, – l, and so r < m < 2r it is possible diverges as that c + + 2b1 – – l. Hence, the same relative RE,/RE O. In we may be able to estimate with = consider – m + bO = 27, – m – bO + 2b,. that RE, ‘/RE’ measure 5. ).) r, Now 87 Now ❑ shows but biasing > 2r holds. example e ~ ~, r, . with 7, < m s 27, — 1. Note that r < m < implies that f(xo,..., ).) > 2r – m + bO = CO’ 27, – m + bO, and so Ci’ following ).) – 2b0 – 1 = 2r F, = r. Thus, The assume any(xo,..., CO’ implies that ~(xo,..., x,) consider any (xO, . . . . x.) E ~~ 2r – 1 since = b.. m+3b0–2bzsincert=r bi = bO. We may – bO. Consider 2b0 – 1. Then,2F, r, suppose > r < m s 2 r – 1. Furthermore, that that b, f(xo,..., m + bO. Now consider any (xO, ..., x~) e ~~ from r, = r and bi > bO that – 1. It follows Thus, 7, > r implies x~)>2r–m+b f(xo,..., Therefore, C i‘ holds. First as Method other remains words, if some derivative accuracy, but and we cannot if is used. Consider a system which has three types of components (i.e., C = 3), where each of the first two types has redundancy four (i.e., nl = ?zZ = 4), and the third type has redundancy three (i.e., n~ = 3). The components of type 1 and 2 have type 3 have repairperson failure failure rate who repairs ● (i.e., rate bl = bz = 1), and the components discipline. It is then sufficient to define the state of the system x,, x,), where xl is the number of failed components of type components are operational, only if all of the components 1 fails in state components and the system of all types (3, O, O), it causes of type of bO = 1. There is a single e 2 (i.e., b~ = 2). Thus components at rate 1 using a processor-sharing 3 to fail, all four i.e., is considered are failed. When components P((4, 4, 3); to be x = ( xl, i. Initially, all to be failed a component of type (3, O, O), if and of type 2 and all three 1) = 1. When a component of type 2 fails in state (O, 3, O), it causes all four components of type 1 and all three components of type 3 to fail, i.e., P((4, 4, 3); (0, 3, 0), 1) = 1. When a component of type 3 fails in state (O, O, 2), it causes all of the components of types 1 and 2 to fail, i.e., P((4, 4, 3); (O, O, 2), 1) = 1. It is easy to see that y = @(~3), and so r = 2. Also, rl = 3, rz = 3, r3 = 5, F1 = 3, F2 = 3, and 73 = 3. We can show that CO’ holds. Also, we have that i3 — 2 = 1 s r3 – 2b3 = 1, and considering show that so C3 holds. the path ((O, O, O), (O, O, 1), C3’ does not hold. (Also, CS3 (O, O, 2), is not (4, 4, 3)) in force.) ● Al, BY we can ACKNOWLEDGMENTS The author would like to thank Randy Nelson this problem and for giving helpful comments ACM Transactions on Modeling and Computer for initially suggesting work on on a preliminary draft of this Simulation, Vol. 4, No. 1, January 1994 88 Marvin K. Nakayama . article. Also, useful tions gratitude comments. that is expressed Finally, improved to Perwez the editor Shahabuddin and two referees for providing gave numerous sugges- the article. REFERENCES CARRASCO, J. A, In 1992. Proceedings Computer 5th Performance CARRASCO, J. A. Proceedings CONWAY, Failure of the Ezzaluatlon. 1991. Efficient A. E., AND Computmg. based Elsewer of stochastic GLYNN, P. W. Science GLYNN, P. W,, Manage. A. models. In 1987. Monte Proceedings IEEE algorithms. A., Stochastic Conference AND IGLEHART, D. S. S. SHAHABUDDIN, Comput. JUNEJA, P., for L. models with of In 152-161. computer system Symposium on Fault Large New deviations C’ontr. AC-28, Carlo York, and rare events in the 907–920, optimization. of In proceeduzgs 356–364. Importance sampling for stochastic simulations, 1987 Modeling and analysis of computer system availability. HEIDELBERGER, P., E. E., Eng. J. Watson 1964, Fast In 1984, 1991. Monte AND GLYNN, P. W. dependable Methods. Carlo simulation of Markovian of the 22nd proceedings Monte Asymptotic Carlo simulation Research of likelihood systems. 1992. A IEEE systems. Metheun, London, reliability/availability International of Markov Yorktown ratio N Y 1990, of highly Research dependable Center, of queues IEEE 1989 Trans. M. I., AND WEISS, A. systems. Yorktown J. Symposium unreliability estimators quick Automat. N.Y, Sensitivity Sensitivity analysls ratio To appear simulation Contr. 1989. Likelihood IBM Res. Rep. RC 15400, Heights, A derivative of Res, Rep. RC 76637, Heightsj models Center, Revision M. K., GOYAL, A., AND GLYNN, P, W. S,, AND WALRANL), REIMAN, 1992, policies. F. Markovian T J. Watson networks P. repan- AND BOHM, Markovian PARERH, F,, of highly on models. Des. 77, 49-62, reliable NARAYAMA, V, models Computmg. NAK.AYAMA, M. K. highly NICOLA, Markovian C-41, 36-51. general Tolerant LEWIS, sensitivity IBM for of Division, T. analysis Research in Oper. method AC-34, in simulations Research for Division, Res. excessive backlogs in 54–66. analysis for simulations via likelihood ratios, Res. 37, 830-844. RUBINSTEIN, R. simulation Y. 1989. models. SHAHABUDDIN, P. Manage. SHAHABUDDIN, P. Operations SHAHABUDDIN, derivatives P., P., reduction Szmulat~on TransactIons performance for the simulation extrapolation for computer Importance sampling of highly reliable Markovian Scz. To be published. 1990. Simulation AND NAKAYAMA, highly V, mean 1993; analysis F., time revised K, 1992. and to failure New York, 1993; Computer reliable systems. Ph.D. thesis, Dept. Calif. Fast systems. HEIDELBERGER, June of highly , Stanford, Markovlan IEEE, cm Modeling Univ M. reliable NICOLA, m and Stanford Conference. February and Res. 37, 72–81. Research, in SHAHABUDDIN, Oper. 1994. systems. ACM models. New York, simulation for Monte IEEE, 1989 simulating S., AND SHAHABUDDIN, Fault Received 1983. G approximation Simulation framework Trans. ance 351-365. Markovian of the 17th International Trans. Automat. HAMMERSLE~-, J, M., AND HANDSCOMB) D, C. of systems. Tools for Sci. 35, 1367-1393. unified Oper. Amsterdam, Systems. IEEE, Carlo tolerant J. Res Deu. .?1, 651-664, GOYAL, Nut. B.V., of failure/repair on Ftellable D~stributed GOYAL, GOYAL, A., AND LAVENBERG, IBM Publishers fault Techzuques and 230-235. 1986. the 1986 Winter of repamable on i!lodellmg simulation COTTRELL, M., FORT, J. C., AND MALGOUYRES, study simulation Conference transition of the 10th Syntposlum availability/reliability Tolerant distance International simulation of transient Working P., GOYAL, simulations. measures A., In AND GLYNN, Proceedings P. W. 1988. of the 1988 491-499. accepted Simulation, August Vol and their draft. 1993 4, No. 1, January 1994 VariWinter

© Copyright 2021