PDF file

A Characterization of the Simple
Failure-Biasing Method for Simulations
of Highly Reliable Markovian Systems
MARVIN
K. NAKAYAMA
T. J. Watson
Research
Simple
biasing
fadure
estimates
vian
the
systems.
We
article,
derive
we
leading
aspect
most
and
to failure
technique
gradients
biasing
yields
is balanced,
used
to
m simulations
bounded
it may
reduce
the
of highly
relative
not provide
paths
as well,
Categories
with
error
variance
reliable
of
Marko-
for the performance
bounded
it
the
relative
error
that
those
more
that
efficiently
when
when
the
but
necessary
the
is
also
for
using
bounded
simple
One
error,
secondary
measure
not
paths
conditions
estimator;
performance
the
failure
established.
sufficient
measure
error.
when
relative
some
and
method
relative
system
estimators
to obtain
performance
than
bounded
of the
error
be examined
for the
fadure-biasing
with
structure
derivative
by example
imply
simple
relative
shows
must
the
derivatives
bounded
failure
We also show
lts
on the
for
is that
to system
a derivative
and
condition
condition
do not
of when
measure
sufficient
conditions
estimator
failure
a characterization
a similar
to estimate
simple
them
can be estimated
of the
hkely
a derivative
possible
fadure
system
provide
Furthermore,
the
and
of a performance
measure
interesting
for
the
a necessary
performance
only
simple
when
estimators
biasing.
importance-sampling
is unbalanced.
this
produces
an
measures
Although
estimate
system
In
is
of performance
measure
Center
I.e., it is
when
using
biasing.
and
Subject
Descriptors:
C.4 [Computer
Systems
Organization]:
Performance
of
Systems—reLLabl lLty, aL,aLlabLhty, and serllceabil@y;
G.3 [Mathematics
of Computing]:
Probability
and Statistics—probabdLstzc
algorithms
(zncludmg ilfonte C’arlo); statlstzcal computzng;
and Modeling]:
Simulation
Output
Analysis
1.6.6 [Simulation
General
Terms:
Additional
Algorithms,
Key Words
systems,
importance
Experimentation,
and Phrases:
sampling,
Reliability,
Balanced
hkelihood
failure
ratios,
Theory
biasing,
simple
gradient
failure
estimation,
highly
rehable
biasing
1. INTRODUCTION
There
is
systems
an
increasing
and
demand
communications
is thus
of utmost
Author’s
address:
importance
IBM
for
systems,
systems,
that
to be able
Research
Divmion,
such
to predict
T. J Watson
as
have
high
transaction-processing
levels
during
Research
of reliability.
the
Center,
design
Yorktown
It
stage
the
Heights,
NY
the copies
are
10598.
Permission
not made
of the
to copy without
or distributed
publication
Association
specific
its
for Computing
date
of thm
commercial
appear,
Machinery.
and
material
1s granted
advantage,
the ACM
notice
is ,gwen
To copy otherwise,
that
provided
copyright
copying
or to republish,
that
notice
and the title
is by permission
requires
of the
a fee and/or
permission.
@ 1994 ACM
ACM
and
fee all or part
for direct
1049-3301/94/0100-0052
Transactions
on Modeling
$03.50
and Computer
Simulation,
Vol
4, No, 1, January
1994, Pages 52-88
Simple Failure-Biasing
reliability
of a system,
acceptable
system,
level.
and
with
and
state
of models
practical,
and
standard
tion
so
the
One
for
arises
of any
such
an
is
dynamics
The
(i.e.,
so as to make
frequently.
effective
event
simulated
under
mates,
we must
can
identify
to the
using
critical
typically
are
simulation.
use
large
methods
of any
rareness
see
this
probability
of interest
she
of
reduction
sampling;
is known
of the
not
However,
variance
reduc-
system
failures.
techniques
to be used
systems.
behind
the
at an
model
numerical
the
variance
reliable
idea
perform
Due
the
53
will
or
to
of
basic
underlying
he
without
because
importance
the
This
it,
.
a stochastic
tradeoffs.
resort
simulation
for
system
of systems,
must
proposed
method
overview.
types
inefficient
actual
constructs
design
designer
(i.e.,
need
simulations
the
analyze
of these
simulation
is
to
different
the
techniques)
Hence,
in
explore
that
a designer
methods
components
spaces
to ensure
Typically,
Method
Glynn
and
approach
distributions)
(in
our
as a “change
case,
to
system
the
failures)
since
the
[1989]
change
governing
of measure”
a different
probability
multiply
our estimator
Iglehart
is
the
system
occur
system
more
is now
measure.
To obtain
unbiased
estiby a correction
factor
called
the
likelihood
ratio. The proper
choice of importance-sampling
distribution
can
lead to a large variance
reduction.
However,
selecting
an appropriate
change
of measure
is
simulated,
not
we may increase
to determine
For
reliable
simple
Thus,
available
failure
it
depends
the thrust
on
biasing
there
as
(Lewis
system
being
is chosen,
in this
area is
distributions.
systems,
known
the
distribution
of the research
importance-sampling
Markovian
techniques
include
since
importance-sampling
the variance.
appropriate
highly
sampling
straightforward
and if an unsuitable
and
is
a class
failure-biasing
Bohm
of importanceschemes.
[1984],
Goyal
These
et al. [1992],
Shahabuddin
et al. [1988]),
bias2 failure
biasing
(Goyal et al. [1992]),
balanced failure
biasing
(Shahabuddin
[1994]
and Goyal
et al. [1992]),
and
failure
distance biasing (Carrasco
[1991; 1992]). The basic idea behind each of
these
methods
transitions,
is as follows.
the
probability
From
any
state
of choosing
having
a failure
both
repair
transition
and
over
failure
a repair
transition
is very small under the original
dynamics
of the system. Thus, in
failure
biasing,
we increase
the total probability
of a failure
transition
to 8,
where
O << 8 < 1 and
1 – 8. The various
decrease
the
failure-biasing
total
probability
schemes
differ
of a repair
in the
way
transition
they
to
allocate
8
to the individual
failure
transitions.
In this article we provide a mathematical
analysis
of the simple failure-biasing
method,
from which we will gain insight
into other techniques.
Shahabuddin
[1994] was the first to study mathematically
the asymptotic
properties
of performance
measure
estimates
obtained
using failure-biasing
methods.
In particular,
Shahabuddin
proved
that
when
using
balanced
fail-
ure biasing,
certain
performance
measure
estimates
always
have bounded
relative
error (i.e., for a fixed number
of samples,
the expected
width
of the
relative
confidence
interval
for the performance
measure
estimate
remains
bounded
as the failure
rates of the components
tend to zero). Shahabuddin
also showed that if the system being simulated
is balanced
(i.e., all of the
failure
rates of the components
are of the same order of magnitude),
then
ACM
Transactions
on Modeling
and Computer
Simulation,
Vol. 4, No. 1, January
1994.
Marwn K. Nakayama
54
.
using
simple
failure
biasing
performance
measure
ple
that
showing
unbalanced,
failure
rates
Since
tors
if simple
then
the
vanish.
balanced
that
have
balanced
failure
example
and
will
estimate.
give
rise
failure
relative
biasing
always
relative
error
biasing
is a more
that
is applied
may
failure
yields
without
method.
simple
which
biasing
However,
estimadoes not,
we show
biasing
is
as the
measure
failure
failure
for the
an exam-
bound
performance
simple
error
devised
to a system
increase
and
robust
when
relative
Shahabuddin
biasing
error
bounded
empirically
to bounded
Furthermore,
both
by
rise
to
does give
bounded
relative
error, it sometimes
yields a smaller
variance
than balanced
failure
biasing.
Thus, in certain
contexts,
simple failure
biasing
may be more
appropriate
than balanced
failure
biasing,
and this is an important
part of
our motivation
for now studying
simple failure
biasing.
It was not previously
established
exactly when simple failure
biasing
leads
to bounded
article
relative
we develop
system
that
error
for
the
a necessary
characterizes
when
error for the performance
condition
is that in order
performance
and sufficient
simple
failure
of paths
needed
to be considered
estimate.
In
on the structure
biasing
measure
estimate.
One
to determine
if simple
bounded
relative
error, we must examine
not
system failure
but also some secondary
paths
number
measure
condition
yields
this
of the
bounded
relative
interesting
aspect of this
failure
biasing
will give
only the most likely
paths to
to failure
as well. (The total
is finite.)
This
is in contrast
to what
others
have
methodology.
Shahabuddin
encountered
when
analyzing
the
balanced
failure-biasing
For example,
by considering
the most likely
paths to failure,
[1990,
Section
2.5.1] derived
bounds
on the variance
of the
performance
measure
Nakayama
theorem
using
[1991]
some
current
when
failure
sense
setting
using
the idea
and used it to establish
balanced
In
estimate
formalized
balanced
of “most
results
about
failure
likely
paths
derivative
biasing.
Also,
to failure”
estimators
into
a
obtained
biasing.
the
need
illustrates
to
the
examine
secondary
difference
between
paths
the
to
failure
in
our
importance-sampling
distributions
we consider
here and those used to simulate
large-deviationstype rare events, such as buffer overflows
and excessive backlogs
in queuing
networks.
Specifically,
in the large-deviations
context,
the optimal
(i. e., minimum variance
subject to certain
constraints)
importance-sampling
distribution is selected solely with regard to the most likely path to failure;
e.g., see
Cottrell
et al. [1983] and Parekh
and Walrand
necessary
and sufficient
condition
for simple
must
take
into
account
importance-sampling
this article.
We also consider
the components
the
scheme
estimating
using
secondary
for
the
paths
types
derivatives
the likelihood
[1989].
failure
ratio
On the other hand,
biasing
shows that
to failure
of reliability
with
respect
method.
(For
when
designing
systems
to the failure
details
our
we
an
studied
in
rates
on the
of
likeli-
hood ratio method
of derivative
estimation,
see Glynn
[1986],
Reiman
and
Weiss [1989], and Rubinstein
[1989].) We establish
a necessary
and sufficient
condition
that characterizes
when simple
failure
biasing
will give bounded
relative
error for the estimate
of a derivative.
The condition
for the perforACM
Transactions
on Modeling
and Computer
Simulation,
Vol. 4, No
1, January
1994
Simple Failure-Biasing
Method
.
55
mance measure
differs from that for the derivatives.
In fact, we show by an
example
that the condition
for one derivative
may hold whereas the condition
for the performance
measure
does not. Thus, by using simple failure
biasing,
we may actually
mance measure
derivatives
obtain better estimates
for a derivative
than for the perforitself. However,
we prove that if we can estilmate
all of the
with
performance
bounded
measure.
balanced
failure
proved that with
relative
This
biasing
balanced
with
respect
with
the same relative
to any
error,
situation
then
contrasts
we
that
can
also
which
do so for
arises
when
the
using
to estimate
derivatives
since Nakayama
[1991]
failure
biasing,
we can always estimate
derivatives
component
error
failure
rate
with
bounded
as the performance
relative
measure
error
and
estilmate.
We also establish
a number
of other conditions
which are used to further
analyze
the derivatives.
Finally,
we apply all of our conditicms
to examine
several
examples
to give a characterization
of the simple
failure-biasing
method.
The rest
of the article
mathematical
the basic
model
concepts
failure-biasing
simple
examples
to motivate
some tables
Section
5 contains
as follows.
reliable
of importance
and balanced
4 we present
article.
is organized
of a highly
In Section
Markovian
sampling
and explicitly
failure-biasing
methods.
why
simple
which
the
we study
summarize
analysis
2 we describe
system.
Section
describes
We then
failure
all of the
the simple
examine
biasing.
results
of estimating
the
the
3 reviews
some
In Section
shown
in this
performance
mea-
sure using
the simple
failure-biasing
method.
In Section
6 we examine
estimators
of the derivatives
using
both standard
simulation
and simple
failure
biasing.
We also consider
several examples
in order to complete
our
characterization
of simple failure
biasing.
results
from simulating
a large computing
and
some
placed
2.
directions
for
future
research
Section 7 presents
system. We state
in
Section
8. All
some empirical
our conclusions
of the
proofs
are
in the Appendix.
MATHEMATICAL
The following
MODEL AND NOTATION
is a modification
of a model
originally
developed
by Shahabud-
din [1994]
to study performance
measures
of highly
reliable
systems
and
later adapted
by Nakayama
[1991] to analyze
derivatives.
We assume that
the system is composed of C different
types of components,
labeled
1,...,
C,
with n, components
of type i. We let N E z:= ~ nl, which is the total number
of components
of all types
in the system.
As time
evolvles, components
randomly
fail and are sent to a repair
facility
to be repaired,
where the type
of service
A state
each type
discipline
for the failed
x of the system
and
we let S be
Our analysis
state space.
operational
S = U U F,
keeps
any information
components
track
about
is arbitrary.
of the number
the
queuing
of failed
components
at the repair
facility,
of
and
the state space of all such states, where we assume that IS I < ~.
in the following
sections is independent
of the actual form of the
We define n,(x) to be the number
of components
of type i to be
when the system is in state x. We decompose
the state space as
where U is the set of states for which the system is considered
ACM
Transactions
on Modeling
and Computer
Simulation,
Vol.
4, No.
1, January
1994
56
Marvin K. Nakayama
.
operational,
state
and
with
all
states.
The system
starts
operational,
and
we assume
that
of one component
ously.
More
precisely,
E {y
let S,(x)
states y which
state
set of failed
if x @ U and y = E with n,(y)
for the possibility
of component
assume that
We allow
failure
F is the
components
x, with
x. Suppose
causes
consider
nj(y)
E S:
have
for all
failure
other
one more
component
i fails
of type
of type
when
simultane-
state
x = S, and
which is the set of
of each type failed as in
< n~(x)
a component
O, the
y = U.
i.e., the
to fail
i and
type
for all j, n,(y)
<
as many components
i, then
propagation;
components
some component
at least
at least
some
> n,(x)
in state
O = U. We also
n,(x)},
i failed
the system
in state
y than
is in state
in
x, and let
x, i) be a probability
mass function
on S,(x). Then the system immedip( y; %, i). Thus,
nj( x) – nJ( y)
ately enters state y ● S,(x) with probability
components
of type j, 1 < j < C, fail on the transition
caused by the failure
of
i triggered
the
the component
of type i. In this case, we say that component
P(.;
transition
(x, y). This situation
may occur, for example,
system, where the failure
of a processor
contaminates
the
the disks to fail simultaneously.
Similarly,
repair
on
component
nent
we
allow
for
more
than
one
is made
is replaced
the
possibility
component
up of a number
when
enough
that
at
the
repair
a time.
This
of subcomponents,
in a computing
clata and causes
facility
may
completes
occur
if
and the entire
of the subcomponents
fail.
However,
some
compo-
we do not
allow for some components
to complete
repair and other components
to fail at
the same instant.
(For example,
this may happen when the re]pairperson
fixes
some component
but breaks
something
else
component
in the system.)
We define a transition
(x, y) to be a failure
when
replacing
the
if n,l( y)
transition
repaired
for
< nj( x)
< nh(x)
for some k. We use the notation
“y ~ x“
l,...,
C, with nk(y)
y) is a failure
transition.
Furthermore,
we define (x, y ) to be a repair
allj=
if (x,
if nj(y)
transition
> nj(x)
We use the notation
When the system
distributed
transition
,C, with
“y < x“ if ( x, y) is a repair
is in state ~, the components
lifetimes
(x,
for all j = 1,...
with
y) with
rate
rate
A,(x)
>0.
nk(y)
> n,k(x)
transition.
of type i have
Also,
the
W( x, y) z O. We model
system
for some k.
exponentially
makes
the evolution
a repair
of the system
as a continuous-time
Markov
chain Y = {Y(s):
s > O} on the state space S.
The infinitesimal
generator
matrix
Q = {q( x, y): x, y E S} of Y is given by
X~.lnk(X)Ak(X)p(Y;
q(x,
Y) =
[
for x #y,
and
q(x,
We
denote
X = {X.
{l’(x,
Y):
ACM
Transactions
X,
the
(1)
otherwise
embedded
: n > O}. The
on Modeling
>X
o
x) = E y+.
Y = S’},
ify
ify<x
q(x,
q(x)
by
Z, k)
W(X, y)
where
y). We let
=
–q(x,
discrete-time
transition
F’(x,
and Computer
matrix
y) = q(x,
Simulation,
x).
Markov
of
X
y)/q(x)
chain
is
then
for
Vol. 4, No. 1, January
(DTMC)
given
x #y,
1994.
of
by
and
Y
P =
I’(x,
Simple Failure-Biasing
x) = O. We
possible
define
r = {(x,
transitions
We assume
that
components
our
(i.e.,
the
repair
rates). (High
high redundancies.)
assume
that
where
~,(x)
where
is the
set of
highly
reliable
system
failure
is composed
rates
are
of highly
much
smaller
reliable
than
the
> 0 and
we allow
rates
b,(x)
P(”;
~>
of the components
> 1 are
of type
independent
x, i) to depend
~) =P.(Y;
~, i) =
of
i are
E, and
b,(x)
on e; i.e., for all (x,
is integer
y) E r
such
y)6d’(’’y’,
C,(x,
dl(x,
y) > 0 is integer
valued,
c~(x, y) > 0, and ZY~ sL(z) P, (“x,
Y,
that repair
rates, W( x, y), are independent
of c. We will
the behavior
of the system as E -+ O.
examine
We say that
order
the
of magnitude
1) and
assume
the
system
pe( y;
without
f(e)/E~
express
larly,
-
x,
i)
are
independent
loss of generality
0 as E -
f(E)
that
f(e)
of E. In this
we use the
f(e)
~ is O( E~)
= @(E~) if we can
c # O is independent
express
f(~)
we can
is not balanced,
d, a function
notation
as E ~ O, where
if we can
situation,
1. If the system
b =
same
= b for some
of E. Simi-
= cez + o(~z)
as E -+ O,
of E. Finally,
we say .f( E) = Q( Ed ) if
~ > d is independent
flE) = CEZ + O(EZ) as E - 0, where
c + O and ~< d is
where c + O and
we can express
independent
= O(E~)
A,( x, E) are of the
x E S, b,(x)
For some constant
0. Also,
= Ced + o(E~)
we say
if all values
is balanced
(i.e., if for all 1 < i s C and all
we call it unbalanced.
We now make some definitions.
if
y) > O}, which
1. We assume
=
b >
y c S, P(x,
57
reliability
for the system can also be achieved
by having
We model this by introducing
a rarity
parameter
E and
P(Y;
i)
x,
.
can make.
component
the failure
valued. Also,
that y > x,
y):
the system
Method
of e.
We define
and so q(0)
= @(e~o).
For any
s(x)
-
x = S, we define
min
{b, (x):
nt(x)~,
(x)
> O}.
I<isc
Note that S(x) is the exponent
of the order of magnitude
of the total failure
rate out of state x, and so s(x) = d if ZY ~, q(x, y) = @(e~). Furthermore,
s(0) = bo. For any (x, y) = r, we define
b(x,
y)
‘{
—
minl<,<c{b,(x)
o
which
is the
(x, Y). Thus,
y)>lify>x.
+ d,(x,
y):nl(x)~,(x)p,(y;
x,
z) > O}
ify
‘-
exponent
for
ACM
if y xx
any
of the
(x,
TransactIons
order
y) E r,
on Modeling
of magnitude
b(x,
and
y)
=
Computer
d
of the
if
q(x,
Simulation,
rate
y)
Vol.
<x’
of a transition
= @(ed)
4, No.
and
1, January
b(x,
1994.
58
.
Marvin
To prove
considered.
K Nakayama
our results,
we need to assume
(Al)
The DTMC
(A2)
For each state x E S with
andy
~ x.
x + O, there
(A3)
For each state z G F such
that
Assumption
from
any
X is irreducible
some structure
A2
states
state
that
for
exists y G S such that
(O, z) G r,
(x,
than
This
is at least
one repair
x # O, q(x)
= @(l).
A3 guarantees
that transitions
to a failed state have transition
E~“, which is the magnitude
ensures that system failures
when E is small.
From
these assumptions
certain
form.
For
transition
This
P(x,
y)
which
rates
elements
of the
possible
implies
that
P(x,
hold,
will
all
y) =
then
work
take the system from
that are much smaller
of the largest
transition
are rare events for the
the
any (x,
y) G r
q(O, z) = O(E ~“).
failure
transitions
(x, y) with
x + O have transition
probability
(3( e ~(” y “). Assumption
A2 is a critical
assumption.
If it does not
none of the failure-biasing
methods
discussed
in the introduction
well; see Juneja
and Shahabuddin
[1992].
Assumption
state O directly
being
over the state space S.
there
x # O. Hence,
on the model
rate from
embedded
transition
matrix
state O.
DTMC
have
a
y) = r,
=
[
(3(1)
ifx+Oandysx
@( Eb(’’y))
ifx+Oandy>x,
@(e
ifx=Oandy>x
b(x, Y)- b”)
(2)
as c-O.
For
hitting
any
set
time
of states
to the
A
set
A
c S,
for
the
define
~~ = inf{n
DTMC.
We
>0:
X.
concentrate
~ A},
which
is
the
on estimating
which
is of interest
for several
reasons.
First,
we can express
the mean
time to failure
as MTTF
= </y,
where
< = -E[ ~~~~ -1 l/q( X~ )1 and 7~,n =
min{n
> 0: X. e {O, F}}; see Goyal et al. [ 1992]. Also, let U(t) be the unreliathat the syst;i.e., U t ) is the probability
bility of the system for time horizon
tem
fails
before
time
t.Then
Shahabuddin
and Nakayama
[1992]
prove
that
(1 – e-g(() )yf)/U(t)
~ 1 as E -0,
where t = @( E-’f)with rt >0.
Shahabuddin
[19941 showed that if Assumptions
A1–A3
hold, then there
exists some constant
r > 1 (which
depends
on the model being considered’)
such that
y=@(Er).
(3)
We define
A={(xO,
(x,,
which
ACM
is the
Transactmns
. . ..).
n>l,
xO=O=
x,+l)=rfor
O<i
set of sample
on Modehng
paths
and Computer
O,xm~F,x,@{O,FlfOrlSi
<7z7
<n},
of the
embedded
S]mulatlon,
Vol.
DTMC
4, No.
1, January
for which
1994.
rr
< 70.
Simple Failure-Biasing
Furthermore,
we define
A~=((xO,...,
x~)=A:n
which
is the
(under
the original
>1,
set of sample
P{(XO,...,
paths
measure)
for
which
of the order
A=
where
failure”
X,~)
Method
.
59
=(xO,...
, Xn)}
= @( E~)),
~~ < To and
have
probability
Em. Note
that
U~=,A~,
the r is as defined
in (3). We call
and any path ( XO, . . . . x.) = An
(4)
A, the set of “most likely
with
m > r a “secondary
paths
path
to
to
failure.”
For any sample
path
(xO,...,
x.)
@ A, we define
n–1
Xn) =
f(xo,...,
~
l{X k.+1 ~xh}s(xk),
k=O
which
failure
failure
is the sum of the exponents
of the orders of magnitude
of the total
rate out of each state x~ along the path such that ( x~, x~. ~) is a
transition.
The reason for introducing
f( X., . . . . Xn ) will become more
apparent
in the
always true,
later
sections.
Note
that
f(x~,...,
as s(x)
> 0 for all
since
xl
> XO = O is
(5)
x = U. In a balanced
system,
f(xo,
...,
x~ ) is the number
x.).
we use the
following
(Xh)P(X~;
xk.~,
additional
Let
71 =inf{k
which
failure
and
xn)>bo
of failure
transitions
along the path (x 0, ...,
For our results
on derivative
estimation,
notation.
s(0) = b.
> O:X~
>Xh–l,
rzl (X~.l)A,
is the first
failure
transition
which
of a component
of type i. Nakayama
constants
r, > r and
i) >01,
may have been triggered
[1991] showed that there
by a
exists
7, > r such that
P{T,
< ‘TF <
To} =
@(Er’)
(6)
and
P{T~ < min{~,,
If P{~~ < min{~l,
I-o}} = O, then
in Nakayama
[1991] that
Xn)
forsomeo
is the set of sample
ACM
it was established
7,} = r.
(8)
the set
A’={(xO,...,
which
(7)
we set ?, = ~. Additionally
min{r~,
Also we define
To}} = @(Er).
Transactions
f=
A:n
> 1, n~(x~)~~(xk)P(x~+l;
s k < n such that
paths
~k+l
of the embedded
on Modeling
and
Computer
‘k,
Z“)>0
> xk},
DTMC
Simulation,
for which
Vol.
4, No.
~, < ~~ < To.
1, January
1994
60
Marvin K. Nakayama
.
Similarly,
which
we define
is the
min(~o,
set of sample
~,). Furthermore,
AZ=”.
n.,
of the
embedded
A’ n A ~ and Zn
= u“ n. ~, A’n, where
and~
A’n
paths
let A’~ ~
r, and
DTMC
= 1
for
which
n A ~, and
r~ <
note
7, are as defined
that
in (6) and
(7), respectively.
3. BACKGROUND
Let (Q, @
AND MOTIVATION
be the probability
the probability
measure
Letting
1{.} denote the
which
ion.
leads
to the
We first
generated
estimate
following
collect
using
space on which
on (Q,
indicator
approach
n i.i.d.
to estimating
samples
the original
probability
measure
1
of l{r~
as e -+ O. We define
relative
given
normal
level
better
so it
error
interval
more
One
such
RE
number
to be the expected
1 – q!I/2 quantile
to estimate
is
reduction
importance
y
using
technique
to
sampling,
such that
=
where
ACM
L(O)
Transactions
~ L(oJ)P’(0)
OJEL!I
= P( 0)/P’(
on Modeling
co), which
and
Computer
n and a
of a stan-
for n fixed,
(Q, =) for which
note that
for all
P is
of samples
set of events
P‘
measure
>0
are
= @(E’)
of our estimator
for a fixed
difficult
method
Ih
the point
measure
now describe.
Recall that
A is the (measurable)
1{7~ < TO}(w) = 1. Consider
another
probability
P’(I?)
simulatthe
we form
probability
@(eZ’)
–
as system failures
become rare.
we must
utilize
some variance
estimator.
(l).
To}],
,1
ZV be the
we see that
becomes
y by standard
P. Then
the original
1 – @. Letting
distribution,
~ ~ O, and
simulation
Hence,
the relative
P denote
gh~l~k,
72 = @(e’)
of its confidence
confidence
dard
as
width
=
< 7.} under
(T2 = y–
and let
~1, . . . . ~~ of 1{~~ < 1-0}, where
?(n)
and the variance
X is defined,
9) induced
by the Q-matrix
given in
function,
we can write
y = E[ 1{7~ <
B c A, B •~,
= E’[l{TF
is known
Simulation,
standard
obtain
which
a
we
w for which
defined
on
P(B)
>0.
Then,
< TO}L],
as the
Vol.
4, No.
Radon-Nikodym
1, January
1994.
deriva-
Simple Failure-Biasing
or simply
tive,
the
the likelihood
probability
Glynn
measure
and Iglehart
Equation
follows.
using
ratio,
P‘.
[1989]
(9) forms
Generate
basis
i.i.d.
the probability
E‘ is the expectation
See Hammersley
for more
the
n
and
and
of imQort~nce
P‘.
.
operator
Handscomb
61
under
[1964]
and
details.
samples
measure
Method
(11,
sam@ing,
-Ll),...
Then
, (In,
our new point
which
we apply
L.)
of (l{Tf
estimate
is
as
< 7.},
L)
In
?(n)
and the variable
of 1{~~ < ~O}L under
(7
The
goal
is
ensuring
to
that
;k~lfkik?
=
choose
so that
the variance
measure
when
E’[ l{TF< TO}L2] << E[l{~F
using
importance
than that for standard
simulation.
Now we describe the simple failure-biasing
sampling
method
repair
transition
biasing
is 0(1),
as was
is to increase
8< 1, of a failure
failure
transitions
shown
the total
of importance
transition
from
in proportion
in (2). The basic
probability
any state
We do not alter
idea
sampling.
behind
to ti, where
transition
to
in proportion
the transition
probabilities
x ● F. In some sense the simple
failure-biasing
smaller
of any failure
probability
of a
simple
8 = @(1) with
x, and then allocate
8 to the
to their
original
probabilities.
decrease
the total
probability
of a repair
1 – 8 to the individual
repair
transitions
probabilities.
< TO}], thus
is much
Under
the original
probability
measure
P, the probability
and the
transition
from some state x ● U, x # O, is 0(e),
failure
P‘ is
E’[l{TF < ~o}L2]– y2.
‘2 =
P‘
the probability
individual
Also, we
1 – 8 and
to their
from
state
method
allocate
original
O or from
is a natural
way of implementing
importance
sampling
since it preserves
the underlying
structure
of the system. A more precise description
of simple failure
biasing
follows.
For any state x = S, define F(x)
= Z,> ~ P(x,
z) and R(x) = Zz. . P(x,
z). Note
x, and
construct
that
R(x)
is the total
the transition
transition
(i) For
F(x)
is the total
matrix
(x,
y)
probability
probability
matrix
P using
failure
transition
transition
biasing
from
from
from
x. We
the original
algorithm.
@ r,
For (x,
y) = r and
(a) With
x = O,
y)
=
o;
x = U,
P’(x,
(b) With
a failure
a repair
P‘ for simple
the ensuing
P’(x,
(ii)
of taking
of taking
y)=
P(x,
y);
x # O,
i3P(x,
P’(x,
y)
=
ify>x
y)/F(x)
(1 – i3)P(x,
y)/R(x)
Transactions
on Modeling
(lo)
;
otherwise
[ o
ACM
if y <x
and
Computer
Simulation,
Vol.
4, No.
1, January
1994.
62
Marvin
.
(iii)
K. Nakayama
y) ● r and
For (x,
x 6 F,
P’(.z,
y)=
P(x,
y).
Extensive
empirical
work suggests that 8 should be chosen such that 0.5<8
< 0.9; see Lewis and Bohm [ 1984], Goyal et al. [ 1992], and Shahabuddin
et
al. [1988]
for further
are the same
(ii’)
details.
as above,
but
For (x,
y) = r and
(a) With
x = O,
(For
balanced
With
y)
=
l/rzF(x);
ify>x
t5/n F(x)
y)
=
(1–
[
ble
n~(x)
from
failure
(i) and (iii)
x + O,
P’(x,
where
items,
to
x ● U,
P’(x,
(b)
failure-biasing
(ii) is changed
= Z.Y.
state
. 1{( x,
x.
See
(x,
y)
8) P(x,
y)\R(x)
ify<x
o
;
otherwise
● r}
y)
is the
Shahabuddin
number
[ 1994]
for
of failure
further
transitions
details
on
possibalanced
biasing.)
Note
that
for
= r
x G U,
with
@(e
bf.,Y)-.t.
))
if
y
>~
P’(x,’y)=
{
(11)
ify<x
@(l)
= @( El{ Y~l}(b(l.Y)-s(z)))
under
Let
simple failure
biasing.
cr’z be the variance
of 1{~~ < ~O}L under
simple
failure
(4), we obtain the following
representation
of the second
will aid in our analysis
th~oughout
the article:
E’[l{TF
biasing.
moment
term
Using
which
< 70}L2]
= E[l{TF
< TO}L]
(12)
=:
~=r(~O,..
z
,Z.)-~~
L(xo,...,
,X,,
xn)P{(xo,
,Xn)}.
)=(xo,
n>o
We define
RE’
=
6%
(13)
Z,b
Y’
which
is the
Shahabuddin
the proof.
ACM
TransactIons
relative
[1994]
on
error
proved
Modeling
and
of our
simple
the following
Computer
Simulation,
failure-biasing
result;
Vol
estimator
see Shahabuddin
4, No.
1, January
1994.
of y. Then
[1994]
for
Simple Failure-Biasing
PROPOSITION
simple
biasing,
Shahabuddin
RE’
showed
biasing
remains
failure
mance
return
measure
estimate.
Example
to Example
1 and Example
Example
1.
bounded
by example
simple
may
simple failure
biasing
not for others.)
the system
If
1 (Shahabuddin).
failure
leads
Consider
that
not lead
Method
is balanced,
then
when
using
if the
system
is not balanced,
then
relative
error
for the perfor-
1 below also demonstrates
2 below later to gain insight
a system
relative
which
63
as e + 0.
to bounded
to bounded
.
error
has
this. (We will
into why the
for certain
three
types
models
but
of Components
(i.e., C = 3), where
the first
two component
types have a redundancy
of
two (i.e., nl = n2 = 2), and the third
type of component
has a redundancy
of one (i.e.,
e (i.e.,
n~ = 1). Also,
b ~ = bz = 1),
the
and
components
the
of type
component
of
1 and
type
2 have
3 has
failure
failure
rate
●2
rate
There is a single
(i.e., b~ = 2). Thus, b. = 1, and the system is unbalanced.
repairperson
who repairs
components
at rate 1 using
a processor-sharing
discipline.
In our situation,
it is sufficient
to define
the state
of the system
to
be x = (xl,
Xz, X3), where x, is the number
of failed components
of type i.
Initially,
all components
are operational,
and the system is considered
to be
operational
if
operational.
We assume
diagram
ties,
and
of this
and
lumped
only
model
Figure
if
there
there
with
2 is the
all of the states
the
same
with
is
at
least
one
component
of each
type
is no failure
propagation.
Figure
1 is a state
arcs having
the original
transition
probabili-
when
simple
using
X3 = 1 into
the single
failure
biasing.
state
( xl,
We have
X2, 1) = F for
all x ~ and x ~ since the system fails as soon as the component
of type 3 fails.
Also, we have omitted
the arcs from state ( xl, Xz, 1) since these repair
transitions
do not play
It is easy to see that
a role in our analysis.
y = 8(~),
and so r = 1. Now
we examine
the variance
of the estimator
of y obtained
using simple failure
biasing.
Consider
((0, O, O), (O, O, 1)) = A. From (12), the second moment
satisfies
E’[l{TF
<
the path
~o}L2]
z
(.XO,...,
O,. ... x.)}
... X,F)=(X
, XJP{(XO,.
L(xo,...
)=A=A
n>o
>
Thus,
L((o, o, o), (o, o, l))
u ‘2 = E’[1{~~
1)) – W 62 ) = (0(e).
< ~o}L2]
It then
– y2 > -L((O, O, O), (O, O, l)) P((O,
follows
20
RE’=z
as e when
0. Hence,
the system
ACM
L>—
*yG
simple
o, o), (o, o, 1))
P((O,
K
failure
that
for a fixed
number
O, O), (0, 0,
of samples
n,
(3(E1/2)
@(.)
biasing
= &@(’-’/2)
may
+Cc
not give bounded
relative
error
is unbalanced.
Transactions
on Modeling
and
Computer
Simulation,
Vol.
4, No.
1, January
1994.
64
.
Marvin
K. Nakayama
I
Fig.
1.
Now
we
Nakayama
Transition
diagram
examine
[ 1991].
Example
now change
another
for
Examples
1 and
example,
2 with
which
original
was
transition
probabilities,
previously
2. Consider
the same system as in Example
the failure
rate of the third type of component
analyzed
in
1 except that we
to ~ 3 (i.e., b3 = 3).
Thus, the system is still unbalanced.
Figures
1 and 2 are state diagrams
of
this model with the arcs having
the original
transition
probabilities
and the
simple failure-biasing
transition
probabilities,
respectively.
It is easy to see that y = ~ + O( e), and so r = 1. Now consider
the path
contributed
@(~) to the second moment
in
((O, O, O), (O, O, 1)) ● A, which
Example
1, thereby
giving the unbounded
relative
error. Note that now
L((o, o,o), (0,0, l)) P((O, O,O), (0,0, 1)) = ;
and in fact, we can show that E’[1{~~
W e 2). (Recall that 8 is the parameter
ACM
TransactIons
on
Modeling
and
Computer
+
0(E2)
= @(eZ),
< ~O}L2] = ((12 + 8)/46)~2
+ o(~2) =
used in failure
biasing.)
It then follows
Simulation,
Vol
4. No.
1, ,January
1994
Simple Failure-Bias!ng
that
Transition
diagram
a’2 = ((12 + 8)/(48)
for Examples
– l)e2
1 and 2 under
+ o(~2)
which remains
bounded
as ~ ~ O.
Nakayama
[1991]
showed that the
using balanced
failure
biasing
is
.
simple
failure
Transactions
on Modeling
and
\
biasing.
and
variance
of the
estimator
of -y when
Thus,
although
the variances
obtained
using
simple
and balanced
biasing
are of the same asymptotic
order of magnitude,
the variance
using simple failure
biasing
has a smaller
constant
for the lowest-order
in its asymptotic
expansion.
ACM
65
)\
I ( ‘ .DvH
//(’}
Fig. 2.
Method
Computer
Simulation,
Vol.
4, No.
1, January
failure
when
term
1994.
66
Marvin K. Nakayama
.
The
importance
certain
of Example
models,
balanced
failure
simple
failure
biasing,
biasing.
relative
Also,
error
several
even
questions:
2 is twofold.
failure
biasing
thereby
it shows
if the
why
simple
being
does simple
why
failure
demonstrates
smaller
that
variances
we want
biasing
considered
failure
it
yield
motivating
that
model
First,
may
to study
may
give
is unbalanced.
biasing
work
well
for
than
simple
bounded
This
raises
for Example
2
but not for Example
1? For a given model can we determine
if simple failure
biasing
will give bounded
relative
error?
In the following
sections we will develop conditions
on the structure
of the
model
to determine
when
simple
failure
biasing
will
and will
not give
bounded
RE’.
4. SUMMARY
In
this
when
OF THE RESULTS
article
the
bounded
we define
relative
derivative
conditions
performance
measure
error
by using
estimation,
we impose
which,
or its
for
a given
derivatives
simple
failure
some
model,
can
biasing.
additional
(For
being
(co’)
f(xo,...,
1,
(Ci)
r, =r
(Ci’)
If
with
results
on the
restriction
is that
we do not allow
components,
and so b,(x) ===b, for
below,
are in terms
of the asymptotic
given
system
the
structure
see Section
6. L One
failure
rates
for the
conditions,
characterize
be estimated
on
system;
state-dependent
all x E S.) The
structure
of the
considered.
x~)>2r
—rn+b
or b, =bO
Ofor
all (x O,. ... )cA~,
or ?, – 2b0 <r,
r<rrzrz
S2r
—
used
to
–2b,.
r, – b, s FL – bO, then
f(xo,...,
xn)
2r,
–m+b0
forall(
2
2r,
[
S2r,–1
–m+3b0–2b,
forall(
If r, – b, >7,
f(x(,,...,
xO, . . ..)~A~n.rl<m<m
xo, . . ..x.
)~~n,
F,<rn
S2r,
-2
b,+2bO+l
+2
b,–2b0–l
– bo, then
xn)
2FL–m–bo+2b1
forall(xO,
>
27, –m+b
[
(CSi)
The
ri<m<27,
o
x~)~E~,FL5m527,—
forall(xo,...,
1
r, = r or b, = bo.
conditions
which
are
characterize
simple
failure
condition
to show a result
ACM
. . ..x~)~ALn.
Transactions
on
Modeling
and
labeled
with
a prime
(i.e.,
“‘“)
are
ones
biasing.
If there is no prime,
then we use the
about standard
simulation.
Also, if a condition
is
Computer
Simulation,
Vol
4, No
1, January
1994
Simple Failure-Biasing
labeled
the
with
a “O,” then
performance
condition
rate
is used
Basically,
estimate.
to characterize
of component
under
it is an assumption
measure
type
failure
the
needed
it
to establish
is labeled
derivative
with
with
.
67
a result
an
on
“i ,“ then
respect
to the
the
failure
i.
we use Conditions
simple
If
Method
CO’ and
biasing
of certain
C i‘ to ensure
sample
paths
that
the
probabilities
are not too small.
As we
shall see, these conditions
are necessary
and sufficient
for achieving
bounded
relative
error. The first condition
of C i and C Si states that a component
of
a failure
transition
on a most likely
path to failure;
the
type i can trigger
second part stipulates
that the components
of type i have one of the largest
condition.
For more details and
failure
rates. The last part of C i is a technical
insights
on the conditions,
We let
with
RE,
respect
simulation,
Sections
Table
article
denote
to the
and
see Sections
the relative
failure
rate
RE~ denote
gives
of the estimator
of component
the same when
6.2 and 6.3 for the precise
I summarizes
the results
and
5 and 6.3.
error
a fairly
type
using
particular
holds
row and column
the
if and
column
only
headings,
heading
if RE’
and
that
characterization
biasing
method.
The interpretation
of Table I
column
heading
states some condition.
We use
“bounded.”
For each off-diagonal
entry, there is
‘<e “ which means “if and only if”; “ + ,“ which
whi~h means “does not imply.”
The symbol gives
implies
simple
definitions.
and examples
complete
of the derivative
i when
where
is bounded;
failure
biasing.
simple
See
in this
failure-
is as follows.
Each row and
“bald” as an abbreviation
for
one of the following
symbols:
means “implies”;
and “ * ,“
the relationship
between the
“ means
for
“ *
RE~ being
of y
standard
are presented
of the
the “ *
similarly
using
the
the row heading
.“ For
bounded
for
example,
CO’
one value
of i
does not imply that RE’ is bounded;
and RE~ bounded
for all k implies
that
CO’ holds.
Table II gives the justification
for all of the entries in Table I. If an entry in
Table
I is either”=
refers
to a theorem
“ or”=
,“ then the corresponding
entry in Table II either
holds trivially.
If an entry in Table I is “ * ,“
or the result
then the corresponding
ample.
entry
in Table
Some of the most interesting
First,
the Condition
CO’ is
performance
measure
to
have
II refers
to the appropriate
results contained
a necessary
and
bounded
relative
in Table
sufficient
error
counterex-
I are the following.
condition
for the
when
using
simple
failure
biasing.
Since CO’ imposes restrictions
on all sample paths ( XO, ”.., x.)
G Am with
r s m < 2 r — 1, we see that
secondary
paths
to failure
(i.e.,
role in determining
the variance
(.xO,...,
x~)e
A~, m > r) play an important
of the estimator;
see Theorem
2 for further
holds for the derivatives
and Condition
C i‘;
bounded
remains
for some
bounded
details.
A similar
observation
see Theorem
5. Also, if REj is
component
type i, it does not necessarily
follow that RE’
as ~ ~ O. In other
words,
it is possible
to estimate
a
derivative
more efficiently
than the performance
measure
when using simple
failure
biasing;
see Example
3. However,
if we can estimate
all derivatives
with bounded
relative
error when using
also do so for the performance
measure;
ACM
Transactions
on Modeling
and
simple failure
biasing,
see Theorem
7.
Computer
Simulation,
Vol
4, No.
then
we can
1, January
1994.
68
Marvin K. Nakayama
.
Table
==+
System balanced
co’
II
c,
=+
11+
REh/RE
bdd
bdd Vk
RE;
c%’
RE~
bdd
bdd Vk
co’
co’
RE’, RE’
c s,
c,
bdd
a
==+
==?
==+
=+
==$
a
*
=+
1
*
*
+
=+
+
+
=&
*
*
*
1;
I*
I
=+ =+ +
=!4
+
+
I=?l=#l+l*l
*-m
A
+
+
+
+
+
+
-
RE~ bdd Vk
=+
==
==+
+
+
+
.==’
G
+ =+
+
+
*
+
=?+
+,
+’
=+
=$!
==
Cs,
+
==’
==’
==+
==+’
+
==’
*
+
co’, c%
+
==
==+’
=+
==$
+
+
=+
+
+
RE’, RE; bdd
+
==’
===$’
+
+
=+
==+
+
+
.=+
Table
Cond,tion
Systcm
co’
R2
-
R19
R19
R14
R14
R14
R19
R19
R19
R19
RI 4
R14
R14
R19
R4
R15
R1O
R1o
R115
R15
R1o
R1o
R15
I
Rll
I
R16
R16
RIO
R16
R16
R4
1
RI
R9
co’, Cs,
co’, c%
R13
R.
13
RE’, RE; bdd
R13
Rl:
Trivml.
R4:
Theorem
R7:
R1O:
Ii
Rvt
*
R17
R7
)
R7 I R19
I
R19
R19
R19
R1O I R1o I
R5
I—
3
R4
R1
R19
R15
+
m
R2, RI
!
RI<
R19
R] 4
I
R16
I
Rlf
H
R16
I
R16
II
RI E
n
R9
1
R15
R8
R15
R20
R20
R15
R20
R19
R5
Rl
R15
R19
R9
Proposition
R5:
Theorem
5.
Theorem
7.
R8:
Theorem
8,
R9:
Example
1,
Example
1 with t = 3.
Rll:
Example
1 with k = 3.
R12.
Example
2.
R3:
Theorem
2.
R6:
Theorem
6.
RO
R17
1
RI 7
R19
i+%
R2[
R19
1
R13:
Example
2 with a = 1.
R14:
Example
2 with t = 3,
R15
Example
2 with I = 1 and k = 3
R16:
Example
3 with t = 1.
R17:
Example
3 with i = 3.
R18:
Example
4,
R19:
Example
4 with
R20:
Example
5 with
5. ESTIMATING
THE
PERFORMANCE
i = 3
USING SIMPLE
MEASURE
FAILURE
BIASING
We
now
investigate
accomplish
which
failure
ACM
this,
we
do by
biasing.
Transactions
the
we
first
showing
Consider
on Modellng
estimation
must
its
y)
Computer
y
using
a better
connection
(X,
and
of
gain
E r
to
with
Simulation,
simple
failure
understanding
the
likelihood
x G U
Vol.
and
4, N().
of
ratio
biasing.
~( X.,
under
x # O. Using
1, January
1994
H
R17
R1
R2 :
R2, ?.5
R19
R16
R19
4.
t = 3.
I
I
+
R17
RI
1.
==+
+~
R17
R19
RI
R11R31R1
R3
I
R1
R17
R18
~~
R2, RI
R19
R1o
R17
~~~k
R6
R3
C*
RE~ bdd
RE’
bd;
R6
RE, /RE bdd
REj bdd Vk
Ct’
R6
SW
R9
bdd Vk
R4
R12
R17
RE, /RE
bdd
R4
RE’ bdd
bdd Vk
RE, \RE
+’
I
RI
R12
C*’
of Table
=+
*
R2
co’
REk/RE
Justification
bdd
Sy, tem balanced
u
II
c,
RE’
b.danmd
U
RE,/RE
RE; bdd
co’,
I
1
of Results
*
1=+1=+1
&
c,’
Summary
bdd
II balanced \
[
c,
RE’
co’
System
Condition
II
I
To
. . . . x.),
simple
(2) and
Simple Failure-Biasing
Method
.
69
(11), we obtain
P(x,
y)
P’(x,
y)
if y > x. If y < x, then
when
y x x. Since
simple
failure
@( EMx>Y))
both
P(z,
transition
= @(Esf’J)
@( Eb(x.Y)-9(.K))
=
y) and
P(x,
y)
P’(x,
y)
P’(x,
y) are @(l),
and thus,
= @(l)
probabilities
out
of state
O are not
changed
in
biasing,
for all (0, y) = r. It then
P(o,
y)
P’(o,
y)
follows
that
=1
for ( XO, ...,
x.)
G A, the likelihood
ratio
satisfies
L(xO,..
., Xn)=
‘–1
JJ
P(xk,
k=o
xk+l)
P’(xk,
=
(ij(eml{n+l~
2,)
S(X,
))
xk+l)
(14)
since
both
s(0) = b.
the initial
failure
using simple
From (12)
much
the
sum
which
failure
biasing.
we see that the
simple
consider
and
transition
failure
biasing
in the
likelihood
reduces
(xO, . . . , x~) = A. The
definition
has no effect
ratio
the
smaller
is the
second
L(xO,
of f( XO, ...,
x.)
on the likelihood
key
when
to determine
how
moment.
...,
x.)
More
is,
includes
ratio
the
specifically,
less
the
path
( XO,... , x.) will contribute
to the second moment.
From (14) the order of
by /lx O,..., x.). AS we see in the
magnitude
of L(xO, ..., x.) is dictated
following
theorem,
CO’ establishes
the orders of magnitude
for the likelihood
ratio in terms of the /lx O, . . . . x~ ) for each path (xo,...,
x.) that are needed
to achieve
bounded
THEOREM 2.
We
now
restrictions
give
relative
RE’
some
on paths
error.
remains
bounded
insight
of order
into
as
this
c m with
E +
O if and
result.
Note
only
that
if CO’ holds.
CO’
only
r s m s 2 r — 1. The condition
imposes
ensures
that the contribution
of each of these paths to the second moment
is not
for the variance
required
to
greater than e z‘, which is the order of magnitude
achieve bounded
relative
error. It turns out that none of the paths of order
e 2‘ or higher will interfere
with ensuring
a second moment
of order ● z‘, and
proof of Theorem
paths of order e‘ - 1 or lower are not possible. The complete
2 is given in the Appendix.
One interesting
point
about
Theorem
2 is: it
shows
that
to determine
if
simple
mance
failure
failure
biasing
will result
in bounded
relative
error for the performeasure
estimate,
we need to check not only the most likely paths to
(i.e.,
(XO, . . . . xn) G A,)
but
also
some
secondary
paths
(i.e.,
of paths in each IAm 1, m > r,
m > r) as well. The number
(Xo>...,
x~)GAnl,
ACM
Transactions
m Modeling
and Computer
Simulation,
Vol. 4, No. 1, January
1994
70
Marvin K. Nakayama
.
is finite
that
(see Lemma
9(ii)
in the Appendix),
and so the total
number
of paths
must
be considered
is finite
since CO’ only restricts
paths in A ~ for
r s m < 2r — 1. Furthermore,
when
m > 2r, we have 2r — m + bO < 6.,
and by (5), the condition
f( XO, . . . . x.) > 2r – m + 60 used in CO’ becomes
vacuous when m > 2 r.
We now examine
Condition
CO’ more closely. To do so, we will use the
following
result,
PROPOSITION
(i)
(ii)
f’(xo,..
.Mxo,
Now
whose
3.
proof
Forall(
is in the Appendix.
xO,...
– b. + Z~j~l{xh+l
consider
Proposition
L(xo,
Since
~,
.,xn)<m+bo,
xl)
any path
>xk}b(x~,
(xO, . . . , x,)
2r–m+b0
by
j)~A~,
(xO, ...,
3(i). Thus,
. . ..xn
x.)
● Am,
<f(xO,
from
xh+l)
m
=m
> r. If CO’ is valid,
. . ..x~)<r
+bO
then
(15)
(14), we see that
)=Q(~~)
and
● A ~, it follows
that
L(xO,...,
)= O(~z~n)-n).
@(Em)
P’{(xO,.. ., X,F)=(XO,.
=Q(em)
... x.)}
and
@(Em)
P’{(XO,... ,XTF)=XO,O,XJ},XJ}
=
o(ez’-~),
or equivalently,
P’{(XO,..., X,F)=(XO, XJ}=O(l)O(l)
and
P’((XO, .... XTF)=
(Xo,..
.,xn)}
=Q(E-2’).
Hence, Condition
CO’ ensures that a sample path (XO, . . . . x.) = Am, r < m <
2 r – 1, has probability
under
simple
failure
biasing
at most of the order
●‘n
2‘. In particular,
this means that all of the most likely
paths to failure
of order 1
under the original
measure
(i.e., ( XO, . . . , x.) ● A,) have probability
under
simple
failure
biasing.
For example,
if r = 2, then CO’ ensures
that
under simple failure
biasing,
(1) all paths in A ~ have probability
of order 1
and (2) all paths in A ~ have probability
at most of the order ●z.
Also, it follows from (15) and Proposition
3(ii) that for ( .xO, . . . . x.) ● A,
k=O
k=O
If CO’ holds.
b(x, y) > s(x)
Recall
that
for any (x, y) G r with
>1. Thus, if CO’ is valid, then b(xh,
ACM
on
Transactions
Modeling
and
Computer
Slmulatlon,
Vol.
4, No.
y > x, we have that
Xk+l) = S(xh) for all k
1, January
1994
Simple Failure-Biasing
such that
occurs
Xk + ~ > Xh. In other
along
measure).
must
any
of the
If simple
failure
be of the largest
words,
most
biasing
order
consider
likely
a failure
paths
gives
to
.
transition
failure
bounded
of magnitude
Method
the
error,
then
relative
of any failure
(x,
(under
transition
71
y) that
original
P( x, y)
from
state
x.
In
general,
However,
checking
as shown
relative
error
Condition
in Proposition
when
using
anced.
Developing
verified
Now
may be difficult,
we use Theorem
Example
that we
estimated
Al
for
a given
general
failure
biasing
sufficient
and we have
2 to reexamine
model
sufficient
may
condition
is that
conditions
the
which
be tedious.
for bounded
system
can
is bal-
be easily
not done so.
Examples
1 and 2.
1 (continued).
Recall that r = 1, and so Theorem
2 establishes
only have to consider
the paths
in A ~ to determine
if y can be
with bounded
relative
error using simple failure
biasing.
Note that
= {((0,0,0),
((o,
and
simple
more
CO’
1, a simple
fl(O,
(1,0,0),
o, o),
O, O),
(0,0,
(2,0,0
)), ((0,0,0),
(O, 1,0),
(0,2,0)),
1))},
(O, O, 1)) = 1 < 2r
– m + bO = 2 since
m = 1. Thus,
CO’
does not hold.
Example
ascertain
that
2 (continued).
if simple
Since
r = 1, we only need to examine
A ~ to
biasing
will result in bounded
relative
error. Note
failure
Al = {(((),0,0),
(1,0,0),
(2,0,0)),
((0,0,0),
(0, 1,0)>
(0,2,0))},
and f((O, O, O), (1, O, O), (2, O, O)) = ~((0, O, O), (1, O, O), (2, O, O)) = 2>
— m + bO = 2 since m = 1. Thus, CO’ holds.
6. ESTIMATING
In this
section
to component
DERIVATIVES
we discuss
failure
the estimation
rates.
We first
of the derivatives
analyze
estimating
of y with
these
using
4 which
relative
System Structure
To obtain results
more structure.
on derivative
estimates,
we will
The main
differences
are that
state-dependent
failure
propagation.
We now
respect
quantities
standard
simulation.
Then we examine
Condition
C i‘ from Section
characterizes
when simple failure
biasing
will give rise to bounded
error for the derivative
estimates.
6.1 Additional
2r
rates
and
(2) we
assume that the system
(1) we no longer
allow
restrict
the
generality
has
for
of failure
The following
modification
was developed
by Nakayama
[1991].
assume that components
of type i have failure
rate A, which is
independent
of x = S; i.e., A,(x) = A, for all x ● S. We assume that O < A, <
~,, where ~, <00. Furthermore,
when parameterizing
by the rarity
parameter
b’, where ~, > 0, b, > L and b, is integer
e, we assume that A, = A,(E) = ~, ●
valued. Note that b, does not depend on the state x. Also, we no longer allow
Oforalll<i<Candx,
y=S.
the P(Y; x, i) to depend on ~; i.e., dt(x, y)=
In this situation,
we have that 60 = min{b, :1<
i < C}.
ACM
Transactions
on Modeling
and
Computer
Simulation,
Vol.
4, No.
1, January
1994.
72
Marvin K. Nakayama
.
As in Nakayama
simplify
[1991],
we make
(A4’)
VP(Y;
(A5)
If there exists a component
then there exists another
Z, Z) >0
p(y;
O, j)
and P(Y;
+P(Y;
Assumption
A4 stipulates
can trigger
of the
also implies
Z, ~) > Q then
to
b, = b,.
that
if the failure
two
component
for (x,
types
y) 6 r,
y)
=
of either
from
state
are of the
same
where
C(X,
Hence,
the
-Y)
0, d(x,
>
transition
than
a sum
magnitude
ify>x
p(x,
ify<x
y)
[1991]
one
of the
O to
state
A5
largest
component
y
with
type
transition
able
when
that
Assumption
we
having
a different
are
holds
p( y; O, j)
ensure
there
A4
is
failure
to
and
W(X,
of
a single
to determine
Al;
one
of the
see the
y)
the
proof
> 0.
term
order
of
of Lemma
= O. Assumption
is no cancellation
must
rates
condition
systems.
type
type
It
be
A5 is a technical
having
from
state
some
other
which
causes
is not
unreason-
should
j in Assumption
in the calculations
i
a transition
there
failure
This
reliability
component
component
cause
then
largest
probability.
large
some
can
probability,
if the
~ > 0,
consist
is used
respect
there
whose
considering
A5
if
positive
with
assumption
details.
that
j also
This
failure
form:
value,
transitions
with
further
rates
some
b~ = bO and
that
for
failure
failure
of y
stipulates
types
the
otherwise
Assumption
derivative
11 of Nakayama
Assumption
for
as before.
of the
●-order.
C(. X,y)E~(~’y)
y) > 1 are integer
rates
of two different
.x to y, then
q( x, y ) has a certain
{ o
same
assumptions
type i such that b, = bO and p(y; O, i) >0,
component
type j + i such that bJ = b ~ and
a transition
that
q(x,
rather
additional
O, i).
of components
rates
the following
the proofs.
be
A5
noted
satisfies
assumption
of certain
the
used to
expressions
arising
in the derivatives
with respect to Al; see the proof of Lemma
Nakayama
[1991] for further
details.
In the situation
when there is no failure
propagation,
Assumption
11 of
A4 is
automatically
satisfied,
and Assumption
A5 reduces to requiring
that there
are two different
component
types which both have failure
rates of the order
● ~0; i.e., there exists i and j such that
i # j and 6, = bj = bO. (Assumption
A5
holds under
this weaker
condition
for the following
reason.
Suppose
that
when component
type i fails in state O, the system then enters state y, and
suppose that when component
type j # i fails in state O, the system then
enters state z #y. Then P(Y; O, i) = 1 and P(X; O, i) = O for all x #y, and
p(z; O, j) = 1 and P(X; O, j) = O for all x #z. Thus, P(Y; O, i) = 1 #p(y;
O,
j) = O, and so A5 holds.)
If we allow for failure
propagation
but with
the
restriction
that any given failure
transition
can be triggered
by the failure
of
only one type of component
(i.e., for (x, y ) = r, there is at most one component type i for which p( y; x, i) > O), then again it is easy to verify that (1)
Assumption
A4 holds and (2) Assumption
A5 reduces to requiring
that there
exist component
types i and j such that i + j and b, = b] = bO.
ACM
Transactions
on Modeling
and Computer
Simulation,
Vol
4, No, 1, January
1994,
Simple Failure-Biasing
Estimating
6.2
We now
tive
Derivatives
analyze
of y with
standard
likelihood
respect
simulation.
obtained
The
in the following
A(AI,
Using
...,
ratio
results
section
rate
estimators
of the
of component
in this
when
.
73
Simulation
derivative
to the failure
using
simple
failure
employ
the notation
tion
Using Standard
Method
section
type
will
we consider
partial
deriva-
i obtained
be compared
estimating
using
to those
the derivatives
biasing.
Throughout
the rest of the article,
we will
d, A( Al, ..., &) = ( ~/6’A, )A( Al, ..., Ac ) for some func-
AC).
the likelihood
ratio
method
of estimating
derivatives,
we obtain
where
See Nakayama
further
et al. [1990],
Glynn
[1986],
and Reiman
and Weiss
[1989]
for
details.
We estimate
d, y using ~staAndard
Generate
n i.i.d. samples (11, D,,l),...,
probability
measure
P. We then
form
si~ul~tion
in
the
following
(1,, D,.)
of (1{~~ < 7.},
the point estimate
manner.
D,) using
the
k=l
Nakayama
[1991]
established
that
C7ty= @(e
for all
~ sufficiently
small,
mln{rL-bj,Ft-be})
(16)
and
cr21 =
where
r,
and
7, are
cannot say whether
<7, –2boorri–2b,
Var[l{~F
<
as defined
~O}D, ] = @(e
in (6) and
min{r, –2b L,7, –2b0)
(7), respectively.
(17)
),
In
general,
we
r, – bi s 7, – 60 or r, – b, > Pi – b. or whether
r, – 2 b,
>?, – 2 b.. Also, the expression
for d, y is independent
of whether
we use standard
simulation
or importance
sampling.
However,
the
variance
of the estimator
depends on the simulation
method being employed.
Let RE, denote the relative
error of the estimator
of d, y obtained
using
standard
simulation,
i.e.,
Corollaries
4–6 of Nakayama
be estimated
with the same
[1991]
relative
established
accuracy
that certain
derivatives
can
as the performance
measure
itself when using standard
simulation.
(In particular,
it was shown that this
is the case if C Si holds.) We now want to further
characterize
these derivatives. To do so, recall Condition
C i defined
previously
in Section 4. Then we
have the following
result,
whose proof is given in the Appendix.
THEOREM
4.
ACM
REJRE
Transactions
remains
on Modeling
bounded
and
as e -0
Computer
Simulation,
if and
only
Vol.
4, No.
if
Ci
1, January
holds.
1994.
74
Marvin K. Nakayama
.
6.3
Estimating
Using Simple Failure Biasing
Derivatives
We can also use simple
failure
(7Ly = E[l{’TF
Thus,
we estimate
d, y using
biasing
<
to estimate
derivatives.
TO}D1]= E’[l{TF <
s~mple
failure
biasing
Note
that
TO}D,LI.
in the following
manner.
Generate
n i.i.d. samples (11, D, ~, Ll), . . . ,(i~, D, ~, t.)
of (1{~~ < ~0], D,, L)
using the probability
measure
P‘. We then form the point estimate
REl’
Let
simple
denote
failure
the
biasing,
relative
error
of the
estimator
of
d, y obtained
using
i.e.,
~qqrl
RE:
=
d,y
‘
of I{TF < rO}D, L under
the probability
UL‘2 is the variance
next result
establishes
that Condition
C i‘ from Section
where
The
sary and sufficient
condition
have bounded
relative
error
RE~ remains
THEOREM
5.
The basic
idea
orders
of the proof
of magnitude
of the
bounded
as
of Theorem
summands
E +
O if and
of D~ depends
r[ ) occurs. We then
derive
bounds
To take advantage
of this, we express
E’[1{~~
only
5 is as follows.
of D,, and
Xh+l)
satisfies
7F < min(~o,
situations.
measure
P‘.
4 is a neces-
for the derivative
estimator
with respect
when using simple failure
biasing.
whether
or not the transition
( Xh,
i ) > 0. Thus, the order of magnitude
or
(18)
Zti
if
Ci’
We first
note
that
to Al to
holds.
examine
it
nL(Xk_l)ALp(Xk;
on whether
for
the
depends
on
Xk-l,
~~ < ~~ < To
D~ for
these
two
< ~O}D:L2]
= EII{Tr
(19)
< rO}D:L]
= E[l{~,
s TF < 70}D~L]
+ E[l{~~
< min(70,
7,)} D~L].
We separately
analyze
each expectation
on the right-hand
side of ( 19) and
show that Condition
C i‘ characterizes
when simple failure
biasing
will give
bounded
relative
error by using the bounds developed
on D, and some other
bounds
(Lemma
9
in
the
Appendix).
The
complete
proof
is
given
in
the
Appendix.
We
now
examine
Condition
C i‘ more
closely.
First
suppose
that
r, – b, < F,
and note that C i‘ only
imposes
restrictions
on paths
in AZ having
probaprobability
of order em with r, < m < 2 r, – 1 and paths in ~ having
~t < m < 2rl — 2 bl + 2 bO — 1. These conditions
enbility
of order ● m with
sure that the contribution
of each of these paths to the second moment
is not
●‘’2 b,, which is the order of magnitude
for the variance
greater
than
required
to achieve bounded
relative
error. It turns out that none of the paths
of higher order will interfere
with ensuring
a second moment
of order e ‘L 2 hL,
– b.,
ACM
TransactIons
on
Modeling
and
Computer
Simulation,
Vol
4, No.
1, January
1994
Simple Failure-Biasing
and
paths
rl –b,
of lower
>7,
In
general,
the
checking
Condition
following
result
relative
failure
biasing
error
for
that
the
is
sufficient
conditions
have not done so.
THEOREM
biasing,
are not
possible.
If
6.
the
the
shows
that
is
the
using
need
simple
failure
for the
failure
biasing
measure
error
75
occurs
when
balanced.
all
of the
when
may
then
be tedious.
condition
using
Developing
verified
when
more
using
for
simple
general
be difficult,
some examples,
failure
biasing.
to ensure
using
additional
is utilized,
and
may
sufficient
and
simple
we
failure
as e -0.
when
biasing
model
estimators
be easily
sufficient
relative
a given
is balanced,
bounded
CO’ is not
bounded
strates
situation
a simple
derivative
can
system
for
that
system
which
RE~ remains
C i‘
shows
We now use Condition
C i‘ to examine
gain a better
understanding
of simple
with
A similar
.
–bo.
However,
bounded
order
Method
that
differs
failure
C i‘.
the
the
from
relative
always
biasing.
Thus,
somewhat
in which
derivatives
which we will
first
example
we can estimate
simple
Condition
from
Our
that
errors
remain
derivatives
This
demon-
situation
when
of the
bounded;
when
balanced
performance
see Nakayama
[1991].
Example
2 (continued).
We previously
showed that CO’ holds. Now consider component
type 3. Note that A: = ~, At = {((O, O, O), (O, O, l))} and
q
= {((0,0,0),
(1,0,0),
(2,0,0)),
((0,0,0),
1,0),
(o,
(0,2,0))},
and so r~ = 2 and Fa = 1. (Similarly,
we can show that rl = 71 = r2 = F2 = 1.)
((O, O, O), (O, O, 1)) ● A~. Note
Thus, r3 – b~ = – 1 s 73 – bO = O. Consider
that
f((O,
does
not
O, O), (O, O, 1)) = 1<
hold.
The next
follows
(Also,
we
example
from
can
shows
Theorems
may be able to estimate
2r~ – m + bO = 3 since
show
that
2 and
that
Ci’
C3
does
not
m = 2. Hence,
does not necessarily
5 that
when
a derivative
with
using
bounded
imply
simple
CO’. Thus,
failure
relative
biasing,
error
but
3.
Consider
a system
which
has four
types
it
we
not the
performance
measure
itself.
In other words,
it is possible
to obtain
estimates
for a derivative
than for the performance
measure.
Example
C3’
hold.)
of components
better
(i.e.,
C = 4), where each of the first two types has redundancy
one (i.e., nl = nz =
1);the third type has redundancy
two (i.e., nj = 2); and the fourth
type has
redundancy
failure
rate
of type 1 and 2 have
one (i.e., n4 = 1).Also, the components
so b ~ = bz = 1. The component
of type 3 has failure
rate
●, and
Ez, and the fourth
b4 = 4. Thus
at rate
type
of component
bO = 1. There
1 using
has failure
is a single
a processor-sharing
rate
repairperson
discipline.
e 4; and so b ~ = 2 and
who
It is then
repairs
components
sufficient
to define
the state of the system to be z = ( x ~, Xz, X3, xl), where xl is the number
of
failed components
of type i. Initially,
all components
are operational,
and the
system is considered
to be failed if and only if either all components
of types
1, 2, and 3 are failed
or the component
of type 4 is failed.
When
the
ACM
Transactions
on Modeling
and
Computer
Simulation,
Vol.
4, No
1, January
1994
76
Marvin K. Nakayama
.
component
of type 1 fails in state (O, O, 0, O), it causes the component
of type
2 and one of the components
of type 3 to fail, i.e., P((l,
1, 1, O); (O, O, 0, O),
1) = 1. Figure
3 is a state diagram
of the model
of this system
with
the
original
states
transition
with
.X3 since
have
do
the
system
omitted
not
probabilities
X4 = 1 into
the
play
a role
{(1,
1, 2, 0), (xl,
Note
that
Al
=
fails
arcs
in
as the
state
( x ~, X2,
our
analysis.
X3,
on the arcs. We have lumped
all of the
( x 1, X2, X3, 1) ~ F for all xl, X2, and
state
as soon
from
Xj,
given
the single
The
component
of type
X3, 1) since
set
of failed
4 fails.
these
repair
states
is
Also,
we
transitions
given
by
F =
1)}.
f),
A2 = {(a,
h, 1)}
As = {(a,
m),
(a, h, g,k,
1), (a, h,f,
j, 1), (a, h, g, h, 1), (a, h, f, h, 1)}.
We can show that y = (5/6)e2
+ 0(6Z), and so r = 2. Also, rl
r~ = 2, r4 = 3, 71 = 3, F2 = 2, 73 = 3, and Fb = 2. Now consider
path (a, m) = As. Observe
and so CO’ does not hold.
that
m) = 1 < 2r
f(a,
= 2, r-z = 3,
the sample
– m + bO = 2 since
m
= 2,
Now consider
component
type 3. We have that r3 – ba = O <2 = 73 – bO.
Also, A; = A ~, and f(a, h, 1)= 3 > 2r~ – m + bO = 3 since m = 2. FurtherA: = As – {(a, m)}, and we can establish
that
all
(XO, . . . . x.)
= Al.
Additionally,
2r~ – m + bO for
~(xO,...,
more,
– 1 = 1, and
+2b0
(x~,...,
xn)
so the
condition
is vacuous
=x:
since
2rL – 2b, + 2b0 – 1.Therefore,
The
previous
example
the performance
one of the
estimate
achieve
measure
derivatives.
imposed
there
C3’
that
bounded
However,
C3’
on
~(xo,
. . . . x.)
of m for which
for
7L < m <
holds.
demonstrated
with
by
is no value
x.) >2 =
2r~ – 2b~
73 = 3>
the
we may
relative
following
error
result
not be able
to estimate
even if we can do so for
shows
that
if we can
all of the derivatives
with bounded
relative
error, then we can also
the same for the performance
measure.
See the Appendix
for the
proof.
THEOREM
If
7.
Ci’
holds
for
all
component
types
i =
1, ...,
C,
then
CO’
holds.
The next example
shows that being able to estimate
a derivative
with the
same relative
error as the performance
measure
using standard
simulation
does not necessarily
imply that we can use simple failure
biasing
to estimate
the derivative
with bounded
relative
error. (In fact, even if we also assume
that the performance
measure
can be estimated
with bounded
relative
error
when using
simple
failure
biasing,
this still
does not guarantee
that the
derivative
estimate
using simple
failure
biasing
will have bounded
relative
error. See Example
5 in the Appendix.)
Example
Thus, Cl,
ACM
1 (continued).
Note
that
rl
= r2 = r~ =
1 and
71 = F2 = 73 = 1.
C2, and C3 hold.
TransactIons
on
Modeling
and
Computer
Simulation,
Vol
4, No.
1, January
1994,
Simple Failure-Biasing
il,o~’
Fig. 3.
Now
Transition
consider
diagram
component
for Example
type
3 with
3. Observe
original
that
Method
I=
transition
.
fll,l,l,
77
o)
\
probabilities.
r~ – 63 = – I < ~3 – bo = 0.
Consider
the path ((O, O, O), (O, O, 1)) @ At. We have that
fl(o,
0, 0),
(o, O, 1)) = 1< 2r3 – m + bO = 2 since m = 1. Thus, C3’ does not hold.
We now
want
to establish
a simple
sufficient
condition
the derivative
with respect to A, can be estimated
by using simple
failure
biasing.
The next result
from
Section
THEOREM
4 implies
8.
If
To
gain
a better
of y
with
respect
the desired
CO’ and
CSi
conclusion;
hold,
understanding
to
A, to be
then
largest asymptotic
magnitude
if lim inf.
types j. Then Corollary
4 of Nakayama
s, has a largest
ACM
Transactions
asymptotic
on Modeling
Ci’
that
error
C Si
for the proof.
holds.
CSi,
we define
the
sensitivity
= A, - d, y. We define
s, to have a
Is,(
c)/s,(
E)
I
>0
for
all
component
~ ~
[1991] establishes
that
magnitude
and
ensures
see the Appendix
of Condition
s, = s,(~)
that
with bounded relative
shows that Condition
Computer
if and only if C Si holds.
Simulation,
Vol.
4, No
1. January
(20)
1994
Marvin K, Nakayama
78
.
Thus,
if(1)
we can estimate
the performance
measure
with
bounded
error using simple
failure
biasing
and (2) the sensitivity
with
has a largest asymptotic
magnitude,
then simple failure
biasing
bounded
relative
error for the estimate
of d, y.
The
following
example
shows
that
being
able
to estimate
relative
respect to A,
will result in
both
y and
cl,y
with bounded relative
error by using simple failure
biasing
does not necessarily imply that we can estimate
y and d, y with the same relative
error when
using standard
simulation.
Example
4.
C = 3), where
n3 = 2).
Consider
a system
each component
Also,
the
which
type
components
has three
types
has a redundancy
of type
1 and
of components
of two (i.e.,
2 have
bl = bz = 1), and the component
of type 3 has failure
bO == 1.There is a single repairperson
who repairs
using a processor-sharing
discipline.
It is then sufficient
Thus
failure
nl
(i.e.,
= nz
rate
=
e (i.e.,
rate ●3 (i.e., b~ = 3).
components
at rate 1
to define the state of
the system to be x = ( x ~, Xz, X3), where
x, is the number
of failed components of type i. Initially,
all components
are operational,
and the system is
considered
to be operational
if and only if there is at least one component
of
each type operational.
We assume there is no failure
propagation.
It is easy to
see that
y = @(e),
and so r = 1. Also,
rl
= rz = FI = Fz = Is =landr~
=4.
Note that Al = {((O, O, O), (1, O, O), (2, O, 0)), ((O, O, O), (O, 1, O), (O, 2, O))}.
It is easy to check that
CO’ holds,
and so Theorem
2 implies
that
RE’
remains
bounded
as 6 + O.
Now consider
component
type
Wecaneasily
(x0,...,
show
that
3. We have
~(xo,...,
)= A~withr3=4<m
<m<
/1$0,...
X~)~27s7m
+bO=3=m
< m s 5 = 273 + 4bo – 1. Thus,
r~ > r,
and so C3 does not hold.
C2’
7.
hold.)
However,
EXPERIMENTAL
We now present
large computing
x.)
>273
5=2
b~>bo,
r3 – b~ = 1>73
F~+2b~–
forall(x
C3’
that
holds.
2bo–l,
andalso
o,o, x.., x.)~z~
(We
also
and73–2bo
can
– bO = O.
= 7 – m for all
– m – bO + 2b3
that
with
show
that
73=l
Cl’
and
=–l>r~–2ba=–2,
RESULTS
some experimental
system to show that
article
hold when simulating
et al. [ 1990], and Nakayama
results
from
the asymptotic
actual systems.
[ 1991] previously
running
results
simulations
developed
of a
in this
Goyal et al. [1992], Nakayama
examined
the same system but
with different
failure
rates for some components.
We used the SAVE package
[ 1987] to obtain all of our results.
The system is composed
of two types of processors,
A and B, each having
redundancy
2; two types of disk controllers,
each having
redundancy
2; and
six sets of disk clusters,
each having four disks, When a processor of one type
fails, it causes a processor
of another
type to fail also with probability
0.01.
Each type of component
can fail in one of two modes which occur with equal
probability.
The repair rates of the all mode-1 failures
are 1 per hour, and the
repair
rates
of all mode-2
failures
are 1/2
per hour.
There
is a single
repairperson
who fixes failed components
in random-order
service. The data
is replicated
on each of the disk clusters
in such a way that all of the data in
ACM
TransactIons
on Modehng
and Computer
Shmulat,on,
Vol
4, No
1, January
1994
Simple Failure-Biasing
a cluster
type
is accessible
of processor
processors
of a given
its corresponding
at least
even
type
disk
with
1/2000
per
1/12000
Thus
hour.
per
hour,
the system
gives
results
simulation,
obtained
all simulation
is either
is operational
disks
all
the
other
first
disks
cluster
have
have
failure
note
of the
simple
with
using
failure
results
a component
that
biasing,
failure
respect
a numerical
from
the processors
be able to estimate
error
interval)
failure
1/6000
to pAfr
and
value
magnitude
than
of the
of type
still
not
produced
note that
with
per
hour.
that
failure
biasing.
events,
respect
with
agrees
of the 99%
to pAfr
to
Theorem
and
d 1 fr
our
rate
of all
of Condition
we should
about
the same
9970
confidence
is the case, Also,
is much
standard
is worse
estimate
(see
of the sensitivity
Theorem
5). Finally,
with
balanced
in
simulation
than
4 and (20). Using
the
larger
that
simple
of the
failure
estimates
for both
y and its sensitivity
the system is unbalanced
(see Theorems
failure-biasing
acceptable
this
respect
d 1 fr,
to
with
an
half-width
failure
of the
and indeed
respect
with
to pAfr
width
We
where
estimate)
the largest
respect
with
method,
We give the relative
relative
measure,
sensitivity
with
This
the
d 1 fr. Table
with
2 and
respect
to d 1 fr
failure
biasing
reasonably
stable estimates
for all the quantities.
We should also
the confidence
intervals
for the estimates
of -y and its sensitivity
respect
balanced
of the
we obtained
good
to pAfr
even though
8). The simple
rate
d 1 fr =
(nonsimulation)
one million
A have
with
by
sensitivity
sensitivity.
biasing,
respect
is
measured
as the performance
estimate
is
= 1/1000
rate
rate
and balanced
simulating
or repair.
the sensitivity
(as
numerical
other
if(1)
data
have failure
the component
types, and so it satisfies
the second stipulation
Ci. Theorem
4 then states that when using standard
simulation,
relative
The
one of
if and only
All of the disk controllers
in
Each
through
(2) all
(i.e., confidence
interval
half-width
divided
by the point
confidence
interval
for all simulation
estimates.
First
is failed.
clusters
and
79
controllers.
of the system is given in Figure 4.
processors
of type A and B are pAfr
obtained
standard
event
on the disk
is operational
y and its sensitivities
the
cluster
of disk
.
is unbalanced.
We estimated
III
type
respectively.
and
in the
types
The system
of each
The
disks
of the
access the data
accessible. A block diagram
The failure
rates of the
per hour,
one
controllers.
one processor
and 1/2000
if one of the
is paired
Method
to pAfr
failure
are slightly
biasing,
thus
better
for
simple
demonstrating
that
failure
simple
biasing
failure
yield
better
results
than
balanced
failure
biasing.
For
empirical
analysis
of estimates
of performance
measures
see Goyal et al. [1992] and Nakayama
et al. [1990].
8. CONCLUSIONS
AND DIRECTIONS
than
biasing
for
can
a more thorough
and sensitivities,
FOR FUTURE RESEARCH
We have performed
an extensive
study of the simple failure-biasing
method.
In particular,
we established
necessary
and sufficient
conditions
for obtaining
performance
measure
and derivative
estimators
that have bounded
relative
error. One interesting
aspect of these conditions
is that it shows that in order
to guarantee
bounded
ACM
relative
Transactions
error,
on Modeling
we need
and Computer
to consider
Simulation,
not
Vol.
4, No
only
the
most
1, January
1994
80
Marvin K, Nakayama
.
Proce
Disk
Control
Disk
Fig, 4.
Table
III.
1
Cluster
Estimates
Block
Dtsk Cluster
diagram
Measure
0.9421
0.5060
“ &T
0.9945
X 10-3
X
Relative
x 10-3
–. 5770 x 10-5
. &T
–.1013
*
likely
paths
failure.
Also,
bounded
to system
we showed
relative
error
99%
0.9361
Confidence
10-3
Biasing
Failure
Biasing
X 10-3
0.9471
x 10-3
0.5000 x 10-3
x 10-4
–.5329
imply
those
0.4992
for
the
X
10-3
+ 10 6%
—.6030
X 10-5
* 141.1%
failure
but also secondary
paths
that the conditions
for a derivative
do not
k 3.6%
& 6.3%
106.2%
Intervals
Balanced
+ 3.1%
+ 40.5%
dljr
system
Failure
14.6%
0.5477 x
10-3
Disk Cl~ster 6
Simple
Simulation
+
PAfT
with
4
computing
Standard
Result
7
Disk Cluster
of a large
of y and Sensitivities
Numerical
Performance
3
X
10-5
+ 46 3%
leading
to system
estimator
to have
performance
measure
by
constructing
an example
demonstrating
that
it is possible
to estimate
a
derivative
more efficiently
than the performance
measure
when using simple
failure
biasing.
easily
implementable
m.th.d.
One area for future
research
is to develop
which will determine
if simple failure
biasing will or will not work well (i.e., if
CO’ holds) for a given model. A simple sufficient
condition
for this is to check
if the system is balanced.
However,
as we have seen, this condition
is not
necessary.
Also, further
work is required
to characterize
other importancesampling
schemes used for analyzing
highly
reliable
systems. Finally,
as we
have shown
in Example
2, when
simple
failure
biasing
yields
bounded
relative
error,
it may produce
smaller
confidence
intervals
then balanced
failure
biasing.
Thus,
it may be more appropriate
to use simple
failure
biasing in certain
contexts instead
of balanced
failure
biasing,
and additional
research
should be carried
out to identify
classes of models for which this is
ACM
TransactIons
on
Modeling
and
Computer
Slmulatlon,
Vol
4, No,
1, January
1994
Simple Failure-Biasing
the case. Currently,
the only
is most
is experimentation.
appropriate
method
available
Method
for determining
.
which
81
method
APPENDIX
First
we prove
PROOF
tion
Proposition
3.
OF PROPOSITION
for
rn
definition
in
terms
we
3.
of
must
To
the
have
prove
b(xk,
our
Xk+l).
result,
we
first
Consider
derive
any
(xo,
a representa-
...,
Xn)
G Am.
13Y
that
n—1
P{(xo,
. . .. x.,)
= (Xo, . . ..xn)}
= lqP(xk,
Xk+l)
= o(Em).
k=O
The
Also,
probability
(xo,...
of the initial
transition
(X., x ~) is 0( e 6(‘0’”)
that Xk + () for k > 0. Hence,
, Xn ) G Am implies
of any failure
repair
transition
transition
(x~,
- ~“ ) by Eq. (2).
the probability
x~ + ~), k > 0, is @(~ b(X~’xk+‘)) by Eq. (2). Each
has probability
of order
1. Thus,
since
b( x, y) = O if y < x,
n–1
13(xo, X1)
–
b. + ~
Xk+l)
=m,
., x.),
note
b(xh,
(21)
k=l
or equivalently,
which establishes
part (ii).
To convert Eq. (22) into a bound
for any (x,
y) = r
with
on flxo,..
y > x. It then
follows
that
b(x,
y) > s(x)
that
n–1
s(xo)
b. +
–
~
l{xk+l
>Xk}S(Xh)
<m,
k=l
which
To
implies
prove
LEMMA
(i)
(ii)
...,
Theorem
x.)
❑
< m + bO.
2, we will
Consider
9.
(xo,
need the next
E Am, where
. . . . x.)
lA~[
. . . . XTF) = (Xo, . . . . x.)}
= tl(~~)
and
m > r. Then
where
independent
m.
of
Parts
Theorem
of
(i),
1
(xo,
. . . . x~)
(ii),
and
of
ACM Transactions
the
and
and
upper
Nakayama
on Modeling
and
P{(XO,
. . . . X,r)
sufficiently
small,
where
of (X., . . . . x~ ) and m
L(xn, ..., x.) = @(e~{xO’’Xn)-~O)
for &l G > g euffi.i.mtly
.srnall,
PROOF.
proof
n >0
< lS/(n+l)~
Xn)} < a~ ‘~m for all ● >0
(xO,...,
are constants
which are independent
(iv)
lemma.
n<(m+l)~
F’{(XO,
(iii)
~(xo,
L(xn ,. ...
v a~d
bound
[1991].
and Computer
in
The
x.)
<
q a.:
(iii)
first
Simulation,
zqmcf(xo’
constants
are
of
(iii)
~
‘“’x”)–bo
which
established
part
=
a and
in
CZP.
the
follows
Vol. 4, No. 1, January
1994.
82
Marvin K. Nakayama
.
immediately
(iv)
in
from
Eq.
To
prove
the
.>xn)e
(xO,..
the
definition
we
have
in
part
shown
the
first
part
of
validity
of the
upper
bound
(iv),
first
note
that
for
A~,
~
< a/3”Em
k=O
(iii).
From
P’(x,
Eq.
y)
n–l
‘–1 P(xk, .%k+~)
., Xn) =
L(xo,..
by part
of A ~. Also,
(14).
P’(xh,
(11),
= <(x,
we
~k+l)
can
k=O
(23)
“~~k,
~k+l)
express
> X}(b(x,
y)el(~
1
~
y)–s(x))
+
o(El{y>
X)( b(x..Y)– s(x)))
((x, y) is independent
of ●. Define
(’ = min{[(x,
y) :(x, y) = r} and
<. = min{l,
[’/2}.
Note that
P’(x,
y) z ~. e’{y ~ x}(~(Z,.V–Sf~l) for all suffl.
ciently
small
● > 0. Thus, for all sufficiently
small ~ > 0,
where
n–1
JJP’(xk,
x,+,)
2
k=O
(24)
from
part
First
(i). We now examine
note
that
since
b(x,
of ● in Eq. (24).
the exponent
y) = O if y < x,
n–1
~
l{xk+~
‘~~}(b(~k,
~k+~)
‘S(Xk))
k=O
n–1
=
~
n–1
b(X~,Xk+~)
-
~
k=O
l{~k+~
~Xk}S(X~)
k=O
n–1
=
~b(x~,
xk+~)-f(xo,...,
xn)=~t+
bf(~~,~,
~n),
~n)
k=o
by Eq. (21). It then
follows
from
Eq. (24) that
for all sufficiently
small
e >0,
n–1
H
“(xh,
Xh+l)
> ~(~+’%m+ho-f(zu,
,.,),
k=O
Thus,
Eq.
(23)
and the proof
Now
implies
that
is completed
we can prove
by letting
Theorem
v = cz/~,N and
❑
q = fl/~N.
2.
PROOF OF THEOREM 2. Suppose CO’ holds. Since y = W e r ), it is clear from
Eq. (13) that we need to prove that
l?’[1{~~ < ~o}L2 ] = 0( ~2r). Recall our
expression
for the second moment
given
in Eq. (12), and consider
some
ACM
TransactIons
on Modehng
and Computer
Simulation,
Vol
4, No
1, January
1994
Simple Failure-Biasing
● A.
x.)
(x~,...,
L(XO,.
where
r < m
+’{(XO,...,
...
< 2r
X,F)
–
1. Using
Lemma
Method
.
83
9, we see that
(Xo,....xn)} = @(Em+f(xO’ “’x”)-~”),
=
and so CO’ implies
xn)P{(xo,
_L(xo,...,
. . ..x.F)
=
(Xo,. ...xn)} =
o(Em+2r-m)
E2’).
= 0(
Hence
2r–1
xx
m=r
L(XO,...,
....xTF
xn)P{(xo,
(25)
since lA~l < ~ by Lemma
9.
Now consider (xo,...,
x.) ~ Am, n > 0, with
L(xo,...
for all
~ >0
that
xn)}
0(62’)
=
implies
)=(x”,...,
(Xo, .,. .Xn)=Am
n>O
, Xn) <
sufficiently
small
Ez
since
~,.
...
)eAm
– 1.By Lemma
9,
‘X”)–bo< qm
f( XO,...,
x.)
, XJP{(XO,.
L(xo,...
m=2r(~
Vq%f(’o’
m > 2r
...
> b.. Therefore,
X,F)=(XO,.
Lemma
9
. . . Xn)}
Am
n>O
m=2r
(xo, .
, xn)s
~=z~
Am
71>0
= valslN
(qPlslNE)m= 0(62’)
i
m=2r
for
all
sufficiently
E[ 1{~~ < 70}L]
small
e since
= 0( ● 2r ), thereby
qp lSl~
showing
< ~. Using
RE’
Now suppose that CO’ does not hold. Then
Am such that
r<m<2r–
land
f(yo,...,
Lemma
-L(Y(
9 implies
Eq. (25),
remains
there
exists
bobl–
E[l{TF
...,
yh ) ●
Thus,s,
that
l,...,y~)~{(xo,..., x,F)=(yo, yk)}yk)}
= 0(
—
then
that
as ~ -0.
some (y.,
yk)<Zm+
= @( Ef(YO>. ,Yk)–bo)~(E77L) = o(E2r–m+~O-l–~,,)@(Em)
It
we have
bounded
follows
< ~o}Ll
E2’–1).
that
=
L(xo,...,
~
xn)P{(xo,
. . ..x.F)=(xo,
. . . . Xn)}
(XO,...,
)=A=A
n>o
>L(yo,
We now prove
ACM
. . ..yk
)P{(xo,...,
Theorem
4.
TransactIons
on Modeling
and
X,, )=(yo,
Computer
Simulation,
. . ..yk )} =Q(e2r-1),
Vol.
4, No.
1, January
1994.
84
Marvin K. Nakayama
.
PROOF
OF THEOREM
Corollary
as
4.
Recall
4 of Nakayama
6 ~
O if either
r, = r
b, > bO, and 7, – 2b0 <r,
7, — 2b0 <rL – 2b, implies
RE
that
[1991]
= @( 6-’12 ). First
establishes
that
suppose
CZ holds.
remains
bounded
RE,\RE
r, > r,
or b, = bo. Hence,
we may assume
that
– 2b,, and so CT,z= @(e F’-2b” ) by Eq. (17). Also,
that 7, – bO = ~, – 2b0 + bO < ~, – 2b, + bO < r,
– b, since b, > bO, and so dly = @(eFI-b~ ). Furthermore,
r-, > r implies
= @(c-’/2).
Thus,
RE,
~, = r byEq. (8), and so RE, = @( Etr-2bO)/2-fr-bOl)
remains
Now
bounded
suppose
7, —2b O>r,
as ~ ~ O.
that
Ci does
— 2 b,. It then
not
follows
hold,
i.e.,
assume
r, > r,
~, = r by Eq. (8) and
that
that
/RE
and
b, > bo,
a,z = @( e’-
‘b)
r, – b, < F, – b. = r – bo. Then
by Eq. (17). First
suppose
d,y=
@(~rJ-~L),
and so RE~ = @(~~rt–Zb,J/Z–(r,–~,) ) = @(~-rL ‘2). Hence, since r, > r, we have
that
r, – b, > F, – bO = r – bo. Then
RE,/RE
- ~ as ● ~ O. Now suppose
d*y=@(e’-~’
(r – bo)
),
RE,/RE
that
We now
Our
RE, = @( E[rI-2b~/2-(
– (r – bO) = –r/2
and
< (F, – 2bO)/2
first
cc as ● +
prove
lemma
10.
that
(r,
–
7, = r. Thus,
we
Zb,)iz
_
again
see
•1
Theorem
5 by first
establishing
the
of the
[1991]
Note
since
describes
D,; see Nakayama
LEMMA
O.
’-b Ol).
forms
some
summands
preliminary
in the
results.
expression
for
for the proof.
Suppose
y) ● I’
(x,
x ● U. Then
with
for
all
e sufficiently
small,
(i)
~,~($,
Y)/~(x,
pendent
of
31P(x,
Y)/P(x,
(ii)
e, ifeitherx
(iii)
c7,P(x,
pendent
The next
LEMMA
there
(i)
(ii)
y)
ll,(xo,
= P(x,
+ O(E-bZ),
x,
i)
where
p(x,
y)
where
# O andy
Conside~
bounds
( x ~,. ...
~ which
sufficiently
. . . . x,)
=
+ O is
inde-
> O;
+ o(l),
y)
< xorx
establishes
a constant
E >0
(xO,...,
y)E-bL
p(y;
p(x,
>x
y) <0 is independent
with p(y;
x, i) = O;
y)/P(x,
y) = p(x, y)e-~”
+ O(~-~O), where p(x,
of c, ifx = O andy
>0 with p(y; O, i) = O.
11.
Dl(xo,
= P(X,
> x with
# O andy
lemma
exists
for all
Y)
e, ify
y) <0
of
is inde-
on D,.
XJlm,
m,
where
is independent
n > 0
and
of ( XO, . . . . x.)
m > r.
and
E such
Then
that
small,
@(E-b)
and
Ill,
(xo,
. . . , x~)l
<
(m
+
l)@b,
if
@(E-h”)
and
ID, (xO, . . . , Xn)l < (m
+
l)~~-b[l
if
G A’~;
x.)
. . . . Xn~=
(XO,. ... X~)G~’~.
PROOF.
Nakayama
other
results,
The
first
[1991].
note
part
of
Similarly,
(i)
was
we
established
can
prove
in
the
the
first
proof
part
of
n–l
xn)l
<
&
c7tP(xk,
P(xh,
on
To
show
13
of
the
Xk+l)
x,+,)
by the triangle
inequality.
First
suppose that ( XO, . . . . x.)
path (xO, ..., x.) has at least one transition
that satisfies
TransactIons
Lemma
that
lDL(xo,...,
ACM
of
(ii).
Modeling
and
Computer
Slmulatlon,
Vol
4, No
1, January
= A’n. Thus,
the
conditions
1994
the
of
Simple Failure-Biasing
Lemma
10(i).
Lemma
10 then
(x,
y) ~ r
Since
b, > bo, we have
implies
with
that
that
Method
.
and
1 = o(~-~).
e-~” = O(e-bL)
d, P( x, y)/P(
85
x, y) = 0( c-b I) for all transitions
x = U. Now define
p, = max{lp(x,
y)l:(x,
y) = r}.
small c > 0. It follows
2/3,6 - bI for all sufficiently
for all sufficiently
small e,
Then
from
16’,
P(X, y)/P(x, y)l <
Lemma
9(i) that
n–1
Xn)l< ~
ID1(XO,...,
E‘b, = 2nP,~–bL
2p.
< 2(m
+ l) Np,
e–bl,
k=O
thereby
Now
proving
suppose
(i), where @ = 2Np,.
that
(xo,...
, x.) e ~~.
x,) satisfies
either
(%),...,
since 1 = O(~–b O), we have
all sufficiently
small
condition
(ii)
that
ldl P(xh,
e > 0. Then
Then
each
of
the
transitions
of
or (iii) of Lemma
10. Consequently,
xk+l)/F’(xk,
xk+l )1 s 2p, e-~o for
Lemma
9(i) implies
that
n–1
ID, (XO,...,
Xn)l <
~
2p,
e–~0 = 2np,
~–b0 < 2(m
+ l)Np,
E–bO
k=O
for all sufficiently
Now
small
~, which
we are in a position
PROOF
bo.
Then
that
to
OF THEOREM
Eq.
(16)
5.
our
to prove
Assume
implies
establish
completes
that
result,
d,y
Theorem
Ci’
holds.
=
@(er-b’).
we
must
❑
the proof.
5.
First
suppose
Thus,
show
it
that
is
that
r,
clear
from
13’[ l{TF
–
<
b,
s
Eq.
To}~~~21
7, –
(18)
=
o(E2(r’-b’)).
Recall
70}D~L].
(xO,...,
our decomposition
in Eq. (19). First
Consider
any path (xo, . . . . x.) that
Xn) G A’~ for some
m > r,, and so Lemmas
D12(X0, . . .. XJL(XO . . . . . xn)P{(xo,
Now
consider
Ci’ implies
that
~(xo,...,
● sufficiently
with
E[ 1{~1 < r~ <
~, < ~~ < To. Thus,
9 and
. . ..x.F)
=
11 imply
that
(Xo,.. ..xn)}
(x.,...,
x~) ~ AC~, where rl s m < 2r, – 1. Condition
Xn) > .2r, – m + 6., and so it follows from Eq. (27)
. . . . XJP{(XO,
. . ..xn)zixo
Iy’(xo,
for all
a path
that
we work
satisfies
small.
. . .. X.F)=G
CO,...,
x,)}
Consequently,
2r, – 1
D:(xo,
Ex
m=r,
(x “,
. . ..xn)L(xo.
XP{(XO, . . .. X.F)=(XO,
ACM
xn), xn)
... xn)=li;
n>O
Transactions
on Modeling
and
..., Xn)} = o(E2’’-2)’)
Computer
Simulation,
Vol.
4, No.
1, January
1994.
Marvin K, Nakayama
86
.
since
IA ~ I < ~ for
employed
all
to establish
m by
Lemma
9. Using
Eq. (26), we can show
arguments
similar
to those
that
CX.
Dt2(xo, . . ..xn)L(xo . . . . ..xn)
Zx
m=2rL
(-to,..
.xn)=ALm
n>O
XP{(XO,
using
Lemmas
Similarly,
9 and
...,
. .. XnXn)} = o(E2’’-2)’)
Ci’
that
11. Thus,
we can show
Therefore,
xTF)=(xo,
if r, – b, <7,
that
implies
– bo, then
E[l{I-~
< To}D~L]
= 0(e2rL-2~L)
from
Eqs.
(28) and (29). Hence, we have proven that Ci’ ensures
RE~ remains
bounded
as ~ ~ O when r, – bl < F, – b.. Also, we can establish
using similar
arguments that C z‘ implies
that
71 — bo.
By appropriately
modifying
if
does
CO’
conditions
not
hold,
PROOF
Theorem
OF THEOREM
b. = 1. Thus,
6.
prove
bounded
as ~ ~
used to show
that
RE,’
*
that
~ as
O when
RE’
c ~
r, – b, >
+ cc as e +
O if any
O
of the
❑
6.
Since
s(x)
remains
the proof
we can
of C i‘ is violated.
We now prove
and
RE,’
the
= 1 for all
system
is balanced,
x = S. Now
b,
consider
=
1 for
all
any path
1 s
i s
C
(x O,... , x,)
● Am. Note that (xo,.. ., x,) has exactly m + 1 failure transitions (including
the initial
transition),
and so f( X., . . . . x.) = m + 1.To verify that Ci’ holds,
first suppose that r, – b, <7, – 6., and so r, <7,. If (xO, ..., x.) = A[~, then
wemusthave
2r, – m + 1
that
f(xO,...,
)=m+l>2r,
similarly
show
If
r,, andsof(xo,
. . ..x~)=m+
(xO, . . . , Xn) G ~n,
then
rm+3b0b2b,
that
We now prove
PROOF
m>
holds.
C i‘ holds
Theorem
OF THEOREM
l>2r,
–m+b
O=
m > FL > r,,
and
So
=2rlrm+l+l
holds.
we
can
❑
if r, – b, > F, – b..
7.
7.
Suppose
that
CO’
does
not
hold.
Then
there
exists
some (y O,. ... yh)=A~
with r<m<2r–l
such that f(yo, . . ..yk)<2r–
m + bO. Now there must exist some component
type i such that (y., . . . . yk
G A’m , and so A1~ # ~. It then follows
that r, s m.
r, – b, <7,
First
suppose
Eq. (8), we have that
r, > r. Thus,
– bO. From
that
r, < m < 2r, — 1. Also,
we
have
that
rL < m and m < 2r — 1 imply
f(yo,...,
Now
yk) < Zr–m
+ bo SZr,
–m
suppose
r, – b, > F, – bo. Note
+ bo, andso
Ci’ does not hold.
that Eq. (8) implies
that
r, > r and
2b, –2bo
– 1> 2r–
1 since
27, + 2b1 –2bo
– 1> 2r+
f(yo,
. . ..yk)<2m+m+
r,<m<27,
+2 b, —2bo —1. Also,
+ 2bt
– b.
Now
ACM
since
we prove
Transactions
b, > bo. Therefore,
Theorem
on Modeling
Ci’
does not hold.
b, >bO.
bO<27t
❑
8.
and
Computer
Slmulatlon,
Vol.
4, No
1, January
1994
Hence,
—m
)
Simple Failure-Biasing
PROOF
OF
THEOREM
7L2r=rL,
we
Xn)
(xO,...,
2r
8.
have
G Azm with
– 1. AIso,
Assume
that
CO’ and
ri
r,
<
–
CO’
bi
n-t <
<
Zr,
r, = r imply
holds.
F, –
b.
— 1. since
that
Now
suppose
rL – b, >7,
O>2ri–
+ 2bi
bounded
as
standard
simulation
the performance
simple
failure
Example
we
have
> 2r
is used,
r.
Since
that
any
r
m <
<
– m + bO = 2r,
–
with
7, <m
<2r,
– 2bi + 2b0
2r, – 2bZ + 2b0 – 1< 2r – 1.
CO’ gives
and
that
ri > r, and
= A’~ with
us that
b,>bo.
so Fi = r and
r, < m <27,
– l, and so r < m < 2r
it
is possible
diverges
as
that
c +
+ 2b1 –
– l. Hence,
the
same
relative
RE,/RE
O. In
we may be able to estimate
with
=
consider
– m + bO = 27, – m – bO + 2b,.
that
RE, ‘/RE’
measure
5.
).)
r,
Now
87
Now
❑
shows
but
biasing
> 2r
holds.
example
e ~ ~,
r,
.
with
7, < m s 27, — 1. Note that
r < m <
implies
that
f(xo,...,
).) > 2r – m + bO =
CO’
27, – m + bO, and so Ci’
following
).)
– 2b0 – 1 = 2r
F, = r. Thus,
The
assume
any(xo,...,
CO’ implies
that
~(xo,...,
x,)
consider
any (xO, . . . . x.) E ~~
2r – 1 since
=
b..
m+3b0–2bzsincert=r
bi = bO. We may
– bO. Consider
2b0 – 1. Then,2F,
r,
suppose
>
r < m s 2 r – 1. Furthermore,
that
that
b,
f(xo,...,
m + bO. Now consider
any (xO, ..., x~) e ~~
from
r, = r and bi > bO that
– 1. It follows
Thus,
7, > r implies
x~)>2r–m+b
f(xo,...,
Therefore,
C i‘ holds.
First
as
Method
other
remains
words,
if
some derivative
accuracy,
but
and
we cannot
if
is used.
Consider
a system
which
has three
types
of components
(i.e.,
C = 3), where each of the first two types has redundancy
four (i.e., nl = ?zZ =
4), and the third type has redundancy
three (i.e., n~ = 3). The components
of
type
1 and
2 have
type 3 have
repairperson
failure
failure
rate
who repairs
● (i.e.,
rate
bl = bz = 1), and
the
components
discipline.
It is then sufficient
to define the state of the system
x,, x,), where xl is the number
of failed components
of type
components
are operational,
only if all of the components
1 fails
in state
components
and the
system
of all types
(3, O, O), it causes
of type
of
bO = 1. There
is a single
e 2 (i.e., b~ = 2). Thus
components
at rate 1 using
a processor-sharing
3 to fail,
all four
i.e.,
is considered
are failed.
When
components
P((4,
4, 3);
to be x = ( xl,
i. Initially,
all
to be failed
a component
of type
(3,
O, O),
if and
of type
2 and all three
1) = 1. When
a
component
of type 2 fails in state (O, 3, O), it causes all four components
of
type 1 and all three components
of type 3 to fail, i.e., P((4, 4, 3); (0, 3, 0),
1) = 1. When
a component
of type
3 fails
in state
(O, O, 2), it causes
all of the
components
of types 1 and 2 to fail, i.e., P((4, 4, 3); (O, O, 2), 1) = 1.
It is easy to see that y = @(~3), and so r = 2. Also, rl = 3, rz = 3, r3 = 5,
F1 = 3, F2 = 3, and 73 = 3. We can show that CO’ holds. Also, we have that
i3 — 2 = 1 s r3 – 2b3 = 1, and
considering
show that
so C3 holds.
the path ((O, O, O), (O, O, 1),
C3’ does not hold.
(Also,
CS3
(O, O, 2),
is not
(4, 4, 3))
in force.)
● Al,
BY
we can
ACKNOWLEDGMENTS
The author would like to thank Randy Nelson
this problem
and for giving helpful
comments
ACM
Transactions
on Modeling
and
Computer
for initially
suggesting
work on
on a preliminary
draft of this
Simulation,
Vol.
4, No.
1, January
1994
88
Marvin K. Nakayama
.
article.
Also,
useful
tions
gratitude
comments.
that
is expressed
Finally,
improved
to Perwez
the editor
Shahabuddin
and two referees
for
providing
gave numerous
sugges-
the article.
REFERENCES
CARRASCO, J. A,
In
1992.
Proceedings
Computer
5th
Performance
CARRASCO, J. A.
Proceedings
CONWAY,
Failure
of the
Ezzaluatlon.
1991.
Efficient
A.
E.,
AND
Computmg.
based
Elsewer
of stochastic
GLYNN, P. W.
Science
GLYNN,
P. W,,
Manage.
A.
models.
In
1987.
Monte
Proceedings
IEEE
algorithms.
A.,
Stochastic
Conference
AND IGLEHART,
D.
S. S.
SHAHABUDDIN,
Comput.
JUNEJA,
P.,
for
L.
models
with
of
In
152-161.
computer
system
Symposium
on Fault
Large
New
deviations
C’ontr. AC-28,
Carlo
York,
and rare
events
in the
907–920,
optimization.
of
In proceeduzgs
356–364.
Importance
sampling
for
stochastic
simulations,
1987
Modeling
and analysis
of computer
system
availability.
HEIDELBERGER,
P.,
E. E.,
Eng.
J. Watson
1964,
Fast
In
1984,
1991.
Monte
AND GLYNN,
P. W.
dependable
Methods.
Carlo
simulation
of Markovian
of the 22nd
proceedings
Monte
Asymptotic
Carlo
simulation
Research
of likelihood
systems.
1992.
A
IEEE
systems.
Metheun,
London,
reliability/availability
International
of Markov
Yorktown
ratio
N Y
1990,
of highly
Research
dependable
Center,
of queues
IEEE
1989
Trans.
M. I., AND WEISS, A.
systems.
Yorktown
J.
Symposium
unreliability
estimators
quick
Automat.
N.Y,
Sensitivity
Sensitivity
analysls
ratio
To appear
simulation
Contr.
1989.
Likelihood
IBM
Res. Rep. RC 15400,
Heights,
A
derivative
of Res, Rep. RC 76637,
Heightsj
models
Center,
Revision
M. K., GOYAL, A., AND GLYNN, P, W.
S,, AND WALRANL),
REIMAN,
1992,
policies.
F.
Markovian
T J. Watson
networks
P.
repan-
AND BOHM,
Markovian
PARERH,
F,,
of highly
on
models.
Des. 77, 49-62,
reliable
NARAYAMA,
V,
models
Computmg.
NAK.AYAMA, M. K.
highly
NICOLA,
Markovian
C-41, 36-51.
general
Tolerant
LEWIS,
sensitivity
IBM
for
of
Division,
T.
analysis
Research
in Oper.
method
AC-34,
in simulations
Research
for
Division,
Res.
excessive
backlogs
in
54–66.
analysis
for simulations
via
likelihood
ratios,
Res. 37, 830-844.
RUBINSTEIN,
R.
simulation
Y.
1989.
models.
SHAHABUDDIN,
P.
Manage.
SHAHABUDDIN,
P.
Operations
SHAHABUDDIN,
derivatives
P.,
P.,
reduction
Szmulat~on
TransactIons
performance
for the
simulation
extrapolation
for
computer
Importance
sampling
of highly
reliable
Markovian
Scz. To be published.
1990.
Simulation
AND
NAKAYAMA,
highly
V,
mean
1993;
analysis
F.,
time
revised
K,
1992.
and
to
failure
New
York,
1993;
Computer
reliable
systems.
Ph.D.
thesis,
Dept.
Calif.
Fast
systems.
HEIDELBERGER,
June
of highly
, Stanford,
Markovlan
IEEE,
cm Modeling
Univ
M.
reliable
NICOLA,
m
and
Stanford
Conference.
February
and
Res. 37, 72–81.
Research,
in
SHAHABUDDIN,
Oper.
1994.
systems.
ACM
models.
New York,
simulation
for Monte
IEEE,
1989
simulating
S., AND SHAHABUDDIN,
Fault
Received
1983.
G
approximation
Simulation
framework
Trans.
ance
351-365.
Markovian
of the 17th International
Trans. Automat.
HAMMERSLE~-, J, M., AND HANDSCOMB) D, C.
of
systems.
Tools for
Sci. 35, 1367-1393.
unified
Oper.
Amsterdam,
Systems. IEEE,
Carlo
tolerant
J. Res Deu. .?1, 651-664,
GOYAL,
Nut.
B.V.,
of failure/repair
on Ftellable D~stributed
GOYAL,
GOYAL, A., AND LAVENBERG,
IBM
Publishers
fault
Techzuques and
230-235.
1986.
the 1986 Winter
of repamable
on i!lodellmg
simulation
COTTRELL, M., FORT, J. C., AND MALGOUYRES,
study
simulation
Conference
transition
of the 10th Syntposlum
availability/reliability
Tolerant
distance
International
simulation
of transient
Working
P.,
GOYAL,
simulations.
measures
A.,
In
AND GLYNN,
Proceedings
P. W.
1988.
of the
1988
491-499.
accepted
Simulation,
August
Vol
and
their
draft.
1993
4, No.
1, January
1994
VariWinter