WOPR11 was held in San Diego, California on October 23-25, 2008, and was hosted by Intuit. Jude McQuaid and Mike Pearl were the Content Owners.
Attendees
Ross Collard, Dan Downing, Andy Hohenner, Paul Holland, Will Hurley, Curtis Johnson, Jude McQuaid, Ted Neher, Michael Pearl, Jeff Pickett, Eric Proegler, Greg Schmitz, Roland Stens, Nick Wolfe
Theme: Reliability… what can we do?
People involved in developing, testing, and delivering hardware, software, or internet-based applications must be able to ensure those solutions meet customer and user expectations. The question that follows: exactly what are those expectations, and how do you ensure they are being met?
WOPR11 will explore the topic of reliability with seasoned professionals, including architects, designers and performance and reliability testers.
The IEEE defines reliability as “The ability of a system or component to perform its required functions under stated conditions for a specified period of time.” Mean Time Between Failures (MTBF) is often described as a measure of reliability. The next question: how is that measure to be used or applied such that it has meaning?
In computing, ‘reliable’ is an adjective whose meaning is often ambiguous, because the term can be applied in several ways. Nearly every driver in the world has a sense of whether their car is reliable. Factors such as how the motor sounds, the warning-light panel, how well it drove on the last trip, even the smell all contribute to the sense of reliability. However, what metrics can be applied to understand just how reliable a car is? More importantly, is it possible to identify metrics applicable to the computing space? We’re looking for people with experience and passion to contribute papers and experience reports, based on actual first-person experiences, that address one or more of the following aspects of reliability…
Requirements – How are they stated? What makes a good reliability requirement?
Modeling – Do you create mathematical models that try to predict reliability? How accurate are the models?
Risk Management – How reliable does a given system need to be? How do you identify risks?
Testing – How do we go about testing the reliability of a system?
1. Verification vs. Validation
2. Field Experience – What happened post-ship?
Overall Design – Are there design or test patterns that dramatically improve (or drastically lower!) reliability?
Reliability Engineering in the Real World – How is it applied? Are measures like MTBF and MTTR useful outside of marketing literature? Are there useful alternatives?