Premature Overoptimization

One of the common mistakes I’ve noticed in engineering is overoptimizing a subcomponent of a system at the expense of the system as a whole. I saw this a lot back in my manufacturing engineering days. You’d get someone working on a specific machine or process who doesn’t have a good grasp of the overall problem, and he’ll tune his process to some metric (throughput, for instance) that he thinks is important, and it will end up completely out of sync with the rest of the production system. The end result is that even though his one process may be very “efficient” by some metric, the overall production system needs much more inventory, produces more waste, has more trouble reacting to demand, and so on. Basically, one of the truisms in engineering is that overoptimizing a part often makes the system suboptimal.

A recent thread at NASASpaceflight.com brought home an excellent example of that. In this thread, Ross Tierney (of DIRECT fame) brought up the following graphic from the Exploration Systems Architecture Study:


Basically, this chart from Chapter 8 shows the main failures that could kill a crew during a lunar mission, and their relative probabilities. The first thing that jumped out at me when looking at this chart was that fatal launch accidents are predicted to be only a tiny contributor to loss of crew in a lunar mission. By comparison, the Trans Earth Injection burn is predicted to be over 5x more likely to cause the loss of a crew. Another thing that stuck out is that if Ares-I really only has a 1/2106 chance (or whatever the 4-significant-figures odds were predicted to be) of killing a crew, and that is only 3% of the overall odds of losing a crew on a lunar mission, then the odds of losing a crew on a lunar mission are about 1-in-60 (this number is backed up, by the way, by what I think was Figure 8-2 from the ESAS report). Over a decade of flights, that gives a 25% chance of losing a crew.
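The arithmetic above can be checked with a quick sketch. The 1/2106 launch figure and the 3% share are the numbers quoted above; the function names are mine, and I'm assuming each mission is an independent trial with the same odds. Since the number of missions flown in a decade isn't stated here, the sketch solves for the mission count that would produce a 25% cumulative risk rather than guessing a flight rate.

```python
import math


def per_mission_loc(launch_loc: float, launch_share: float) -> float:
    """Overall per-mission loss-of-crew (LOC) probability implied by the
    launch-phase probability and its share of the total risk."""
    return launch_loc / launch_share


def cumulative_risk(p: float, n_missions: int) -> float:
    """Probability of at least one LOC in n independent missions."""
    return 1.0 - (1.0 - p) ** n_missions


def missions_for_risk(p: float, target: float) -> float:
    """Number of independent missions at which cumulative risk hits target."""
    return math.log(1.0 - target) / math.log(1.0 - p)


launch_loc = 1 / 2106   # launch LOC odds quoted in the post
launch_share = 0.03     # launch share of total LOC risk quoted in the post

p_mission = per_mission_loc(launch_loc, launch_share)
n_for_25 = missions_for_risk(p_mission, 0.25)

print(f"per-mission LOC: about 1 in {1 / p_mission:.0f}")
print(f"missions needed for a 25% cumulative risk: about {n_for_25:.0f}")
```

Under those assumptions, the 25% figure corresponds to somewhere in the high teens of missions, so it only holds if roughly that many lunar missions are flown over the decade.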

Think about that.

But that brings me to the two points I wanted to make with this thread.

First off, if the relative proportions here are even close to right, launcher reliability is really not super important to overall crew safety on a lunar mission. It’s kind of counterintuitive, but increasing the odds of a fatal launch accident by a factor of 2 only increases the odds of a fatal lunar mission from 1.6% to 1.7%, which is probably round-off error for a study like this. Most of the EELV options, DIRECT, and even a 2-launch Ares V architecture all supposedly fall within or near this category (and most of them probably do better if you actually work with the EELV manufacturers to figure out how to close the blackout zones). So, the vaunted Ares-I reliability, if it turns out to be anywhere near as good as claimed, only changes the odds of a lunar mission resulting in a dead crew from about 1-in-58 to about 1-in-63. With those numbers in mind, does that modest a safety increase really justify spending an extra $10B+ on launcher development, plus an extra ~$1B a year from here on out? Once again, for its primary mission (launching crews for interplanetary missions), Ares-I only makes the overall system 6% safer, not 100% safer. Is that extra 6% really worth $10B? Or could that $10B be better spent elsewhere? Could it go to finding better ways of reducing the odds of a TEI failure or of a Lunar Ascent rendezvous failure?
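A rough way to see how little the launch term moves the total is to treat the per-phase risks as additive, which is a reasonable approximation when every probability is small. The sketch below is only an illustration, not anything from ESAS: it reuses the 1/2106 and 3% figures, and the launch-risk multipliers are hypothetical values picked for the example. The outputs depend on rounding and on the additive assumption, so they only roughly track the figures quoted above.

```python
def total_loc(baseline_total: float, launch_loc: float,
              launch_multiplier: float) -> float:
    """Per-mission LOC if launch risk is scaled by launch_multiplier and
    every other phase is unchanged (additive small-probability model)."""
    return baseline_total + (launch_multiplier - 1.0) * launch_loc


launch_loc = 1 / 2106               # launch LOC odds quoted in the post
baseline_total = launch_loc / 0.03  # launch is ~3% of the total risk

# Hypothetical multipliers: 1 = the claimed Ares-I launch reliability.
for k in (1, 2, 4):
    p = total_loc(baseline_total, launch_loc, k)
    print(f"launch risk x{k}: {100 * p:.2f}% per mission (1 in {1 / p:.0f})")
```

Even a launcher several times more likely to kill a crew on ascent shifts the mission-level odds by only a few tenths of a percentage point in this model.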

The other major point I wanted to bring up has to do with the Zero Baseline Vehicle work that NASA and LM are doing for Orion. Basically, because Ares-I doesn’t really have any room for growth, they’re having to trim back backup systems and Orion capabilities. While some of those capabilities probably don’t affect the LOC numbers much at all, others do. The LOC numbers shown above were based on two-fault-tolerant RCS and controls, but for the ZBV version of Orion these are being pared back to single-fault-tolerant. Now, some of those systems may be added back over the next couple of weeks, but if any of them get left out, that’s going to affect the LOC numbers for the three largest LOC events: earth reentry, the TEI burn, and lunar ascent docking. Basically, if Orion loses one of those systems during any of those phases, the odds of losing the whole mission go up. Since those three failure modes combined are considered nearly 20x more likely than a launch accident, it only takes a tiny increase in the probability of any of them occurring to completely swamp any supposed gains from going with Ares-I as the launch vehicle.
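To put a rough number on "tiny": if the three in-space phases together are about 20x as likely to kill a crew as the launch, then a small fractional increase in their combined risk cancels the whole launch-safety advantage. A minimal sketch, assuming additive risks; the function name and the alternative-launcher multipliers are hypothetical, and only the 20x ratio comes from the paragraph above.

```python
def breakeven_increase(in_space_to_launch_ratio: float,
                       alt_launch_multiplier: float) -> float:
    """Fractional increase in combined in-space LOC risk that exactly
    cancels the gain from picking the safer launcher over an alternative
    whose launch risk is alt_launch_multiplier times higher."""
    return (alt_launch_multiplier - 1.0) / in_space_to_launch_ratio


ratio = 20.0  # reentry + TEI + ascent docking vs. launch, per the post

# Hypothetical alternatives with 2x and 4x the launch risk.
for k in (2.0, 4.0):
    pct = 100 * breakeven_increase(ratio, k)
    print(f"alternative launcher at x{k:.0f} launch risk: "
          f"break-even at a {pct:.0f}% rise in in-space risk")
```

In other words, if dropping fault tolerance raises the risk of those three phases by more than a few percent, the net effect of the trade is a less safe mission.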

This may end up being a classic case of premature overoptimization leading to a less safe system overall. Going with a slightly less reliable launcher like DIRECT, EELV-Heavy, or Atlas V Phase 2 would likely save money, save time, and make the overall architecture more capable and safer. So, why are we spending so much time and money giving our astronauts (and the American people as a whole) the “Shaft”?

Jonathan Goff

President/CEO at Altius Space Machines
Jonathan Goff is a space technologist, inventor, and serial space entrepreneur who created the Selenian Boondocks blog. Jon was a co-founder of Masten Space Systems, and is the founder and CEO of Altius Space Machines, a space robotics startup in Broomfield, CO. His family includes his wife, Tiffany, and five boys: Jarom (deceased), Jonathan, James, Peter, and Andrew. Jon has a BS in Manufacturing Engineering (1999) and an MS in Mechanical Engineering (2007) from Brigham Young University, and served an LDS proselytizing mission in Olongapo, Philippines from 2000-2002.
This entry was posted in ESAS, Launch Vehicles, NASA, Space Development.

15 Responses to Premature Overoptimization

  1. kert says:

    work with the EELV manufacturers to figure out how to close the blackout zones
    A small nit to pick: there are no blackout zones, not for the planned CEV launch trajectories anyway.
    At least that’s what people on the very same nasaspaceflight forums have repeatedly stated, and they were involved in the process of solving these. This was worked out in detail during the OSP program.

    It’s another myth that just won’t die.

  2. Jon Goff says:

    Kert,
    Sorry, my sarcasm was a little too subtle. 🙂 Of course both EELVs can close the so-called blackout zones without too much trouble. But figuring that out would require NASA to actually speak with the EELV manufacturers, and possibly admit that they (NASA) really don’t know what they’re doing when it comes to designing upper stages.

    🙂

    ~Jon

  3. Karl says:

    Why should any space vehicle that will have a lifetime launch total of somewhere around 100 launches (so I hear) have a failure rate of 1 in 2000? My suspicion is that NASA (especially in light of past failures) will trim corners some and increase the chances of failure substantially.

    For example, let’s say a launch failure of the Ares 1 costs $10 billion (including the loss of value for a trained crew, increases in insurance rates, and a two-year gap in future launches with no substitute vehicles). And suppose NASA can save $10 million per launch at the cost of doubling the risk of failure. Then over 100 launches one would expect $500 million more in costs due to launch failure, but direct launch costs would drop by $1 billion, for an overall savings of $500 million over the lifetime of the program.
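The expected-cost arithmetic in this comment can be sketched as follows. All of the inputs (100 launches, a 1-in-2000 failure rate, $10 billion per failure, $10 million saved per launch) are the commenter's illustrative numbers; the function name is hypothetical.

```python
def expected_failure_cost(n_launches: int, p_fail: float,
                          cost_per_failure: float) -> float:
    """Expected total cost of launch failures over the program."""
    return n_launches * p_fail * cost_per_failure


n_launches = 100           # lifetime launches assumed in the comment
p_fail = 1 / 2000          # claimed failure rate
cost_per_failure = 10e9    # $10 billion per failure
saving_per_launch = 10e6   # $10 million saved per launch by cutting corners

# Extra expected failure cost from doubling the failure rate.
extra_cost = (expected_failure_cost(n_launches, 2 * p_fail, cost_per_failure)
              - expected_failure_cost(n_launches, p_fail, cost_per_failure))
direct_savings = n_launches * saving_per_launch
net_savings = direct_savings - extra_cost

print(f"extra expected failure cost: ${extra_cost / 1e6:.0f}M")
print(f"direct savings:              ${direct_savings / 1e6:.0f}M")
print(f"net savings:                 ${net_savings / 1e6:.0f}M")
```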

    The point is that even in a perfectly rational safety program, one has good reasons for worsening the reliability of an excessively safe launch vehicle. And my take is that at a reliability of 1 in 2000, there’s probably a good chance that NASA will find corners to cut.

    Finally, at 100 total launches, it’s extremely unlikely that anyone can tell the difference between a failure rate of 1 in 200 and a failure rate of 1 in 2000. If NASA returns to the somewhat dysfunctional safety approaches of yesteryear, it’s very likely that the failure rate will be around 1 in 100 no matter what vehicle is used because that is the threshold at which the bureaucracy will have to respond to safety problems.
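One way to illustrate why the two failure rates are hard to tell apart is to compute the chance of a spotless 100-launch record under each. This is a minimal sketch that assumes independent launches with a constant failure probability; the function name is hypothetical.

```python
def prob_no_failures(p_fail: float, n_launches: int) -> float:
    """Chance of zero failures in n independent launches."""
    return (1.0 - p_fail) ** n_launches


n_launches = 100
for odds in (200, 2000):
    p_clean = prob_no_failures(1 / odds, n_launches)
    print(f"1-in-{odds} vehicle: {100 * p_clean:.0f}% chance of a clean record")
```

Both vehicles would more likely than not fly all 100 launches without a failure, so a clean record would say little about which failure rate is the true one.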

  4. Jon Goff says:

    Karl,
    Oh, I agree completely that the 1/2000 number is likely bogus. They’d have to fly 6000 times flawlessly in order to prove that number to the 95% confidence level. And if they get more than a dozen flights off, I’d be amazed.
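The 6000-flight figure follows from a standard zero-failure demonstration argument: find the smallest number of consecutive successes that would be less than 5% likely if the true failure rate were as bad as 1 in 2000. A minimal sketch, assuming independent flights; the function name is hypothetical.

```python
import math


def flights_to_demonstrate(p_fail: float, confidence: float) -> int:
    """Smallest n with (1 - p_fail)**n <= 1 - confidence, i.e. the number
    of consecutive failure-free flights needed to support a failure rate
    of p_fail or better at the given confidence level."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


n_required = flights_to_demonstrate(1 / 2000, 0.95)
print(f"failure-free flights needed: {n_required}")
```

This is the same reasoning behind the "rule of three" shortcut, where roughly 3/p failure-free trials are needed at 95% confidence.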

    The big point I was trying to make was that, according to their own analysis, blowing tons of money on a launch vehicle whose only selling point is its theoretical safety is not only suboptimal, but ironically, less safe. I.e., by going with an existing booster, or a slightly upgraded booster, they could likely make the overall mission safer, and save money in the process. Cutting corners on Ares-I makes even less sense than building it as planned, though. There’s no way you could cut enough money out by cutting corners to let you improve any of the other areas of risk, and you’re still going to be weight limited, which will make the Orion design less safe…

    It’s just a stupid idea, even by their own analysis.

    ~Jon

  5. Mike Puckett says:

    It’s worse than that. The enhanced mass margin with the Direct family would allow a more robust Orion with a smaller chance of failing somewhere beyond earth orbit.

    This increased mass could lead to more redundancy and more robust (not shaved thin for every ounce) subsystems.

  6. Anonymous says:

    It makes a lot more sense if they actually don’t expect to ever kick off the Lunar program. That’s my best guess of what will actually happen with ESAS, and a lot of other cynics, too. Maybe someone in charge is guessing the same way. If it’s just Ares I and Orion to the ISS, a lot of those failure modes go away. What a painful and unhelpful way to get to the ISS though.

    Ray

  7. Anonymous says:

    karl said:

    For example, let’s say a launch failure of the Ares 1 costs $10 billion (including the loss of value for a trained crew, increases in insurance rates, and a two year gap in future launches with no substitute vehicles).

    Keep in mind that the government self insures. They don’t buy insurance because they know they can always get the taxpayers to pay when there’s a launch failure.

    I read a lot of complaints about the Ares I. Perhaps those complaints are misguided. The Ares I is designed to meet the real requirements (perpetuate NASA’s workforce and infrastructure) and it does so quite well. Now, if the requirements were to actually be a good launch vehicle, then the complaints are justified.

  8. Anonymous says:

    I think y’all are missing the point.

    One overemphasizes launcher reliability so that a mission failure doesn’t occur live, on television, with lots of fiery graphics.

    A mission failure due to lunar ascent won’t make it on broadcast TV.

    We can all agree that IF my assertion reflects NASA thinking, then they don’t understand YouTube.

    However, bureaucratic a$$ covering can explain the overemphasis on launcher reliability.

  9. Jon Goff says:

    Anonymous,
    I’m not really sure I buy that at all. Look how much attention was given to Apollo 13 (compared to the attention it would’ve gotten without an accident). You have astronauts slowly dying in lunar orbit, and it will be all over the news cycles for days. In a way, a set of slow deaths like that might even be more traumatic.

    Plus that’s cynical as heck, playing PR games with astronauts’ lives…

    ~Jon

  10. Karl Hallowell says:

    Hmmm, I wonder how much more expensive a “loss of crew” (LOC) is than a “loss of mission” (LOM) where the mission is completely lost but you keep the crew. Economically, a crew member isn’t that expensive. But there will be considerably more political fallout from a LOC.

  11. Anonymous says:

    The government doesn’t self-insure “because they know they can always get the taxpayers to pay when there’s a launch failure.” It self-insures because the total world market for launch insurance is about $500 million, which would cover only a fraction of the loss in the event of either a manned mission failure or loss of a typical USAF or NRO payload.

  12. Sam says:

    They’re killing about 13,000 people in other parts of the economy by using tax money to make the launcher $10 billion safer vs. giving the money back to the people and letting people spend the money themselves to make their own lives safer. People on average act as if their own lives are worth about $7 million.

    If we had more advanced philosophy we could pay the astronauts directly for the higher risk. We’d probably still get plenty of volunteers to do it for nothing since the market value of a trip just around the Moon is $100 million.

  13. Karl Hallowell says:

    Sam, that doesn’t make sense. Keep in mind that overall US GDP was over $13 trillion last year. That only buys 2 million lives by your reckoning. The healthcare system alone buys more than that. There’s no straightforward translation of money into human lives for large sums of money or large numbers of lives.

  14. Sam says:

    Good point. People generally treat their own lives as worth 7 million when making cost benefit tradeoffs for small risks, but obviously they don’t need to spend a full 7 million to mitigate all their risks down to an acceptable level. But what you are saying helps my case, Karl. If it costs only $100,000 to save a life by spending $10 billion on flu shot outreach or AEDs, then 100,000 people can be saved.

  15. historybuff says:

    Based on Shuttle experience, 1/2106 is a gross underestimate.

    BTW, what is wrong with a 25% chance of losing a crew? Are we such wimps that we can’t take chances any more? Back in the days of Columbus and Magellan the chances of losing sailors over multiple trips were close to 100%. Indeed, if you were a sailor your chances of returning from a voyage to the Spice Islands were only about 50%. Today’s astronauts are complete wimps by comparison. We won’t do much of anything useful on a new frontier by being such wimps.
