Redundancy, Degeneracy and Resilience
Why degeneracy rather than redundancy is the key to preserving system resilience
Any system with limited reserve capacity exposes itself to the risk of catastrophic failure. An obvious solution is for critical systems and functions to carry significant slack and redundancy.
Reserve capacity is expensive
However, naive redundancy (unused capacity, large inventories, etc.) is rare because it is usually expensive. Critical system functions also tend to use a significant proportion of the system’s resources. For example, maintaining significant excess capacity in healthcare is expensive because healthcare expenditure is already a substantial component of total expenditure. The same holds for components in biological systems such as the human brain, which consumes roughly 20% of our body’s energy.
If the system is even moderately competitive or selective, any unused excess capacity rarely survives a period of stability. A striking example of this is the effect of adaptation to a subterranean environment on the visual system of the blind mole rat. The blind mole rat saves only 2% of its total energy expenditure by giving up most of its visual system. Nevertheless, it does so because even this saving represents a meaningful competitive advantage1.
Redundancy comes with a sting in the tail
But what if a function is critical and uses only a small proportion of the system’s resources? Even when reserve capacity and redundancy are cheap, they come with a sting in the tail. Redundancy masks deterioration within the system and allows undetected, latent failures to build up.
For example, the pancreas in the human body has a 10x safety factor in its primary function of enzyme secretion and “malabsorption, due to decreased absorption of ingested food by pancreatic proteases and lipases, is not observed until pancreatic enzyme output has dropped to only 10% of normal peak values”2. Yet this is precisely the reason why pancreatic cancer is so difficult to detect. By the time we can detect the symptoms of malabsorption, pancreatic cancer has typically metastasised to other organs. Paradoxically, the resilience of the component increases the risk of undetected catastrophic system failure, i.e. micro-resilience leads to macro-fragility.
A similar example in engineered systems is the phenomenon that the safety researcher Jens Rasmussen called the ‘fallacy of defence-in-depth’3.
In any well designed work system, numerous precautions are taken to protect the actors against occupational risk and the system against major accidents, using a ‘defence-in-depth’ design strategy. One basic problem is that in such a system having functionally redundant protective defenses, a local violation of one of the defenses has no immediate, visible effect and then may not be observed in action. In this situation, the boundary of safe behaviour of one particular actor depends on the possible violation of defenses by other actors. Therefore, in systems designed according to the defence-in-depth strategy, the defenses are likely to degenerate systematically through time, when pressure toward cost-effectiveness is dominating. Correspondingly, it is often concluded by accident investigations that the particular accident was actually waiting for its release.
In other words, redundancy renders local failures invisible and often leads to a buildup of local failures that culminates in a catastrophic and systemic collapse. It is precisely this phenomenon that led to the Therac-25 disaster. In earlier models of the machine, redundant hardware interlocks masked significant errors in the software, which allowed latent errors to persist. These errors were only exposed when the Therac-25 dropped the hardware interlocks and relied on software alone, with catastrophic results.
In engineered systems, all such violations must be exposed and brought to the attention of system operators. But as Rasmussen identified, system operators will inevitably ignore failures with no functional consequences. For example, Therac-25 operators became desensitised to malfunction and error messages because the messages were frequent and cryptic4.
Degeneracy is the key, not naive redundancy
So if pure redundancy and slack are too expensive and expose the system to the risk of catastrophic failure due to the buildup of latent failures, how can systems be resilient? Biological systems achieve such a state through degeneracy, “the existence of multi-functional components with partially overlapping functions”5.
As Mason et al. argue6, degeneracy is a scientific concept that unfortunately has a very different meaning in lay discourse (that of decay). The origin of the term lies in quantum physics, where it refers to “a situation in which different measurable states correspond to the same energy level”. The term was initially introduced into biology to explain how “different nucleotide sequences in DNA could code for the same amino acid”.
The essence of degenerate systems is two-fold:
System components tend to be multi-functional, i.e. one component can perform multiple functions.
Each essential function in the system can be performed by multiple components or, more commonly, multiple configurations of components.
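The two properties above can be made concrete with a small sketch. It assumes a system can be modelled as a map from each component to the set of functions it can perform; the names `classify_pair`, `is_degenerate`, and the example components are hypothetical, and the strict "every component, every function" test is a simplification of what the text describes as a tendency.

```python
from typing import Dict, Set


def classify_pair(a: Set[str], b: Set[str]) -> str:
    """Compare the function sets of two components."""
    if a == b:
        return "redundant"   # identical function sets: pure duplication
    if a & b:
        return "degenerate"  # partial overlap: shared and distinct functions
    return "unrelated"       # no shared function at all


def is_degenerate(system: Dict[str, Set[str]]) -> bool:
    """Check both properties in their strictest form (an assumption of this sketch)."""
    # Property 1: every component can perform more than one function.
    multi_functional = all(len(funcs) > 1 for funcs in system.values())
    # Property 2: every function can be performed by at least two components.
    all_functions = set().union(*system.values())
    multiply_covered = all(
        sum(fn in funcs for funcs in system.values()) >= 2
        for fn in all_functions
    )
    return multi_functional and multiply_covered


# Hypothetical systems: one built on overlap, one built on duplication.
overlapping = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"z", "x"}}
duplicated = {"A": {"x"}, "A_copy": {"x"}}
```

In this toy model, `overlapping` passes both checks although no two of its components are interchangeable, whereas `duplicated` covers its single function twice but fails the multi-functionality check.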
In their seminal paper7, Edelman and Gally provide a vivid example of this pattern:
Consider the arm movement of a monkey that wishes to brush away a fly that has landed on its nose. How many different patterns of muscle contractions might it use to accomplish that task? There are so many different degenerate patterns of neuromuscular activity that could accomplish that same task…
It is difficult to over-emphasise the importance of degeneracy in biological systems. Degeneracy is a “ubiquitous property of biological systems at all levels of organisation” and a “prerequisite for and an inescapable product of the process of natural selection itself”. Although degeneracy is an emergent pattern in many domains (biological, ecological, social and economic), engineered systems “almost never intentionally incorporate degeneracy of the kind found in biological systems, except as a side effect of designs that are not optimal”8.
A defining characteristic of degeneracy is components with partially overlapping and context-dependent functions. This kind of degeneracy is a fundamental feature of language9. Different words are rarely synonymous, i.e. pure redundancy is rare. Instead, most seemingly interchangeable words are degenerate, “different structures with overlapping, context-dependent function(s)”.
We can find even better examples of degeneracy in team sports. In one scene in the movie Moneyball, Billy Beane argues for replacing one of the team's departed stars (Jason Giambi) by “recreating the aggregate”. Players that possess partially overlapping skillsets allow managers to assemble teams that are resilient to injury and loss of talent whilst minimising excess resources.
A simple example of this is the adaptation of a football/soccer team to an injury. An injury to a left-winger may necessitate more than just bringing on another left-winger. Maybe the team doesn’t have an equally good attacking winger on the left side but instead brings on an attacking left-back and compensates for the defensive deficit by putting a defensive-minded midfielder on the left side of midfield. Again, degenerate systems can “recreate the aggregate” even though they rarely possess like-for-like replacements for every position.
Degeneracy enables systems to operate in a near-efficient yet resilient manner through overlap rather than duplication. There is convincing evidence10 that “reliable designs are necessarily tied to degeneracy” rather than redundancy. One of the best studies of the characteristics that make this possible in social systems is an analysis of how flight operations on U.S. Navy aircraft carriers run at the “edge of the envelope”11, preserving reliability at the maximum possible efficiency. Again, degeneracy enables this: “the personnel's cross-familiarity with each other's jobs” allows the carrier to use “existing units with other primary tasks as backups”. As “most of the officers and a fair proportion of senior enlisted men are familiar with several tasks other than the ones they normally perform and could execute them in an emergency”, the system is resilient without having dedicated backup resources for every function. However, this strategy does come at a price: the “higher demands on the training and capability of individuals”.
Degeneracy is efficient and resilient
So how exactly is degeneracy more efficient than simple redundancy? As Whitacre and Bender show12, “excess resources related to a single function can indirectly support multiple unrelated functions within a degenerate system”. We can repair the effects of a component failure in one function even if the excess resources are present in an unrelated part. For example, a football team may only have an extra right-back at hand. However, it may still be able to deal with an injury to a left-winger by, for example, moving one of the right-backs to centre-half, moving the left-sided centre-half to left-back and moving the left-back to the left-wing.
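The football reshuffle can be sketched as a matching problem: each player has a set of positions they can cover, and the team needs one player per position. The following is a minimal illustration using a standard augmenting-path bipartite matching; the function `assign`, the player labels, and the skill sets are all hypothetical and are not taken from Whitacre and Bender's model.

```python
from typing import Dict, List, Optional, Set


def assign(skills: Dict[str, Set[str]], positions: List[str]) -> Optional[Dict[str, str]]:
    """Return a position -> player line-up filling every position, or None."""
    holder: Dict[str, str] = {}  # position -> player currently assigned to it

    def place(player: str, seen: Set[str]) -> bool:
        # Try each position the player can cover; if it is taken, try to
        # move the current holder elsewhere (an augmenting path).
        for pos in positions:
            if pos in skills[player] and pos not in seen:
                seen.add(pos)
                if pos not in holder or place(holder[pos], seen):
                    holder[pos] = player
                    return True
        return False

    for player in skills:
        place(player, set())
    return holder if len(holder) == len(positions) else None


positions = ["right-back", "centre-half", "left-back", "left-wing"]

# Available players after the left-winger is injured (hypothetical squad).
# Degenerate squad: partially overlapping skills; the only spare is a right-back.
degenerate_squad = {
    "RB": {"right-back"},
    "RB2": {"right-back", "centre-half"},
    "CH": {"centre-half", "left-back"},
    "LB": {"left-back", "left-wing"},
}
# Redundant squad: the same players, each limited to a single position.
redundant_squad = {
    "RB": {"right-back"},
    "RB2": {"right-back"},
    "CH": {"centre-half"},
    "LB": {"left-back"},
}

degenerate_lineup = assign(degenerate_squad, positions)  # a full line-up
redundant_lineup = assign(redundant_squad, positions)    # None: no left-winger
```

With overlapping skills, the spare right-back indirectly covers the left wing through a chain of moves. With single-skill players, the same spare is useless for this injury.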
Degenerate systems possess the ability to mount a systemic and distributed response. Minimal excess resources can enable “huge reconfiguration opportunities at the system level”. If changes in the environment demand a large-scale reconfiguration of the system, then degenerate systems can meet this challenge. In contrast, purely redundant systems that require the same level of excess resources are much more restricted in their ability to respond to disturbances. In fact, “purely degenerate systems are more robust to perturbations in environmental conditions than are purely redundant ones, with the difference becoming larger as the systems are subjected to increasingly larger perturbations”. The systemic reconfiguration triggered by a stressful event in degenerate systems also ensures that component failure is unlikely to remain latent or unnoticed.
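A toy comparison under equal excess resources gives a feel for this claim. It is a brute-force sketch, not a reproduction of Whitacre and Bender's simulations: both hypothetical squads carry exactly one spare player, and we count which single-player losses each squad can absorb. The names `can_field` and `survivable_losses` and the skill sets are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, List, Set

POSITIONS = ["right-back", "centre-half", "left-back", "left-wing"]


def can_field(skills: Dict[str, Set[str]], positions: List[str]) -> bool:
    """True if some ordering of distinct players covers every position."""
    return any(
        all(pos in skills[player] for player, pos in zip(lineup, positions))
        for lineup in permutations(skills, len(positions))
    )


def survivable_losses(skills: Dict[str, Set[str]], positions: List[str]) -> List[str]:
    """Players whose loss still leaves a complete line-up."""
    return [
        lost
        for lost in skills
        if can_field({p: s for p, s in skills.items() if p != lost}, positions)
    ]


# Both squads have five players for four positions: one unit of excess.
redundant = {
    "RB": {"right-back"},
    "RB2": {"right-back"},  # dedicated like-for-like backup
    "CH": {"centre-half"},
    "LB": {"left-back"},
    "LW": {"left-wing"},
}
degenerate = {
    "RB": {"right-back"},
    "RB2": {"right-back", "centre-half"},
    "CH": {"centre-half", "left-back"},
    "LB": {"left-back", "left-wing"},
    "LW": {"left-wing"},
}

redundant_ok = survivable_losses(redundant, POSITIONS)
degenerate_ok = survivable_losses(degenerate, POSITIONS)
```

In this example the redundant squad survives only the loss of one of its two right-backs, while the degenerate squad can absorb the loss of any single player by reshuffling. Every absorbed loss in the degenerate squad also forces visible moves across the team, which is why such failures are hard to leave unnoticed.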
Degeneracy and the NHS Covid response
Looking at the response of the NHS in the United Kingdom to Covid-19, the decision to divert resources from non-essential activities is difficult to fault and is consistent with a resilient response. The Nightingale hospitals show that the NHS can overcome a shortage of beds and hospital infrastructure. However, a much more important question was never answered: would the NHS have had enough personnel to deal with a more significant outbreak? The NHS has long been criticised for having an overspecialised workforce with too many specialists and insufficient generalist expertise. To some extent, this is a problem common to healthcare systems worldwide, and increased specialisation is common across domains in the modern economic system.
In any domain, healthcare being no exception, a resilient system requires a minimum level of functional overlap and a training protocol that promotes this overlap. This is not to say that everyone should be a jack of all trades rather than a specialist. However, specialists need to possess limited secondary competencies in other areas that the system can utilise in times of stress. A small amount of functional plasticity goes a long way in preserving system resilience.
However, even degeneracy is not sufficient if regulations and licensing requirements prevent firms and individuals in closely related areas from stepping in and augmenting the supply of goods and services in times of stress (e.g. the inability of new providers to ramp up production of PPE in the United States due to regulatory hurdles).
Achieving a state of resilience does not imply that we need to sacrifice efficiency. On the contrary, the challenge is to be resilient with a near-optimal configuration of resources. Degeneracy is an essential prerequisite of such a system.