Managing Complex Change

There is no doubt about it: Change is Difficult. None of us like it. We all prefer the comfort and security of sameness. We are wired that way. Yet, change is inevitable. So, what can we do?

One option is to ignore it. Another is to resist it. And another is to make the best of it. It is even possible to embrace and celebrate it!

And what if we are responsible for managing change that will have an effect on others? That is a whole order of magnitude more difficult. There is an oft-quoted statistic that 70% of change initiatives fail – which suggests that it is more difficult than the originators anticipated. Yet, if we look around we can see examples of successful change everywhere – so it must be possible to manage it. How is it done? What are the traps to avoid? What do we need? What don’t we need? Where do we start? Who can we ask for guidance?

If we search the Web or use an AI assistant like ChatGPT, we will discover a multitude of change models such as Kurt Lewin’s Unfreeze-Change-Refreeze model or John Kotter’s Eight Steps. And if we compare and contrast these recipes we will find common themes such as the importance of leadership, vision and a clear plan. That all makes sense.

What is more difficult to find are root cause analyses of failed changes that we can learn from. No one likes to talk about their failures but we need to compare the successes and failures to find the nuggets of wisdom that we can learn from and use to reduce the risk of failure for our own change initiatives. Learn to fail or fail to learn.

And how do we know if we are on track? What are the early warning signs of an impending failure that we could use to get us back on track or give us the confidence to abandon the attempt before too much time, money, blood, sweat and tears are wasted?


These are questions that have been buzzing around for years and recently I chanced across something that caught my eye. It was a diagram that I had not come across before.

Two things immediately struck me. The first was the explicit inclusion of “Skills” in the recipe for success. That made sense to me. The second was the symptoms of what happens if an ingredient of the complex change recipe is missing. Those made sense to me too because I have experienced them all.

The diagram I found was not attributed so I did a bit of searching – using the five ingredients as a starter. What I discovered was fascinating – a sort of Chinese Whispers story with different names attached to emergent variants of the diagram. I persevered and eventually found the original source – Dr Mary Lippitt who created and copyrighted the diagram in 1987.


The next thing I did was float the Lippitt diagram with other people who are actively working in applying the science of improvement in the health care sector – and who are faced with the challenge of having to manage complex change. The Lippitt diagram resonated strongly with them too – which I saw as a good sign.

I then found Dr Mary Lippitt’s email address and emailed her, out of the blue. And she replied almost immediately, thanked me, and we arranged to have a Zoom chat. It was fascinating. What I learned was that her passion for complex change blossomed when she inherited her father Gordon’s consulting business. He, like his older brother Ronald, worked in the organisational change domain and he wrote a book entitled “Organization Renewal” whose second edition was published in 1982. And I discovered that Ronald Lippitt was a colleague of Kurt Lewin – the Father of Social Psychology. So, the pedigree of the diagram I came across by chance is impeccable!


Changing even a small part of a health care system is a tough sociotechnical challenge and I have learned the hard way that a combination of social and technical skills is required. Many of these skills appear to be missing in health care organisations and that skills gap leads to the commonest source of resistance to change that I see: Anxiety.

It also goes some way to explain why we made significant progress in delivering health care service improvements when we focussed on giving the front line staff

a) the necessary technical skills to diagnose the causes of their service issues, and

b) the skills to redesign their processes to release the improvements they wanted to see.

We now have good evidence that we also, unwittingly, developed the complementary social skills to help spread the word of what is possible and how to achieve it organically across teams, departments and organisations.

So, with her generous permission, we will be using Dr Mary Lippitt’s diagram to tell the story of how to manage complex change, and we will share what we learn as we go.

Errors, Mistakes and Failures

People who work in health care are risk averse. There is a good reason for this. Taking risks can lead to harmed and unhappy patients and that can lead to complaints and litigation.

This principle of “first do no harm” has a long history that dates back to the Greek Physician Hippocrates of Kos who was born about 2500 years ago. The Hippocratic school revolutionised medicine and established it as a discipline and as a profession. And with the benefit of 2500 years of practice we now help, heal and do no harm.


Unintended harm is a form of failure and no one likes to feel that they have failed. We feel a sense of anxiety or even fear of failure that can become disabling to the point where we are not even prepared to try.


There are many words associated with failure and two that are commonly used are errors and mistakes. These are often conflated but they are not the same thing.

It is important to be aware that failure is an output or outcome, while errors and mistakes are part of the process that leads to the outcome.

To help illustrate the difference between errors and mistakes it is useful to consider how knowledge is created.

At the start no one knows anything and no one even knows what they need to know. This is the zone of Unknown Unknowns. At the end of the knowledge creating process everyone knows everything they need to know. This is the zone of Known Knowns.

Between these end points is a process of learning and development. First we become aware of what we need to know, and then we focus on growing that know-how. Finally we focus on how to apply and teach these known knowns. It is not necessary for everyone to discover everything for themselves. We can learn from each other.


We make an error when we do not use what is already a known known.

For example, I know how to use a calculator to add up a set of numbers but if I accidentally press a wrong button then I will fail to get the correct result. That is an avoidable failure. I made an error.

If I do not read the instructions of how to use the calculator, and I press the wrong buttons deliberately, in the erroneous belief that they are correct, then I will fail to get the correct result. I made an error.

There are two different types of error here. One is an error of commission which is when I do the wrong thing (i.e. I do press the wrong button). The other is an error of omission and is when I do not do the right thing (i.e. I do not read the instructions). Not learning and using what is already known has a name, it is called ineptitude. We could also call it the zone of Unknown Knowns.

One observable consequence of a failure that results from a well-intended-but-inept action is surprise, shock and denial. And we often repeat the same action in the hope that the intended outcome happens next time. When it doesn’t we feel a sense of confusion and frustration. That frustration can bubble out as anger and manifest as a behaviour called blame. This phenomenon also has a name, it is called hubris.

Errors of omission often cause more problems because they are harder to see. We are quite literally blind to them. To spot an error of omission we need to know what to look for that is missing, which means we need to know what is known.


Mistakes are different.

We make mistakes when we are working at the edge of our knowledge and we do something that partly achieves what we intend. We know we are on the right track and we know we do not yet have all the knowledge we need. This is the zone of Known Unknowns. It is the appropriate place for research, experimentation, development, learning and improvement. This is an iterative process of learning from both successes and mistakes.

Mistakes are expected in the learning zone and over time they become fewer and fewer as we learn from them and apply that learning. Eventually we know all that we need to achieve our intended outcome and we enter the zone of Known Knowns. This is how we all learned to walk and talk and to do most of the things that, with practice, become second nature.


In a safety-critical system like health care we cannot afford to make errors and we need to know which knowledge zone we are in. If we experience a failure we must be able to differentiate an error from a mistake. This is because what happens next will be different – especially if the outcome is a harmed, unhappy patient.

An error implies that the harm was avoidable because the know-how was available. In an error situation it becomes a question of ineptitude or negligence. If appropriate training is available and is not taken up, or taken up and not applied, then that is negligence. This is easily confused with ineptitude that results when appropriate training is not available or not effective.


One example of this is where new medical information systems are implemented and the users are not given sufficient time or effective training to use them correctly. Errors will inevitably result and patients may be harmed as a consequence. The error here is one of omission – i.e. omitting to ensure users are trained to use the new system correctly. At best it is an example of management ineptitude, at worst it is management negligence. It is a management system failure and is avoidable.


Thus far we have been making a tacit assumption that we have not stated – that all failures are bad and should be avoided. This is certainly true in the zone of Known Knowns, and is partially true in the zone of Known Unknowns. But what about in the zone of Unknown Unknowns? The zone of Discovery and Innovation.

Paradoxically, failures here are expected and welcomed because everything new that we try-and-fail is a piece of useful knowledge. The goal here is to fail fast so that we can try as many times as possible and learn what does-not-work as fast as possible. The Laws of Chance say that the more different attempts we make, the more likely one of them will succeed. Louis Pasteur said “Chance favours the prepared mind” and even random attempts are better than none. Rational attempts are even better. Thomas Edison, the famous inventor and developer of a practical incandescent lamp, failed many times before finding a design that worked.


So, when something does not turn out as intended or expected, pause to ponder whether it was an avoidable error, an educational mistake, or just another stab at making a hole in the dark cloak of ignorance and letting in some light.

And more importantly when something does turn out as expected then pause to ponder whether it was expected, desired or unexpected.

It is dangerous to believe that we can only learn by making mistakes. Whatever outcome we get, success or failure, is an opportunity to learn. And, we can always ask because it is likely that our unknown is known by someone.

Resilience

The rise in the use of the term “resilience” seems to mirror the sense of an accelerating pace of change. So, what does it mean? And is the meaning evolving over time?

One sense of the meaning implies a physical ability to handle stresses and shocks without breaking or failing. Flexible, robust and strong are synonyms; and opposites are rigid, fragile, and weak.

So, digging a bit deeper we know that strong implies an ability to withstand extreme stress while resilient implies the ability to withstand variable stress. And the opposite of resilient is brittle because something can be both strong and brittle.

This is called passive resilience because it is an inherent property and cannot easily be changed. A ball is designed to be resilient – it will bounce back – and this is inherent in the material and the structure. The implication of this is that to improve passive resilience we would need to remove the component and replace it with something better suited to the range of expected variation.

The concept of passive resilience applies to processes as well, and a common manifestation of a brittle process is one that has been designed using averages.

Processes imply flows. The flow into a process is called demand, while the flow out of the process is called activity. What goes in must come out, so if the demand exceeds the activity then a backlog will be growing inside the process. This growing queue creates a number of undesirable effects – first it takes up space, and second it increases the time for demand to be converted into activity. This conversion time is called the lead-time.

So, to avoid a growing queue and a growing wait, there must be sufficient flow-capacity at each and every step along the process. The obvious solution is to set the average flow-capacity equal to the average demand; and we do this because we know that more flow-capacity implies more cost – and to stay in business we must keep a lid on costs!

This sounds obvious and easy but does it actually work in practice?

The surprising answer is “No”. It doesn’t.

What happens in practice is that the measured average activity is always less than the funded flow-capacity, and so less than the demand. The backlogs will continue to grow; the lead-time will continue to grow; the waits will continue to grow; the internal congestion will continue to grow – until we run out of space. At that point everything can grind to a catastrophic halt. That is what we mean by a brittle process.

This fundamental and unexpected result can easily and quickly be demonstrated in a concrete way on a table top using ordinary dice and tokens. A credible game along these lines was described almost 40 years ago in The Goal by Eli Goldratt, originator of the school of improvement called Theory of Constraints. The emotional impact of gaining this insight can be profound and positive because it opens the door to a way forward which avoids the Flaw of Averages trap. There are countless success stories of using this understanding.
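The table-top demonstration can also be tried in a few lines of code. What follows is a minimal sketch under stated assumptions, not the exact rules of the game in The Goal: a line of five steps, with demand and each step's flow-capacity set every round by one roll of an ordinary six-sided die (so both average 3.5). The function name `simulate_line` and all the parameter values are illustrative choices.

```python
import random

def simulate_line(steps=5, rounds=1000, seed=1):
    """Dice game sketch: demand and every step's capacity are die rolls (mean 3.5)."""
    rng = random.Random(seed)      # seeded so a run can be repeated
    queues = [0] * steps           # work waiting in front of each step
    total_in = 0
    total_out = 0
    for _ in range(rounds):
        demand = rng.randint(1, 6)
        queues[0] += demand
        total_in += demand
        # Work from the last step back to the first so that an item
        # moves at most one step per round.
        for i in reversed(range(steps)):
            capacity = rng.randint(1, 6)
            moved = min(capacity, queues[i])   # cannot process work that is not there
            queues[i] -= moved
            if i + 1 < steps:
                queues[i + 1] += moved
            else:
                total_out += moved
    return {
        "total_in": total_in,
        "total_out": total_out,
        "backlog": sum(queues),
        "mean_demand": total_in / rounds,
        "mean_activity": total_out / rounds,
    }

if __name__ == "__main__":
    for rounds in (100, 1000, 10000):
        print(rounds, simulate_line(rounds=rounds))
```

Running it for longer and longer periods lets us compare the mean activity with the 3.5 average capacity and watch what happens to the backlog. The key line is the `min(...)`: a step that rolls high when its queue is short loses that capacity for good, while a step that rolls low when its queue is long adds to the backlog.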


So, when we need to cope with variation and we choose a passive resilience approach then we have to plan to the extremes of the range of variation. Sometimes that is not possible and we are forced to accept the likelihood of failure. Or we can consider a different approach.

Reactive resilience is one that living systems have evolved to use extensively, and is illustrated by the simple reflex loop shown in the diagram.

A reactive system has three components linked together – a sensor (e.g. temperature-sensitive nerve endings in the skin), a processor (e.g. the grey matter of the spinal cord) and an effector (e.g. the muscles, ligaments and bones). So, when a pre-defined limit of variation is reached (e.g. the heat of the flame) then the protective reaction withdraws the finger before it becomes damaged. The advantage of this type of reactive resilience is that it is relatively simple and relatively fast. The disadvantage is that it is not addressing the cause of the problem.

This is called reactive, automatic and agnostic.

The automatic self-regulating systems that we see in biology, and that we have emulated in our machines, are evidence of the effectiveness of a combination of passive and reactive resilience. It is good enough for most scenarios – so long as the context remains stable. The problem comes when the context is evolving, and in that case the automatic/reflex/blind/agnostic approach will fail – at some point.


Survival in an evolving context requires more – it requires proactive resilience.

What that means is that the processor component of the feedback loop gains an extra feature – a memory. The advantage this brings is that past experience can be recalled, reflected upon and used to guide future expectation and future behaviour. We can listen and learn and become proactive. We can look ahead and we can keep up with our evolving context. One might call this reactive adaptation or co-evolution and it is a widely observed phenomenon in nature.

The usual manifestation of this is called competition.

Those who can reactively adapt faster and more effectively than others have a better chance of not failing – i.e. a better chance of survival. The traditional term for this is survival of the fittest but the trendier term for proactive resilience is agile.

And that is what successful organisations are learning to do. They are adding a layer of proactive resilience on top of their reactive resilience and their passive resilience.

All three layers of resilience are required to survive in an evolving context.
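The difference between the reflex loop and the loop with a memory can be sketched in code. This is an illustration only, under assumptions that are not in the text above: the sensor reports a temperature, the limit is an arbitrary 45 degrees, and "memory" is reduced to remembering the last few readings and projecting the trend one step ahead. The names `reactive` and `Proactive` are hypothetical.

```python
from collections import deque

LIMIT = 45.0  # assumed harm threshold in degrees C, for illustration only

def reactive(reading):
    """Reflex loop: act only once the fixed limit is reached. No memory."""
    return "withdraw" if reading >= LIMIT else "hold"

class Proactive:
    """The same loop plus a short memory used to project the next reading."""

    def __init__(self, memory=3):
        self.history = deque(maxlen=memory)

    def decide(self, reading):
        self.history.append(reading)
        if reading >= LIMIT:
            return "withdraw"                  # the reflex layer is still present
        if len(self.history) >= 2:
            trend = self.history[-1] - self.history[-2]
            if reading + trend >= LIMIT:       # projected to breach on the next step
                return "withdraw early"
        return "hold"

if __name__ == "__main__":
    controller = Proactive()
    for reading in [20.0, 28.0, 36.0, 44.0, 52.0]:
        print(reading, reactive(reading), controller.decide(reading))
```

With a steadily rising reading, the reflex holds until the limit is actually breached, while the version with memory acts one step sooner. The proactive layer sits on top of the reflex and does not replace it.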

One manifestation of this is the concept of design which is where we create things with the required resilience before they are needed. This is illustrated by the design squiggle which has time running left to right and shows the design evolving adaptively until there is sufficient clarity to implement and possibly automate.

And one interesting thing about design is that it can be done without an understanding of how something works – just knowing what works is enough. The elegant and durable medieval cathedrals were designed and built by Master builders who had no formal education. They learned the heuristics as apprentices and through experience.


And if we project the word game forwards we might anticipate a form of resilience called proactive adaptation. However, we sense that is a novel thing because there is no proadaptive word in the dictionary.

PS. We might also use the term Anti-Fragile, which is the name of a thought-provoking book that explores this very topic.

Dis-Ease

I estimate that 99% of the hard work done in health care is done by the patients themselves. Only a small fraction is done by the system of health care (the highly-trained people, the increasingly-expensive drugs, the high-tech equipment, and the data-hungry processes).

Each one of us is a magical self-healing system.

In fact, all living organisms are self-healing; from the microscopic bugs to the macroscopic biosphere. Everything that we call living is self-regulating and self-healing. Up to a point.

But, how has this living miracle been achieved?


The two concepts self-regulating and self-healing have a common framework called a sensor-processor-effector design. The sensor detects that something is not as expected, the processor decides how to react, and the effector does something to restore the expectation.

An example of self-regulation is the motion sensors in our ears that tell us if we are off balance. Our nervous system has well-rehearsed and coordinated patterns of actions that it can communicate through the nerves to muscles that act through our bones to restore our balance. When we are balanced we feel at ease. When we are out of balance we feel uneasy and we want to act to restore the equilibrium. And most of the time this self-regulation happens automatically. We do not need to consciously think about it.

The self-regulating property of living systems is called homeostasis which is a term that was coined by Dr Walter Cannon less than 100 years ago. One of his discoveries was the mechanism of the fright-flight-fight reaction. This is when we react automatically to a threat with a feeling of fright and an adrenalin-fuelled reaction that prepares our bodies for rapid action. Flight or Fight.

These self-regulation systems are built into us and are the result of millions of years of evolution. They have helped us to survive and to stay healthy. That is one reason we are still here.

The same concept of self-regulation operates at all scales from single microbes to global communities of multi-cellular organisms. All life on Earth.


At the microscopic level we have evolved many self-regulating chemical systems that ensure cellular homeostasis, such as the proteins in the cell membrane that use chemical energy to pump ions across to maintain a slight electrical gradient. This is rather like the domestic electricity supply that is maintained close to 240 volts and allows electrical energy to flow where and when it is needed.

And at the macroscopic level, many self-regulating biological systems have evolved over aeons to create a system of global homeostasis; such as the algae in the oceans that recycle the essential element sulphur back into the atmosphere where it is redistributed via clouds and rain to other organisms.

The flow diagram hints that this is just a small part of a self-regulating global geo-biological system that has been operating for billions of years and has maintained the narrow physical conditions for life to survive and evolve.

But, self-regulation has limits and if those limits are reached and breached then the result is harm. The self-regulating system can become damaged and if the damage is too severe then the deterioration escalates – resulting in a catastrophic, irreversible system failure that we call death.


So, living things have also evolved the property of self-healing.

This uses the same sensor-processor-effector design and the result is the repair of a damaged part of the system. This is remarkable and humbling because, while we have built many self-regulating machines, we have yet to build machines that are self-healing.

This self-healing property is an insurance policy against unpredictability and it happens at all scales. To achieve it implies that there is some form of blueprint that can be used as a reference; and in the case of all biological life that blueprint is encoded in a remarkable molecule called DNA.

Our DNA holds the blueprint of how we self-build, self-regulate, and self-heal.

So, if you accidentally damage something, like a cut finger, then a whole coordinated response springs into action to limit and repair the damage and restore you to health.

Sometimes the self-healing benefits from outside assistance. Which is where the healthcare system comes in. The term we use for this is illness and includes the sense that something external and outside our control is pushing us towards death.

This is what we refer to as dis-ease.


But the healthcare system also has structure, behaviour, sensors, measurements, policies, rules, processes, procedures and effectors. The healthcare system has to adapt dynamically to a changing context and it has to self-regulate to provide a safe, reliable, comfortable and affordable service in case of need.

And like all systems it has limits to this self-regulation, and if they are exceeded then the components of the system can become damaged and not function as well, or at all. The healthcare system can become sick and to survive it needs to be able to self-diagnose, to self-treat and to self-heal.

So, is our UK health care system sick? Well, if we look at a sensitive measure of whole system unease, the A&E waiting time, then the chart does not look good. The deterioration has been progressive for over 10 years and the 2 years of covid-19 only created a wobble on the deteriorating trajectory. And before we jump to the usual conclusion that this is because of rising demand, as we saw before covid-19, that is not the case now. The system is even sicker than it was before and covid-19 may have actually accelerated the decline.


A healthcare system is not a living organism endowed with the accumulated wisdom of billions of years of evolution encoded into its DNA. It does not have a built-in, tried-and-tested, self-repair capability. And there is no external healer of healthcare systems. We are on our own.

So, is this a hopeless situation? Do we just have to accept our fate?

No. We don’t.

The health care system is a machine that we created. All these charts are saying is that our current design is not fit for purpose. And when we created the design we seem to have omitted the self-healing bit. Oops.

But, the health care system is full of clever, curious, creative, courageous, compassionate, collaborative people who know how to design and build amazing things.

So, it seems that our challenge is to learn how to create a self-healing health care system. It would seem that all our survival depends on it.

And if we don’t do it then who will?

And if we can do that then what else could we self-heal?

Something even bigger maybe?

Out of the Crisis

The News seems to be filled with tales of woe about how the NHS is in a new state of even deeper crisis with thousands of unfilled posts, long waiting lists, and staff off sick with stress. And now people are starting to say the crisis has been creeping up on us for years and that the NHS is doomed to fail. No one has a credible plan. No one seems to know what to do.

What this all adds up to is a mindset that cannot solve the problem it created [Einstein].

This is not a new phenomenon. And the good news is that we know how to go about addressing it. We must diagnose the cause and then we can design a treatment.

The problem is that the cause is our own complacency.

As we collectively overcome challenges and design solutions that work better than before then the pain goes away and we start to take the gains for granted. We become complacent.

As the solutions and systems that support our modern lifestyles become more complicated we are forced into more narrow roles and we lose sight of the contribution that the skills of others make. We become complacent.

As economic growth feeds our insatiable desires we lose sight of the fact that a global ecosystem of finite resources cannot sustain perpetual growth. We become complacent.

The emergent crisis is predictable but it still comes as a nasty shock because we have become so complacent.

Survival is not inevitable. Many ancient civilisations have perished because they grew too big for their ecological boots and were unable to adapt quickly enough. They became complacent and they collapsed. All we find are relics in the dust.

We all need to heed the heuristics of history.

We all need to cultivate some new attitudes, behaviours and competencies. Such as Curiosity, Courage, Commitment, Cooperation, Collaboration and Competence.

There is no room for complacency.

Visual Feedback

We are visual animals. Sight is our most developed sense. We must see to believe but what we see is not what we perceive. Something gets in the way sometimes.

Have you heard the term “Can’t see for looking”?

This happens because what we perceive is influenced by what we expect to see. We can prime our perception by setting an expectation – and we can see faces in the flames.

So, if we don’t expect to see something we can be looking at something but not perceive it.

And if we don’t believe something can exist we can see something and perceive it as something else.

We can be visually deceived – and this weakness is exploited by illusionists and magicians.

So, how can we avoid this cognitive trap?

One way is to look at something from more than one perspective. What is hidden from one view may be plain to see from another. This is one reason why we have two eyes that look in about the same direction – the two slightly different perspectives allow us to perceive depth.

Another tactic is to observe over time and look for movement. In fact our eyes do this automatically because movement in our peripheral vision will alert us. We can also use movement to perceive depth – even with only one eye. The eye is a very sophisticated bit of evolved opto-neural engineering.

What about our mind’s eye?

Our mind’s eye is not actually an eye. It is not even a picture in the same sense as a camera would record. The images we conjure up when our eyes are closed are reconstructions. They are an output of our mental model, just as thoughts and decisions are. Our mind’s eye is what we expect to see.

And this is a massive asset because it allows us to compare expectation with experience and to look for differences. That takes a lot less neural processing and is much faster – so gives us a survival advantage.

But, if our mental model is not accurate then what we expect to see is not accurate and the gap between expectation and experience gets bigger. And if that gap gets too wide then we just feel a sense of confusion. The “Eh?” effect.

In that situation we have to choose between our mind’s eye and our real eye.

Some prefer their mind’s eye … they are called Intuitors.

Some prefer their real eyes … they are called Sensors.

And the inevitable outcome is conflict.

So, what can we do?

It is actually possible to use both – just not at the same time. And to do that we have to make the process conscious and deliberate. We can ask ourselves “What am I expecting to see?” and then “What am I actually seeing?”

And then we can ask “What might account for the gap?”

One possible and plausible cause is that we are making an unconscious assumption that is invalid and is generating an inaccurate expectation. This unrealistic expectation can distort our perception so much that we literally block out what we do not expect to see. And we may not even notice because all this neural processing happens outside of awareness. And it happens very fast. In the blink of an eye.

So, if we do not always accept perceptual distortion as a possibility then we may close our minds to learning.

We have a choice.

Drive-Thru-Care

Some believe that innovation is a high risk strategy with a high probability of failure, and that success is governed more by luck than judgement.

Unfortunately, that limiting belief creates an emotional barrier to change and will trap us in a perpetual maelstrom of disappointment, frustration and anxiety. The Victim Vortex.

So, here is an uplifting story that challenges this limiting belief.

The context is delivering urgent health care in the midst of the COVID pandemic.

How to design and build a care system: from jargon to achievement

Care in your Car

The first innovation was a drive-thru-care service for people urgently seeking help. They were not Accidents or Emergencies so did not need to spend hours waiting in A&E. And they could not be managed with just a telephone consultation; they needed to be seen by someone.

The second innovation was that this service was specified, designed, tested, built and implemented by the people who delivered the care. And they got it right first time. It worked exactly as they designed. Straight out of the box.

The third innovation was to abandon the traditional “suck-it-and-see” approach to healthcare improvement called Plan-Do-Study-Act or PDSA. Instead, they used the systems engineering approach called a Study-Plan-Do cycle which sounds similar but is fundamentally different. It starts with Study. Look Before Leap. First, they studied the behaviour of the system using a variety of simulation techniques to play with ideas in prototype before implementing them in practice.

It sounds simple and obvious but it is not business as usual in health care.


The drive-thru-care design included some other innovations. One was the booking system – it was done online and patients were given timed-tickets. This sounds like a recipe for risk for an unscheduled urgent care service until we remember that if people just turn up when they feel like it then we can quickly get chaotic queues. Waiting in a queue for urgent assessment is scary. That is how congested A&E departments cause risk and harm.

Another innovation was to study the proposed delivery process; to simulate it using actual staff, actual equipment, a few bits of cardboard and some actors playing patients – and to actually measure the flow-capacity with clocks. Then to use this evidence to plan how to allocate the timed-tickets. Guesswork is not required. Only then did they plan and do. And it worked. Right first time. No queues. No chaos. No extra risk. No frustration. Just calm efficiency and delighted clients.
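To make the step from measured flow-capacity to timed-tickets concrete, here is a minimal sketch of one way it could be done. It is not the method the team used. It assumes that the simulation produced a list of measured minutes per patient, that there are a known number of parallel bays, and that tickets are spaced using a cautious upper-percentile time rather than the average. The function name `ticket_times` and all the numbers are hypothetical.

```python
from datetime import datetime, timedelta

def ticket_times(measured_minutes, bays, start, end, percent=90):
    """Space timed-tickets from measured service times (illustrative sketch)."""
    ordered = sorted(measured_minutes)
    # Nearest-rank percentile using integer arithmetic: plan to a cautious
    # service time rather than to the average.
    rank = (percent * len(ordered) + 99) // 100
    service = ordered[max(0, rank - 1)]
    interval = timedelta(minutes=service / bays)   # bays are staggered
    times = []
    t = start
    # Only issue a ticket if that patient can finish before closing time.
    while t + timedelta(minutes=service) <= end:
        times.append(t)
        t += interval
    return times

if __name__ == "__main__":
    measured = [8, 9, 10, 10, 11, 12, 12, 13, 14, 15]   # hypothetical stopwatch data
    opening = datetime(2020, 4, 1, 9, 0)
    closing = datetime(2020, 4, 1, 10, 0)
    for t in ticket_times(measured, bays=2, start=opening, end=closing):
        print(t.strftime("%H:%M"))
```

The design choice worth noting is the percentile: spacing tickets at the average service time would recreate the Flaw of Averages trap, so the sketch plans to a time that most measured cases fit inside.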

Making the creative leap: a healthcare case study (ICJ, April 2022)

Drive-Thru-Urgent-Care

The first implementations of this radical care-in-your-car concept were in some hastily assembled facilities in car-parks around East Birmingham using converted shipping containers and awnings. The first one was in South Car Park 5 at the NEC and went from design to delivery in three weeks; just in time for the first COVID tsunami that peaked Easter 2020.

Two years ago.

And it worked exactly as designed. Right first time. No queues. No chaos. Safe, calm, efficient and effective.

In fact, it worked so well that the commissioned provider (a local out-of-hours primary care service called Badger Medical) decided to invest in and implement a permanent drive-thru-care facility which is shown in the photo above. This was constructed in a disused warehouse and was officially opened in late October 2021 by Mayor Andy Street. Just in time to help mitigate the most recent waves of the COVID epidemic.

It too was designed by those who deliver the care – and it worked right-first-time.

And it included other evolving innovations such as an online booking system that allows patients to choose, cancel and change their timed-ticket – without needing to phone anyone!


Innovation does not need to be high risk; and in health care it cannot be allowed to be high risk. Creative leaps can be made in the safety of a simulation and the emergent “plan” informed by the learning from the study. Then the “do” works Right First Time. On Time. In Full. In Budget.

This is systems engineering. It works inside health care just as it works outside.

Fly-by-Wire

The big wide world outside health care is stuffed with proven solutions that we can use as metaphors to guide improvement. Aviation has already given us checklists that are now an integral part of surgical services – the WHO Safer Surgery Checklist. [We will skip the uncomfortable fact that safety checklists have been used in aviation for decades longer than in surgery].

So what other aviation metaphors might be of use?

Well, one that is of particular value in these turbulent times is fly-by-wire.

When the first heavier-than-air craft, the Wright Flyer I, took off on Dec 17th, 1903 the controls were purely mechanical. The way that the pilot’s stick moved the control surfaces was only one of many innovations that the Wright brothers had created and tested over years of careful experimentation. They were self-taught, practical engineers.

Decades later the link between the pilot and the plane was replaced by a more complicated electromechanical system of sensors, processors and actuators illustrated in the diagram below. And the reason that this is necessary is that modern aircraft are bigger, faster and need to be much safer, more comfortable, more efficient, and more reliable. As fare-paying passengers we take for granted all of the advanced engineering of modern aircraft when we climb aboard. We only notice when something goes wrong which, because of the hard work of the engineers, is a very rare event.

The essence of the fly-by-wire design is the triad of sensor-processor-actuator which we could translate into input-process-output (IPO) or even study-plan-do. Each of the three components is necessary and together the three components are sufficient. Interestingly, this is exactly how most biological systems are constructed – including us. Sensors, processors and effectors.

The part of the aircraft system that takes in the multi-sensory information and transforms it into signals that drive the control surfaces is NOT the pilot; it is the flight control computer (FCC). One of the sensory inputs comes from the pilot’s control stick and it gives the FCC an indication of what the pilot wants to achieve; turn, climb, descend and so on. The FCC works out the fine details of how to do it because it requires a complex and coordinated movement of more than one control surface. The pilot cannot fly the aircraft without the FCC.

But the FCC can fly the aircraft without the pilot. It just needs to get its objectives from somewhere else. For example, a GPS-enabled autonomous drone just needs a destination and maybe a set of way-points along a route and it can fly itself there – and dynamically adapt to the unpredictable and invisible winds along the way that would otherwise blow it off a rigidly predefined plan.


So, how does all this relate to health care?

Well, the key to the success of fly-by-wire is the design of the feedback loop. It has to use the correct sensory information; that information needs to be accurate and up-to-date; and it needs to be processed quickly enough to respond proactively. By that I mean the FCC needs to be able to predict what will happen in the near future and to plan accordingly to achieve the objectives and avoid the obstacles. The effect is smooth, stable, fast and efficient flight that looks and feels effortless. But that is an illusion. It is not easy and it is not effortless. If a pilot tries to do it, even a moment of hesitation or a minor gust of wind will cause the craft to become unstable, spin out of control and crash.

A human pilot cannot sense, process, plan and decide what to do accurately or quickly enough. Even if the pilot’s decisions and actions are correct, they will come too-late and have the unintended effect of actually destabilising the system.
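
The destabilising effect of late corrections can be seen in a toy feedback loop. This is only a sketch under stated assumptions: the function names, the gain of 0.6 and the 3-step delay are illustrative choices, not a model of any real aircraft or flight control computer. The controller nudges the state towards a target, but acts on a measurement that is a few steps old.

```python
def fly(delay_steps, gain=0.6, target=1.0, steps=60):
    """Simulate a simple corrective feedback loop (toy example).

    Each step the controller nudges the state towards the target, but it
    acts on a measurement that is `delay_steps` steps old.
    """
    history = [0.0]  # the state starts away from the target
    for t in range(steps):
        seen = history[max(0, t - delay_steps)]  # stale measurement
        correction = gain * (target - seen)      # a "correct" decision...
        history.append(history[-1] + correction) # ...applied too late
    return history


def worst_late_error(history, target=1.0, tail=10):
    """Largest distance from the target over the last `tail` steps."""
    return max(abs(x - target) for x in history[-tail:])


prompt_error = worst_late_error(fly(delay_steps=0))  # settles on the target
late_error = worst_late_error(fly(delay_steps=3))    # oscillation keeps growing
print(prompt_error, late_error)
```

The same corrections that settle the system when applied promptly make it swing further and further from the target when they arrive three steps late.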

This metaphor translates to health care. If we attempt to “fly” the health care system using a top-down-command-and-manual-control design that is akin to the 1903 Wright Flyer I, then we will experience frequent instability and occasional crashes – just as the early aircraft did. And if the “weather” gets blustery then we are even less likely to be able to control our health care system that way. We have learned the hard way that we are safer staying in our “hangar” and doing nothing. But, an aircraft parked in a hangar is ineffective as a vehicle and a hospital constrained by fear-fuelled-do-nothing-top-down-command-and-control policies is ineffective as a vehicle for care delivery. Especially when unpredictable challenges appear – such as a global pandemic of a novel infectious disease.


So, if we reflect on the fly-by-wire metaphor we could ponder on what we might need in health care: Embedded multi-sensors; automated real-time sensory data communication and transformation; and accurate, fast feedback loops that connect directly to where the micro-decisions are made and the micro-actions happen. The front line of health care delivery.

And we do not actually need centralised control hardware and software (i.e. flight control computers) because they are designed for the predictable and passive world of aerodynamics where the Non-Negotiable Laws of Physics decree what happens.

We need to be able to adapt to shifting preferences and policies. And something we already have in our health care system is “distributed chimpware” – trained and experienced health care staff. This resource, when used effectively, is much better suited to the shifting subjectivity of day-to-day health care delivery.

Humans have evolved with a highly developed visual sensory system – so the more effective and efficient way to feed back the real-time information is as a dynamic picture of what has happened, what is happening and what is about to happen. And so long as that picture is accurate and up-to-date enough; and we have a clear enough objective; and we have learned a validated method for transforming the information into decision and action – then the health care delivery system can be trusted to fly itself. And it would.


But we don’t have that in health care. We have a fragmented health care system that cannot work in a coordinated and collaborative way because it was never designed to.

So, we are forced to fall back on our out-of-date, ponderous and potentially destabilising and dangerous top-down-command-and-control approach.

Question: Why haven’t we designed the health care system to be as safe, stable, efficient and trusted as a modern passenger aircraft?

Could it be because we don’t have embedded health care systems engineers who can design, build, implement and maintain such things?


But there is some hope that this is possible and some evidence that it is useful.

There are a small number of trained and experienced systems engineers working in the NHS today. They have been working behind the scenes to mitigate some of the effects of the COVID pandemic; and they have demonstrated the art of the possible.

One innovative solution is a drive-thru urgent primary care service in Birmingham where patients with urgent problems (i.e. not emergency or routine) can be referred by their GP or book themselves – and are seen and assessed safely and quickly and without queuing. The prototype was created very quickly at the start of the pandemic and it has proved so effective that it has been developed and implemented as a more permanent solution.

Part of the control system design challenge was to ensure that the clinical resource capacity was well utilised (i.e. delivered value for money) but was not over-burdened. Even with the unpredictable nature of unscheduled urgent demand.

So, one component of the system design is a real-time visual feedback loop that the service provider uses to dynamically balance requests with resources. The screenshot below is real and recent.

The dotted lines show the averaged historical pattern of hourly demand (blue) and flow-capacity (red) and these are used to create the solid lines which are cumulative sums. The green line shows ‘now’ so the past is to the left and the future is to the right. The black line is the current cumulative booked capacity and it cannot cross the red line (available slot-capacity) because the bookings are done on-line and a patient cannot attend without a ‘ticket’. Just as we do for trains, etc. The chart is updated automatically every 2 minutes so the service provider can see what is coming over the time horizon and use that information to dynamically tweak the process. This way, it is possible to achieve high utilisation (i.e. minimal waste of expensive and scarce clinical resources) while also avoiding chaotic queues and long, potentially unsafe, delays for patients.
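
The arithmetic behind such a chart is simple enough to sketch. The hourly figures and function names below are hypothetical, chosen only for illustration; they are not data from the service. A ticket is issued only if a free slot remains in the requested hour, which is what keeps the cumulative booked line at or below the cumulative capacity line.

```python
from itertools import accumulate

# Hypothetical hourly figures for one session (illustrative, not service data).
capacity_per_hour = [4, 6, 6, 6, 4]  # slots the team can staff each hour
booked_per_hour = [4, 5, 2, 0, 0]    # tickets already issued


def try_to_book(hour, booked, capacity):
    """Issue a ticket for `hour` only if a free slot remains in that hour."""
    if booked[hour] >= capacity[hour]:
        return False
    booked[hour] += 1
    return True


def cumulative(values):
    """Running total, as plotted by the solid lines on the chart."""
    return list(accumulate(values))


accepted = try_to_book(1, booked_per_hour, capacity_per_hour)  # last slot in hour 1
rejected = try_to_book(0, booked_per_hour, capacity_per_hour)  # hour 0 is already full

cum_capacity = cumulative(capacity_per_hour)  # the red line
cum_booked = cumulative(booked_per_hour)      # the black line
headroom = [c - b for c, b in zip(cum_capacity, cum_booked)]
print(accepted, rejected, headroom)
```

The headroom list is the gap between the two solid lines over the time horizon; it can shrink to zero but never goes negative.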

So, maybe health care could use more of these aerospace engineering metaphors and grow some more health care systems engineers?

The growing body of evidence shows the art of the possible.

Business as Usual

At last the light appears to be visible at the end of the tunnel for Covid-19 in the UK. And we have our fingers-crossed that we can contemplate getting back to business as usual. Whatever that is.

For the NHS, attention will no doubt return to patient access targets. The data has continued to be collected, processed and published for the last 16 months, so we are able to see the impact that the Covid-19 epidemic has had on the behaviour of the hospital-based emergency system. The Emergency Departments.

The run chart below shows the monthly reported ED metrics for England from Nov 2010.

The solid grey line is the infamous 4 Hr target – the proportion of ED attendances that are seen and admitted or discharged within 4 hours. It reveals that performance, which had declined progressively over the last decade, improved during the first and second waves. And if we look for plausible causes we can see that the ED attendances dropped precipitously (blue dotted line) in both the first and second waves. We dutifully “Stayed at Home to Protect the NHS and Save Lives”.

The drop in ED attendances was accompanied by a drop in ED admissions (dotted red line) but a higher proportion of those who did attend were admitted (solid orange line) – which suggests they were sicker patients. So, all that makes sense.

And as restrictions are relaxed we can see that attendances, admissions, 4 Hr yield and proportion admitted are returning to the projected levels. Business as Usual.


Up to March 2021 the chart says that 70-75% of patients who attend ED did not need to be admitted to hospital. So this begs a raft of questions:

Q1: What is it that makes nearly 35,000 people per day go to ED and then home?

Q2: How can the ED footfall drop by 50% almost overnight?

Q3: Where did those patients go for the services they were previously seeking in ED?

Q4: What were their outcomes?

Q5: What are the reasons they were choosing to go to ED rather than their GP before March 2020?

Q6: How much of the ED demand is spillover from Primary Care?

Q7: How much of the ED workload is diagnostic testing to exclude serious illness?

Q8: What lessons can be learned to mitigate the growing pressure on EDs?

Q9: Can urgent care services for this 70% be provided in a more distributed way?


And if we can do drive-thru urgent testing during Covid-19 and we can do drive-thru urgent treatment during Covid-19 then perhaps we can do more drive-thru urgent care after Covid-19?

Sting in the Tail

Monday 19th July 2021 was the official end of COVID-19 restrictions in England – yet the number of positive tests, hospital admissions and deaths is rising. “How can that make any sense!” wail the doom mongers. Is it irresponsible? Are we destined for a deadly third wave? Is a nasty sting in the tail on the way?

To address these questions we need to step back and look at the bigger picture.

As we have seen, the evolution of the COVID-19 pandemic has been tricky to predict because the virus and the host have been co-evolving. The host has implemented social distancing and developed vaccines to attenuate the viral spread and illness severity. The virus has mutated and more contagious variants have emerged as the dominant players.

And trying to work out how all these factors combine together is beyond the computational ability of the 1.4 kg of chimpware between our ears. Our intuition is confounded by the counter-intuitive complexity. We need help.

Here is the published data … the orange line is the daily reported positive COVID tests and the red dotted line is the daily reported COVID deaths. There is a clear temporal association but the sizes of the peaks don’t seem to make sense – even when we note that the test and death lines are plotted on very different scales.

One problem here is that the number of positive tests reported is very dependent on the testing process. In the first wave only hospital admissions were tested; in the second wave there was much more community-based testing of symptomatic people; and now many people are self-testing regularly to provide evidence of wellness.

The only way to unravel this Gordian Knot of interacting influences is to use the data to build and calibrate a causal structure model (CSM). Conventional statistical analysis is not up to the job because it conflates association and causation. We need something which is able to provide a diagnosis and a prognosis. Something that can use the past to help predict the future.

The blue line in the chart below is the output of a CSM that has been designed using proven principles of epidemic dynamics, and calibrated using historical data. And it predicts that there is indeed a third wave underway and that it is minor in comparison with the first two in terms of the predicted mortality.

The emergence of a third wave is the combined effect of four things:
a) The relaxing of social distancing rules.
b) The emergence and spread of more contagious variants of the virus.
c) The known fact that the vaccine is not 100% effective.
d) The known fact that immunity after illness or vaccination will wane with time.

One use of a CSM is to conduct counterfactual analysis which helps us to deepen our understanding of how complex systems behave. These are called “What would have happened if?” experiments.

One such experiment is “What would have happened if the vaccine was completely effective?”

Here is the CSM prediction for a 100% effective vaccine: The first and second waves were the same because the vaccination programme did not start until the peak of the second wave – and there is no third wave even with complete relaxation of social distancing.

But the actual data disproves this causal hypothesis because there is a third wave developing.


So, here is the CSM prediction for a 0% effective vaccine: The first and second waves are largely unchanged and now we have a third wave as bad as the second. A nasty sting in the tail.

But then the epidemic fizzles out because all the host “fuel” of susceptible people has been used up.


Setting the aggregate effectiveness of the vaccine to 75% gives us the best fit to the historical data; and that value is consistent with the pilot studies of vaccine effectiveness.
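
To show the mechanics of a “What would have happened if?” experiment, here is a minimal sketch using a textbook SIR (susceptible-infectious-removed) model stepped one day at a time. It is far simpler than the CSM described here: there are no waves, no distancing rules and no variants, and every parameter value (population, transmission rate, doses per day, start day) is an illustrative assumption rather than a calibrated figure. Only the vaccine effectiveness is changed between runs.

```python
def total_infected(vaccine_effectiveness, days=365):
    """Cumulative infections from a day-stepped SIR model with vaccination.

    All parameter values are illustrative assumptions, not fitted to UK data.
    """
    population = 1_000_000.0
    s, i = population - 10.0, 10.0  # susceptible and infectious people
    beta, gamma = 0.25, 0.10        # transmission and recovery rates per day
    doses_per_day = 10_000.0
    vaccination_starts = 30         # day the programme begins
    cumulative_infections = 10.0

    for day in range(days):
        new_infections = beta * s * i / population
        recoveries = gamma * i
        protected = 0.0
        if day >= vaccination_starts:
            # Only the effective fraction of doses leaves the susceptible pool.
            protected = min(s - new_infections,
                            vaccine_effectiveness * doses_per_day)
        s -= new_infections + protected
        i += new_infections - recoveries
        cumulative_infections += new_infections
    return cumulative_infections


# The counterfactual experiment: same epidemic, three different vaccines.
for effectiveness in (0.0, 0.75, 1.0):
    print(effectiveness, round(total_infected(effectiveness)))
```

Everything except the one factor of interest is held fixed, so any difference between the runs can be attributed to that factor. That is the sense in which a simulation can separate causation from association.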

And what is the most useful evidence that suggests this latest prediction is reliable? It is that the infection rate is predicted to be falling already, despite distancing rules being relaxed, and that is what the data is showing.

And with this re-calibrated CSM we can estimate the impact of the vaccination programme in terms of lives saved … and it comes out at about 40,000 people! That is a lot.

So what next?

Well, we know that immunity will wane with time, and we know that new viral variants will emerge, and we know that coronavirus will be with us for the foreseeable future at a background level.

And we have seen how this pandemic has exposed the vulnerabilities of our current socioeconomic systems – health and social care, education, transport, communication, commerce and so on. Every part of the system has been affected because everything is interconnected.

We cannot just go back to business as usual. The world has been changed. And our immediate challenge is to redesign and rebuild a health care system that is safer, more efficient and more agile and that will serve us better in the future.

Another lesson learned is just how useful systems engineering theory, tools and techniques have been – the CSM demonstrated above is a standard systems engineering technique.

So, we will need some more health care systems engineers. A lot more. And they will need to be embedded at all levels in the NHS as an integral part of the system.

A self-healing health care system.

Emergence

The last year has been dominated by one theme – the SARS-CoV-2 global pandemic. It has been a roller coaster ride of ups and downs and twists and turns, often in darkness and accompanied by the baleful drone of doom-mongers and naysayers. But there have also been bright flashes of insight that have illuminated the way and surges of innovation that have carved new designs out of old paradigms.

What we are experiencing is the evolution of a complex adaptive system and what we are seeing is the emergence of a new normal.

Almost nothing will be the same again.

The diagram above tells many inter-weaved story threads that cannot be untangled. Two Chapters are complete – CRC and UTC. We are just starting Chapter 3.

The first Thread of Tragedy is shown by the red dotted line. It is the number of daily COVID-19 associated deaths reported in the UK. The total stands at just over 127,000 which is more than enough to fill the whole of Wembley Stadium. And a lot more.

The solid red line on the diagram is the result of removing the 7-day oscillation caused by the reporting process which opts to take weekends off.
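
The text does not say which filter was used to remove the weekly oscillation. One common choice, shown here purely as an assumption, is a centred 7-day moving average: every window contains exactly one of each weekday, so the weekend dip and the Monday catch-up cancel out. The function name and the sample numbers are made up for illustration.

```python
def seven_day_average(daily_counts):
    """Centred 7-day moving average; days without a full window give None."""
    smoothed = []
    for k in range(len(daily_counts)):
        if k < 3 or k + 3 >= len(daily_counts):
            smoothed.append(None)  # not enough days on one side
        else:
            window = daily_counts[k - 3 : k + 4]  # 3 days before, the day, 3 after
            smoothed.append(sum(window) / 7)
    return smoothed


# Made-up reporting pattern: Monday catch-up, quiet weekend, steady true rate.
one_week = [160, 100, 100, 100, 100, 70, 70]
reported = one_week * 3
smoothed = seven_day_average(reported)
print(smoothed)
```

In this made-up series the reported counts swing between 70 and 160, while the smoothed line is flat at the underlying rate.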

COVID-19 is busy 24 x 7.

The first reported COVID-19 death in the UK was in the first week of March 2020. The WHO declared a global pandemic the following week, and the UK implemented the first part of a national lock down the week after. It closed some pubs in London. The need for speed was because hospital admissions and deaths were growing exponentially. As the chart shows – deaths were doubling every few days.

The Chancellor’s Magic Blank Chequebook appeared and several Nightingales were rapidly assembled to absorb the predicted storm surge. However, critically ill patients require specialised equipment and highly trained staff – and those necessities were already in short supply. As was the personal protective equipment (PPE) the front-line staff needed to keep them safe.

The Nightingales were never going to be able to sing. It was a doomed design from Day #1.

The bigger problem was the millions of potentially infectious people who would get poorly but not unwell enough to go to hospital. What was the national plan for them? It seemed that there wasn’t one. So, we created our own. The COVID Referral Centre. CRC.

This was Chapter One and the story of that has already been shared here.

The CRC was an innovative drive-thru design and a temporary solution that was conceived, commissioned, constructed and opened in 3 weeks (the red box at the top of the first diagram). It worked as designed and it was disassembled, as planned, at the predicted end of the First Wave (the orange box at the top of the first diagram).

What happened next is even more interesting. We had demonstrated, by doing it, that a drive-thru design was feasible and now we had a new challenge. Most of the elective and urgent services had been mothballed to free up space and staff to fight the First Wave. And we had no clear picture what would happen if lock down restrictions were released. The Nightingales were held in readiness. An expensive and ineffective insurance policy.

Could the drive-thru design be used for a handful of small, temporary urgent treatment centres (UTCs)?

A key lesson from the CRC was the critical importance of managing the inflow to avoid a traffic jam of anxious and potentially very poorly people. We solved this using an electronic triage and referral app that was rapidly designed, developed and delivered for the opening of the CRC. Doing that took a whole week using the JEDI method (Just ‘Effing Do It) also known as Agile.

By August 2020 things were getting back to sort-of-normal. People were having summer holidays. Schools and universities were concocting elaborate plans to re-open in the autumn. And we were thinking ahead to Winter 2020 and the prospect of seasonal flu on top of a possible resurgence of COVID. The much-feared Second Wave.

So, just before the CRC was decommissioned we took the opportunity to measure how many people could be vaccinated in an innovative drive-thru compared with a conventional walk-in. An important constraint was we did not want queues of vulnerable elderly people inside or outside. This time we had the luxury of being able to map and measure the process properly and it revealed that the drive-thru option was feasible.

We now had the information we needed to design a high efficiency flow scheduler which would set the rate at which patients could arrive without causing queues and chaos, and at the same time make good use of the available and valuable resources.

The next design question we had to answer was “How will the booking be done?” and the immediate answers offered were “on-line” and “by the patient”.

But, this was not how the CRC worked. In that service the patient had to speak to a GP who assessed their symptoms and, if deemed necessary, referred them electronically to the CRC for a face-2-face assessment. The e-referral app was designed to limit the number of referrals to prevent a traffic jam and it also automatically assigned the next available free slot to make best use of the resources. There was no patient choice.
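
The two behaviours described here, capping the number of referrals and always handing out the next free slot, can be sketched in a few lines. This is not the e-referral app itself; the schedule layout, slot times and function name are hypothetical.

```python
def assign_next_free_slot(schedule, patient):
    """Give the patient the earliest free slot, or None when all are taken."""
    for slot_time in sorted(schedule):  # zero-padded "HH:MM" strings sort by time
        if schedule[slot_time] is None:
            schedule[slot_time] = patient
            return slot_time
    return None  # full: no further referrals, so no traffic jam


# Hypothetical session with three timed slots (None means free).
schedule = {"09:00": None, "09:10": None, "09:20": None}

first = assign_next_free_slot(schedule, "patient A")
second = assign_next_free_slot(schedule, "patient B")
third = assign_next_free_slot(schedule, "patient C")
fourth = assign_next_free_slot(schedule, "patient D")
print(first, second, third, fourth)
```

The number of slots is the cap: once they are all taken, the next referral is declined rather than squeezed in.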

The other question that spun out of this exercise was “If patients could book their own appointments for a routine flu jab then could they refer themselves to a drive-thru urgent treatment centre?”

Now we were shaking the trees a bit too hard. The general consensus was “No“. But why not? Surely the patient is best placed to decide how urgent they feel their problem is? And anyway, an online self-referral can be quickly screened and any inappropriate ones addressed proactively. It is probably a better design than a walk-in service.

So, we decided to design a prototype online self-referral system and we looked on the Web for ideas that solved a similar “niggle” of being able to provide convenient 24 x 7 online access to a traditional face-to-face 9-5 Mon-Fri service. Rather like the niggle of trying to get an urgent appointment at your GP practice. Or the niggle of finding an increasingly rare Post Office to go to and to get the right postage stamps for an urgent big letter / small parcel.

We discovered that the postage stamp niggle had been solved with an online app for a pay-and-print-postage-label. So, that gave us a validated design to start from.

All this digital innovation was going on during the Blue Period on the first diagram, along with the planning of a cluster of small, temporary, drive-thru UTCs placed in more convenient locations for patients. And by the time the whole caboodle was ready-to-roll it was apparent that the feared Second Wave was building momentum.

The drive-thru UTC service opened its gates in early October 2020 and only four weeks later the nation was commanded to lock down for a second time. The return of pupils to schools and students to university had created the perfect COVID incubator and the emergence of a hyper-contagious Mutant. The first diagram shows when the ‘fire-break’ lock down was eased, and when the Mutant exploded out of its cage, wiped out Christmas and doubled the UK death toll.

But, the drive-thru UTCs weathered the winter storms – figuratively and literally. They valiantly delivered a much needed service while the hospitals were swamped with a third tsunami of critically ill. The NHS was better prepared this time, which is just as well because the Third Wave was much bigger than the First.

And the data the UTCs collected themselves showed that the prototype self-referral app worked as designed. We have seen gradual adoption over the seven months since it was first piloted (see below). The day-to-day variation is not random. The weekly spikes on the chart coincide with weekends when GP practices are shut and A&Es are busy dealing with accidents and emergencies (not anyone and everything).

So what does the future hold?

When COVID is just a bad/sad memory and the NHS is grappling with the elephantine challenge of post-COVID recovery amidst yet another re-disorganisation, would a more permanent drive-thru urgent care service be a viable service delivery option?

Based on the hard evidence shown I would say “Yes“.

Necessity is the Mother of Invention.

Engineers Design Things to be Fit-for-Purpose.

One Year On

This is a picture that tells a story. In fact, it is a picture of millions of stories. Some tragic. Some heroic. Most neither. This is a story of a system adapting to an unexpected and deadly challenge. Over 125,000 souls have been lost. Much has been learned. We cannot return to what was before. The world has changed.

There are three lines on this chart.

The dotted red line is the daily reported deaths, and the obvious pattern is the weekly oscillation. This is caused by the fact that for two days of the week many people do not sit at their computers processing data. These are called weekends. So, they have to catch up with the data backlog when they return to work on Monday.

The solid red line illustrates what actually happened … the actual number of souls lost per day … peaking at over 1000 in January 2021. The ups and downs show the effect of three drastic interventions to limit the spread of a merciless virus that was mutating, evolving and competing with itself to spread faster.

This is a picture of a system learning how the Universe works – the hard, painful way.

The blue line is a prediction of how many souls would be lost, and it is surprisingly accurate. The blue line was generated by a computer. Not a multi-million pound supercomputer like the ones used to predict the weather – but a laptop like those millions of people use every day. And the reason the prediction is so accurate is because epidemics follow simple mathematical rules – and these rules were worked out about 100 years ago.

The tricky bit is turning these simple mathematical formulae into an accurate prediction … in our heads … intuitively. And the reason it is so tricky is because our brains have not evolved to do that. It is not a matter of lack of intelligence … it is just that a human brain is the wrong tool for that job.

But, what our brains are superbly evolved to do is conceptualise, innovate and collaborate to create tools like computers and Excel spreadsheets.

And many have said that in one year we have achieved ten years’ worth of innovation. We had to. Our lives depended on it.

So, now we have seen what is possible with a burning platform pushing us. How about we keep going with burning ambition pulling us to innovate and improve further?

Our lives and livelihoods will depend on it.

The Crystal Ball

A crystal ball or orbuculum is a crystal or glass ball and is associated with the performance of clairvoyance and the ability to predict future events.

Before the modern era, those who claimed to be able to see the future were treated with suspicion and branded as alchemists, magicians and heretics.

Nowadays we take it for granted that the weather can be predicted with surprising accuracy for a few days at least – certainly long enough to influence our decisions.

And weather forecasting is a notoriously tricky challenge because small causes can have big effects – and big causes can have no effects. The reason for this is that weather forecasting is what is called a nonlinear problem and to solve it we have had to resort to using sophisticated computer simulations run on powerful computers.

In contrast, predicting the course of the COVID-19 epidemic is a walk in the park. It too is a nonlinear problem but a much less complicated one that can be solved using a simple computer simulation on a basic laptop.

The way it is done is to use the equations that describe how epidemics work (which have been known for nearly 100 years) and then use the emerging data to calibrate the model, so over time it gets more accurate.
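
As a rough illustration of that two-step recipe, the sketch below assumes the classic SIR equations of Kermack and McKendrick (published in 1927) and calibrates a single parameter by trying a handful of candidate values. The “observed” data are synthetic, generated by the same model with a known transmission rate, and every name and number is an assumption made for the example; a real calibration would use the reported data and a more careful fitting method.

```python
def simulate_daily_infections(beta, gamma=0.1, population=1_000_000.0,
                              seed=10.0, days=60):
    """Day-stepped SIR model; returns the new infections on each day."""
    s, i = population - seed, seed
    series = []
    for _ in range(days):
        new = beta * s * i / population  # nonlinear term: needs both S and I
        s -= new
        i += new - gamma * i
        series.append(new)
    return series


def calibrate_beta(observed, candidates):
    """Pick the candidate beta whose simulated curve best matches the data."""
    def squared_error(beta):
        simulated = simulate_daily_infections(beta, days=len(observed))
        return sum((a - b) ** 2 for a, b in zip(simulated, observed))
    return min(candidates, key=squared_error)


# Synthetic "emerging data" produced with a known transmission rate of 0.30.
observed = simulate_daily_infections(beta=0.30)
candidates = [0.20, 0.25, 0.30, 0.35, 0.40]
best_beta = calibrate_beta(observed, candidates)
print(best_beta)
```

As more days of data arrive, the same fit can be repeated on the longer series, which is how the prediction gets more accurate over time.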

Here’s what it looks like for COVID-19 associated mortality in the UK. The red dotted line is the reported data and the oscillation is caused by the reporting process with weekend delays. The solid red line is the same data with the 7-day oscillation filtered out to reveal the true pattern. The blue line is the prediction made by the model.

And we can see how accurate the prediction is, especially since the peak of the third wave.

What this chart does not show is the restrictions being gradually lifted and completely removed by April 2021.

The COVID Crystal Ball says it will be OK so long as nothing unexpected happens – like a new variation that evades our immune systems, or even a new bug completely.

It has been a tough year. We have learned a lot through hardship and heroism, including that a random act of nature can swat us like an annoying fly.

So, perhaps our sense of hope should be tempered with some humility because the chart above did not need to look like that. We have the knowledge, tools and skills to do better. We have lots of Crystal Balls.

End In Sight

We are a month into Lock-down III.

Is there any light at the end of the tunnel?

Here is the reported UK data.  As feared the Third Wave was worse than the First and the Second, and the cumulative mortality has exceeded 100,000 souls.  But the precipitous fall in reported positive tests is encouraging and it looks like the mortality curve is also turning the corner.

The worst is over.

So, was this turnaround caused by Lock-down III?

It is not possible to say for sure from this data.  We would need a No Lock-down randomised control group to keep the statistical purists happy and we could not do that.

Is there another way?

Yes, there is. It is called a digital twin. The basic idea is we design, build, verify and calibrate a digital simulation model of the system that we are interested in and use that to explore cause-and-effect hypotheses. Here is an example: The solid orange line in the chart above (daily reported positive tests) is closely related to the dotted grey line in the chart below (predicted daily prevalence of infectious people). Note the almost identical temporal pattern and be aware that in the first wave we only reported positive tests of patients admitted to hospital.

What does our digital twin say was the cause?

It says that the primary cause of the fall in daily prevalence of infectious people is that the number of susceptible people (the solid blue line) has fallen to a low enough level for the epidemic to fizzle out on its own.  Without any more help from us.

And it says that Lock-down III has contributed a bit by flattening and lowering the peak of infections, admissions and deaths.

And it says that the vaccination programme has not contributed to the measured fall in prevalence.

What are the implications if our digital twin is speaking the truth?

Firstly, that the epidemic is already self-terminating.
Secondly, that the restrictions will not be needed after the end of February.
Thirdly, that a mass vaccination programme is a belt-and-braces insurance policy.

I would say that is all good news.  The light at the end would appear to be in sight.

No Queue Vaccination

Vaccinating millions of vulnerable people in the middle of winter requires a safe, efficient and effective process.

It is not safe to have queues of people waiting outside in the freezing cold.  It is not safe to have queues of people packed into an indoor waiting area.

It is not safe to have queues full stop.

And let us face it, the NHS is not brilliant at avoiding queues.

My experience is that the commonest cause of queues in health care processes is something called the Flaw of Averages.

This is where patients are booked to arrive at an interval equal to the average rate they can be done.

For example, suppose I can complete 15 vaccinations in an hour … that is one every 4 minutes on average … so common sense tells me that the optimum way to book patients for their jab is one every four minutes.  Yes?

Actually, No.  That is the perfect design for generating a queue – and the reason is because, in reality, patients don’t arrive exactly on time, and they don’t arrive at exactly one every four minutes, and there will be variation in exactly how long it takes me to do each jab, and unexpected things will happen.  In short, there are lots of sources of variation.  Some random and some not.  And just that variation is enough to generate a predictably unpredictable queue.  A chaotic queue.

The Laws of Physics decree it.
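To see the Flaw of Averages at work without any videos, here is a minimal sketch of one vaccinator with patients booked every 4 minutes.  The spreads for lateness and jab time are assumptions chosen purely for illustration, and the function name is made up; only the 4-minute average comes from the example above.

```python
import random


def mean_wait(n_patients, booking_interval, jab_time, lateness, seed=1):
    """Mean wait in minutes for one vaccinator seeing patients in arrival order.

    jab_time and lateness are functions that take a random.Random and
    return a number of minutes.
    """
    rng = random.Random(seed)
    arrivals = sorted(
        max(0.0, k * booking_interval + lateness(rng)) for k in range(n_patients)
    )
    free_at = 0.0      # time the vaccinator finishes the current jab
    total_wait = 0.0
    for arrive in arrivals:
        start = max(arrive, free_at)
        total_wait += start - arrive
        free_at = start + jab_time(rng)
    return total_wait / n_patients


# Clockwork world: everyone on time, every jab takes exactly 4 minutes.
clockwork = mean_wait(120, 4.0, lambda rng: 4.0, lambda rng: 0.0)

# Real world (made-up spreads): jabs take 2 to 6 minutes, averaging 4,
# and patients arrive up to 5 minutes early or late.
realistic = mean_wait(
    120, 4.0, lambda rng: rng.uniform(2.0, 6.0), lambda rng: rng.uniform(-5.0, 5.0)
)
print(f"clockwork: {clockwork:.1f} min, realistic: {realistic:.1f} min")
```

The average capacity is identical in both runs.  Only the variation differs, and that alone is enough to turn a zero wait into a queue.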


So, to illustrate the principles of creating a No Queue design here are some videos of a simulated mass vaccination process.

The process is quite simple – there are three steps that every patient must complete in sequence:

1) Pre-Jab Safety Check – Covid Symptoms + Identity + Clinical Check.
2) The Jab.
3) Post-Jab Safety Check (15 minutes of observation … just-in-case).

And the simplest layout of a sequential process is a linear one with the three steps in sequence.

So, let’s see what happens.

Notice where the queue develops … this tells us that we have a flow design problem.  A queue is a signpost that points to the cause.

The first step is to create a “balanced load, resilient flow” design.

Hurrah! The upstream queue has disappeared and we finish earlier.  The time from starting to finishing is called the makespan and the shorter this is, the more efficient the design.

OK. Let’s scale up and have multiple, parallel, balanced-load lanes running with an upstream FIFO (first-in-first-out) buffer and a round-robin stream allocation policy (the sorting hat in the video).  Oh, and can we see some process performance metrics too please.
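For those who prefer code to sorting hats, here is a minimal sketch of a FIFO buffer feeding parallel lanes with a round-robin allocation policy.  The function name and patient labels are made up, and the simulation in the video may well be implemented differently.

```python
from collections import deque
from itertools import cycle


def allocate_round_robin(patients, n_lanes):
    """Take patients from a FIFO buffer and deal them to the lanes in turn."""
    buffer = deque(patients)              # first in, first out
    lanes = [[] for _ in range(n_lanes)]
    lane_picker = cycle(range(n_lanes))   # 0, 1, 2, 0, 1, 2, ...
    while buffer:
        lanes[next(lane_picker)].append(buffer.popleft())
    return lanes


patients = [f"P{k}" for k in range(1, 8)]
lanes = allocate_round_robin(patients, n_lanes=3)
for number, lane in enumerate(lanes, start=1):
    print(f"Lane {number}: {lane}")
```

Round-robin ignores how busy each lane is, which is fine when the lanes are balanced.  A policy that sends the next patient to the first free lane would be a different design choice.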

Good, still no queues.  We are making progress.  Only problem is our average utilisation is less than 90% and The Accountants won’t be happy with that.  Also, the Staff are grumbling that they don’t get rest breaks.

Right, let’s add a Flow Coordinator to help move things along quicker and hit that optimum 100% utilisation target that The Accountants desire.

Oh dear!  Adding a Flow Coordinator seems to have made queues worse rather than better; and we’ve increased costs so The Accountants will be even less happy.  And the Staff are still grumbling because they still don’t get any regular rest breaks.  The Flow Coordinator is also grumbling because they are running around like a blue a***d fly.  Everyone is complaining now.  That was not the intended effect.  I wonder what went wrong?

But, to restore peace let’s take out the Flow Coordinator and give the Staff regular rest breaks.

H’mm.  We still seem to have queues.  Maybe we just have to live with the fact that patients have to queue.  So long as The Accountants are happy and the Staff  get their breaks then that’s as good as we can expect. Yes?

But … what if … we flex the Flow Coordinator to fill staggered Staff rest breaks and keep the flow moving calmly and smoothly all day without queues?

At last! Everyone is happy. Patients don’t wait. Staff are comfortably busy and also get regular rest breaks. And we actually have the most productive (value for money) design.

This is health care systems engineering (HCSE) in action.

PS. The Flaw of Averages error is a consequence of two widely held and invalid assumptions:

  1. That time is money. It isn’t. Time costs money but they are not interchangeable.
  2. That utilisation and efficiency are interchangeable.  They aren’t.  It is actually often possible to increase efficiency and reduce utilisation at the same time!

The Final Push

It is New Year 2021 and the spectre of COVID-4-Christmas came true.  We are now in the depths of winter and in the jaws of the Third Wave.  What happened?  Let us look back at the UK data for positive tests and deaths to see how this tragic story unfolded.

There was a Second Wave that started to build when Lock-down I was relaxed in July 2020.  And it looks like Lock-down II in November 2020 did indeed have a beneficial effect – but not as much as was needed.  So, when it too was relaxed at the start of December 2020 then … infections took off again … even faster than before!

That is the nature of epidemics and of exponential growth.  It seems we have not learned those painful lessons well enough.

And we all so desperately wanted a more normal Xmas that we conspired to let the COVID cat out of the bag again.  The steep rise in positive tests is real and we know that because a rise in deaths is following about three weeks behind.  And that means hospitals have filled up again.

Are we back to square one?

The emerging news of an even more contagious variant has only compounded our misery, but it is hard to separate the effect of that from all the other factors that are fuelling the Third Wave.

Is there no end to this recurring nightmare?

The short answer is – “It will end“.  It cannot continue forever.  All epidemics eventually burn themselves out when there are too few susceptible people left to infect and we enter the “endemic” phase.  When that happens the R number will gravitate to 1.0 again which some might find confusing.  The confusion is caused by mixing up Ro and Rt.
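The difference between Ro and Rt can be sketched in a few lines.  This assumes the simplest textbook picture (everyone mixes evenly and behaviour does not change), in which Rt is just Ro scaled by the fraction of people still susceptible.  The value Ro = 3 is an illustrative assumption, not a calibrated figure.

```python
def effective_r(r0, susceptible_fraction):
    """Rt in the simplest model: Ro scaled by the fraction still susceptible."""
    return r0 * susceptible_fraction


def threshold_susceptible_fraction(r0):
    """Susceptible fraction at which Rt falls to exactly 1."""
    return 1.0 / r0


r0 = 3.0  # assumed for illustration only
for fraction in (1.0, 0.75, 0.5, threshold_susceptible_fraction(r0)):
    print(f"susceptible {fraction:.0%} -> Rt = {effective_r(r0, fraction):.2f}")
```

So Ro describes the virus let loose in a fully susceptible population and does not change as the epidemic progresses, while Rt falls as the pool of susceptible people shrinks.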

How close are we to that end game?

Well, we are certainly a lot closer than we were in July 2020 because millions more people have been exposed, infected and recovered and many of those were completely asymptomatic.  It is estimated that about a third of those who catch it do not have any symptoms – so they will not step forward to be tested and will not appear in the statistics.  But they can unwittingly and silently spread the virus while they are infectious.  And many who are symptomatic do not come forward to get tested so they won’t appear in the statistics either.

And there are now two new players in the COVID-19 Game … the Pfizer vaccine and the Oxford vaccine.  They are the White Knights and they are on our side.

Hurrah!

Now we must manufacture, distribute and administer these sickness-and-death-preventing vaccines to 65 million people as soon as possible.  That alone is a massive logistical challenge when we are already fighting battles on many fronts.  It seems impossible.

Or do we?

It feels obvious but is it the most effective strategy?  Should we divert our limited, hard-pressed, exhausted health care staff to jabbing the worried-well?  Should we eke out our limited supplies of precious vaccine to give more people a first dose by delaying the second dose for others?

Will the White Knights save us?

The short answer is – “Not on their own“.

The maths is simple enough.

Over the last three weeks we have, through Herculean effort, managed to administer 1 million first doses of the Pfizer vaccine.  That sounds like a big number but when put into the context of a UK population of 65 million it represents less than 2% and offers only delayed and partial protection.  The trial evidence confirmed that two doses of the Pfizer vaccine given at a three week interval would confer about 90% protection.  That is the basis of the licence and the patient consent.

So, even if we delay second doses and double the rate of first dose delivery we can only hope to partially protect about 2-3% of the population by the end of January 2021.  That is orders of magnitude too slow.
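The back-of-the-envelope sums can be checked in a few lines.  The figures are the ones quoted above; treating the whole 65 million as the target for a single first dose is a simplifying assumption.

```python
uk_population = 65_000_000
first_doses_given = 1_000_000
weeks_taken = 3

share_protected = first_doses_given / uk_population
weeks_for_everyone = weeks_taken * uk_population / first_doses_given
weeks_if_rate_doubled = weeks_for_everyone / 2

print(f"Share with a first dose so far: {share_protected:.1%}")
print(f"Weeks to reach everyone at this rate: {weeks_for_everyone:.0f}")
print(f"Weeks if the rate is doubled: {weeks_if_rate_doubled:.1f}")
```

Even at double the current rate, first doses alone would take the best part of two years to reach everyone.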

And the vaccines are not a treatment.  The vaccine cannot mitigate the fact that a large number of people are already infected and will have to run the course of their illness.  Most will recover, but many will not.

So, how do we get our heads around all these interacting influences?  How do we predict how the Coronavirus Game is likely to play out over the next few weeks? How do we decide what to do for the best?

I believe it is already clear that trying to answer these questions using the 1.3 kg of wetware between our ears is fraught with problems.

We need to seek the assistance of some hardware, software and some knowledge of how to configure them to illuminate the terrain ahead.


Here is what the updated SEIR-V model suggests will happen if we continue with the current restrictions and the current vaccination rate.  I’ve updated it with the latest data and added a Vaccination component.

The lines to focus on are the dotted ones: grey = number of infected cases, yellow = number ill enough to justify hospital treatment, red = critically ill and black = not survived.

The vertical black line is Now and the lines to the right of that are the most plausible prediction.

It says that a Third Wave is upon us and that it could be worse than the First Wave.  That is the bad news. The good news is that the reason that the infection rate drops is because the epidemic will finally burn itself out – irrespective of the vaccinations.

So, it would appear that the White Knights cannot rescue us on their own … but we can all help to accelerate the final phase and limit the damage – if we all step up and pull together, at the same time and in the same direction.

We need a three-pronged retaliation:

  1. Lock-down:  “Stay at home. Protect the NHS. Save Lives”.  It worked in the First Wave and it will work in the Third Wave.
  2. Care in the Community:  For those who will become unwell and who will need the support of family, friends, neighbours and the NHS.
  3. Volunteer to Vaccinate:  To protect everyone as soon as is practically feasible.

Here is what it could look like.  All over by Easter.

There is light at the end of the tunnel.  The end is in sight.  We just have to pull together in the final phase of the Game.


PS. For those interested in how an Excel-based SEIR-V model is designed, built and used here’s a short (7 minute) video of the highlights:

This is health care systems engineering (HCSE) in action.

And I believe that the UK will need a new generation of HCSEs to assist in the re-designing and re-building of our shattered care services.  So, if you are interested then click here to explore further.

Second Wave

The summer holidays are over and schools are open again – sort of.

Restaurants, pubs and nightclubs are open again – sort of.

Gyms and leisure facilities are open again – sort of.

And after two months of gradual easing of social restrictions and massive expansion of test-and-trace we now have the spectre of a Second Wave looming.  It has happened in Australia, Italy, Spain and France so it can happen here.

As usual, the UK media are hyping up the general hysteria and we now also have rioting disbelievers claiming it is all a conspiracy and that re-applying local restrictions is an infringement of their liberty.

So, what is all the fuss about?

We need to side-step the gossip and get some hard data from a reliable source (i.e. not a newspaper). Here is what worldometer is sharing …

OMG!  It looks like The Second Wave is here already!  There are already as many cases now as in March and we still have the mantra “Stay At Home – Protect the NHS – Save Lives” ringing in our ears.  But something is not quite right.  No one is shouting that hospitals are bursting at the seams.  No one is reporting that the mortuaries are filling up.  Something is different.  What is going on?  We need more data.

That is odd!  We can clearly see that cases and deaths went hand-in-hand in the First Wave with about 1:5 cases not making it.  But this time the deaths are not rising with the cases.

Ah ha!  Maybe that is because the virus has mutated into something much more benign and because we have got much better at diagnosing and treating this illness – the ventilators and steroids saved the day.  Hurrah!  It’s all a big fuss about nothing … we should still be able to have friends round for parties and go on pub crawls again!

But … what if there was a different explanation for the patterns on the charts above?

It is said that “data without context is meaningless” … and I’d go further than that … data without context is dangerous because if it leads to invalid conclusions and inappropriate decisions we can get well-intended actions that cause unintended harm.  Death.

So, we need to check the context of the data.

In the First Wave the availability of the antigen (swab) test was limited so it was only available to hospitals and the “daily new cases” were in patients admitted to hospital – the ones with severe enough symptoms to get through the NHS 111 telephone triage.  Most people with symptoms, even really bad ones, stayed at home to protect the NHS.  They didn’t appear in the statistics.

But did the collective sacrifice of our social lives save actual lives?

The original estimates of the plausible death toll in the UK ranged up to 500,000 from coronavirus alone (and no one knows how many more from the collateral effects of an overwhelmed NHS).  The COVID-19 body count to date is just under 50,000, so putting a positive spin on that tragic statistic, 90% of the potential deaths were prevented.  The lock-down worked.  The NHS did not collapse.  The Nightingales stood ready and idle – an expensive insurance policy.  Lives were actually saved.

Why isn’t that being talked about?

And the context changed in another important way.  The antigen testing capacity was scaled up despite being mired in confusing jargon.  Who thought up the idea of calling them “pillars”?

But, if we dig about on the GOV.UK website long enough there is a definition:

So, Pillar 1 = NHS testing capacity and Pillar 2 = commercial testing capacity, and we don’t actually know how much was in-hospital testing and how much was in-community testing because the definitions seem to reflect budgets rather than patients.  Ever has it been thus in the NHS!

However, we can see from the chart below that testing activity (blue bars) has increased many-fold but the two testing streams (in hospital and outside hospital) are combined in one chart.  Well, it is one big pot of tax-payers cash after all and it is the same test.

To unravel this a bit we have to dig into the website, download the raw data, and plot it ourselves.  Looking at Pillar 2 (commercial) we can see they had a late start, caught the tail of the First Wave, and then ramped up activity as the population testing caught up with the available capacity (because hospital activity has been falling since late April).

Now we can see that the increased number of positive tests could be explained by the fact that we are now testing anyone with possible COVID-19 symptoms who steps up – mainly in the community.  And we were unable to do this before because the testing capacity did not exist.

The important message is that in the First Wave we were not measuring what was happening in the community – it was happening though – it must have been.  We measured the knock on effects: hospital admissions with positive tests and deaths after positive tests.

So, to present the daily positive tests as one time-series chart that conflates both ‘pillars’ is both meaningless and dangerous and it is no surprise that people are confused.


This raises a question: Can we estimate how many people there would have been in the community in the First Wave so that we can get a sense of what the rising positive test rate means now?

The way that epidemiologists do this is to build a generic simulation of the system dynamics of an epidemic (a SEIR multi-compartment model) and then use the measured data to calibrate this model so that it can then be used for specific prediction and planning.

Here is an example of the output of a calibrated multi-compartment system dynamics model of the UK COVID-19 epidemic for a nominal 1.3 million population.  The compartments that are included are Susceptible, Exposed, Infectious, and Recovered (i.e. not infectious) and this model also simulates the severity of the illness i.e. Severe (in hospital), Critical (in ITU) and Died.
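To give a feel for what sits under the bonnet, here is a minimal sketch of the four core compartments stepped forward one day at a time, much as a spreadsheet would do row by row.  It is not the calibrated model: the severity compartments (Severe, Critical, Died) are left out, and the contact rate, incubation period and infectious period are illustrative assumptions.  Only the 1.3 million population comes from the text.

```python
def run_seir(population, initial_infectious, contact_rate,
             incubation_days, infectious_days, n_days):
    """Step a basic SEIR model forward in daily steps and return one row per day."""
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    history = []
    for day in range(n_days):
        history.append({"day": day, "S": s, "E": e, "I": i, "R": r})
        new_exposed = contact_rate * s * i / population   # S -> E
        new_infectious = e / incubation_days              # E -> I
        new_recovered = i / infectious_days               # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return history


# Assumed parameters: a contact rate of 0.6 per day with a 5-day infectious
# period corresponds to an Ro of 3 in this simple model.
history = run_seir(population=1_300_000, initial_infectious=10,
                   contact_rate=0.6, incubation_days=5.0,
                   infectious_days=5.0, n_days=365)
peak = max(history, key=lambda row: row["I"])
print(f"Peak infectious on day {peak['day']}: {peak['I']:,.0f}")
```

Calibration then means adjusting parameters like these until the model output matches the measured data, such as hospital admissions and deaths.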

The difference in size of the various compartments is so great that the graph below requires two scales – the solid line (Infectious) is plotted on the left hand scale and the others are plotted on the right hand scale which is 10 times smaller.  The green line is today and the reported data up to that point has been used to calibrate the model and to estimate the historical metrics that we did not measure – such as how many people in the community were infectious (and would have tested positive).

At the peak of the First Wave, for this population of 1.3 million, the model estimates there were about 800 patients in hospital (which there were) and 24,000 patients in the community who would have tested positive if we had been able to test them.  24,000/800 = 30 which means the peak of the grey line is 30 x higher than the peak of the orange line – hence the need for the two Y-axes with a 10-fold difference in scale.

Note the very rapid rise in the number of infectious people from the beginning of March when the first UK death was announced, before the global pandemic was declared and before the UK lock-down was enacted in law and implemented.  Coronavirus was already spreading very rapidly.

Note how this rapid rise in the number of infectious people came to an abrupt halt when the UK lock-down was put into place in the third week of March 2020.  Social distancing breaks the chain of transmission from one infectious person to many other susceptible ones.

Note how the peaks of hospital admissions, critical care admissions and deaths lag after the rise in infectious people (because it takes time for the coronavirus to do its damage) and how each peak is smaller (because only about 1:30 get sick enough to need admission, and only 1:5 of hospital admissions do not survive).

Note how the fall in the infectious group was more gradual than the rise (because the lock-down was partial, because not everyone could stay at home (essential services like the NHS had to continue), and because there was already a big pool of infectious people in the community).


So, by early July 2020 it was possible to start a gradual relaxation of the lock down and from then we can see a gradual rise in infectious people again.  But now we were measuring them because of the growing capacity to perform antigen tests in the community.  The relatively low level and the relatively slow rise are much less dramatic than what was happening in March (because of the higher awareness and the continued social distancing and use of face coverings).  But it is all too easy to become impatient and complacent.

But by early September 2020 it was clear that the number of infectious people was growing faster in the community – and then we saw hospital admissions reach a minimum and start to rise again.  And then the number of deaths reach a minimum and start to rise again.  And this evidence proves that the current level of social distancing is not enough to keep a lid on this disease.  We are in the foothills of a Second Wave.


So what do we do next?

First, we must estimate the effect that the current social distancing policies are having and one way to do that would be to stop doing them and see what happens.  Clearly that is not an ethical experiment to perform given what we already know.  But, we can simulate that experiment using our calibrated SEIR model.  Here is what is predicted to happen if we went back to the pre-lockdown behaviours: There would be a very rapid spread of the virus followed by a Second Wave that would be many times bigger than the first!!  Then it would burn itself out and those who had survived could go back to some semblance of normality.  The human sacrifice would be considerable though.
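The shape of that simulated experiment can be sketched with a stripped-down SEIR model run twice from the same starting point, once with a low contact rate and once with a high one.  Every number here apart from the 1.3 million population is an assumption for illustration, so the output shows the direction of the effect and not the calibrated prediction.

```python
def peak_infectious(contact_rate, population=1_300_000, infectious=1_000.0,
                    recovered=100_000.0, incubation_days=5.0,
                    infectious_days=5.0, n_days=365):
    """Run a basic daily-step SEIR model and return the highest infectious count."""
    s = population - infectious - recovered
    e = 0.0
    i = infectious
    peak = i
    for _ in range(n_days):
        new_exposed = contact_rate * s * i / population
        new_infectious = e / incubation_days
        new_recovered = i / infectious_days
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        peak = max(peak, i)
    return peak


# Assumed contact rates: 0.22 per day stands in for continued social
# distancing and 0.6 per day for a return to pre-lockdown behaviour.
with_distancing = peak_infectious(contact_rate=0.22)
pre_lockdown = peak_infectious(contact_rate=0.6)
print(f"Peak infectious with distancing: {with_distancing:,.0f}")
print(f"Peak infectious pre-lockdown behaviour: {pre_lockdown:,.0f}")
```

The only thing that differs between the two runs is the contact rate, which is the one lever that social distancing pulls.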

So, despite the problems that the current social distancing is causing, they pale into insignificance compared to what could happen if they were dropped.

The previous model shows what is predicted would happen if we continue as we are with no further easing of restrictions and assuming people stick to them.  In short, we will have COVID-for-Christmas and it could be a very nasty business indeed as it would come at the same time as other winter-associated infectious diseases such as influenza and norovirus.

The next chart shows what could happen if we squeeze the social distancing brake a bit harder by focusing only on the behaviours that the track-and-trace-and-test system is highlighting as the key drivers of the growth in infections, admissions and deaths.

What we see is an arrest of the rise of the number of infectious people (as we saw before), a small and not sustained increase in hospital admissions, then a slow decline back to the levels that were achieved in early July – and at which point it would be reasonable to have a more normal Christmas.

And another potential benefit of a bit more social distancing might be a much less problematic annual flu epidemic because that virus would also find it harder to spread – plus we have a flu vaccination which we can use to reduce that risk further.


It is not going to be easy.  We will have to sacrifice a bit of face-to-face social life for a bit longer.  We will have to measure, monitor, model and tweak the plan as we go.

And one thing we can do immediately is to share the available information in a more informative and less histrionic way than we are seeing at the moment.


Update: Sunday 1st November 2020

Yesterday the Government had to concede that the policy of regional restrictions had failed and bluffing it out and ignoring the scientific advice was, with the clarity of hindsight, an unwise strategy.

In the face of the hard evidence of rapidly rising COVID+ve hospital admissions and deaths, the decision to re-impose a national 4-week lock-down was announced.  This is the only realistic option to prevent overwhelming the NHS at a time of year that it struggles with seasonal influenza causing a peak of admissions and deaths.

Paradoxically, this year the effect of influenza may be less because social distancing will reduce the spread of that as well and also because there is a vaccination for influenza.  Many will have had their flu jab early … I certainly did.

So, what is the predicted effect of a 4 week lock down?  Well, the calibrated model (also used to generate the charts above) estimates that it could indeed suppress the Second Wave and mitigate a nasty COVID-4-Christmas scenario.  But even with it the hospital admissions and associated mortality will continue to increase until the effect kicks in.

Brace yourselves.

Coronavirus


The start of a new year, decade, century or millennium is always associated with a sense of renewal and hope.  Little did we know that in January 2020 a global threat had hatched and was growing in the city of Wuhan, Hubei Province, China.  A virus of the family coronaviridae had mutated and jumped from animal to man where it found a new host and a vehicle to spread itself.   Several weeks later the World became aware of the new threat and in the West … we ignored it.  Maybe we still remember the SARS epidemic which was heralded as a potential global catastrophe but was contained in the Far East and fizzled out.  So, maybe we assumed this SARS-like virus would do the same.

It didn’t.  This mutant was different.  It caused a milder illness and unwitting victims were infectious before they were symptomatic.  And most got better on their own, so they spread the mutant to many other people.  Combine that mutant behaviour with the winter (when infectious diseases spread more easily because we spend more time together indoors), Chinese New Year and global air travel … and we have the perfect recipe for cooking up a global pandemic of a new infectious disease.  But we didn’t know that at the time and we carried on as normal, blissfully unaware of the catastrophe that was unfolding.

By February 2020 it became apparent that the mutant had escaped containment in China and was wreaking havoc in other countries – with Italy high on the casualty list.  We watched in horror at the scenes on television of Italian hospitals overwhelmed with severely ill people fighting for breath as the virus attacked their lungs.  The death toll rose sharply but we still went on our ski holidays and assumed that the English Channel and our Quarantine Policy would protect us.

They didn’t.  This mutant was different.  We now know that it had already silently gained access into the UK and was growing and spreading.  The first COVID-19 death reported in the UK was in early March 2020 and only then did we sit up and start to take notice.  This was getting too close to home.

But it was too late.  The mathematics of how epidemics spread was worked out 100 years ago, not long after the 1918 pandemic of Spanish Flu that killed tens of millions of people before it burned itself out.  An epidemic is like cancer.  By the time it is obvious it is already far advanced because the growth is not linear – it is exponential.
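A few lines of arithmetic show why exponential growth catches us out.  The starting number and the 3-day doubling time are assumptions chosen to make the point, not measured values.

```python
def cases_after(days, initial_cases, doubling_time_days):
    """Number of cases after a period of unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)


# Assumed for illustration: 100 cases today, doubling every 3 days.
for day in (0, 6, 12, 18, 24, 30):
    print(f"day {day:2d}: {cases_after(day, 100, 3):>9,.0f} cases")
```

For the first fortnight the numbers still look small.  The last few doublings do nearly all the damage, which is why waiting until the problem is obvious is waiting too long.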

As a systems engineer I am used to building simulation models to reveal the complex and counter-intuitive behaviour of nonlinear systems using the methods first developed by Jay W. Forrester in the 1950’s.  And when I looked up the equations that describe epidemics (on Wikipedia) I saw that I could build a system dynamics model of a COVID-19 epidemic using no more than an Excel spreadsheet.

So I did.  And I got a nasty surprise.  Using the data emerging from China on the nature of the spread of the mutant virus, the incidence of severe illness and the mortality rate … my simple Excel model predicted that, if COVID-19 was left to run its natural course in the UK, then it would burn itself out over several months but the human cost would be 500,000 deaths and the NHS would be completely overwhelmed with a “tsunami of sick”.  And I could be one of them!  The fact that there is no treatment and no vaccine for this novel threat excluded those options.  My basic Excel model confirmed that the only effective option to mitigate this imminent catastrophe was to limit the spread of the virus through social engineering i.e. an immediate and drastic lock-down.  Everyone who was not essential to maintaining core services should “Stay at home, Protect the NHS and Save lives“.  That would become the mantra.  And others were already saying this – epidemiologists whose careers are spent planning for this sort of eventuality.  But despite all this there still seemed to be little sense of urgency, perhaps because their super-sophisticated models predicted that the peak of the UK epidemic would be in mid-June so there was time to prepare.  My basic model predicted that the peak would be in mid-April, in about 4 weeks, and that it was already too late to prevent about 50,000 deaths.

It turns out I was right.  That is exactly what happened.  By mid-March 2020 London was already seeing an exponential rise in hospital admissions, intensive care admissions and deaths and suddenly the UK woke up and panicked.  By that time I had enlisted the help of a trusted colleague who is a public health doctor and who had studied epidemiology, and together we wrote up and published the emerging story as we saw it:

An Acute Hospital Demand Surge Planning Model for the COVID-19 Epidemic using Stock-and-Flow Simulation in Excel: Part 1. Journal of Improvement Science 2020: 68; 1-20.  The link to download the full paper is here.

I also shared the draft paper with another trusted friend and colleague who works for my local clinical commissioning group (CCG) and I asked “Has the CCG a sense of the speed and magnitude of what is about to happen and has it prepared for the tsunami of sick that primary care will need to see?”

What then ensued was an almost miraculous emergence of a coordinated and committed team of health care professionals and NHS managers with a single, crystal clear goal:  To design, build and deliver a high-flow, drive-through community-based facility to safely see-and-assess hundreds of patients per day with suspected COVID-19 who were too sick/worried to be managed on the phone, but not sick enough to go to A&E.  This was not a Nightingale Ward – that was a parallel, more public and much more expensive endeavour designed as a spillover for overwhelmed acute hospitals.  Our purpose was to help to prevent that and the time scale was short.  We had three weeks to do it because Easter weekend was the predicted peak of the COVID-19 surge if the national lock-down policy worked as hoped.  No one really had an accurate estimate of how effective the lock-down would be and how high the peak of the tsunami of sick would rise as it crashed into the NHS.  So, we planned for the worst and hoped for the best.  The Covid Referral Centre (CRC) was an insurance policy and we deliberately over-engineered it to use every scrap of space we had been offered in a small car park on the south side of the NEC site.

The CRC needed to open by Sunday 12th April 2020 and we were ready, but the actual opening was delayed by NHS bureaucracy and politics.  It did eventually open on 22nd April 2020, just four weeks after we started, and it worked exactly as designed.  The demand was, fortunately, less than our worst case scenario; partly because we had missed the peak by 10 days and we opened the gates to a falling tide; and partly because the social distancing policy had been more effective than hoped; and partly because it takes time for risk-averse doctors to develop trust and to change their ingrained patterns of working.  A drive-thru COVID-19 see-and-treat facility? That was innovative and untested!!

The CRC expected to see a falling demand as the first wave of COVID-19 washed over, and that is exactly what happened.  So, as soon as that prediction was confirmed, the CRC was progressively repurposed to provide other much needed services such as drive-thru blood tests, drive-thru urgent care, and even outpatient clinics in the indoor part of the facility.

The CRC closed its gates to suspected COVID-19 patients on 31st July 2020, as planned and as guided by the simple Excel computer model.

This is health care systems engineering in action.

And the simple Excel model has been continuously re-calibrated as fresh evidence has emerged.  The latest version predicts that a second peak of COVID-19 (that is potentially worse than the first) will happen in late summer or autumn if social distancing is relaxed too far (see below).

But we don’t know what “too far” looks like in practical terms.  Oh, and a second wave could kick off just when we expect the annual wave of seasonal influenza to arrive.  Or will it?  Maybe the effect of social distancing for COVID-19 in other countries will suppress the spread of seasonal flu as well?  We don’t know that either but the data on the incidence of flu from Australia certainly supports that hypothesis.

We may need a bit more health care systems engineering in the coming months. We shall see.

Oh, and if we are complacent enough to think a second wave could never happen in the UK … here is what is happening in Australia.

A New Decade of Hope

At the end of the decade it is the time to reflect on what has happened in the past before planning for the future.  As always, the hottest topic in health care is the status of the emergency care services, and we have the data – it is public.

This shows the last 9 years of aggregate, monthly data for Scotland (red), England (blue), Wales (teal) and N.Ireland (orange).  It does not take a data scientist and a supercomputer to interpret – there is a progressive, system-wide deterioration year-on-year.  The winter dips are obvious and the worst of these affect all four countries indicating a systemic cause … the severity of the winter weather/illness cycle, i.e. the Flu Season.

What this chart also says is that all the effort and money being expended in winter planning is not working well enough – and the nagging question is “Why not?”

Many claim that it is the predicted demographic “time bomb” … but if it is predicted then how come it has not been mitigated?

Many claim that it is a growing funding gap … but most NHS funding is spent on staff, and training nurses, doctors and allied health professionals (AHPs) takes time.  Again, a predicted eventuality that has not been mitigated.

This looming crisis of a lack of health care workers is a global health challenge … and is described by Mark Britnell in “Human – Solving the global workforce crisis in healthcare”.

Mark was the CEO of University Hospitals Birmingham from 2000 and has worked for KPMG since 2009 in a global health role so is well placed to present a strategic overview.


But, health care workers deliver care to patients – one at a time.  They are not responsible for designing the system of health care delivery; or ensuring all the pieces of that vast jigsaw link up and work in a synchronised way; or for the long term planning needed to mitigate the predictable effects of demographic drift and technology advances.

Who is responsible for that challenge and are they adequately trained to do it?

The evidence would appear to suggest that there is a gap that either no one has noticed or that no one is prepared to discuss.  An Undiscussable?


The global gap in the healthcare workforce is predicted to be about 20% by 2030.  That is a big gap to fill because with the NHS workforce of 1.3 million people – that implies training 260,000 new staff of all types in the next 10 years, in addition to replacing those that leave.

Assuming the processes and productivity stay as they are now.
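The arithmetic behind that figure can be checked with a short Python sketch.  The function name is hypothetical, and spreading the training evenly over the ten years is an assumption made only to give a feel for the annual scale; it ignores the replacement of leavers, just as the headline figure does.

```python
def training_requirement(workforce, gap_fraction, years):
    """Return (total_extra_staff, extra_staff_per_year) for a given shortfall.

    Assumes processes and productivity stay as they are now, and that the
    extra staff are trained evenly across the period (an illustrative
    assumption, not a forecast).
    """
    total = round(workforce * gap_fraction)
    return total, total / years


total, per_year = training_requirement(1_300_000, 0.20, 10)
print(f"{total:,} extra staff, about {per_year:,.0f} per year")
```

Changing the gap fraction or the period shows how sensitive the training burden is to the starting assumptions.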

So, perhaps there is a parallel approach, one that works more quickly and at a lower cost.


When current health care processes are examined through a flow engineering lens they are found to be poorly designed. They are both ineffective (do not reliably deliver the intended outcome) and inefficient (waste a lot of resources in delivering any outcome).  Further examination reveals that the processes have never been designed … they have evolved.

And just because something is described as current practice does not prove that it is good design.

An expected symptom of a poorly designed process is a combination of chronic queues, delays, chaos, reactive fire-fighting and burnout.  And the assumed cause is often lack of resources because when extra resource is added the queues and chaos subside, for a while.

But, if the unintentional poor design of the process is addressed then a sequence of surprising things can happen. The chaos evaporates immediately without any extra resources. A feeling of calm is restored and the disruptive fire-fighting stops. The health care workers are able to focus on what they do best and pride-in-work is restored. Patient experience improves and staff feel that feedback and become more motivated. The complaining abates, sickness and absence fall, funded-but-hard-to-recruit-to posts are refilled and there are more hands on the handle of a more efficient/effective/productive pump.  The chronic queues and delays start to melt away – as if by magic.

And if that all sounds totally impossible then here are a couple of recent, real-world case studies written by different teams in different cities in different parts of the UK.  One from cancer care and one from complex diabetic care.

They confirm that this chaos-to-calm transformation is possible.

So, is there a common thread that links these two examples?

Yes, there is, and once again the spotlight is shone on the Undiscussable Gap … the fact that the NHS does not appear to have the embedded capability to redesign itself.

There is a hidden workforce gap that none of the existing programmes will address – because it is not a lack of health care workers – it is a lack of appropriately trained health care manager-designers.


The Undiscussable Elephant Is In The Room … the Undiscussable Emperor Has No Clothes.

And if history teaches us anything, Necessity is the Mother of Innovation and the chart at the top of the page shows starkly that there is a Growing Urgent Necessity.

And if two embedded teams can learn this magic trick of flipping chaos into calm at no cost, then perhaps others can too?

Welcome to the New Decade of Hope and Health Care Systems Engineering.

Co-Diagnosis, Co-Design and Co-Delivery

The thing that gives me the biggest buzz when it comes to improvement is to see a team share their story of what they have learned-by-doing; and what they have delivered that improves their quality of life and the quality of their patients’ experience.

And while the principles that underpin these transformations are generic, each story is unique because no two improvement challenges are exactly the same and no two teams are exactly the same.

The improvement process is not a standardised production line.  It is a much more organic and adaptive experience and that requires calm, competent, consistent, compassionate and courageous facilitation.

So when I see a team share their story of what they have done and learned then I know that behind the scenes there will have been someone providing that essential ingredient.

This week a perfect example of a story like this was shared.

It is about the whole team who run the Diabetic Complex Cases Clinic at Guy’s and St. Thomas’ NHS Trust in London.  Everyone involved in the patient care was involved.  It tells the story of how they saw what might be possible and how they stepped up to the challenge of learning to apply the same principles in their world.  And it tells their story of what they diagnosed, what they designed and what they delivered.

The facilitation and support was provided by Ellen Pirie who works for the Health Innovation Network (HIN) in South London and who is a Level 2 Health Care Systems Engineer.

And the link to the GSTT Diabetic Complex Clinic Team story is here.

Restoring Pride-in-Work

In 1986, Dr Don Berwick from Boston attended a 4-day seminar run by Dr W. Edwards Deming in Washington.  Dr Berwick was a 40 year old paediatrician who was also interested in health care management and improving quality and productivity.  Dr Deming was an 86 year old engineer and statistician who, when he was in his 40’s, helped the US to improve the quality and productivity of the industrial processes supporting the US and Allies in WWII.

Don Berwick describes attending the seminar as an emotionally challenging, life-changing experience when he realised that his well-intended attempts to improve quality by inspection-and-correction were a counterproductive, abusive approach that led to fear, demotivation and erosion of pride-in-work.  His blinding new clarity of insight led directly to the founding of the Institute for Healthcare Improvement in the USA in the early 1990’s.

One of the tenets of Dr Deming’s theories is that the ingrained beliefs and behaviours that erode pride-in-work also lead to the very outcomes that management do not want – namely conflict between managers and workers and economic failure.

So, an explicit focus on improving pride-in-work as an early objective in any improvement exercise makes very good economic sense, and is a sign of wise leadership and competent management.


Last week a case study was published that illustrates exactly that principle in action.  The important message in the title is “restore the calm”.

One of the most demotivating aspects of health care that many complain about is the stress caused by a chaotic environment, chronic crisis and perpetual firefighting.  So, anything that can restore calm will, in principle, improve motivation – and that is good for staff, patients and organisations.

The case study describes, in detail, how calm was restored in a chronically chaotic chemotherapy day unit … on Weds, June 19th 2019 … in one day and at no cost!

To say that the chemotherapy nurses were surprised and delighted is an understatement.  They were amazed to see that they could treat the same number of patients, with the same number of staff, in the same space and without the stress and chaos.  And they had time to keep up with the paperwork; and they had time for lunch; and they finished work 2 hours earlier than previously!

Such a thing was not possible surely? But here they were experiencing it.  And their patients noticed the flip from chaos-to-strangely-calm too.

The impact of the one-day-test was so profound that the nurses voted to adopt the design change the following week.  And they did.  And the restored calm has been sustained.


What happened next?

The chemotherapy nurses were able to catch up with their time-owing that had accumulated from the historical late finishes.  And the problem of high staff turnover and difficulty in recruitment evaporated.  Highly-trained chemotherapy nurses who had left because of the stressful chaos now want to come back.  Pride-in-work has been re-established.  There are no losers.  It is a win-win-win result for staff, patients and organisations.


So, how was this “miracle” achieved?

Well, first of all it was not a miracle.  The flip from chaos-to-calm was predicted to happen.  In fact, that was the primary objective of the design change.

So, how was this design change achieved?

By establishing the diagnosis first – the primary cause of the chaos – and it was not what the team believed it was.  And that is the reason they did not believe the design change would work; and that is the reason they were so surprised when it did.

So, how was the diagnosis achieved?

By using an advanced systems engineering technique called Complex Physical System (CPS) modelling.  That was the game changer!  All the basic quality improvement techniques had been tried and had not worked – process mapping, direct observation, control charts, respectful conversations, brainstorming, and so on.  The system structure was too complicated. The system behaviour was too complex (i.e. chaotic).

What CPS revealed was that the primary cause of the chaotic behaviour was the work scheduling policy.  And with that clarity of focus, the team were able to re-design the policy themselves using a simple paper-and-pen technique.  That is why it cost nothing to change.

So, why hadn’t they been able to do this before?

Because systems engineering is not a taught component of the traditional quality improvement offerings.  Healthcare is rather different to manufacturing! As the complexity of the health care system increases we need to learn the more advanced tools that are designed for this purpose.

What is the same is the principle of restoring pride-in-work and that is what Dr Berwick learned from Dr Deming in 1986, and what we saw happen on June 19th, 2019.

To read the story of how it was done click here.

Carveoutosis Multiforme Fulminans

This is the name given to an endemic, chronic, systemic, design disease that afflicts the whole NHS that very few have heard of, and even fewer understand.

This week marked two milestones in the public exposure of this elusive but eminently treatable health care system design illness that causes queues, delays, overwork, chaos, stress and risk for staff and patients alike.

The first was breaking news from the team in Swansea led by Chris Jones.

They had been grappling with the wicked problem of chronic queues, delays, chaos, stress, high staff turnover, and escalating costs in their Chemotherapy Day Unit (CDU) at the Singleton Hospital.

The breakthrough came earlier in the year when we used the innovative eleGANTT® system to measure and visualise the CDU chaos in real-time.

This rich set of data enabled us, for the first time, to apply a powerful systems engineering technique called counterfactual analysis which revealed the primary cause of the chaos – the elusive and counter-intuitive design disease carveoutosis multiforme fulminans.

And this diagnosis implied that the chaos could be calmed quickly and at no cost.

But that news fell on slightly deaf ears because, not surprisingly, the CDU team were highly sceptical that such a thing was possible.

So, to convince them we needed to demonstrate the adverse effect of carveoutosis in a way that was easy to see.  And to do that we used some advanced technology: dice and tiddly winks.

The reaction of the CDU nurses was amazing.  As soon as they ‘saw’ it they clicked and immediately grasped how to apply it in their world.  They designed the change they needed to make in a matter of minutes.
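For readers without dice and tiddly winks to hand, the general effect of carve-out can be sketched in a few lines of Python.  This is not the game used with the CDU team and it says nothing about their actual scheduling policy; the rules here are invented for illustration: each stream of work receives a die roll of new patients per day, and the same total number of slots is either ring-fenced per stream (carved out) or shared by all (pooled).

```python
import random


def simulate_backlog(days=1000, streams=2, slots_per_stream=4, seed=1):
    """Compare the average daily backlog with carved-out and pooled capacity.

    Hypothetical toy rules: every stream gets a die roll (1-6) of new
    patients each day and total capacity is identical in both designs.
    """
    rng = random.Random(seed)
    carved = [0] * streams      # one queue per ring-fenced stream
    pooled = 0                  # one shared queue
    carved_total = 0
    pooled_total = 0
    for _ in range(days):
        demand = [rng.randint(1, 6) for _ in range(streams)]
        # Carved-out: a stream can only use its own slots, so spare
        # slots in one stream cannot help a busy neighbour.
        carved = [max(0, q + d - slots_per_stream)
                  for q, d in zip(carved, demand)]
        # Pooled: any free slot can serve any waiting patient.
        pooled = max(0, pooled + sum(demand) - streams * slots_per_stream)
        carved_total += sum(carved)
        pooled_total += pooled
    return carved_total / days, pooled_total / days


carved_avg, pooled_avg = simulate_backlog()
print(f"average backlog: carved-out {carved_avg:.2f}, pooled {pooled_avg:.2f}")
```

In this toy the pooled backlog can never exceed the carved-out backlog, because a spare slot in one stream is always available to the other.  The demand, the staff and the slots are the same in both designs; only the allocation policy differs.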


But the proof-of-the-pudding-is-in-the-eating and we arranged a one-day-test-of-change of their anti-carveout design.

The appointed day arrived, Wednesday 19th June.  The CDU nurses implemented their new design (which cost nothing to do).  Within an hour of the day starting they reported that the CDU was strangely calm.   And at the end of the day they reported that it had remained strangely calm all day; and that they had time for lunch; and that they had time to do all their admin as they went; and that they finished on time; and that the patients did not wait for their chemotherapy; and that the patients noticed the chaos-to-calm transformation too.

They treated just the same number of patients as usual with the same staff, in the same space and with the same equipment.  It cost nothing to make the change.

To say that they were surprised is an understatement!  They were so surprised and so delighted that they did not want to go back to the old design – but they had to because it was only a one-day-test-of-change.

So, on Thursday and Friday they reverted back to the carveoutosis design.  And the chaos returned.  That nailed it!  There was a riot!!  The CDU nurses refused to wait until later in the year to implement their new design and they voted unanimously to implement it from the following Monday.  And they did.  And calm was restored.


The second milestone happened on Thursday 11th July when we ran a Health Care Systems Engineering (HCSE) Masterclass on the very same topic … chronic systemic carveoutosis multiforme fulminans.

This time we used the dice and tiddly winks to demonstrate the symptoms, signs and the impact of treatment.  Then we explored the known pathophysiology of this elusive and endemic design disease in much more depth.

This is health care systems engineering in action.

It seems to work.

Leverage Points

One of the most surprising aspects of systems is how some big changes have no observable effect and how some small changes are game-changers. Why is that?

The technical name for this phenomenon is leverage points.

When a nudge is made at a leverage point in a real system the impact is amplified – so a small cause can have a big effect.

And when a big kick is made where there is no leverage point the effort is dissipated. Like flogging a dead horse.

Other names for leverage points are triggers, buttons, catalysts, fuses etc.


The fact that there is a big effect does not imply it is a good effect.

Poking a leverage point can trigger a catastrophe just as it can trigger a celebration. It depends on how it is poked.

Perhaps that is one reason people stay away from them.

But when our health care system performance is in decline, if we do nothing or if we act but stay away from leverage points (i.e. flog the dead horse) then we will deny ourselves the opportunity of improvement.

So, we need a way to (a) identify the leverage points and (b) know how to poke them positively and know how to not poke them into delivering a catastrophe.


Here are a couple of real examples.


The time-series chart above shows the A&E performance of a real acute trust.  Notice the pattern as we read left-to-right; baseline performance is OKish and dips in the winters, and the winter dips get deeper but the baseline performance recovers.  In April 2015 (yellow flag) the system behaviour changes, and it goes into a steady decline with added winter dips.  This is the characteristic pattern of poking a leverage point in the wrong way … and the fact it happened at the start of the financial year suggests that Finance was involved.  Possibly triggered by a cost-improvement programme (CIP) action somewhere else in the system.  Save a bit of money here and create a bigger problem over there. That is how systems work. Not my budget so not my problem.

Here is a different example, again from a real hospital and around the same time.  It starts with a similar pattern of deteriorating performance and there is a clear change in system behaviour in Jan 2015.  But in this case the performance improves and stays improved.  Again, the visible sign of a leverage point being poked but this time in a good way.

In this case I do know what happened.  A contributory cause of the deteriorating performance was correctly diagnosed, the leverage point was identified, a change was designed and piloted, and then implemented and validated.  And it worked as predicted.  It was not a fluke.  It was engineered.


So what is the reason that the first example is much more commonly seen than the second?

That is a very good question … and to answer it we need to explore the decision making process that leads up to these actions because I refuse to believe that anyone intentionally makes decisions that lead to actions that lead to deterioration in health care performance.

And perhaps we can all learn how to poke leverage points in a positive way?

Commissioned Improvement

This recent tweet represents a significant milestone.  It formally recognises and celebrates in public the impact that developing health care systems engineering (HCSE) capability has had on the culture of the organisation.

What is also important is that the HCSE training was not sought and funded by the Trust, it was discovered by chance and funded by their commissioners, the local clinical commissioning group (CCG).


The story starts back in the autumn of 2017 and, by chance, I was chatting with Rob, a friend-of-a-friend, about work. As you do. It turned out that Rob was the CCG Lead for Unscheduled Care and I was describing how HCSE can be applied in any part of any health care system; primary care, secondary care, scheduled, unscheduled, clinical, operational or whatever.  They are all parts of the same system and the techniques and tools of improvement-by-design are generic.  And I described lots of real examples of doing just that and the sustained improvements that had followed.

So he asked “If you were to apply this approach to unscheduled care in a large acute trust how would you do it?”  My immediate reply was “I would start by training the front line teams in the HCSE Level 1 stuff, and the first step is to raise awareness of what is possible.  We do that by demonstrating it in practice because you have to see it and experience it to believe it.”

And so that is what we did.

The CCG commissioned a one-year HCSE Level 1 programme for four teams at University Hospitals of North Midlands (UHNM) and we started in January 2018 with some One Day Flow Workshops.

The intended emotional effect of a Flow Workshop is surprise and delight.  The challenge for the day is to start with a simulated, but very realistic, one-stop outpatient clinic which is chaotic and stressful for everyone.  And with no prior training the delegates transform it into a calm and enjoyable experience using the HCSE approach.  It is called emergent learning.  We have run dozens of these workshops and it has never failed.

After directly experiencing HCSE working in practice the teams that stepped up to the challenge were from ED, Transformation, Ambulatory Emergency Care and Outpatients.


The key to growing HCSE capability is to assemble small teams, called micro-system design teams (MSDTs) and to focus on causes that fall inside their circle of control.

The MSDT sessions need to be regular, short, and facilitated by an experienced HCSE who has seen it, done it and can teach it.

In UHNM, the Transformation team divided themselves between the front-line teams and they learned HCSE together.  Here’s a picture of the ED team … left to right we have Alex, Mark and Julie (ED consultants) then Steve and Janina (Transformation).  The essential tools are a big table, paper, pens, notebooks, coffee and a laptop/projector.

The purpose of each session is empirical learning-by-doing i.e. using a real improvement challenge to learn and practice the method so that before the end of the programme the team can confidently “fly” solo.

That is the key to continued growth and sustained improvement.  The HCSE capability needs to become embedded.

It is good fun and immensely rewarding to see the “ah ha” moments and improvements happen as the needle on the emotometer moves from “Can’t Do” to “Can Do”.

Metamorphosis is re-arranging what you already have in a way that works better.


The tweet is objective evidence that demonstrates the HCSE programme delivers as designed.  It is fit-for-purpose.  It is called validation.

The other objective evidence of effectiveness comes from the learning-by-doing projects themselves.  And for an individual to gain a coveted HCSE Level 1 Certificate of Competency requires writing up to a publishable quality and sharing the story. Warts-and-all.

To read the full story just click here.

And what started this was the CCG who had the strategic vision, looked outside themselves for innovative approaches, and demonstrated the courage to take a risk.

Commissioned Improvement.

Measuring Chaos

One of the big hurdles in health care improvement is that most of the low hanging fruit have been harvested.

These are the small improvement projects that can be done quickly because as soon as the issue is made visible to the stakeholders the cause is obvious and the solution is too.

This is where kaizen works well.

The problem is that many health care issues are rather more difficult because the process that needs improving is complicated (i.e. it has lots of interacting parts) and usually exhibits rather complex behaviour (e.g. chaotic).

One good example of this is a one stop multidisciplinary clinic.

These are widely used in healthcare and for good reason.  It is better for a patient with a complex illness, such as diabetes, to be able to access whatever specialist assessment and advice they need when they need it … i.e. in an outpatient clinic.

The multi-disciplinary team (MDT) is more effective and efficient when it can problem-solve collaboratively.

The problem is that the scheduling design of a one stop clinic is rather trickier than a traditional simple-but-slow-and-sequential new-review-refer design.

A one stop clinic that has not been well-designed feels chaotic and stressful for both staff and patients and usually exhibits the paradoxical behaviour of waiting patients and waiting staff.


So what do we need to do?

We need to map and measure the process and diagnose the root cause of the chaos, and then treat it.  A quick kaizen exercise should do the trick. Yes?

But how do we map and measure the chaotic behaviour of lots of specialists buzzing around like blue-***** flies trying to fix the emergent clinical and operational problems on the hoof?  This is not the linear, deterministic, predictable, standardised machine-dominated production line environment where kaizen evolved.

One approach might be to get the staff to audit what they are doing as they do it. But that adds extra work, usually makes the chaos worse, fuels frustration and results in a very patchy set of data.

Another approach is to employ a small army of observers who record what happens, as it happens.  This is possible and it works, but to be able to do this well requires a lot of experience of the process being observed.  And even if that is achieved the next barrier is the onerous task of transcribing and analysing the ocean of harvested data.  And then the challenge of feeding back the results much later … i.e. when the sands have shifted.


So we need a different approach … one that is able to capture the fine detail of a complex process in real-time, with minimal impact on the process itself, and that can process and present the wealth of data in a visual easy-to-assess format, and in real-time too.

This is a really tough design challenge …
… and it has just been solved.

Here are two recent case studies that describe how it was done using a robust systems engineering method.

System Dynamics

On Thursday we had a very enjoyable and educational day.  I say “we” because there were eleven of us learning together.

There was Declan, Chris, Lesley, Imran, Phil, Pete, Mike, Kate, Samar and Ellen and me (behind the camera).  Some are holding their long-overdue HCSE Level-1 Certificates and Badges that were awarded just before the photo was taken.

The theme for the day was System Dynamics which is a tried-and-tested approach for developing a deep understanding of how a complex adaptive system (CAS) actually works.  A health care system is a complex adaptive system.

The originator of system dynamics is Jay Wright Forrester who developed it around the end of WW2 (i.e. about 80 years ago) and who later moved to MIT.  Peter Senge, author of The Fifth Discipline, was part of the same group, as was Donella Meadows, who wrote Limits to Growth.  Their dream was much bigger – global health – i.e. the whole planet not just the human passengers!  It is still a hot topic [pun intended].


The purpose of the day was to introduce the team of apprentice health care system engineers (HCSEs) to the principles of system dynamics and to some of its amazing visualisation and prediction techniques and tools.

The tangible output we wanted was an Excel-based simulation model that we could use to solve a notoriously persistent health care service management problem …

How to plan the number of new and review appointment slots needed to deliver a safe, efficient, effective and affordable chronic disease service?

So, with our purpose in mind, the problem clearly stated, and a blank design canvas we got stuck in; and we used the HCSE improvement-by-design framework that everyone was already familiar with.
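To give a flavour of the stock-and-flow arithmetic behind a question like this, here is a minimal Python sketch.  It is not the Excel model we worked on during the day; the function names, the weekly time step and the parameters (a fixed review interval and a fixed fraction of reviews that end in discharge) are all assumptions chosen for illustration.

```python
def steady_state(new_per_week, review_interval_weeks, discharge_fraction):
    """Balance point where new patients in equals discharges out.

    Returns (caseload, review_slots_per_week).
    """
    review_slots = new_per_week / discharge_fraction
    caseload = review_slots * review_interval_weeks
    return caseload, review_slots


def simulate(new_per_week, review_interval_weeks, discharge_fraction, weeks):
    """Step a single 'caseload' stock forward one week at a time.

    Every patient on the caseload is reviewed once per review interval,
    and a fixed fraction of those reviews ends in discharge.
    """
    caseload = 0.0
    review_slots = 0.0
    for _ in range(weeks):
        review_slots = caseload / review_interval_weeks   # reviews due this week
        discharges = review_slots * discharge_fraction    # outflow from the stock
        caseload += new_per_week - discharges             # inflow minus outflow
    return caseload, review_slots


print(steady_state(10, 12, 0.25))
```

With these illustrative numbers (10 new patients a week, a 12-week review interval and a quarter of reviews ending in discharge) the service settles at a caseload of 480 and needs 40 review slots a week, four for every new slot.  The simulation shows how long it takes a new service to climb to that level.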

We made lots of progress, learned lots of cool stuff, and had lots of fun.

We didn’t quite get to the final product but that was OK because it was a very tough design assignment.  We got 80% of the way there though which is pretty good in one day from a standing start.  The last 20% can now be done by the HCSEs themselves.

We were all exhausted at the end.  We had worked hard.  It was a good day.


And I am already looking forward to the next HCSE Masterclass that will be in about six weeks’ time.  This one will address another chronic, endemic, systemic health care system “disease” called carveoutosis multiforme fulminans.

Warts-and-All

This week saw the publication of a landmark paper – one that will bring hope to many.  A paper that describes the first step of a path forward out of the mess that healthcare seems to be in.  A rational, sensible, practical, learnable and enjoyable path.


This week I also came across an idea that triggered an “ah ha” for me.  The idea is that the most rapid learning happens when we are making mistakes about half of the time.

And when I say ‘making a mistake’ I mean not achieving what we predicted we would achieve because that implies that our understanding of the world is incomplete.  In other words, when the world does not behave as we expect, we have an opportunity to learn and to improve our ability to make more reliable predictions.

And that ability is called wisdom.


When we get what we expect about half the time, and do not get what we expect about the other half of the time, then we have the maximum amount of information that we can use to compare and find the differences.
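One way to make this idea concrete, assuming that “information” is meant in the Shannon sense (which is my reading, not something the original idea spelled out), is the entropy of a yes/no outcome.  It is zero when we always or never get what we expect, and it peaks at one bit when the chance is exactly a half.  The function name in this Python sketch is hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of an outcome that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0   # a certain outcome carries no new information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  information per attempt = {binary_entropy(p):.2f} bits")
```

The curve is symmetric, so getting it right 90% of the time teaches us as little per attempt as getting it wrong 90% of the time.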

Was it what we did? Was it what we did not do? What are the acts and errors of commission and omission? What can we learn from those? What might we do differently next time? What would we expect to happen if we do?


And to explore this terrain we need to see the world as it is … warts and all … and that is the subject of the landmark paper that was published this week.


The context of the paper is improvement of cancer service delivery, and specifically of reducing waiting time from referral to first appointment.  This waiting is a time of extreme anxiety for patients who have suspected cancer.

It is important to remember that most people with suspected cancer do not have it, so most of the work of an urgent suspected cancer (USC) clinic is to reassure and to relieve the fear that the spectre of cancer creates.

So, the sooner that reassurance can happen the better, and for the unlucky minority who are diagnosed with cancer, the sooner they can move on to treatment the better.

The more important paragraph in the abstract is the second one … which states that seeing the system behaviour as it is, warts-and-all,  in near-real-time, allows us to learn to make better decisions of what to do to achieve our intended outcomes. Wiser decisions.

And the reason this is the more important paragraph is because if we can do that for an urgent suspected cancer pathway then we can do that for any pathway.


The paper re-tells the first chapter of an emerging story of hope.  A story of how an innovative and forward-thinking organisation is investing in building embedded capability in health care systems engineering (HCSE), and is now delivering a growing dividend.  Much bigger than the investment on every dimension … better safety, faster delivery, higher quality and more affordability. Win-win-win-win.

The only losers are the “warts” – the naysayers and the cynics who claim it is impossible, or too “wicked”, or too difficult, or too expensive.

Innovative reality trumps cynical rhetoric … and the full abstract and paper can be accessed here.

So, well done to Chris Jones and the whole team in ABMU.

And thank you for keeping the candle of hope alight in these dark, stormy and uncertain times for the NHS.

Congratulations Kate!

This week, it was my great pleasure to award the first Health Care Systems Engineering (HCSE) Level 2 Medal to Dr Kate Silvester, MBA, FRCOphth.

Kate is internationally recognised as an expert in health care improvement and over more than two decades has championed the adoption of improvement methods such as Lean and Quality Improvement in her national roles in the Modernisation Agency and then the NHS Institute for Innovation and Improvement.

Kate originally trained as a doctor and then left the NHS to learn manufacturing systems engineering with Lucas and Airbus.  Kate then brought these very valuable skills back with her into the NHS when she joined the Cancer Services Collaborative.

Kate is co-founder of the Journal of Improvement Science and over the last five years has been highly influential in the development of the Health Care Systems Engineering Programme – the first of its kind in the world that is designed by clinicians for clinicians.

The HCSE Programme is built on the pragmatic See One-Do Some-Teach Many principle of developing competence and confidence through being trained and coached by a more experienced practitioner while doing projects of increasing complexity and training and coaching others who are less experienced.

Competence is based on evidence-of-effectiveness, and Kate has achieved HCSE Level 2 by demonstrating that she can do HCSE and that she can teach and coach others how to do HCSE as well.

To illustrate, here is a recent FHJ paper authored by Kate that shows the HCSE principles applied in practice in a real hospital.  This work was done as part of the Health Foundation’s Flow, Cost and Quality project that Kate led, and recent evidence proves that the improvements have been sustained and have spread.  South Warwickshire NHS Foundation Trust is now one of the top-performing Trusts in the NHS.

More recently, Kate has trained and coached new practitioners in Exeter and North Devon who have delivered improvements and earned their HCSE 1 wings.

Congratulations Kate!

From Push to Pull

One of the most frequent niggles that I hear from patients is the difficulty they have getting an appointment with their general practitioner.  I too have personal experience of the distress caused by the ubiquitous “Phone at 8AM for an Appointment” policy, so in June 2018 when I was approached to help a group of local practices redesign their appointment booking system I said “Yes, please!”


What has emerged is a fascinating, enjoyable and rewarding journey of co-evolution of learning and co-production of an improved design.  The multi-skilled design team (MDT) we pulled together included general practitioners, receptionists and practice managers and my job was to show them how to use the health care systems engineering (HCSE) framework to diagnose, design, decide and deliver what they wanted: A safe, calm, efficient, high quality, value-4-money appointment booking service for their combined list of 50,000 patients.


This week they reached the start of the ‘decide and deliver’ phase.  We have established the diagnosis of why the current booking system is not delivering what we all want (i.e. patients and practices), and we have assembled and verified the essential elements of an improved design.

And the most important outcome for me is that the Primary Care MDT now feel confident and capable enough to decide what to deliver, and how to deliver it, themselves.   That is what I call embedded capability and achieving it is always an emotional roller coaster ride that we call The Nerve Curve.

What we are dealing with here is called a complex adaptive system (CAS) which has two main components: Processes and People.  Both are complicated and behave in complex ways.  Both will adapt and co-evolve over time.  The processes are the result of the policies that the people produce.  The policies are the result of the experiences that the people have and the explanations that they create to make intuitive sense of them.

But, complex systems often behave in counter-intuitive ways, so our intuition can actually lead us to make unwise decisions that unintentionally perpetuate the problem we are trying to solve.  The name given to this is a wicked problem.

A health care systems engineer needs to be able to demonstrate where these hidden intuitive traps lurk, and to explain what causes them and how to avoid them.  That is the reason the diagnosis and design phase is always a bit of a bumpy ride – emotionally – our Inner Chimp does not like to be challenged!  We all resist change.  Fear of the unknown is hard-wired into us by millions of years of evolution.

But we know when we are making progress because the “ah ha” moments signal a slight shift of perception and a sudden new clarity of insight.  The cognitive fog clears a bit and some more of the unfamiliar terrain ahead comes into view.  We are learning.

The Primary Care MDT have experienced many of these penny-drop moments over the last six months and unfortunately there is not space here to describe them all, but I can share one pivotal example.


A common symptom of a poorly designed process is a chronically chaotic queue.

[NB. In medicine the term chronic means “long standing”.  The opposite term is acute which means “recent onset”].

Many assume, intuitively, that the cause of a chronically chaotic queue is lack of capacity; hence the incessant calls for ‘more capacity’.  And it appears that we have learned this reflex response by observing the effect of adding capacity – which is that the queue and chaos abate (for a while).  So that proves that lack of capacity was the cause. Yes?

Well actually it doesn’t.  Proving causality requires a bit more work.  And to illustrate this “temporal association does not prove causality” trap I invite you to consider this scenario.

I have a headache => I take a paracetamol => my headache goes away => so the cause of my headache was lack of paracetamol. Yes?

Errr .. No!

There are many contributory causes of chronically chaotic queues and lack of capacity is not one of them because the queue is chronic.  What actually happens is that something else triggers the onset of chaos which then consumes the very resource we require to avoid the chaos.  And once we slip into this trap we cannot escape!  The chaos-perpetuating behaviour we observe is called fire-fighting and the necessary resource it consumes is called resilience.


Six months ago, the Primary Care MDT believed that the cause of their chronic appointment booking chaos was a mismatch between demand and capacity – i.e. too much patient demand for the appointment capacity available.  So, there was a very reasonable resistance to the idea of making the appointment booking process easier for patients – they justifiably feared being overwhelmed by a tsunami of unmet need!

Six months on, the Primary Care MDT understand what actually causes chronic queues and that awareness has been achieved by a step-by-step process of explanation and experimentation in the relative safety of the weekly design sessions.

We played simulation games – lots of them.

One particularly memorable “Ah Ha!” moment happened when we played the Carveout Game which is done using dice, tiddly-winks, paper and coloured-pens.  No computers.  No statistics.  No queue theory gobbledygook.  No smoke-and-mirrors.  No magic.

What the Carveout Game demonstrates, practically and visually, is that an easy way to trigger the transition from calm-efficiency to chaotic-ineffectiveness is … to impose a carveout policy on a system that has been designed to achieve optimum efficiency by using averages.  Boom!  We slip on the twin banana skins of the Flaw-of-Averages and Sub-Optimisation, slide off the performance cliff, and career down the rocky slope of Chronic Chaos into the Depths of Despair – from which we cannot then escape.
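For readers who would rather see the effect on a screen than with dice and tiddly-winks, here is a minimal sketch of the carveout effect.  It is not the Carveout Game itself: the two demand streams, the one-die-per-stream demand, the capacity of eight slots a day and the function name are all assumptions chosen purely for illustration.  The only thing that changes between the two runs is the policy: either the eight slots are shared by both streams, or four are reserved (carved out) for each.

```python
import random


def average_waiting(days, carveout, seed=1):
    """Average number of requests still waiting at the end of each day.

    Illustrative assumptions (not taken from the Carveout Game itself):
    - two demand streams, each generating one die roll of requests per day
    - 8 slots of capacity per day in total, against an average demand of 7
    - with carveout=True, 4 slots are reserved for each stream and an
      unused slot cannot be lent to the other stream
    """
    rng = random.Random(seed)  # same seed gives the same demand for both policies
    queue_a = queue_b = pooled_queue = 0
    total_waiting = 0
    for _ in range(days):
        demand_a = rng.randint(1, 6)
        demand_b = rng.randint(1, 6)
        if carveout:
            # Each stream can only use its own reserved slots.
            queue_a = max(0, queue_a + demand_a - 4)
            queue_b = max(0, queue_b + demand_b - 4)
            total_waiting += queue_a + queue_b
        else:
            # All 8 slots serve whoever is waiting.
            pooled_queue = max(0, pooled_queue + demand_a + demand_b - 8)
            total_waiting += pooled_queue
    return total_waiting / days


if __name__ == "__main__":
    print("shared slots :", average_waiting(1000, carveout=False))
    print("carved out   :", average_waiting(1000, carveout=True))
```

The demand and the total capacity are identical in both runs, so any extra waiting in the second run is caused by the carveout policy and not by a lack of capacity.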

This visual demonstration was a cognitive turning point for the MDT.  They now believed that there is a rational science to improvement and from there we were on the step-by-step climb to building the necessary embedded capability.


It now felt like the team were pulling what they needed to know.  I was no longer pushing.  We had flipped from push-to-pull.  That is called the tipping point.

And that is how health care systems engineering (HCSE) works.


Health care is a complex adaptive system, and what a health care systems engineer actually “designs” is a context-sensitive  incubator that nurtures the seeds of innovation that already exist in the system and encourages them to germinate, grow and become strong enough to establish themselves.

That is called “embedded improvement-by-design capability”.

And each incubator needs to be different – because each system is different.  One-solution-fits-all-problems does not work here just as it does not in medicine.  Each patient is both similar and unique.


Just as in medicine, first we need to diagnose the actual, specific cause;  second we need to design some effective solutions; third we need to decide which design to implement and fourth we need to deliver it.

This how-to-do-it framework feels counter-intuitive.  If it was obvious we would already be doing it.  But the good news is that the evidence proves that it works and that anyone can learn how to do HCSE.

Spring the Trap

[Beeeeeep] It was time for the weekly coaching chat.  Bob, a seasoned practitioner of flow science, dialled into the teleconference with Lesley.

<Bob> Good afternoon Lesley, can I suggest a topic today?

<Lesley> Hi Bob. That would be great, and I am sure you have a good reason for suggesting it.

<Bob> I would like to explore the concept of time-traps again because it is something that many find confusing. Which is a shame because it is often the key to delivering surprisingly dramatic and rapid improvements; at no cost.

<Lesley> Well doing exactly that is what everyone seems to be clamouring for so it sounds like a good topic to me.  I confess that I am still not confident to teach others about time-traps.

<Bob> OK. Let us start there. Can you describe what happens when you try to teach it?

<Lesley> Well, it seems to be when I say that the essence of a time-trap is that the lead time and the flow are independent.  For example, the lead time stays the same even though the flow is changing.  That really seems to confuse people; and me too if I am brutally honest.

<Bob> OK.  Can you share the example that you use?

<Lesley> Well it depends on who I am talking to.  I prefer to use an example that they are familiar with.  If it is a doctor I might use the example of the ward round.  If it is a manager I might use the example of emails or meetings.

<Bob> Assume I am a doctor then – an urgent care physician.

<Lesley> OK.  Let us take it that I have done the 4N Chart and the top niggle is ‘Frustration because the post-take ward round takes so long that it delays the discharge of patients who then often have to stay an extra night which then fills up the unit with waiting patients and we get blamed for blocking flow from A&E and causing A&E breaches’.

<Bob> That sounds like a good example. What is the time-trap in that design?

<Lesley> The  post-take ward round.

<Bob> And what justification is usually offered for using that design?

<Lesley> That it is a more efficient use of the expensive doctor’s time if the whole team congregate once a day and work through all the patients admitted over the previous 24 hours.  They review the presentation, results of tests, diagnosis, management plans, response to treatment, decide the next steps and do the paperwork.

<Bob> And why is that a time-trap design?

<Lesley> Because  it does not matter if one patient is admitted or ten, the average lead time from the perspective of the patient is the same – about one day.

<Bob> Correct. So why is the doctor complaining that there are always lots of patients to see?

<Lesley> Because there are. The emergency short stay ward is usually full by the time the post-take ward round happens.

<Bob> And how do you present the data that shows the lead time is independent of the flow?

<Lesley> I use a Gantt chart, but the problem I find is that there is so much variation and queue jumping it is not blindingly obvious from the Gantt chart that there is a time-trap. There is so much else clouding the picture.

<Bob> Is that where the ‘but I do not understand’ conversation starts?

<Lesley> Yes. And that is where I get stuck too.

<Bob> OK.  The issue here is that a Gantt chart is not the ideal visualisation tool when there are lots of crossed-streams, frequently changing priorities, and many other sources of variation.  The Gantt chart gets ‘messy’.   The trick here is to use a Vitals Chart – and you can derive that from the same data you used for the Gantt chart.

<Lesley> You are right about the Gantt chart getting messy. I have seen massive wall-sized Gantt charts that are veritable works-of-art and that have taken hours to create; with everyone standing looking at them and saying ‘Wow! That is an impressive piece of work.  So what does it tell us? How does it help?’

<Bob> Yes, I have experienced that too. I think what happens is that those who do the foundation training and discover the Gantt chart then try to use it to solve every flow problem – and in their enthusiasm they discount any warning advice.  Desperation drives over-inflated expectation which is often the pre-cursor to disappointment, and then disillusionment.  The Nerve Curve again.

<Lesley> But a Vitals Chart is an HCSE level technique and you said that we do not need to put everyone through HCSE training.

<Bob> That is correct. I am advocating an HCSE-in-training using a Vitals Chart to explain the concept of a time-trap so that everyone understands it well enough to see the flaw in the design.

<Lesley> Ah ha!  Yes, I see.  So what is my next step?

<Bob> I will let you answer that.

<Lesley> Um, let me think.

The outcome I want is that everyone understands the concept of a time-trap well enough to feel comfortable with trying a time-trap-free design because they can see the benefits for them.

And to get that depth of understanding I need to design a table-top exercise that starts with a time-trap design and generates raw data that we can use to build both a Gantt chart and the Vitals Chart; so I can point out and explain the characteristic finger-print of a time-trap.

And then we can ‘test’ an alternative time-trap-free design and generate the prognostic Gantt and Vitals Chart and compare with the baseline diagnostic charts to reveal the improvement.

<Bob> That sounds like a good plan to me.  And if you do that, and your team apply it to a real improvement exercise, and you see the improvement and you share the story, then that will earn you a coveted HCSE Certificate of Competency.

<Lesley> Ah ha! Now I understand the reason you suggested this topic!  I am on the case!
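The essence of the time-trap that Lesley describes, a lead time that stays the same whatever the flow, can be sketched in a few lines.  This is only an illustration under stated assumptions: admissions are spread evenly over the 24 hours before a single daily round, the function names are hypothetical, and only the wait for the round is counted (so the figure is about half a day, not the whole admission-to-discharge lead time of about one day mentioned in the conversation).

```python
def mean_wait_hours(patients_per_day, round_hour=24.0):
    """Mean wait (hours) from admission to the once-a-day ward round.

    Assumption for illustration: admissions are evenly spaced over the
    24 hours leading up to the round, which happens at `round_hour`.
    """
    spacing = 24.0 / patients_per_day
    arrivals = [(i + 0.5) * spacing for i in range(patients_per_day)]
    waits = [round_hour - arrival for arrival in arrivals]
    return sum(waits) / len(waits)


def batch_size_at_round(patients_per_day):
    """Everyone admitted since the last round is still waiting to be seen."""
    return patients_per_day


if __name__ == "__main__":
    for flow in (1, 5, 10):
        print(
            f"flow {flow:2d}/day: mean wait {mean_wait_hours(flow):.1f} h, "
            f"patients waiting at the round {batch_size_at_round(flow)}"
        )
```

The wait does not move when the flow changes, which is the finger-print of the time-trap, while the number of patients queued up for the round grows in step with the flow, which is why the ward round feels so long and the ward feels so full.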