BaseLine – HCSE Blog

08/06/2019

Commissioned Improvement

This recent tweet represents a significant milestone. It formally recognises and celebrates in public the impact that developing health care systems engineering (HCSE) capability has had on the culture of the organisation.

What is also important is that the HCSE training was not sought and funded by the Trust, it was discovered by chance and funded by their commissioners, the local clinical commissioning group (CCG).

The story starts back in the autumn of 2017 and, by chance, I was chatting with Rob, a friend-of-a-friend, about work. As you do. It turned out that Rob was the CCG Lead for Unscheduled Care and I was describing how HCSE can be applied in any part of any health care system; primary care, secondary care, scheduled, unscheduled, clinical, operational or whatever. They are all parts of the same system and the techniques and tools of improvement-by-design are generic. And I described lots of real examples of doing just that and the sustained improvements that had followed.

So he asked “If you were to apply this approach to unscheduled care in a large acute trust how would you do it?“. My immediate reply was “I would start by training the front line teams in the HCSE Level 1 stuff, and the first step is to raise awareness of what is possible. We do that by demonstrating it in practice because you have to see it and experience it to believe it.“

And so that is what we did.

The CCG commissioned a one-year HCSE Level 1 programme for four teams at University Hospitals of North Midlands (UHNM) and we started in January 2018 with some One Day Flow Workshops.

The intended emotional effect of a Flow Workshop is surprise and delight. The challenge for the day is to start with a simulated, but very realistic, one-stop outpatient clinic which is chaotic and stressful for everyone. And with no prior training the delegates transform it into a calm and enjoyable experience using the HCSE approach. It is called emergent learning. We have run dozens of these workshops and it has never failed.

After directly experiencing HCSE working in practice the teams that stepped up to the challenge were from ED, Transformation, Ambulatory Emergency Care and Outpatients.

The key to growing HCSE capability is to assemble small teams, called micro-system design teams (MSDTs) and to focus on causes that fall inside their circle of control.

The MSDT sessions need to be regular, short, and facilitated by an experienced HCSE who has seen it, done it and can teach it.

In UHNM, the Transformation team divided themselves between the front-line teams and they learned HCSE together. Here’s a picture of the ED team … left to right we have Alex, Mark and Julie (ED consultants) then Steve and Janina (Transformation). The essential tools are a big table, paper, pens, notebooks, coffee and a laptop/projector.

The purpose of each session is empirical learning-by-doing i.e. using a real improvement challenge to learn and practice the method so that before the end of the programme the team can confidently “fly” solo.

That is the key to continued growth and sustained improvement. The HCSE capability needs to become embedded.

It is good fun and immensely rewarding to see the “ah ha” moments and improvements happen as the needle on the emotometer moves from “Can’t Do” to “Can Do”.

Metamorphosis is re-arranging what you already have in a way that works better.

The tweet is objective evidence that demonstrates the HCSE programme delivers as designed. It is fit-for-purpose. It is called validation.

The other objective evidence of effectiveness comes from the learning-by-doing projects themselves. And for an individual to gain a coveted HCSE Level 1 Certificate of Competency requires writing up to a publishable quality and sharing the story. Warts-and-all.

To read the full story of just click here

And what started this was the CCG who had the strategic vision, looked outside themselves for innovative approaches, and demonstrated the courage to take a risk.

Commissioned Improvement.

25/05/2019

System Dynamics

On Thursday we had a very enjoyable and educational day. I say “we” because there were eleven of us learning together.

There was Declan, Chris, Lesley, Imran, Phil, Pete, Mike, Kate, Samar and Ellen and me (behind the camera). Some are holding their long-overdue HCSE Level-1 Certificates and Badges that were awarded just before the photo was taken.

The theme for the day was System Dynamics which is a tried-and-tested approach for developing a deep understanding of how a complex adaptive system (CAS) actually works. A health care system is a complex adaptive system.

The originator of system dynamics is Jay Wright Forrester who developed it around the end of WW2 (i.e. about 80 years ago) and who later moved to MIT. Peter Senge, author of The Fifth Discipline was part of the same group as was Donella Meadows who wrote Limits to Growth. Their dream was much bigger – global health – i.e. the whole planet not just the human passengers! It is still a hot topic [pun intended].

The purpose of the day was to introduce the team of apprentice health care system engineers (HCSEs) to the principles of system dynamics and to some of its amazing visualisation and prediction techniques and tools.

The tangible output we wanted was an Excel-based simulation model that we could use to solve a notoriously persistent health care service management problem …

“How to plan the number of new and review appointment slots needed to deliver a safe, efficient, effective and affordable chronic disease service?“

So, with our purpose in mind, the problem clearly stated, and a blank design canvas we got stuck in; and we used the HCSE improvement-by-design framework that everyone was already familiar with.

We made lots of progress, learned lots of cool stuff, and had lots of fun.

We didn’t quite get to the final product but that was OK because it was a very tough design assignment. We got 80% of the way there though which is pretty good in one day from a standing start. The last 20% can now be done by the HCSEs themselves.

We were all exhausted at the end. We had worked hard. It was a good day.

And I am already looking forward to the next HCSE Masterclass that will be in about six weeks time. This one will address another chronic, endemic, systemic health care system “disease” called carveoutosis multiforme fulminans.

20/05/2018

Making NHS Data Count

The debate about how to sensibly report NHS metrics has been raging for decades.

So I am delighted to share the news that NHS Improvement have finally come out and openly challenged the dogma that two-point comparisons and red-amber-green (RAG) charts are valid methods for presenting NHS performance data.

Their rather good 147-page guide can be downloaded: HERE

The subject is something called a statistical process control (SPC) chart which sounds a bit scary! The principle is actually quite simple:

Plot data that emerges over time as a picture that tells a story – #plotthedots

The main trust of the guide is learning the ropes of how to interpret these pictures in a meaningful way and to avoid two traps (i.e. errors).

Trap #1 = Over-reacting to random variation.
Trap #2 = Under-reacting to non-random variation.

Both of these errors cause problems, but in different ways.

Over-reacting to random variation

Random variation is a fact of life. No two days in any part of the NHS are the same. Some days are busier/quieter than others.

Plotting the daily-arrivals-in-A&E dots for a trust somewhere in England gives us this picture. (The blue line is the average and the purple histogram shows the distribution of the points around this average.)

Suppose we were to pick any two days at random and compare the number of arrivals on those two days? We could get an answer anywhere between an increase of 80% (250 to 450) or a decrease of 44% (450 to 250).

But if we look at the while picture above we get the impression that, over time:

There is an expected range of random-looking variation between about 270 and 380 that accounts for the vast majority of days.
There are some occasional, exceptional days.
There is the impression that average activity fell by about 10% in around August 2017.

So, our two-point comparison method seriously misleads us – and if we react to the distorted message that a two-point comparison generates then we run the risk of increasing the variation and making the problem worse.

Lesson: #plotthedots

One of the downsides of SPC is the arcane and unfamiliar language that is associated with it … terms like ‘common cause variation‘ and ‘special cause variation‘. Sadly, the authors at NHS Improvement have fallen into this ‘special language’ trap and therefore run the risk of creating a new clique.

The lesson here is that SPC is a specific, simplified application of a more generic method called a system behaviour chart (SBC).

The first SPC chart was designed by Walter Shewhart in 1924 for one purpose and one purpose only – for monitoring the output quality of a manufacturing process in terms of how well the product conformed to the required specification.

In other words: SPC is an output quality audit tool for a manufacturing process.

This has a number of important implications for the design of the SPC tool:

The average is not expected to change over time.
The distribution of the random variation is expected to be bell-shaped.
We need to be alerted to sudden shifts.

Shewhart’s chart was designed to detect early signs of deviation of a well-performing manufacturing process. To detect possible causes that were worth investigating and minimise the adverse effects of over-reacting or under-reacting.

However, for many reasons, the tool we need for measuring the behaviour of healthcare processes needs to be more sophisticated than the venerable SPC chart. Here are three of them:

The average is expected to change over time.
The distribution of the random variation is not expected to be bell-shaped.
We need to be alerted to slow drifts.

Under-Reacting to Non-Random Variation

Small shifts and slow drifts can have big cumulative effects.

Suppose I am a NHS service manager and I have a quarterly performance target to meet, so I have asked my data analyst to prepare a RAG chart to review my weekly data.

The quarterly target I need to stay below is 120 and my weekly RAG chart is set to show green when less than 108 (10% below target) and red when more than 132 (10% above target) because I know there is quite a lot of random week-to-week variation.

On the left is my weekly RAG chart for the first two quarters and I am in-the-green for both quarters (i.e. under target).

Q: Do I need to do anything?

A: The first quarter just showed “greens” and “ambers” so I relaxed and did nothing. There are a few “reds” in the second quarter, but about the same number as the “greens” and lots of “ambers” so it looks like I am about on target. I decide to do nothing again.

At the end of Q3 I’m in big trouble!

The quarterly RAG chart has flipped from Green to Red and I am way over target for the whole quarter. I missed the bus and I’m looking for a new job!

So, would a SPC chart have helped me here?

Here it is for Q1 and Q2. The blue line is the target and the green line is the average … so below target for both quarters, as the RAG chart said.

The was a dip in Q1 for a few weeks but it was not sustained and the rest of the chart looks stable (all the points inside the process limits). So, “do nothing” seemed like a perfectly reasonable strategy. Now I feel even more of a victim of fortune!

So, let us look at the full set of weekly date for the financial year and apply our retrospectoscope.

This is just a plain weekly performance run chart with the target limit plotted as the blue line.

It is clear from this that there is a slow upward drift and we can see why our retrospective quarterly RAG chart flipped from green to red, and why neither our weekly RAG chart nor our weekly SPC chart alerted us in time to avoid it!

This problem is often called ‘leading by looking in the rear view mirror‘.

The variation we needed to see was not random, it was a slowly rising average, but it was hidden in the random variation and we missed it. So we under-reacted and we paid the price.

This example illustrates another limitation of both RAG charts and SPC charts … they are both insensitive to small shifts and slow drifts when there is lots of random variation around, which there usually is.

So, is there a way to avoid this trap?

Yes. We need to learn to use the more powerful system behaviour charts and the systems engineering techniques and tools that accompany them.

But that aside, the rather good 147-page guide from NHS Improvement is a good first step for those still using two-point comparisons and RAG charts and it can be downloaded: HERE

02/12/2017

The Strangeness of LoS

It had been some time since Bob and Leslie had chatted so an email from the blue was a welcome distraction from a complex data analysis task.

<Bob> Hi Leslie, great to hear from you. I was beginning to think you had lost interest in health care improvement-by-design.

<Leslie> Hi Bob, not at all. Rather the opposite. I’ve been very busy using everything that I’ve learned so far. It’s applications are endless, but I have hit a problem that I have been unable to solve, and it is driving me nuts!

<Bob> OK. That sounds encouraging and interesting. Would you be able to outline this thorny problem and I will help if I can.

<Leslie> Thanks Bob. It relates to a big issue that my organisation is stuck with – managing urgent admissions. The problem is that very often there is no bed available, but there is no predictability to that. It feels like a lottery; a quality and safety lottery. The clinicians are clamoring for “more beds” but the commissioners are saying “there is no more money“. So the focus has turned to reducing length of stay.

<Bob> OK. A focus on length of stay sounds reasonable. Reducing that can free up enough beds to provide the necessary space-capacity resilience to dramatically improve the service quality. So long as you don’t then close all the “empty” beds to save money, or fall into the trap of believing that 85% average bed occupancy is the “optimum”.

<Leslie> Yes, I know. We have explored all of these topics before. That is not the problem.

<Bob> OK. What is the problem?

<Leslie> The problem is demonstrating objectively that the length-of-stay reduction experiments are having a beneficial impact. The data seems to say they they are, and the senior managers are trumpeting the success, but the people on the ground say they are not. We have hit a stalemate.

<Bob> Ah ha! That old chestnut. So, can I first ask what happens to the patients who cannot get a bed urgently?

<Leslie> Good question. We have mapped and measured that. What happens is the most urgent admission failures spill over to commercial service providers, who charge a fee-per-case and we have no choice but to pay it. The Director of Finance is going mental! The less urgent admission failures just wait on queue-in-the-community until a bed becomes available. They are the ones who are complaining the most, so the Director of Governance is also going mental. The Director of Operations is caught in the cross-fire and the Chief Executive and Chair are doing their best to calm frayed tempers and to referee the increasingly toxic arguments.

<Bob> OK. I can see why a “Reduce Length of Stay Initiative” would tick everyone’s Nice If box. So, the data analysts are saying “the length of stay has come down since the Initiative was launched” but the teams on the ground are saying “it feels the same to us … the beds are still full and we still cannot admit patients“.

<Leslie> Yes, that is exactly it. And everyone has come to the conclusion that demand must have increased so it is pointless to attempt to reduce length of stay because when we do that it just sucks in more work. They are feeling increasingly helpless and hopeless.

<Bob> OK. Well, the “chronic backlog of unmet need” issue is certainly possible, but your data will show if admissions have gone up.

<Leslie> I know, and as far as I can see they have not.

<Bob> OK. So I’m guessing that the next explanation is that “the data is wonky“.

<Leslie> Yup. Spot on. So, to counter that the Information Department has embarked on a massive push on data collection and quality control and they are adamant that the data is complete and clean.

<Bob> OK. So what is your diagnosis?

<Leslie> I don’t have one, that’s why I emailed you. I’m stuck.

<Bob> OK. We need a diagnosis, and that means we need to take a “history” and “examine” the process. Can you tell me the outline of the RLoS Initiative.

<Leslie> We knew that we would need a baseline to measure from so we got the historical admission and discharge data and plotted a Diagnostic Vitals Chart®. I have learned something from my HCSE training! Then we planned the implementation of a visual feedback tool that would show ward staff which patients were delayed so that they could focus on “unblocking” the bottlenecks. We then planned to measure the impact of the intervention for three months, and then we planned to compare the average length of stay before and after the RLoS Intervention with a big enough data set to give us an accurate estimate of the averages. The data showed a very obvious improvement, a highly statistically significant one.

<Bob> OK. It sounds like you have avoided the usual trap of just relying on subjective feedback, and now have a different problem because your objective and subjective feedback are in disagreement.

<Leslie> Yes. And I have to say, getting stuck like this has rather dented my confidence.

<Bob> Fear not Leslie. I said this is an “old chestnut” and I can say with 100% confidence that you already have what you need in your T4 kit bag?

<Leslie>Tee-Four?

<Bob> Sorry, a new abbreviation. It stands for “theory, techniques, tools and training“.

<Leslie> Phew! That is very reassuring to hear, but it does not tell me what to do next.

<Bob> You are an engineer now Leslie, so you need to don the hard-hat of Improvement-by-Design. Start with your Needs Analysis.

<Leslie> OK. I need a trustworthy tool that will tell me if the planned intervention has has a significant impact on length of stay, for better or worse or not at all. And I need it to tell me that quickly so I can decide what to do next.

<Bob> Good. Now list all the things that you currently have that you feel you can trust.

<Leslie> I do actually trust that the Information team collect, store, verify and clean the raw data – they are really passionate about it. And I do trust that the front line teams are giving accurate subjective feedback – I work with them and they are just as passionate. And I do trust the systems engineering “T4” kit bag – it has proven itself again-and-again.

<Bob> Good, and I say that because you have everything you need to solve this, and it sounds like the data analysis part of the process is a good place to focus.

<Leslie> That was my conclusion too. And I have looked at the process, and I can’t see a flaw. It is driving me nuts!

<Bob> OK. Let us take a different tack. Have you thought about designing the tool you need from scratch?

<Leslie> No. I’ve been using the ones I already have, and assume that I must be using them incorrectly, but I can’t see where I’m going wrong.

<Bob> Ah! Then, I think it would be a good idea to run each of your tools through a verification test and check that they are fit-4-purpose in this specific context.

<Leslie> OK. That sounds like something I haven’t covered before.

<Bob> I know. Designing verification test-rigs is part of the Level 2 training. I think you have demonstrated that you are ready to take the next step up the HCSE learning curve.

<Leslie> Do you mean I can learn how to design and build my own tools? Special tools for specific tasks?

<Bob> Yup. All the techniques and tools that you are using now had to be specified, designed, built, verified, and validated. That is why you can trust them to be fit-4-purpose.

<Leslie> Wooohooo! I knew it was a good idea to give you a call. Let’s get started.

[Postscript] And Leslie, together with the other stakeholders, went on to design the tool that they needed and to use the available data to dissolve the stalemate. And once everyone was on the same page again they were able to work collaboratively to resolve the flow problems, and to improve the safety, flow, quality and affordability of their service. Oh, and to know for sure that they had improved it.

12/08/2017

One Step Back; Two Steps Forward.

This week a ground-breaking case study was published.

It describes how a team in South Wales discovered how to make the flows visible in a critical part of their cancer pathway.

Radiology.

And they did that by unintentionally falling into a trap! A trap that many who set out to improve health care services fall into. But they did not give up. They sought guidance and learned some profound lessons.

Part 1 of their story is shared here.

One lesson they learned is that, as they take on more complex improvement challenges, they need to be equipped with the right tools, and they need to be trained to use them, and they need to have practiced using them.

Another lesson they learned is that making the flows in a system visible is necessary before the current behaviour of the system can be understood.

And they learned that they needed a clear diagnosis of how the current system is not performing; before they can attempt to design an intervention to deliver the intended improvement.

They learned how the Study-Plan-Do cycle works, and they learned the reason it starts with “Study”, and not with “Plan”.

They tried, failed, took one step back, asked, listened and learned.

Then with their new knowledge, more advanced tools, and deeper understanding they took two steps forward; diagnosed problem, designed an intervention, and delivered a significant improvement.

And visualised just how significant.

Then they shared Part 2 of their story … here.

27/05/2017

Unknown-Knowns

This is the now-infamous statement that Donald Rumsfeld made at a Pentagon Press Conference which triggered some good-natured jesting from the assembled journalists.

But there is a problem with it.

There is a fourth combination that he does not mention: the Unknown-Knowns.

Which is a shame because they are actually the most important because they cause the most problems. Avoidable problems.

Suppose there is a piece of knowledge that someone knows but that someone else does not; then we have an unknown-known.

None of us know everything and we do not need to, because knowledge that is of no value to us is irrelevant for us.

But what happens when the unknown-known is of value to us, and more than that; what happens when it would be reasonable for someone else to expect us to know it; because it is our job to know.

A surgeon would be not expected to know a lot about astronomy, but they would be expected to know a lot about anatomy.

So, what happens if we become aware that we are missing an important piece of knowledge that is actually already known? What is our normal human reaction to that discovery?

Typically, our first reaction is fear-driven and we express defensive behaviour. This is because we fear the potential loss-of-face from being exposed as inept.

From this sudden shock we then enter a characteristic emotional pattern which is called the Nerve Curve.

After the shock of discovery we quickly flip into denial and, if that does not work then to anger (i.e. blame). We ignore the message and if that does not work we shoot the messenger.

And when in this emotionally charged state, our rationality tends to take a back seat. So, if we want to benefit from the discovery of an unknown-known, then we have to learn to bite-our-lip, wait, let the red mist dissipate, and then re-examine the available evidence with a cool, curious, open mind. A state of mind that is receptive and open to learning.

Recently, I was reminded of this.

The context is health care improvement, and I was using a systems engineering framework to conduct some diagnostic data analysis.

My first task was to run a data-completeness-verification-test … and the data I had been sent did not pass the test. There was some missing. It was an error of omission (EOO) and they are the hardest ones to spot. Hence the need for the verification test.

The cause of the EOO was an unknown-known in the department that holds the keys to the data warehouse. And I have come across this EOO before, so I was not surprised.

Hence the need for the verification test.

I was not annoyed either. I just fed back the results of the test, explained what the issue was, explained the cause, and they listened and learned.

The implication of this specific EOO is quite profound though because it appears to be ubiquitous across the NHS.

To be specific it relates to the precise details of how raw data on demand, activity, length of stay and bed occupancy is extracted from the NHS data warehouses.

So it is rather relevant to just about everything the NHS does!

And the error-of-omission leads to confusion at best; and at worst … to the following sequence … incomplete data => invalid analysis => incorrect conclusion => poor decision => counter-productive action => unintended outcome.

Does that sound at all familiar?

So, if would you like to learn about this valuable unknown-known is then I recommend the narrative by Dr Kate Silvester, an internationally recognised expert in healthcare improvement. In it, Kate re-tells the story of her emotional roller-coaster ride when she discovered she was making the same error.

Here is the link to the full abstract and where you can download and read the full text of Kate’s excellent essay, and help to make it a known-known.

That is what system-wide improvement requires – sharing the knowledge.

31/12/2016

How Do We Know We Have Improved?

Phil and Pete are having a coffee and a chat. They both work in the NHS and have been friends for years.

They have different jobs. Phil is a commissioner and an accountant by training, Pete is a consultant and a doctor by training.

They are discussing a challenge that affects them both on a daily basis: unscheduled care.

Both Phil and Pete want to see significant and sustained improvements and how to achieve them is often the focus of their coffee chats.

<Phil> We are agreed that we both want improvement, both from my perspective as a commissioner and from your perspective as a clinician. And we agree that what we want to see improvements in patient safety, waiting, outcomes, experience for both patients and staff, and use of our limited NHS resources.

<Pete> Yes. Our common purpose, the “what” and “why”, has never been an issue. Where we seem to get stuck is the “how”. We have both tried many things but, despite our good intentions, it feels like things are getting worse!

<Phil> I agree. It may be that what we have implemented has had a positive impact and we would have been even worse off if we had done nothing. But I do not know. We clearly have much to learn and, while I believe we are making progress, we do not appear to be learning fast enough. And I think this knowledge gap exposes another “how” issue: After we have intervened, how do we know that we have (a) improved, (b) not changed or (c) worsened?

<Pete> That is a very good question. And all that I have to offer as an answer is to share what we do in medicine when we ask a similar question: “How do I know that treatment A is better than treatment B?” It is the essence of medical research; the quest to find better treatments that deliver better outcomes and at lower cost. The similarities are strong.

<Phil> OK. How do you do that? How do you know that “Treatment A is better than Treatment B” in a way that anyone will trust the answer?

<Pete> We use a science that is actually very recent on the scientific timeline; it was only firmly established in the first half of the 20th century. One reason for that is that it is rather a counter-intuitive science and for that reason it requires using tools that have been designed and demonstrated to work but which most of us do not really understand how they work. They are a bit like magic black boxes.

<Phil> H’mm. Please forgive me for sounding skeptical but that sounds like a big opportunity for making mistakes! If there are lots of these “magic black box” tools then how do you decide which one to use and how do you know you have used it correctly?

<Pete> Those are good questions! Very often we don’t know and in our collective confusion we generate a lot of unproductive discussion. This is why we are often forced to accept the advice of experts but, I confess, very often we don’t understand what they are saying either! They seem like the medieval Magi.

<Phil> H’mm. So these experts are like ‘magicians’ – they claim to understand the inner workings of the black magic boxes but are unable, or unwilling, to explain in a language that a ‘muggle’ would understand?

<Pete> Very well put. That is just how it feels.

<Phil> So can you explain what you do understand about this magical process? That would be a start.

<Pete> OK, I will do my best. The first thing we learn in medical research is that we need to be clear about what it is we are looking to improve, and we need to be able to measure it objectively and accurately.

<Phil> That makes sense. Let us say we want to improve the patient’s subjective quality of the A&E experience and objectively we want to reduce the time they spend in A&E. We measure how long they wait.

<Pete> The next thing is that we need to decide how much improvement we need. What would be worthwhile? So in the example you have offered we know that reducing the average time patients spend in A&E by just 30 minutes would have a significant effect on the quality of the patient and staff experience, and as a by-product it would also dramatically improve the 4-hour target performance.

<Phil> OK. From the commissioning perspective there are lots of things we can do, such as commissioning alternative paths for specific groups of patients; in effect diverting some of the unscheduled demand away from A&E to a more appropriate service provider. But these are the sorts of thing we have been experimenting with for years, and it brings us back to the question: How do we know that any change we implement has had the impact we intended? The system seems, well, complicated.

<Pete> In medical research we are very aware that the system we are changing is very complicated and that we do not have the power of omniscience. We cannot know everything. Realistically, all we can do is to focus on objective outcomes and collect small samples of the data ocean and use those in an attempt to draw conclusions can trust. We have to design our experiment with care!

<Phil> That makes sense. Surely we just need to measure the stuff that will tell us if our impact matches our intent. That sounds easy enough. What’s the problem?

<Pete> The problem we encounter is that when we measure “stuff” we observe patient-to-patient variation, and that is before we have made any changes. Any impact that we may have is obscured by this “noise”.

<Phil> Ah, I see. So if the our intervention generates a small impact then it will be more difficult to see amidst this background noise. Like trying to see fine detail in a fuzzy picture.

<Pete> Yes, exactly like that. And it raises the issue of “errors”. In medical research we talk about two different types of error; we make the first type of error when our actual impact is zero but we conclude from our data that we have made a difference; and we make the second type of error when we have made an impact but we conclude from our data that we have not.

<Phil> OK. So does that imply that the more “noise” we observe in our measure for-improvement before we make the change, the more likely we are to make one or other error?

<Pete> Precisely! So before we do the experiment we need to design it so that we reduce the probability of making both of these errors to an acceptably low level. So that we can be assured that any conclusion we draw can be trusted.

<Phil> OK. So how exactly do you do that?

<Pete> We know that whenever there is “noise” and whenever we use samples then there will always be some risk of making one or other of the two types of error. So we need to set a threshold for both. We have to state clearly how much confidence we need in our conclusion. For example, we often use the convention that we are willing to accept a 1 in 20 chance of making the Type I error.

<Phil> Let me check if I have heard you correctly. Suppose that, in reality, our change has no impact and we have set the risk threshold for a Type 1 error at 1 in 20, and suppose we repeat the same experiment 100 times – are you saying that we should expect about five of our experiments to show data that says our change has had the intended impact when in reality it has not?

<Pete> Yes. That is exactly it.

<Phil> OK. But in practice we cannot repeat the experiment 100 times, so we just have to accept the 1 in 20 chance that we will make a Type 1 error, and we won’t know we have made it if we do. That feels a bit chancy. So why don’t we just set the threshold to 1 in 100 or 1 in 1000?

<Pete> We could, but doing that has a consequence. If we reduce the risk of making a Type I error by setting our threshold lower, then we will increase the risk of making a Type II error.

<Phil> Ah! I see. The old swings-and-roundabouts problem. By the way, do these two errors have different names that would make it easier to remember and to explain?

<Pete> Yes. The Type I error is called a False Positive. It is like concluding that a patient has a specific diagnosis when in reality they do not.

<Phil> And the Type II error is called a False Negative?

<Pete> Yes. And we want to avoid both of them, and to do that we have to specify a separate risk threshold for each error. The convention is to call the threshold for the false positive the alpha level, and the threshold for the false negative the beta level.

<Phil> OK. So now we have three things we need to be clear on before we can do our experiment: the size of the change that we need, the risk of the false positive that we are willing to accept, and the risk of a false negative that we are willing to accept. Is that all we need?

<Pete> In medical research we learn that we need six pieces of the experimental design jigsaw before we can proceed. We only have three pieces so far.

<Phil> What are the other three pieces then?

<Pete> We need to know the average value of the metric we are intending to improve, because that is our baseline from which improvement is measured. Improvements are often framed as a percentage improvement over the baseline. And we need to know the spread of the data around that average, the “noise” that we referred to earlier.

<Phil> Ah, yes! I forgot about the noise. But that is only five pieces of the jigsaw. What is the last piece?

<Pete> The size of the sample.

<Phil> Eh? Can’t we just go with whatever data we can realistically get?

<Pete> Sadly, no. The size of the sample is how we control the risk of a false negative error. The more data we have the lower the risk. This is referred to as the power of the experimental design.

<Phil> OK. That feels familiar. I know that the more experience I have of something the better my judgement gets. Is this the same thing?

<Pete> Yes. Exactly the same thing.

<Phil> OK. So let me see if I have got this. To know if the impact of the intervention matches our intention we need to design our experiment carefully. We need all six pieces of the experimental design jigsaw and they must all fall inside our circle of control. We can measure the baseline average and spread; we can specify the impact we will accept as useful; we can specify the risks we are prepared to accept of making the false positive and false negative errors; and we can collect the required amount of data after we have made the intervention so that we can trust our conclusion.

<Pete> Perfect! That is how we are taught to design research studies so that we can trust our results, and so that others can trust them too.

<Phil> So how do we decide how big the post-implementation data sample needs to be? I can see we need to collect enough data to avoid a false negative but we have to be pragmatic too. There would appear to be little value in collecting more data than we need. It would cost more and could delay knowing the answer to our question.

<Pete> That is precisely the trap than many inexperienced medical researchers fall into. They set their sample size according to what is achievable and affordable, and then they hope for the best!

<Phil> Well, we do the same. We analyse the data we have and we hope for the best. In the magical metaphor we are asking our data analysts to pull a white rabbit out of the hat. It sounds rather irrational and unpredictable when described like that! Have medical researchers learned a way to avoid this trap?

<Pete> Yes, it is a tool called a power calculator.

<Phil> Ooooo … a power tool … I like the sound of that … that would be a cool tool to have in our commissioning bag of tricks. It would be like a magic wand. Do you have such a thing?

<Pete> Yes.

<Phil> And do you understand how the power tool magic works well enough to explain to a “muggle”?

<Pete> Not really. To do that means learning some rather unfamiliar language and some rather counter-intuitive concepts.

<Phil> Is that the magical stuff I hear lurks between the covers of a medical statistics textbook?

<Pete> Yes. Scary looking mathematical symbols and unfathomable spells!

<Phil> Oh dear! Is there another way for to gain a working understanding of this magic? Something a bit more pragmatic? A path that a ‘statistical muggle’ might be able to follow?

<Pete> Yes. It is called a simulator.

<Phil> You mean like a flight simulator that pilots use to learn how to control a jumbo jet before ever taking a real one out for a trip?

<Pete> Exactly like that.

<Phil> Do you have one?

<Pete> Yes. It was how I learned about this “stuff” … pragmatically.

<Phil> Can you show me?

<Pete> Of course. But to do that we will need a bit more time, another coffee, and maybe a couple of those tasty looking Danish pastries.

<Phil> A wise investment I’d say. I’ll get the the coffee and pastries, if you fire up the engines of the simulator.

19/03/2016

Type II Error

It was the time for Bob and Leslie’s regular Improvement Science coaching session.

<Leslie> Hi Bob, how are you today?

<Bob> I am getting over a winter cold but otherwise I am good. And you?

<Leslie> I am OK and I need to talk something through with you because I suspect you will be able to help.

<Bob> OK. What is the context?

<Leslie> Well, one of the projects that I am involved with is looking at the elderly unplanned admission stream which accounts for less than half of our unplanned admissions but more than half of our bed days.

<Bob> OK. So what were you looking to improve?

<Leslie> We want to reduce the average length of stay so that we free up beds to provide resilient space-capacity to ease the 4-hour A&E admission delay niggle.

<Bob> That sounds like a very reasonable strategy. So have you made any changes and measured any improvements?

<Leslie> We worked through the 6M Design® sequence. We studied the current system, diagnosed some time traps and bottlenecks, redesigned the ones we could influence, modified the system, and continued to measure to monitor the effect.

<Bob> And?

<Leslie> It feels better but the system behaviour charts do not show an improvement.

<Bob> Which charts, specifically?

<Leslie> The BaseLine XmR charts of average length of stay for each week of activity.

<Bob> And you locked the limits when you made the changes?

<Leslie> Yes. And there still were no red flags. So that means our changes have not had a significant effect. But it definitely feels better. Am I deluding myself?

<Bob> I do not believe so. Your subjective assessment is very likely to be accurate. Our Chimp OS 1.0 is very good at some things! I think the issue is with the tool you are using to measure the change.

<Leslie> The XmR chart? But I thought that was THE tool to use?

<Bob> Like all tools it is designed for a specific purpose. Are you familiar with the term Type II Error.

<Leslie> Doesn’t that come from research? I seem to remember that is the error we make when we have an under-powered study. When our sample size is too small to confidently detect the change in the mean that we are looking for.

<Bob> A perfect definition! The same error can happen when we are doing before and after studies too. And when it does, we see the pattern you have just described: the process feels better but we do not see any red flags on our BaseLine© chart.

<Leslie> But if our changes only have a small effect how can it feel better?

<Bob> Because some changes have cumulative effects and we omit to measure them.

<Leslie> OMG! That makes complete sense! For example, if my bank balance is stable my average income and average expenses are balanced over time. So if I make a small-but-sustained improvement to my expenses, like using lower cost generic label products, then I will see a cumulative benefit over time to the balance, but not the monthly expenses; because the noise swamps the signal on that chart!

<Bob> An excellent analogy!

<Leslie> So the XmR chart is not the tool for this job. And if this is the only tool we have then we risk making a Type II error. Is that correct?

<Bob> Yes. We do still use an XmR chart first though, because if there is a big enough and fast enough shift then the XmR chart will reveal it. If there is not then we do not give up just yet; we reach for our more sensitive shift detector tool.

<Leslie> Which is?

<Bob> I will leave you to ponder on that question. You are a trained designer now so it is time to put your designer hat on and first consider the purpose of this new tool, and then create the outline a fit-for-purpose design.

<Leslie> OK, I am on the case!

13/02/2016

Melting the Queue

[Drrrrrrring]

<Leslie> Hi Bob, I hope I am not interrupting you. Do you have five minutes?

<Bob> Hi Leslie. I have just finished what I was working on and a chat would be a very welcome break. Fire away.

<Leslie> I really just wanted to say how much I enjoyed the workshop this week, and so did all the delegates. They have been emailing me to say how much they learned and thanking me for organising it.

<Bob> Thank you Leslie. I really enjoyed it too … and I learned lots … I always do.

<Leslie> As you know I have been doing the ISP programme for some time, and I have come to believe that you could not surprise me any more … but you did! I never thought that we could make such a dramatic improvement in waiting times. The queue just melted away and I still cannot really believe it. Was it a trick?

<Bob> Ahhhh, the siren-call of the battle-hardened sceptic! It was no trick. What you all saw was real enough. There were no computers, statistics or smoke-and-mirrors used … just squared paper and a few coloured pens. You saw it with your own eyes; you drew the charts; you made the diagnosis; and you re-designed the policy. All I did was provide the context and a few nudges.

<Leslie> I know, and that is why I think seeing the before and after data would help me. The process felt so much better, but I know I will need to show the hard evidence to convince others, and to convince myself as well, to be brutally honest. I have the before data … do you have the after data?

<Bob> I do. And I was just plotting it as BaseLine charts to send to you. So you have pre-empted me. Here you are.

This is the waiting time run chart for the one stop clinic improvement exercise that you all did. The leftmost segment is the before, and the rightmost are the after … your two ‘new’ designs.

As you say, the queue and the waiting has melted away despite doing exactly the same work with exactly the same resources. Surprising and counter-intuitive but there is the evidence.

<Leslie> Wow! That fits exactly with how it felt. Quick and calm! But I seem to remember that the waiting room was empty, particularly in the case of the design that Team 1 created. How come the waiting is not closer to zero on the chart?

<Bob> You are correct. This is not just the time in the waiting room, it also includes the time needed to move between the rooms and the changeover time within the rooms. It is what I call the ‘tween-time.

<Leslie> OK, that makes sense now. And what also jumps out of the picture for me is the proof that we converted an unstable process into a stable one. The chaos was calmed. So what is the root cause of the difference between the two ‘after’ designs?

<Bob> The middle one, the slightly better of the two, is the one where all patients followed the newly designed process. The rightmost one was where we deliberately threw a spanner in the works by assuming an unpredictable case mix.

<Leslie> Which made very little difference! The new design was still much, much better than before.

<Bob> Yes. What you are seeing here is the footprint of resilient design. Do you believe it is possible now?

<Leslie> You bet I do!

31/01/2016

The Magic Black Box

It was the appointed time for Bob and Leslie’s regular coaching session as part of the improvement science practitioner programme.

<Leslie> Hi Bob, I am feeling rather despondent today so please excuse me in advance if you hear a lot of “Yes, but …” language.

<Bob> I am sorry to hear that Leslie. Do you want to talk about it?

<Leslie> Yes, please. The trigger for my gloom was being sent on a mandatory training workshop.

<Bob> OK. Training to do what?

<Leslie> Outpatient demand and capacity planning!

<Bob> But you know how to do that already, so what is the reason you were “sent”?

<Leslie> Well, I am no longer sure I know how to it. That is why I am feeling so blue. I went more out of curiosity and I came away utterly confused and with my confidence shattered.

<Bob> Oh dear! We had better start at the beginning. What was the purpose of the workshop?

<Leslie> To train everyone in how to use an Outpatient Demand and Capacity planning model, an Excel one that we were told to download along with the User Guide. I think it is part of a national push to improve waiting times for outpatients.

<Bob> OK. On the surface that sounds reasonable. You have designed and built your own Excel flow-models already; so where did the trouble start?

<Leslie> I will attempt to explain. This was a paragraph in the instructions. I felt OK with this because my Improvement Science training has given me a very good understanding of basic demand and capacity theory.

<Bob> OK. I am guessing that other delegates may have felt less comfortable with this. Was that the case?

<Leslie> The training workshops are targeted at Operational Managers and the ones I spoke to actually felt that they had a good grasp of the basics.

<Bob> OK. That is encouraging, but a warning bell is ringing for me. So where did the trouble start?

<Leslie> Well, before going to the workshop I decided to read the User Guide so that I had some idea of how this magic tool worked. This is where I started to wobble – this paragraph specifically …

<Bob> H’mm. What did you make of that?

<Leslie> It was complete gibberish to me and I felt like an idiot for not understanding it. I went to the workshop in a bit of a panic and hoped that all would become clear. It didn’t.

<Bob> Did the User Guide explain what ‘percentile’ means in this context, ideally with some visual charts to assist?

<Leslie> No and the use of ‘th’ and ‘%’ was really confusing too. After that I sort of went into a mental fog and none of the workshop made much sense. It was all about practising using the tool without any understanding of how it worked. Like a black magic box.

<Bob> OK. I can see why you were confused, and do not worry, you are not an idiot. It looks like the author of the User Guide has unwittingly used some very confusing and ambiguous terminology here. So can you talk me through what you have to do to use this magic box?

<Leslie> First we have to enter some of our historical data; the number of new referrals per week for a year; and the referral and appointment dates for all patients for the most recent three months.

<Bob> OK. That sounds very reasonable. A run chart of historical demand and the raw event data for a Vitals Chart® is where I would start the measurement phase too – so long as the data creates a valid 3 month reporting window.

<Leslie> Yes, I though so too … but that is not how the black box model seems to work. The weekly demand is used to draw an SPC chart, but the event data seems to disappear into the innards of the black box, and recommendations pop out of it.

<Bob> Ah ha! And let me guess the relationship between the term ‘percentile’ and the SPC chart of weekly new demand was not explained?

<Leslie> Spot on. What does percentile mean?

<Bob> It is statistics jargon. Remember that we have talked about the distribution of the data around the average on a BaseLine chart; and how we use the histogram feature of BaseLine to show it visually. Like this example.

<Leslie> Yes. I recognise that. This chart shows a stable system of demand with an average of around 150 new referrals per week and the variation distributed above and below the average in a symmetrical pattern, falling off to zero around the upper and lower process limits. I believe that you said that over 99% will fall within the limits.

<Bob> Good. The blue histogram on this chart is called a probability distribution function, to use the terminology of a statistician.

<Leslie> OK.

<Bob> So, what would happen if we created a Pareto chart of demand using the number of patients per week as the categories and ignoring the time aspect? We are allowed to do that if the behaviour is stable, as this chart suggests.

<Leslie> Give me a minute, I will need to do a rough sketch. Does this look right?

<Bob> Perfect! So if you now convert the Y-axis to a percentage scale so that 52 weeks is 100% then where does the average weekly demand of about 150 fall? Read up from the X-axis to the line then across to the Y-axis.

<Leslie> At about 26 weeks or 50% of 52 weeks. Ah ha! So that is what a percentile means! The 50th percentile is the average, the zeroth percentile is around the lower process limit and the 100th percentile is around the upper process limit!

<Bob> In this case the 50th percentile is the average, it is not always the case though. So where is the 85th percentile line?

<Leslie> Um, 52 times 0.85 is 44.2 which, reading across from the Y-axis then down to the X-axis gives a weekly demand of about 170 per week. That is about the same as the average plus one sigma according to the run chart.

<Bob> Excellent. The Pareto chart that you have drawn is called a cumulative probability distribution function … and that is usually what percentiles refer to. Comparative Statisticians love these but often omit to explain their rationale to non-statisticians!

<Leslie> Phew! So, now I can see that the 65th percentile is just above average demand, and 85th percentile is above that. But in the confusing paragraph how does that relate to the phrase “65% and 85% of the time”?

<Bob> It doesn’t. That is the really, really confusing part of that paragraph. I am not surprised that you looped out at that point!

<Leslie> OK. Let us leave that for another conversation. If I ignore that bit then does the rest of it make sense?

<Bob> Not yet alas. We need to dig a bit deeper. What would you say are the implications of this message?

<Leslie> Well. I know that if our flow-capacity is less than our average demand then we will guarantee to create an unstable queue and chaos. That is the Flaw of Averages trap.

<Bob> OK. The creator of this tool seems to know that.

<Leslie> And my outpatient manager colleagues are always complaining that they do not have enough slots to book into, so I conclude that our current flow-capacity is just above the 50th percentile.

<Bob> A reasonable hypothesis.

<Leslie> So to calm the chaos the message is saying I will need to increase my flow capacity up to the 85th percentile of demand which is from about 150 slots per week to 170 slots per week. An increase of 7% which implies a 7% increase in costs.

<Bob> Good. I am pleased that you did not fall into the intuitive trap that a increase from the 50th to the 85th percentile implies a 35/50 or 70% increase! Your estimate of 7% is a reasonable one.

<Leslie> Well it may be theoretically reasonable but it is not practically possible. We are exhorted to reduce costs by at least that amount.

<Bob> So we have a finance versus governance bun-fight with the operational managers caught in the middle: FOG. That is not the end of the litany of woes … is there anything about Did Not Attends in the model?

<Leslie> Yes indeed! We are required to enter the percentage of DNAs and what we do with them. Do we discharge them or re-book them.

<Bob> OK. Pragmatic reality is always much more interesting than academic rhetoric and this aspect of the real system rather complicates things, at least for a comparative statistician. This is where the smoke and mirrors will appear and they will be hidden inside the black magic box. To solve this conundrum we need to understand the relationship between demand, capacity, variation and yield … and it is rather counter-intuitive. So, how would you approach this problem?

<Leslie> I would use the 6M Design® framework and I would start with a map and not with a model; least of all a magic black box one that I did not design, build and verify myself.

<Bob> And how do you know that will work any better?

<Leslie> Because at the One Day ISP Workshop I saw it work with my own eyes. The queues, waits and chaos just evaporated. And it cost nothing. We already had more than enough “capacity”.

<Bob> Indeed you did. So shall we do this one as an ISP-2 project?

<Leslie> An excellent suggestion. I already feel my confidence flowing back and I am looking forward to this new challenge. Thank you again Bob.

16/01/2016

Hot and Cold

Last week Bob and Leslie were exploring the data analysis trap called a two-points-in-time comparison: as illustrated by the headline “This winter has not been as bad as last … which proves that our winter action plan has worked.”

Actually it doesn’t.

But just saying that is not very helpful. We need to explain the reason why this conclusion is invalid and therefore potentially dangerous.

So here is the continuation of Bob and Leslie’s conversation.

<Bob> Hi Leslie, have you been reflecting on the two-points-in-time challenge?

<Leslie> Yes indeed, and you were correct, I did know the answer … I just didn’t know I knew if you get my drift.

<Bob> Yes, I do. So, are you willing to share your story?

<Leslie> OK, but before I do that I would like to share what happened when I described what we talked about to some colleagues. They sort of got the idea but got lost in the unfamiliar language of ‘variance’ and I realized that I needed an example to illustrate.

<Bob> Excellent … what example did you choose?

<Leslie> The UK weather – or more specifically the temperature. My reasons for choosing this were many: first it is something that everyone can relate to; secondly it has strong seasonal cycle; and thirdly because the data is readily available on the Internet.

<Bob> OK, so what specific question were you trying to answer and what data did you use?

<Leslie> The question was “Are our winters getting warmer?” and my interest in that is because many people assume that the colder the winter the more people suffer from respiratory illness and the more that go to hospital … contributing to the winter A&E and hospital pressures. The data that I used was the maximum monthly temperature from 1960 to the present recorded at our closest weather station.

<Bob> OK, and what did you do with that data?

<Leslie> Well, what I did not do was to compare this winter with last winter and draw my conclusion from that! What I did first was just to plot-the-dots … I created a time-series chart … using the BaseLine© software.

And it shows what I expected to see, a strong, regular, 12-month cycle, with peaks in the summer and troughs in the winter.

<Bob> Can you explain what the green and red lines are and why some dots are red?

<Leslie> Sure. The green line is the average for all the data. The red lines are called the upper and lower process limits. They are calculated from the data and what they say is “if the variation in this data is random then we will expect more than 99% of the points to fall between these two red lines“.

<Bob> So, we have 55 years of monthly data which is nearly 700 points which means we would expect fewer than seven to fall outside these lines … and we clearly have many more than that. For example, the winter of 1962-63 and the summer of 1976 look exceptional – a run of three consecutive dots outside the red lines. So can we conclude the variation we are seeing is not random?

<Leslie> Yes, and there is more evidence to support that conclusion. First is the reality check … I do not remember either of those exceptionally cold or hot years personally, so I asked Dr Google.

This picture from January 1963 shows copper telephone lines that are so weighed down with ice, and for so long, that they have stretched down to the ground. In this era of mobile phones we forget this was what telecommunication was like!

And just look at the young Michal Fish in the Summer of ’76! Did people really wear clothes like that?

And there is more evidence on the chart. The red dots that you mentioned are indicators that BaseLine© has detected other non-random patterns.

So the large number of red dots confirms our Mark I Eyeball conclusion … that there are signals mixed up with the noise.

<Bob> Actually, I do remember the Summer of ’76 – it was the year I did my O Levels! And your signals-in-the-noise phrase reminds me of SETI – the search for extra-terrestrial intelligence! I really enjoyed the 1997 film of Carl Sagan’s book Contact with Jodi Foster playing the role of the determined scientist who ends up taking a faster-than-light trip through space in a machine designed by ET and built by humans. And especially the line about 10 minutes from the end when those-in-high-places who had discounted her story as “unbelievable” realized they may have made an error … the line ‘Yes, that is interesting isn’t it’.

<Leslie> Ha ha! Yes. I enjoyed that film too. It had lots of great characters – her glory seeking boss; the hyper-suspicious head of national security who militarized the project; the charismatic anti-hero; the ranting radical who blew up the first alien machine; and John Hurt as her guardian angel. I must watch it again.

Anyway, back to the story. The problem we have here is that this type of time-series chart is not designed to extract the overwhelming cyclical, annual pattern so that we can search for any weaker signals … such as a smaller change in winter temperature over a longer period of time.

<Bob>Yes, that is indeed the problem with these statistical process control charts. SPC charts were designed over 60 years ago for process quality assurance in manufacturing not as a diagnostic tool in a complex adaptive system such a healthcare. So how did you solve the problem?

<Leslie> I realized that it was the regularity of the cyclical pattern that was the key. I realized that I could use that to separate out the annual cycle and to expose the weaker signals. I did that using the rational grouping feature of BaseLine© with the month-of-the-year as the group.

Now I realize why the designers of the software put this feature in! With just one mouse click the story jumped out of the screen!

<Bob> OK. So can you explain what we are looking at here?

<Leslie> Sure. This chart shows the same data as before except that I asked BaseLine© first to group the data by month and then to create a mini-chart for each month-group independently. Each group has its own average and process limits. So if we look at the pattern of the averages, the green lines, we can clearly see the annual cycle. What is very obvious now is that the process limits for each sub-group are much narrower, and that there are now very few red points … other than in the groups that are coloured red anyway … a niggle that the designers need to nail in my opinion!

<Bob> I will pass on your improvement suggestion! So are you saying that the regular annual cycle has accounted for the majority of the signal in the previous chart and that now we have extracted that signal we can look for weaker signals by looking for red flags in each monthly group?

<Leslie> Exactly so. And the groups I am most interested in are the November to March ones. So, next I filtered out the November data and plotted it as a separate chart; and I then used another cool feature of BaseLine© called limit locking.

What that means is that I have used the November maximum temperature data for the first 30 years to get the baseline average and natural process limits … and we can see that there are no red flags in that section, no obvious signals. Then I locked these limits at 1990 and this tells BaseLine© to compare the subsequent 25 years of data against these projected limits. That exposed a lot of signal flags, and we can clearly see that most of the points in the later section are above the projected average from the earlier one. This confirms that there has been a significant increase in November maximum temperature over this 55 year period.

<Bob> Excellent! You have answered part of your question. So what about December onwards?

<Leslie> I was on a roll now! I also noticed from my second chart that the December, January and February groups looked rather similar so I filtered that data out and plotted them as a separate chart.

MaxTempDecJanFeb1960-2015_Grouped These were indeed almost identical so I lumped them together as a ‘winter’ group and compared the earlier half with the later half using another BaseLine© feature called segmentation.

This showed that the more recent winter months have a higher maximum temperature … on average. The difference is just over one degree Celsius. But it also shows that that the month-to-month and year-to-year variation still dominates the picture.

<Bob> Which implies?

<Leslie> That, with data like this, a two-points-in-time comparison is meaningless. If we do that we are just sampling random noise and there is no useful information in noise. Nothing that we can learn from. Nothing that we can justify a decision with. This is the reason the ‘this year was better than last year’ statement is meaningless at best; and dangerous at worst. Dangerous because if we draw an invalid conclusion, then it can lead us to make an unwise decision, then decide a counter-productive action, and then deliver an unintended outcome.

By doing invalid two-point comparisons we can too easily make the problem worse … not better.

<Bob> Yes. This is what W. Edwards Deming, an early guru of improvement science, referred to as ‘tampering‘. He was a student of Walter A. Shewhart who recognized this problem in manufacturing and, in 1924, invented the first control chart to highlight it, and so prevent it. My grandmother used the term meddling to describe this same behavior … and I now use that term as one of the eight sources of variation. Well done Leslie!

09/01/2016

The Two-Points-In-Time Comparison Trap

[Bzzzzzz] Bob’s phone vibrated to remind him it was time for the regular ISP remote coaching session with Leslie. He flipped the lid of his laptop just as Leslie joined the virtual meeting.

<Leslie> Hi Bob, and Happy New Year!

<Bob> Hello Leslie and I wish you well in 2016 too. So, what shall we talk about today?

<Leslie> Well, given the time of year I suppose it should be the Winter Crisis. The regularly repeating annual winter crisis. The one that feels more like the perpetual winter crisis.

<Bob> OK. What specifically would you like to explore?

<Leslie> Specifically? The habit of comparing of this year with last year to answer the burning question “Are we doing better, the same or worse?” Especially given the enormous effort and political attention that has been focused on the hot potato of A&E 4-hour performance.

<Bob> Aaaaah! That old chestnut! Two-Points-In-Time comparison.

<Leslie> Yes. I seem to recall you usually add the word ‘meaningless’ to that phrase.

<Bob> H’mm. Yes. It can certainly become that, but there is a perfectly good reason why we do this.

<Leslie> Indeed, it is because we see seasonal cycles in the data so we only want to compare the same parts of the seasonal cycle with each other. The apples and oranges thing.

<Bob> Yes, that is part of it. So what do you feel is the problem?

<Leslie> It feels like a lottery! It feels like whether we appear to be better or worse is just the outcome of a random toss.

<Bob> Ah! So we are back to the question “Is the variation I am looking at signal or noise?”

<Leslie> Yes, exactly.

<Bob> And we need a scientifically robust way to answer it. One that we can all trust.

<Leslie> Yes.

<Bob> So how do you decide that now in your improvement work? How do you do it when you have data that does not show a seasonal cycle?

<Leslie> I plot-the-dots and use an XmR chart to alert me to the presence of the signals I am interested in – especially a change of the mean.

<Bob> Good. So why can we not use that approach here?

<Leslie> Because the seasonal cycle is usually a big signal and it can swamp the smaller change I am looking for.

<Bob> Exactly so. Which is why we have to abandon the XmR chart and fall back the two points in time comparison?

<Leslie> That is what I see. That is the argument I am presented with and I have no answer.

<Bob> OK. It is important to appreciate that the XmR chart was not designed for doing this. It was designed for monitoring the output quality of a stable and capable process. It was designed to look for early warning signs; small but significant signals that suggest future problems. The purpose is to alert us so that we can identify the root causes, correct them and the avoid a future problem.

<Leslie> So we are using the wrong tool for the job. I sort of knew that. But surely there must be a better way than a two-points-in-time comparison!

<Bob> There is, but first we need to understand why a TPIT is a poor design.

<Leslie> Excellent. I’m all ears.

<Bob> A two point comparison is looking at the difference between two values, and that difference can be positive, zero or negative. In fact, it is very unlikely to be zero because noise is always present.

<Leslie> OK.

<Bob> Now, both of the values we are comparing are single samples from two bigger pools of data. It is the difference between the pools that we are interested in but we only have single samples of each one … so they are not measurements … they are estimates.

<Leslie> So, when we do a TPIT comparison we are looking at the difference between two samples that come from two pools that have inherent variation and may or may not actually be different.

<Bob> Well put. We give that inherent variation a name … we call it variance … and we can quantify it.

<Leslie> So if we do many TPIT comparisons then they will show variation as well … for two reasons; first because the pools we are sampling have inherent variation; and second just from the process of sampling itself. It was the first lesson in the ISP-1 course.

<Bob> Well done! So the question is: “How does the variance of the TPIT sample compare with the variance of the pools that the samples are taken from?”

<Leslie> My intuition tells me that it will be less because we are subtracting.

<Bob> Your intuition is half-right. The effect of the variation caused by the signal will be less … that is the rationale for the TPIT after all … but the same does not hold for the noise.

<Leslie> So the noise variation in the TPIT is the same?

<Bob> No. It is increased.

<Leslie> What! But that would imply that when we do this we are less likely to be able to detect a change because a small shift in signal will be swamped by the increase in the noise!

<Bob> Precisely. And the degree that the variance increases by is mathematically predictable … it is increased by a factor of two.

<Leslie> So as we usually present variation as the square root of the variance, to get it into the same units as the metric, then that will be increased by the square root of two … 1.414

<Bob> Yes.

<Leslie> I need to put this counter-intuitive theory to the test!

<Bob> Excellent. Accept nothing on faith. Always test assumptions. And how will you do that?

<Leslie> I will use Excel to generate a big series of normally distributed random numbers; then I will calculate a series of TPIT differences using a fixed time interval; then I will calculate the means and variations of the two sets of data; and then I will compare them.

<Bob> Excellent. Let us reconvene in ten minutes when you have done that.

10 minutes later …

<Leslie> Hi Bob, OK I am ready and I would like to present the results as charts. Is that OK?

<Bob> Perfect!

<Leslie> Here is the first one. I used our A&E performance data to give me some context. We know that on Mondays we have an average of 210 arrivals with an approximately normal distribution and a standard deviation of 44; so I used these values to generate the random numbers. Here is the simulated Monday Arrivals chart for two years.

<Bob> OK. It looks stable as we would expect and I see that you have plotted the sigma levels which look to be just under 50 wide.

<Leslie> Yes, it shows that my simulation is working. So next is the chart of the comparison of arrivals for each Monday in Year 2 compared with the corresponding week in Year 1.

<Bob> Oooookaaaaay. What have we here? Another stable chart with a mean of about zero. That is what we would expect given that there has not been a change in the average from Year 1 to Year 2. And the variation has increased … sigma looks to be just over 60.

<Leslie> Yes! Just as the theory predicted. And this is not a spurious answer. I ran the simulation dozens of times and the effect is consistent! So, I am forced by reality to accept the conclusion that when we do two-point-in-time comparisons to eliminate a cyclical signal we will reduce the sensitivity of our test and make it harder to detect other signals.

<Bob> Good work Leslie! Now that you have demonstrated this to yourself using a carefully designed and conducted simulation experiment, you will be better able to explain it to others.

<Leslie> So how do we avoid this problem?

<Bob> An excellent question and one that I will ask you to ponder on until our next chat. You know the answer to this … you just need to bring it to conscious awareness.

11/04/2015

The Improvement Pyramid

Developing productive improvement capability in an organisation is like building a pyramid in the desert.

It is not easy and it takes time before there is any visible evidence of success.

The height of the pyramid is a measure of the level of improvement complexity that we can take on.

An improvement of a single step in a system would only require a small pyramid.

Improving the whole system will require a much taller one.

But if we rush and attempt to build a sky-scraper on top of the sand then we will not be surprised when it topples over before we have made very much progress. The Egyptians knew this!

First, we need to dig down and to lay some foundations. Stable enough and strong enough to support the whole structure. We will never see the foundations so it is easy to forget them in our rush but they need to be there and they need to be there first.

It is the same when developing improvement science capability … the foundations are laid first and when enough of that foundation knowledge is in place we can start to build the next layer of the pyramid: the practitioner layer.

It is the the Improvement Science Practitioners (ISPs) who start to generate tangible evidence of progress. The first success stories help to spur us all on to continue to invest effort, time and money in widening our foundations to be able to build even higher – more layers of capability -until we can realistically take on a system wide improvement challenge.

So sharing the first hard evidence of improvement is an important milestone … it is proof of fitness for purpose … and that news should be shared with those toiling in the hot desert sun and with those watching from the safety of the shade.

So here is a real story of a real improvement pyramid achieving this magical and motivating milestone.

21/02/2015

Cumulative Sum

[Bing] Bob logged in for the weekly Webex coaching session. Leslie was not yet on line, but joined a few minutes later.

<Leslie> Hi Bob, sorry I am a bit late, I have been grappling with a data analysis problem and did not notice the time.

<Bob> Hi Leslie. Sounds interesting. Would you like to talk about that?

<Leslie> Yes please! It has been driving me nuts!

<Bob> OK. Some context first please.

<Leslie> Right, yes. The context is an improvement-by-design assignment with a primary care team who are looking at ways to reduce the unplanned admissions for elderly patients by 10%.

<Bob> OK. Why 10%?

<Leslie> Because they said that would be an operationally very significant reduction. Most of their unplanned admissions, and therefore costs for admissions, are in that age group. They feel that some admissions are avoidable with better primary care support and a 10% reduction would make their investment of time and effort worthwhile.

<Bob> OK. That makes complete sense. Setting a new design specification is OK. I assume they have some baseline flow data.

<Leslie> Yes. We have historical weekly unplanned admissions data for two years. It looks stable, though rather variable on a week-by-week basis.

<Bob> So has the design change been made?

<Leslie> Yes, over three months ago – so I expected to be able to see something by now but there are no red flags on the XmR chart of weekly admissions. No change. They are adamant that they are making a difference, particularly in reducing re-admissions. I do not want to disappoint them by saying that all their hard work has made no difference!

<Bob> OK Leslie. Let us approach this rationally. What are the possible causes that the weekly admissions chart is not signalling a change?

<Leslie> If there has not been a change in admissions. This could be because they have indeed reduced readmissions but new admissions have gone up and is masking the effect.

<Bob> Yes. That is possible. Any other ideas?

<Leslie> That their intervention has made no difference to re-admissions and their data is erroneous … or worse still … fabricated!

<Bob> Yes. That is possible too. Any other ideas?

<Leslie> Um. No. I cannot think of any.

<Bob> What about the idea that the XmR chart is not showing a change that is actually there?

<Leslie> You mean a false negative? That the sensitivity of the XmR chart is limited? How can that be? I thought these charts will always signal a significant shift.

<Bob> It depends on the degree of shift and the amount of variation. The more variation there is the harder it is to detect a small shift. In a conventional statistical test we would just use bigger samples, but that does not work with an XmR chart because the run tests are all fixed length. Pre-defined sample sizes.

<Leslie> So that means we can miss small but significant changes and come to the wrong conclusion that our change has had no effect! Isn’t that called a Type 2 error?

<Bob> Yes, it is. And we need to be aware of the limitations of the analysis tool we are using. So, now you know that how might you get around the problem?

<Leslie> One way would be to aggregate the data over a longer time period before plotting on the chart … we know that will reduce the sample variation.

<Bob> Yes. That would work … but what is the downside?

<Leslie> That we have to wait a lot longer to show a change, or not. We do not want that.

<Bob> I agree. So what we do is we use a chart that is much more sensitive to small shifts of the mean. And that is called a cusum chart. These were not invented until 30 years after Shewhart first described his time-series chart. To give you an example, do you recall that the work-in-progress chart is much more sensitive to changes in flow than either demand or activity charts?

<Leslie> Yes, and the WIP chart also reacts immediately if either demand or activity change. It is the one I always look at first.

<Bob> That is because a WIP chart is actually a cusum chart. It is the cumulative sum of the difference between demand and activity.

<Leslie> OK! That makes sense. So how do I create and use a cusum chart?

<Bob> I have just emailed you some instructions and a few examples. You can try with your unplanned admissions data. It should only take a few minutes. I will get a cup of tea and a chocolate Hobnob while I wait.

[Five minutes later]

<Leslie> Wow! That is just brilliant! I can see clearly on the cusum chart when the shifts happened and when I split the XmR chart at those points the underlying changes become clear and measurable. The team did indeed achieve a 10% reduction in admissions just as they claimed they had. And I checked with a statistical test which confirmed that it is statistically significant.

<Bob> Good work. Cusum charts take a bit of getting used to and we have be careful about the metric we are plotting and a few other things but it is a useful trick to have up our sleeves for situations like this.

<Leslie> Thanks Bob. I will bear that in mind. Now I just need to work out how to explain cusum charts to others! I do not want to be accused of using statistical smoke-and-mirrors! I think a golf metaphor may work with the GPs.

06/09/2014

The Jigsaw

Systems are made of interdependent parts that link together – rather like a jigsaw.

If pieces are distorted, missing, or in the wrong place then the picture is distorted and the system does not work as well as it could.

And if pieces of one jigsaw are mixed up with those of another then it is even more difficult to see any clear picture.

A system of improvement is just the same.

There are many improvement jigsaws each of which have pieces that fit well together and form a synergistic whole. Lean, Six Sigma, and Theory of Constraints are three well known ones.

Each improvement jigsaw evolved in a different context so naturally the picture that emerges is from a particular perspective: such as manufacturing.

So when the improvement context changes then the familiar jigsaws may not work as well: such as when we shift context from products to services, and from commercial to public.

A public service such as healthcare requires a modified improvement jigsaw … so how do we go about getting that?

One way is to ‘evolve’ an old jigsaw into a new context. That is tricky because it means adding new pieces and changing old pieces and the ‘zealots’ do not like changing their familiar jigsaw so they resist.

Another way is to ‘combine’ several old jigsaws in the hope that together they will provide enough perspectives. That is even more tricky because now you have several tribes of zealots who resist having their familiar jigsaws modified.

What about starting with a blank canvas and painting a new picture from scratch? Well it is actually very difficult to create a blank canvas for learning because we cannot erase what we already know. Our current mental model is the context we need for learning new knowledge.

So what about using a combination of the above?

What about first learning a new creative approach called design? And within that framework we can then create a new improvement jigsaw that better suits our specific context using some of the pieces of the existing ones. We may need to modify the pieces a bit to allow them to fit better together, and we may need to fashion new pieces to fill the gaps that we expose. But that is part of the fun.

The improvement jigsaw shown here is a new hybrid.

It has been created from a combination of existing improvement knowledge and some innovative stuff.

Pareto analysis was described by Vilfredo Pareto over 100 years ago. So that is tried and tested!

Time-series charts were invented by Walter Shewhart almost 100 years ago. So they are tried and tested too!

The combination of Pareto and Shewhart tools have been used very effectively for over 50 years. The combination is well proven.

The other two pieces are innovative. They have different parents and different pedigrees. And different purposes.

The Niggle-o-Gram® is related to 2-by-2, FMEA and EIQ and the 4N Chart®. It is the synthesis of them that creates a powerful lens for focussing our improvement efforts on where the greatest return-on-investment will be.

The Right-2-Left Map® is a descendent of the Design family and has been crossed with Graph Theory and Causal Network exemplars to introduce their best features. Its purpose is to expose errors of omission.

The emergent system is synergistic … much more effective than each part individually … and more even than their linear sum.

So when learning this new Science of Improvement we have to focus first on learning about the individual pieces and we do that by seeing examples of them used in practice. That in itself is illuminating!

As we learn about more pieces a fog of confusion starts to form and we run the risk of mutating into a ‘tool-head’. We know about the pieces in detail but we still do not see the bigger picture.

To avoid the tool-head trap we must balance our learning wheel and ensure that we invest enough time in learning-by-doing.

Then one day something apparently random will happen that triggers a ‘click’. Familiar pieces start to fit together in a unfamiliar way and as we see the relationships, the sequences, and the synergy – then a bigger picture will start to emerge. Slowly at first and then more quickly as more pieces aggregate.

Suddenly we feel a big CLICK as the final pieces fall into place. The fog of confusion evaporates in the bright sunlight of a paradigm shift in our thinking.

The way forward that was previously obscured becomes clearly visible.

Ah ha!

And we are off on the next stage of our purposeful journey of improvement.

07/06/2014

A Stab At The Vitals

[Drrring Drrring] The phone heralded the start of the weekly ISP mentoring session.

<Bob> Hi Leslie, how are you today?

<Leslie> Hi Bob. To be honest I am not good. I am drowning. Drowning in data!

<Bob> Oh dear! I am sorry to hear that. Can I help? What led up to this?

<Leslie> Well, it was sort of triggered by our last chat and after you opened my eyes to the fact that we habitually throw most of our valuable information away by thresholding, aggregating and normalising. Then we wonder why we make poor decisions … and then we get frustrated because nothing seems to improve.

<Bob> OK. What happened next?

<Leslie> I phoned our Performance Team and asked for some raw data. Three months worth.

<Bob> And what was their reaction?

<Leslie> They said “OK, here you go!” and sent me a twenty megabyte Excel spreadsheet that clogged my email inbox! I did manage to unclog it eventually by deleting loads of old junk. But I could swear that I heard the whole office laughing as they hung up the phone! Maybe I am paranoid?

<Bob> OK. And what happened next?

<Leslie> I started drowning! The mega-file had a row of data for every patient that has attended A&E for the last three months as I had requested, but there were dozens of columns! Trying to slice-and-dice it was a nightmare! My computer was smoking and each step took ages for it to complete. In the end I gave up in frustration. I now have a lot more respect for the Performance Team I can tell you! They do this for a living?

<Bob> OK. It sounds like you are ready for a Stab At the Vitals.

<Leslie> What? That sounds rather piratical! Are you making fun of my slicing-and-dicing metaphor?

<Bob> No indeed. I am deadly serious! Before we leap into the data ocean we need to be able to swim; and we also need a raft that will keep us afloat; and we need a sail to power our raft; and we need a way to navigate our raft to our desired destination.

<Leslie> OK. I like the nautical metaphor but how does it help?

<Bob> Let me translate. Learning to use system behaviour charts is equivalent to learning the skill of swimming. We have to do that first and practice until we are competent and confident. Let us call our raft “ISP” – you are already aboard. The sail you also have already – your Excel software. The navigation aid is what I refer to as Vitals. So we need to have a “stab at the vitals”.

<Leslie> Do you mean we use a combination of time-series charts, ISP and Excel to create a navigation aid that helps avoid the Depths of Data and the Rocks of DRAT?

<Bob> Exactly.

<Leslie> Can you demonstrate with an example?

<Bob> Sure. Send me some of your data … just the arrival and departure events for one day – a typical one.

<Leslie> OK … give me a minute! … It is on its way. How long will it take for you to analyse it?

<Bob> About 2 seconds. OK, here is your email … um … copy … paste … copy … reply

<Leslie> What the ****? That was quick! Let me see what this is … the top left chart is the demand, activity and work-in-progress for each hour; the top right chart is the lead time by patient plotted in discharge order; the table bottom left includes the 4 hour breach rate. Those I do recognise. What is the chart on the bottom right?

<Bob> It is a histogram of the lead times … and it shows a problem. Can you see the spike at 225 to 240 minutes?

<Leslie> Is that the fabled Horned Gaussian?

<Bob> Yes. That is the sign that the 4-hour performance target is distorting the behaviour of the system. And this is yet another reason why the Breach Rate is a dangerous management metric. The adaptive reaction it triggers amplifies the variation and fuels the chaos.

<Leslie> Wow! And you did all that in Excel using my data in two seconds? That must need a whole host of clever macros and code!

<Bob> “Yes” it was done in Excel and “No” it does not need any macros or code. It is all done using simple formulae.

<Leslie> That is fantastic! Can you send me a copy of your Excel file?

<Bob> Nope.

<Leslie>Whaaaat? Why not? Is this some sort of evil piratical game?

<Bob> Nope. You are going to learn how to do this yourself – you are going to build your own Vitals Chart Generator – because that is the only way to really understand how it works.

<Leslie> Phew! You had me going for a second there! Bring it on! What do I do next?

<Bob> I will send you the step-by-step instructions of how to build, test and use a Vitals Chart Generator.

<Leslie> Thanks Bob. I cannot wait to get started! Weigh anchor and set the sails! Ha’ harrrr me hearties.

24/05/2014

Ratio Hazards

[Bzzzzz Bzzzzz] Bob’s phone was on silent but the desktop amplified the vibration and heralded the arrival of Leslie’s weekly ISP coaching call.

<Bob> Hi Leslie. How are you today and what would you like to talk about?

<Leslie> Hi Bob. I am well and I have an old chestnut to roast today … target-driven-behaviour!

<Bob> Excellent. That is one of my favorite topics. Is there a specific context?

<Leslie> Yes. The usual desperate directive from on-high exhorting everyone to “work harder to hit the target” and usually accompanied by a RAG table of percentages that show just who is failing and how badly they are doing.

<Bob> OK. Red RAGs irritating the Bulls eh? Percentages eh? Have we talked about Ratio Hazards?

<Leslie> We have talked about DRATs … Delusional Ratios and Arbitrary Targets as you call them. Is that the same thing?

<Bob> Sort of. What happened when you tried to explain DRATs to those who are reacting to these ‘desperate directives’?

<Leslie> The usual reply is ‘Yes, but that is how we are required to report our performance to our Commissioners and Regulatory Bodies.’

<Bob> And are the key performance indicators that are reported upwards and outwards also being used to manage downwards and inwards? If so, then that is poor design and is very likely to be contributing to the chaos.

<Leslie> Can you explain that a bit more? It feels like a very fundamental point you have just made.

<Bob> OK. To do that let us work through the process by which the raw data from your system is converted into the externally reported KPI. Choose any one of your KPIs

<Leslie> Easy! The 4-hour A&E target performance.

<Bob> What is the raw data that goes in to that?

<Leslie> The percentage of patients who breach 4-hours per day.

<Bob> And where does that ratio come from?

<Leslie> Oh! I see what you mean. That comes from a count of the number of patients who are in A&E for more than 4 hours divided by a count of the number of patients who attended.

<Bob> And where do those counts come come from?

<Leslie> We calculate the time the patient is in A&E and use the 4-hour target to label them as breaches or not.

<Bob> And what data goes into the calculation of that time?

<Leslie>The arrival and departure times for each patient. The arrive and depart events.

<Bob>OK. Is that the raw data?

<Leslie>Yes. Everything follows from that.

<Bob> Good. Each of these two events is a time – which is a continuous metric. In principle, we could in record it to any degree of precision we like – milliseconds if we had a good enough enough clock.

<Leslie> Yes. We record it to an accuracy of of seconds – it is when the patient is ‘clicked through’ on the computer.

<Bob> Careful Leslie, do not confuse precision with accuracy. We need both.

<Leslie> Oops! Yes I remember we had that conversation before.

<Bob> And how often is the A&E 4-hour target KPI reported externally?

<Leslie> Quarterly. We either succeed or fail each quarter of the financial year.

<Bob> That is a binary metric. An “OK or not OK”. No gray zone.

<Leslie> Yes. It is rather blunt but that is how we are contractually obliged to report our performance.

<Bob> OK. And how many patients per day on average come to A&E?

<Leslie> About 200 per day.

<Bob> So the data analysis process is boiling down about 36,000 pieces of continuous data into one Yes-or-No bit of binary data.

<Leslie> Yes.

<Bob> And then that one bit is used to drive the action of the Board: if it is ‘OK last quarter’ then there is no ‘desperate directive’ and if it is a ‘Not OK last quarter’ then there is.

<Leslie> Yes.

<Bob> So you are throwing away 99.9999% of your data and wondering why what is left is not offering much insight in what to do.

<Leslie>Um, I guess so … when you say it like that. But how does that relate to your phrase ‘Ratio Hazards’?

<Bob> A ratio is just one of the many ways that we throw away information. A ratio requires two numbers to calculate it; and it gives one number as an output so we are throwing half our information away. And this is an irreversible act. Two specific numbers will give one ratio; but that ratio can be created by an infinite number possible pairs of numbers and we have no way of knowing from the ratio what specific pair was used to create it.

<Leslie> So a ratio is an exercise in obfuscation!

<Bob> Well put! And there is an even more data-wasteful behaviour that we indulge in. We aggregate.

<Leslie> By that do you mean we summarise a whole set of numbers with an average?

<Bob> Yes. When we average we throw most of the data away and when we average over time then we abandon our ability to react in a timely way.

<Leslie>The Flaw of Averages!

<Bob> Yes. One of them. There are many.

<Leslie>No wonder it feels like we are flying blind and out of control!

<Bob> There is more. There is an even worse data-wasteful behaviour. We threshold.

<Leslie>Is that when we use a target to decide if the lead time is OK or Not OK.

<Bob> Yes. And using an arbitrary target makes it even worse.

<Leslie> Ah ha! I see what you are getting at. The raw event data that we painstakingly collect is a treasure trove of information and potential insight that we could use to help us diagnose, design and deliver a better service. But we throw all but one single solitary binary digit when we put it through the DRAT Processor.

<Bob> Yup.

<Leslie> So why could we not do both? Why could we not use use the raw data for ourselves and the DRAT processed data for external reporting.

<Bob> We could. So what is stopping us doing just that?

<Leslie> We do not know how to effectively and efficiently interpret the vast ocean of raw data.

<Bob> That is what a time-series chart is for. It turns the thousands of pieces of valuable information onto a picture that tells a story – without throwing the information away in the process. We just need to learn how to interpret the pictures.

<Leslie> Wow! Now I understand much better why you insist we ‘plot the dots’ first.

<Bob> And now you understand the Ratio Hazards a bit better too.

<Leslie> Indeed so. And once again I have much to ponder on. Thank you again Bob.

02/11/2013

Three Essentials

There are three necessary parts before ANY improvement-by-design effort will gain traction. Omit any one of them and nothing happens.

1. A clear purpose and an outline strategic plan.

2. Tactical measurement of performance-over-time.

3. A generic Improvement-by-Design framework.

These are necessary minimum requirements to be able to safely delegate the day-to-day and week-to-week tactical stuff the delivers the “what is needed”.

These are necessary minimum requirements to build a self-regulating, self-sustaining, self-healing, self-learning win-win-win system.

And this is not a new idea. It was described by Joseph Juran in the 1960’s and that description was based on 20 years of hands-on experience of actually doing it in a wide range of manufacturing and service organisations.

That is 20 years before the terms “Lean” or “Six Sigma” or “Theory of Constraints” were coined. And the roots of Juran’s journey were 20 years before that – when he started work at the famous Hawthorne Works in Chicago – home of the Hawthorne Effect – and where he learned of the pioneering work of Walter Shewhart.

And the roots of Shewhart’s innovations were 20 years before that – in the first decade of the 20th Century when innovators like Henry Ford and Henry Gantt were developing the methods of how to design and build highly productive processes.

Ford gave us the one-piece-flow high-quality at low-cost production paradigm. Toyota learned it from Ford. Gantt gave us simple yet powerful visual charts that give us an understanding-at-a-glance of the progress of the work. And Shewhart gave us the deceptively simple time-series chart that signals when we need to take more notice.

These nuggets of pragmatic golden knowledge have been buried for decades under a deluge of academic mud. It is nigh time to clear away the detritus and get back to the bedrock of pragmatism. The “how-to-do-it” of improvement. Just reading Juran’s 1964 “Managerial Breakthrough” illustrates just how much we now take for granted. And how ignorant we have allowed ourselves to become.

Acquired Arrogance is a creeping, silent disease – we slip from second nature to blissful ignorance without noticing when we divorce painful reality and settle down with our own comfortable collective rhetoric.

The wake-up call is all the more painful as a consequence: because it is all the more shocking for each one of us; and because it affects more of us.

The pain is temporary – so long as we treat the cause and not just the symptom.

The first step is to acknowledge the gap – and to start filling it in. It is not technically difficult, time-consuming or expensive. Whatever our starting point we need to put in place the three foundation stones above:

1. Common purpose.
2. Measurement-over-time.
3. Method for Improvement.

Then the rubber meets the road (rather than the sky) and things start to improve – for real. Lots of little things in lots of places at the same time – facilitated by the Junior Managers. The cumulative effect is dramatic. Chaos is tamed; calm is restored; capability builds; and confidence builds. The cynics have to look elsewhere for their sport and the skeptics are able to remain healthy.

Then the Middle Managers feel the new firmness under their feet – where before there were shifting sands. They are able to exert their influence again – to where it makes a difference. They stop chasing Scotch Mist and start reporting real and tangible improvement – with hard evidence. And they rightly claim a slice of the credit.

And the upwelling of win-win-win feedback frees the Senior Managers from getting sucked into reactive fire-fighting and the Victim Vortex; and that releases the emotional and temporal space to start learning and applying System-level Design. That is what is needed to deliver a significant and sustained improvement.

And that creates the stable platform for the Executive Team to do Strategy from. Which is their job.

It all starts with the Three Essentials:

1. A Clear and Common Constancy of Purpose.
2. Measurement-over-time of the Vital Metrics.
3. A Generic Method for Improvement-by-Design.

19/10/2013

Improvement-by-Twitter

Sat 5th October

It started with a tweet.

08:17 [JG] The NHS is its people. If you lose them, you lose the NHS.

09:15 [DO] We are in a PEOPLE business – educating people and creating value.

Sun 6th October

08:32 [SD] Who isn’t in people business? It is only people who buy stuff. Plants, animals, rocks and machines don’t.

09:42 [DO] Very true – it is people who use a service and people who deliver a service and we ALL know what good service is.

09:47 [SD] So onus is on us to walk our own talk. If we don’t all improve our small bits of the NHS then who can do it for us?

Then we were off … the debate was on …

10:04 [DO] True – I can prove I am saving over £160 000.00 a year – roll on PBR !?

10:15 [SD] Bravo David. I recently changed my surgery process: productivity up by 35%. Cost? Zero. How? Process design methods.

11:54 [DO] Exactly – cost neutral because we were thinking differently – so how to persuade the rest?

12:10 [SD] First demonstrate it is possible then show those who want to learn how to do it themselves. http://www.saasoft.com/fish/course

We had hard evidence it was possible … and now MC joined the debate …

12:48 [MC] Simon why are there different FISH courses for safety, quality and efficiency? Shouldn’t good design do all of that?

12:52 [SD] Yes – goal of good design is all three. It just depends where you are starting from: Governance, Operations or Finance.

A number of parallel threads then took off and we all had lots of fun exploring each others knowledge and understanding.

17:28 MC registers on the FISH course.

And that gave me an idea. I emailed an offer – that he could have a complimentary pass for the whole FISH course in return for sharing what he learns as he learns it. He thought it over for a couple of days then said “OK”.

Weds 9th October

06:38 [MC] Over the last 4 years of so, I’ve been involved in incrementally improving systems in hospitals. Today I’m going to start an experiment.

06:40 [MC] I’m going to see if we can do less of the incremental change and more system redesign. To do this I’ve enrolled in FISH

Fri 11th October

06:47 [MC] So as part of my exploration into system design, I’ve done some studies in my clinic this week. Will share data shortly.

21:21 [MC] Here’s a chart showing cycle time of patients in my clinic. Median cycle time 14 mins, but much longer in 2 pic.twitter.com/wu5MsAKk80

21:22 [MC] Here’s the same clinic from patients’ point if view, wait time. Much longer than I thought or would like pic.twitter.com/iVLTDsbG8m

21:24 [MC] Two patients needed to discuss surgery or significant news, that takes time and can’t be rushed.

21:25 [MC] So, although I started on time, worked hard and finished on time. People were waited ages to see me. Template is wrong!

21:27 [MC] By the time I had seen the the 3rd patient, people were waiting 45 mins to see me. That’s poor.

21:28 [MC] The wait got progressively worse until the end of the clinic.

Sunday 13th October

16:02 [MC] As part of my homework on systems, I’ve put my clinic study data into a Gantt chart. Red = waiting, green = seeing me pic.twitter.com/iep2PDoruN

16:34 [SD] Hurrah! The visual power of the Gantt Chart. Worth adding the booked time too – there are Seven Sins of Scheduling to find.

16:36 [SD] Excellent – good idea to sort into booked time order – it makes the planned rate of demand easier to see.

16:42 [SD] Best chart is Work In Progress – count the number of patients at each time step and plot as a run chart.

17:23 [SD] Yes – just count how many lines you cross vertically at each time interval. It can be automated in Excel

17:38 [MC] Like this? pic.twitter.com/fTnTK7MdOp

This is the work-in-progress chart. The most useful process monitoring chart of all. It shows the changing size of the queue over time. Good flow design is associated with small, steady queues.

18:22 [SD] Perfect! You’re right not to plot as XmR – this is a cusum metric. Not a healthy WIP chart this!

There was more to follow but the “ah ha” moment had been seen and shared.

Weds 16th October

MC completes the Online FISH course and receives his well-earned Certificate of Achievement.

This was his with-the-benefit-of-hindsight conclusion:

I wish I had known some of this before. I will have totally different approach to improvement projects now. Key is to measure and model well before doing anything radical.

Improvement Science works.
Improvement-by-Design is a skill that can be learned quickly.
FISH is just a first step.

05/10/2013

The Power of the Converted Skeptic

One of the biggest challenges in Improvement Science is diffusion of an improvement outside the circle of control of the innovator.

It is difficult enough to make a significant improvement in one small area – it is an order of magnitude more difficult to spread the word and to influence others to adopt the new idea!

One strategy is to shame others into change by demonstrating that their attitude and behaviour are blocking the diffusion of innovation.

This strategy does not work. It generates more resistance and amplifies the differences of opinion.

Another approach is to bully others into change by discounting their opinion and just rolling out the “obvious solution” by top-down diktat.

This strategy does not work either. It generates resentment – even if the solution is fit-for-purpose – which it usually is not!

So what does work?

The key to it is to convert some skeptics because a converted skeptic is a powerful force for change.

But doesn’t that fly in the face of established change management theory?

Innovation diffuses from innovators to early-adopters, then to the silent majority, then to the laggards and maybe even dinosaurs … doesn’t it?

Yes – but that style of diffusion is incremental, slow and has a very high failure rate. What is very often required is something more radical, much faster and more reliable. For that it needs both push from the Confident Optimists and pull from some Converted Pessimists. The tipping point does not happen until the silent majority start to come off the fence in droves: and they do that when the noisy optimists and equally noisy pessimists start to agree.

The fence-sitters jump when the tug-o-war stalemate stops and the force for change becomes aligned in the direction of progress.

So how is a skeptic converted?

Simple. By another Converted Skeptic.

Here is a real example.

We are all skeptical about many things that we would actually like to improve.

Personal health for instance. Something like weight. Yawn! Not that Old Chestnut!

We are bombarded with shroud-waver stories that we are facing an epidemic of obesity, rapidly rising rates of diabetes, and all the nasty and life-shortening consequences of that. We are exhorted to eat “five portions of fruit and veg a day” … or else! We are told that we must all exercise our flab away. We are warned of the Evils of Cholesterol and told that overweight children are caused by bad parenting.

The more gullible and fearful are herded en-masse in the direction of the Get-Thin-Quick sharks who then have a veritable feeding frenzy. Their goal is their short-term financial health not the long-term health of their customers.

The more insightful, skeptical and frustrated seek solace in the chocolate Hob Nob jar.

For their part, the healthcare professionals are rewarded for providing ineffective healthcare by being paid-for-activity not for outcome. They dutifully measure the decline and hand out ineffective advice. Their goal is survival too.

The outcome is predictable and seemingly unavoidable.

So when a disruptive innovation comes along that challenges the current dogma and status quo, the healthy skeptics inevitably line up and proclaim that it will not work.

Not that it does not work. They do not know that because they never try it. They are skeptics. Someone else has to prove it to them.

And I am a healthy skeptic about many things.

I am skeptical about diets – the evidence suggests that their proclaimed benefit is difficult to achieve and even more difficult to sustain: and that is the hall-mark of either a poor design or a deliberate, profit-driven, yet legal scam.

So I decided to put an innovative approach to weight loss to the test. It is not a diet – it is a design to achieve and sustain a healthier weight to height ratio. And for it to work it must work for me because I am a diet skeptic.

The start of the story is HERE

I am now a Converted Healthier Skeptic.

I call the innovative design a “2 out of 7 Lo-CHO” policy and what that means is for two days a week I just cut out as much carbohydrate (CHO) as feasible. Stuff like bread, potatoes, rice, pasta and sugar. The rest of the time I do what I normally do. There is no need for me to exercise and no need for me to fill up on Five Fruit and Veg.

The chart above is the evidence of what happened. It shows a 7 kg reduction in weight over 140 days – and that is impressive given that it has required no extra exercise and no need to give up tasty treats completely and definitely no need to boost the bottom-line of a Get-Thin-Quick shark!

It also shows what to expect. The weight loss starts steeper then tails off as it approaches a new equilibrium weight. This is the classic picture of what happens to a “system” when one of its “operational policies” is wisely re-designed.

Patience, persistence and a time-series chart are all that is needed. It takes less than a minute per day to monitor the improvement.

Even I can afford to invest a minute per day.

The BaseLine© chart clearly shows that the day-to-day variation is quite high: and that is expected – it is inherent in the 2-out-of-7 Lo-CHO design. It is the not the short-term change that is the measure of success – it is the long-term improvement that is important.

It is important to measure daily – because it is the daily habit that keeps me mindful, aligned, and on-goal. It is not the measurement itself that is the most important thing – it is the conscious act of measuring and then plotting the dot in the context of the previous dots. The picture tells the story. No further “statistical” analysis is required.

The power of this chart is that it provides hard evidence that is very effective for nudging other skeptics like me into giving the innovative idea a try. I know because I have done that many times now. I have converted other skeptics. It is an innovation infection.

And the same principle appears to apply to other areas. What is critical to success is tangible and visible proof of progress. That is what skeptics need. Then a rational and logical method and explanation that respects their individual opinion and requirements. The design has to work for them. And it must make sense.

They will come out with a string of “Yes … buts” and that is OK because that is how skeptics work. Just answer their questions with evidence and explanations. It can get a bit wearing I admit but it is worth the effort.

An effective Improvement Scientist needs to be a healthy skeptic too – i.e. an open minded one.

08/09/2013

DRAT!

[Bing Bong] The sound bite heralded Leslie joining the regular Improvement Science mentoring session with Bob. They were now using web-technology to run virtual meetings because it allows a richer conversation and saves a lot of time. It is a big improvement.

<Bob> Hi Lesley, how are you today?

<Leslie> OK thank you Bob. I have a thorny issue to ask you about today. It has been niggling me even since we started to share the experience we are gaining from our current improvement-by-design project.

<Bob> OK. That sounds interesting. Can you paint the picture for me?

<Leslie> Better than that – I can show you the picture, I will share my screen with you.

<Bob> OK. I can see that RAG table. Can you give me a bit more context?

<Leslie> Yes. This is how our performance management team have been asked to produce their 4-weekly reports for the monthly performance committee meetings.

<Bob> OK. I assume the “Period” means sequential four week periods … so what is Count, Fail and Fail%?

<Leslie> Count is the number of discharges in that 4 week period, Fail is the number whose length of stay is longer than the target, and Fail% is the ratio of Fail/Count for each 4 week period.

<Bob> It looks odd that the counts are all 28. Is there some form of admission slot carve-out policy?

<Leslie> Yes. There is one admission slot per day for this particular stream – that has been worked out from the average historical activity.

<Bob> Ah! And the Red, Amber, Green indicates what?

<Leslie> That is depends where the Fail% falls in a set of predefined target ranges; less than 5% is green, 5-10% is Amber and more than 10% is red.

<Bob> OK. So what is the niggle?

<Leslie>Each month when we are in the green we get no feedback – a deafening silence. Each month we are in amber we get a warning email. Each month we are in the red we have to “go and explain ourselves” and provide a “back-on-track” plan.

<Bob> Let me guess – this feedback design is not helping much.

<Leslie> It is worse than that – it creates a perpetual sense of fear. The risk of breaching the target is distorting people’s priorities and their behaviour.

<Bob> Do you have any evidence of that?

<Leslie> Yes – but it is anecdotal. There is a daily operational meeting and the highest priority topic is “Which patients are closest to the target length of stay and therefore need to have their discharge expedited?“.

<Bob> Ah yes. The “target tail wagging the quality dog” problem. So what is your question?

<Leslie> How do we focus on the cause of the problem rather than the symptoms? We want to be rid of the “fear of the stick”.

<Bob> OK. What you have hear is a very common system design flaw. It is called a DRAT.

<Leslie> DRAT?

<Bob> “Delusional Ratio and Arbitrary Target”.

<Leslie> Ha! That sounds spot on! “DRAT” is what we say every time we miss the target!

<Bob> Indeed. So first plot this yield data as a time series chart.

<Leslie> Here we go.

<Bob>Good. I see you have added the cut-off thresholds for the RAG chart. These 5% and 10% thresholds are arbitrary and the data shows your current system is unable to meet them. Your design looks incapable.

<Leslie>Yes – and it also shows that the % expressed to one decimal place is meaningless because there are limited possibilities for the value.

<Bob> Yes. These are two reasons that this is a Delusional Ratio; there are quite a few more.

<Leslie> OK and if I plot this as an Individuals charts I can see that this variation is not exceptional.

<Bob> Careful Leslie. It can be dangerous to do this: an Individuals chart of aggregate yield becomes quite insensitive with aggregated counts of relatively rare events, a small number of levels that go down to zero, and a limited number of points. The SPC zealots are compounding the problem and plotting this data as a C-chart or a P-chart makes no difference.

This is all the effect of the common practice of applying an arbitrary performance target then counting the failures and using that as means of control.

It is poor feedback loop design – but a depressingly common one.

<Leslie> So what do we do? What is a better design?

<Bob> First ask what the purpose of the feedback is?

<Leslie> To reduce the number of beds and save money by forcing down the length of stay so that the bed-day load is reduced and so we can do the same activity with fewer beds and at the same time avoid cancellations.

<Bob> OK. That sounds reasonable from the perspective of a tax-payer and a patient. It would also be a more productive design.

<Leslie> I agree but it seems to be having the opposite effect. We are focusing on avoiding breaches so much that other patients get delayed who could have gone home sooner and we end up with more patients to expedite. It is like a vicious circle. And every time we fail we get whacked with the RAG stick again. It is very demoralizing and it generates a lot of resentment and conflict. That is not good for anyone – least of all the patients.

<Bob>Yes. That is the usual effect of a DRAT design. Remember that senior managers have not been trained in process improvement-by-design either so blaming them is also counter-productive. We need to go back to the raw data. Can you plot actual LOS by patient in order of discharge as a run chart.

<Bob> OK – is the maximum LOS target 8 days?

<Leslie> Yes – and this shows we are meeting it most of the time. But it is only with a huge amount of effort.

<Bob> Do you know where 8 days came from?

<Leslie> I think it was the historical average divided by 85% – someone read in a book somewhere that 85% average occupancy was optimum and put 2 and 2 together.

<Bob> Oh dear! The “85% Occupancy is Best” myth combined with the “Flaw of Averages” trap. Never mind – let me explain the reasons why it is invalid to do this.

<Leslie> Yes please!

<Bob> First plot the data as a run chart and as a histogram – do not plot the natural process limits yet as you have done. We need to do some validity checks first.

<Leslie> Here you go.

<Bob> What do you see?

<Leslie> The histogram has more than one peak – and there is a big one sitting just under the target.

<Bob>Yes. This is called the “Horned Gaussian” and is the characteristic pattern of an arbitrary lead-time target that is distorting the behaviour of the system. Just as you have described subjectively. There is a smaller peak with a mode of 4 days and are a few very long length of stay outliers. This multi-modal pattern means that the mean and standard deviation of this data are meaningless numbers as are any numbers derived from them. It is like having a bag of mixed fruit and then setting a maximum allowable size for an unspecified piece of fruit. Meaningless.

<Leslie> And the cases causing the breaches are completely different and could never realistically achieve that target! So we are effectively being randomly beaten with a stick. That is certainly how it feels.

<Bob> They are certainly different but you cannot yet assume that their longer LOS is inevitable. This chart just says – “go and have a look at these specific cases for a possible cause for the difference“.

<Leslie> OK … so if they are from a different system and I exclude them from the analysis what happens?

<Bob> It will not change reality. The current design of this process may not be capable of delivering an 8 day upper limit for the LOS. Imposing a DRAT does not help – it actually makes the design worse! As you can see. Only removing the DRAT will remove the distortion and reveal the underlying process behaviour.

<Leslie> So what do we do? There is no way that will happen in the current chaos!

<Bob> Apply the 6M Design® method. Map, Measure and Model it. Understand how it is behaving as it is then design out all the causes of longer LOS and that way deliver with a shorter and less variable LOS. Your chart shows that your process is stable. That means you have enough flow capacity – so look at the policies. Draw on all your FISH training. That way you achieve your common purpose, and the big nasty stick goes away, and everyone feels better. And in the process you will demonstrate that there is a better feedback design than DRATs and RAGs. A win-win-win design.

<Leslie> OK. That makes complete sense. Thanks Bob! But what you have described is not part of the FISH course.

<Bob> You are right. It is part of the ISP training that comes after FISH. Improvement Science Practitioner.

<Leslie> I think we will need to get a few more people trained in the theory, techniques and tools of Improvement Science.

<Bob> That would appear to be the case. They will need a real example to see what is possible.

<Leslie> OK. I am on the case!

06/07/2013

Step 5 – Monitor

Improvement-by-Design is not the same as Improvement-by-Desire.

Improvement-by-Design has a clear destination and a design that we know can get us there because we have tested it before we implement it.

Improvement-by-Desire has a vague direction and no design – we do not know if the path we choose will take us in the direction we desire to go. We cannot see the twists and turns, the unknown decisions, the forks, the loops, and the dead-ends. We expect to discover those along the way. It is an exercise in hope.

So where pessimists and skeptics dominate the debate then Improvement-by-Design is a safer strategy.

Just over seven weeks ago I started an Improvement-by-Design project – a personal one. The destination was clear: to get my BMI (body mass index) into a “healthy” range by reducing weight by about 5 kg. The design was clear too – to reduce energy input rather than increase energy output. It is a tried-and-tested method – “avoid burning the toast”. The physical and physiological model predicted that the goal was achievable in 6 to 8 weeks.

So what has happened?

To answer that question requires two time-series charts. The input chart of calories ingested and the output chart of weight. This is Step 5 of the 6M Design® sequence.

Remember that there was another parameter in this personal Energy-Weight system: the daily energy expended.

But that is very difficult to measure accurately – so I could not do that.

What I could do was to estimate the actual energy expended from the model of the system using the measured effect of the change. But that is straying into the Department of Improvement Science Nerds. Let us stay in the real world a bit longer.

Here is the energy input chart …

It shows an average calorie intake of 1500 kcal – the estimated required value to achieve the weight loss given the assumptions of the physiological model. It also shows a wide day-to-day variation. It does not show any signal flags (red dots) so an inexperienced Improvementologist might conclude that this just random noise.

It is not. The data is not homogeneous. There is a signal in the system – a deliberate design change – and without that context it is impossible to correctly interpret the chart.

Remember Rule #1: Data without context is meaningless.

The deliberate process design change was to reduce calorie intake for just two days per week by omitting unnecessary Hi-Cal treats – like those nice-but-naughty Chocolate Hobnobs. But which two days varied – so there is no obvious repeating pattern in the chart. And the intake on all days varied – there were a few meals out and some BBQ action.

To separate out these two parts of the voice-of-the-process we need to rationally group the data into the Lo-cal days (F) and the OK-cal days (N).

The grouped BaseLine© chart tells a different story. The two groups clearly have a different average and both have a lower variation-over-time than the meaningless mixed-up chart.

And we can now see a flag – on the second F day. That is a prompt for an “investigation” which revealed: will-power failure. Thursday evening beer and peanuts! The counter measure was to avoid Lo-cal on a Thursday!

What we are seeing here is the fifth step of 6M Design® exercise – the Monitor step.

And as well as monitoring the factor we are changing – the cause; we also monitor the factor we want to influence – the effect.

The effect here is weight. And our design includes a way of monitoring that – the daily weighing.

The output metric BaseLine© chart – weight – shows a very different pattern. It is described as “unstable” because there are clusters of flags (red dots) – some at the start and some at the end. The direction of the instability is “falling” – which is the intended outcome.

So we have robust, statistically valid evidence that our modified design is working.

The weight is falling so the energy going in must be less than the energy being put out. I am burning off the excess lard and without doing any extra exercise. The physics of the system mandate that this is the only explanation. And that was my design specification.

So that is good. Our design is working – but is it working as we designed? Does observation match prediction? This is Improvement-by-Design.

Remember that we had to estimate the other parameter to our model – the average daily energy output – and we guessed a value of 2400 kcal per day using generic published data. Now I can refine the model using my specific measured change in weight – and I can work backwards to calculate the third parameter. And when I did that the number came out at 2300 kcal per day. Not a huge difference – the equivalent of one yummy Chocolate Hobnob a day – but the effect is cumulative. Over the 53 days of the 6M Design® project so far that would be a 5300 kcal difference – about 0.6kg of useless blubber.

So now I have refined my personal energy-weight model using the new data and I can update my prediction and create a new chart – a Deviation from Aim chart.

This is the chart I need to watch to see if I am on the predicted track – and it too is unstable -and not a good direction. It shows that the deviation-from-aim is increasing over time and this is because my original guesstimate of an unmeasurable model parameter was too high.

This means that my current design will not get me to where I want to be, when I what to be there. This tells me I need to tweak my design. And I have a list of options.

1) I could adjust the target average calories per day down from 1500 to 1400 and cut out a few more calories; or

2) I could just keep doing what I am doing and accept that it will take me longer to get to the destination; or

3) I could do a bit of extra exercise to burn the extra 100 kcals a day off, or

4) I could do a bit of any or all three.

And because I am comparing experience with expectation using a DFA chart I will know very quickly if the design tweak is delivering.

And because some nice weather has finally arrived so the BBQ will be busy I have chosen to take longer to get there. I will enjoy the weather, have a few beers and some burgers. And that is OK. It is a perfectly reasonable design option – it is a rational and justifiable choice.

And I need to set my next destination – a weight if about 72 kg according to the BMI chart – and with my calibrated Energy-Weight model I will know exactly how to achieve that weight and how long it will take me. And I also know how to maintain it – by increasing my calorie intake. More beer and peanuts – maybe – or the occasional Chocolate Hobnob even. Hurrah! Win-win-win!

This real-life example illustrates 6M Design® in action and demonstrates that it is a generic framework.

The energy-weight model in this case is a very simple one that can be worked out on the back of a beer mat (which is what I did).

It is called a linear model because the relationship between calories-in and weight-out is approximately a straight line.

Most real-world systems are not like this. Inputs are not linearly related to outputs. They are called non-linear systems: and that makes a BIG difference.

A very common error is to impose a “linear model” on a “non-linear system” and it is a recipe for disappointment and disaster. We do that when we commit the Flaw of Averages error. We do it when we plot linear regression lines through time-series data. We do it when we extrapolate beyond the limits of our evidence. We do it when we equate time with money.

The danger of this error is that our linear model leads us to make unwise decisions and we actually make the problem worse – not better. We then give up in frustration and label the problem as “impossible” or “wicked” or get sucked into to various forms of Snake Oil Sorcery.

The safer approach is to assume the system is non-linear and just let the voice of the system talk to us through our BaseLine© charts. The challenge for us is to learn to understand what the system is saying.

That is why the time-series charts are called System Behaviour Charts and that is why they are an essential component of Improvement-by-Design.

However – there is a step that must happen before this – and that is to get the Foundations in place. The foundation of knowledge on which we can build our new learning. That gap must be filled first.

And anyone who wants to invest in learning the foundations of improvement science can now do so at their own convenience and at their own pace because it is on-line …. and it is here.

29/06/201322/02/2022

Step 6 – Maintain

Anyone with much experience of change will testify that one of the hardest parts is sustaining the hard won improvement.

The typical story is all too familiar – a big push for improvement, a dramatic improvement, congratulations and presentations then six months later it is back where it was before but worse. The cynics are feeding on the corpse of the dead change effort.

The cause of this recurrent nightmare is a simple error of omission.

Failure to complete the change sequence. Missing out the last and most important step. Step 6 – Maintain.

Regular readers may remember the story of the pharmacy project – where a sceptical department were surprised and delighted to discover that zero-cost improvement was achievable and that a win-win-win outcome was not an impossible dream.

Enough time has now passed to ask the question: “Was the improvement sustained?”

The BaseLine© chart above shows their daily performance data on their 2-hour turnaround target for to-take-out prescriptions (TTOs) . The weekends are excluded because the weekend system is different from the weekday system. The first split in the data in Jan 2013 is when the improvement-by-design change was made. Step 4 on the 6M Design® sequence – Modify.

There was an immediate and dramatic improvement in performance that was sustained for about six weeks – then it started to drift back. Bit by Bit. The time-series chart flags it clearly.

So what happened next?

The 12-week review happened next – and it was done by the change leader – in this case the Inspector/Designer/Educator. The review data plotted as a time-series chart revealed instability and that justified an investigation of the root cause – which was that the final and critical step had not been completed as recommended. The inner feedback loop was missing. Step 6 – Maintain was not in place.

The outer feedback loop had not been omitted. That was the responsibility of the experienced change leader.

And the effect of closing the outer-loop is clearly shown by the third segment – a restoration of stability and improved capability. The system is again delivering the improvement it was designed to deliver.

What does this lesson teach us?

The message here is that the sponsors of improvement have essential parts to play in the initiation and the maintenance of change and improvement. If they fail in their responsibility then the outcome is inevitable and predictable. Mediocrity and cynicism.

Part 1: Setting the clarity and constancy of common purpose.

Without a clear purpose then alignment, focus and effectiveness are thwarted. Purpose that changes frequently is not a purpose – it is reactive knee-jerk politics. Constancy of purpose is required because improvement takes time to achieve and to embed. There is always a lag so moving the target while the arrow is in flight is both dangerous and leads to disengagement. Establishing common ground is essential to avoiding the time-wasting discussion and negotiation that is inevitable when opinions differ – which they always do.

Part 2: Respectful challenge.

Effective change leadership requires an ability to challenge from a position of mutual respect. Telling people what to do is not leadership – it is dictatorship. Dodging the difficult conversations and passing the buck to others is not leadership – it is ineffective delegation. Asking people what they want to do is not leadership – it is abdication of responsibility. People need their leaders to challenge them and to respect them at the same time. It is not a contradiction. It is possible to do both.

And one way that a leader of change can challenge with respect is to expose the need for change; to create the context for change; and then to commit to holding those charged with change to account – including themselves. And to make it clear at the start what their expectation is as a leader – and what the consequences of disappointment are.

It is a delight to see individuals, teams, departments and organisations blossom and grow when the context of change is conducive. And it is disappointing to see them wither and shrink when the context of change is laced with cynicide – the toxic product of cynicism.

So what is the next step?

What could an aspirant change leader do to get this for themselves and their organisations?

One option is to become a Student of Improvementology® – and they can do that here.

10/02/2013

The Writing on the Wall – Part II

The retrospectoscope is the favourite instrument of the forensic cynic – the expert in the after-the-event-and-I-told-you-so rhetoric. The rabble-rouser for the lynch-mob.

It feels better to retrospectively nail-to-a-cross the person who committed the Cardinal Error of Omission, and leave them there in emotional and financial pain as a visible lesson to everyone else.

This form of public feedback has been used for centuries.

It is called barbarism, and it has no place in a modern civilised society.

A more constructive question to ask is:

“Could the evolving Mid-Staffordshire crisis have been detected earlier … and avoided?”

And this question exposes a tricky problem: it is much more difficult to predict the future than to explain the past. And if it could have been detected and avoided earlier, then how is that done? And if the how-is-known then is everyone else in the NHS using this know-how to detect and avoid their own evolving Mid-Staffs crisis?

To illustrate how it is currently done let us use the actual Mid-Staffs data. It is conveniently available in Figure 1 embedded in Figure 5 on Page 360 in Appendix G of Volume 1 of the first Francis Report. If you do not have it at your fingertips I have put a copy of it below.

The message does not exactly leap off the page and smack us between the eyes does it? Even with the benefit of hindsight. So what is the problem here?

The problem is one of ergonomics. Tables of numbers like this are very difficult for most people to interpret, so they create a risk that we ignore the data or that we just jump to the bottom line and miss the real message. And It is very easy to miss the message when we compare the results for the current period with the previous one – a very bad habit that is spread by accountants.

This was a slowly emerging crisis so we need a way of seeing it evolving and the better way to present this data is as a time-series chart.

As we are most interested in safety and outcomes, then we would reasonably look at the outcome we do not want – i.e. mortality. I think we will all agree that it is an easy enough one to measure.

MS_RawDeaths This is the raw mortality data from the table above, plotted as a time-series chart. The green line is the average and the red-lines are a measure of variation-over-time. We can all see that the raw mortality is increasing and the red flags say that this is a statistically significant increase. Oh dear!

But hang on just a minute – using raw mortality data like this is invalid because we all know that the people are getting older, demand on our hospitals is rising, A&Es are busier, older people have more illnesses, and more of them will not survive their visit to our hospital. This rise in mortality may actually just be because we are doing more work.

Good point! Let us plot the activity data and see if there has been an increase.

Yes – indeed the activity has increased significantly too.

Told you so! And it looks like the activity has gone up more than the mortality. Does that mean we are actually doing a better job at keeping people alive? That sounds like a more positive message for the Board and the Annual Report. But how do we present that message? What about as a ratio of mortality to activity? That will make it easier to compare ourselves with other hospitals.

Good idea! Here is the Raw Mortality Ratio chart.

Ah ha. See! The % mortality is falling significantly over time. Told you so.

Careful. There is an unstated assumption here. The assumption that the case mix is staying the same over time. This pattern could also be the impact of us doing a greater proportion of lower complexity and lower risk work. So we need to correct this raw mortality data for case mix complexity – and we can do that by using data from all NHS hospitals to give us a frame of reference. Dr Foster can help us with that because it is quite a complicated statistical modelling process. What comes out of Dr Fosters black magic box is the Global Hospital Raw Mortality (GHRM) which is the expected number of deaths for our case mix if we were an ‘average’ NHS hospital.

What this says is that the NHS-wide raw mortality risk appears to be falling over time (which may be for a wide variety of reasons but that is outside the scope of this conversation). So what we now need to do is compare this global raw mortality risk with our local raw mortality risk … to give the Hospital Standardised Mortality Ratio.

This gives us the Mid Staffordshire Hospital HSMR chart. The blue line at 100 is the reference average – and what this chart says is that Mid Staffordshire hospital had a consistently higher risk than the average case-mix adjusted mortality risk for the whole NHS. And it says that it got even worse after 2001 and that it stayed consistently 20% higher after 2003.

Ah! Oh dear! That is not such a positive message for the Board and the Annual Report. But how did we miss this evolving safety catastrophe? We had the Dr Foster data from 2001

This is not a new problem – a similar thing happened in Vienna between 1820 and 1850 with maternal deaths caused by Childbed Fever. The problem was detected by Dr Ignaz Semmelweis who also discovered a simple, pragmatic solution to the problem: hand washing. He blew the whistle but unfortunately those in power did not like the implication that they had been the cause of thousands of avoidable mother and baby deaths. Semmelweis was vilified and ignored, and he did not publish his data until 1861. And even then the story was buried in tables of numbers. Semmelweis went mad trying to convince the World that there was a problem. Here is the full story.

Also, time-series charts were not invented until 1924 – and it was not in healthcare – it was in manufacturing. These tried-and-tested safety and quality improvement tools are only slowly diffusing into healthcare because the barriers to innovation appear somewhat impervious.

And the pores have been clogged even more by the social poison called “cynicide” – the emotional and political toxin exuded by cynics.

So how could we detect a developing crisis earlier – in time to avoid a catastrophe?

The first step is to estimate the excess-death-equivalent. Dr Foster does this for you.Here is the data from the table plotted as a time-series chart that shows that the estimated-excess-death-equivalent per year. It has an average of 100 (that is two per week) and the average should be close to zero. More worryingly the number was increasing steadily over time up to 200 per year in 2006 – that is about four excess deaths per week – on average. It is important to remember that HSMR is a risk ratio and mortality is a multi-factorial outcome. So the excess-death-equivalent estimate does not imply that a clear causal chain will be evident in specific deaths. That is a complete misunderstanding of the method.

I am sorry – you are losing me with the statistical jargon here. Can you explain in plain English what you mean?

OK. Let us use an example.

Suppose we set up a tombola at the village fete and we sell 50 tickets with the expectation that the winner bags all the money. Each ticket holder has the same 1 in 50 risk of winning the wad-of-wonga and a 49 in 50 risk of losing their small stake. At the appointed time we spin the barrel to mix up the ticket stubs then we blindly draw one ticket out. At that instant the 50 people with an equal risk changes to one winner and 49 losers. It is as if the grey fog of risk instantly condenses into a precise, black-and-white, yes-or-no, winner-or-loser, reality.

Translating this concept back into HSMR and Mid Staffs – the estimated 1200 deaths are the just the “condensed risk of harm equivalent”. So, to then conduct a retrospective case note analysis of specific deaths looking for the specific cause would be equivalent to trying to retrospectively work out the reason the particular winning ticket in the tombola was picked out. It is a search that is doomed to fail. To then conclude from this fruitless search that HSMR is invalid, is only to compound the delusion further. The actual problem here is ignorance and misunderstanding of the basic Laws of Physics and Probability, because our brains are not good at solving these sort of problems.

But Mid Staffs is a particularly severe example and it only shows up after years of data has accumulated. How would a hospital that was not as bad as this know they had a risk problem and know sooner? Waiting for years to accumulate enough data to prove there was a avoidable problem in the past is not much help.

That is an excellent question. This type of time-series chart is not very sensitive to small changes when the data is noisy and sparse – such as when you plot the data on a month-by-month timescale and avoidable deaths are actually an uncommon outcome. Plotting the annual sum smooths out this variation and makes the trend easier to see, but it delays the diagnosis further. One way to increase the sensitivity is to plot the data as a cusum (cumulative sum) chart – which is conspicuous by its absence from the data table. It is the running total of the estimated excess deaths. Rather like the running total of swings in a game of golf.

This is the cusum chart of excess deaths and you will notice that it is not plotted with control limits. That is because it is invalid to use standard control limits for cumulative data. The important feature of the cusum chart is the slope and the deviation from zero. What is usually done is an alert threshold is plotted on the cusum chart and if the measured cusum crosses this alert-line then the alarm bell should go off – and the search then focuses on the precursor events: the Near Misses, the Not Agains and the Niggles.

I see. You make it look easy when the data is presented as pictures. But aren’t we still missing the point? Isn’t this still after-the-avoidable-event analysis?

Yes! An avoidable death should be a Never-Event in a designed-to-be-safe healthcare system. It should never happen. There should be no coffins to count. To get to that stage we need to apply exactly the same approach to the Near-Misses, and then the Not-Agains, and eventually the Niggles.

You mean we have to use the SUI data and the IR1 data and the complaint data to do this – and also ask our staff and patients about their Niggles?

Yes. And it is not the number of complaints that is the most useful metric – it is the appearance of the cumulative sum of the complaint severity score. And we need a method for diagnosing and treating the cause of the Niggles too. We need to convert the feedback information into effective action.

Ah ha! Now I understand what the role of the Governance Department is: to apply the tools and techniques of Improvement Science proactively. But our Governance Department have not been trained to do this!

Then that is one place to start – and their role needs to evolve from Inspectors and Supervisors to Demonstrators and Educators – ultimately everyone in the organisation needs to be a competent Healthcare Improvementologist.

OK – I now now what to do next. But wait a minute. This is going to cost a fortune!

This is just one small first step. The next step is to redesign the processes so the errors do not happen in the first place. The cumulative cost saving from eliminating the repeated checking, correcting, box-ticking, documenting, investigating, compensating and insuring is much much more than the one-off investment in learning safe system design.

So the Finance Director should be a champion for safety and quality too.

Yup!

Brill. Thanks. And can I ask one more question? I do not want to appear to skeptical but how do we know we can trust that this risk-estimation system has been designed and implemented correctly? How do we know we are not being bamboozled by statisticians? It has happened before!

That is the best question yet. It is important to remember that HSMR is counting deaths in hospital which means that it is not actually the risk of harm to the patient that is measured – it is the risk to the reputation of hospital! So the answer to your question is that you demonstrate your deep understanding of the rationle and method of risk-of-harm estimation by listing all the ways that such a system could be deliberately “gamed” to make the figures look better for the hospital. And then go out and look for hard evidence of all the “games” that you can invent. It is a sort of creative poacher-becomes-gamekeeper detective exercise.

OK – I sort of get what you mean. Can you give me some examples?

Yes. The HSMR method is based on deaths-in-hospital so discharging a patient from hospital before they die will make the figures look better. Suppose one hospital has more access to end-of-life care in the community than another: their HSMR figures would look better even though exactly the same number of people died. Another is that the HSMR method is weighted towards admissions classified as “emergencies” – so if a hospital admits more patients as “emergencies” who are not actually very sick and discharges them quickly then this will inflated their estimated deaths and make their actual mortality ratio look better – even though the risk-of-harm to patients has not changed.

OMG – so if we have pressure to meet 4 hour A&E targets and we get paid more for an emergency admission than an A&E attendance then admitting to an Assessmen Area and discharging within one day will actually reward the hospital financially, operationally and by apparently reducing their HSMR even though there has been no difference at all to the care that patients actually recieve?

Yes. It is an inevitable outcome of the current system design.

But that means that if I am gaming the system and my HSMR is not getting better then the risk-of-harm to patients is actually increasing and my HSMR system is giving me false reassurance that everything is OK. Wow! I can see why some people might not want that realisation to be public knowledge. So what do we do?

Design the system so that the rewards are aligned with lower risk of harm to patients and improved outcomes.

Is that possible?

Yes. It is called a Win-Win-Win design.

How do we learn how to do that?

Improvement Science.

Footnote I:

The graphs tell a story but they may not create a useful sense of perspective. It has been said that there is a 1 in 300 chance that if you go to hospital you will not leave alive for avoidable causes. What! It cannot be as high as 1 in 300 surely?

OK – let us use the published Mid-Staffs data to test this hypothesis. Over 12 years there were about 150,000 admissions and an estimated 1,200 excess deaths (if all the risk were concentrated into the excess deaths which is not what actually happens). That means a 1 in 130 odds of an avoidable death for every admission! That is twice as bad as the estimated average.

The Mid Staffordshire statistics are bad enough; but the NHS-as-a-whole statistics are cumulatively worse because there are 100’s of other hospitals that are each generating not-as-obvious avoidable mortality. The data is very ‘noisy’ so it is difficult even for a statistical expert to separate the message from the morass.

And remember – that the “expected” mortality is estimated from the average for the whole NHS – which means that if this average is higher than it could be then there is a statistical bias and we are being falsely reassured by being ‘not statistically significantly different’ from the pack.

And remember too – for every patient and family that suffers and avoidable death there are many more that have to live with the consequences of avoidable but non-fatal harm. That is called avoidable morbidity. This is what the risk really means – everyone has a higher risk of some degree of avoidable harm. Psychological and physical harm.

This challenge is not just about preventing another Mid Staffs – it is about preventing 1000’s of avoidable deaths and 100,000s of patients avoidably harmed every year in ‘average’ NHS trusts.

It is not a mass conspiracy of bad nurses, bad doctors, bad managers or bad policians that is the root cause.

It is poorly designed processes – and they are poorly designed because the nurses, doctors and managers have not learned how to design better ones. And we do not know how because we were not trained to. And that education gap was an accident – an unintended error of omission.

Our urgently-improve-NHS-safety-challenge requires a system-wide safety-by-design educational and cultural transformation.

And that is possible because the knowledge of how to design, test and implement inherently safe processes exists. But it exists outside healthcare.

And that safety-by-design training is a worthwhile investment because safer-by-design processes cost less to run because they require less checking, less documenting, less correcting – and all the valuable nurse, doctor and manager time freed up by that can be reinvested in more care, better care and designing even better processes and systems.

Everyone Wins – except the cynics who have a choice: to eat humble pie or leave.

Footnote II:

In the debate that has followed the publication of the Francis Report a lot of scrutiny has been applied to the method by which an estimated excess mortality number is created and it is necessary to explore this in a bit more detail.

The HSMR is an estimate of relative risk – it does not say that a set of specific patients were the ones who came to harm and the rest were OK. So looking at individual deaths and looking for the specific causes is to completely misunderstand the method. So looking at the actual deaths individually and looking for identifiable cause-and-effect paths is an misuse of the message. When very few if any are found to conclude that HSMR is flawed is an error of logic and exposes the ignorance of the analyst further.

HSMR is not perfect though – it has weaknesses. It is a benchmarking process the”standard” of 100 is always moving because the collective goal posts are moving – the reference is always changing . HSMR is estimated using data submitted by hospitals themselves – the clinical coding data. So the main weakness is that it is dependent on the quality of the clinicial coding – the errors of comission (wrong codes) and the errors of omission (missing codes). Garbage In Garbage Out.

Hospitals use clinically coded data for other reasons – payment. The way hospitals are now paid is based on the volume and complexity of that activity – Payment By Results (PbR) – using what are called Health Resource Groups (HRGs). This is a better and fairer design because hospitals with more complex (i.e. costly to manage) case loads get paid more per patient on average. The HRG for each patient is determined by their clinical codes – including what are called the comorbidities – the other things that the patient has wrong with them. More comorbidites means more complex and more risky so more money and more risk of death – roughly speaking. So when PbR came in it becamevery important to code fully in order to get paid “properly”. The problem was that before PbR the coding errors went largely unnoticed – especially the comorbidity coding. And the errors were biassed – it is more likely to omit a code than to have an incorrect code. Errors of omission are harder to detect. This meant that by more complete coding (to attract more money) the estimated casemix complexity would have gone up compared with the historical reference. So as actual (not estimated) NHS mortality has gone down slightly then the HSMR yardstick becomes even more distorted. Hospitals that did not keep up with the Coding Game would look worse even though their actual risk and mortality may be unchanged. This is the fundamental design flaw in all types of benchmarking based on self-reported data.

The actual problem here is even more serious. PbR is actually a payment for activity – not a payment for outcomes. It is calculated from what it cost to run the average NHS hospital using a technique called Reference Costing which is the same method that manufacturing companies used to decide what price to charge for their products. It has another name – Absorption Costing. The highest performers in the manufacturing world no longer use this out-of-date method. The implication of using Reference Costing and PbR in the NHS are profound and dangerous:

If NHS hospitals in general have poorly designed processes that create internal queues and require more bed days than actually necessary then the cost of that “waste” becomes built into the future PbR tariff. This means average length of stay (LOS) is financially rewarded. Above average LOS is financially penalised and below average LOS makes a profit. There is no financial pressure to improve beyound average. This is called the Regression to the Mean effect. Also LOS is not a measure of quality – so there is a to shorten length of stay for purely financial reasons – to generate a surplus to use to fund growth and capital investment. That pressure is non-specific and indiscrimiate. PbR is necessary but it is not sufficient – it requires an quality of outcome metric to complete it.

So the PbR system is based on an out-of-date cost-allocation model and therefore leads to the very problems that are contributing to the MidStaffs crisis – financial pressure causing quality failures and increased risk of mortality. MidStaffs may be a chance victim of a combination of factors coming together like a perfect storm – but those same factors are present throughout the NHS because they are built into the current design.

One solution is to move towards a more up-to-date financial model called stream costing. This uses the similar data to reference costing but it estimates the “ideal” cost of the “necessary” work to achieve the intended outcome. This stream cost becomes the focus for improvement – the streams where there is the biggest gap between the stream cost and the reference cost are the focus of the redesign activity. Very often the root cause is just poor operational policy design; sometimes it is quality and safety design problems. Both are solvable without investment in extra capacity. The result is a higher quality, quicker, lower-cost stream. Win-win-win. And in the short term that is rewarded by a tariff income that exceeds cost and a lower HSMR.

Radically redesigning the financial model for healthcare is not a quick fix – and it requires a lot of other changes to happen first. So the sooner we start the sooner we will arrive.

09/02/2013

The Writing On The Wall – Part I

The writing is on the wall for the NHS.

It is called the Francis Report and there is a lot of it. Just the 290 recommendations runs to 30 pages. It would need a very big wall and very small writing to put it all up there for all to see.

So predictably the speed-readers have latched onto specific words – such as “Inspectors“.

Recommendation 137 “Inspection should remain the central method for monitoring compliance with fundamental standards.”

And it goes further by recommending “A specialist cadre of hospital inspectors should be established …”

A predictable wail of anguish rose from the ranks “Not more inspectors! The last lot did not do much good!”

The word “cadre” is not one that is used in common parlance so I looked it up:

Cadre: 1. a core group of people at the center of an organization, especially military; 2. a small group of highly trained people, often part of a political movement.

So it has a military, centralist, specialist, political flavour. No wonder there was a wail of anguish! Perhaps this “cadre of inspectors” has been unconsciously labelled with another name? Persecutors.

Of more interest is the “highly trained” phrase. Trained to do what? Trained by whom? Clearly none of the existing schools of NHS management who have allowed the fiasco to happen in the first place. So who – exactly? Are these inspectors intended to be protectors, persecutors, or educators?

And what would they inspect?

And how would they use the output of such an inspection?

Would the fear of the inspection and its possible unpleasant consequences be the stick to motivate compliance?

Is the language of the Francis Report going to create another brick wall of resistance from the rubble of the ruins of the reputation of the NHS? Many self-appointed experts are already saying that implementing 290 recommendations is impossible.

They are incorrect.

The number of recommendations is a measure of the breadth and depth of the rot. So the critical-to-success factor is to implement them in a well-designed order. Get the first few in place and working and the rest will follow naturally. Get the order wrong and the radical cure will kill the patient.

So where do we start?

Let us look at the inspection question again. Why would we fear an external inspection? What are we resisting? There are three facets to this: first we do not know what is expected of us; second we do not know if we can satisfy the expectation; and third we fear being persecuted for failing to achieve the impossible.

W Edwards Deming used a very effective demonstration of the dangers of well-intended but badly-implemented quality improvement by inspection: it was called the Red Bead Game. The purpose of the game was to illustrate how to design an inspection system that actually helps to achieve the intended goal. Sustained improvement.

This is applied Improvement Science and I will illustrate how it is done with a real and current example.

I am assisting a department in a large NHS hospital to improve the quality of their service. I have been sent in as an external inspector. The specific quality metric they have been tasked to improve is the turnaround time of the specialist work that they do. This is a flow metric because a patient cannot leave hospital until this work is complete – and more importantly it is a flow and quality metric because when the hospital is full then another patient, one who urgently needs to be admitted, will be waiting for the bed to be vacated. One in one out.

The department have been set a standard to meet, a target, a specification, a goal. It is very clear and it is easily measurable. They have to turnaround each job of work in less than 2 hours. This is called a lead time specification and it is arbitrary. But it is not unreasonable from the perspective of the patient waiting to leave and for the patient waiting to be admitted. Neither want to wait.

The department has a sophisticated IT system that measures their performance. They use it to record when each job starts and when each job is finished and from those two events the software calculates the lead time for each job in real-time. At the end of each day the IT system counts how many jobs were completed in less than 2 hours and compares this with how many were done in total and calculates a ratio which it presents as a percentage in the range of 0 and 100. This is called the process yield. The department are dedicated and they work hard and they do all the work that arrives each day the same day – no matter how long it takes. And at the end of each day they have their score for that day. And it is almost never 100%. Not never. Almost never. But it is not good enough and they are being blamed for it. In turn they blame others for making their job more difficult. It is a blame-game and it has been going on for years.

So how does an experienced Improvement Science-trained Inspector approach this sort of “wicked” problem?

First we need to get the writing on the wall – we need to see the reality – we need to “plot the dots” – we need to see what the performance is doing over time – we need to see the voice of the process. And that requires only their data, a pencil, some paper and for the chart to be put on the on the wall where everyone can see it.

This is what their daily % yield data for three consecutive weeks looked like as a time-series chart. The thin blue line is the 100% yield target.

The 100% target was only achieved on three days – and they were all Sundays. On the other Sunday it was zero (which may mean that there was no data to calculate a ratio from).

There is wide variation from one day to the next and it is the variation as well as the average that is of interest to an improvement scientist. What is the source of the variation it? If 100% yield can be achieved some days then what is different about those days?

So our Improvement science-trained Inspector will now re-plot the data in a different way – as rational groups. This exposes the issue clearly. The variation on Weekends is very wide and the performance during the Weekdays is much less variable. What this says is that the weekend system and the weekday system are different. This means that it is invalid to combine the data for both.

It also raises the question of why there is such high variation in yield only at weekends? The chart cannot answer the question, so our IS-trained Inspector digs a bit deeper and discovers that the volume of work done at the weekend is low, the staffing of the department is different, and that the recording of the events is less reliable. In short – we cannot even trust the weekend data – so we have two reasons to justify excluding it from our chart and just focusing on what happens during the week.

We re-plot our chart, marking the excluded weekend data as not for analysis.

We can now see that the weekday performance of our system is visible, less variable, and the average is a long way from 100%.

The team are working hard and still only achieving mediocre performance. That must mean that they need something that is missing. Motivating maybe. More people maybe. More technology maybe. But there is no more money for more people or technology and traditional JFDI motivation does not seem to have helped.

This looks like an impossible task!

So what does our Inspector do now? Mark their paper with a FAIL and put them on the “To Be Sacked for Failing to Meet an Externally Imposed Standard“ heap?

Nope.

Our IS-trained Inspector calculates the limits of expected performance from the data and plots these limits on the chart – the red lines. The computation is not difficult – it can be done with a calculator and the appropriate formula. It does not need a sophisticated IT system.

What this chart now says is “The current design of this process is capable of delivering between 40% and 85% yield. To expect it do do better is unrealistic”. The implication for action is “If we want 100% yield then the process needs to be re-designed.” Persecution will not work. Blame will not work. Hoping-for-the-best will not work. The process must be redesigned.

Our improvement scientist then takes off the Inspector’s hat and dons the Designer’s overalls and gets to work. There is a method to this and it is called 6M Design®.

First we need to have a way of knowing if any future design changes have a statistically significant impact – for better or for worse. To do this the chart is extended into the future and the red lines are projected forwards in time as the black lines called locked-limits. The new data is compared with this projected baseline as it comes in. The weekends and bank holidays are excluded because we know that they are a different system. On one day (20/12/2012) the yield was surprisingly high. Not 100% but more than the expected upper limit of 85%.

The alerts us to investigate and we found that it was a ‘hospital bed crisis’ and an ‘all hands to the pumps’ distress call went out.

Extra capacity was pulled to the process and less urgent work was delayed until later. It is the habitual reaction-to-a-crisis behaviour called “expediting” or “firefighting”. So after the crisis had waned and the excitement diminished the performance returned to the expected range. A week later the chart signals us again and we investigate but this time the cause was different. It was an unusually quiet day and there was more than enough hands on the pumps.

Both of these days are atypically good and we have an explanation for each of them. This is called an assignable cause. So we are justified in excluding these points from our measure of the typical baseline capability of our process – the performance the current design can be expected to deliver.

An inexperienced manager might conclude from these lessons that what is needed is more capacity. That sounds and feels intuitively obvious and it is correct that adding more capacity may improve the yield – but that does not prove that lack of capacity is the primary cause. There are many other causes of long lead times just as there are many causes of headaches other than brain tumours! So before we can decide the best treatment for our under-performing design we need to establish the design diagnosis. And that is done by inspecting the process in detail. And we need to know what we are looking for; the errors of design commission and the errors of design omission. The design flaws.

Only a trained and experienced process designer can spot the flaws in a process design. Intuition will trick the untrained and inexperienced.

Once the design diagnosis is established then the redesign stage can commence. Design always works to a specification and in this case it was clear – to significantly improve the yield to over 90% at no cost. In other words without needing more people, more skills, more equipment, more space, more anything. The design assignment was made trickier by the fact that the department claimed that it was impossible to achieve significant improvement without adding extra capacity. That is why the Inspector had been sent in. To evaluate that claim.

The design inspection revealed a complex adaptive system – not a linear, deterministic, production-line that manufactures widgets. The department had to cope with wide variation in demand, wide variation in quality of request, wide variation in job complexity, and wide variation in urgency – all at the same time. But that is the nature of healthcare and acute hospital work. That is the expected context.

The analysis of the current design revealed that it was not well suited for this requirement – and the low yield was entirely predictable. The analysis also revealed that the root cause of the low yield was not lack of either flow-capacity or space-capacity.

This insight led to the suggestion that it would be possible to improve yield without increasing cost. The department were polite but they did not believe it was possible. They had never seen it, so why should they be expected to just accept this on faith?

So, the next step was to develop, test and demonstrate a new design and that was done in three stages. The final stage was the Reality Test – the actual process design was changed for just one day – and the yield measured and compared with the predicted improvement.

This was the validity test – the proof of the design pudding. And to visualise the impact we used the same technique as before – extending the baseline of our time-series chart, locking the limits, and comparing the “after” with the “before”.

The yellow point marks the day of the design test. The measured yield was well above the upper limit which suggested that the design change had made a significant improvement. A statistically significant improvement. There was no more capacity than usual and the day was not unusually quiet. At the end of the day we held a team huddle.

Our first question was “How did the new design feel?” The consensus was “Calmer, smoother, fewer interruptions” and best of all “We finished on time – there was no frantic catch up at the end of the day and no one had to stay late to complete the days work!”

The next question was “Do we want to continue tomorrow with this new design or revert back to the old one?” The answer was clear “Keep going with the new design. It feels better.”

The same chart was used to show what happened over the next few days – excluding the weekends as before. The improvement was sustained – it did not revert to the original because the process design had been changed. Same work, same capacity, different process – higher yield. The red flags on the charts mark the statistically significant evidence of change and the cluster of red flags is very strong statistical evidence that the improvement is not due to chance.

The next phase of the 6M Design® method is to continue to monitor the new process to establish the new baseline of expectation. That will require at least twelve data points and it is in progress. But we have enough evidence of a significant improvement. This means that we have no credible justification to return to the old design, and it also implies that it is no longer valid to compare the new data against the old projected limits. Our chart tells us that we need to split the data into before-and-after and to calculate new averages and limits for each segment separately. We have changed the voice of the process by changing the design.

And when we split the data at the point-of-change then the red flags disappear – which means that our new design is stable. And it has a new capability – a better one. We have moved closer to our goal of 100% yield. It is still early days and we do not really have enough data to calculate the new capability.

What we can say is that we have improved average quality yield from 63% to about 90% at no cost using a sequence of process diagnose, design, deliver. Study-Plan-Do.

And we have hard evidence that disproves the impossibility hypothesis.

And that was the goal of the first design change – it was not to achieve 100% yield in one jump. Our design simulation had predicted an improvement to about 90%. And there are other design changes to follow that need this stable foundation to build on. The order of implementation is critical – and each change needs time to bed in before the next change is made. That is the nature of the challenge of improving a complex adaptive system.

The cost to the department was zero but the benefit was huge. The bigger benefit to the organisation was felt elsewhere – the ‘customers’ saw a higher quality, quicker process – and there will be a financial benefit for the whole system. It will be difficult to measure with our current financial monitoring systems but it will be real and it will be there – lurking in the data.

The improvement required a trained and experienced Inspector/Designer/Educator to start the wheel of change turning. There are not many of these in the NHS – but the good news is that the first level of this training is now available.

What this means for the post-Francis Report II NHS is that those who want to can choose to leap over the wall of resistance that is being erected by the massing legions of noisy cynics. It means we can all become our own inspectors. It means we can all become our own improvers. It means we can all learn to redesign our systems so that they deliver higher safety, better quality, more quickly and at no extra one-off or recurring cost. We all can have nothing to fear from the Specialist Cadre of Hospital Inspectors.

The writing is on the wall.

15/02/2013 – Two weeks in and still going strong. The yield has improved from 63% to 92% and is stable. Improvement-by-design works.

10/03/2013 – Six weeks in and a good time to test if the improvement has been sustained.

The chart is the weekly performance plotted for 17 weeks before the change and for 5 weeks after. The advantage of weekly aggregated data is that it removes the weekend/weekday 7-day cycle and reduces the effect of day-to-day variation.

The improvement is obvious, significant and has been sustained. This is the objective improvement. More important is the subjective improvement.

Here is what Chris M (departmental operational manager) wrote in an email this week (quoted with permission):

Hi Simon

It is I who need to thank you for explaining to me how to turn our pharmacy performance around and ultimately improve the day to day work for the pharmacy team (and the trust staff). This will increase job satisfaction and make pharmacy a worthwhile career again instead of working in constant pressure with a lack of achievement that had made the team feel rather disheartened and depressed. I feel we can now move onwards and upwards so thanks for the confidence boost.

Best wishes and many thanks

Chris

This is what Improvement Science is all about!

19/01/2013

A Ray Of Hope

It does not seem to take much to bring a real system to an almost standstill. Six inches of snow falling between 10 AM and 2 PM in a Friday in January seems to be enough!

It was not so much the amount of snow – it was the timing. The decision to close many schools was not made until after the pupils had arrived – and it created a logistical nightmare for parents.

Many people suddenly needed to get home before they expected which created an early rush hour and gridlocked the road system.

The same number of people travelled the same distance in the same way as they would normally – it just took them a lot longer. And the queues created more problems as people tried to find work-arounds to bypass the traffic jams.

How many thousands of hours of life-time was wasted sitting in near-stationary queues of cars? How many millions of poundsworth of productivity was lost? How much will the catchup cost?

And yet while we grumble we shrug our shoulders and say “It is just one of those things. We cannot control the weather. We just have to grin and bear it.”

Actually we do not have to. And we do not need a weather machine to control the weather. Mother Nature is what it is.

Exactly the same behaviour happens in many systems – and our conclusion is the same. We assume the chaos and queues are inevitable.

They are not.

They are symptoms of the system design – and specifically they are the inevitable outcomes of the time-design.

But it is tricky to visualise the time-design of a system. We can see the manifestations of the poor time-design, the queues and chaos, but we do not so easily perceive the causes. So the poor time-design persists. We are not completely useless though; there are lots of obvious things we can do. We can devise ingenious ways to manage the queues; we can build warehouses to hold the queues; we can track the jobs in the queues using sophisticated and expensive information technology; we can identify the hot spots; we can recruit and deploy expediters, problem-solvers and fire-fighters to facilitate the flow through the hottest of them; and we can pump capacity and money into defences, drains and dramatics. And our efforts seem to work so we congratulate ourselves and conclude that these actions are the only ones that work. And we keep clamouring for more and more resources. More capacity, MORE capacity, MORE CAPACITY.

Until we run out of money!

And then we have to stop asking for more. And then we start rationing. And then we start cost-cutting. And then the chaos and queues get worse.

And all the time we are not aware that our initial assumptions were wrong.

The chaos and queues are not inevitable. They are a sign of the time-design of our system. So we do have other options. We can improve the time-design of our system. We do not need to change the safety-design; nor the quality-design; nor the money-design. Just improving the time-design will be enough. For now.

So the $64,000,000 question is “How?”

Before we explore that we need to demonstrate What is possible. How big is the prize?

The class of system design problem that cause particular angst are called mixed-priority mixed-complexity crossed-stream designs. We encounter dozens of them in our daily life and we are not aware of it. One of particular interest to many is called a hospital. The mixed-priority dimension is the need to manage some patients as emergencies, some as urgent and some as routine. The mixed-complexity dimension is that some patients are easy and some are complex. The crossed-stream dimension is the aggregation of specialised resources into departments. Expensive equipment and specific expertise. We then attempt to push patients with different priorites long different paths through these different departments . And it is a management nightmare!

Our usual and “obvious” response to this challenge is called a carve-out design. And that means we chop up our available resource capacity into chunks. And we do that in two ways: chunks of time and chunks of space. We try to simplify the problem by dissecting it into bits that we can understand. We separate the emergency departments from the planned-care facilities. We separate outpatients from inpatients. We separate medicine from surgery – and we then intellectually dissect our patients into organ systems: brains, lungs, hearts, guts, bones, skin, and so on – and we create separate departments for each one. Neurology, Respiratory, Cardiology, Gastroenterology, Orthopaedics, Dermatology to list just a few. And then we become locked into the carve-out design silos like prisoners in cages of our own making.

And so it is within the departments that are sub-systems of the bigger system. Simplification, dissection and separation. Ad absurdam.

The major drawback with our carve-up design strategy is that it actually makes the system more complicated. The number of necessary links between the separate parts grows exponentially. And each link can hold a small queue of waiting tasks – just as each side road can hold a queue of waiting cars. The collective complexity is incomprehensible. The cumulative queue is enormous. The opportunity for confusion and error grows exponentially. Safety and quality fall and cost rises. Carve-out is an inferior time-design.

But our goal is correct: we do need to simplify the system so that means simplifying the time-design.

To illustrate the potential of this ‘simplify the time-design’ approach we need a real example.

One way to do this is to create a real system with lots of carve-out time-design built into it and then we can observe how it behaves – in reality. A carefully designed Table Top Game is one way to do this – one where the players have defined Roles and by following the Rules they collectively create a real system that we can map, measure and modify. With our Table Top Team trained and ready to go we then pump realistic tasks into our realistic system and measure how long they take in reality to appear out of the other side. And we then use the real data to plot some real time-series charts. Not theoretical general ones – real specific ones. And then we use the actual charts to diagnose the actual causes of the actual queues and actual chaos.

This is the time-series chart of a real Time-Design Game that has been designed using an actual hospital department and real observation data. Which department it was is not of importance because it could have been one of many. Carve-out is everywhere.

During one run of the Game the Team processed 186 tasks and the chart shows how long each task took from arriving to leaving (the game was designed to do the work in seconds when in the real department it took minutes – and this was done so that one working day could be condensed from 8 hours into 8 minutes!)

There was a mix of priority: some tasks were more urgent than others. There was a mix of complexity: some tasks required more steps that others. The paths crossed at separate steps where different people did defined work using different skills and special equipment. There were handoffs between all of the steps on all of the streams. There were lots of links. There were many queues. There were ample opportunities for confusion and errors.

But the design of the real process was such that the work was delivered to a high quality – there were very few output errors. The yield was very high. The design was effective. The resources required to achieve this quality were represented by the hours of people-time availability – the capacity. The cost. And the work was stressful, chaotic, pressured, and important – so it got done. Everyone was busy. Everyone pulled together. They helped each other out. They were not idle. They were a good team. The design was efficient.

The thin blue line on the time-series chart is the “time target” set by the Organisation. But the effective and efficient system design only achieved it 77% of the time. So the “obvious” solution was to clamour for more people and for more space and for more equipment so that the work can be done more quickly to deliver more jobs on-time. Unfortunately the Rules of the Time-Design Game do not allow this more-money option. There is no more money.

To succeed at the Time-Design Game the team must find a way to improve their delivery time performance with the capacity they have and also to deliver the same quality. But this is impossible! If it were possible then the solution would be obvious and they would be doing it already. No one can succeed on the Time-Design Game.

Wrong. It is possible. And the assumption that the solution is obvious is incorrect. The solution is not obvious – at least to the untrained eye.

To the trained eye the time-series chart shows the characteristic signals of a carve-out time-design. The high task-to-task variation is highly suggestive as is the pattern of some of the earlier arrivals having a longer lead time. An experienced system designer can diagnose a carve-out time-design from a set of time-series charts of a process just as a doctor can diagnose the disease from the vital signs chart for a patient. And when the diagnosis is confirmed with a verification test then the time-Redesign phase can start.

This chart shows what happened after the time-design of the system was changed – after some of the carve-out design was modified. The Y-axis scale is the same as before – and the delivery time improvement is dramatic. The Time-ReDesigned system is now delivering 98% achievement of the “on time target”.

The important thing to be aware of is that exactly the same work was done, using exactly the same steps, and exactly the same resources. No one had to be retrained, released or recruited. The quality was not impaired. And the cost was actually less because less overtime was needed to mop up the spillover of work at the end of the day.

And the Time-ReDesigned system feels better to work in. It is not chaotic; flow is much smoother; and it is busy yet relaxed and even fun. The same activity is achieved by the same people doing the same work in the same sequence. Only the Time-Design has changed. A change that delivered a win for the workers!

What was the impact of this cost-saving improvement on the customers of this service? They can now be 98% confident that they will get their task completed correctly in less than 120 minutes. Before the Time-Redesign the 98% confidence limit was 470 minutes! So this is a win for the customers too!

And the Time-ReDesigned system is less expensive so it is a win for whoever is paying.

Same safety and quality, quicker with less variation, and at lower cost. Win-Win-Win.

And the usual reaction to playing the Time-ReDesign Game is incredulous disbelief. Some describe it as a “light bulb” moment when they see how the diagnosis of the carve-out time-design is made and and how the Time-ReDesign is done. They say “If I had not seen it with my own eyes I would not have believed it.” And they say “The solutions are simple but not obvious!” And they say “I wish I had learned this years ago!” And thay apologise for being so skeptical before.

And there are those who are too complacent, too careful or too cynical to play the Time-ReDesign Game (which is about 80% of people actually) – and who deny themselves the opportunity of a win-win-win outcome. And that is their choice. They can continue to grin and bear it – for a while longer.

And for the 20% who want to learn how to do Time ReDesign for real in their actual systems there is now a Ray Of Hope.

And the Ray of Hope is illuminating a signpost on which is written “This Way to Improvementology“.

12/01/2013

Quality First or Time First?

Before we explore this question we need to establish something. If the issue is Safety then that always goes First – and by safety we mean “a risk of harm that everyone agrees is unacceptable”.

Many Improvement Zealots state dogmatically that the only way reach the Nirvanah of “Right Thing – On Time – On Budget” is to focus on Quality First.

This is incorrect. And what makes it incorrect is the word only.

Experience teaches us that it is impossible to divert people to focus on quality when everyone is too busy just keeping afloat. If they stop to do something else then they will drown. And they know it.

The critical word here is busy.

‘Busy’ means that everyone is spending all their time doing stuff – important stuff – the work, the checking, the correcting, the expediting, the problem solving, and the fire-fighting. They are all busy all of the time.

So when a Quality Zealot breezes in and proclaims ‘You should always focus on quality first … that will solve all the problems’ then the reaction they get is predictable. The weary workers listen with their arms-crossed, roll-their eyes, exchange knowing glances, sigh, shrug, shake their heads, grit their teeth, and trudge back to fire-fighting. Their scepticism and cynicism has been cut a notch deeper. And the weary workers get labelled as ‘Not Interested In Quality’ and ‘Resisting Change’ and ‘Laggards’ by the Quality Zealot who has spent more time studying and regurgitating rhetoric than investing time in observing and understanding reality.

The problem here is the seemingly innocuous word ‘always’. It is too absolute. Too black-and-white. Too dogmatic. Too simple.

Sometimes focussing on Quality First is a wise decision. And that situation is when there is low-quality and idle-time. There is some spare capacity to re-invest in understanding the root causes of the quality issues, in designing them out of the process, and in implementing the design changes.

But when everyone is busy – when there is no idle-time – then focussing on quality first is not a wise decision because it can actually make the problem worse!

[The Quality Zealots will now be turning a strange red colour, steam will be erupting from their ears and sparks will be coming from their finger-tips as they reach for their keyboards to silence the heretical anti-quality lunatic. “Burn, burn, burn” they rant].

When everyone is busy then the first thing to focus on is Time.

And because everyone is busy then the person doing the Focus-on-Time stuff must be someone else. Someone like an Improvementologist. The Quality Zealot is a liability at this stage – but they become an asset later when the chaos has calmed.

And what our Improvementologist is looking for are queues – also known as Work-in-Progress or WIP.

Why WIP? Why not where the work is happening? Why not focus on resource utilisation? Isn’t that a time metric?

Yes, resource utilisation is a time-related metric but because everyone is busy then resource utilisation will be high. So looking at utilisation will only confirm what we already know. And everyone is busy doing important stuff – they are not stupid – they are busy and they are doing their best given the constraints of their process design.

The queue is where an Improvementologist will direct attention first. And the specific focus of their attention is the cause of the queue.

This is because there is only one cause of a queue: a mismatch-over-time between demand and activity.

So, the critical first step to diagnosing the cause of a queue is to make the flow visible – to plot the time-series charts of demand, activity and WIP. Until that is done then no progress will be made with understanding what is happening and it wil be impossible to decide what to do. We need a diagnosis before we can treat. And to get a diagnosis we need data from an examination of our process; and we need data on the history of how it has developed. And we need to know how to convert that data into information, and then into understanding, and then into design options, and then into a wise decision, and then into action, and then into improvement.

And we now know how to spot an experienced Improvementologist because the first thing they will look for are the Queues not the Quality.

But why bother with the flow and the queues at all? Customers are not interested in them! If time is the focus then surely it is turnaround times and waiting times that we need to measure! Then we can compare our performance with our ‘target’ and if it is out of range we can then apply the necessary ‘pressure’!

This is indeed what we observe. So let us explore the pros and cons of this approach with an example.

We are the manager of a support department that receives requests, processes them and delivers the output back to the sender. We could be one of many support departments in an organisation: human resources, procurement, supplies, finance, IT, estates and so on. We are the Backroom Brigade. We are the unsung heros and heroines.

The requests for our service come in different flavours – some are easy to deal with, others are more complex. They also come with different priorities – urgent, soon and routine. And they arrive as a mixture of dribbles and deluges. Our job is to deliver high quality work (i.e. no errors) within the delivery time expected by the originator of the request (i.e. on time). If we do that then we do not get complaints (but we do not get compliments either).

From the outside things look mostly OK. We deliver mostly on quality and mostly on time. But on the inside our department is in chaos! Every day brings a new fire to fight. Everyone is busy and the pressure and chaos are relentless. We are keeping our head above water – but only just. We do not enjoy our work-life. It is not fun. Our people are miserable too. Some leave – others complain – others just come to work, do stuff, take the money and go home – like Zombies. They comply.

Once in the past we were were seduced by the sweet talk of a Quality Zealot. We were promised Nirvanah. We were advised to look at the quality of the requests that we get. And this suggestion resonated with us because we were very aware that the requests were of variable quality. Our people had to spend time checking-and-correcting them before we could process them. The extra checking had improved the quality of what we deliver – but it had increased our costs too. So the Quality Zealot told us we should work more closely with our customers and to ‘swim upstream’ to prevent the quality problems getting to us in the first place. So we sent some of our most experienced and most expensive Inspectors to paddle upstream. But our customers were also very busy and, much as they would have liked, they did not have time to focus on quality either. So our Inspectors started doing the checking-and-correcting for our customers. Our people are now working for our customers but we still pay their wages. And we do not have enough Inspectors to check-and-correct all the requests at source so we still need to keep a skeleton crew of Inspectors in the department. And these stay-at-home Inspectors are stretched too thin and their job is too pressured and too stressful. So no one wants to do it.And given the choice they would all rather paddle out to the customers first thing in the morning to give them as much time as possible to check-and-correct the requests so the days work can be completed on time. It all sounds perfectly logical and rational – but it does not seem to have worked as promised. The stay-at-home Inspectors can only keep up with the more urgent work, delivery of the less urgent work suffers and the chronic chaos and fire-fighting are now aggravated by a stream of interruptions from customers asking when their ‘non-urgent’ requests will be completed.

The Quality Zealot insisted we should always answer the phone to our customers – so we take the calls – we expedite the requests – we solve the problems – and we fight-the-fire. Day, after day, after day.

We now know what Purgatory means. Retirement with a pension or voluntary redundancy with a package are looking more attractive – if only we can keep going long enough.

And the last thing we need is more external inspection, more targets, and more expensive Quality Zealots telling us what to do!

And when we go and look we see a workplace that appears just as chaotic and stressful and angry as we feel. There are heaps of work in progress everywhere – the phone is always ringing – and our people are running around like headless chickens, expediting, fire-fighting and getting burned-out: physically and emotionally. And we feel powerless to stop it. So we hide.

Does this fictional fiasco feel familiar? It is called the Miserable Job Purgatory Vortex.

Now we know the characteristic pattern of symptoms and signs: constant pressure of work, ever present threat of quality failure, everyone busy, just managing to cope, target-stick-and-carrot management, a miserable job, and demotivated people.

The issue here is that the queues are causing some of the low quality. It is not always low quality that causes all of the queues.

Queues create delays, which generate interruptions, which force investigation, which generates expediting, which takes time from doing the work, which consumes required capacity, which reduces activity, which increases the demand-activity mismatch, which increases the queue, which increases the delay – and so on. It is a vicious circle. And interruptions are a fertile source of internally generated errors which generates even more checking and correcting which uses up even more required capacity which makes the queues grow even faster and longer. Round and round. The cries for ‘we need more capacity’ get louder. It is all hands to the pump – but even then eventually there is a crisis. A big mistake happens. Then Senior Management get named-blamed-and shamed, money magically appears and is thrown at the problem, capacity increases, the symptoms settle, the cries for more capacity go quiet – but productivity has dropped another notch. Eventually the financial crunch arrives.

One symptom of this ‘reactive fire-fight design’ is that people get used to working late to catch up at the end of the day so that the next day they can start the whole rollercoaster ride again. And again. And again. At least that is a form of stability. We can expect tomorrow to be just a s miserable as today and yesterday and the day before that. But TOIL (Time Off In Lieu) costs money.

The way out of the Miserable Job Purgatory Vortex is to diagnose what is causing the queue – and to treat that first.

And that means focussing on Time first – and that means Focussing on Flow first. And by doing that we will improve delivery, improve quality and improve cost because chaotic systems generate errors which need checking and correcting which costs more. Time first is a win-win-win strategy too.

And we already have everything we need to start. We can easily count what comes in and when and what goes out and when.

The first step is to plot the inflow over time (the demand), the outflow over time (the activity), and from that we work out and plot the Work-in-Progress over time. With these three charts we can start the diagnostic process and by that path we can calm the chaos.

And then we can set to work on the Quality Improvement.

13/01/2013 – Newspapers report that 17 hospitals are “dangerously understaffed” Sound familiar?

Next week we will explore how to diagnose the root cause of a queue using Time charts.

For an example to explore please play the SystemFlow Game by clicking here

08/12/201207/09/2024

The Six Dice Game

<Ring Ring><Ring Ring>

Hello, you are through to the Improvement Science Helpline. How can we help?

This is Leslie, one of your apprentices. Could I speak to Bob – my Improvement Science coach?

Yes, Bob is free. I will connect you now.

<Ring Ring><Ring Ring>

B: Hello Leslie, Bob here. What is on your mind?

L: Hi Bob, I have a problem that I do not feel my Foundation training has equipped me to solve. Can I talk it through with you?

B: Of course. Can you outline the context for me?

L: OK. The context is a department that is delivering an acceptable quality-of-service and is delivering on-time but is failing financially. As you know we are all being forced to adopt austerity measures and I am concerned that if their budget is cut then they will fail on delivery and may start cutting corners and then fail on quality too. We need a win-win-win outcome and I do not know where to start with this one.

B: OK – are you using the 6M Design method?

L: Yes – of course!

B: OK – have you done The 4N Chart for the customer of their service?

L: Yes – it was their customers who asked me if I could help and that is what I used to get the context.

B: OK – have you done The 4N Chart for the department?

L: Yes. And that is where my major concerns come from. They feel under extreme pressure; they feel they are working flat out just to maintain the current level of quality and on-time delivery; they feel undervalued and frustrated that their requests for more resources are refused; they feel demoralized; demotivated and scared that their service may be ‘outsourced’. On the positive side they feel that they work well as a team and are willing to learn. I do not know what to do next.

B: OK. Dispair not. This sounds like a very common and treatable system illness. It is a stream design problem which may be the reason your Foundations training feels insufficient. Would you like to see how a Practitioner would approach this?

L: Yes please!

B: OK. Have you mapped their internal process?

L: Yes. It is a six-step process for each job. Each step has different requirements and are done by different people with different skills. In the past they had a problem with poor service quality so extra safety and quality checks were imposed by the Governance department. Now the quality of each step is measured on a 1-6 scale and the quality of the whole process is the sum of the individual steps so is measured on a scale of 6 to 36. They now have been given a minimum quality target of 21 to achieve for every job. How they achieve that is not specified – it was left up to them.

B: OK – do they record their quality measurement data?

L: Yes – I have their report.

B: OK – how is the information presented?

L: As an average for the previous month which is reported up to the Quality Performance Committee.

B: OK – what was the average for last month?

L: Their results were 24 – so they do not have an issue delivering the required quality. The problem is the costs they are incurring and they are being labelled by others as ‘inefficient’. Especially the departments who are in budget and they are annoyed that this failing department keeps getting ‘bailed out’.

B: OK. One issue here is the quality reporting process is not alerting you to the real issue. It sounds from what you say that you have fallen into the Flaw of Averages trap.

L: I don’t understand. What is the Flaw of Averages trap?

B: The answer to your question will become clear. The finance issue is a symptom – an effect – it is unlikely to be the cause. When did this finance issue appear?

L: Just after the Safety and Quality Review. They needed to employ more agency staff to do the extra work created by having to meet the new Minimum Quality target.

B: OK. I need to ask you a personal question. Do you believe that improving quality always costs more?

L: I have to say that I am coming to that conclusion. Our Governance and Finance departments are always arguing about it. Governance state ‘a minimum standard of safety and quality is not optional’ and finance say ‘but we are going out of business’. They are at loggerheads. The service departments get caught in the cross-fire.

B: OK. We will need to use reality to demonstrate that this belief is incorrect. Rhetoric alone does not work. If it did then we would not be having this conversation. Do you have the raw data from which the averages are calculated?

L: Yes. We have the data. The quality inspectors are very thorough!

B: OK – can you plot the quality scores for the last fifty jobs as a BaseLine chart?

L: Yes – give me a second. The average is 24 as I said.

B: OK – is the process stable?

L: Yes – there is only one flag for the fifty. I know from my Foundations training that is not a cause for alarm.

B: OK – what is the process capability?

L: I am sorry – I don’t know what you mean by that?

B: My apologies. I forgot that you have not completed the Practitioner training yet. The capability is the range between the red lines on the chart.

L: Um – the lower line is at 17 and the upper line is at 31.

L: OK – how many points lie below the target of 21.

B: None of course. They are meeting their Minimum Quality target. The issue is not quality – it is money.

There was a pause. Leslie knew from experience that when Bob paused there was a surprise coming.

B: Can you email me your chart?

A cold-shiver went down Leslie’s back. What was the problem here? Bob had never asked to see the data before.

Sure. I will send it now. The recent fifty is on the right, the data on the left is from after the quality inspectors went in and before the the Minimum Quality target was imposed. This is the chart that Governance has been using as evidence to justify their existence because they are claiming the credit for improving the quality.

B: OK – thanks. I have got it – let me see. Oh dear.

Leslie was shocked. She had never heard Bob use language like ‘Oh dear’.

There was another pause.

B: Leslie, what is the context for this data? What does the X-axis represent?

Leslie looked at the chart again – more closely this time. Then she saw what Bob was getting at. There were fifty points in the first group, and about the same number in the second group. That was not the interesting part. In the first group the X-axis went up to 50 in regular steps of five; in the second group it went from 50 to just over 149 and was no longer regularly spaced. Eventually she replied.

Bob, that is a really good question. My guess it is that this is the quality of the completed work.

B: It is unwise to guess. It is better to go and see reality.

You are right. I knew that. It is drummed into us during the Foundations training! I will go and ask. Can I call you back?

B: Of course. I will email you my direct number.

<Ring Ring><Ring Ring>

B: Hello, Bob here.

L: Bob – it is Leslie. I am so excited! I have discovered something amazing.

B: Hello Leslie. That is good to hear. Can you tell me what you have discovered?

L: I have discovered that better quality does not always cost more.

B: That is a good discovery. Can you prove it with data?

L: Yes I can! I am emailing you the chart now.

B: OK – I am looking at your chart. Can you explain to me what you have discovered?

L: Yes. When I went to see for myself I saw that when a job failed the Minimum Quality check at the end then the whole job had to be re-done because there was no time to investigate and correct the causes of the failure. The people doing the work said that they were helpless victims of errors that were made upstream of them – and they could not predict from one job to the next what the error would be. They said it felt like quality was a lottery and that they were just firefighting all the time. They knew that just repeating the work was not solving the problem but they had no other choice because they were under enormous pressure to deliver on-time as well. The only solution they could see is was to get more resources but their requests were being refused by Finance on the grounds that there is no more money. They felt completely trapped.

B: OK. Can you describe what you did?

L: Yes. I saw immediately that there were so many sources of errors that it would be impossible for me to tackle them all. So I used the tool that I had learned in the Foundations training: the Niggle-o-Gram. That focussed us and led to a surprisingly simple, quick, zero-cost process design change. We deliberately did not remove the Inspection-and-Correction policy because we needed to know what the impact of the change would be. Oh, and we did one other thing that challenged the current methods. We plotted every attempt, both the successes and the failures, on the BaseLine chart so we could see both the the quality and the work done on one chart. And we updated the chart every day and posted it chart on the notice board so everyone in the department could see the effect of the change that they had designed. It worked like magic! They have already slashed their agency staff costs, the whole department feels calmer and they are still delivering on-time. And best of all they now feel that they have the energy and time to start looking at the next niggle. Thank you so much! Now I see how the tools and techniques I learned in Foundations are so powerful and now I understand better the reason we learned them first.

B: Well done Leslie. You have taken an important step to becoming a fully fledged Practitioner. You have learned some critical lessons in this challenge.

This scenario is fictional but realistic.

And it has been designed so that it can be replicated easily using a simple game that requires only pencil, paper and some dice.

If you do not have some dice handy then you can use this little program that simulates rolling six dice.

The Six Digital Dice program (for PC only).

Instructions
1. Prepare a piece of A4 squared paper with the Y-axis marked from zero to 40 and the X-axis from 1 to 80.
2. Roll six dice and record the score on each (or roll one die six times) – then calculate the total.
3. Plot the total on your graph. Left-to-right in time order. Link the dots with lines.
4. After 25 dots look at the chart. It should resemble the leftmost data in the charts above.
5. Now draw a horizontal line at 21. This is the Minimum Quality Target.
6. Keep rolling the dice – six per cycle, adding the totals to the right of your previous data.

But this time if the total is less than 21 then repeat the cycle of six dice rolls until the score is 21 or more. Record on your chart the output of all the cycles – not just the acceptable ones.

7. Keep going until you have 25 acceptable outcomes. As long as it takes.

Now count how many cycles you needed to complete in order to get 25 acceptable outcomes. You should find that it is about twice as many as before you “imposed” the Inspect-and-Correct QI policy.

This illustrates the problem of an Inspection-and-Correction design for quality improvement. It does improve the quality of the final output – but at a higher cost.

We are treating the symptoms (effects) and ignoring the disease (causes).

The internal design of the process is unchanged so it is still generating mistakes.

How much quality improvement you get and how much it costs you is determined by the design of the underlying process – which has not changed. There is a Law of Diminishing returns here – and a big risk.

The risk is that if quality improves as the result of applying a quality target then it encourages the Governance thumbscrews to be tightened further and forces those delivering the service further into cross-fire between Governance and Finance.

The other negative consequence of the Inspect-and-Correct approach is that it increases both the average and the variation in lead time which also fuels the calls for more targets, more sticks, calls for more resources and pushes costs up even further.

The lesson from this simple exercise seems clear.

The better strategy for improving quality is to design the root causes of errors out of the processes because then we will get improved quality and improved delivery and improved productivity and we will discover that we have improved safety as well. Win-win-win-win.

The Six Dice Game is a simpler version of the famous Red Bead Game that W Edwards Deming used to explain why, in the modern world, the arbitrary-target-driven-command-and-control-stick-and-carrot style of performance management creates more problems than it solves.

The illusion is of short-term gain but the reality is of long-term pain.

And if you would like to see and hear Deming talking about the science of improvement there is a video of him speaking in 1984. He is at the bottom of the page. Click here.

24/11/2012

The Three R’s

Processes are like people – they get poorly – sometimes very poorly.

Poorly processes present with symptoms. Symptoms such as criticism, complaints, and even catastrophes.

Poorly processes show signs. Signs such as fear, queues and deficits.

So when a process gets very poorly what do we do?

We follow the Three R’s

1-Resuscitate
2-Review
3-Repair

Resuscitate means to stabilize the process so that it is not getting sicker.

Review means to quickly and accurately diagnose the root cause of the process sickness.

Repair means to make changes that will return the process to a healthy and stable state.

So the concept of ‘stability’ is fundamental and we need to understand what that means in practice.

Stability means ‘predictable within limits’. It is not the same as ‘constant’. Constant is stable but stable is not necessarily constant.

Predictable implies time – so any measure of process health must be presented as time-series data.

We are now getting close to a working definition of stability: “a useful metric of system performance that is predictable within limits over time”.

So what is a ‘useful metric’?

There will be at least three useful metrics for every system: a quality metric, a time metric and a money metric.

Quality is subjective. Money is objective. Time is both.

Time is the one to start with – because it is the easiest to measure.

And if we treat our system as a ‘black box’ then from the outside there are three inter-dependent time-related metrics. These are external process metrics (EPMs) – sometimes called Key Performance Indicators (KPIs).

Flow in – also called demand
Flow out – also called activity
Delivery time – which is the time a task spends inside our system – also called the lead time.

But this is all starting to sound like rather dry, conceptual, academic mumbo-jumbo … so let us add a bit of realism and drama – let us tell this as a story …

[reveal heading=”Click here to reveal the story …“]

Picture yourself as the manager of a service that is poorly. Very poorly. You are getting a constant barrage of criticism and complaints and the occasional catastrophe. Your service is struggling to meet the required delivery time performance. Your service is struggling to stay in budget – let alone meet future cost improvement targets. Your life is a constant fire-fight and you are getting very tired and depressed. Nothing you try seems to make any difference. You are starting to think that anything is better than this – even unemployment! But you have a family to support and jobs are hard to come by in austere times so jumping is not an option. There is no way out. You feel you are going under. You feel are drowning. You feel terrified and helpless!

In desperation you type “Management fire-fighting” into your web search box and among the list of hits you see “Process Improvement Emergency Service”. That looks hopeful. The link takes you to a website and a phone number. What have you got to lose? You dial the number.

It rings twice and a calm voice answers.

?“You are through to the Process Improvement Emergency Service – what is the nature of the process emergency?”

“Um – my service feels like it is on fire and I am drowning!”

The calm voice continues in a reassuring tone.

?“OK. Have you got a minute to answer three questions?”

“Yes – just about”.

?“OK. First question: Is your service safe?”

“Yes – for now. We have had some catastrophes but have put in lots of extra safety policies and checks which seems to be working. But they are creating a lot of extra work and pushing up our costs and even then we still have lots of criticism and complaints.”

?“OK. Second question: Is your service financially viable?”

“Yes, but not for long. Last year we just broke even, this year we are projecting a big deficit. The cost of maintaining safety is ‘killing’ us.”

?“OK. Third question: Is your service delivering on time?”

“Mostly but not all of the time, and that is what is causing us the most pain. We keep getting beaten up for missing our targets. We constantly ask, argue and plead for more capacity and all we get back is ‘that is your problem and your job to fix – there is no more money’. The system feels chaotic. There seems to be no rhyme nor reason to when we have a good day or a bad day. All we can hope to do is to spot the jobs that are about to slip through the net in time; to expedite them; and to just avoid failing the target. We are fire-fighting all of the time and it is not getting better. In fact it feels like it is getting worse. And no one seems to be able to do anything other than blame each other.”

There is a short pause then the calm voice continues.

?“OK. Do not panic. We can help – and you need to do exactly what we say to put the fire out. Are you willing to do that?”

“I do not have any other options! That is why I am calling.”

The calm voice replied without hesitation.

?“We all always have the option of walking away from the fire. We all need to be prepared to exercise that option at any time. To be able to help then you will need to understand that and you will need to commit to tackling the fire. Are you willing to commit to that?”

You are surprised and strangely reassured by the clarity and confidence of this response and you take a moment to compose yourself.

“I see. Yes, I agree that I do not need to get toasted personally and I understand that you cannot parachute in to rescue me. I do not want to run away from my responsibility – I will tackle the fire.”

?“OK. First we need to know how stable your process is on the delivery time dimension. Do you have historical data on demand, activity and delivery time?”

“Hey! Data is one thing I do have – I am drowning in the stuff! RAG charts that blink at me like evil demons! None of it seems to help though – the more data I get sent the more confused I become!”

?“OK. Do not panic. The data you need is very specific. We need the start and finish events for the most recent one hundred completed jobs. Do you have that?”

“Yes – I have it right here on a spreadsheet – do I send the data to you to analyse?”

?“There is no need to do that. I will talk you through how to do it.”

“You mean I can do it now?”

?“Yes – it will only take a few minutes.”

“OK, I am ready – I have the spreadsheet open – what do I do?”

?“Step 1. Arrange the start and finish events into two columns with a start and finish event for each task on each row.

You copy and paste the data you need into a new worksheet.

“OK – done that”.

?“Step 2. Sort the two columns into ascending order using the start event.”

“OK – that is easy”.

?“Step 3. Create a third column and for each row calculate the difference between the start and the finish event for that task. Please label it ‘Lead Time’”.

“OK – do you want me to calculate the average Lead Time next?”

There was a pause. Then the calm voice continued but with a slight tinge of irritation.

?“That will not help. First we need to see if your system is unstable. We need to avoid the Flaw of Averages trap. Please follow the instructions exactly. Are you OK with that?”

This response was a surprise and you are starting to feel a bit confused.

“Yes – sorry. What is the next step?”

?“Step 4: Plot a graph. Put the Lead Time on the vertical axis and the start time on the horizontal axis”.

“OK – done that.”

?“Step 5: Please describe what you see?”

“Um – it looks to me like a cave full of stalagtites. The top is almost flat, there are some spikes, but the bottom is all jagged.”

?“OK. Step 6: Does the pattern on the left-side and on the right-side look similar?”

“Yes – it does not seem to be rising or falling over time. Do you want me to plot the smoothed average over time or a trend line? They are options on the spreadsheet software. I do that use all the time!”

The calm voice paused then continued with the irritated overtone again.

?“No. There is no value is doing that. Please stay with me here. A linear regression line is meaningless on a time series chart. You may be feeling a bit confused. It is common to feel confused at this point but the fog will clear soon. Are you OK to continue?”

An odd feeling starts to grow in you: a mixture of anger, sadness and excitement. You find yourself muttering “But I spent my own hard-earned cash on that expensive MBA where I learned how to do linear regression and data smoothing because I was told it would be good for my career progression!”

?“I am sorry I did not catch that? Could you repeat it for me?”

“Um – sorry. I was talking to myself. Can we proceed to the next step?”

?”OK. From what you say it sounds as if your process is stable – for now. That is good. It means that you do not need to Resuscitate your process and we can move to the Review phase and start to look for the cause of the pain. Are you OK to continue?”

An uncomfortable feeling is starting to form – one that you cannot quite put your finger on.

“Yes – please”.

?Step 7: What is the value of the Lead Time at the ‘cave roof’?”

“Um – about 42”

?“OK – Step 8: What is your delivery time target?”

“42”

?“OK – Step 9: How is your delivery time performance measured?”

“By the percentage of tasks that are delivered late each month. Our target is better than 95%. If we fail any month then we are named-and-shamed at the monthly performance review meeting and we have to explain why and what we are going to do about it. If we succeed then we are spared the ritual humiliation and we are rewarded by watching others else being mauled instead. There is always someone in the firing line and attendance at the meeting is not optional!”

You also wanted to say that the data you submit is not always completely accurate and that you often expedite tasks just to avoid missing the target – in full knowkedge that the work had not been competed to the required standard. But you hold that back. Someone might be listening.

There was a pause. Then the calm voice continued with no hint of surprise.

?“OK. Step 10. The most likely diagnosis here is a DRAT. You have probably developed a Gaussian Horn that is creating the emotional pain and that is fuelling the fire-fighting. Do not panic. This is a common and curable process illness.”

You look at the clock. The conversation has taken only a few minutes. Your feeling of panic is starting to fade and a sense of relief and curiosity is growing. Who are these people?

“Can you tell me more about a DRAT? I am not familiar with that term.”

?“Yes. Do you have two minutes to continue the conversation?”

“Yes indeed! You have my complete attention for as long as you need. The emails can wait.”

The calm voice continues.

?“OK. I may need to put you on hold or call you back if another emergency call comes in. Are you OK with that?”

“You mean I am not the only person feeling like this?”

?“You are not the only person feeling like this. The process improvement emergency service, or PIES as we call it, receives dozens of calls like this every day – from organisations of every size and type.”

“Wow! And what is the outcome?”

There was a pause. Then the calm voice continued with an unmistakeable hint of pride.

?“We have a 100% success rate to date – for those who commit. You can look at our performance charts and the client feedback on the website.”

“I certainly will! So can you explain what a DRAT is?”

And as you ask this you are thinking to yourself ‘I wonder what happened to those who did not commit?’

The calm voice interrupts your train of thought with a well-practiced explanation.

?“DRAT stands for Delusional Ratio and Arbitrary Target. It is a very common management reaction to unintended negative outcomes such as customer complaints. The concept of metric-ratios-and-performance-specifications is not wrong; it is just applied indiscriminately. Using DRATs can drive short-term improvements but over a longer time-scale they always make the problem worse.”

One thought is now reverberating in your mind. “I knew that! I just could not explain why I felt so uneasy about how my service was being measured.” And now you have a new feeling growing – anger. You control the urge to swear and instead you ask:

“And what is a Horned Gaussian?”

The calm voice was expecting this question.

?“It is easier to demonstrate than to explain. Do you still have your spreadsheet open and do you know how to draw a histogram?”

“Yes – what do I need to plot?”

?“Use the Lead Time data and set up ten bins in the range 0 to 50 with equal intervals. Please describe what you see”.

It takes you only a few seconds to do this. You draw lots of histograms – most of them very colourful but meaningless. No one seems to mind though.

“OK. The histogram shows a sort of heap with a big spike on the right hand side – at 42.”

The calm voice continued – this time with a sense of satisfaction.

?“OK. You are looking at the Horned Gaussian. The hump is the Gaussian and the spike is the Horn. It is a sign that your complex adaptive system behaviour is being distorted by the DRAT. It is the Horn that causes the pain and the perpetual fire-fighting. It is the DRAT that causes the Horn.”

“Is it possible to remove the Horn and put out the fire?”

?“Yes.”

This is what you wanted to hear and you cannot help cutting to the closure question.

“Good. How long does that take and what does it involve?”

The calm voice was clearly expecting this question too.

?“The Gaussian Horn is a non-specific reaction – it is an effect – it is not the cause. To remove it and to ensure it does not come back requires treating the root cause. The DRAT is not the root cause – it is also a knee-jerk reaction to the symptoms – the complaints. Treating the symptoms requires learning how to diagnose the specific root cause of the lead time performance failure. There are many possible contributors to lead time and you need to know which are present because if you get the diagnosis wrong you will make an unwise decision, take the wrong action and exacerbate the problem.”

Something goes ‘click’ in your head and suddently your fog of confusion evaporates. It is like someone just switched a light on.

“Ah Ha! You have just explained why nothing we try seems to work for long – if at all. How long does it take to learn how to diagnose and treat the specific root causes?”

The calm voice was expecting this question and seemed to switch to the next part of the script.

?“It depends on how committed the learner is and how much unlearning they have to do in the process. Our experience is that it takes a few hours of focussed effort over a few weeks. It is rather like learning any new skill. Guidance, practice and feedback are needed. Just about anyone can learn how to do it – but paradoxically it takes longer for the more experienced and, can I say, cynical managers. We believe they have more unlearning to do.”

You are now feeling a growing sense of urgency and excitement.

“So it is not something we can do now on the phone?”

?“No. This conversation is just the first step.”

You are eager now – sitting forward on the edge of your chair and completely focussed.

“OK. What is the next step?”

There is a pause. You sense that the calm voice is reviewing the conversation and coming to a decision.

?“Before I can answer your question I need to ask you something. I need to ask you how you are feeling.”

That was not the question you expected! You are not used to talking about your feelings – especially to a complete stranger on the phone – yet strangely you do not sense that you are being judged. You have is a growing feeling of trust in the calm voice.

You pause, collect your thoughts and attempt to put your feelings into words.

“Er – well – a mixture of feelings actually – and they changed over time. First I had a feeling of surprise that this seems so familiar and straightforward to you; then a sense of resistance to the idea that my problem is fixable; and then a sense of confusion because what you have shown me challenges everything I have been taught; and then a feeling distrust that there must be a catch and then a feeling of fear of embarassement if I do not spot the trick. Then when I put my natural skepticism to one side and considered the possibility as real then there was a feeling of anger that I was not taught any of this before; and then a feeling of sadness for the years of wasted time and frustration from battling something I could not explain. Eventually I started to started to feel that my cherished impossibility belief was being shaken to its roots. And then I felt a growing sense of curiosity, optimism and even excitement that is also tinged with a feeling of fear of disappointment and of having my hopes dashed – again.”

There was a pause – as if the calm voice was digesting this hearty meal of feelings. Then the calm voice stated:

?“You are experiencing the Nerve Curve. It is normal and expected. It is a healthy sign. It means that the healing process has already started. You are part of your system. You feel what it feels – it feels what you do. The sequence of negative feelings: the shock, denial, anger, sadness, depression and fear will subside with time and the positive feelings of confidence, curiosity and excitement will replace them. Do not worry. This is normal and it takes time. I can now suggest the next step.”

You now feel like you have just stepped off an emotional rollercoaster – scary yet exhilarating at the same time. A sense of relief sweeps over you. You have shared your private emotional pain with a stranger on the phone and the world did not end! There is hope.

“What is the next step?”

This time there was no pause.

?“To commit to learning how to diagnose and treat your process illnesses yourself.”

“You mean you do not sell me an expensive training course or send me a sharp-suited expert who will come tell me what to do and charge me a small fortune?”

There is an almost sarcastic tone to your reply that you regret as soon as you have spoken.

Another pause. An uncomfortably long one this time. You sense the calm voice knows that you know the answer to your own question and is waiting for you to answer it yourself.

You answer your own question.

“OK. I guess not. Sorry for that. Yes – I am definitely up for learning how! What do I need to do.”

?“Just email us. The address is on the website. We will outline the learning process. It is neither difficult nor expensive.”

The way this reply was delivered – calmly and matter-of-factly – was reassuring but it also promoted a new niggle – a flash of fear.

“How long have I got to learn this?”

This time the calm voice had an unmistakable sense of urgency that sent a cold prickles down your spine.

?”Delay will add no value. You are being stalked by the Horned Gaussian. This means your system is on the edge of a catastrophe cliff. It could tip over any time. You cannot afford to relax. You must maintain all your current defenses. It is a learning-by-doing process. The sooner you start to learn-by-doing the sooner the fire starts to fade and the sooner you move away from the edge of the cliff.”

“OK – I understand – and I do not know why I did not seek help a long time ago.”

The calm voice replied simply.

?”Many people find seeking help difficult. Especially senior people”.

Sensing that the conversation is coming to an end you feel compelled to ask:

“I am curious. Where do the DRATs come from?”

?“Curiosity is a healthy attitude to nurture. We believe that DRATs originated in finance departments – where they were originally called Fiscal Averages, Ratios and Targets. At some time in the past they were sucked into operations and governance departments by a knowledge vacuum created by an unintended error of omission.”

You are not quite sure what this unfamiliar language means and you sense that you have strayed outside the scope of the “emergency script” but the phrase ‘error of omission sounds interesting’ and pricks your curiosity. You ask:

“What was the error of omission?”

?“We believe it was not investing in learning how to design complex adaptive value systems to deliver capable win-win-win performance. Not investing in learning the Science of Improvement.”

“I am not sure I understand everything you have said.”

?“That is OK. Do not worry. You will. We look forward to your email. My name is Bob by the way.”

“Thank you so much Bob. I feel better just having talked to someone who understands what I am going through and I am grateful to learn that there is a way out of this dark pit of despair. I will look at the website and send the email immediately.”

?”I am happy to have been of assistance.”

[/reveal]

03/11/2012

A Recipe for Improvement PIE.

Most of us are realists. We have to solve problems in the real world so we prefer real examples and step-by-step how-to-do recipes.

A minority of us are theorists and are more comfortable with abstract models and solving rhetorical problems.

Many of these Improvement Science blog articles debate abstract concepts – because I am a strong iNtuitor by nature. Most realists are Sensors – so by popular request here is a “how-to-do” recipe for a Productivity Improvement Exercise (PIE)

Step 1 – Define Productivity.

There are many definitions we could choose because productivity means the results delivered divided by the resources used. We could use any of the three currencies – quality, time or money – but the easiest is money. And that is because it is easier to measure and we have well established department for doing it – Finance – the guardians of the money. There are two other departments who may need to be involved – Governance (the guardians of the safety) and Operations (the guardians of the delivery).

So the definition we will use is productivity = revenue generated divided cost incurred.

Step 2 – Draw a map of the process we want to make more productive.

This means creating a picture of the parts and their relationships to each other – in particular what the steps in the process are; who does what, where and when; what is done in parallel and what is done in sequence; what feeds into what and what depends on what. The output of this step is a diagram with boxes and arrows and annotations – called a process map. It tells us at a glance how complex our process is – the number of boxes and the number of arrows. The simpler the process the easier it is to demonstrate a productivity improvement quickly and unambiguously.

Step 3 – Decide the objective metrics that will tell us our productivity.

We have chosen a finanical measure of productivity so we need to measure revenue and cost over time – and our Finance department do that already so we do not need to do anything new. We just ask them for the data. It will probably come as a monthly report because that is how Finance processes are designed – the calendar month accounting cycle is not negotiable.

We will also need some internal process metrics (IPMs) that will link to the end of month productivity report values because we need to be observing our process more often than monthly. Weekly, daily or even task-by-task may be necessary – and our monthly finance reports will not meet that time-granularity requirement.

These internal process metrics will be time metrics.

Start with objective metrics and avoid the subjective ones at this stage. They are necessary but they come later.

Step 4 – Measure the process.

There are three essential measures we usually need for each step in the process: A measure of quality, a measure of time and a measure of cost. For the purposes of this example we will simplify by making three assumptions. Quality is 100% (no mistakes) and Predictability is 100% (no variation) and Necessity is 100% (no worthless steps). This means that we are considering a simplified and theoretical situation but we are novices and we need to start with the wood and not get lost in the trees.

The 100% Quality means that we do not need to worry about Governance for the purposes of this basic recipe.

The 100% Predictability means that we can use averages – so long as we are careful.

The 100% Necessity means that we must have all the steps in there or the process will not work.

The best way to measure the process is to observe it and record the events as they happen. There is no place for rhetoric here. Only reality is acceptable. And avoid computers getting in the way of the measurement. The place for computers is to assist the analysis – and only later may they be used to assist the maintenance – after the improvement has been achieved.

Many attempts at productivity improvement fail at this point – because there is a strong belief that the more computers we add the better. Experience shows the opposite is usually the case – adding computers adds complexity, cost and the opportunity for errors – so beware.

Step 5 – Identify the Constraint Step.

The meaning of the term constraint in this context is very specific – it means the step that controls the flow in the whole process. The critical word here is flow. We need to identify the current flow constraint.

A tap or valve on a pipe is a good example of a flow constraint – we adjust the tap to control the flow in the whole pipe. It makes no difference how long or fat the pipe is or where the tap is, begining, middle or end. (So long as the pipe is not too long or too narrow or the fluid too gloopy because if they are then the pipe will become the flow constraint and we do not want that).

The way to identify the constraint in the system is to look at the time measurements. The step that shows the same flow as the output is the constraint step. (And remember we are using the simplified example of no errors and no variation – in real life there is a bit more to identifying the constraint step).

Step 6 – Identify the ideal place for the Constraint Step.

This is the critical-to-success step in the PIE recipe. Get this wrong and it will not work.

This step requires two pieces of measurement data for each step – the time data and the cost data. So the Operational team and the Finance team will need to collaborate here. Tricky I know but if we want improved productivity then there is no alternative.

Lots of productivity improvement initiatives fall at the Sixth Fence – so beware. If our Finance and Operations departments are at war then we should not consider even starting the race. It will only make the bad situation even worse!

If they are able to maintain an adult and respectful face-to-face conversation then we can proceed.

The time measure for each step we need is called the cycle time – which is the time interval from starting one task to being ready to start the next one. Please note this is a precise definition and it should be used exactly as defined.

The money measure for each step we need is the fully absorbed cost of time of providing the resource. Your Finance department will understand that – they are Masters of FACTs!

The magic number we need to identify the Ideal Constraint is the product of the Cycle Time and the FACT – the step with the highest magic number should be the constraint step. It should control the flow in the whole process. (In reality there is a bit more to it than this but I am trying hard to stay out of the trees).

Step 7 – Design the capacity so that the Ideal Constraint is the Actual Constraint.

We are using a precise definition of the term capacity here – the amount of resource-time available – not just the number of resources available. Again this is a precise definition and should be used as defined.

The capacity design sequence means adding and removing capacity to and from steps so that the constraint moves to where we want it.

The sequence is:
7a) Set the capacity of the Ideal Constraint so it is capable of delivering the required activity and revenue.
7b) Increase the capacity of the all the other steps so that the Ideal Constraint actually controls the flow.
7c) Reduce the capacity of each step in turn, a click at a time until it becomes the constraint then back off one click.

Step 8 – Model your whole design to predict the expected productivity improvement.

This is critical because we are not interested in suck-it-and-see incremental improvement. We need to be able to decide if the expected benefit is worth the effort before we authorise and action any changes. And we will be asked for a business case. That necessity is not negotiable either.

Lots of productivity improvement projects try to dodge this particularly thorny fence behind a smoke screen of a plausible looking business case that is more fiction than fact. This happens when any of Steps 2 to 7 are omitted or done incorrectly. What we need here is a model and if we are not prepared to learn how to build one then we should not start. It may only need a simple model – but it will need one. Intuition is too unreliable.

A model is defined as a simplified representation of reality used for making predictions.

All models are approximations of reality. That is OK.

The art of modeling is to define the questions the model needs to be designed to answer (and the precision and accuracy needed) and then design, build and test the model so that it is just simple enough and no simpler. Adding unnecessary complexity is difficult, time consuming, error prone and expensive. Using a computer model when a simple pen-and-paper model would suffice is a good example of over-complicating the recipe!

Many productivity improvement projects that get this far still fall at this fence. There is a belief that modeling can only be done by Marvins with brains the size of planets. This is incorrect. There is also a belief that just using a spreadsheet or modelling software is all that is needed. This is incorrect too. Competent modelling requires tools and training – and experience because it is as much art as science.

Step 9 – Modify your system as per the tested design.

Once you have demonstrated how the proposed design will deliver a valuable increase in productivity then get on with it.

Not by imposing it as a fait accompli – but by sharing the story along with the rationale, real data, explanation and results. Ask for balanced, reasoned and respectful feedback. The question to ask is “Can you think of any reasons why this would not work?” Very often the reply is “It all looks OK in theory but I bet it won’t work in practice but I can’t explain why”. This is an emotional reaction which may have some basis in fact. It may also just be habitual skepticism/cynicism. Further debate is usually worthless – the only way to know for sure is by doing the experiment. As an experiment – as a small-scale and time-limited pilot. Set the date and do it. Waiting and debating will add no value. The proof of the pie is in the eating.

Step 10 – Measure and maintain your system productivity.

Keep measuring the same metrics that you need to calculate productivity and in addition monitor the old constraint step and the new constraint steps like a hawk – capturing their time metrics for every task – and tracking what you see against what the model predicted you should see.

The correct tool to use here is a system behaviour chart for each constraint metric. The before-the-change data is the baseline from which improvement is measured over time; and with a dot plotted for each task in real time and made visible to all the stakeholders. This is the voice of the process (VoP).

A review after three months with a retrospective financial analysis will not be enough. The feedback needs to be immediate. The voice of the process will dictate if and when to celebrate. (There is a bit more to this step too and the trees are clamoring for attention but we must stay out of the wood a bit longer).

And after the charts-on-the-wall have revealed the expected improvement has actually happened; and after the skeptics have deleted their ‘we told you so’ emails; and after the cynics have slunk off to sulk; and after the celebration party is over; and after the fame and glory has been snatched by the non-participants – after all of that expected change management stuff has happened …. there is a bit more work to do.

And that is to establish the new higher productivity design as business-as-usual which means tearing up all the old policies and writing new ones: New Policies that capture the New Reality. Bin the out-of-date rubbish.

This is an essential step because culture changes slowly. If this step is omitted then out-of-date beliefs, attitudes, habits and behaviours will start to diffuse back in, poison the pond, and undo all the good work. The New Policies are the reference – but they alone will not ensure the improvement is maintained. What is also needed is a PFL – a performance feedback loop.

And we have already demonstrated what that needs to be – the tactical system behaviour charts for the Intended Constraint step.

The finanical productivity metric is the strategic output and is reported monthly – as a system behaviour chart! Just comparing this month with last month is meaningless. The tactical SBCs for the constraint step must be maintained continuously by the people who own the constraint step – because they control the productivity of the whole process. They are the guardians of the productivity improvement and their SBCs are the Early Warning System (EWS).

If the tactical SBCs set off an alarm then investigate the root cause immediately – and address it. If they do not then leave it alone and do not meddle.

This is the simplified version of the recipe. The essential framework.

Reality is messier. More complicated. More fun!

Reality throws in lots of rusty spanners so we do also need to understand how to manage the complexity; the unnecessary steps; the errors; the meddlers; and the inevitable variation. It is possible (though not trivial) to design real systems to deliver much higher productivity by using the framework above and by mastering a number of other tools and techniques. And for that to succeed the Governance, Operations and Finance functions need to collaborate closely with the People and the Process – initially with guidance from an experienced and competent Improvement Scientist. But only initially. This is a learnable skill. And it takes practice to master – so start with easy ones and work up.

If any of these bits are missing or are dysfunctional the recipe will not work. So that is the first nettle the Executive must grasp. Get everyone who is necessary on the same bus going in the same direction – and show the cynics the exit. Skeptics are OK – they will counter-balance the Optimists. Cynics add no value and are a liability.

What you may have noticed is that 8 of the 10 steps happen before any change is made. 80% of the effort is in the design – only 20% is in the doing.

If we get the design wrong the the doing will be an ineffective and inefficient waste of effort, time and money.

The best complement to real Improvement PIE is a FISH course.

05/12/2010

Lies, Damned Lies and Statistics!

Most people are confused by statistics and because of this experts often regard them as ignorant, stupid or both. However, those who claim to be experts in statistics need to proceed with caution – and here is why.

The people who are confused by statistics are confused for a reason – the statistics they see presented do not make sense to them in their world. They are not stupid – many are graduates and have high IQ’s – so this means they must be ignorant and the obvious solution is to tell them to go and learn statistics. This is the strategy adopted in medicine: Trainees are expected to invest some time doing research and in the process they are expected to learn how to use statistics in order to develop their critical thinking and decision making. So far so good, so what is the outcome?

Well, we have been running this experiment for decades now – there are millions of peer reviewed papers published – each one having passed the scrutiny of a statistical expert – and yet we still have a health care system that is not delivering what we need at a cost we can afford. So, there must be someone else at fault – maybe the managers! They are not expected to learn or use statistics so that statistically-ignorant rabble must be the problem -so the next plan is “Beat up the managers” and “Put statistically trained doctors in charge”.

Hang on a minute! Before we nail the managers and restructure the system let us step back and consider another more radical hypothesis. What if there is something not right about the statistics we are using? The medical statistics experts will rise immediately and state “Research statistics is a rigorous science derived from first principles and is mathematically robust!” They are correct. It is. But all mathematical derivations are based on some initial fundamental assumptions so when the output does not seem to work in all cases then it is always worth re-examining the initial assumptions. That is the tried-and-tested path to new breakthroughs and new understanding.

The basic assumption that underlies research statistics is that all measurements are independent of each other which also implies that order and time can be ignored. This is the reason that so much effort, time and money is invested in the design of a research trial – to ensure that the statistical analysis will be correct and the conclusions will be valid. In other words the research trial is designed around the statistical analysis method and its founding assumption. And that is OK when we are doing research.

However, when we come to apply the output of our research trials to the Real World we have a problem.

How do we demonstrate that implementing the research recommendation has resulted in an improvement? We are outside the controlled environment of research now and we cannot distort the Real World to suit our statistical paradigm. Are the statistical tools we used for the research still OK? Is the founding assumption still valid? Can we still ignore time? Our answer is clearly “NO” because we are looking for a change over time! So can we assume the measurements are independent – again our answer is “NO” because for a process the measurement we make now is influenced by the system before, and the same system will also influence the next measurement. The measurements are NOT independent of each other.

Our statistical paradigm suddenly falls apart because the founding assumption on which it is built is no longer valid. We cannot use the statistics that we used in the research when we attempt to apply the output of the research to the Real World. We need a new and complementary statistical approach.

Fortunately for us it already exists and it is called improvement statistics and we use it all the time – unconsciously. No doctor would manage the blood pressure of a patient on Ward A based on the average blood pressure of the patients on Ward B – it does not make sense and would not be safe. This single flash of insight is enough to explain our confusion. There is more than one type of statistics!

New insights also offer new options and new actions. One action would be that the Academics learn improvement statistics so that they can understand better the world outside research; another action would be that the Pragmatists learn improvement statistics so that they can apply the output of well-conducted research in the Real World in a rational, robust and safe way. When both groups have a common language the opportunities for systemic improvment increase.