3 May 2011

The Truth about Data Quality

I've found the talks at the British Computer Society (BCS) London Central Branch a bit hit and miss. A few have been far too basic to be useful and in others the speaker has not impressed me with their grasp of the topic.

But I go to the meetings because some of them are good, even very good. The Truth about Data Quality was one of those.

As a Business Analyst / Consultant I deal a lot with data and processes, and I learnt long ago that data is far and away the more important of the two. The main reason is that if the data model is correct (in business terms) it is easy to adapt processes to different needs, but if the data model is wrong no amount of playing around with processes can fix it.

All that means that a talk on data quality was guaranteed to pique my interest. And so I went.

Jon Evans' talk was in two parts. First we had a useful (i.e. more than basic) introduction to data quality and this was followed by a detailed case study from the NHS that really brought the lessons home.

It was a long talk, around 75 minutes, and in picking out a few of my personal highlights I am obviously leaving a great deal out.

The four cornerstones of Information Quality are Accuracy, Timeliness, Relevance and Completeness. Accuracy is the most important of these, as the others are meaningless without it.

We build to Accuracy through Validity, Integrity and Credibility.

Jon explained this well by successively correcting the colours, fonts, spelling and grammar in a familiar sentence. The result all seemed very sensible until Jon asked, "Does the quick brown fox really jump over the lazy dog?". The point being, this is a valid sentence but is it credible?

When looking to improve the quality of data we can use the FIRM approach: Find, Investigate, Remedy, Monitor.

At that point we moved on to the case study.

To make sense of this we first had to learn a lot about how hospitals get funded through the incorrectly named Payment by Results (PbR). This is calculated from the Healthcare Resource Group (HRG) recorded for each episode, e.g. a diagnosis or treatment given to a patient.

The HRGs are recorded by clinical coders working from the doctors' handwritten notes. Clearly there is much scope for error in this process, and a small error can make a big difference to the hospital financially. Jon gave us an example where a patient's condition had not been fully recorded; fully coding it doubled the amount paid to the hospital.

The system Jon had been heavily involved in developing with the NHS compared results between hospitals to see if they were credible, e.g. was the number of episodes of each type, as recorded in the HRGs, in line with expectations?

The funnel diagram here shows the distribution of hospitals' results and the statistical significance of this. The hospitals get detailed reports that show them their relative performance and allow them to drill down into the detail to see any anomalies. But, remember, being incredible does not mean being wrong.
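The talk didn't spell out the statistics behind the funnel, but the standard approach for this kind of chart is to draw control limits around the overall rate that narrow as a hospital's volume grows, and flag anyone outside them. A minimal sketch, with entirely made-up hospital names and episode counts:

```python
import math

# Hypothetical data: (episodes coded to one HRG, total episodes) per
# hospital. Illustrative numbers only, not real NHS figures.
hospitals = {
    "A": (200, 4000),
    "B": (190, 3800),
    "C": (300, 4100),  # deliberately high, to show up as an outlier
    "D": (195, 3900),
}

# Overall proportion of episodes coded to this HRG across all hospitals.
total_cases = sum(cases for cases, _ in hospitals.values())
total_episodes = sum(n for _, n in hospitals.values())
p = total_cases / total_episodes

def control_limits(n, z=1.96):
    """Approximate 95% limits for a hospital with n episodes.

    The standard error shrinks as n grows, which is what gives the
    funnel its shape: small hospitals get wide limits, big ones narrow.
    """
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

flags = {}
for name, (cases, n) in hospitals.items():
    lo, hi = control_limits(n)
    rate = cases / n
    flags[name] = "check" if (rate < lo or rate > hi) else "ok"
```

Here hospital C's rate sits above its upper limit and gets flagged for investigation, which is exactly the drill-down prompt the reports give. And, as the talk stressed, "check" means statistically surprising, not wrong.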

Using this analysis has helped hospitals to dramatically improve the quality of their data. The two main problems they have had to address are the quality of the original records and training for the clinical coders on the HRG framework.

It was an impressive case study but I was left with the worry that, despite the improvements made, data quality would always remain a significant issue and the inherent instability of the HRG framework (i.e. small differences in coding make a big difference in costs) means that the whole system may be invalid.

Or, as I put it in a tweet at the time: The NHS PbR is inherently bonkers.

But the basic faults in the PbR system take nothing away from the case study or from the talk. I learned a lot and that's what I went there to do.
