Healthcare Data QA
This website provides an overview of the software processing of medical data, with an emphasis on the traps that are often present.  
Home
Introduction
Software Design
Basic Obstacles
Data Input Problems
Human Obstacles
EHR DataBases
CSV Files
XML Files
Reports
Statistics
Legal
Other


© 2022 Kevin Pardo
    

Reports

It is helpful to have both batch reports and interactive reports. The batch reports can summarize processing and include custom queries. These are helpful to review upon processing completion and if errors are suspected. Interactive reports are helpful when a client has a question regarding the processing of a set of patients. Data for a large number of patients can be reviewed quickly with the client.

Tallies of Recently Harvested Data: Simple as these queries are, they do identify the most common type of problems. For example:

  • How many vitals values, by normalized tag/name, have come in over the last three months?
  • How many lab values, by normalized tag/name or type, have come in over the last three months?
  • How many diagnoses have come in over the last three months?
  • How many procedures have come in over the last three months?
Ideally the data is graphed. Note that lab values and procedure data will tend to be entered after the event date. Procedure data will be especially slow to reach the system because billing departments usually add processing on top of what was entered by providers.

A simple query to help check that recent labs have been harvested is:

SELECT lab_normal_tag,lab_raw_tag,
	DATE_PART('year',lab_collected),
	DATE_PART('month',lab_collected),
	COUNT(*)
FROM normal.lab
WHERE lab_normal_tag IS NOT NULL AND AGE(lab_collected) <= INTERVAL '3 months' 
GROUP BY 1,2,3,4
ORDER BY 1,2,3,4

Despite all of the complex ways data can be evaluated, the most important quality checks are the ones which ensure that data sources had current data and that processing completed as expected for each harvest. Complex statistics for data quality should wait until after basic harvest checks are in place.

Study the Work of Edward Tufte: Edward Tufte emphasizes reducing visual clutter while allowing details to be viewed and explored. Tufte's book The Visual Display of Quantative Information is a good start. Website: www.edwardtufte.com/tufte/

The unformatted report, which often crops up, is like a piece of concrete. The second, following some of the guidelines and ideas of Tufte, is easy to skim. If you are a worker who actually reads thousands of lines of reports per week, you will appreciate reports which follow Tufte's guidelines.

Of course there is only so much time in the work week. A general set of rules will be easier to implement than one specific to the report. Flagging a high A1C value, for example, will require custom programming or rule setting for the report column. It is often not practical to do this.

Note that flagging a value as out-of-range should be done with the formal approval of a medical professional. A report which "interprets" values in ways not approved by a medical professional can be a violation of basic healthcare laws.

For reports visible to large audiences, you may want to provide the option of an unformatted report. Though many team members will expect or even demand helpful formatting, some will insist on a basic view. Expect occasional friction between "the enlightened" and the narrowly educated. At least report discussions will be less damaging to a project than UI discussions.

Software libraries meant to help people churn out reports may make custom formatting quite difficult. Your life may be better without such custom libraries.

Processing Reports: Reports generated by the steps for harvesting and processing data are quite helpful. Content can include:

  • The SQL of statements executed.
  • The number of lines inserted, updated, or deleted.
  • Error messages with stack traces.
  • Custom SQL queries as a form of regression testing.
  • Summaries of tables and columns, including top N, min, max and the number of null/empty values.
The intent is to be able to quickly recognize major failures as well as to be able to begin detailed reviews quickly. Standard reports assist with this.

If batch reports are collected on a central server/VM, it is possible to keep "harvesting VM's" off most of the time. Preliminary inspections, and sometimes fairly detailed QA checks, can be performed on the reports on the central server.

Standard Query Groups: When working with a client, it is important to get information on a given patient quickly. Common tasks include:

  • To check active/inactive logic for patients, view the deceased date and last visit date.
  • To check for dropped values labs, trace data through lab staging and roster tables.
  • To check concept logic, output the data which drives a group of derived concepts.
Software systems are commonly "black boxes." Inexperienced developers often do not add visibility to software processing and cannot quickly trace data through a system. If you have queries ready to run for any MRN, however, you can quickly confirm or dismiss problems. You may be able to identify the cause of major bugs on the fly.

In some cases it may appear that you have not harvested required data. Reasons may include:

  • MRN values had unexpected variability in formatting, causing data to be dropped.
  • The client assumes that you are harvesting a data source which you do not yet use.
  • The data was unusual and was rejected at harvest time.
  • The data your client believes is missing, such as a scanned document summary, did not arrive until after its expected harvesting time.
If your client believes that data is missing, you should investigate. Small amounts of lost data should not keep you up until dawn, however. You can't be expected to provide immediate fixes when data is varied and filled with typos.

Lookup Reports: The names and tags which are compared against lookup rules change over time. It is also possible that important names and tags get missed at the beginning of a project. Helpful queries for QA reports include:

  • The count of names and tags which drive lookup results.
  • The most common names and tags which do not match any lookup rules.
For example, there should be reviews of which labs match LDL cholesterol as well as the most common labs which are being ignored.

Even at the start of a project, reviews may be few. This is real life. People are busy, and many people don't want to do detailed work. At least if you circulate lookup reports, you will have a record of attempting to ensure high quality results.