Healthcare Data QA
This website provides an overview of software processing of medical data, with an emphasis on the traps that commonly occur.


2022 Kevin Pardo
    

Stats

Be careful of junk input, improper statistics, and wild inferences. The ways data can be corrupted are limitless, as are the ways in which results can be misused.

Only Similar Entities Should be Compared: Comparing the overall health of a pediatrician's patients to that of a geriatrician's patients will make the pediatrician appear to be a miracle worker and the geriatrician a quack, because children have far fewer chronic conditions than elderly adults. Even comparing the patient populations of two doctors with similar specialties is difficult. If one was on sabbatical for several months, that provider's number of office visits may appear quite low. Basic differences like these should make people cautious about using statistics, but in reality many people run wild with numerical data.
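For the sabbatical case, one partial fix is to divide visit counts by the months a provider was actually seeing patients rather than by the calendar period. The following is a minimal Python sketch, assuming a hypothetical `ProviderPeriod` record that holds a visit count and the number of active months; the record, the function name, and the per-month rate are illustrative and not taken from any particular EHR.

```python
from dataclasses import dataclass


@dataclass
class ProviderPeriod:
    """Hypothetical summary of one provider's activity over a reporting period."""
    name: str
    office_visits: int
    active_months: float  # months actually seeing patients (excludes sabbatical or leave)


def visits_per_active_month(p: ProviderPeriod) -> float:
    """Office visits per month of actual practice, so time away is not mistaken for low output."""
    if p.active_months <= 0:
        raise ValueError(f"{p.name}: no active months, so the rate is undefined")
    return p.office_visits / p.active_months


if __name__ == "__main__":
    # Made-up figures for illustration only.
    providers = [
        ProviderPeriod("Provider A", office_visits=3000, active_months=12),
        ProviderPeriod("Provider B", office_visits=1500, active_months=6),  # six-month sabbatical
    ]
    for p in providers:
        rate = visits_per_active_month(p)
        print(f"{p.name}: raw visits={p.office_visits}, visits per active month={rate:.1f}")
```

With these made-up figures, both providers come out at 250 visits per active month even though their raw counts differ by half. This adjusts only for time away; it does nothing about differences in patient mix, such as the pediatrician versus geriatrician case.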

Data Pools for Different Purposes: A "Diabetic Control Report for Dr. Brown" may have completely different patient selection criteria than a "Diabetic Control Report for Dr. Lee." Third parties should not attempt to interpret these reports without knowing how they were constructed. In practice, it is unnerving to make the data and reports you prepare available to a group, because you don't know how much caution people will exercise.

Be Careful of Gibberish Calculations: The more powerful a formula appears to be, the more closely it tends to be guarded as a prized secret. That secrecy makes junk calculations difficult for an organization to detect. If the number crunchers are young, they may celebrate incorrect results rather than investigate them.

Best Practices:

  • "Always graph your data!"
  • Investigate unexpected results.
  • Exclude obviously incorrect data values from calculations.
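A simple way to eliminate the worst data values is to restrict the calculation to a plausible range:
-- Mean and standard deviation of A1C results, ignoring values outside 1 to 20
SELECT AVG(result_float), STDDEV(result_float)
FROM normal.lab
WHERE lab_normal_tag = 'A1C'
  AND result_float BETWEEN 1 AND 20;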
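A better approach is to remove illegal rows from the data, or to flag them as invalid, so that every downstream query excludes them instead of each query repeating its own filter.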
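The flagging option might look roughly like this. It is a minimal sketch, assuming a hypothetical `is_valid` column added to the lab table, and it uses SQLite only so it runs without a database server. SQLite has no schemas and no STDDEV aggregate, so the table is simply `lab` and the standard deviation is computed in Python. The 1 to 20 bounds mirror the BETWEEN filter in the query above; the function names and sample rows are illustrative.

```python
import sqlite3
import statistics

# Plausible range for A1C results, matching the BETWEEN 1 AND 20 filter.
A1C_MIN, A1C_MAX = 1.0, 20.0


def flag_invalid_a1c(conn: sqlite3.Connection) -> int:
    """Mark A1C rows that are NULL or outside the plausible range as invalid; return the count."""
    cur = conn.execute(
        """
        UPDATE lab
           SET is_valid = 0
         WHERE lab_normal_tag = 'A1C'
           AND (result_float IS NULL OR result_float NOT BETWEEN ? AND ?)
        """,
        (A1C_MIN, A1C_MAX),
    )
    conn.commit()
    return cur.rowcount


def a1c_mean_and_stdev(conn: sqlite3.Connection) -> tuple[float, float]:
    """Mean and sample standard deviation of the A1C rows still marked valid."""
    values = [
        row[0]
        for row in conn.execute(
            "SELECT result_float FROM lab WHERE lab_normal_tag = 'A1C' AND is_valid = 1"
        )
    ]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for normal.lab with an is_valid flag defaulting to valid.
    conn.execute(
        "CREATE TABLE lab (id INTEGER PRIMARY KEY, lab_normal_tag TEXT, "
        "result_float REAL, is_valid INTEGER NOT NULL DEFAULT 1)"
    )
    # Made-up rows: 56.0 stands in for a misplaced decimal point, None for a missing value.
    conn.executemany(
        "INSERT INTO lab (lab_normal_tag, result_float) VALUES (?, ?)",
        [("A1C", 5.6), ("A1C", 7.2), ("A1C", 6.4), ("A1C", 56.0), ("A1C", None)],
    )
    print("flagged:", flag_invalid_a1c(conn))
    # Flagged rows stay in the table, so they can be pulled up and investigated.
    for row in conn.execute("SELECT id, result_float FROM lab WHERE is_valid = 0"):
        print("investigate:", row)
    print("mean, stdev:", a1c_mean_and_stdev(conn))
```

Flagging rather than deleting keeps the suspicious rows on hand for the "investigate unexpected results" step, while every query that checks `is_valid` gets the same exclusions. On a server database the same UPDATE could run against normal.lab directly, and the aggregate could stay in SQL.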