![]() |
Healthcare Data QA
This website provides an overview of the software processing of medical data, with an emphasis on the traps that are often present. |
| Home Introduction Software Design Basic Obstacles Data Input Problems Human Obstacles EHR DataBases CSV Files XML Files Reports Statistics Legal Other © 2022 Kevin Pardo | EHR DatabasesWe initially assume that big-name product databases are organized and easy to use. In reality, they are often mixtures of data graveyards and quick-and-dirty additions. Here are notes about what data should be like, and what it is in the real world. The emphasis is on identifiers.Column Names: The following are reasonable column names for a patient's first name:
Eventually you will encounter data which should never have made it to production:
Duplicate columns have a tendency to propagate in derived data. People are rushed, and they begin to duplicate columns for each usage. This is horrifying, but real. All of the earlier examples of bad column names assume that the column actually contains what its
name implies.
In reality, especially early in a project, you may find that a "name_first" column actually
is filled with values such as: Smith, Cooper, Brown, Gonzales, and Johnson
My own preference for name columns is:
Ambiguous Usage: Sometimes a column has a clear name, but its usage
by the EHR software and users is ambiguous.
For example: diagnoses_start, diagnosis_made, diagnosis_entered, and diagnosis_updated
An end result, for a downstream table, might be:
diagnoses_date=LEAST(diagnosis_start,diagnosis_made,diagnosis_entered,diagnosis_updated) The LEAST() function exists in PostGres/PostgreSQL and is helpful. Data doesn't Match the Column: As mentioned earlier, you may find that phone_work contains email values. Multiple Key Columns for Essentially the Same Data: MRN (Medical Record Number, external ID) and patient_id (internal ID) may both identify patients. While a particular schema design may be justified in having both internal and external ID values, individual tables may almost randomly use one or the other, or both. The names can also vary. For example, you might eventually find that patient_id is pid, pat_id, or even id. (I may even have seen an EHR in which pid sometimes represented patient ID while at other times it represented provider id.) Table Naming Problems: There are many ways in which tables can be difficult to navigate. Examples include:
|