Healthcare Data QA
This website provides an overview of the software processing of medical data, with an emphasis on the traps that are often present.
© 2022 Kevin Pardo

Software Design

Design Basics: To manage the care of a population, the following are examples of what you will want at the beginning of a project:
The less defined the goals, attributes, and functionality, the more developers will be pushed to work 24/7. The client must take responsibility for providing, or at least reviewing, the data which drives the processing of patient data. The names of labs, diagnoses, and medications to be processed must be reviewed by the client whenever possible.

Reasonable Requirements: It is important that software requirements be realistic. Managers on both the client side and the development side often underestimate how messy the work will be, and accept intricate, impractical requirements. It is also common for the client to be unable to articulate complex processing steps. Even if you, as a developer, draft a list of diagnoses for the software to process, the client may be too busy to approve it.
Reasonable Workloads: Medical professionals doing data entry should not be expected to add hours of new tasks to support the project. Likewise, software developers should not have to edit large files manually with each data harvest. It is easy for project managers to drift into fantasy and overload users and developers alike. Medical professionals already consider EHRs to be burdens which interfere with patient care.

Processing Model: Approaches to processing large amounts of data vary, but a simple model is:
Patient IDs: One normalization task is to assign a globally unique ID to each patient. For example, an EHR might have a public MRN (medical record number) as well as an internal ID. Some systems have three or more patient ID values. For patient Jane Smith:
Related to the above is the task of merging patient data which originates from different environments. An MRN from one source will not match an MRN from another source. (Typically, a single MRN value will identify a different patient in each EHR harvested.)
Additionally, a single source, such as one hospital's EHR, will often assign one individual multiple MRN values over time. A patient who changes his or her family name and does not use a medical facility regularly will be at risk of receiving two MRN values.
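As a sketch of the idea, and nothing more, a global ID can be assigned by keying on the source system plus the source MRN. All of the names and structures below are hypothetical; real matching usually also compares demographics (name, date of birth) to catch one patient holding multiple MRN values.

```python
# Sketch: assign a globally unique patient ID across source systems.
# Hypothetical structures; real pipelines also match on demographics.

def build_global_ids(records):
    """records: iterable of (source_system, mrn) tuples.
    Returns a dict mapping (source_system, normalized_mrn) -> global ID."""
    mapping = {}
    next_id = 1
    for source, mrn in records:
        key = (source, mrn.strip().upper())  # normalize the MRN text upstream
        if key not in mapping:
            mapping[key] = next_id
            next_id += 1
    return mapping

ids = build_global_ids([
    ("hospital_a", "12345"),    # same MRN text...
    ("hospital_b", "12345"),    # ...but a different patient in another EHR
    ("hospital_a", " 12345 "),  # same patient, messy whitespace
])
```

Here the two `hospital_a` rows collapse to one global ID, while the identical MRN text from `hospital_b` correctly receives its own.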
Clean Data Values Upstream: To ensure clean data, consider upstream updates to string values such as:
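For illustration, a minimal upstream cleaning pass might look like the following. The specific rules (trimming, collapsing whitespace, case-folding) are assumptions for the sketch; the right rules depend on the client's data.

```python
import re

def clean_value(text):
    """Normalize a raw string field: trim, collapse internal
    whitespace, and unify case so joins and QA comparisons match."""
    if text is None:
        return ""
    text = text.strip()                # drop leading/trailing whitespace
    text = re.sub(r"\s+", " ", text)   # collapse runs of spaces and tabs
    return text.upper()                # case-fold for comparison keys

# Messy inputs that should compare equal after cleaning:
assert clean_value("  Jane\tSmith ") == clean_value("JANE SMITH")
```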
When accessing data prepared by a client or co-worker, we often assume that the data will be "clean." This assumption can cause a variety of processing errors.

Determining Active Patients: Identifying and processing only "active" patients is a common requirement. Deceased patients aside, many patients exit and return to a given healthcare environment. Care must be taken not to delete data permanently for "inactive" patients. (Ignoring all 2021 data for an inactive patient, in some processing schemes, may cause the 2021 data to be lost if the patient returns in 2022.)

Be Alert for Obscured Data: Ideally, problem lists are a subset of all patient diagnoses, but in reality providers do not enter the important diagnoses in both places. This is a result of people having both "summary" and "everything" buckets; discrepancies are inevitable. Also, diagnoses used to justify procedures may be placeholders, though medical organizations seem to have tried to reduce these "fake" diagnoses in recent years. Do not be surprised if important data is "hidden" by the way people enter data into an EHR.

Data Often Trickles into an EHR: Procedure data often enters a system late because billing processes are typically slow. Lab results and scanned documents may also take time to make it into the EHR. If a client provides data in monthly intervals, containing one month's data at a time, there should be an agreement on how much data it is acceptable to drop. It may be best for clients to provide several years' worth of procedure data at harvest time, not just values for the last month.

Data Should be Approved: Providers may be slow to sign off on encounter findings, meaning unapproved data lingers for days or weeks. Providers often need to be encouraged to approve their encounter records within a few days, and you may receive unapproved data during harvests.

One Person Should Approve the Data Schemas: Tables and columns designed by groups are often a mess, and usage of the data suffers horribly.
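The warnings about inactive patients and late-arriving data can be sketched as a merge step that only ever adds rows, never discards history for patients absent from the current harvest. The structures below are invented for illustration; a real pipeline would work against database tables rather than dictionaries.

```python
# Sketch: fold one month's harvest into accumulated data without
# losing history for patients who do not appear in this month's file.

def merge_harvest(master, monthly):
    """master, monthly: dict of patient_id -> list of (date, procedure).
    Returns a new dict with monthly rows added; prior history is kept."""
    merged = {pid: list(rows) for pid, rows in master.items()}
    for pid, rows in monthly.items():
        existing = set(merged.get(pid, []))
        merged.setdefault(pid, [])
        # Append only unseen rows; late-arriving duplicates are common.
        merged[pid].extend(r for r in rows if r not in existing)
    return merged

master = {"P1": [("2021-05-01", "colonoscopy")]}   # inactive in 2022
monthly = {"P2": [("2022-01-10", "ekg")]}
merged = merge_harvest(master, monthly)
# P1's 2021 history survives even though P1 is absent this month.
```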
One person should write, or at least approve, all normalized data tables. It is wise to have one person approve naming conventions for derived data as well. With an unsupervised group, eventually you will find that you have created a junkyard. (Note that "supervision" should be done by someone with hands-on experience, not just academic degrees or management titles. An inexperienced manager who simply shouts "QUALITY," "DESIGN," and "DEADLINE" day and night will wreak havoc on the database schema.)

Version Control and Backups: Source code should be under version control or backed up carefully. Both the main and helper databases should be backed up as well. Note that the people actually performing the backups may not back up as much data as developers expect. The backup staff may also scale back, or even stop providing, backup and restore services without notifying anyone. It sounds unbelievable, but IT departments can sometimes be dysfunctional.

Designs Should be Simple: Young developers often grab the latest libraries from the Internet, and even the developers who select the libraries often have no experience with them. This is fine for a college student on summer break, but a nightmare for projects which are supposed to be robust. Security may be an issue as well: even commonly used software libraries have so many layers that major security holes pop up without warning.

Separate Harvesting from Processing: Loading and processing are both error prone and require a lot of work and debugging. It is tempting to apply some processing rules while loading data, but this means that changes to the business rules require large reprocessing operations. Often EHR database harvests are only allowed late at night or on weekends, so fixing even a minor processing bug may cause project delays. Legitimate data transformations during harvesting may include:
The above transformations are basic and can often be implemented in simple harvesting tools. Consistent MRN values in upstream data will greatly increase the ability to run QA comparisons on tables.

Performance: Ensure the database server application has enough RAM and that the basic database server configuration has been updated from the defaults. Focus on indexes to boost performance. Indexes may need to be dropped before some bulk operations, such as inserts. Some SQL library parameters, such as those for Java's JDBC, may need to be changed to minimize program read times and prevent aborts on slow queries. Parameters include:
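The index advice above can be sketched concretely. The example below uses SQLite for portability (table and index names are invented); the pattern of dropping an index before a bulk insert and rebuilding it once afterward applies to most database servers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE labs (patient_id TEXT, test_name TEXT, value REAL)")
con.execute("CREATE INDEX idx_labs_patient ON labs (patient_id)")

rows = [("P%d" % i, "A1C", 5.0 + i % 3) for i in range(10000)]

# Drop the index before the bulk insert, then rebuild it once,
# instead of updating the index row by row during the load:
con.execute("DROP INDEX idx_labs_patient")
con.executemany("INSERT INTO labs VALUES (?, ?, ?)", rows)
con.execute("CREATE INDEX idx_labs_patient ON labs (patient_id)")
con.commit()

count = con.execute("SELECT COUNT(*) FROM labs").fetchone()[0]
```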
Custom programming to increase the performance of a database server is usually a mistake. Most of us experiment with it at some point, but it means that key processing is broken out from the rest of the data work. QA will be difficult, and other developers will not be able to maintain the code easily. In many cases, it means that data in the database cannot easily be compared against data in custom caches. Custom utilities to read data files, as mentioned earlier, may be justified. Keep life simple, and use SQL for most data transformations. (Executing SQL in a simple framework with macros and reporting may be helpful, but most data operations should be performed by the database using SQL.)
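The "use SQL for most data transformations" advice might look like the following sketch, again using SQLite with invented table and column names: the cleanup runs inside the database, where it stays visible and easy to QA, rather than in custom application code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (mrn TEXT, family_name TEXT)")
con.executemany("INSERT INTO patients VALUES (?, ?)",
                [(" 001 ", "smith"), ("002", " Jones")])

# Perform the transformation in SQL rather than in a custom cache:
con.execute(
    "UPDATE patients SET mrn = TRIM(mrn), family_name = UPPER(TRIM(family_name))"
)

names = [r[0] for r in
         con.execute("SELECT family_name FROM patients ORDER BY family_name")]
# names == ['JONES', 'SMITH']
```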