From Our Print Archives

Analytical data repository measures data integrity

By Steve Peebles and Sandra Hastings

With multiple hospitals and related facilities supported by dozens of clinical and financial information systems, business decision-makers are often data rich, but information poor. Such was the case in 1992, when the Presbyterian Healthcare System, Dallas, began the development of the Analytical Data Repository (ADR).

The primary role of the ADR is to support business and clinical analysis by providing a tool for storing and retrieving the data elements critical to that analysis. Specific data elements are extracted from various hospital and financial systems across the organization, based on a data strategy which holds that 80 percent of business and clinical questions can be answered with 20 percent of the data. The result is an efficient, cost-effective tool that provides management with relevant information on which to base business decisions. The value of this information is directly related to the quality of the data obtained from the various source systems; therefore, the ADR has also become a convenient tool for measuring the accuracy of the data and of the data collection process.

The ADR has addressed data quality issues through the development of detail error reports, which identify specific data elements on individual accounts that require correction. The reports are run monthly, following the data load process that inserts or updates final billed patient data derived from the various hospital and financial systems. These detail reports are returned for correction to the hospital departments responsible for data entry, such as admissions and medical records. The departments correct the errors in the source systems, and the corrections reach the ADR through its next monthly data load.

In addition, the ADR produces summary error reports that help analysts identify where data integrity issues occur. A case in point is the ADR master patient index (MPI), which links individual patient encounters across facilities and years. The MPI relies on the accuracy of the Social Security number, the error rate of which varies by facility and year. Analysts add this type of information as notes to their studies to clarify the degree of completeness or accuracy of the data.
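
As a minimal sketch of how such a summary report might be tallied, the Python below counts missing or malformed Social Security numbers per facility and year. The field names (facility, year, ssn) and the simple nine-digit format check are assumptions made for illustration; the article does not describe how the ADR implements this report.

```python
import re
from collections import defaultdict

# Assumed format check: nine digits, with or without dashes.
# This tests only the shape of the value, not whether the number is real.
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")


def ssn_error_rates(encounters):
    """Return {(facility, year): share of encounters with a missing or malformed SSN}."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for enc in encounters:
        key = (enc["facility"], enc["year"])
        totals[key] += 1
        ssn = (enc.get("ssn") or "").strip()
        if not SSN_PATTERN.fullmatch(ssn):
            bad[key] += 1
    return {key: bad[key] / totals[key] for key in totals}
```

An analyst could attach the resulting rates to a study as a note on how reliable the MPI linkage is for each facility and year.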

Most data integrity edits performed within the ADR fall into two categories: structural integrity and clinical integrity. The simplest edits are those involving structural integrity, where both missing data and invalid codes are identified. A data element is considered missing when a value is necessary to comply with external or internal requirements, as with a missing principal diagnosis code. An invalid code occurs when the data element's value is not contained in the corresponding reference table, as with a zip code that is not recognized by the U.S. Postal Service. These edits are generally automated in the ADR because the errors are easily identified and tend to occur in greater numbers.
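
The two structural checks described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ADR's actual logic: the record layout, the REQUIRED_FIELDS tuple and the tiny VALID_ZIP_CODES reference set are all hypothetical.

```python
# Hypothetical reference table; a real one would be loaded from a maintained source.
VALID_ZIP_CODES = {"75201", "75231", "75093"}

# Fields assumed to be required on a final billed account (illustrative only).
REQUIRED_FIELDS = ("principal_diagnosis", "zip_code")


def structural_errors(record):
    """Return structural integrity errors (missing data, invalid codes) for one record."""
    errors = []
    # Missing data: a required element has no value.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"missing:{field}")
    # Invalid code: a value is present but absent from the reference table.
    zip_code = str(record.get("zip_code") or "").strip()
    if zip_code and zip_code not in VALID_ZIP_CODES:
        errors.append("invalid:zip_code")
    return errors
```

For example, a record with an empty principal diagnosis and a zip code of "00000" would yield one missing-data error and one invalid-code error, while a complete record with a listed zip code would yield none.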

Often, these errors are due to the lack of constraints applied to data entry, which can be a point of contention because data entry edits have often been criticized for slowing down the admission process or delaying access to care. Many hospital information systems (HIS) have reduced data entry edits to streamline the process, but unfortunately this burdens the output processes, such as billing, with inaccurate information. Because the ADR data integrity edits are applied post final bill, the resulting reports are an indication of the quality of completed patient data and the effectiveness of the data collection process.

The more difficult edits are those related to clinical integrity, where the relationships between data elements are evaluated to identify inconsistent data values. These errors take many forms, such as a female-related diagnosis coded on a male patient. Generally, these errors are detected as a by-product of analyses that apply a "reasonableness or appropriateness" test to the data. Because the data are neither missing nor invalid, data entry constraints have little effect in eliminating these errors. The usual approach is to establish departmental procedures that, when adhered to, reduce the potential for error.
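
A clinical integrity edit compares two individually valid values against each other. The Python sketch below shows the sex-versus-diagnosis example; the field names and the FEMALE_ONLY_DIAGNOSES set (which uses placeholder codes, not real diagnosis codes) are assumptions for illustration only.

```python
# Placeholder codes standing in for diagnoses that apply only to female patients.
# A real rule set would be maintained by clinical coding staff.
FEMALE_ONLY_DIAGNOSES = {"DX_F01", "DX_F02"}


def clinical_errors(record):
    """Return clinical integrity errors: values that are valid alone but inconsistent together."""
    errors = []
    if (
        record.get("sex") == "M"
        and record.get("principal_diagnosis") in FEMALE_ONLY_DIAGNOSES
    ):
        errors.append("inconsistent:sex_vs_diagnosis")
    return errors
```

Each field here would pass a check of its own value, which is why constraints on individual data entry fields do little to prevent this class of error.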

A procedure may involve the use of "exception" codes, such as a non-staff physician ID for identifying physicians who would not normally have a facility identifier. But when such a code is used to avoid looking up a staff physician's identifier, the departure from procedure skews the statistics for the "real" physician. While deliberate procedure violations may occur, high turnover and poor training are more likely to undermine data entry procedures.

Checks for clinical integrity errors identified through analysis may occasionally be added to the ADR automated reports, such as the check for a female-related diagnosis coded on a male patient. But the cause of the error may make this impossible, as in the case of two physicians with the same last name but different specialties. The analyst will discover that the wrong physician was identified with a group of patients, based on either the nature of the relationship or the procedures performed, but the situation is relevant only to that specific physician.

With the proliferation of departmental systems linked through the use of multiple interfaces, there is also the potential of errors being introduced into otherwise clean data. Interface-induced errors can affect a large number of patients, and can go unnoticed until some output process utilizes the data.

Similar to the "Telephone Game," where a person whispers a message to another who then passes it on to a third person, each introducing minor changes to the message, each interface increases the complexity of transferring data, which in turn creates greater risks. To reduce this risk, the ADR data elements are extracted from the originating source system where possible. This reduces the risk of interface-related errors and in some cases provides a means of detecting errors through the matching of common data element values derived from separate source systems. Most interface-induced errors are structural and can be readily identified by the ADR. Clinical integrity errors, being more subtle in nature, usually require detection by analysts.
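
The matching of common data elements across source systems could look something like the Python sketch below. It assumes, hypothetically, that each source system's extract is a dictionary keyed by account number and that the fields to compare are supplied by the caller; none of these names come from the ADR itself.

```python
def cross_source_mismatches(source_a, source_b, fields):
    """Compare shared fields for accounts present in both source systems.

    Returns a list of (account, field, value_in_a, value_in_b) tuples.
    """
    mismatches = []
    for account, rec_a in source_a.items():
        rec_b = source_b.get(account)
        if rec_b is None:
            continue  # account not present in the second system; nothing to compare
        for field in fields:
            if rec_a.get(field) != rec_b.get(field):
                mismatches.append((account, field, rec_a.get(field), rec_b.get(field)))
    return mismatches
```

A disagreement between two systems does not show which value is wrong, only that one of them, or an interface between them, changed the data.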

A concentrated effort to reduce errors detected by the ADR was initiated late in 1995, involving hospital departments responsible for data entry, information services and detail reports from the ADR. The accompanying graph shows that at the end of the first full year, nearly 25 percent of outpatients and 8 percent of inpatients of one of the system's major hospitals had at least one identifiable data integrity error that was not corrected.
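
The measure quoted above, the share of patients with at least one uncorrected error, can be computed as in the following Python sketch. The input shape (pairs of patient type and error list) is an assumption for illustration, and any sample data fed to it would be invented rather than the hospital's figures.

```python
from collections import defaultdict


def percent_with_errors(accounts):
    """Return {patient_type: percent of accounts with at least one error}.

    `accounts` is an iterable of (patient_type, errors) pairs, where `errors`
    is the list of uncorrected data integrity errors for that account.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for patient_type, errors in accounts:
        totals[patient_type] += 1
        if errors:  # count the account once, however many errors it has
            flagged[patient_type] += 1
    return {ptype: 100.0 * flagged[ptype] / totals[ptype] for ptype in totals}
```

Counting accounts rather than individual errors keeps the figure comparable between patient types with very different volumes.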

Many of these errors were related to the recent implementation of a new hospital information system. The new system reduced the number of constraints applied to data entry, required different procedures and necessitated significant retraining. The focus on the largely structural integrity errors detected by the ADR quickly reduced this error rate, which to date has been less than 1 percent for both inpatients and outpatients. Most errors have been eliminated through the correction process, but resolving the remainder completely may require substantial modifications to the hospital information system. With most structural integrity errors being identified and corrected, more attention can be paid to clinical integrity errors in the future.

ADR detail error reports are a link in the process of error detection and correction, but the effect of the reports on most hospital departments is one of a "reactive response." The volume of daily activity coupled with making corrections leaves them little time to research and address the cause of the errors. Although some departments have adjusted procedures or training in an attempt to reduce or eliminate the cause of errors, generally these departments have little direct influence in affecting modifications to the hospital or financial systems that are used to enter the data. The ADR summary error reports can provide the departments with a means of focusing attention on high volume errors, but more often these are structural integrity errors that would require system modifications to effect significant changes.

A more coordinated approach is needed to proactively address both the causes and the solutions that would more consistently improve data quality and collection in the future. ADR summary reports measure the quality of data at a particular point in time, so once errors are corrected the report reflects the improvement in data quality. The report can show that the correction process appears to be improving data integrity, but only at the expense of burdening data entry departments with re-entering the data. Most data entry departments can address only internal data collection procedures and training issues; therefore, more significant issues can go unresolved.

The ADR can also quantify data collection quality by collecting statistics on detectable error rates prior to correction, but it does not have the authority to implement changes. That requires a governance group that can drive the political process, which must occur from the "top down" in the organization because effective changes to data integrity issues can involve significant financial and personnel resources. The group should be made up of executives who are both committed to and responsible for data quality and data collection. In addition, the group should be representative of the departments that collect and use the data, and of those that provide data storage and access.

The ADR is a significant tool to aid health care organizations in their endeavor to improve data quality and data collection. Successfully used in an organization, it can focus attention on both the health care business and the quality of health care data.

Steve Peebles is manager of the strategic information center at Presbyterian Healthcare System, Dallas. He is responsible for managing the database administration staff of the ADR. Sandra Hastings is manager of strategic information resources, responsible for managing the analytical staff that uses the ADR.


© 2017 ADVANCE Healthcare, an Elite CE company