Technical Concepts

What is Data Provenance & Why is it Important in the Clinical Diagnostics Space?

by The Semaphore Team

In clinical diagnostics labs, there’s a pressing need for data provenance—tracing the origin and changes over time of critical data such as electronic health records, analytical results, and workflow records.

“Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness.” – The World Wide Web Consortium (W3C), PROV-Overview

Data in laboratories is often produced by separate information systems, which can make it difficult to trace. When the systems aren’t connected, it can be even more challenging to maintain the data’s chain of custody and the metadata records.

The result? Stakeholders can’t generate a report that describes exactly what happened to a sample, such as which agents¹ and activities² were involved, and at what times. This information about the data (the metadata) provides granular details that might previously have been collected in a notebook or file.

For example, to produce a full provenance record for a measured volume, a lab needs to show how it knows that the volume was measured. It also needs to be able to answer questions such as:

  • Who recorded that volume?
  • When did they record it?
  • Where was the measurement performed (at which facility)?
  • What system and instrument(s) were used?
  • Why were they measuring it? Was it part of a standard operating procedure (SOP)? What was the name of the SOP step that they were performing?

Answering these questions efficiently requires that metadata for each specimen be recorded and stored in a standardized way so that it can be easily reviewed. No regulatory or accreditation framework (such as the FDA, CAP, or CLIA) currently dictates how this metadata is captured for clinical laboratories. However, once data provenance is better supported in modern laboratory software, we predict that detailed traceability will become a requirement for clinical software. We recommend preparing for this sooner rather than later.
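
As a rough illustration of what "recorded and stored in a standardized way" could look like, the sketch below captures the who, when, where, what, and why of a single volume measurement in one structured record. The class name, field names, and example values are all hypothetical and are not taken from any particular LIMS or standard.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class VolumeMeasurement:
    """One measured volume plus the metadata needed to explain it (hypothetical schema)."""
    specimen_id: str
    volume_ul: float
    recorded_by: str              # who recorded the volume
    recorded_at: datetime         # when it was recorded
    facility: str                 # where the measurement was performed
    system: str                   # what software system captured it
    instruments: Tuple[str, ...]  # what instrument(s) were used
    sop_name: str                 # why: the SOP being followed
    sop_step: str                 # why: the SOP step being performed

    def to_json(self) -> str:
        """Serialize to JSON with an ISO 8601 timestamp so the record is easy to review."""
        record = asdict(self)
        record["recorded_at"] = self.recorded_at.isoformat()
        return json.dumps(record, sort_keys=True)


# Example values are invented for illustration only.
measurement = VolumeMeasurement(
    specimen_id="S-0001",
    volume_ul=50.0,
    recorded_by="tech-017",
    recorded_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    facility="Main Lab",
    system="example-lims",
    instruments=("pipette-P200-03",),
    sop_name="DNA Extraction",
    sop_step="Measure eluate volume",
)
print(measurement.to_json())
```

Making every field mandatory means a record cannot be created without answering each question, which is one simple way to keep the metadata complete at the point of capture.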

Why is data provenance so important in clinical diagnostics?

There are a number of reasons why labs should place a high priority on addressing data provenance. For instance:

  1. Auditability, transparency, and trust in the software are critically important for labs dealing with sensitive personal data.
  2. Regulations require that private patient data and laboratory records are handled securely and tracked in case follow-up is required. For next-generation sequencing (NGS) analyses, the College of American Pathologists requires that all information used to process a patient sample (such as reagents, primers, sequencing chemistries, and platforms) be documented so that details can be extracted. In data provenance terms, these can be thought of as the “agents and activities acting upon” the patient data.
  3. Tracking provenance can help with the interpretation of data and ensuring its trustworthiness.
  4. Provenance can be used to help labs analyze whether processes were performed efficiently.
  5. Clinical research relies on collaboration and reproducibility. Data provenance supports this by providing all the data necessary to reproduce the lab’s findings.
  6. Patient safety is critically important. If a serious adverse event occurs, involved laboratories might need to perform a post-hoc investigation to see how a result they produced could have contributed to this event. Consider how well your laboratory software stack might support this type of investigation.
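
To make the post-hoc investigation scenario more concrete, here is a minimal sketch of how a reported result could be traced back through the activities and agents that produced it, assuming provenance has already been captured as simple linked records. The `PROVENANCE` log, its identifiers, and the `trace_lineage` helper are hypothetical and only illustrate the idea.

```python
from collections import deque

# Hypothetical provenance log: each derived entity maps to the activity that
# generated it, the responsible agent, and the input entities that were used.
PROVENANCE = {
    "result:R-1": {
        "activity": "variant-calling",
        "agent": "pipeline-v2",
        "used": ["reads:S-1"],
    },
    "reads:S-1": {
        "activity": "sequencing",
        "agent": "sequencer-A",
        "used": ["library:S-1"],
    },
    "library:S-1": {
        "activity": "library-prep",
        "agent": "tech-017",
        "used": ["sample:S-1", "reagent:lot-42"],
    },
}


def trace_lineage(entity, log):
    """Walk breadth-first from an entity back through everything it was derived from.

    Returns a list of (entity, activity, agent) tuples, nearest step first.
    Entities with no record in the log (e.g. the original sample) end the walk.
    """
    steps = []
    seen = {entity}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        record = log.get(current)
        if record is None:
            continue  # no generating activity recorded for this entity
        steps.append((current, record["activity"], record["agent"]))
        for parent in record["used"]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return steps


for entity, activity, agent in trace_lineage("result:R-1", PROVENANCE):
    print(f"{entity} was produced by {activity} (agent: {agent})")
```

If the records are spread across disconnected systems, a walk like this stops at the first missing link, which is exactly the gap an investigator would run into.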

Looking ahead

In an ideal world, laboratory informatics systems would be able to generate and interact with data that adheres to provenance standards, such as W3C PROV.³ What we’d like to see, eventually, is the ability for labs to immediately access all metadata records linked to a sample directly from within the laboratory information management system (LIMS). Unfortunately, that’s not possible yet using an off-the-shelf LIMS. There are a lot of obstacles to overcome before a universal provenance standard is adopted and all healthcare data formatting is harmonized.
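
For a sense of what PROV-style data might look like, the sketch below describes a volume measurement using the core PROV concepts of entities, activities, and agents, linked by the PROV relations `used`, `wasGeneratedBy`, and `wasAssociatedWith`. The dictionary layout is only loosely modeled on the PROV-JSON serialization and has not been validated against it; the `lab:` prefix, its URL, all identifiers, and both helper functions are assumptions made for illustration.

```python
def build_prov_document():
    """Return a small PROV-style document as plain Python dicts (illustrative only)."""
    return {
        "prefix": {"lab": "https://example.org/lab/"},  # placeholder namespace
        "entity": {
            "lab:sample-S1": {},
            "lab:volume-S1": {"lab:value_ul": 50.0},
        },
        "activity": {
            "lab:measure-volume": {
                "prov:startTime": "2024-01-15T09:30:00+00:00",
                "prov:endTime": "2024-01-15T09:31:00+00:00",
            },
        },
        "agent": {"lab:tech-017": {}},
        # The activity used the sample ...
        "used": {
            "_:u1": {"prov:activity": "lab:measure-volume", "prov:entity": "lab:sample-S1"},
        },
        # ... generated the recorded volume ...
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": "lab:volume-S1", "prov:activity": "lab:measure-volume"},
        },
        # ... and was carried out by the technician.
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": "lab:measure-volume", "prov:agent": "lab:tech-017"},
        },
    }


def who_and_what(doc, entity_id):
    """Return (activity, [agents]) pairs for the activities that generated entity_id."""
    results = []
    for generation in doc["wasGeneratedBy"].values():
        if generation["prov:entity"] != entity_id:
            continue
        activity = generation["prov:activity"]
        agents = sorted(
            assoc["prov:agent"]
            for assoc in doc["wasAssociatedWith"].values()
            if assoc["prov:activity"] == activity
        )
        results.append((activity, agents))
    return results


print(who_and_what(build_prov_document(), "lab:volume-S1"))
```

Because the relations are stored separately from the things they connect, the same structure can answer "which agents and activities were involved?" for any entity without changing the schema.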

In the meantime, labs can work with a software consultant to integrate the various components of their informatics systems and provide more robust data provenance. When you’re selecting a new vendor or consultant, be sure to confirm that they understand the importance of data provenance as a functional requirement for clinical diagnostics software. Custom clinical software should always be built with data provenance in mind.

Ontology⁴ is a related concept, which we’ll explore in our next post.

_________

¹ An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity.
² An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
³ The World Wide Web Consortium (W3C) has created PROV, a set of recommended standards, to support the interchange of provenance information on the Web.
⁴ An ontology is a formal naming of a set of concepts (similar to a dictionary) and the relationships between them that helps provide context to the data. It’s closely tied to provenance: where data comes from and what happened to it. Published ontologies help structure data by connecting the individual pieces.
