USA, May 28, 2026
Although AI risk is widely discussed, the discussion typically treats the model as the main object of concern. Engineers design the model architecture, train it, and put controls around it. They rarely discuss the foundation on which the model depends: data.
The question is not only whether the data is secure or large enough. Organizations also need to know where it came from, how it has changed over time, and what assumptions it carries.
For organizations that operationalize AI systems, this is where the implications of complying with the NIST AI Risk Management Framework (AI RMF) become concrete.
Data provenance and lineage have very real implications for how AI systems are built, whom they will affect, and how questions about them will be answered.
At Logicalis, we see that visibility into data sources is a common problem for AI governance programs.
Data Origin Is an AI Risk Factor
AI systems learn their patterns from data, and the accuracy of those patterns depends entirely on the data the system is built from.
Risk may be introduced if data sources are outdated, poorly understood, or combined in unintended ways.
To satisfy the AI RMF, organizations must be able to trace the sources of training data and operational data, including where the data was obtained, in what context, and with what limitations.
According to the National Institute of Standards and Technology (NIST), data quality and representativeness are central to AI risk management.
Without this visibility, organizations are unable to assess bias, relevance, or impact.
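As a rough illustration of what tracing a source can look like in practice, the sketch below stores where a dataset was obtained, in what context, and with what limitations, and reports which of those fields are still empty. The `ProvenanceRecord` class, its field names, and the sample dataset are hypothetical and are not prescribed by the AI RMF.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of where a dataset came from (not an AI RMF schema)."""
    dataset_name: str
    source: str                # where the data was obtained
    obtained_on: date          # when it was obtained
    collection_context: str    # the context in which it was collected
    known_limitations: tuple[str, ...] = ()  # documented gaps or caveats


def missing_provenance(record: ProvenanceRecord) -> list[str]:
    """Return the names of fields that were left empty."""
    gaps = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value in ("", (), None):
            gaps.append(f.name)
    return gaps


# Illustrative record with two provenance gaps.
record = ProvenanceRecord(
    dataset_name="loan_applications_2019",
    source="internal CRM export",
    obtained_on=date(2019, 6, 30),
    collection_context="",
)
print(missing_provenance(record))  # ['collection_context', 'known_limitations']
```

A check like this does not judge whether the data is fit for use. It only makes visible which questions about origin have not yet been answered.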
Data Lineage Often Breaks Over Time
Data is rarely static. It is copied, transformed, enriched, filtered, and reused across systems and teams.
Important context can be lost during this movement.
A data asset collected for an operational process with a defined and limited purpose may later be used as a basis for automated decisions with wider implications.
When complying with the AI RMF, organizations may want to record how data is transferred or changed between components. The best practices are not entirely technical. Organizations should also retain enough contextual information to explain why a dataset is appropriate for its current use.
The United States Government Accountability Office (GAO) has repeatedly identified poor data governance and data lineage tracking as technology risk factors for complex organizations.
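One way to keep that context attached to the data is to log each change together with the reason for it. The following is a minimal sketch under that assumption. The `DatasetLineage` and `LineageStep` classes, their fields, and the sample entries are hypothetical, and a real program would likely rely on a dedicated lineage tool.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStep:
    operation: str     # e.g. "copy", "filter", "join"
    performed_by: str  # team or system that made the change
    rationale: str     # why the change was made; the context that tends to get lost


@dataclass
class DatasetLineage:
    dataset_name: str
    original_purpose: str
    steps: list[LineageStep] = field(default_factory=list)

    def record(self, operation: str, performed_by: str, rationale: str) -> None:
        """Append one transformation to the dataset's history."""
        self.steps.append(LineageStep(operation, performed_by, rationale))

    def undocumented_steps(self) -> list[str]:
        """Operations that were logged without any rationale."""
        return [s.operation for s in self.steps if not s.rationale.strip()]

    def purpose_changed(self, current_use: str) -> bool:
        """True when the current use differs from the purpose at collection."""
        return current_use.strip().lower() != self.original_purpose.strip().lower()


lineage = DatasetLineage("support_tickets", "customer support routing")
lineage.record("filter", "analytics team", "drop tickets older than three years")
lineage.record("join", "ml team", "")

print(lineage.undocumented_steps())                    # ['join']
print(lineage.purpose_changed("credit risk scoring"))  # True
```

The rationale field is the non-technical part: it is what lets someone later explain why the dataset was considered appropriate for a new use.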
Historical Data Carries Historical Assumptions
Older data reflects conditions that may have changed, as may the prevailing business practices, regulations, and social expectations.
Data itself does not automatically adjust to those changes.
One of the more difficult aspects of the AI RMF is that models trained on existing datasets are not guaranteed to meet current standards, even if their performance metrics appear stable.
Data provenance awareness can help organizations detect when their assumptions about their data have become outdated and respond appropriately.
The White House Blueprint for an AI Bill of Rights states that automated systems should not reinforce old discriminatory or inequitable practices.
Third Party Data Expands Responsibility
Many AI systems are trained on, or otherwise use, data from vendors, partners, public datasets, and aggregators.
Using third party data does not reduce responsibility for outcomes; it may actually increase it.
For external data inputs, organizations are expected to assess whether the data sources are appropriate, lawful, and consistent with the system's intended use case. That assessment covers how the data is collected, the boundaries of consent, the completeness of the data, and how frequently it is updated.
The Federal Trade Commission has stressed the need to hold organizations accountable for automated system outcomes regardless of where the data comes from.
Responsibility remains even when traceability, or visibility, is lost.
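The assessment of an external source can be sketched as a simple checklist in code. The example below is illustrative only: the `ExternalSource` fields, the `assess_source` function, and the sample vendor are hypothetical, and the checks cover just three of the points named above (collection method, consent boundaries, and update frequency).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ExternalSource:
    name: str
    collection_method: str          # how the provider collected the data
    consented_uses: frozenset[str]  # uses covered by the consent obtained
    last_updated: date
    expected_update_days: int       # agreed refresh interval, in days


def assess_source(source: ExternalSource, intended_use: str, today: date) -> list[str]:
    """Return a list of findings; an empty list means no issue was flagged."""
    findings = []
    if not source.collection_method.strip():
        findings.append("collection method undocumented")
    if intended_use not in source.consented_uses:
        findings.append(f"use '{intended_use}' outside consent boundaries")
    if today - source.last_updated > timedelta(days=source.expected_update_days):
        findings.append("data is stale relative to agreed update frequency")
    return findings


vendor = ExternalSource(
    name="example_vendor_feed",
    collection_method="vendor survey panel",
    consented_uses=frozenset({"marketing analytics"}),
    last_updated=date(2025, 1, 1),
    expected_update_days=30,
)
for finding in assess_source(vendor, "credit scoring", today=date(2025, 3, 1)):
    print(finding)
```

An empty findings list is not proof that a source is suitable. It only shows that the documented facts do not contradict the intended use.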
Documentation Must Withstand Scrutiny
Documentation is not only for engineers; it must also hold up when decisions are challenged.
Organizations may be required to explain how an algorithm reached a decision, and regulators may demand evidence about how the system is working. Leadership teams may need to decide whether a system should continue to operate.
For AI RMF compliance, documenting intended use, how scope and limitations were derived, appropriate uses, and potential risks is encouraged. AI RMF documentation does not need to capture every technical detail. It should be clear, correct, and straightforward.
Without documentation, organizations may need to reconstruct the history of a dataset under extreme pressure.
Provenance Supports Effective Monitoring
When data inputs are not monitored, visibility into how AI systems are functioning is limited.
Changes in data composition, such as new types of customers or transactions or shifts in the frequency and pattern of transactions, can cause changes in model behavior.
For AI RMF compliance, it is important that changes in system performance can be reliably traced back to corresponding changes in the underlying data, so that normal variation, model drift, and misuse can be told apart.
NIST encourages the development of lifecycle monitoring practices that monitor the impact of both model outputs and data.
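To make the idea of monitoring data composition concrete, the sketch below compares the mix of a categorical input between a baseline period and a current period using the Population Stability Index (PSI). PSI is one commonly used drift measure chosen here for illustration, not a method named by NIST or the AI RMF. The function names, the `eps` floor, and the sample customer segments are assumptions.

```python
import math
from collections import Counter


def category_shares(values: list[str], categories: list[str]) -> dict[str, float]:
    """Share of each category in a sample."""
    counts = Counter(values)
    total = len(values)
    return {c: counts.get(c, 0) / total for c in categories}


def population_stability_index(
    baseline: list[str], current: list[str], eps: float = 1e-6
) -> float:
    """PSI = sum over categories of (current - baseline) * ln(current / baseline)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    categories = sorted(set(baseline) | set(current))
    base = category_shares(baseline, categories)
    cur = category_shares(current, categories)
    psi = 0.0
    for c in categories:
        b = max(base[c], eps)  # floor avoids log(0) for unseen categories
        a = max(cur[c], eps)
        psi += (a - b) * math.log(a / b)
    return psi


# Illustrative data: the customer mix shifts from 80/20 to 50/50.
baseline = ["retail"] * 80 + ["business"] * 20
current = ["retail"] * 50 + ["business"] * 50

print(round(population_stability_index(baseline, baseline), 3))  # 0.0
print(round(population_stability_index(baseline, current), 3))   # 0.416
```

A score of zero means the composition is unchanged, and larger values mean a larger shift. The threshold that should trigger a review is a judgment each organization would need to set and document for itself.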
Data Governance Is Ultimately Cultural
Technology tools can be used for lineage and documentation, and processes can be established for data processing.
However, governance is effective only when organizations value data accountability.
Organizations that apply AI RMF principles treat data ownership as a shared responsibility, with teams expected to ask about the source of data at each stage of its lifecycle when deciding how to use it.
This mindset may slow some experimentation, but it reduces risk.
AI Governance Begins With Data
AI risk is therefore not just about algorithmic behavior. It also concerns data decisions made before the model has been trained.
Organizations can support AI RMF compliance by tracking data provenance and lineage throughout their data governance processes, following the origins, evolution, and use of data in a way that goes beyond what technical metrics convey.
At Logicalis, we work with organizations to create this level of clarity in their AI programs so that automated decisions remain explainable and auditable even as those programs grow.
That foundation creates accountability for AI systems from the outset.
References
- National Institute of Standards and Technology, https://www.nist.gov/itl/ai-risk-management-framework
- The White House Office of Science and Technology Policy, https://www.whitehouse.gov/ostp/ai-bill-of-rights
- Federal Trade Commission, https://www.ftc.gov/business-guidance/blog/2023/04/ai-claims-and-consumer-protection
- Government Accountability Office, https://www.gao.gov/products/gao-23-105781