4.7 Data Lineage

Last modified by publicadmin on 2025/12/16 13:04

About the Data Lineage Graph

The Data Lineage Graph displays a file's history from upload to its current location (Upstream Lineage), along with all derivatives of the file created from its current location (Downstream Lineage). Data provenance, as shown in the Data Lineage Graph, helps you trace the lifecycle of a file and supports auditability and data governance.

The Data Lineage Graph can be accessed from the File Explorer by clicking the Action menu (kebab) icon beside a file and selecting Properties > Data Lineage Graph.

Exploring the Data Lineage Graph

To view the Data Lineage Graph in a larger window, click the expand (full screen) icon.

  • Zoom in or out by scrolling with the mouse, or by scrolling with the Ctrl key held down.
  • Click Fit view to return the graph to its default, centered view.
  • Click the Nodes dropdown to filter the Data Lineage Graph to show Downstream nodes, Upstream nodes, or All Nodes in the file’s lineage.
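This section does not describe how the Nodes filter selects nodes. As an illustration only, the following minimal Python sketch assumes the lineage is a directed graph whose edges point from a file or activity to whatever was derived from it: Upstream nodes are then the ancestors of the current file and Downstream nodes are its descendants. The node names, `EDGES`, and `filter_nodes` are hypothetical, and treating "All" as the union of both directions is an assumption.

```python
from collections import deque

# Hypothetical lineage graph: each key maps to the nodes derived from it.
# Node names are illustrative only; they are not real VRE identifiers.
EDGES: dict[str, list[str]] = {
    "upload:scan.nii": ["process:pipeline_a"],
    "process:pipeline_a": ["greenroom:scan_clean.nii"],
    "greenroom:scan_clean.nii": ["transfer:copy_to_core"],
    "transfer:copy_to_core": ["core:scan_clean.nii"],
    "core:scan_clean.nii": ["process:pipeline_b"],
    "process:pipeline_b": ["core:scan_stats.csv"],
}


def reverse_edges(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip every edge so each node maps to the nodes it was derived from."""
    reversed_edges: dict[str, list[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reversed_edges.setdefault(dst, []).append(src)
    return reversed_edges


def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first search: every node reachable from start, excluding start."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def filter_nodes(current: str, edges: dict[str, list[str]], mode: str) -> set[str]:
    """Hypothetical Nodes filter: mode is "Upstream", "Downstream", or "All"."""
    upstream = reachable(current, reverse_edges(edges))  # ancestors
    downstream = reachable(current, edges)               # descendants
    if mode == "Upstream":
        return {current} | upstream
    if mode == "Downstream":
        return {current} | downstream
    if mode == "All":
        # Assumption: "All" is the union of both directions around the current file.
        return {current} | upstream | downstream
    raise ValueError(f"unknown filter mode: {mode!r}")


if __name__ == "__main__":
    current = "greenroom:scan_clean.nii"
    for mode in ("Upstream", "Downstream", "All"):
        print(mode, sorted(filter_nodes(current, EDGES, mode)))
```

Reversing the edges lets the same breadth-first search serve both directions, so the Upstream and Downstream views differ only in which edge set is walked.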

Understanding the Data Lineage Graph

The Data Lineage Graph displays file lineage information as a series of nodes, with the file upload as the first node. If the file was processed or copied, additional nodes are displayed. Hovering over a node displays Node Info, which contains some of the following information:

  • Type: the metadata entity type of the file in this state. For uploaded files, this is always nfs_file (network file storage file).
  • Name: the name of the file while in this state.
  • Process Time: the end time of the processing activity (if applicable).
  • Location: the location of the file within the VRE, either the Green Room or the Core.
  • File Attributes: the attributes attached to the file and their values.
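The list above names the Node Info fields but not how they are stored. As an illustration only, the following Python sketch models them as a hypothetical `NodeInfo` dataclass; the class, its field names, and `tooltip_lines` are assumptions, not part of the VRE. Optional fields are left out of the rendered output, matching the note that a node shows only some of this information.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# The two VRE locations named in this section.
LOCATIONS = {"Green Room", "Core"}


@dataclass
class NodeInfo:
    """Hypothetical record of the Node Info fields; not the VRE's actual schema."""

    entity_type: str                          # e.g. "nfs_file" for uploaded files
    name: str                                 # file name while in this state
    process_time: Optional[datetime] = None   # end time of processing, if any
    location: Optional[str] = None            # "Green Room" or "Core"
    attributes: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject locations other than the two described in this section.
        if self.location is not None and self.location not in LOCATIONS:
            raise ValueError(f"unexpected location: {self.location!r}")

    def tooltip_lines(self) -> list[str]:
        """Render only the fields that are set, since a node shows some of them."""
        lines = [f"Type: {self.entity_type}", f"Name: {self.name}"]
        if self.process_time is not None:
            lines.append(f"Process Time: {self.process_time.isoformat(sep=' ')}")
        if self.location is not None:
            lines.append(f"Location: {self.location}")
        if self.attributes:
            pairs = ", ".join(f"{k}={v}" for k, v in self.attributes.items())
            lines.append(f"File Attributes: {pairs}")
        return lines


if __name__ == "__main__":
    info = NodeInfo("nfs_file", "scan.nii", location="Green Room",
                    attributes={"project": "demo"})
    print("\n".join(info.tooltip_lines()))
```

Defaulting the optional fields to None means an uploaded file, which has no processing activity, renders without an empty Process Time entry.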

Different icons distinguish the first and last nodes from any interim processing nodes.


Current File

Represents the current state of the file in its lineage. Hover over the file node to view the Type, Name, and Process Time of the current file.


Processing Node

The Processing node represents a processing activity, such as a processing pipeline or a copy or delete action. Hover over the Processing node to view the processing action and the date and time it was completed.


Upstream or Downstream Version of Current File

Represents either an upstream or downstream version of the current file. Hover over the file node to view the Type, Name, and Upload Time of the file.


Data Transfer Node

This node represents data transfer or copy from the Green Room to the Core.


See Also: 

File Upload

File Download

File Processing

File Explorer

Platform Architecture



Copyright © 2022, Indoc Research. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.