Reconstructing Data Provenance from Log Files
Tan, Y. S. (2017). Reconstructing Data Provenance from Log Files (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/11388
Permanent Research Commons link: https://hdl.handle.net/10289/11388
Data provenance describes the derivation history of data, capturing details such as the entities involved and the relationships between them. Knowledge of data provenance can be used to address issues such as data quality assurance, data audit and system security. However, current computer systems are usually not equipped with the means to acquire data provenance, and modifying underlying systems or introducing new monitoring software for provenance logging may be too invasive for production systems. As a result, data provenance may not always be available. This thesis investigates the completeness and correctness of data provenance reconstructed from log files with respect to the actual derivation history. To accomplish this, we designed and tested a solution that first extracts and models information from log files as provenance relations, then reconstructs the data provenance from those relations. The reconstructed output is then evaluated against the ground truth provenance. The thesis also details the methodology used for constructing a dataset for provenance reconstruction research. Experimental results revealed that data provenance that completely captures the ground truth can be reconstructed from system-layer log files. However, the outputs are susceptible to errors generated during event logging and errors induced by program dependencies. Results also show that using log files of different granularities collected from the system can help resolve these logging errors. Experiments with removing suspected program dependencies using approaches such as blacklisting and clustering have shown that the number of errors can be reduced by a factor of one hundred. Conclusions drawn from this research contribute towards the work on using reconstruction as an alternative approach for acquiring data provenance from computer systems.
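The pipeline summarised in the abstract (extract relations from logs, reconstruct provenance, compare against ground truth) can be illustrated with a minimal sketch. This is not the thesis's implementation. The log line format, the function names and the blacklist contents are all hypothetical; the relation names are borrowed from the W3C PROV vocabulary as an assumption; and reading "completeness" as recall and "correctness" as precision is one plausible interpretation rather than the thesis's stated definition. For brevity, the sketch also ignores event ordering.

```python
import re
from collections import defaultdict

# Hypothetical log format: "<timestamp> <program> <read|write> <path>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d+) (?P<prog>\S+) (?P<op>read|write) (?P<path>\S+)$"
)


def extract_relations(lines, blacklist=frozenset()):
    """Model log lines as (subject, relation, object) triples.

    Paths in `blacklist` (e.g. suspected program dependencies) are skipped.
    """
    relations = set()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None or m["path"] in blacklist:
            continue  # unparseable line or blacklisted file
        if m["op"] == "read":
            relations.add((m["prog"], "used", m["path"]))
        else:
            relations.add((m["path"], "wasGeneratedBy", m["prog"]))
    return relations


def reconstruct(relations):
    """Derive (output, input) 'wasDerivedFrom' edges between files.

    Simplification: every file a program wrote is assumed to derive from
    every file that program read, regardless of timestamps.
    """
    used = defaultdict(set)       # program -> files it read
    generated = defaultdict(set)  # program -> files it wrote
    for subject, relation, obj in relations:
        if relation == "used":
            used[subject].add(obj)
        else:
            generated[obj].add(subject)
    return {
        (out, inp)
        for prog, outs in generated.items()
        for out in outs
        for inp in used[prog]
        if inp != out
    }


def evaluate(reconstructed, ground_truth):
    """Return (completeness, correctness), read here as recall and precision."""
    hits = reconstructed & ground_truth
    completeness = len(hits) / len(ground_truth) if ground_truth else 1.0
    correctness = len(hits) / len(reconstructed) if reconstructed else 1.0
    return completeness, correctness


# Made-up example: the library read is a program dependency, not a data input.
log = [
    "1 sort read /data/in.csv",
    "2 sort read /lib/libc.so",
    "3 sort write /data/out.csv",
]
ground_truth = {("/data/out.csv", "/data/in.csv")}

raw = evaluate(reconstruct(extract_relations(log)), ground_truth)
filtered = evaluate(
    reconstruct(extract_relations(log, blacklist={"/lib/libc.so"})),
    ground_truth,
)
```

In this toy example the unfiltered reconstruction recovers the true edge but also reports a spurious derivation from the shared library, which is the kind of dependency-induced error the abstract describes; blacklisting the library removes it.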
The University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.