Reconstructing Data Provenance from Log Files

dc.contributor.advisor: Ko, Ryan K.L.
dc.contributor.advisor: Holmes, Geoffrey
dc.contributor.advisor: Rogers, Bill
dc.contributor.author: Tan, Yu Shyang
dc.date.accessioned: 2017-10-09T00:17:50Z
dc.date.available: 2017-10-09T00:17:50Z
dc.date.issued: 2017
dc.date.updated: 2017-09-22T03:25:35Z
dc.description.abstract: Data provenance describes the derivation history of data, capturing details such as the entities involved and the relationships between entities. Knowledge of data provenance can be used to address issues, such as data quality assurance, data audit and system security. However, current computer systems are usually not equipped with means to acquire data provenance. Modifying underlying systems or introducing new monitoring software for provenance logging may be too invasive for production systems. As a result, data provenance may not always be available. This thesis investigates the completeness and correctness of data provenance reconstructed from log files with respect to the actual derivation history. To accomplish this, we designed and tested a solution that first extracts and models information from log files into provenance relations then reconstructs the data provenance from those relations. The reconstructed output is then evaluated against the ground truth provenance. The thesis also details the methodology used for constructing a dataset for provenance reconstruction research. Experimental results revealed data provenance that completely captures the ground truth can be reconstructed from system-layer log files. However, the outputs are susceptible to errors generated during event logging and errors induced by program dependencies. Results also show that usage of log files of different granularities collected from the system can help resolve logging errors described. Experiments with removing suspected program dependencies using approaches such as blacklisting and clustering have shown that the number of errors can be reduced by a factor of one hundred. Conclusions drawn from this research contribute towards the work on using reconstruction as an alternative approach for acquiring data provenance from computer systems.
dc.format.mimetype: application/pdf
dc.identifier.citation: Tan, Y. S. (2017). Reconstructing Data Provenance from Log Files (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/11388 [en]
dc.identifier.uri: https://hdl.handle.net/10289/11388
dc.language.iso: en
dc.publisher: The University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Data Provenance
dc.subject: Reconstruction
dc.subject: Log Analysis
dc.title: Reconstructing Data Provenance from Log Files
dc.type: Thesis
pubs.place-of-publication: Hamilton, New Zealand [en_NZ]
thesis.degree.grantor: The University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD)
Files
Original bundle
Name: thesis.pdf
Size: 6.21 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.07 KB
Format: Item-specific license agreed upon to submission