
Mining algorithmic complexity in full-text scholarly documents

Non-textual document elements (NTDEs) such as charts, diagrams, and algorithms play an important role in presenting key information in scientific documents [1]. Recent information retrieval (IR) systems tap this information to answer more complex queries by mining the text pertaining to these elements. However, linking a document element to its corresponding text can be non-trivial. For instance, linking text about algorithmic complexity to the algorithm it describes can be challenging. These elements are sometimes placed at the start or end of a page rather than following the flow of the document text, and the discussion of an element may or may not appear on the same page. In recent years, several attempts have been made to extract NTDEs [2-3]. These techniques are actively applied to document summarization and to improving existing IR systems. Generally, asymptotic notations are used to identify complexity lines in full text. We mine the relevant complexities of algorithms from full text by comparing each algorithm's metadata with the context of the paragraph in which the authors discuss complexity. In this paper, we present a mechanism for identifying algorithmic complexity lines using regular expressions, compiling algorithm metadata, and linking complexity-related lines to that metadata.
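The two steps named in the abstract, spotting complexity lines through asymptotic notation and linking them to algorithm metadata, can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expression, the stopword list, the token-overlap score, and the names `COMPLEXITY_RE`, `find_complexity_lines`, and `link_complexity` are all assumptions made for illustration, and the metadata is assumed to be a short caption-like string per algorithm.

```python
import re

# Assumed pattern: O, Theta or Omega followed by a parenthesised expression,
# allowing one level of nested parentheses, e.g. O(n log(n)).
COMPLEXITY_RE = re.compile(r"(?<![A-Za-z])(?:O|Θ|Ω)\s*\((?:[^()]|\([^()]*\))+\)")

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "for", "and", "algorithm"}


def find_complexity_lines(sentences):
    """Return (index, sentence, notations) for sentences with asymptotic notation."""
    hits = []
    for i, sentence in enumerate(sentences):
        notations = COMPLEXITY_RE.findall(sentence)
        if notations:
            hits.append((i, sentence, notations))
    return hits


def tokens(text):
    """Lower-cased alphabetic tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def link_complexity(paragraph, algorithms):
    """Return the id of the algorithm whose metadata shares the most tokens
    with the paragraph, or None when nothing overlaps (assumed scoring)."""
    context = tokens(paragraph)
    best_id, best_score = None, 0
    for algo_id, metadata in algorithms.items():
        score = len(context & tokens(metadata))
        if score > best_score:
            best_id, best_score = algo_id, score
    return best_id


# Illustrative data, not taken from the paper.
algorithms = {
    "alg1": "Algorithm 1: Merge sort for integer arrays",
    "alg2": "Algorithm 2: Breadth-first search on a graph",
}
paragraph = "The merge sort procedure runs in O(n log n) time on arrays of size n."

hits = find_complexity_lines([paragraph, "No notation appears here."])
linked = link_complexity(paragraph, algorithms)
```

In this sketch the first sentence is flagged as a complexity line and linked to `alg1`, because its paragraph shares the terms "merge", "sort", and "arrays" with that algorithm's metadata. A simple overlap count is used only to keep the example short; any similarity measure between paragraph context and metadata could take its place.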
Conference Contribution
Bakar, A., Safder, I., & Hassan, S.-U. (2018). Mining algorithmic complexity in full-text scholarly documents. In ICADL Poster Proceedings. Hamilton, New Zealand: The University of Waikato.
The University of Waikato
© 2018 copyright with the authors.