
dc.contributor.advisor: Pfahringer, Bernhard
dc.contributor.advisor: Gomes, Heitor Murilo
dc.contributor.author: Chanajitt, Rajchada
dc.date.accessioned: 2023-07-16T23:56:23Z
dc.date.available: 2023-07-16T23:56:23Z
dc.date.issued: 2023
dc.identifier.uri: https://hdl.handle.net/10289/15896
dc.description.abstract: Malware can be developed and transformed into various forms to deceive users and evade antivirus and endpoint security detection. Furthermore, if one machine in a network is compromised, it can be used for lateral movement, in which malware spreads stealthily without triggering an alarm in monitoring systems. Malware attacks pose security threats to modern enterprises and can cause massive financial, reputational, and data loss. It is therefore important to detect these attacks effectively and keep losses to a minimum. Current research uses different approaches, including static and dynamic analysis, to detect and analyze malware categories using distinct feature sets, such as imported modules, opcodes, and API calls, which can improve performance in binary and multi-class classification problems. This thesis proposes a method for identifying and analyzing malware samples via static and dynamic approaches, including memory analysis and consecutive application operation sequences performed in a Windows 10 virtual environment. Standard classifiers and frequently used sequence models are used to expose malware characteristics and improve predictive capability. The features used by these algorithms are extracted from the static and dynamic analysis of malware samples and include the Rich header, debug information, temporary files, prefetch files, and event logs. Classifier performance is measured using accuracy, F1-score, Mean Absolute Error (MAE), the confusion matrix, and Area Under the ROC Curve (AUC). Combining the two feature sets, static file properties and dynamic analysis results, gives the best classification performance whether or not feature selection is applied, achieving an accuracy and F1-score of 97% when the two datasets are integrated. For consecutive sequences, concatenating the Gated Recurrent Unit (GRU) and Transformer models yields the highest accuracy, 97%, for Noriben operations, while GRU achieves the highest accuracy for opcode sequences, 89%.
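The evaluation measures named in the abstract (accuracy, F1-score, MAE, and the confusion matrix) can be illustrated with a minimal sketch in plain Python. This is not the thesis's implementation: the function names are hypothetical, the class labels are assumed to be encoded as integers (which is what makes MAE computable), and macro averaging of the F1-score is an assumption, since the abstract does not state which averaging was used. AUC is left out because it requires predicted scores rather than predicted labels.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return m where m[i][j] counts samples of true class i predicted as class j."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between integer-encoded true and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Each function takes two equal-length sequences of labels; `confusion_matrix` and `macro_f1` also take the list of classes so that the row and column order is explicit.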
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: The University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Static
dc.subject: Dynamic
dc.subject: Opcode
dc.subject: Noriben
dc.subject: GRU
dc.subject: LSTM
dc.subject: Transformers
dc.subject.lcsh: Malware (Computer software) -- Classification
dc.subject.lcsh: Software engineering -- Classification
dc.subject.lcsh: Algorithms
dc.subject.lcsh: Computer networks -- Security measures
dc.subject.lcsh: Computer security -- Evaluation
dc.subject.lcsh: Machine learning -- Security measures
dc.title: Machine learning approaches for malware classification based on hybrid artefacts
dc.type: Thesis
thesis.degree.grantor: The University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD)
dc.date.updated: 2023-07-11T03:25:35Z
pubs.place-of-publication: Hamilton, New Zealand (en_NZ)


