Microsoft, in collaboration with Intel, has announced a new tool that uses deep learning to find and classify malware (via ZDNet). In a research project titled STAMINA (STAtic Malware-as-Image Network Analysis), the researchers developed a new approach to malware detection that converts malware samples into images and looks for textural and structural patterns in them.
According to the Microsoft-Intel research team, the goal of the study was to see whether the textural and structural patterns obtained from the image conversion could be used to classify samples effectively as benign or malicious. Notably, the study builds on Intel’s existing work on deep transfer learning for static malware classification.
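To make the image-conversion idea concrete, here is a minimal sketch of how a file's raw bytes could be laid out as a grayscale image, with each byte treated as one pixel intensity from 0 to 255. The function name `bytes_to_grayscale_rows` and the fixed width of 16 are assumptions made for illustration; the article does not describe how the researchers chose image dimensions, so this should not be read as their actual pipeline.

```python
import math
from typing import List


def bytes_to_grayscale_rows(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay out a byte stream as rows of pixel intensities (0-255).

    Hypothetical helper: the fixed width is an illustrative assumption,
    not a detail taken from the STAMINA research.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad the tail so the final row has the same length as the others.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Stand-in bytes; a real pipeline would read an executable's contents instead.
sample = bytes(range(40))
image = bytes_to_grayscale_rows(sample, width=16)
print(len(image), "rows of", len(image[0]), "pixels")
```

Once a binary is in this form, it can be resized and passed to an image classifier, which is where the textural and structural patterns mentioned above come into play.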
The research team tested STAMINA on Microsoft’s real-world dataset and showed that it achieved high accuracy (99.07%) in detecting malware samples.
“The joint research showed that applying STAMINA to real-world hold-out test data set achieved a recall of 87.05% at 0.1% false positive rate, and 99.66% recall and 99.07% accuracy at 2.58% false positive rate overall. The results certainly encourage the use of deep transfer learning for the purpose of malware classification. It helps accelerate training by bypassing the search for optimal hyperparameters and architecture searches, saving time and compute resources in the process,” said Microsoft in a blog post last week.
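The quoted figures pair each recall value with a false positive rate because both depend on how strict the detection threshold is. The sketch below shows how recall, false positive rate, and accuracy are computed from labels and model scores, and how they shift as the threshold moves. The function `detection_metrics`, the labels, and the scores are all made-up illustrative values, not data from the study.

```python
from typing import Dict, Sequence


def detection_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Compute recall, false positive rate, and accuracy at one threshold.

    Labels use 1 for malicious and 0 for benign. Assumes both classes
    are present, otherwise a division by zero would occur.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1  # malware correctly flagged
        elif label == 1:
            fn += 1  # malware missed
        elif flagged:
            fp += 1  # benign file wrongly flagged
        else:
            tn += 1  # benign file correctly passed
    return {
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / len(labels),
    }


# Illustrative toy data only: four malicious and four benign samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.20, 0.10, 0.05]

strict = detection_metrics(labels, scores, threshold=0.75)
lenient = detection_metrics(labels, scores, threshold=0.25)
print("strict:", strict)
print("lenient:", lenient)
```

In this toy data, the lenient threshold catches more malware but also flags a benign file, which mirrors the trade-off in the quote: higher recall is reported at a higher false positive rate.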
Even though the new deep learning model classifies malware with high accuracy, it has some limitations as well. Microsoft itself admitted that the tool struggles with larger files: “For bigger size applications, STAMINA becomes less effective due to limitations in converting billions of pixels into JPEG images and then resizing them.”
On the other hand, the Redmond giant says that it is well positioned to train the machine learning model thanks to the vast amount of data it collects through Windows Defender.
Do you think that this tool could one day be implemented across Microsoft’s products to improve malware detection? Sound off in the comments section below.