Leveraging LLMs for Malware Analysis: Insights and Future Directions

By Gerardo Fernández, Joseliyo Sánchez and Vicente Díaz
Malware analysis is (probably) the most expert-demanding and time-consuming activity for any security professional. Unfortunately automation for static analysis has always been challenging for the security industry. The sheer volume and complexity of malicious code necessitate innovative approaches for efficient and effective analysis. At VirusTotal, we’ve been exploring the potential of Large Language Models (LLMs) to revolutionize malware analysis. We started this path last April 2023 by automatically analyzing malicious scripts, and since then, we evolved our model to analyze Windows executable files. In this post, we want to share part of our current research and findings, as well as discuss future directions in this challenging approach.

Our approach

As a parallel development to the architecture described in our previous post, we wanted to better understand what are the strengths and limitations of LLMs when analyzing PE files. Our initial approach using memory dumps from sandbox detonation and backscatter for additional deobfuscation capabilities (which will likely be the biggest challenge for the analysis) sounds like a great approach, however rebuilding binaries from memory dumps has its own problems and all this process takes additional time and computational resources – maybe it won’t be necessary for every sample! Thus, the importance of understanding what LLMs can and can’t do when faced with a decompiled (or disassembled) binary.
We also want to consider additional tools we might use to provide the LLMs with additional context, including our sandbox analysis. For decompilation we will be using Hex-Rays IDA Pro most of the time, however our approach will be using a “decision tree” to optimize what tools, prompts and additional context to use in every case.
Our LLM of choice will be Gemini 1.5. The extended token capabilities is what in essence allows us to analyze

[…]
Content was cut in order to protect the source.Please visit the source for the rest of the article.

This article has been indexed from VirusTotal Blog

Read the original article: