Posted on 2014-10-28, 00:00. Authored by Edoardo G. Colombo.
Botnets are networks of infected machines (the bots) controlled by an external entity, the botmaster, who uses this infrastructure to carry out malicious activities such as spamming and Distributed Denial of Service attacks. The Command and Control (C&C) server is the machine the botmaster uses to dispatch orders to the bots and gather data from them; communication is established through distributed or centralized protocols that vary from botnet to botnet. In DGA-based botnets, a Domain Generation Algorithm (DGA) is used to find the rendezvous point between the bots and the botmaster. Botnets are one of the most widespread and dangerous threats on the Internet, so researchers in both industry and academia are striving to mitigate them. Botnet mitigation is widely covered in the literature, with many works proposing detection approaches. Still, all of these systems suffer from one of two major shortcomings: they either use a supervised approach, which requires a priori knowledge, or rely on DNS data containing information about the infected machines, which raises issues of users’ privacy and makes such systems harder to deploy.
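To make the rendezvous idea concrete, the following is a minimal toy sketch of how a bot and its botmaster could independently derive the same candidate domain list from a shared seed and the current date. It is not the algorithm of any botnet named in this work: the function name `candidate_domains`, the seed format, the hash choice and the reserved `.example` suffix are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def candidate_domains(seed: str, day: date, count: int = 5,
                      tld: str = ".example") -> list[str]:
    """Toy illustration only: derive `count` pseudo-random labels from a
    shared seed and a date. Every name and parameter here is hypothetical."""
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to the letters 'a'..'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Both sides run the same function with the same inputs, so they agree
    # on the list without ever exchanging it.
    for name in candidate_domains("shared-secret", date(2014, 10, 28)):
        print(name)
```

Because the list depends only on the seed and the date, a defender who observes many such short-lived, random-looking names in DNS traffic can treat them as a signal, which is the kind of evidence a DNS-based detector works from.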
We propose Cerberus, an automated system based on machine learning that is capable of discovering new botnets and using this knowledge to detect and characterize malicious activities. Cerberus analyzes passive DNS data, which is free of privacy issues and makes the system easy to deploy, and uses an unsupervised approach, i.e., it needs no a priori knowledge. The system first applies a series of filters to discard legitimate domains while keeping domains that are generated by DGAs and likely to be malicious. Cerberus then keeps a record of the activity related to the IP addresses of those domains and, after a time interval Δ, is able to isolate clusters of domains belonging to the same malicious activity. This knowledge is later used to train a classifier that analyzes new DNS data for detection.
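The filter-then-group idea described above can be sketched as follows. This is a simplified stand-in, not the Cerberus implementation: the abstract does not specify the filters or the clustering method, so the entropy-based heuristic `looks_generated`, its thresholds, and the shared-IP grouping in `group_by_shared_ip` are assumptions chosen for illustration, and the time window Δ is omitted.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(label: str) -> float:
    """Character-level Shannon entropy of a string, in bits."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(domain: str, min_len: int = 10,
                    min_entropy: float = 3.0) -> bool:
    """Hypothetical filter: keep long, high-entropy leftmost labels."""
    label = domain.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy


def group_by_shared_ip(records):
    """Group domains that are linked, directly or transitively, by a shared
    IP address. `records` is an iterable of (domain, ip) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for domain, ip in records:
        # Union the domain node with the IP node it resolved to.
        parent[find(("d", domain))] = find(("ip", ip))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "d":
            groups[find(node)].add(node[1])
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    # Made-up observations using reserved documentation names and addresses.
    observed = [
        ("kqxjzvwpmtra.example", "192.0.2.1"),
        ("bnvqyzxwkjhg.example", "192.0.2.1"),
        ("bnvqyzxwkjhg.example", "192.0.2.2"),
        ("plmoknijbuhv.example", "192.0.2.2"),
        ("zzqwertasdfg.example", "198.51.100.7"),
        ("mail.example", "192.0.2.1"),
    ]
    suspicious = [(d, ip) for d, ip in observed if looks_generated(d)]
    for group in group_by_shared_ip(suspicious):
        print(sorted(group))
```

In a setting like the one described, the resulting groups would serve as labels for training the classifier; a single heuristic such as this one would be far too coarse on its own, which is why the text speaks of a series of filters.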
We tested our system in the wild by analyzing one week of real passive DNS data. Cerberus detected 47 new clusters of malicious activity, among them well-known botnets such as Jadtre, Sality and Palevo. Moreover, the tests we ran on the classifier showed an overall accuracy of 93%, demonstrating the effectiveness of the system.