On Using Graph Structures in Network Communications for Peer-To-Peer Botnet Detection
General Material Designation
[Thesis]
First Statement of Responsibility
Joshi, Harshvardhan P.
Subsequent Statement of Responsibility
Stallmann, Matthias
.PUBLICATION, DISTRIBUTION, ETC
Name of Publisher, Distributor, etc.
North Carolina State University
Date of Publication, Distribution, etc.
2020
GENERAL NOTES
Text of Note
116 p.
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
Ph.D.
Body granting the degree
North Carolina State University
Text preceding or following the note
2020
SUMMARY OR ABSTRACT
Text of Note
Botnets are used for malicious purposes, such as spam and denial of service, with huge economic costs to the society. Decentralized command & control structures of peer-to-peer (P2P) botnets make them more resilient to disruptions. However, these P2P overlay structures appear in communication graphs that are built from network flow meta-data, and can be detected using community detection techniques from graph theory. This is a promising approach for P2P botnet detection because it can work independent of device hardware and software, and is resilient to obfuscations employed by the botnets. In this thesis we formulate and address several research questions relating to the problem of P2P botnet community detection in network communication graphs, in a real-world context. First, we investigate whether P2P botnet community structures can be detected with only partial communication graph, since traffic from an entire P2P botnet is unlikely to be available in the real-world. We analyze the effectiveness of general purpose community detection algorithms from graph theory in detecting P2P botnet communities, with various levels of partial information availability. The results show that the approach can work with only about half of the nodes reporting their communication information, with only small increase in detection errors. Second, we ask how to improve the efficiency of P2P botnet community detection, given that previously proposed community-based botnet detection algorithms are too slow for real-time deployment. We propose GADFly, an algorithm that reduces computation time by using the inherent structure in communication graph to reduce the problem size, while focusing on suspicious P2P communities of interest to improve the precision. Our experiments show that GADFly is 1.5 to 10 times faster than the popular general purpose Louvain algorithm, with comparable recall and improved precision. Third, we ask how to improve the precision of P2P botnet community detection to a level that is practically useful. In our proposed algorithm BotCLAM, we combine insights into the structure of communication graphs and differing definitions of community to improve the precision of P2P botnet community detection. We show that the precision with BotCLAM is 2 to 10 times higher than Louvain and about 50% higher than the GADFly algorithm, with comparable or better recall. Fourth, we investigate whether the P2P botnet community can be identified from the detected communities by simply using the communities' graph structural characteristics. We identify P2P botnet command & control traffic characteristics that influence the communication graph structure, and the metrics to measure these structural properties. We propose a tunable approach