To create host-based or network-based intrusion detection systems, we propose ALADIN, which stands for Active Learning of Anomalies to Detect INtrusions. ALADIN combines active learning with rare class discovery and uncertainty identification to statistically train an intrusion detection or prevention system (IDS/IPS). Active learning selects “interesting traffic” to show to a security expert for labeling, which substantially reduces the number of labels the expert must provide to reach an acceptable level of accuracy and coverage.
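
As a rough illustration of the selection step described above, the following minimal sketch picks samples for expert labeling from two pools: those the current classifier is least sure about, and those that fit their predicted class worst (candidates for a rare or new class). The function names, the margin-based uncertainty measure, and the `fit_scores` input are assumptions made for this example only; the text above does not specify how ALADIN measures uncertainty or rarity. The sketch also assumes at least two classes per sample.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; a small gap means uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_for_labeling(
    class_probs: Sequence[Sequence[float]],
    fit_scores: Sequence[float],
    n_uncertain: int,
    n_anomalous: int,
) -> List[int]:
    """Return indices of samples to show to the expert (hypothetical helper).

    class_probs[i] -- per-class probabilities for sample i from the current model
    fit_scores[i]  -- how well sample i fits its predicted class (assumed input,
                      e.g. a log-likelihood); lower means more anomalous
    """
    indices = range(len(class_probs))

    # Uncertainty identification: smallest margin between the top two classes.
    uncertain = sorted(indices, key=lambda i: margin(class_probs[i]))[:n_uncertain]

    # Rare class discovery: worst-fitting samples not already chosen above.
    chosen = set(uncertain)
    by_fit = sorted(indices, key=lambda i: fit_scores[i])
    anomalous = [i for i in by_fit if i not in chosen][:n_anomalous]

    return uncertain + anomalous


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.55, 0.45], [0.8, 0.2], [0.99, 0.01]]
    fits = [-1.0, -2.0, -9.0, -0.5]
    print(select_for_labeling(probs, fits, n_uncertain=1, n_anomalous=1))
```

In a full active learning loop, the expert's labels for the returned samples would be added to the training set, the classifier retrained, and the selection repeated until accuracy and coverage are acceptable.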

We have used the algorithm to analyze several daily logs of outbound network traffic, with over 13 million transfers, from Microsoft’s worldwide corporate network. The algorithm discovered a previously unknown instance of malware on the corporate network, in addition to a number of other forms of malware that had been logged but not yet identified.