Baskerville is a machine operating on the Deflect network that protects sites from hounding, malicious bots. It’s also an open source project that, in time, will be able to reduce bad behaviour on your networks too. Baskerville responds to web traffic, analyzing requests in real-time, and challenging those acting suspiciously.
IP addresses exhibiting anomalous behaviour are challenged by the Deflect network.
53,315 challenged (past 24 hours)
99% precision (past 24 hours)
65 passed challenge (past 24 hours)
Conventional machine learning approaches to network attack detection are based on recognizing patterns of behaviour, then building and training a classification model. This requires large labelled data sets. However, the rapid pace and unpredictability of cyber-attacks make this labelling impossible in real time, and incredibly time-consuming post-incident. In addition, a signature-based approach is naturally biased towards previous incidents (i.e. previous signatures) and can be out-manoeuvred by new, previously unseen patterns (i.e. new signatures). Baskerville is instead built on an unsupervised anomaly detection algorithm, Isolation Forest, which does not require a labelled dataset for training. We improve on the original algorithm so that it supports not only numerical but also string features derived from the exhibited behaviour itself.
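To make the unsupervised approach concrete, here is a minimal sketch of Isolation Forest anomaly detection using scikit-learn. The feature names and numbers are illustrative assumptions, not Baskerville's actual feature set, and this plain version handles only numerical features, without the string-feature extension described above.

```python
# Minimal Isolation Forest sketch (scikit-learn). Feature columns are
# hypothetical per-IP traffic features: request rate, mean path depth,
# and unique-URL ratio -- NOT Baskerville's real features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated "normal" traffic clusters around typical values...
normal = rng.normal(loc=[10.0, 3.0, 0.5], scale=[2.0, 1.0, 0.1], size=(500, 3))
# ...while a bot hammering a single URL at high rate sits far outside it.
suspects = np.array([[500.0, 1.0, 0.01],
                     [800.0, 1.0, 0.02]])

# No labels needed: the forest is fit on unlabelled traffic only.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(normal)

# predict() returns -1 for anomalies and 1 for inliers.
print(model.predict(suspects))
```

Because the model learns only what "normal" looks like, previously unseen attack patterns can still be flagged, which is the property that signature-based detection lacks.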
We’ve trained Baskerville to recognize what legitimate traffic on our network looks like, and how to distinguish it from malicious requests attempting to disrupt our clients’ websites. Baskerville has turned out to be very effective at mitigating DDoS attacks, and at correctly classifying other types of malicious behaviour. A few months ago, Baskerville passed an important milestone: making its own decisions on traffic deemed anomalous. The quality of these decisions, measured by recall, is high, and Baskerville has already successfully mitigated many sophisticated real-life attacks.
Baskerville is also a complete pipeline that receives incoming web logs as input, either from a Kafka topic, from a locally saved raw log file, or from log files saved to an Elasticsearch instance. It processes these logs in batches, forming request sets by grouping them by requested host and requesting IP. It then extracts features for these request sets and predicts whether they are malicious or benign, using a model that was trained offline on previously observed and labelled data. Baskerville saves all the data and results to a Postgres database, and publishes metrics on its processing (e.g. number of logs processed, percentage predicted malicious, percentage predicted benign, processing speed, etc.) that can be consumed by Prometheus and visualized in a Grafana dashboard.

As well as the engine that consumes and processes web logs, a set of offline analysis tools has been developed for use in conjunction with Baskerville. These tools may be accessed directly, or via two Jupyter notebooks, which walk the user through the machine learning tools and the investigation tools, respectively. The machine learning notebook comprises tools for training, evaluating, and updating the model used in the Baskerville engine. The investigations notebook comprises tools for processing, analyzing, and visualizing past attacks, for reporting purposes.
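The grouping step described above can be sketched in a few lines. This is an illustrative toy, not Baskerville's code: the log field names and the two features computed here are assumptions chosen to show the idea of a "request set" keyed by (host, IP).

```python
# Sketch: grouping web logs into request sets by (requested host, client IP),
# then deriving simple per-set features. Field names are hypothetical.
from collections import defaultdict

logs = [
    {"host": "example.org", "ip": "203.0.113.7",  "path": "/",      "status": 200},
    {"host": "example.org", "ip": "203.0.113.7",  "path": "/login", "status": 403},
    {"host": "example.org", "ip": "198.51.100.2", "path": "/",      "status": 200},
]

# One request set per (host, ip) pair.
request_sets = defaultdict(list)
for entry in logs:
    request_sets[(entry["host"], entry["ip"])].append(entry)

# Toy features: request count and fraction of distinct paths in the set.
features = {
    key: {
        "num_requests": len(entries),
        "unique_path_ratio": len({e["path"] for e in entries}) / len(entries),
    }
    for key, entries in request_sets.items()
}

print(features[("example.org", "203.0.113.7")])
```

In the real pipeline these feature vectors, computed per batch, are what the offline-trained model scores as malicious or benign before the results are written to Postgres.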
You can take a look at an investigation we ran comparing human-led attack analytics with those produced by Baskerville.
Also take a look at some slides related to the project here.