Building a Customer Firewall Archive Analyzer in AWS

Matt Duke
5 min read · Sep 6, 2022

Getting complete granularity over a mostly proprietary solution

The experience of joining a start-up is like no other — scary, exciting, confusing — but above all — an excellent learning opportunity.

Being able to govern yourself or adapt quickly is a tremendous advantage at any job, and nothing teaches that quicker than the start-up life.

And lucky for me, on my 5th start-up I caught a black swan: a company that grew from 3 (-ish) employees to 250 within the span of 3 months. From back-alley assembly of our machines to worldwide production, the product spread like wildfire.

I served as the AWS Cloud Architect for 4 different companies under a single portfolio: 4 companies under the same principal owner, all with separate security concerns.

As the only full-time developer on staff, tasked with securing 10 warehouses remotely, I found it overwhelming to manage this security posture while still handling daily feature requests and updates to the AWS architecture.

So I began to think up a solution of redundancy…

Services we are going to use:

  • Fargate Cluster (EC2 instances) — using a custom Dimension server image (AMI)
  • S3 Bucket — with various storage classes (Standard, Glacier) and lifecycle policies
  • AWS Lambda — unzips the archives and places the log folders into Standard storage
  • AWS SageMaker — IP Insights algorithm

Data Collection:

There was an on-premises security solution: a small campus firewall that exported 3 GB of firewall log data a day. There were 9 of these WatchGuard devices, one for each of the 9 warehouses across Texas.

So that meant ~27 GB of data to work with per day.

I saw the opportunity to improve the security posture of Texas-wide locations by building a machine learning solution.

I created a Fargate (EC2) instance that served as the central repository for logs from all the WatchGuard firewalls. Logs from the Houston warehouses, Lubbock, El Paso, and all offices across Texas would be present, all in the same form (.TSV) and zipped up.

Storage:

First, I determined the data retention period inside of the WatchGuard device to be one week. After one week, the logs are deleted. However, upon log creation, they are automatically sent to an S3 Bucket for retention. Logs will be readily available in standard object-class for 7 days until they are moved to Glacier.
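The 7-day Standard-to-Glacier transition described above can be sketched as an S3 lifecycle rule applied with boto3. The bucket name and prefix below are hypothetical placeholders, not the original configuration:

```python
# Lifecycle rule matching the retention scheme above: objects stay in the
# Standard class for 7 days, then transition to Glacier.
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "firewall-logs-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": "watchguard-logs/"},  # hypothetical prefix
            "Transitions": [
                {"Days": 7, "StorageClass": "GLACIER"}
            ],
        }
    ]
}


def apply_lifecycle(bucket_name: str) -> None:
    """Attach the lifecycle rule to the retention bucket (needs AWS creds)."""
    import boto3  # imported lazily so the rule dict can be inspected offline

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=LIFECYCLE_CONFIG,
    )
```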

Upon the ‘PUT’ request, a Lambda function copies the logs (leaving a copy in the data retention bucket), unzips the folder, prepares the data (i.e., transforms it from TSV to CSV), and then deposits it into another S3 bucket. This new bucket is the training bucket, with data prepared for the IP Insights algorithm.
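A minimal sketch of that Lambda handler, assuming a standard S3 PUT event trigger; the training-bucket name is a hypothetical placeholder, and the TSV-to-CSV step is pulled out as a pure function:

```python
import csv
import io
import zipfile


def tsv_to_csv(tsv_text: str) -> str:
    """Re-delimit a TSV log export as CSV for the training bucket."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()


def handler(event, context):
    """S3 PUT trigger: unzip the archive, convert each .tsv member to CSV,
    and deposit the result in the training bucket."""
    import boto3  # lazy import; the pure function above runs without AWS

    s3 = boto3.client("s3")
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        for name in archive.namelist():
            if name.endswith(".tsv"):
                csv_text = tsv_to_csv(archive.read(name).decode("utf-8"))
                s3.put_object(
                    Bucket="fw-training-bucket",  # hypothetical bucket name
                    Key=name.replace(".tsv", ".csv"),
                    Body=csv_text.encode("utf-8"),
                )
```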

Training/Modeling:

A P3.2xlarge instance was chosen to run the training job.

After running the specified number of epochs (20), I was able to configure a SageMaker inference endpoint.
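The training run can be sketched with the SageMaker Python SDK's built-in IP Insights algorithm. The role ARN, S3 URIs, and the non-epoch hyperparameter values below are assumptions for illustration, not the original settings:

```python
# Hyperparameters mirroring the run described above: 20 epochs on an
# ml.p3.2xlarge instance. Entity/vector sizes are illustrative guesses.
HYPERPARAMETERS = {
    "num_entity_vectors": "20000",  # assumption: sized to distinct log entities
    "vector_dim": "128",            # assumption
    "epochs": "20",
}


def launch_training(role_arn: str, train_s3_uri: str, output_s3_uri: str):
    """Start an IP Insights training job (requires AWS creds and the
    `sagemaker` package)."""
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()
    image = image_uris.retrieve("ipinsights", session.boto_region_name)
    estimator = Estimator(
        image_uri=image,
        role=role_arn,
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        output_path=output_s3_uri,
        hyperparameters=HYPERPARAMETERS,
        sagemaker_session=session,
    )
    estimator.fit({"train": train_s3_uri})
    return estimator
```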

Now, after checking validation metrics and running injections of malicious traffic, the inference endpoint was ready, and we could test the control (the WatchGuard device) against our newly trained SageMaker endpoint.

Our objective for the SageMaker endpoint and machine-learning solution was to see whether we could eliminate false positives around SSID intrusion and “false intruder” user-device presence within the warehouses (through anomaly scores).
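A sketch of that anomaly-flagging step, assuming the endpoint returns IP Insights dot-product scores, where a higher score means the entity/IP pair looks more familiar and a lower score means more anomalous. The threshold and sample records are illustrative, not values from the original deployment:

```python
# Tunable cutoff below which an (entity, IP) pair is flagged as anomalous.
ANOMALY_THRESHOLD = 0.0


def flag_anomalies(predictions, threshold=ANOMALY_THRESHOLD):
    """Return the (entity, ip) pairs whose score falls below the threshold."""
    return [
        (p["entity"], p["ip"])
        for p in predictions
        if p["dot_product"] < threshold
    ]


# Hypothetical endpoint responses: a known warehouse SSID scores high,
# an unfamiliar device scores low and gets flagged.
sample = [
    {"entity": "warehouse-ssid-houston", "ip": "10.0.4.12", "dot_product": 1.8},
    {"entity": "unknown-device", "ip": "203.0.113.9", "dot_product": -0.7},
]
```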

Analysis:

Amazingly, the WatchGuard firewall generated false positives when the model did not: the firewalls would flag their own SSIDs as malicious intrusions. Once we analyzed the difference between the model's output and our own WatchGuard UI, we were able to downgrade the warning to a false positive.

This was meant to be a proof of concept; however, it ended up becoming a valuable asset for an IT team needing critical security insight.

With that said, would this be a viable solution for redundancy?

No, and that’s why I call this a “Black Swan” event.

Alternative options for improvement?

Athena Search

Instead of building this system, which requires a lot of compute instances, we can use Athena, a serverless, interactive query tool. So long as the logs are in a readable format, we will be able to search them.

This way we can use the WatchGuard technology and its firewall logs as a source of truth, and the Athena query service to search for potentially malicious IPs.
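The Athena approach can be sketched as an external table over the CSV logs in S3, plus a serverless query for a suspect IP. The table name, bucket path, and column names are illustrative assumptions, not the original schema:

```python
# Hypothetical DDL registering the prepared CSV logs as an Athena table.
CREATE_TABLE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS firewall_logs (
    log_time string,
    src_ip string,
    dst_ip string,
    action string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://fw-training-bucket/watchguard-logs/'
"""


def query_for_ip(ip: str) -> str:
    """Build the Athena query that hunts for a potentially malicious IP."""
    return (
        "SELECT log_time, src_ip, dst_ip, action "
        f"FROM firewall_logs WHERE src_ip = '{ip}'"
    )


def run_query(sql: str, output_s3: str) -> str:
    """Submit the query to Athena (requires AWS creds); returns the query ID."""
    import boto3  # lazy import; the query builders run without AWS

    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```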

Note: I would still not use AWS Glue here, because the decompression execution time is very high.

Glacier Select

Instead of dredging up entire files and paying the cost of expedited retrieval, we can use Glacier Select to query just the data we need within the S3 bucket, even if the objects are in a “frozen” storage class.

We can use either Glacier Select or Athena for different use cases:

  • IF Files are in Standard Object Storage USE Athena Search
  • IF Files are in Glacier Object Storage USE Glacier Select
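A Glacier Select retrieval can be sketched with S3's `restore_object` call, running a SQL expression against the frozen object and writing the matches to a staging prefix. The bucket, prefix, and `src_ip` column are hypothetical placeholders, assuming the archived logs are CSV with a header row:

```python
def select_expression(ip: str) -> str:
    """SQL expression run against the archived object's CSV contents."""
    return f"SELECT * FROM object s WHERE s.src_ip = '{ip}'"


def glacier_select(bucket: str, key: str, ip: str):
    """Query a Glacier-class object in place (requires AWS creds)."""
    import boto3  # lazy import; the expression builder runs without AWS

    s3 = boto3.client("s3")
    return s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Type": "SELECT",
            "Tier": "Standard",  # cheaper than 'Expedited' retrieval
            "SelectParameters": {
                "Expression": select_expression(ip),
                "ExpressionType": "SQL",
                "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
                "OutputSerialization": {"CSV": {}},
            },
            "OutputLocation": {
                "S3": {"BucketName": bucket, "Prefix": "glacier-select-results/"}
            },
        },
    )
```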

❓ Got a question? Tech or otherwise? — Matthewduke0@gmail.com

Matt Duke

Product @ Big 4 | Public Speaker | Voracious learner | Zealous Open-Source Advocate.