Plaso Super Timelines and CloudTrails
In the realm of digital forensics and incident response, the ability to parse and analyze log files efficiently is crucial. AWS CloudTrail provides detailed logs of API calls and user activity within your AWS environment, making it an invaluable resource for security auditing, compliance monitoring, and forensic investigations. However, the sheer volume and complexity of CloudTrail logs can pose significant challenges.
Enter Plaso, an open-source tool designed for efficient log file parsing and timeline creation. Developed to handle a wide array of log formats, Plaso has been a go-to solution for forensic analysts seeking to streamline their analysis of logs and other artifacts.
In this blog post, we’ll delve into how Plaso handles CloudTrail logs. This write-up stems from recent experiences using Plaso for previous posts and discovering that Plaso includes a CloudTrail parser. Given how little information is currently available on how Plaso processes CloudTrail logs, this article aims to provide insights and useful information on using Plaso with them.
Let’s dive in!
Prerequisites
Here are the tools and data I’m using:
- Plaso log2timeline Docker image: I have a write-up here on how to use this. For much of this write-up, I’ll assume this is known or referenced.
- Publicly available CloudTrail logs: At Black Hat 2024, I had an opportunity to chat with Korstiaan Stam from Invictus, who mentioned he had a set of logs on the Invictus GitHub page here.
- Log converter script: aws-cloudtrail2plaso.py — This script converts raw CloudTrail logs downloaded from S3 into the lookup-event format produced by command-line exports with Boto3. Note: As of this writing, this script has only been tested with the Invictus dataset mentioned above.
Getting ready
Below is the folder structure I’m working with. I have an Evidence folder for my logs and an L2T folder for all my Plaso output. I’ve placed the aws-cloudtrail2plaso.py script in the D:\Cases\Cloudtrail folder.
PS D:\Cases\Cloudtrail> tree
D:.
└───Invictus
    ├───Evidence
    │   └───aws_dataset   # CloudTrail logs
    └───L2T               # Plaso output
Download and extract the “aws-dataset” from the Invictus GitHub page. I’ve placed my logs in D:\Cases\Cloudtrail\Invictus\Evidence\aws_dataset.
Data Conversion
When I first attempted (and failed) to process the CloudTrail logs with Plaso, I discovered there was a small set of test data on the Plaso GitHub. But this test data was in a different format from the dataset I was using.
The Plaso CloudTrail test data started with “EventId”, yet I was expecting to parse data in the following format, starting with “Records”.
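For reference, here is an abbreviated sketch of that raw, S3-style structure (the values are placeholders and most fields are omitted):
# Raw CloudTrail log file, as downloaded from S3 (abbreviated sketch)
{
    "Records": [
        {
            "eventVersion": "1.08",
            "eventTime": "2023-07-10T11:45:00Z",
            "eventSource": "sts.amazonaws.com",
            "eventName": "AssumeRole",
            "userIdentity": {"type": "IAMUser", "userName": "string"},
            "awsRegion": "us-east-1",
            "eventID": "string"
        }
    ]
}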
Ultimately, I learned that what Plaso was expecting was the “lookup-event” data extracted via command-line or programmatically, as opposed to raw data downloaded directly from an S3 bucket.
Here is a sample of how this is done, from AWS. The response syntax of lookup-events, as defined in the Boto3 documentation here, is shown below; the response structure that describes these fields can be viewed at the link as well.
{
    'Events': [
        {
            'EventId': 'string',
            'EventName': 'string',
            'ReadOnly': 'string',
            'AccessKeyId': 'string',
            'EventTime': datetime(2015, 1, 1),
            'EventSource': 'string',
            'Username': 'string',
            'Resources': [
                {
                    'ResourceType': 'string',
                    'ResourceName': 'string'
                },
            ],
            'CloudTrailEvent': 'string'
        },
    ],
}
The main thing I wanted to point out here is how different this looks from the raw CloudTrail log data. Note, however, the CloudTrailEvent field, which contains the JSON string of the CloudTrail event itself.
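For context, here is a minimal Boto3 sketch of that lookup-events export. It assumes credentials and a default region are already configured, and the output filename is illustrative:
# Python: Minimal sketch of a lookup-events export with Boto3
import json

import boto3

client = boto3.client("cloudtrail")
paginator = client.get_paginator("lookup_events")

with open("lookup_events.jsonl", "w") as f:
    for page in paginator.paginate():
        for event in page["Events"]:
            # EventTime is returned as a datetime object; make it JSON-serializable.
            event["EventTime"] = event["EventTime"].isoformat()
            f.write(json.dumps(event) + "\n")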
Since my logs were raw downloads rather than a lookup-events export, I was curious whether it was possible to transform them into the format Plaso was expecting. So, as a non-programmer and DALL-E amateur, I asked for AI assistance.
Enter aws-cloudtrail2plaso.py!
# PowerShell: Usage of the aws-cloudtrail2plaso.py script
PS D:\Cases\Cloudtrail> python .\aws-cloudtrail2plaso.py -h
usage: aws-cloudtrail2plaso.py [-h] input_directory output_filepath

Process CloudTrail JSON files and convert them to Plaso-compatible JSONL format.

positional arguments:
  input_directory   Path to the directory containing CloudTrail JSON, JSONL, or GZ files.
  output_filepath   Path and base name for the output JSONL file(s).
# PowerShell: Pre-processing with aws-cloudtrail2plaso.py
python D:\Cases\Cloudtrail\aws-cloudtrail2plaso.py `
D:\Cases\Cloudtrail\Invictus\Evidence\aws_dataset\ `
D:\Cases\Cloudtrail\Invictus\L2T\Invictus_aws_dataset.jsonl
# Output (Some info truncated)
Processed 1 of 55 files: ... us-east-1_20230710T1145Z_7xgocspSowgK0Gto.json
Processed 2 of 55 files: ... us-east-1_20230710T1145Z_s7dpHbl38neqZbm2.json
...
Processed 54 of 55 files: ... us-east-1_20230710T1235Z_YbVFCP9AYzJDhHV9.json
Processed 55 of 55 files: ... us-east-1_20230710T1240Z_C1qUFaqvZS64BcIN.json
Total records processed: 2900
Unique records found: 2900
Duplicates removed: 0
The usage and output of the aws-cloudtrail2plaso.py Python script are shown above. The script goes through a directory of raw CloudTrail files, in json or json.gz format, that have the “Records” header, and exports a JSONL (JSON Lines) formatted file that resembles the lookup-event export format.
The two images below show the newly converted data file and a sample of the data itself.
Using Plaso CloudTrail parser
As defined in the Plaso documentation, CloudTrail log parsing is supported under the JSON-L log file format.
Additionally, the parsing module for CloudTrail logs uses the JSON-L parser plugin, as shown in the Plaso parser API documentation here. In that documentation, you will find event fields similar to those in the AWS documentation for lookup-events, including that Cloud_trail_event field.
Long story short, we’ll be using Plaso’s jsonl parser to process these logs into the Plaso storage file.
Here is the PowerShell command to run the Plaso Docker image and process our CloudTrail logs:
# PowerShell: Running Plaso for Docker against CloudTrail Logs
docker run --rm `
-v D:/Cases/Cloudtrail/Invictus:/data `
log2timeline/plaso log2timeline --parsers="jsonl" `
--storage-file /data/L2T/Invictus_aws_dataset.plaso `
/data/L2T/Invictus_aws_dataset.jsonl
Here is a breakdown of the PowerShell command:
Command: docker run --rm
- Runs a Docker container and removes it automatically after the command finishes.
Volume Mapping: -v D:/Cases/Cloudtrail/Invictus:/data
- Maps the local directory D:/Cases/Cloudtrail/Invictus to the container's /data directory, making local files accessible within the container.
Docker Image and Command: log2timeline/plaso log2timeline
- Specifies the Docker image to use, which is log2timeline/plaso, and executes the log2timeline command inside the Docker container.
Parsers Option: --parsers="jsonl"
- Limits the log2timeline parsing to JSONL (JSON Lines) format, which is appropriate for CloudTrail logs.
Storage File: --storage-file /data/L2T/Invictus_aws_dataset.plaso
- Specifies the output storage file for the parsed data, which will be saved as Invictus_aws_dataset.plaso.
Input File: /data/L2T/Invictus_aws_dataset.jsonl
- Indicates the input file to be parsed, located at /data/L2T/Invictus_aws_dataset.jsonl.
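As an aside, Plaso's parser filter syntax should also let you narrow the selection to just the CloudTrail plugin. I have not tested this variant, but based on the parser/plugin convention in the Plaso documentation it would look like this:
# PowerShell: Restricting log2timeline to the CloudTrail plugin (untested sketch)
docker run --rm `
-v D:/Cases/Cloudtrail/Invictus:/data `
log2timeline/plaso log2timeline --parsers="jsonl/aws_cloudtrail_log" `
--storage-file /data/L2T/Invictus_aws_dataset.plaso `
/data/L2T/Invictus_aws_dataset.jsonl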
After running the log2timeline command via Docker, we see the Plaso process window showing 2900 events parsed from our converted dataset.
Confirming with pinfo, we see the aws_cloudtrail_log parser found 2900 events.
# PowerShell: Running pinfo on the new storage file
docker run --rm `
-v D:/Cases/Cloudtrail/Invictus:/data `
log2timeline/plaso pinfo /data/L2T/Invictus_aws_dataset.plaso
# Output of pinfo
*********************** Plaso Storage Information ************************
Filename : Invictus_aws_dataset.plaso
Format version : 20230327
Serialization format : json
--------------------------------------------------------------------------
******************************** Sessions ********************************
37d1ce06-9c8a-4cb3-99ec-45936545c224 : 2024-08-13T01:17:57.578013+00:00
--------------------------------------------------------------------------
***************************** Event sources ******************************
Total : 1
--------------------------------------------------------------------------
********************** Events generated per parser ***********************
Parser (plugin) name : Number of events
--------------------------------------------------------------------------
aws_cloudtrail_log : 2900
Total : 2900
--------------------------------------------------------------------------
Plaso Output
For the output, we’ll use the Plaso psort command, in the Docker container, to format and export the data from our Plaso storage file. Here is a sample command to export to JSON format.
# PowerShell: Running psort to output in json format
docker run --rm `
-v D:/Cases/Cloudtrail/Invictus:/data `
log2timeline/plaso psort `
--output_time_zone UTC -o json `
-w /data/L2T/Invictus_aws_dataset.json `
/data/L2T/Invictus_aws_dataset.plaso
Output Modules
I’ll briefly go over the supported output modules and provide a sample of each so you can see what they look like. Here is the output list from psort for reference.
# PowerShell command to view the psort output modules in Docker.
docker run --rm `
-v D:/Cases/Cloudtrail/Invictus:/data `
log2timeline/plaso psort -o list
******************************** Output Modules ****************************
Name           : Description
----------------------------------------------------------------------------
dynamic        : Dynamic selection of fields for a separated value output
                 format.
json           : Saves the events into a JSON format.
json_line      : Saves the events into a JSON line format.
kml            : Saves events with geography data into a KML format.
l2tcsv         : CSV format used by legacy log2timeline, with 17 fixed fields
l2ttln         : Extended TLN 7 field | delimited output.
null           : Output module that does not output anything.
opensearch     : Saves the events into an OpenSearch database.
opensearch_ts  : Saves the events into an OpenSearch database for use with
                 Timesketch.
rawpy          : native (or "raw") Python output.
tln            : TLN 5 field | delimited output.
xlsx           : Excel Spreadsheet (XLSX) output
----------------------------------------------------------------------------
Not all of these work for our data, and some work better than others. Use what fits your needs if you decide to use Plaso for CloudTrail parsing.
TL;DR: The supported formats are:
dynamic, json, json_line, l2tcsv, l2ttln, rawpy, tln, and xlsx
The json and json_line output formats are very similar. It’s worth noting that both include the summarized message field, which seems handy for log review. These are likely the most useful outputs for large amounts of data, as you could later load them into a review platform such as Splunk. I have a write-up here that should help with that, though there may be some differences to consider.
The dynamic and l2tcsv outputs are both CSV formats.
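If you go the dynamic route, you can pick your own columns with the --fields option. Here is a sketch; the field names are assumptions based on the Plaso documentation:
# PowerShell: psort dynamic output with a selected field list (sketch)
docker run --rm `
-v D:/Cases/Cloudtrail/Invictus:/data `
log2timeline/plaso psort `
--output_time_zone UTC -o dynamic `
--fields datetime,timestamp_desc,message `
-w /data/L2T/Invictus_aws_dataset.csv `
/data/L2T/Invictus_aws_dataset.plaso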
The xlsx format is mostly limited to the message summary and datetime information. It is probably fine for a quick overview, but it lacks a lot of detail.
The tln and l2ttln outputs are very similar. Both are pipe-delimited (the base TLN layout is Time|Source|Host|User|Description), with the l2ttln output having two extra fields for time zone and notes.
The rawpy output is an interesting view of the data. It is not practical for bulk review, though it is neatly organized; it is perhaps useful for individual records pasted into a report.
Observations
Plaso, in its current form, may not be the best tool for most CloudTrail log reviews. A major drawback is its apparent limitation to parsing only lookup-event formatted logs. I would welcome any feedback or guidance on how to get it to work with raw CloudTrail logs, such as those accessible via S3.
Nevertheless, I appreciate the summary provided in the message field. It is quite useful for quickly gaining visibility into events, especially considering the large volume of data in CloudTrail logs.
Of course, Plaso is about Timelining all the things! So, this parser for CloudTrail is likely still a great tool when you have a lot of different sources to bring together for a broad view of activity.
Final Thoughts
Plaso offers a valuable tool set for parsing and analyzing CloudTrail logs, especially with its ability to generate detailed timelines and event summaries. By exporting lookup-events via the command line, or by converting raw CloudTrail logs from S3 into the lookup-event format that Plaso expects, you can effectively use Plaso to gain quick insights into the events within your AWS environment, whichever log export you choose.
Despite the need to pre-process some logs into a compatible format, the streamlined analysis and useful summaries provided by Plaso make it a beneficial tool for forensic investigations. The steps and tools outlined in this post demonstrate how you can harness Plaso’s capabilities to enhance your log analysis workflow.
For forensic analysts and incident responders, Plaso remains a powerful and reliable tool for efficiently parsing and interpreting CloudTrail logs, providing clear and actionable insights.
Connect!
I hope you’ve found this post useful. I’d love to hear your thoughts and ideas, so reach out and connect here or on LinkedIn.