Creating a Spark SQL View from a Mordor Dataset¶
Author: Jose Rodriguez (@Cyb3rPandah)
Project: Infosec Jupyter Book
Public Organization: Open Threat Research
License: Creative Commons Attribution-ShareAlike 4.0 International
Reference: https://mordordatasets.com/introduction.html
Extracting Mordor JSON File¶
Get compressed Zip file¶
We could use the wget command and the -O (output document file) option to save the file in a datasets folder. However, to keep it simple for this workshop, we already downloaded the dataset for you.
! wget https://raw.githubusercontent.com/hunters-forge/mordor/master/datasets/large/apt29/day1/apt29_evals_day1_manual.zip -O datasets/apt29_evals_day1_manual.zip
Extract JSON file¶
We are using the unzip command and -o (Overwrite) and -d (different directory) options to save the file in datasets folder
! unzip -o ../datasets/apt29_evals_day1_manual.zip -d ../datasets/
Archive: ../datasets/apt29_evals_day1_manual.zip
inflating: ../datasets/apt29_evals_day1_manual_2020-05-01225525.json
We will store the path of the json file in a variable to facilitate our code
apt29Json = '../datasets/apt29_evals_day1_manual_2020-05-01225525.json'
Creating a SQL View¶
Create Spark session¶
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("Mordor") \
.config("spark.sql.caseSensitive","True") \
.getOrCreate()
Read JSON file¶
%%time
apt29Df = spark.read.json(apt29Json)
CPU times: user 4.44 ms, sys: 2.46 ms, total: 6.89 ms
Wall time: 27.9 s
apt29Df.show(n = 1, vertical = True)
-RECORD 0-----------------------------------------------
@timestamp | 2020-05-01 22:55:...
@version | 1
AccessList | null
AccessMask | null
AccessReason | null
AccountName | SYSTEM
AccountType | User
Action | null
Active | null
ActiveProfile | null
ActivityID | null
AdapterName | null
AdapterSuffixName | null
AdditionalInfo | null
AdditionalInfo2 | null
AdvancedOptions | null
AlgorithmName | null
Application | null
Archived | null
AuthenticationPackageName | null
BitlockerUserInputTime | null
BootMenuPolicy | null
BootMode | null
BootStatusPolicy | null
BootType | null
BuildVersion | null
CallTrace | C:\windows\SYSTEM...
CallerProcessId | null
CallerProcessName | null
CalloutId | null
CalloutKey | null
CalloutName | null
CalloutType | null
Category | null
ChangeType | null
Channel | Microsoft-Windows...
ClassId | null
ClassName | null
ClientCreationTime | null
ClientProcessId | null
CommandLine | null
Company | null
CompatibleIds | null
Conditions | null
Config | null
ConfigAccessPolicy | null
ContextInfo | null
CorruptionActionState | null
CountNew | null
CountOfCredentialsReturned | null
CountOld | null
CreationUtcTime | null
CurrentDirectory | null
CurrentStratumNumber | null
DCName | null
Default SD String: | null
Description | null
DestAddress | null
DestPort | null
DestinationHostname | null
DestinationIp | null
DestinationIsIpv6 | null
DestinationPort | null
DestinationPortName | null
Details | null
Device | null
DeviceDescription | null
DeviceId | null
DeviceName | null
DeviceNameLength | null
DeviceTime | null
DeviceVersionMajor | null
DeviceVersionMinor | null
Direction | null
DirtyPages | null
DisableIntegrityChecks | null
DisabledPrivilegeList | null
DnsServerList | null
Domain | NT AUTHORITY
DriveName | null
DwordVal | null
EdgeTraversal | null
ElevatedToken | null
EmbeddedContext | null
EnableDisableReason | null
EnabledNew | null
EnabledPrivilegeList | null
EntryCount | null
ErrorCode | null
EventCountTotal | null
EventID | 10
EventIdx | null
EventReceivedTime | 2020-05-01 22:55:26
EventTime | 2020-05-01 22:55:23
EventType | INFO
ExecutionProcessID | 3496
FileName | null
FileVersion | null
FilterId | null
FilterKey | null
FilterName | null
FilterRTID | null
FilterType | null
FinalStatus | null
Flags | null
FlightSigning | null
GrantedAccess | 0x1000
Group | null
GroupMembership | null
GroupPolicyApplied | null
HandleId | null
Hash | null
Hashes | null
HiveName | null
HiveNameLength | null
Hostname | UTICA.dmevals.local
HypervisorDebug | null
HypervisorLaunchType | null
HypervisorLoadOptions | null
IdleImplementation | null
IdleStateCount | null
Image | null
ImageLoaded | null
ImagePath | null
ImpersonationLevel | null
Initiated | null
InstanceId | null
IntegrityLevel | null
InterfaceGuid | null
InterfaceName | null
IpAddress | null
IpPort | null
Ipaddress | null
IsExecutable | null
IsTestConfig | null
KernelDebug | null
KeyFilePath | null
KeyLength | null
KeyName | null
KeyType | null
KeysUpdated | null
Keywords | -9223372036854775808
LastBootGood | null
LastBootId | null
LastShutdownGood | null
LayerId | null
LayerKey | null
LayerName | null
LayerRTID | null
LinkName | null
LmPackageName | null
LoadOptions | null
LocalAddresses | null
LocalOnlyMapped | null
LocationInformation | null
LogDroppedPacketsEnabled | null
LogSuccessfulConnectionsEnabled | null
LogonGuid | null
LogonId | null
LogonProcessName | null
LogonType | null
LooseSourceMapped | null
MajorVersion | null
MandatoryLabel | null
MaxRunspaces | null
MaximumPerformancePercent | null
Message | Process accessed:...
MessageNumber | null
MessageTotal | null
MinRunspaces | null
MinimumPerformancePercent | null
MinimumThrottlePercent | null
MiniportName | null
MiniportNameLen | null
MinorVersion | null
ModifyingApplication | null
ModifyingUser | null
MulticastFlowsEnabled | null
NewProcessId | null
NewProcessName | null
NewProfile | null
NewSd | null
NewThreadId | null
NewTime | null
NewValue | null
NewValueType | null
NominalFrequency | null
NotificationPackageName | null
Number | null
ObjectName | null
ObjectServer | null
ObjectType | null
ObjectValueName | null
OldProfile | null
OldSd | null
OldTime | null
OldValue | null
OldValueType | null
Opcode | null
OpcodeValue | 0
Operation | null
OperationMode | null
OperationType | null
Origin | null
OriginalFileName | null
PackageName | null
ParentCommandLine | null
ParentImage | null
ParentProcessGuid | null
ParentProcessId | null
ParentProcessName | null
Path | null
Payload | null
PerformanceImplementation | null
PipeName | null
PreAuthType | null
PreviousCreationUtcTime | null
PreviousTime | null
PrivilegeList | null
ProcessCreationTime | null
ProcessGuid | null
ProcessId | 900
ProcessName | null
ProcessingMode | null
ProcessingTimeInMilliseconds | null
Product | null
Profile | null
ProfileChanged | null
ProfileUsed | null
Profiles | null
Properties | null
Protocol | null
ProviderContextKey | null
ProviderContextName | null
ProviderContextType | null
ProviderGuid | {5770385F-C22A-43...
ProviderKey | null
ProviderName | null
ProviderType | null
PuaCount | null
PuaPolicyId | null
QfeVersion | null
QueryName | null
QueryResults | null
QueryStatus | null
ReadOperation | null
Reason | null
ReasonForRejection | null
RecordNumber | 138294
RelativeTargetName | null
RemoteAddresses | null
RemoteAdminEnabled | null
RemoteEventLogging | null
RemoteMachineID | null
RemoteUserID | null
ResourceAttributes | null
RestrictedAdminMode | null
RestrictedSidCount | null
ReturnCode | null
RuleAttr | null
RuleId | null
RuleName | -
RuleStatus | null
SchemaVersion | null
ScriptBlockId | null
ScriptBlockText | null
SecurityOptions | null
SecurityPackageName | null
Sent UpdateServer | null
Service | null
ServiceAccount | null
ServiceFileName | null
ServiceName | null
ServiceSid | null
ServiceStartType | null
ServiceType | null
ServiceVersion | null
SettingType | null
SettingValue | null
SettingValueDisplay | null
SettingValueSize | null
Severity | INFO
SeverityValue | 2
ShareLocalPath | null
ShareName | null
ShutdownActionType | null
ShutdownEventCode | null
ShutdownReason | null
Signature | null
SignatureStatus | null
Signed | null
SourceAddress | null
SourceHandleId | null
SourceHostname | null
SourceImage | C:\windows\system...
SourceIp | null
SourceIsIpv6 | null
SourceModuleName | eventlog
SourceModuleType | im_msvistalog
SourceName | Microsoft-Windows...
SourcePort | null
SourcePortName | null
SourceProcessGUID | {6bbf237a-cafb-5e...
SourceProcessGuid | null
SourceProcessId | 900
SourceThreadId | 504
StartAddress | null
StartFunction | null
StartModule | null
StartTime | null
StartType | null
State | null
Status | null
StopTime | null
SubLayerKey | null
SubLayerName | null
SubLayerType | null
SubjectDomainName | null
SubjectLogonId | null
SubjectUserName | null
SubjectUserSid | null
SupportInfo1 | null
SupportInfo2 | null
TSId | null
TargetDomainName | null
TargetFilename | null
TargetHandleId | null
TargetImage | C:\windows\System...
TargetInfo | null
TargetLinkedLogonId | null
TargetLogonGuid | null
TargetLogonId | null
TargetName | null
TargetObject | null
TargetOutboundDomainName | null
TargetOutboundUserName | null
TargetProcessGUID | {6bbf237a-cb97-5e...
TargetProcessGuid | null
TargetProcessId | 2092
TargetProcessName | null
TargetServerName | null
TargetSid | null
TargetUserName | null
TargetUserSid | null
Task | 10
TaskContentNew | null
TaskName | null
TerminalSessionId | null
TestSigning | null
ThreadID | 4396
TicketEncryptionType | null
TicketOptions | null
TimeSource | null
TimeSourceRefId | null
TokenElevationType | null
TransactionId | null
TransmittedServices | null
Type | null
UpdateReason | null
User | null
UserData | null
UserID | S-1-5-18
UserName | null
UserSid | null
UtcTime | 2020-05-02 02:55:...
VendorIds | null
Version | 3
VersionLen | null
VirtualAccount | null
VsmLaunchType | null
VsmPolicy | null
Weight | null
Workstation | null
WorkstationName | null
host | wec.internal.clou...
param1 | null
param10 | null
param11 | null
param2 | null
param3 | null
param4 | null
param5 | null
param6 | null
param7 | null
param8 | null
param9 | null
port | 60737
tags | [mordorDataset]
only showing top 1 row
Expose the dataframe as a SQL view¶
apt29Df.createOrReplaceTempView('apt29')
Analyzing the APT29 dataset¶
Filtering on Sysmon event 1: Process Creation
sysmon1 = spark.sql(
'''
SELECT Image, ProcessId, ProcessGuid
FROM apt29
WHERE Channel = 'Microsoft-Windows-Sysmon/Operational'
AND EventID = 1
''')
sysmon1.show(n = 5, truncate = False)
+---------------------------------------------------------+---------+--------------------------------------+
|Image |ProcessId|ProcessGuid |
+---------------------------------------------------------+---------+--------------------------------------+
|C:\ProgramData\victim\‮cod.3aka3.scr |8524 |{47ab858c-e13c-5eac-a903-000000000400}|
|C:\Windows\System32\conhost.exe |5156 |{47ab858c-e144-5eac-aa03-000000000400}|
|C:\Windows\System32\cmd.exe |2772 |{47ab858c-e144-5eac-ab03-000000000400}|
|C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe|5944 |{47ab858c-e14e-5eac-ac03-000000000400}|
|C:\Windows\System32\SearchProtocolHost.exe |4152 |{47ab858c-e17d-5eac-ad03-000000000400}|
+---------------------------------------------------------+---------+--------------------------------------+
only showing top 5 rows