Flare-On 7 included a very interesting malware analysis challenge that involved a unique technique for hiding malicious macros. The technique is called VBA stomping. It works by keeping the real code only in its compiled form, P-Code (the “bytecode” used by macros), and replacing the visible source code with a fake. If the crafted document is opened in the same VBA version it was compiled with, the P-Code is executed instead of the visible source.
Why is this useful?
Document analysis
More and more malware is embedded in documents. This challenge explores how malware can inject malicious code and abuse VBA macros to compromise machines.
Advanced evasion technique
VBA stomping is the key to this challenge and, in my humble opinion, an amazing injection technique. The real source code literally does not exist in plain text, so we must dig deeper to uncover the malware's real intentions.
I was fooled by this challenge, and I want to share my write-up of how I solved it. Check it out.
Challenge Message:
Nobody likes analysing infected documents, but it pays the bills. Reverse this macro thrill-ride to discover how to get it to show you the key.
In this challenge we need to analyse a malicious xls document that contains a malicious macro and makes use of modern, advanced techniques.
Recon
Documents like xls use the OLE format (OLE2 compound file), which stores “folders” (storages) and byte streams inside a single file. It is mainly used by legacy Office documents such as doc, xls and ppt.
I will use oletools to get some information about the file internals, such as the streams and VBA projects inside the document, before opening the document itself.
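Before running any tool, a quick sanity check is to confirm that the file really starts with the 8-byte OLE compound file signature. This is only a minimal sketch, and the helper name `is_ole_file` is mine, not part of oletools:

```python
# Signature found at offset 0 of every OLE2 compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_file(data: bytes) -> bool:
    """Return True if the buffer starts with the OLE2 signature."""
    return data[:8] == OLE_MAGIC


# Usage (assumes report.xls is in the current directory):
# with open("report.xls", "rb") as fd:
#     print(is_ole_file(fd.read(8)))
```

A file that fails this check (for example one starting with a ZIP header) needs to be unpacked differently before the stream tools below are useful.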
Oledump
We can use oledump to list all the streams in the OLE file (the document):
Z:\Projects\CTF\FlareOn2020\report>oledump.py -v report.xls
1: 108 '\x01CompObj'
2: 244 '\x05DocumentSummaryInformation'
3: 352240 'Workbook'
4: 97 '_VBA_PROJECT_CUR/F/\x01CompObj'
5: 284 '_VBA_PROJECT_CUR/F/\x03VBFrame'
6: 163 '_VBA_PROJECT_CUR/F/f'
7: 1143744 '_VBA_PROJECT_CUR/F/o'
8: 534 '_VBA_PROJECT_CUR/PROJECT'
9: 68 '_VBA_PROJECT_CUR/PROJECTwm'
10: m 1388 '_VBA_PROJECT_CUR/VBA/F'
11: 10518 '_VBA_PROJECT_CUR/VBA/Sheet1'
12: M 1785 '_VBA_PROJECT_CUR/VBA/ThisWorkbook'
13: 4327 '_VBA_PROJECT_CUR/VBA/_VBA_PROJECT'
14: 3345 '_VBA_PROJECT_CUR/VBA/__SRP_0'
15: 486 '_VBA_PROJECT_CUR/VBA/__SRP_1'
16: 592 '_VBA_PROJECT_CUR/VBA/__SRP_2'
17: 140 '_VBA_PROJECT_CUR/VBA/__SRP_3'
18: 3158 '_VBA_PROJECT_CUR/VBA/__SRP_4'
19: 473 '_VBA_PROJECT_CUR/VBA/__SRP_5'
20: 448 '_VBA_PROJECT_CUR/VBA/__SRP_6'
21: 66 '_VBA_PROJECT_CUR/VBA/__SRP_7'
22: 827 '_VBA_PROJECT_CUR/VBA/dir'
We can see that streams 10 and 12 are flagged as VBA modules (M means the module contains code, m means it only contains attributes). Let's dump them:
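If you save the oledump listing to a text file, picking out the flagged streams can be scripted. The following is a rough sketch that assumes the listing has the same shape as the output above (index, optional M/m flag, size, quoted name); the function name `macro_streams` is my own:

```python
import re

# index, optional flag (M = code, m = attributes only), size, quoted stream name
LINE_RE = re.compile(r"^\s*(\d+):\s+(?:([Mm])\s+)?(\d+)\s+'(.*)'\s*$")


def macro_streams(listing: str):
    """Return (index, flag, size, name) for every stream flagged M or m."""
    found = []
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2):
            index, flag, size, name = match.groups()
            found.append((int(index), flag, int(size), name))
    return found


sample = """\
9: 68 '_VBA_PROJECT_CUR/PROJECTwm'
10: m 1388 '_VBA_PROJECT_CUR/VBA/F'
11: 10518 '_VBA_PROJECT_CUR/VBA/Sheet1'
12: M 1785 '_VBA_PROJECT_CUR/VBA/ThisWorkbook'
"""
for entry in macro_streams(sample):
    print(entry)
```

On the sample lines this reports only streams 10 and 12, the same two picked by eye.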
Z:\Projects\CTF\FlareOn2020\report>oledump.py -v report.xls -v -e -s10
Attribute VB_Name = "F"
Attribute VB_Base = "0{4CAFACAA-2A4D-41F3-9B86-24F5964089BB}{A9F6A711-2CEB-4939-941D-EAC47AFB9092}"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Attribute VB_TemplateDerived = False
Attribute VB_Customizable = False
Z:\Projects\CTF\FlareOn2020\report>oledump.py -v report.xls -v -e -s 12
Attribute VB_Name = "ThisWorkbook"
Attribute VB_Base = "0{00020819-0000-0000-C000-000000000046}"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True
Attribute VB_TemplateDerived = False
Attribute VB_Customizable = True
Sub Workbook_Open()
    Sheet1.folderol
End Sub
Sub Auto_Open()
    Sheet1.folderol
End Sub
Notice that we have two flagged modules here, F and ThisWorkbook, but the script source is actually in Sheet1. Take a look at the open function:
Sub Workbook_Open()
    Sheet1.folderol
End Sub
But oledump is vague here: we only know that something called Sheet1.folderol is called when the macro runs. Let's dump the whole code using olevba.
Olevba
olevba report.xls > output
That gives us a lot of useful information about the VBA code inside, but I will stick to the Sheet1 code, because that is where the function folderol is found.
Private Declare Function InternetGetConnectedState Lib "wininet.dll" _
    (ByRef dwflags As Long, ByVal dwReserved As Long) As Long
Private Declare PtrSafe Function mciSendString Lib "winmm.dll" Alias _
    "mciSendStringA" (ByVal lpstrCommand As String, ByVal _
    lpstrReturnString As Any, ByVal uReturnLength As Long, ByVal _
    hwndCallback As Long) As Long
Private Declare Function GetShortPathName Lib "kernel32" Alias "GetShortPathNameA" _
    (ByVal lpszLongPath As String, ByVal lpszShortPath As String, ByVal lBuffer As Long) As Long
Public Function GetInternetConnectedState() As Boolean
    GetInternetConnectedState = InternetGetConnectedState(0&, 0&)
End Function
Function rigmarole(es As String) As String
    Dim furphy As String
    Dim c As Integer
    Dim s As String
    Dim cc As Integer
    furphy = ""
    For i = 1 To Len(es) Step 4
        c = CDec("&H" & Mid(es, i, 2))
        s = CDec("&H" & Mid(es, i + 2, 2))
        cc = c - s
        furphy = furphy + Chr(cc)
    Next i
    rigmarole = furphy
End Function
Function folderol()
    Dim wabbit() As Byte
    Dim fn As Integer: fn = FreeFile
    Dim onzo() As String
    Dim mf As String
    Dim xertz As Variant
    onzo = Split(F.L, ".")
    If GetInternetConnectedState = False Then
        MsgBox "Cannot establish Internet connection.", vbCritical, "Error"
        End
    End If
    Set fudgel = GetObject(rigmarole(onzo(7)))
    Set twattling = fudgel.ExecQuery(rigmarole(onzo(8)), , 48)
    For Each p In twattling
        Dim pos As Integer
        pos = InStr(LCase(p.Name), "vmw") + InStr(LCase(p.Name), "vmt") + InStr(LCase(p.Name), rigmarole(onzo(9)))
        If pos > 0 Then
            MsgBox rigmarole(onzo(4)), vbCritical, rigmarole(onzo(6))
            End
        End If
    Next
    xertz = Array(&H11, &H22, &H33, &H44, &H55, &H66, &H77, &H88, &H99, &HAA, &HBB, &HCC, &HDD, &HEE)
    wabbit = canoodle(F.T.Text, 0, 168667, xertz)
    mf = Environ(rigmarole(onzo(0))) & rigmarole(onzo(1))
    Open mf For Binary Lock Read Write As #fn
    Put #fn, , wabbit
    Close #fn
    mucolerd = mciSendString(rigmarole(onzo(2)) & mf, 0&, 0, 0)
End Function
Function canoodle(panjandrum As String, ardylo As Integer, s As Long, bibble As Variant) As Byte()
    Dim quean As Long
    Dim cattywampus As Long
    Dim kerfuffle() As Byte
    ReDim kerfuffle(s)
    quean = 0
    For cattywampus = 1 To Len(panjandrum) Step 4
        kerfuffle(quean) = CByte("&H" & Mid(panjandrum, cattywampus + ardylo, 2)) Xor bibble(quean Mod (UBound(bibble) + 1))
        quean = quean + 1
        If quean = UBound(kerfuffle) Then
            Exit For
        End If
    Next cattywampus
    canoodle = kerfuffle
End Function
In the folderol function, Split is called on F.L, and F.T is accessed later, but the F module is shown as empty. Let's investigate:
Decoding strings
F is a reference to a VBA form. It is a simple user interface holding encoded and encrypted data, take a look:
I also used ssviewer to inspect the OLE file itself, and I found the same form data in stream o.
Here we have encoded data separated by “.”, which gives us a list of encoded strings. Each time the malware wants to access a string, it calls the rigmarole function. This function reads the string as hex, four characters (two bytes) at a time, and subtracts the second byte from the first to get one output character.
Rigmarole function:
Function rigmarole(es As String) As String
    Dim furphy As String
    Dim c As Integer
    Dim s As String
    Dim cc As Integer
    furphy = ""
    For i = 1 To Len(es) Step 4
        c = CDec("&H" & Mid(es, i, 2))
        s = CDec("&H" & Mid(es, i + 2, 2))
        cc = c - s
        furphy = furphy + Chr(cc)
    Next i
    rigmarole = furphy
End Function
Now we can just split the data the same way the malware does and run the same decoding algorithm. Here is the decoder in Python:
def decode_string(encoded_string):
    """Decode one entry: every 4 hex chars hold two bytes, output char = first - second."""
    out = ""
    for i in range(0, len(encoded_string), 4):
        c = int(encoded_string[i:i + 2], base=16)
        s = int(encoded_string[i + 2:i + 4], base=16)
        out += chr(c - s)
    return out


if __name__ == '__main__':
    # Contents of F.L, split on "." exactly like the macro does
    encoded_values = "9655B040B64667238524D15D6201.B95D4E01C55CC562C7557405A532D768C55FA12DD074DC697A06E172992CAF3F8A5C7306B7476B38.C555AC40A7469C234424.853FA85C470699477D3851249A4B9C4E.A855AF40B84695239D24895D2101D05CCA62BE5578055232D568C05F902DDC74D2697406D7724C2CA83FCF5C2606B547A73898246B4BC14E941F9121D464D263B947EB77D36E7F1B8254.853FA85C470699477D3851249A4B9C4E.9A55B240B84692239624.CC55A940B44690238B24CA5D7501CF5C9C62B15561056032C468D15F9C2DE374DD696206B572752C8C3FB25C3806.A8558540924668236724B15D2101AA5CC362C2556A055232AE68B15F7C2DC17489695D06DB729A2C723F8E5C65069747AA389324AE4BB34E921F9421.CB55A240B5469B23.AC559340A94695238D24CD5D75018A5CB062BA557905A932D768D15F982D.D074B6696F06D5729E2CAE3FCF5C7506AD47AC388024C14B7C4E8F1F8F21CB64".split(".")
    decoded_values = [decode_string(encoded) for encoded in encoded_values]
    for i, v in enumerate(decoded_values):
        print("decoded_table[{}] = {}".format(i, v))
This gives us what I call the decoded_table. With these decoded strings it is possible to read the code using the plain-text strings.
python decode_strings.py
decoded_table[0] = AppData
decoded_table[1] = \Microsoft\stomp.mp3
decoded_table[2] = play
decoded_table[3] = FLARE-ON
decoded_table[4] = Sorry, this machine is not supported.
decoded_table[5] = FLARE-ON
decoded_table[6] = Error
decoded_table[7] = winmgmts:\\.\root\CIMV2
decoded_table[8] = SELECT Name FROM Win32_Process
decoded_table[9] = vbox
decoded_table[10] = WScript.Network
decoded_table[11] = \Microsoft\v.png
Now let's continue reading the script, and whenever the malware calls the string decoding function, we just look it up in our decoded table.
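The lookup can also be automated instead of being done by hand. This is a small sketch, assuming the dumped macro always uses the literal pattern `rigmarole(onzo(N))`; the helper name `inline_decoded` is hypothetical and the table here is only a short excerpt:

```python
import re


def inline_decoded(vba_source: str, table: list) -> str:
    """Replace every rigmarole(onzo(N)) call with the decoded string literal."""
    def repl(match):
        index = int(match.group(1))
        if index >= len(table):
            return match.group(0)  # leave unknown indexes untouched
        # VBA escapes a double quote inside a string by doubling it
        return '"' + table[index].replace('"', '""') + '"'
    return re.sub(r"rigmarole\(onzo\((\d+)\)\)", repl, vba_source)


table = ["AppData", "\\Microsoft\\stomp.mp3"]
line = "mf = Environ(rigmarole(onzo(0))) & rigmarole(onzo(1))"
print(inline_decoded(line, table))
# mf = Environ("AppData") & "\Microsoft\stomp.mp3"
```

Running this over the whole olevba output gives a version of the macro that reads almost like the original, which is what the following sections do manually.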
The evasion system
In the code below we can see that the malware has a simple evasion system. I have already replaced all the rigmarole calls with the real string values.
Set fudgel = GetObject("winmgmts:\\.\root\CIMV2")
Set twattling = fudgel.ExecQuery("SELECT Name FROM Win32_Process", , 48)
For Each p In twattling
    Dim pos As Integer
    pos = InStr(LCase(p.Name), "vmw") + InStr(LCase(p.Name), "vmt") + InStr(LCase(p.Name), "vbox")
    If pos > 0 Then
        MsgBox "Sorry, this machine is not supported.", vbCritical, "Error"
        End
    End If
Next
In other words, this code makes a WMI query to list all running processes, and if any process name contains a trace of a virtualized environment, an alert box appears and the macro stops.
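The check itself is only a case-insensitive substring match on process names. Here is a sketch of the same logic in Python, useful for predicting whether an analysis VM would trip it; the function name and the example process lists are mine:

```python
# Substrings the macro looks for in each process name
VM_MARKERS = ("vmw", "vmt", "vbox")


def looks_virtualized(process_names) -> bool:
    """Mimic the macro's check: any marker inside any lower-cased process name."""
    return any(
        marker in name.lower()
        for name in process_names
        for marker in VM_MARKERS
    )


print(looks_virtualized(["explorer.exe", "VBoxService.exe"]))  # True
print(looks_virtualized(["explorer.exe", "excel.exe"]))        # False
```

Since the match is on names only, renaming or stopping the guest tools processes is enough to get past it.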
Fake Stage 2 extraction
Using the decode table we can see when the second stage is dropped:
' key ?
xertz = Array(&H11, &H22, &H33, &H44, &H55, &H66, &H77, &H88, &H99, &HAA, &HBB, &HCC, &HDD, &HEE)
wabbit = canoodle(F.T.Text, 0, 168667, xertz)
mf = Environ("AppData") & "\Microsoft\stomp.mp3"
Open mf For Binary Lock Read Write As #fn
Put #fn, , wabbit
Close #fn
mucolerd = mciSendString("play " & mf, 0&, 0, 0)
So stage 1 drops a file with the .mp3 extension and uses the mciSendString function to “play” that song. Now we need to get the F.T.Text value and understand how the decryption in the canoodle function works.
The two faces of L
This is the function that decrypts the song:
Function canoodle(panjandrum As String, ardylo As Integer, s As Long, bibble As Variant) As Byte()
    Dim quean As Long
    Dim cattywampus As Long
    Dim kerfuffle() As Byte
    ReDim kerfuffle(s)
    quean = 0
    For cattywampus = 1 To Len(panjandrum) Step 4
        kerfuffle(quean) = CByte("&H" & Mid(panjandrum, cattywampus + ardylo, 2)) Xor bibble(quean Mod (UBound(bibble) + 1))
        quean = quean + 1
        If quean = UBound(kerfuffle) Then
            Exit For
        End If
    Next cattywampus
    canoodle = kerfuffle
End Function
This can be easily rewritten in Python as:
def decrypt_stage(stage2, pos, key, length):
    """XOR-decrypt one byte out of every 4-hex-char group, taken at offset pos."""
    decrypted = []
    len_key = len(key)
    key_count = 0
    for i in range(0, len(stage2), 4):
        byte_at = int(stage2[i + pos:i + 2 + pos], base=16)
        b = byte_at ^ key[key_count % len_key]  # repeating key
        decrypted.append(b)
        key_count += 1
        if key_count == length:
            break
    return bytes(decrypted)
The idea here is that we pass in a text, a length, and an offset into each group of the string. The text is in hex format, each group of four characters holds two bytes, and we decrypt only one of them. With pos = 0 that is the first byte, for example:
b = "C4BA"
int(b[0 + pos:2 + pos], 16) ^ key[0]  # pos = 0 picks 0xC4
This is XOR encryption with a repeating key (not a block cipher), so we always need to stay within the key's range by using mod.
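Because only one of the two bytes in each group is read, a single hex string can in principle carry two independent payloads, one at offset 0 and one at offset 2. The toy sketch below illustrates that layout. The payloads, keys and helper names are made up for the example and are not taken from the document:

```python
def xor(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def interleave(first: bytes, second: bytes) -> str:
    """Build the hex text: each 4-char group = one byte of each payload."""
    return "".join(f"{a:02X}{b:02X}" for a, b in zip(first, second))


def extract(text: str, pos: int, key: bytes) -> bytes:
    """Read the byte at offset pos of every group and XOR it with the key."""
    out = bytearray()
    for n, i in enumerate(range(0, len(text), 4)):
        out.append(int(text[i + pos:i + pos + 2], 16) ^ key[n % len(key)])
    return bytes(out)


key_a, key_b = b"\x11\x22\x33", b"\x44\x55"
text = interleave(xor(b"decoy!", key_a), xor(b"secret", key_b))
print(extract(text, 0, key_a))  # b'decoy!'
print(extract(text, 2, key_b))  # b'secret'
```

The same text yields a different plaintext depending on the offset and key, which is worth keeping in mind whenever a decoder skips bytes.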
Decrypting the audio
Just like before, I dumped the data using ssviewer, then edited the file so that it starts at the 58 value, the start of L.Text (look at the form again).
Now, calling the function:
import sys

# Same 14-byte key as the xertz array in the macro
key = b"\x11\x22\x33\x44\x55\x66\x77\x88\x99\xAA\xBB\xCC\xDD\xEE"
input_file = sys.argv[1]
with open(input_file, "r") as fd:
    stage2_decrypted = fd.read().strip()
size = 168667
decrypted_stage2 = decrypt_stage(stage2_decrypted, 0, key, size)
with open("stage2.dec", "wb") as sfd:
    print(decrypted_stage2[:4])  # quick look at the file magic
    sfd.write(decrypted_stage2)
I get this sound:
The file contains only random sounds, its title is “This is not what you should be looking at…” and the album title is “P. Code”.
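Printing the first four bytes of a decrypted stage is a cheap way to tell whether the key and offset were right. Here is a small sketch of such a check, assuming the output starts with a well-known signature (an MP3 may also start without an ID3v2 tag, in which case this reports "unknown"); the helper name `identify` is mine:

```python
# A few well-known file signatures; extend as needed
SIGNATURES = {
    b"ID3": "MP3 with ID3v2 tag",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"MZ": "Windows executable (MZ)",
}


def identify(data: bytes) -> str:
    """Return a label for the first matching signature, else 'unknown'."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


# Usage, assuming stage2.dec was written by the script above:
# with open("stage2.dec", "rb") as fd:
#     print(identify(fd.read(16)))
```

If the result is "unknown", the key, the offset, or the starting position of the dumped text is probably wrong.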
Let’s talk about the real thing here
OK, forget almost everything above. The real thing is… that the code above is fake.
VBA Stomping
After researching P-Code a lot, I discovered that VBA macros are actually compiled, and that the document stores the compiled P-Code alongside the source code. The compiled code makes loading faster when the person who opens the document has the same VBA version.
That “feature” can be used to inject malicious code into the document and write a fake source to trick analysts and antivirus solutions. If the victim has the same VBA version as the attacker, the compiled code is executed, which makes it possible to hide the real code. This technique is called VBA stomping.
We can dump the real P-Code thanks to the amazing work of Dr. Vesselin Bontchev, the author of pcodedmp. The P-Code dump looks a lot like assembly, so to get something close to the original macro source I used another cool tool based on pcodedmp, pcode2code.
The code in P-Code
Let's use pcode2code to get the real code:
$ pcode2code report.xls | tee real_code.vbs
I will not dump the whole code again, because only certain things changed. The decryption routine and the decoded table are still the same, so we can reuse our previous work to get the flag.
' Already decoded the strings looking at our decode table
Set groke = CreateObject("WScript.Network")
firkin = groke.UserDomain
If firkin <> "FLARE-ON" Then
    MsgBox "Sorry, this machine is not supported.", vbCritical, "Error"
    End
End If
n = Len(firkin)
For i = 1 To n
    buff(n - i) = Asc(Mid$(firkin, i, 1))
Next
wabbit = canoodle(F.T.Text, 2, 285729, buff)
mf = Environ("AppData") & "\Microsoft\v.png"
Open mf For Binary Lock Read Write As #fn
' a generic exception occured at line 68: can only concatenate str (not "NoneType") to str
' # Ld fn
' # Sharp
' # LitDefault
' # Ld wabbit
' # PutRec
Close #fn
Set panuding = Sheet1.Shapes.AddPicture(mf, False, True, 12, 22, 600, 310)
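One rough way to see that the visible source and the P-Code disagree is to compare the identifiers used in each. The sketch below is only a heuristic of my own, not how pcodedmp or olevba work internally. The two sample strings are lines taken from the listings above:

```python
import re


def identifiers(text: str) -> set:
    """Collect identifier-like tokens, ignoring string literals."""
    without_strings = re.sub(r'"[^"]*"', "", text)
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_strings))


def only_in_pcode(visible_source: str, pcode_source: str, min_len: int = 5) -> set:
    """Identifiers that appear in the recovered code but not in the visible source."""
    missing = identifiers(pcode_source) - identifiers(visible_source)
    return {name for name in missing if len(name) >= min_len}


visible = "mucolerd = mciSendString(rigmarole(onzo(2)) & mf, 0&, 0, 0)"
recovered = 'Set groke = CreateObject("WScript.Network")\nfirkin = groke.UserDomain'
print(sorted(only_in_pcode(visible, recovered)))
# ['CreateObject', 'UserDomain', 'firkin', 'groke']
```

In practice you would feed it the full olevba output and the full pcode2code output. A long list of names that exist only in the P-Code is a strong hint that the source was stomped.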
What has changed here?
- It gets the user's domain name (UserDomain) and checks that it equals FLARE-ON
- The FLARE-ON string is reversed and stored in buff as ASCII values
- The byte to decrypt is now the second one in each group (offset 2)
- The output file is longer
- The key is the reversed FLARE-ON string
Very cool, isn't it?
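The key derivation loop is short enough to replay in Python. This sketch mirrors the `buff(n - i) = Asc(Mid$(firkin, i, 1))` loop from the recovered code; the function name is mine:

```python
def key_from_domain(domain: str) -> bytes:
    """Replay the macro's loop: character i (1-indexed) goes to position n - i."""
    n = len(domain)
    buff = bytearray(n)
    for i in range(1, n + 1):  # VBA's Mid$ is 1-indexed
        buff[n - i] = ord(domain[i - 1])
    return bytes(buff)


key = key_from_domain("FLARE-ON")
print(key)        # b'NO-ERALF'
print(key.hex())  # 4e4f2d4552414c46
```

The result is simply the domain name reversed, byte for byte.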
Finally, the real flag
Now it is just a matter of adapting the script to decrypt it for us:
if __name__ == '__main__':
    # key = "FLARE-ON" reversed
    key = b"\x4e\x4f\x2d\x45\x52\x41\x4c\x46"
    input_file = sys.argv[1]
    stage2_decrypted = open(input_file, "r").read().strip()
    size = 285730
    decrypted_stage2 = decrypt_stage(stage2_decrypted, 2, key, size)
    with open("stage2.dec", "wb") as sfd:
        print(decrypted_stage2[:4])
        sfd.write(decrypted_stage2)
And then:
The decrypted flag image is
That is a very good example of this technique. I lost a lot of time looking in the wrong place because I did not know about it. It is amazing how this works, and it is a very clever evasion system. We already see a lot of antivirus solutions having trouble analyzing malicious scripts; can you imagine how they will deal with malicious code injected into the compiled code?
I recommend the following resources: