
VBA Stomping: The macro hidden in plain sight

At Flare-On 7 there was a very interesting malware analysis challenge that involved a unique technique for hiding malicious macros. The technique is called VBA Stomping. The real code is kept only as compiled P-Code, the "bytecode" used by VBA macros, while the visible source code is replaced with a decoy. If the crafted document is opened in the same VBA version it was compiled with, the P-Code is executed instead of the visible source.

Why is this useful?

Document analysis

More and more malware is delivered inside documents. This challenge explores how malware can inject malicious code and abuse VBA macros to compromise machines.

Advanced evasion technique

VBA Stomping is the key to this challenge and, in my humble opinion, an amazing injection technique. The source code literally does not exist in plain text, so we must go deeper to uncover the malware's real intentions.

I was fooled by this challenge, so I want to share my write-up. Check it out.

Challenge Message:

Nobody likes analysing infected documents, but it pays the bills. Reverse this macro thrill-ride to discover how to get it to show you the key.

In this challenge we need to analyse a malicious .xls document that contains a macro built with modern, advanced evasion techniques.

Recon

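Legacy Office documents such as .xls use the OLE format (Compound File Binary). This format stores data as “storages” (folders) and byte streams, much like a small file system inside a file. It is mainly used by legacy formats such as .doc, .xls and .ppt. Newer formats such as .docx and .xlsx are ZIP-based, but the VBA project inside a macro-enabled one (vbaProject.bin) is still an OLE file.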
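Before reaching for oletools, you can tell the two container types apart by their first bytes. Here is a minimal triage sketch. The helper name container_type is my own. It relies on the OLE/CFB signature D0 CF 11 E0 A1 B1 1A E1 and the ZIP local-file-header magic PK\x03\x04.

```python
# Signature of an OLE / Compound File Binary (CFB) file
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header magic of a ZIP archive (.docx, .xlsx, .xlsm, ...)
ZIP_MAGIC = b"PK\x03\x04"


def container_type(header: bytes) -> str:
    """Classify a document from its first bytes (hypothetical triage helper)."""
    if header.startswith(OLE_MAGIC):
        return "ole"  # legacy .xls/.doc/.ppt, or a standalone vbaProject.bin
    if header.startswith(ZIP_MAGIC):
        return "zip"  # OOXML container; VBA, if any, lives in vbaProject.bin
    return "unknown"
```

For example, passing the first 8 bytes of report.xls should return "ole", since oledump parses it as an OLE file.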

I will use oletools to get some internal information about the file, such as all the streams and VBA projects inside it, before opening the document itself.

Oledump

We can use oledump to list all the streams in the OLE file (document):

Z:\Projects\CTF\FlareOn2020\report>oledump.py -v report.xls
  1:       108 '\x01CompObj'
  2:       244 '\x05DocumentSummaryInformation'
  3:    352240 'Workbook'
  4:        97 '_VBA_PROJECT_CUR/F/\x01CompObj'
  5:       284 '_VBA_PROJECT_CUR/F/\x03VBFrame'
  6:       163 '_VBA_PROJECT_CUR/F/f'
  7:   1143744 '_VBA_PROJECT_CUR/F/o'
  8:       534 '_VBA_PROJECT_CUR/PROJECT'
  9:        68 '_VBA_PROJECT_CUR/PROJECTwm'
 10: m    1388 '_VBA_PROJECT_CUR/VBA/F'
 11:     10518 '_VBA_PROJECT_CUR/VBA/Sheet1'
 12: M    1785 '_VBA_PROJECT_CUR/VBA/ThisWorkbook'
 13:      4327 '_VBA_PROJECT_CUR/VBA/_VBA_PROJECT'
 14:      3345 '_VBA_PROJECT_CUR/VBA/__SRP_0'
 15:       486 '_VBA_PROJECT_CUR/VBA/__SRP_1'
 16:       592 '_VBA_PROJECT_CUR/VBA/__SRP_2'
 17:       140 '_VBA_PROJECT_CUR/VBA/__SRP_3'
 18:      3158 '_VBA_PROJECT_CUR/VBA/__SRP_4'
 19:       473 '_VBA_PROJECT_CUR/VBA/__SRP_5'
 20:       448 '_VBA_PROJECT_CUR/VBA/__SRP_6'
 21:        66 '_VBA_PROJECT_CUR/VBA/__SRP_7'
 22:       827 '_VBA_PROJECT_CUR/VBA/dir'

Streams 10 and 12 are flagged as VBA modules. An uppercase M means the module contains code. A lowercase m (stream 10, F) means it only contains Attribute statements. Let’s dump them:

 Z:\Projects\CTF\FlareOn2020\report>oledump.py -v report.xls -v -e -s10
Attribute VB_Name = "F"
Attribute VB_Base = "0{4CAFACAA-2A4D-41F3-9B86-24F5964089BB}{A9F6A711-2CEB-4939-941D-EAC47AFB9092}"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Attribute VB_TemplateDerived = False
Attribute VB_Customizable = False

Z:\Projects\CTF\FlareOn2020\report>oledump.py -v report.xls -v -e -s 12
Attribute VB_Name = "ThisWorkbook"
Attribute VB_Base = "0{00020819-0000-0000-C000-000000000046}"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True
Attribute VB_TemplateDerived = False
Attribute VB_Customizable = True
Sub Workbook_Open()
Sheet1.folderol
End Sub

Sub Auto_Open()
Sheet1.folderol
End Sub

Notice that we have two VBA modules here, F and ThisWorkbook, but the actual script lives in Sheet1. Take a look at the open function:

Sub Workbook_Open()
Sheet1.folderol
End Sub

But oledump is very vague here. We only know that something called Sheet1.folderol is called when the macro runs, so let’s dump the whole code using olevba.

Olevba

olevba report.xls > output

That gives us a lot of useful information about the VBA code inside. I will stick to the Sheet1 code, because that is where the folderol function is defined.

Private Declare Function InternetGetConnectedState Lib "wininet.dll" _
(ByRef dwflags As Long, ByVal dwReserved As Long) As Long

Private Declare PtrSafe Function mciSendString Lib "winmm.dll" Alias _
   "mciSendStringA" (ByVal lpstrCommand As String, ByVal _
   lpstrReturnString As Any, ByVal uReturnLength As Long, ByVal _
   hwndCallback As Long) As Long

Private Declare Function GetShortPathName Lib "kernel32" Alias "GetShortPathNameA" _
    (ByVal lpszLongPath As String, ByVal lpszShortPath As String, ByVal lBuffer As Long) As Long

Public Function GetInternetConnectedState() As Boolean
  GetInternetConnectedState = InternetGetConnectedState(0&, 0&)
End Function

Function rigmarole(es As String) As String
    Dim furphy As String
    Dim c As Integer
    Dim s As String
    Dim cc As Integer
    furphy = ""
    For i = 1 To Len(es) Step 4
        c = CDec("&H" & Mid(es, i, 2))
        s = CDec("&H" & Mid(es, i + 2, 2))
        cc = c - s
        furphy = furphy + Chr(cc)
    Next i
    rigmarole = furphy
End Function

Function folderol()
    Dim wabbit() As Byte
    Dim fn As Integer: fn = FreeFile
    Dim onzo() As String
    Dim mf As String
    Dim xertz As Variant
    
    onzo = Split(F.L, ".")
    
    If GetInternetConnectedState = False Then
        MsgBox "Cannot establish Internet connection.", vbCritical, "Error"
        End
    End If

    Set fudgel = GetObject(rigmarole(onzo(7)))
    Set twattling = fudgel.ExecQuery(rigmarole(onzo(8)), , 48)
    For Each p In twattling
        Dim pos As Integer
        pos = InStr(LCase(p.Name), "vmw") + InStr(LCase(p.Name), "vmt") + InStr(LCase(p.Name), rigmarole(onzo(9)))
        If pos > 0 Then
            MsgBox rigmarole(onzo(4)), vbCritical, rigmarole(onzo(6))
            End
        End If
    Next
        
    xertz = Array(&H11, &H22, &H33, &H44, &H55, &H66, &H77, &H88, &H99, &HAA, &HBB, &HCC, &HDD, &HEE)

    wabbit = canoodle(F.T.Text, 0, 168667, xertz)
    mf = Environ(rigmarole(onzo(0))) & rigmarole(onzo(1))
    Open mf For Binary Lock Read Write As #fn
      Put #fn, , wabbit
    Close #fn
    
    mucolerd = mciSendString(rigmarole(onzo(2)) & mf, 0&, 0, 0)
End Function

Function canoodle(panjandrum As String, ardylo As Integer, s As Long, bibble As Variant) As Byte()
    Dim quean As Long
    Dim cattywampus As Long
    Dim kerfuffle() As Byte
    ReDim kerfuffle(s)
    quean = 0
    For cattywampus = 1 To Len(panjandrum) Step 4
        kerfuffle(quean) = CByte("&H" & Mid(panjandrum, cattywampus + ardylo, 2)) Xor bibble(quean Mod (UBound(bibble) + 1))
        quean = quean + 1
        If quean = UBound(kerfuffle) Then
            Exit For
        End If
    Next cattywampus
    canoodle = kerfuffle
End Function

In the folderol function, Split is called on F.L and F.T is accessed too, but the F module looked empty in oledump. Let’s investigate:

Decoding strings

F is a reference to a VBA UserForm: a simple user interface whose controls (L and T) hold the encoded and encrypted data.

I also used SSViewer to inspect the OLE file itself. The same data as in the form is stored in the stream “o” (_VBA_PROJECT_CUR/F/o).

The encoded data is separated by “.”, which gives us a list of encoded strings. Every time the malware needs a string, it calls the rigmarole function on one of them. The function reads the hex string two bytes (four hex digits) at a time and subtracts the second byte from the first to get one plaintext character.
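To make the arithmetic concrete, here is a small worked example on the first field of F.L. It uses bytes.fromhex and slicing rather than rigmarole's index arithmetic, but it computes the same c - s for every pair.

```python
# Worked example: decode the first "."-separated field of F.L by hand
encoded = "9655B040B64667238524D15D6201"
raw = bytes.fromhex(encoded)

# Each 2-byte group is (c, s); rigmarole emits Chr(c - s)
pairs = list(zip(raw[0::2], raw[1::2]))
for c, s in pairs:
    print(f"0x{c:02X} - 0x{s:02X} = 0x{c - s:02X} -> {chr(c - s)!r}")

plain = "".join(chr(c - s) for c, s in pairs)
print(plain)  # AppData
```

The first line printed is 0x96 - 0x55 = 0x41 -> 'A', and the seven pairs spell AppData.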

Rigmarole function:

Function rigmarole(es As String) As String
    Dim furphy As String
    Dim c As Integer
    Dim s As String
    Dim cc As Integer
    furphy = ""
    For i = 1 To Len(es) Step 4
        c = CDec("&H" & Mid(es, i, 2))
        s = CDec("&H" & Mid(es, i + 2, 2))
        cc = c - s
        furphy = furphy + Chr(cc)
    Next i
    rigmarole = furphy
End Function

Now we can just split the data the same way the malware does and run the same decoding algorithm. Here it is in Python:

def decode_string(encoded_string):
    """Python port of rigmarole: every 4 hex digits (2 bytes) give one character."""
    out = ""
    for i in range(0, len(encoded_string), 4):
        c = int(encoded_string[i:i+2], base=16)
        s = int(encoded_string[i+2:i+4], base=16)
        out += chr(c - s)
    return out


if __name__ == '__main__':
    # Content of F.L, copied from the UserForm
    encoded_values = "9655B040B64667238524D15D6201.B95D4E01C55CC562C7557405A532D768C55FA12DD074DC697A06E172992CAF3F8A5C7306B7476B38.C555AC40A7469C234424.853FA85C470699477D3851249A4B9C4E.A855AF40B84695239D24895D2101D05CCA62BE5578055232D568C05F902DDC74D2697406D7724C2CA83FCF5C2606B547A73898246B4BC14E941F9121D464D263B947EB77D36E7F1B8254.853FA85C470699477D3851249A4B9C4E.9A55B240B84692239624.CC55A940B44690238B24CA5D7501CF5C9C62B15561056032C468D15F9C2DE374DD696206B572752C8C3FB25C3806.A8558540924668236724B15D2101AA5CC362C2556A055232AE68B15F7C2DC17489695D06DB729A2C723F8E5C65069747AA389324AE4BB34E921F9421.CB55A240B5469B23.AC559340A94695238D24CD5D75018A5CB062BA557905A932D768D15F982D.D074B6696F06D5729E2CAE3FCF5C7506AD47AC388024C14B7C4E8F1F8F21CB64".split(".")

    decoded_values = [decode_string(encoded) for encoded in encoded_values]

    for i, v in enumerate(decoded_values):
        print("decoded_table[{}] = {}".format(i, v))

This gives us what I called the decoded_table. With these decoded strings, we can read the code using the plain-text values.

python decode_strings.py
decoded_table[0] = AppData
decoded_table[1] = \Microsoft\stomp.mp3
decoded_table[2] = play 
decoded_table[3] = FLARE-ON
decoded_table[4] = Sorry, this machine is not supported.
decoded_table[5] = FLARE-ON
decoded_table[6] = Error
decoded_table[7] = winmgmts:\\.\root\CIMV2
decoded_table[8] = SELECT Name FROM Win32_Process
decoded_table[9] = vbox
decoded_table[10] = WScript.Network
decoded_table[11] = \Microsoft\v.png

Now let’s continue reading the script. Whenever the malware calls the decoding function, we just look the string up in our decoded table.

The evasion system

In the code below, we can see that the malware has a simple evasion system. I have already replaced all the rigmarole calls with their real string values.

Set fudgel = GetObject("winmgmts:\\.\root\CIMV2")
Set twattling = fudgel.ExecQuery("SELECT Name FROM Win32_Process", , 48)

For Each p In twattling
    Dim pos As Integer
    pos = InStr(LCase(p.Name), "vmw") + InStr(LCase(p.Name), "vmt") + InStr(LCase(p.Name), "vbox")
    If pos > 0 Then
        MsgBox "Sorry, this machine is not supported.", vbCritical, "Error"
        End
    End If
Next

In other words, this code runs a WMI query to list all running processes. If any process name contains “vmw”, “vmt” or “vbox” (traces of VMware or VirtualBox), an error message box appears and the macro stops.
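Here is the same check as a Python sketch. It is useful to see which process names would trip it in an analysis VM. The function name looks_like_vm and the sample process names are hypothetical. The sum of three InStr calls is greater than 0 exactly when at least one marker is found, which is what any() expresses.

```python
# Substrings the decoy macro looks for (decoded_table[9] is "vbox")
VM_MARKERS = ("vmw", "vmt", "vbox")


def looks_like_vm(process_names):
    """Mirror of the macro loop: InStr(...) + InStr(...) + InStr(...) > 0
    holds exactly when some marker occurs in a lowercased process name."""
    return any(
        marker in name.lower()
        for name in process_names
        for marker in VM_MARKERS
    )


print(looks_like_vm(["explorer.exe", "EXCEL.EXE"]))        # False
print(looks_like_vm(["explorer.exe", "VBoxService.exe"]))  # True
print(looks_like_vm(["vmtoolsd.exe"]))                     # True
```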

Fake Stage 2 extraction

Using the decoded table, we can see where the second stage is dropped:

' key?
xertz = Array(&H11, &H22, &H33, &H44, &H55, &H66, &H77, &H88, &H99, &HAA, &HBB, &HCC, &HDD, &HEE)

wabbit = canoodle(F.T.Text, 0, 168667, xertz)
mf = Environ("AppData") & "\Microsoft\stomp.mp3"

Open mf For Binary Lock Read Write As #fn
    Put #fn, , wabbit
Close #fn

mucolerd = mciSendString("play " & mf, 0&, 0, 0)

So stage 1 drops a file with an .mp3 extension and uses the mciSendString function to “play” it. Now we need to get the F.T.Text value and understand how the decryption in the canoodle function works.

The two faces of T

The function that decrypts the song is canoodle:

Function canoodle(panjandrum As String, ardylo As Integer, s As Long, bibble As Variant) As Byte()
    Dim quean As Long
    Dim cattywampus As Long
    Dim kerfuffle() As Byte
    ReDim kerfuffle(s)
    quean = 0
    For cattywampus = 1 To Len(panjandrum) Step 4
        kerfuffle(quean) = CByte("&H" & Mid(panjandrum, cattywampus + ardylo, 2)) Xor bibble(quean Mod (UBound(bibble) + 1))
        quean = quean + 1
        If quean = UBound(kerfuffle) Then
            Exit For
        End If
    Next cattywampus
    canoodle = kerfuffle
End Function

This can be easily rewritten in Python as:

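def decrypt_stage(stage2, pos, key, length):
    """Python port of canoodle: XOR-decrypt one byte out of every 4 hex digits."""
    decrypted = []
    len_key = len(key)
    key_count = 0

    # Walk the hex text 4 digits (2 bytes) at a time, like "Step 4" in VBA
    for i in range(0, len(stage2), 4):
        # pos selects which of the 2 bytes is used (0 = first, 2 = second)
        byte_at = int(stage2[i+pos:i+2+pos], base=16)
        # Repeating-key XOR: the key index wraps around with mod
        b = byte_at ^ key[key_count % len_key]

        decrypted.append(b)
        key_count += 1

        # Stop after "length" bytes, like the UBound check in canoodle
        if key_count == length:
            break

    return bytes(decrypted)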

The function receives a hex string, an offset and a length. The input is read four hex digits (two bytes) at a time, but only one of those two bytes is decrypted. The offset selects which one: 0 picks the first byte and 2 picks the second. For example:

b = "C4BA"
b[0 + pos:2 + pos]  # pos = 0 -> "C4", pos = 2 -> "BA"; the selected byte is then XORed with the key

The encryption itself is a simple repeating-key XOR, not a block cipher. The key is reused cyclically, so the key index is taken modulo the key length to stay in range.
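Because only one byte of each pair is read, a single hex string can hold two independent, separately keyed byte streams: one read with offset 0 and one with offset 2. Here is a toy round-trip sketch of that idea. The helpers xor_repeat, interleave and read_face are hypothetical names of mine. key_b and both payloads are made-up values; key_a is the xertz array from the macro.

```python
from itertools import cycle


def xor_repeat(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: byte n is XORed with key[n % len(key)]."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def interleave(face_a: bytes, face_b: bytes) -> str:
    """Hypothetical builder: byte n of face_a goes to hex offset 4n, byte n of face_b to 4n + 2."""
    if len(face_a) != len(face_b):
        raise ValueError("faces must have the same length")
    return "".join(f"{a:02X}{b:02X}" for a, b in zip(face_a, face_b))


def read_face(hex_text: str, pos: int, key: bytes) -> bytes:
    """Same access pattern as canoodle: step 4 hex digits, keep the 2 selected by pos."""
    picked = bytes(int(hex_text[i + pos:i + pos + 2], 16) for i in range(0, len(hex_text), 4))
    return xor_repeat(picked, key)


key_a = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE])
key_b = b"other-key"  # made-up second key

face_a, face_b = b"decoy audio", b"hidden flag"  # made-up payloads, same length
blob = interleave(xor_repeat(face_a, key_a), xor_repeat(face_b, key_b))

print(read_face(blob, 0, key_a))  # b'decoy audio'
print(read_face(blob, 2, key_b))  # b'hidden flag'
```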

Decrypting the audio

Just like before, I dumped the data using SSViewer and trimmed the file so that it starts at the value 58, the start of F.T.Text (look at the form again).

Now, calling the function:

import sys

# xertz from the VBA code: 14 bytes, 0x11 .. 0xEE
key = b"\x11\x22\x33\x44\x55\x66\x77\x88\x99\xAA\xBB\xCC\xDD\xEE"

input_file = sys.argv[1]  # hex text of F.T.Text dumped with SSViewer

stage2_encrypted = open(input_file, "r").read().strip()
size = 168667
decrypted_stage2 = decrypt_stage(stage2_encrypted, 0, key, size)

with open("stage2.dec", "wb") as sfd:
    print(decrypted_stage2[:4])  # quick look at the file magic
    sfd.write(decrypted_stage2)

I got this sound:

The audio is just random noise. Its title is “This is not what you should be looking at…” and its album title is “P. Code”.

Let’s talk about the real thing here

OK, forget almost everything above. The real thing is… that the code above is fake.

VBA Stomping

After researching P-Code a lot, I discovered that VBA macros are actually compiled, and the document stores the compiled P-Code alongside the source code. The compiled code makes the macro run faster when the person who opens the document has the same VBA version.
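The version lives in the _VBA_PROJECT stream (stream 13 in the oledump listing above). According to the MS-OVBA specification, that stream starts with a 2-byte Reserved1 field equal to 0x61CC, followed by a 2-byte Version field, both little-endian. Here is a minimal sketch that reads it. The function names and the sample version value 0xB2 are made up. p_code_usable is a deliberately simplified model of the "same version" condition, not Office's actual logic.

```python
import struct

VBA_PROJECT_MAGIC = 0x61CC  # Reserved1 value from MS-OVBA


def vba_project_version(stream: bytes) -> int:
    """Return the Version field at the start of a _VBA_PROJECT stream."""
    reserved1, version = struct.unpack_from("<HH", stream, 0)
    if reserved1 != VBA_PROJECT_MAGIC:
        raise ValueError("not a _VBA_PROJECT stream")
    return version


def p_code_usable(stream: bytes, host_version: int) -> bool:
    """Simplified model: cached P-Code is reused only on a matching VBA version."""
    return vba_project_version(stream) == host_version


# Synthetic header: CC 61, Version 0x00B2 (made up), then Reserved2/Reserved3
header = bytes.fromhex("CC61B200000000")
print(hex(vba_project_version(header)))                          # 0xb2
print(p_code_usable(header, 0xB2), p_code_usable(header, 0xB5))  # True False
```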

That “feature” can be abused to inject malicious code into the document while shipping a fake source to trick analysts and antivirus solutions. If the victim's VBA version matches the attacker's, the hidden P-Code is executed and the real code never appears as source. This technique is called VBA Stomping.
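A stomped document leaves a mismatch: the P-Code references names that never appear in the visible source. Here is a hypothetical sketch of surfacing that. The helpers identifiers and stomping_suspects and the crude regex tokenizer are my own assumptions, not how olevba or pcodedmp detect stomping. The P-Code names come from the pcode2code excerpt in this post, and the decoy source is trimmed from the olevba output.

```python
import re


def identifiers(vba_source: str) -> set:
    """Crude tokenizer; VBA identifiers are case-insensitive, so lowercase them."""
    return {tok.lower() for tok in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", vba_source)}


def stomping_suspects(pcode_names, visible_source: str) -> set:
    """Names referenced by the P-Code that never appear in the visible source."""
    return {n.lower() for n in pcode_names} - identifiers(visible_source)


decoy_source = '''
Function folderol()
    onzo = Split(F.L, ".")
    wabbit = canoodle(F.T.Text, 0, 168667, xertz)
    mf = Environ(rigmarole(onzo(0))) & rigmarole(onzo(1))
End Function
'''
pcode_names = ["groke", "firkin", "buff", "canoodle", "wabbit", "mf", "panuding"]
print(sorted(stomping_suspects(pcode_names, decoy_source)))
# ['buff', 'firkin', 'groke', 'panuding']
```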

We can dump the real P-Code with pcodedmp, the amazing tool by Dr. Vesselin Bontchev, and read the original code from it. The P-Code disassembly looks like assembly, so to get something very close to the original macro source I used pcode2code, another cool tool based on pcodedmp.

The code in P-Code

Let’s use pcode2code to get the real code:

$ pcode2code report.xls | tee real_code.vbs

I will not dump the whole code again, because only certain things changed. The decryption function and the decoded table are still the same, so we can reuse our previous work to get the flag.

' Already decoded the strings looking at our decode table
Set groke = CreateObject("WScript.Network")
firkin = groke.UserDomain
If firkin <> "FLARE-ON" Then
    MsgBox "Sorry, this machine is not supported.", vbCritical, "Error"
    End
End If

n = Len(firkin)
For i = 1 To n
    buff(n - i) = Asc(Mid$(firkin, i, 1))
Next

wabbit = canoodle(F.T.Text, 2, 285729, buff)
mf = Environ("AppData") & "\Microsoft\v.png"
Open mf For Binary Lock Read Write As #fn
' a generic exception occured at line 68: can only concatenate str (not "NoneType") to str
'	# Ld fn
'	# Sharp
'	# LitDefault
'	# Ld wabbit
'	# PutRec
' (pcode2code could not decompile this statement; the P-Code above corresponds to: Put #fn, , wabbit)
Close #fn

Set panuding = Sheet1.Shapes.AddPicture(mf, False, True, 12, 22, 600, 310)

What has changed here?

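  • It gets the user domain (UserDomain from WScript.Network, not the hostname) and checks whether it is equal to FLARE-ON
  • The domain name is reversed and stored in buff as ASCII values
  • The decrypted byte is the second one of each pair (offset 2 instead of 0)
  • The output file is larger (285729 bytes instead of 168667) and is saved as %AppData%\Microsoft\v.png
  • The key is the reversed FLARE-ON string
  • The image is inserted into Sheet1 with Shapes.AddPicture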
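As a quick sanity check of the key derivation, this sketch ports the macro's reversal loop to Python. The name domain_key is hypothetical. The result matches the key bytes used in the script that follows.

```python
def domain_key(domain: str) -> bytes:
    """Port of the macro loop: buff(n - i) = Asc(Mid$(firkin, i, 1))."""
    n = len(domain)
    buff = [0] * n
    for i in range(1, n + 1):             # VBA: For i = 1 To n (1-based)
        buff[n - i] = ord(domain[i - 1])  # Mid$ is 1-based, Python indexing is 0-based
    return bytes(buff)


key = domain_key("FLARE-ON")
print(key)        # b'NO-ERALF'
print(key.hex())  # 4e4f2d4552414c46
```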

Very cool, isn’t it?

Finally, the real flag

Now it’s just a matter of adapting the script to decrypt it for us:

import sys

if __name__ == '__main__':
    # key = "FLARE-ON" reversed, as built by the macro from the user domain
    key = b"\x4e\x4f\x2d\x45\x52\x41\x4c\x46"  # "NO-ERALF"

    input_file = sys.argv[1]  # hex text of F.T.Text

    stage2_encrypted = open(input_file, "r").read().strip()
    size = 285730
    decrypted_stage2 = decrypt_stage(stage2_encrypted, 2, key, size)

    with open("stage2.dec", "wb") as sfd:
        print(decrypted_stage2[:4])  # quick look at the file magic
        sfd.write(decrypted_stage2)

And then, here is the decrypted flag image:

That’s a very good example of this technique. I lost a lot of time looking in the wrong place because I didn’t know about it. It’s amazing how it works, and it is a very clever evasion technique. We already see many antivirus solutions struggle to analyze malicious scripts, so imagine how they will deal with malicious code injected into the compiled code.

I recommend the following resources:

This post is licensed under CC BY 4.0 by the author.