Python vs VBA/Basic, Part I

If you work in the financial services industry or have an analytical role within corporate finance you'll eventually have to discuss the benefits of VBA and whether to use it or another programming language to solve your tasks. As we approach the end of 2015 it's still clear that VBA has quite the foothold within the world of finance. And this isn't just for folks who land at a bank and suddenly inherit some portfolio analysis spreadsheet from the 90s, as just this year I hired a first-year co-op student from a finance program who entered the summer work term hoping to learn VBA from his manager (The manager had used VBA for his previous employers).

My belief is that Python is a more robust tool to use than VBA and I'll demonstrate the basis of my beliefs over a few articles. While all of the code and data won't always be shown on the blog, you will be able to see it on the Budget Processor GitHub Page.

I'll also quickly note that I'm using LibreOffice/OpenOffice's VBA equivalent called BASIC, which going forward I'll refer to as VBA, because they should be stylistically similar.

The first comparison I want to show is when loading budgets that have been submitted by department managers, and structuring that data into a better format for analysis and warehousing.

Python

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import csv
import os

output_data = [
        ['cost_center', 'account', 'month', 'value'],
    ]

cc_prefix = "CC"
months = [  'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
            'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

files = []

for newfile in os.listdir(os.getcwd()):
    extns = newfile.split(".")
    if len(extns) == 2 and extns[-1] == "csv" and extns[0].startswith("CC"):
        files.append(newfile)
        print (newfile)

for afile in files:
    next_cc_name = afile.split('_')[0]
    cost_center = next_cc_name[next_cc_name.find(cc_prefix)+len(cc_prefix):]
    with open(afile, newline='') as csvfile:
        budgetreader = csv.reader(csvfile, delimiter='.', quotechar='"')
        for row in budgetreader:
            rowdata = (','.join(row))
            row_data_list = rowdata.split(',')
            for col in range(1, 13):
                try:
                    int(row_data_list[col])
                except:
                    continue
                new_csv_row = [ cost_center,
                                row_data_list[0],
                                months[col-1],
                                row_data_list[col]]
                output_data.append(new_csv_row)

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(output_data)

It's a pretty simple script that sets up the output data structure and data format configurations (which could eventually be placed in a config file). An added plus is that there are no library dependencies, this is native Python 3.

VBA

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
Sub BudgetProcess

    Dim Doc as Object
    DIm Sheet as Object
    Dim allSheets() as Object
    Dim Cell as Object

    Doc = ThisComponent
    Sheets = Doc.Sheets()
    OutputSheet = Doc.Sheets.getByName("Output")

    NumSheets = Sheets.count()
    Dim SheetName(NumSheets) as String
    Dim SheetNameCounter as Integer
    SheetNameCounter = 0

    Dim ThisSheetName
    Dim SheetCounter
    For SheetCounter = 0 to NumSheets-1
        ThisSheetName = Doc.Sheets(SheetCounter).Name
        Cell = OutputSheet.getCellByPosition(7,SheetCounter)
        Cell.String = ThisSheetName
        If Left(ThisSheetName,6) = "Budget" Then
            Cell = OutputSheet.getCellByPosition(8,SheetCounter)
            Cell.String = "Include"
            SheetName(SheetNameCounter) = ThisSheetName
            SheetNameCounter = SheetNameCounter + 1
        End If
        Next SheetCounter

    Dim Months(12) As String
    Months(0) = "Jan"
    Months(1) = "Feb"
    Months(2) = "Mar"
    Months(3) = "Apr"
    Months(4) = "May"
    Months(5) = "Jun"
    Months(6) = "Jul"
    Months(7) = "Aug"
    Months(8) = "Sep"
    Months(9) = "Oct"
    Months(10) = "Nov"
    Months(11) = "Dec"

    '['cost_center','account','month','value']

    Cell = OutputSheet.getCellByPosition(9,0)
    Cell.Value = UBound(SheetName)
    SheetCounter = 0
    Dim ReadRow, ReadCol as Integer
    Dim WriteRow, WriteCol as Integer
    Dim ReadCell, WriteCell as Object
    WriteRow = 0
    WriteCol = 0

    WriteCell = OutputSheet.getCellByPosition(WriteCol,WriteRow)

    Dim BudgetSheet as Object
    For SheetCounter = 0 to UBound(SheetName)-1
        If SheetName(SheetCounter) <> "" Then
            BudgetSheet = Doc.Sheets.getByName(SheetName(SheetCounter))
            For ReadRow = 1 to 160
                For ReadCol = 1 to 12
                    WriteRow = WriteRow + 1
                    ReadCell = BudgetSheet.getCellByPosition(ReadCol,ReadRow)
                    WriteCell = OutputSheet.getCellByPosition(0,WriteRow)
                    WriteCell.string = SheetName(SheetCounter)
                    WriteCell = OutputSheet.getCellByPosition(1,WriteRow)
                    WriteCell.value = BudgetSheet.getCellByPosition(0,ReadRow).value
                    WriteCell = OutputSheet.getCellByPosition(2,WriteRow)
                    WriteCell.string = Months(ReadCol-1)
                    WriteCell = OutputSheet.getCellByPosition(3,WriteRow)
                    WriteCell.value = ReadCell.value
                    Next ReadCol
                Next ReadRow
        End If
        Next SheetCounter

End Sub

The code block for VBA is almost exactly twice as many number of lines, especially true since I'm not using multiline statements in VBA.

Why I prefer Python to VBA:

maintenance
Let's say something changes to your processing requirements. Since the Python code and the data itself aren't coupled to the same file, you can more easily track changes to your code over time, and rollback when necessary. In VBA if your code changes you have to carefully align all of the correct templates and then call the updated code, so your versioning would be limited to your spreadsheet files themselves.
database integration
Should you want to place your output into a database, and you should, you only need another few lines in Python. In VBA, we would have to add the code to save the output to CSV, complete with UTF-8 checks which should be easy in LibreOffice, and finally, either manually or through a chrontab, run a database script (or some other ETL routine) to load the data. Depending how this is done it could be quite interfering to your analytical process, should you want to run many uploads on an ad-hoc schedule.
user repetition
The Python script allows the receiver of the data, or (even better!) the department manager, save the file to a CSV in a specified folder. If news files are added or deleted then you just need to save a new file (more on integrity another time). Within the VBA example you'll have to move spreadsheets from file to file, which is notably more onerous on your finance user. (Note: There is almost certainly a way to get VBA to load files from the file system, but I do not know how to do it)
platform independence
We can use the Python files on Linux, Mac, or Windows, and across any version of Windows you'd like. This isn't true for VBA, which will show fragility even across versions of Excel, let alone different versions of Windows. This may not be the case for a simple example like this.
style
This is a less serious point in some ways, but the Pythonic styles are hard to overlook. Lines 14-18 show a quick and efficient way to decide what should be processed from what shouldn't, in VBA we need lines 19-30. In Python from line 25-37 we see how we transpose the horizontal to vertical data, including a quick try/except block to force a numeric constrain on the data. We need lines 60-79 in VBA to pull off the same transposition and have to be more mindful of our data types (.string / .value) which should there be any problems will (thankfully) throw an error in execution.

Depending on how you setup your processes you can use either system successfully, and I don't want to suggest otherwise. In the next few weeks I'll be work to extend the aforementioned ideas to show database integration, variance analysis, computation and visualization across both languages.

social