If you work in the financial services industry or have an analytical role within corporate finance you'll eventually have to discuss the benefits of VBA and whether to use it or another programming language to solve your tasks. As we approach the end of 2015 it's still clear that VBA has quite the foothold within the world of finance. And this isn't just for folks who land at a bank and suddenly inherit some portfolio analysis spreadsheet from the 90s, as just this year I hired a first-year co-op student from a finance program who entered the summer work term hoping to learn VBA from his manager (The manager had used VBA for his previous employers).
My belief is that Python is a more robust tool to use than VBA and I'll demonstrate the basis of my beliefs over a few articles. While all of the code and data won't always be shown on the blog, you will be able to see it on the Budget Processor GitHub Page.
I'll also quickly note that I'm using LibreOffice/OpenOffice's VBA equivalent called BASIC, which going forward I'll refer to as VBA, because they should be stylistically similar.
The first comparison I want to show is when loading budgets that have been submitted by department managers, and structuring that data into a better format for analysis and warehousing.
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | import csv
import os
output_data = [
['cost_center', 'account', 'month', 'value'],
]
cc_prefix = "CC"
months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
files = []
for newfile in os.listdir(os.getcwd()):
extns = newfile.split(".")
if len(extns) == 2 and extns[-1] == "csv" and extns[0].startswith("CC"):
files.append(newfile)
print (newfile)
for afile in files:
next_cc_name = afile.split('_')[0]
cost_center = next_cc_name[next_cc_name.find(cc_prefix)+len(cc_prefix):]
with open(afile, newline='') as csvfile:
budgetreader = csv.reader(csvfile, delimiter='.', quotechar='"')
for row in budgetreader:
rowdata = (','.join(row))
row_data_list = rowdata.split(',')
for col in range(1, 13):
try:
int(row_data_list[col])
except:
continue
new_csv_row = [ cost_center,
row_data_list[0],
months[col-1],
row_data_list[col]]
output_data.append(new_csv_row)
with open('output.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(output_data)
|
It's a pretty simple script that sets up the output data structure and data format configurations (which could eventually be placed in a config file). An added plus is that there are no library dependencies, this is native Python 3.
VBA
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | Sub BudgetProcess
Dim Doc as Object
DIm Sheet as Object
Dim allSheets() as Object
Dim Cell as Object
Doc = ThisComponent
Sheets = Doc.Sheets()
OutputSheet = Doc.Sheets.getByName("Output")
NumSheets = Sheets.count()
Dim SheetName(NumSheets) as String
Dim SheetNameCounter as Integer
SheetNameCounter = 0
Dim ThisSheetName
Dim SheetCounter
For SheetCounter = 0 to NumSheets-1
ThisSheetName = Doc.Sheets(SheetCounter).Name
Cell = OutputSheet.getCellByPosition(7,SheetCounter)
Cell.String = ThisSheetName
If Left(ThisSheetName,6) = "Budget" Then
Cell = OutputSheet.getCellByPosition(8,SheetCounter)
Cell.String = "Include"
SheetName(SheetNameCounter) = ThisSheetName
SheetNameCounter = SheetNameCounter + 1
End If
Next SheetCounter
Dim Months(12) As String
Months(0) = "Jan"
Months(1) = "Feb"
Months(2) = "Mar"
Months(3) = "Apr"
Months(4) = "May"
Months(5) = "Jun"
Months(6) = "Jul"
Months(7) = "Aug"
Months(8) = "Sep"
Months(9) = "Oct"
Months(10) = "Nov"
Months(11) = "Dec"
'['cost_center','account','month','value']
Cell = OutputSheet.getCellByPosition(9,0)
Cell.Value = UBound(SheetName)
SheetCounter = 0
Dim ReadRow, ReadCol as Integer
Dim WriteRow, WriteCol as Integer
Dim ReadCell, WriteCell as Object
WriteRow = 0
WriteCol = 0
WriteCell = OutputSheet.getCellByPosition(WriteCol,WriteRow)
Dim BudgetSheet as Object
For SheetCounter = 0 to UBound(SheetName)-1
If SheetName(SheetCounter) <> "" Then
BudgetSheet = Doc.Sheets.getByName(SheetName(SheetCounter))
For ReadRow = 1 to 160
For ReadCol = 1 to 12
WriteRow = WriteRow + 1
ReadCell = BudgetSheet.getCellByPosition(ReadCol,ReadRow)
WriteCell = OutputSheet.getCellByPosition(0,WriteRow)
WriteCell.string = SheetName(SheetCounter)
WriteCell = OutputSheet.getCellByPosition(1,WriteRow)
WriteCell.value = BudgetSheet.getCellByPosition(0,ReadRow).value
WriteCell = OutputSheet.getCellByPosition(2,WriteRow)
WriteCell.string = Months(ReadCol-1)
WriteCell = OutputSheet.getCellByPosition(3,WriteRow)
WriteCell.value = ReadCell.value
Next ReadCol
Next ReadRow
End If
Next SheetCounter
End Sub
|
The code block for VBA is almost exactly twice as many number of lines, especially true since I'm not using multiline statements in VBA.
Why I prefer Python to VBA:
- maintenance
- Let's say something changes to your processing requirements. Since the Python code and the data itself aren't coupled to the same file, you can more easily track changes to your code over time, and rollback when necessary. In VBA if your code changes you have to carefully align all of the correct templates and then call the updated code, so your versioning would be limited to your spreadsheet files themselves.
- database integration
- Should you want to place your output into a database, and you should, you only need another few lines in Python. In VBA, we would have to add the code to save the output to CSV, complete with UTF-8 checks which should be easy in LibreOffice, and finally, either manually or through a chrontab, run a database script (or some other ETL routine) to load the data. Depending how this is done it could be quite interfering to your analytical process, should you want to run many uploads on an ad-hoc schedule.
- user repetition
- The Python script allows the receiver of the data, or (even better!) the department manager, save the file to a CSV in a specified folder. If news files are added or deleted then you just need to save a new file (more on integrity another time). Within the VBA example you'll have to move spreadsheets from file to file, which is notably more onerous on your finance user. (Note: There is almost certainly a way to get VBA to load files from the file system, but I do not know how to do it)
- platform independence
- We can use the Python files on Linux, Mac, or Windows, and across any version of Windows you'd like. This isn't true for VBA, which will show fragility even across versions of Excel, let alone different versions of Windows. This may not be the case for a simple example like this.
- style
- This is a less serious point in some ways, but the Pythonic styles are hard to overlook. Lines 14-18 show a quick and efficient way to decide what should be processed from what shouldn't, in VBA we need lines 19-30. In Python from line 25-37 we see how we transpose the horizontal to vertical data, including a quick try/except block to force a numeric constrain on the data. We need lines 60-79 in VBA to pull off the same transposition and have to be more mindful of our data types (.string / .value) which should there be any problems will (thankfully) throw an error in execution.
Depending on how you setup your processes you can use either system successfully, and I don't want to suggest otherwise. In the next few weeks I'll be work to extend the aforementioned ideas to show database integration, variance analysis, computation and visualization across both languages.