I’m into budgeting and tracking expenses. I think a lot about personal finance, have written my thoughts earlier too. But the thing is that all tracking operates on data. And getting data out of the silo in a structured format is pretty hard.
Account aggregator is supposed to be solving for this. But each vendor in turn again locks data inside their silo and make their app more compelling. Until and unless we build plug and play pipelines we can never really do well in terms of developing personal finance management apps which make sense I think.
I’ve recently started using Actual Budget for budgeting for me and my fiancee. Its a very nifty software which can be self hosted. Now the challenge is to get data from the bank to Actual.
Figuring out how to get data into Actual
Here is the idea. HDFC thankfully gives a CSV export option of their bank statement. If I can figure out how to get this data into actual using a library like ActualPy, then everything is sorted.
Note that I’m using HDFC account, based on your bank these details can change (or not!)
Parsing the CSV file
So step one is CSV parsing. Some catches during the process.
- Some lines in the CSV file have more than the required 7 columns. usually because the narration has a comma in it.
- Some times, the ref number / UTR field will have zero. A lot of software relies on a unqiue identifier. In this case, its better to combine narration, date and hope it stays unique. There is technically a possibility it won’t though.
There is one catch, Actual works heavily on payee information.
Parsing Payee Information from the narration
What does this mean? Here is a sample narration from the bank statement
UPI-ZOMATO LTD-ZOMATO-ORDER@PAYTM-PYTM0123456-401537805904-ZOMATO PAYMENT
This seems gibberish, but this is loaded with information. Delimited by `-`, the first segment is the mode of the payment. the second is the payee name, third is the UPI ID, fourth is the IFSC, fifth is something I did not really get. I’m guessing the last is the note sent to the UPI network.
There seems to be a catch. Sometimes to fit in information, the payee name is truncated. Not really sure in what exact circumstances the bank takes a call to truncate the payee name. Need to look at more data and figure that out. Similarly, NEFT & RTGS also follow a similar structure. So, parsing the payee information is actually fairly convenient.
With this, more than 80% of the transactions would be covered. But then there are other important ones
ACH narrations, ACH stands for automated clearing house. If you happen to have a recurring eNACH, or a mutual fund has to post returns back to your account you’ll incur a ACH line item in your bank statement. ACH narrations are interesting because I’ve seen multiple variations in my bank statement itself. Usually it is ACH
followed by a C if its a credit or D if its a debit. then after a hyphen, there is text explaining the transaction followed by an ID of sorts (I guess?). Need to look into this more
Here are some examples
ACH C- SYMPHONY LIMITED FV-SYM0INT02024W ACH C- ITC LIMITED-2643579 ACH D- INDIAN CLEARING CORP-H5GGPYWT2R6R ACH C- IRFC LIMITED-TAX FRE-2418947 .ACH DEBIT RETURN CHARGES 230124 230124-MIR2403472992640 ACH C- LTIMIN INT 2024 25-354851 |
Currently I’m not setting a payee for these transactions in my personal setup.
Finally, The Actual Budget Importer
This is a streamlit app currently hosted in my scripts repo. It takes in csv bank statement of HDFC & posts data to your actual server.
A simple, no BS application which takes in a csv file, the account you are targeting in the server and just posts transactions.
It takes care of de duplication, payee parsing and other quirks regarding commas in narration etc..
With this, data in actual is finally making sense. Now budgeting on top is going to be simple finally!