Building a Financial Recording Chatbot with Go and Gemini API


Lately, I noticed that my wife has been using a finance app to record our daily spending. She has been doing it for about a year. Tracking daily spending matters in many families, but it is also tedious: you open the app, write down what you spent, pick a category, and save it. It's a long process, and it stayed that way until we discovered a chatbot called http://hemat.ai/ (only available in Indonesian). TL;DR: hemat.ai is a tool for tracking financial activity within a family, organisation, or corporation. It uses AI to parse the user's input, produce a structured response, and set the category for each transaction.

This sparked the idea of building my own chatbot, though not one as complete as hemat.ai (the other reason is that hemat.ai costs money each month for the subscription, jk). I started by comparing the big LLM providers: ChatGPT, Claude, DeepSeek, and Gemini. In the end I went with Gemini because it has a free tier and a pretty good Flash model. The free tier comes with limitations such as rate limits, but who is going to spam transactions until they get rate limited, right?

Having settled on Gemini, I remembered seeing a GenAI library for Gemini written in Go. It is an official library from Google; you can see it here. I was impressed, because Go is my main programming language. So I decided to write the chatbot in Go and use the Gemini API.

The diagram above shows the concept of the chatbot as I imagined it:

  1. The user sends the expense data over WhatsApp.
  2. The backend receives the user input and sends it to Gemini.
  3. Gemini will process the expense data and produce a structured output.
  4. The backend stores the structured output in the database and responds back to the user.
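The four steps above can be sketched as a small pipeline. This is only a rough illustration, not the actual implementation: the `Expense` fields, the `Parser` and `Store` interfaces, and the in-memory stand-ins are all hypothetical names chosen so the example runs without WhatsApp, Gemini, or a database.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Expense is a hypothetical shape for the structured output.
type Expense struct {
	Description string
	Category    string
	Amount      int64 // whole currency units, e.g. rupiah
}

// Parser stands in for the Gemini call (steps 2 and 3).
type Parser interface {
	Parse(text string) (Expense, error)
}

// Store stands in for the database (step 4).
type Store interface {
	Save(e Expense) error
}

// HandleMessage runs steps 2 to 4 for one incoming chat message
// and returns the text to send back to the user.
func HandleMessage(p Parser, s Store, text string) (string, error) {
	text = strings.TrimSpace(text)
	if text == "" {
		return "", errors.New("empty message")
	}
	e, err := p.Parse(text)
	if err != nil {
		return "", fmt.Errorf("parse expense: %w", err)
	}
	if err := s.Save(e); err != nil {
		return "", fmt.Errorf("save expense: %w", err)
	}
	return fmt.Sprintf("Recorded %s (%s): %d", e.Description, e.Category, e.Amount), nil
}

// fakeParser returns a fixed category and amount so the sketch runs offline.
type fakeParser struct{}

func (fakeParser) Parse(text string) (Expense, error) {
	return Expense{Description: text, Category: "food", Amount: 25000}, nil
}

// memStore keeps expenses in memory instead of a real database.
type memStore struct{ saved []Expense }

func (m *memStore) Save(e Expense) error {
	m.saved = append(m.saved, e)
	return nil
}

func main() {
	store := &memStore{}
	reply, err := HandleMessage(fakeParser{}, store, "lunch 25000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```

Keeping the model and the database behind interfaces means the message handling can be exercised without network access, and either side can be swapped out later.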

It doesn't include OCR for reading receipts, which hemat.ai does, but the core concepts match: producing a structured output and deriving the category from the user input. Then I hit a new problem: I have neither a WhatsApp Business account nor WhatsApp API access to connect WhatsApp to my backend service. That was a blocker until I found a library called whatsmeow. It supports sending messages to a specific number, receiving and reading messages through an event handler, and much more!

I started coding. First, I successfully connected my WhatsApp account (a second number) to the backend service, yay! Next, I needed to send the user input to Gemini, which meant writing a prompt so Gemini understands what it needs to do. Pro tip: you can use Gemini itself (https://gemini.google.com/) to generate the prompt, and it worked perfectly for me!

Here is a sample of my prompt for producing structured output:

With the prompt finished, my backend was integrated with both WhatsApp and Gemini. All that remained was to store the result in the database and reply to the user to confirm that the expense had been recorded.
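Before a reply from the model can be stored, it has to be turned into a typed record. Here is a minimal sketch of what a prompt and the matching parser could look like. The prompt wording, the JSON keys, the category list, and the function names `buildPrompt` and `parseExpense` are all assumptions for illustration, not the actual prompt used in this project, and the call to Gemini itself is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Expense is a hypothetical record shape; the JSON keys are assumptions.
type Expense struct {
	Description string `json:"description"`
	Category    string `json:"category"`
	Amount      int64  `json:"amount"` // whole currency units, e.g. rupiah
}

// categories is an illustrative list, not the project's real categories.
var categories = []string{"food", "transport", "bills", "shopping", "other"}

// buildPrompt wraps the user's chat message in instructions for the model.
func buildPrompt(userText string) string {
	return fmt.Sprintf(
		"Extract the expense from the message below.\n"+
			"Reply with JSON only, using exactly these keys:\n"+
			"{\"description\": string, \"category\": one of [%s], \"amount\": integer}\n"+
			"Message: %q",
		strings.Join(categories, ", "), userText)
}

// parseExpense decodes the model's reply and validates it before storage.
func parseExpense(raw string) (Expense, error) {
	// Models sometimes wrap JSON in a Markdown code fence; strip it if present.
	fence := strings.Repeat("`", 3)
	s := strings.TrimSpace(raw)
	s = strings.TrimPrefix(s, fence+"json")
	s = strings.TrimPrefix(s, fence)
	s = strings.TrimSuffix(s, fence)

	var e Expense
	if err := json.Unmarshal([]byte(strings.TrimSpace(s)), &e); err != nil {
		return Expense{}, fmt.Errorf("decode model output: %w", err)
	}
	if e.Amount <= 0 {
		return Expense{}, errors.New("amount must be positive")
	}
	if !knownCategory(e.Category) {
		e.Category = "other" // fall back instead of trusting an unknown category
	}
	return e, nil
}

// knownCategory reports whether c is in the illustrative category list.
func knownCategory(c string) bool {
	for _, k := range categories {
		if k == c {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(buildPrompt("lunch at warung 25000"))
	e, err := parseExpense(`{"description":"lunch at warung","category":"food","amount":25000}`)
	fmt.Println(e, err)
}
```

Validating the reply before it reaches the database matters because the model's output is just text: a malformed reply or an unexpected category should be rejected or normalised rather than stored as-is.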

And here is the result (in Indonesian)

[Screenshot: chatbot successfully recording an expense]

The chatbot successfully responds to chats from me and my wife, and it stores the expenses in the database! The next problem: how do I get the data back out? How can I retrieve my expenses from yesterday, or from a specific date range? And how can I ask the bot for that data and get it back in a human-readable format, for example a summary of yesterday's spending or of last month's?

My idea: since the chatbot is only used by me and my wife, I define a specific keyword. When the backend event handler receives a message with that keyword, it parses the date from the message, queries the database for that period, and then passes the data to Gemini once more to turn it into a readable summary.
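The keyword check and date parsing could look roughly like the sketch below. The keyword `!summary`, the supported phrases, and the names `parseSummaryRequest` and `DateRange` are hypothetical; the real bot may use a different keyword and date syntax. The database query and the second Gemini call are left out.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

const (
	summaryKeyword = "!summary"   // hypothetical trigger keyword
	dateLayout     = "2006-01-02" // Go reference layout for YYYY-MM-DD
)

// DateRange is half-open: From is included, To is excluded.
type DateRange struct {
	From, To time.Time
}

// day is a small helper that builds a midnight UTC timestamp.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

// parseSummaryRequest reports whether text is a summary request and,
// if so, which date range it asks for relative to now.
func parseSummaryRequest(text string, now time.Time) (DateRange, bool) {
	fields := strings.Fields(strings.ToLower(text))
	if len(fields) < 2 || fields[0] != summaryKeyword {
		return DateRange{}, false
	}
	y, m, d := now.Date()
	loc := now.Location()
	today := time.Date(y, m, d, 0, 0, 0, 0, loc)

	switch strings.Join(fields[1:], " ") {
	case "today":
		return DateRange{From: today, To: today.AddDate(0, 0, 1)}, true
	case "yesterday":
		return DateRange{From: today.AddDate(0, 0, -1), To: today}, true
	case "last month":
		first := time.Date(y, m, 1, 0, 0, 0, 0, loc)
		return DateRange{From: first.AddDate(0, -1, 0), To: first}, true
	}

	// Explicit range: "!summary 2024-03-01 2024-03-07" (both days included).
	if len(fields) == 3 {
		from, errFrom := time.ParseInLocation(dateLayout, fields[1], loc)
		to, errTo := time.ParseInLocation(dateLayout, fields[2], loc)
		if errFrom == nil && errTo == nil && !to.Before(from) {
			return DateRange{From: from, To: to.AddDate(0, 0, 1)}, true
		}
	}
	return DateRange{}, false
}

func main() {
	now := day(2024, time.March, 15)
	for _, msg := range []string{"!summary yesterday", "!summary last month", "lunch 25000"} {
		r, ok := parseSummaryRequest(msg, now)
		if !ok {
			fmt.Printf("%q: treat as a normal expense message\n", msg)
			continue
		}
		fmt.Printf("%q: from %s up to (not including) %s\n",
			msg, r.From.Format(dateLayout), r.To.Format(dateLayout))
	}
}
```

A half-open range maps directly onto a query condition such as `spent_at >= ? AND spent_at < ?` (with `spent_at` as an assumed column name), which avoids off-by-one problems at midnight. Any message that does not start with the keyword falls through to the normal expense-recording path.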

And the result?

[Screenshot: chatbot summary result]

That's it! Now I can use my own chatbot.