Building AI Agents: Lessons Shared
Practical guidance for non-technical leaders in FP&A, Procurement and Sustainability
Overview
If you're still getting your head around what an AI agent is and want some example use cases, start here. If you're weighing up whether to buy or build AI tools and software, check this post out.
Here's what we've learnt.
What We've Learnt Building AI Agents
Sumday is a true partner. We share our learnings so you don't have to make the same mistakes and can capitalise on the wins faster. Here are some tips from a team that has seen around the corner on the journey you're on.
Prioritisation and Education
Step 1: Generate AI use cases through teams, not the 'AI Innovation Committee'
Every organisation that runs a "what are our AI use cases?" session gets a shallow list. This is an exposure problem, not a motivation problem. People can't imagine applications they haven't seen, and this technology is often very surprising. It's the age-old 'faster horses' problem.
- Give people time to properly use the tools first, minimum one day, ideally a week, with a mandate to experiment and report back
- IT is not best placed to generate or prioritise this list. The leaders of the people who live inside the workflows are, so you need to upskill them enough to do that well
- The high-value, organisation-specific use cases only emerge after hands-on time. Otherwise it's just "draft this email for me" territory
Step 2: Prioritise for speed of AI experimentation, not scaling
Once you have a list:
- Categorise by high, medium, and low ROI. The high bucket will be full, especially in the first 12 months
- Don't wait for a complete, perfect list before moving. The list will keep growing
- IT's primary job is setting the conditions for fast, safe experimentation: tooling, guardrails, and a small team that can support people piloting and building. Not a bottleneck that evaluates every idea before anything moves.
- Spend to learn, not to lock in. The fastest way to slow down AI adoption is to over-commit early, signing three-year enterprise deals, large consulting engagements, or bespoke platforms before anyone in the building actually knows what works in your context. Treat early spend as tuition, not infrastructure. Start with the tools you already pay for, avoid annual or multi-year commitments before you've piloted, and cap each experiment so failure is cheap. If a vendor won't let you start small, it is worth evaluating whether they are actually the right partner for this stage of your journey.
The cost of moving slowly here is invisible but real. Every month of a manual workflow is staff time you're paying for anyway. Prioritise speed of learning, and let scale follow once you know what works.
Step 3: Get leaders building AI agents, not just using them
The capability that matters most is not using AI to summarise documents or write emails. It is the ability to:
- Think in agents and inject IP into strong instructions
- Test and evaluate them
- Improve performance
If your finance, procurement, and sustainability leaders aren't doing this themselves, you are leaving the most accessible gains on the table. Leaders who can build and test will make better investment decisions and move their teams faster. Leaders who can't will say "we'll get to it soon, we're just not ready yet." The truth is you will never feel ready until you go hard at this for 48 hours and realise what's possible.
Low-hanging fruit check: Are your leaders regularly using Copilot in Excel? Are they on the latest models? Have they built and tested an agent? Have they asked Copilot or Claude how to automate a process in under an hour? If not, you should make it happen. This is why we run AI literacy and leadership training too.
Step 4: Promote knowledge sharing and recognise employees getting started with AI
Often the breakthrough moments come from seeing how your peers use AI, not from a training course or a vendor pitch. Adoption spreads through visibility and recognition far faster than through mandates or policies.
- Make wins visible. A five-minute slot in the team meeting, a Teams or Slack post, a short Loom recording. Format matters less than frequency. The point is that people see what's possible from someone sitting two desks away, not from a polished case study
- Recognise the people doing the work, publicly and from leadership. The signal that this is valued behaviour does more for adoption than any policy document. A genuine "look at what Sarah built this week" from a CFO is worth more than a town hall slide
- Share the failures too. The agent that hallucinated, the workflow that broke, the calculation that went sideways. These are often more instructive than the wins, and surfacing them keeps people willing to share when something hasn't quite worked. It also stops the "AHA, SEE, IT DOESN'T WORK" seekers. It's expected that things won't always work, but you're learning. Move forward.
- Identify your early adopters and back them. They become your internal champions and will move the organisation faster than any consultant, training program, or change management plan. Give them airtime, give them tools, and get out of their way.
Leaders need to be visible participants, not just sponsors. If the only people sharing are junior staff, the implicit message is that this is junior work. The leaders who share what they've built personally get more out of their teams than the ones who delegate it down.
What not to do: the AI Innovation Committee
There are some organisations where this works well. But for many, it's not the right model.
Committees create queues. Queues kill momentum.
The committee convenes monthly to evaluate use cases it doesn't have the context to evaluate. Business cases are thin because experimentation hasn't happened. The friction these tools could actually solve is invisible from the committee room.
What works instead: establish guardrails and guidance first (security, data handling, acceptable use), then let people pilot within them where they have the budget. If they don't have budget, consider how you address that first.
Treat AI tool adoption the same way you'd treat any other software procurement. What's the ROI? Can we try before we scale?
Good governance is about managing risk, not approving experiments one at a time in a monthly meeting to feel more comfortable.
Building AI Agents That Work
If you're getting into building your own internal agent systems, here are some tips.
Avoid the mega agent
The instinct is to build one AI agent that handles everything. Resist it.
A single business case agent stretched across a $20,000 event request and a $200 million infrastructure investment will handle both poorly. Different decisions require different depth, different data, and different rigour. An agent designed to cover everything ends up optimised for nothing.
Think about it the way you think about hiring. You wouldn't put one person across detailed tax work, FP&A, and executive business case writing. Specialists exist because depth matters.
That said, don't over-architect upfront. Start simple. See what breaks. A more sophisticated setup where a coordinating agent routes requests to the right specialist should emerge from real experience, not be designed in anticipation of problems you haven't hit yet.
What makes good AI agent instructions
Most agent failures trace back to vague or overloaded instructions. A few principles that hold up in practice:
Be specific about the role and the context it needs to do the job.
A role description isn't just a label. It's the foundation the agent reasons from. "You are a finance analyst at [Org] who helps staff stress-test business cases against our investment methodology" works better than "you help with business cases" because it tells the agent who it is, who it's serving, and what success looks like before the conversation even starts.
But the role alone isn't enough. The most effective agent instructions also give the agent the organisational context it can't infer: your approval thresholds, your risk appetite, the investment criteria that actually matter to your decision-makers. An agent that knows your organisation requires a 15% IRR hurdle and board sign-off above $5M will give you materially different and more useful output than one reasoning from generic finance principles.
Think of it less like writing a job title and more like writing an onboarding brief. What would a sharp new analyst need to know in their first week to stop asking obvious questions and start doing useful work? That's roughly what belongs in your role instruction.
Define what good looks like. Don't leave quality implicit. Include your criteria, thresholds, and standards explicitly in the instruction. If a business case needs a 7-year NPV, a risk register, and alignment to a specific strategic pillar, say so.
Tell it what to do when information is missing. Without this, agents guess. That's often worse than stopping and asking. A simple instruction, "if key information is absent, ask one clarifying question before proceeding", changes the behaviour significantly.
Set a format. Tell it how to structure outputs. Consistency matters when people are comparing cases or handing work between systems.
Keep instructions single-purpose. Instructions that try to cover too many scenarios degrade performance across all of them. Each instruction set should have a clear, bounded job. When scope creeps, split it.
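To make these principles concrete, here is a sketch of a single-purpose instruction set, written as a Python string so it can be checked programmatically. The organisation name is invented, and the thresholds are the illustrative ones used earlier in this section, not a recommendation:

```python
# Illustrative only: the role, thresholds, and criteria below are
# invented examples, not a template from any real organisation.
BUSINESS_CASE_REVIEWER = """
Role: You are a finance analyst at ExampleCo who stress-tests draft
business cases against our investment methodology.

Context you cannot infer:
- Investments above $5M require board sign-off.
- New projects must clear a 15% IRR hurdle.

What good looks like:
- A 7-year NPV, a risk register, and alignment to one strategic pillar.

If key information is missing, ask one clarifying question before
proceeding. Do not guess inputs.

Output format: a one-page summary with sections for Recommendation,
Key Assumptions, Risks, and Open Questions.
""".strip()
```

Note how each principle maps to a part of the instruction: role and organisational context, explicit quality criteria, missing-information behaviour, and a fixed output format, all scoped to one bounded job.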
One honest warning from our own early experience: non-technical staff often try to shortcut this by asking Claude or ChatGPT to write the agent instructions for them. It's a reasonable instinct, but it tends to produce bloated prompts that circle back to the mega-agent problem. AI can help you refine instructions, but the thinking about role, scope, and quality criteria has to come from the people who understand the work.
Give your AI agent good examples, retrieved cleanly
AI agents produce better outputs when they have access to your best work. What IP can you inject to make these agents more valuable? Include the templates or outputs that would get a 10/10 from the executive approving them, and make that the new standard.
Do not paste large documents directly into agent instructions. A 30-page document in your prompt will degrade performance and produce inconsistent results.
Instead, build a curated document library (your best business cases, templates, analytical frameworks) and set the agent up to retrieve from it when needed. Think of it as the agent's reference shelf. It doesn't carry everything at all times, but it knows where to look.
Keep the library current. Outdated examples pull outputs in the wrong direction, and once you're more sophisticated, the agent can help maintain it.
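The "reference shelf" idea can be illustrated with a deliberately naive keyword-overlap retriever. Real setups, such as Copilot knowledge sources or a vector store, handle this for you; the two library entries here are invented:

```python
def retrieve(query: str, library: dict[str, str], top_k: int = 1) -> list[str]:
    """Return the library documents whose text best overlaps the query.

    Deliberately naive keyword matching, just to show the pattern:
    the agent carries a pointer to the shelf, not the whole shelf.
    """
    query_words = set(query.lower().split())
    overlap = lambda name: len(query_words & set(library[name].lower().split()))
    return sorted(library, key=overlap, reverse=True)[:top_k]

# Invented two-document "library" of past best work.
library = {
    "capex business case": "capital expenditure business case template with npv irr risk register",
    "event request form": "small event request form budget approval under twenty thousand",
}
print(retrieve("draft a business case with npv", library))  # → ['capex business case']
```

The point of the sketch is the shape, not the matching: the examples live outside the prompt, and only the relevant one is pulled in when needed.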
Make calculations deterministic
LLMs are probabilistic. For most agent tasks, such as summarising information, comparing options, or drafting outputs, a degree of variability is fine. But some calculations have a defined correct answer, and being wrong has a material or legal consequence. That's when you can't rely on an instructed LLM alone.
Tax obligations, superannuation calculations, depreciation schedules under a specific accounting standard: these aren't judgment calls. An agent that gets them right 94% of the time is not acceptable. For these cases, give the agent a tool it can call that handles the calculation deterministically, rather than asking it to reason through the maths itself.
The division of labour that often works:
- The non-technical folk still own the logic. Document the calculation precisely: the formula, the inputs, the rounding rules, the edge cases. If it's legislated, reference the legislation. This is the hard part, and it's your domain, not IT's.
- IT or technical support can help you make it deterministic. In a Microsoft environment, that may be a Power Automate flow for simpler calculations or an Azure Function for something more complex. On other platforms the tooling differs, but the principle is the same: a coded function that always returns the same result for the same inputs.
- The agent calls it as a tool. Once built, it's connected to the agent as an action it can invoke.
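As a sketch of what that coded function looks like, here is straight-line depreciation in Python. The figures and rounding rule are illustrative; your own documented formula, inputs, rounding rules, and edge cases are what actually belong here:

```python
from decimal import Decimal, ROUND_HALF_UP

def straight_line_depreciation(cost: Decimal, residual: Decimal,
                               useful_life_years: int) -> Decimal:
    """Annual straight-line depreciation, rounded to cents.

    Deterministic: the same inputs always return the same result,
    which is the property you want when the agent calls this as a
    tool instead of reasoning through the maths itself.
    """
    if useful_life_years <= 0:
        raise ValueError("useful life must be positive")
    annual = (cost - residual) / useful_life_years
    return annual.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# The agent would invoke this as an action; here we call it directly.
print(straight_line_depreciation(Decimal("120000"), Decimal("20000"), 8))
# → 12500.00
```

The same logic could live in a Power Automate flow or an Azure Function; the implementation matters less than the property that the answer never varies.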
A useful habit in early builds: keep a running list of any outputs that come back inconsistently across repeated tests. That list tells you what needs to be handed off to a more deterministic approach. This is not a major engineering project, but there is a little more work involved than a basic prompt.
Do not set and forget your AI agents
Think of a new agent the way you'd think of a new junior employee:
- Assign explicit ownership. One person responsible for reviewing outputs, not a vague team responsibility.
- They tend to go a bit rogue when unchecked. One of our engineers' agents, "Suzie," decided it would be more efficient to turn failing tests off rather than fix them. Quite creative. Highly annoying. Keep an eye on any automated workflows to ensure the job is actually being done.
- Put a recurring review in the calendar. In Copilot the run history is easy to check but easy to ignore. Consider having audit agents review the other agents.
- Look for inconsistency as much as outright failure. Inconsistent outputs are harder to catch than errors and more dangerous because users start to trust without scrutinising.
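A simple way to surface inconsistency is to run the same prompt several times and flag any variation. This is a sketch; `run_agent` is a placeholder for however you actually invoke your agent, and a real check would compare normalised fields (numbers, recommendations) rather than raw text:

```python
from collections import Counter

def output_varies(run_agent, prompt: str, runs: int = 5) -> bool:
    """Run the same prompt several times; report whether outputs differ.

    `run_agent` stands in for however you invoke your agent. Harmless
    wording changes are expected from an LLM, so in practice you would
    compare the fields that matter, not the full response text.
    """
    outputs = [run_agent(prompt) for _ in range(runs)]
    return len(Counter(outputs)) > 1

# A stand-in "agent" that drifts: it answers differently on later calls.
calls = {"n": 0}
def drifting_agent(prompt: str) -> str:
    calls["n"] += 1
    return "Approve" if calls["n"] < 4 else "Reject"

print(output_varies(drifting_agent, "Assess case A"))  # → True
```

Anything this flags is a candidate for the running list of outputs to hand off to a more deterministic approach.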
Try different AI models
Not all models perform equally on all tasks. Test your agent instructions across available models before settling. Different models are optimised for different things: speed, deep reasoning, cost. Think about what actually matters for the use case and choose a model that aligns to your goals. It's about trade-offs.
Sequence: pilot, then scale
- Get the agent working well in a contained environment before connecting it to broader systems
- Define "working well" before you start: what scenarios does it need to handle correctly, and what does a correct output look like? How are you evaluating it? Set this upfront so you know when you're ready to scale
- Scaling a broken agent multiplies the problem. Scaling something that works is fast and straightforward.
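Defining "working well" upfront can be as simple as a scenario list with pass criteria, checked on every iteration. A minimal sketch follows; the scenarios, criteria, and toy agent are invented, and yours would come from the workflow you are automating:

```python
# Invented example: each scenario pairs an input with strings the
# output must contain to count as "working well".
SCENARIOS = [
    {"name": "complete case",
     "input": "Full business case with financials attached",
     "must_include": ["NPV", "risk"]},
    {"name": "missing financials",
     "input": "Business case with no financial data",
     "must_include": ["clarifying question"]},
]

def evaluate(run_agent) -> dict[str, bool]:
    """Return pass/fail per scenario for a given agent callable."""
    results = {}
    for scenario in SCENARIOS:
        output = run_agent(scenario["input"])
        results[scenario["name"]] = all(term.lower() in output.lower()
                                        for term in scenario["must_include"])
    return results

# Stand-in agent that handles the first scenario but guesses on the second.
def toy_agent(text: str) -> str:
    if "no financial data" in text:
        return "Assumed a 10% discount rate and proceeded."  # guessed: fail
    return "NPV is positive; key risks are listed below."

print(evaluate(toy_agent))
# → {'complete case': True, 'missing financials': False}
```

When every scenario passes consistently, you have an evidence-based answer to "are we ready to scale?" rather than a feeling.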
Who Should Build AI Agents
The most common and costly mistake: briefing IT and expecting them to build something for you efficiently.
Domain knowledge is the primary input. IT or engineering capability is secondary.
In our team, for example, a CPA with FP&A experience injects as much knowledge as possible into the agent instructions, capturing what good looks like. Then we have an agent ops person who turns that knowledge into better structure for the agent, and builds tools another agent can use to help the CPA do this better next time. Then engineers help turn parts into code and wire things up where it's a little more complex. Everything is constantly improving and becoming easier for non-technical people to take action.
But when we started, we assumed it was all very complex and technical. So the engineer was being briefed by the finance professionals. It was boring, frustrating, and slow. A constant "is this what you mean?" loop. Now it's "why don't you just build it and I'll refine it as needed."
An IT team cannot learn enough about financial methodology, organisational context, and what a good business case looks like from a short brief or a meeting. The gap between what gets written down and what an experienced finance person carries is too large. Much of what they want is in their heads, not in a neat policy document.
Upskill people who are curious. A short focused investment, a day with the right teacher, is enough to get someone writing agent instructions, running tests, and iterating.
The trigger to bring IT or engineering in: the agent is producing consistently useful outputs and you want to connect it to more data sources, automate more complex workflows, or handle higher volumes safely.
IT's role is to accelerate something that's already working, not to build from scratch in a domain they don't live in.
How Long Should Building an AI Agent Take?
If you asked us to build, in Copilot, a business case agent system that helps people stress-test ideas, write up a case, and answer the questions a CFO would ask, it might take roughly:
- Day 1: Understand current state and workflow, build agents
- Day 2: Test and improve
- Half day: Socialise with users, gather feedback, iterate
Two to three days for a working v1. There may be complexities in your systems we're not aware of, but it's unlikely to be more than a week.
If your team takes two months, that timeline is worth interrogating. The opportunity cost compounds. Every month of a manual process that could be partially automated is staff time and decision quality left on the table. It is very easy to get into the habit of chopping wood without sharpening the axe.
On Resistance and Fear Around AI Adoption
Every organisation has people who push back on AI adoption. Some are sceptical on principle. Some prefer hiring headcount to changing how they work. Some resist because the actual work of implementation (mapping manual workflows, cleaning data, writing and testing agent instructions) is tedious. Some are worried about their jobs. Transformation work in this space requires far more than technical know-how. It requires patience, empathy, and the ability to explain a process while cutting through the noise. People who opt in to being architects of how this works will compound their value quickly. Make the opportunity visible and the direction clear. The rest follows.
Data Quality for AI Agents
Agents are only as good as what they can access. Before building agents that query your internal systems:
- Audit your data landscape: where sources sit, how consistently records are tagged, and whether the structure makes sense for a system (not just a human with background knowledge) to query
- The goal is agents that can retrieve, query, and generate reports without manual data extraction steps
- If your ERP has a strong API or MCP integration, that becomes a significant advantage. The closer you can get to agents working from live data without exports, the better
Treat data quality as infrastructure, not a cleanup task to schedule later.
The organisations that move well here are not the ones with the most technical resources. They are the ones that get their best domain experts learning fast, run real pilots quickly, and build a culture where experimenting is expected.
If you want to know how Sumday can help, check out our AI services.