<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aakashh242.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aakashh242.github.io/" rel="alternate" type="text/html" /><updated>2026-05-03T08:38:57+00:00</updated><id>https://aakashh242.github.io/feed.xml</id><title type="html">Aakash’s Blog</title><subtitle>Personal blog</subtitle><author><name>Aakash</name></author><entry><title type="html">From on-prem to the cloud - Lessons Learned</title><link href="https://aakashh242.github.io/blog/2026/05/03/on-prem-to-cloud.html" rel="alternate" type="text/html" title="From on-prem to the cloud - Lessons Learned" /><published>2026-05-03T00:00:00+00:00</published><updated>2026-05-03T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/05/03/on-prem-to-cloud</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/05/03/on-prem-to-cloud.html"><![CDATA[<blockquote>
  <p>Moving from on-prem to cloud changed the tooling, speed, and scaling model, but not the core laws of systems design. Capacity, latency, state, failure handling, and topology still matter just as much. Microservices can help, but only when they solve a real scaling or operational problem instead of just moving complexity around.</p>
</blockquote>

<p>A recent conversation with <a href="https://peerlist.io/raymondoyondi">Raymond Oyondi</a> on Peerlist made me dig back through my memories a bit and reflect on how much software and infrastructure have changed over the years.</p>

<p>I joined the industry back when cloud still felt more like a concept than a default. A lot of systems were still being built and maintained in environments where the infrastructure was very much in your hands. You knew the machines, the network, the limits, the weak points. If something needed scaling, it was not a button click and a dashboard graph. It meant spinning up another server, configuring it, wiring it into the network and load balancer, deploying the application, syncing state, setting up monitoring, and making sure the whole thing did not fall apart under pressure. We had automation in places, of course, but nowhere near the kind of convenience people now take for granted.</p>

<p>A lot has changed since then. But the funny thing is, the biggest lesson for me is that the old principles never really went away.</p>

<p>Cloud changed the speed. It changed the abstractions. It changed how easily we can provision, scale and recover. But it did not change the laws underneath. Capacity still matters. Latency still matters. State still causes pain. Network boundaries still introduce failure. Bad assumptions still come back to collect interest.</p>

<p>That is probably the biggest thing I learned moving from on-prem and bare-metal thinking into cloud-native systems: the tooling changed more than the fundamentals did.</p>

<p>Earlier, a lot of software lived as one big application. One service, one deployment unit, one giant block with hard coupling inside it. It was not always pretty, but it was straightforward in one sense: most of the complexity lived inside the application itself. Since the infrastructure was under our control, nobody really panicked about it. You managed the box, tuned the app, scaled when needed, and kept things moving.</p>

<p>Then cloud became normal, and with it came speed, flexibility, and a different cost model. Suddenly, scaling was easier. You no longer had to treat infrastructure changes like a mini project every single time. But that convenience also exposed something important: a lot of monoliths were expensive in ways people had not fully noticed before.</p>

<p>You would see an application chewing through resources and the default response would be to scale the whole thing. More compute, more memory, more replicas, more money. But when you looked closer, often only certain parts of the application were actually responsible for that load. Maybe one workflow was CPU-heavy. Maybe one module was doing aggressive I/O. Maybe one part had bursty traffic while the rest of the system just sat there minding its own business.</p>

<p>That is where the architectural shift really starts to make sense.</p>

<p>Instead of treating the software like one sealed black box, you begin to see it as a collection of components with different scaling patterns and different operational needs. So you start isolating them. You break out the hot paths. You separate the parts that need to scale from the parts that do not. Pretty soon, what used to be a monolith starts becoming a patchwork of smaller services talking to each other.</p>

<p>And yes, that can absolutely be the right move.</p>

<p>But I also think this is where a lot of people get seduced by architecture diagrams and forget the bill that comes later.</p>

<p>Microservices are not free. They reduce one kind of pain and introduce another. You gain independent scaling, but you also gain more network hops, more deployment surfaces, more observability needs, more operational coordination, more failure modes, and more opportunities for state to become inconsistent. The complexity does not disappear. It just moves.</p>

<p>Earlier, if two parts of the system needed to coordinate, that problem often lived inside one process boundary. Now it may live across services, shared storage, queues, caches, retries, and eventual consistency rules. You may need supporting software to make the architecture work. You may need shared storage. You may need to handle read-write races and stale data. You may need to think much harder about idempotency, ordering, duplicate events, and what “correct” even means in a distributed system.</p>

<p>So for me, the lesson was never “microservices good, monolith bad.” That is too simplistic and honestly a bit lazy.</p>

<p>The real lesson was this: design around the behavior of the system, not around fashionable architecture labels.</p>

<p>If one deployable unit works, keep it one deployable unit. If certain modules clearly have different scaling needs, isolate them. If you are introducing distributed complexity, make sure the benefits are worth the operational cost. Use the minimum supporting software necessary. Every extra moving part is one more thing to monitor, patch, debug, secure and explain at 2 AM.</p>

<p>Another lesson that became much more obvious in the cloud-native world is that deployment topology matters a lot more than many developers initially think. Two services talking to each other on a diagram is easy. The actual topology, where they run, how they communicate, what latency sits between them, how failover behaves, where state lives, and what happens during partial failure, is where reality begins.</p>

<p>I have also come to appreciate observability discipline much more over time. In distributed systems, tracing tools are great and OpenTelemetry has helped a lot, but tooling alone does not save you. If your logs are inconsistent, your labels are exploding in cardinality, your trace attributes are a mess, and every team names the same thing differently, you are not observing a system. You are generating noise. Good observability needs discipline: standard log formats, sensible naming conventions, rules for metrics and labels, and a sampling strategy that matches the criticality of the application. Otherwise, you either drown in telemetry or pay too much to keep it.</p>

<p>So when I think about high availability at scale, my biggest lesson learned is actually a simple one.</p>

<p>Break systems into modules where it genuinely helps. Keep supporting software to a minimum. Be aware of deployment topology. Respect state. And never assume cloud removed the need for sound systems thinking. It did not. It just made it easier to build distributed systems before earning the scars required to run them well.</p>

<p>Cloud is powerful. But it is still someone else’s computer. And the old bare-metal lessons still hold stronger than people think.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[Moving from on-prem to cloud changed the tooling, speed, and scaling model, but not the core laws of systems design. Capacity, latency, state, failure handling, and topology still matter just as much. Microservices can help, but only when they solve a real scaling or operational problem instead of just moving complexity around.]]></summary></entry><entry><title type="html">From a builder to a founder</title><link href="https://aakashh242.github.io/blog/2026/04/30/builder-to-founder.html" rel="alternate" type="text/html" title="From a builder to a founder" /><published>2026-04-30T00:00:00+00:00</published><updated>2026-04-30T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/04/30/builder-to-founder</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/04/30/builder-to-founder.html"><![CDATA[<blockquote>
  <p><strong>TL;DR:</strong> Generic, I know, but it’s the truth. I spent 10 years in TCS, moved into SaaS with my friend Rupam, and learned that building is only one small part of founder life. While helping grow our products, I felt the pain of messy browser research and built TabMate to solve it. Shipping it felt great. Now comes the harder part: marketing, distribution, doubt, and the daily fight to keep going.</p>
</blockquote>

<p>Aloha, back to blogging after a while! So much has happened since!</p>

<p>I quit my job at TCS after serving 10 years to make a foray into the Founder world. I learned a lot
during my time there (I was lucky enough to be on projects that had a mix of dev, devops, infra, platform, security and AI)
and matured as a developer. The only drawbacks were the constraints that would not let me build solutions for common pains
we faced unless the client approved, and not being paid as much for the work I did. Well, in hindsight, it’s just how
service-based companies work, and I am not complaining. I wanted the freedom to build what I want and to be able to
say truthfully that I get paid for doing what I love.</p>

<p>I started my founder journey under an old college friend, Rupam. Since college days, he has always had a founder mindset. I
remember he built a social networking website for our college - all with plain old HTML, CSS, PHP and SQLite. He is
one of those old-school programmers who learned the trade through sweat and toil. He, too, was in TCS, though he quit
6 years before me to start his own ventures.</p>

<p>When I joined him, he already had two profitable products in the market and one in the pipeline. I am grateful he agreed
to mentor and guide me, given how differently we saw things when it came to managing the software lifecycle!
See, I was from a controlled, constrained world where the user base was guaranteed, hence I optimized for longevity with
failsafes, fallbacks, redundancy and best practices. He, on the other hand, optimized for velocity, stable MVPs,
user-building and feedback-driven features - and it makes sense, because in the SaaS world it doesn’t matter how polished or
robust your solution is if no one uses it. I joined him on the project <a href="https://smartbankstatement.com">Smart Bank Statement</a>, where I learned the basics of
the Founder life - and the hard truth that code is only a small part of it!</p>

<p>With Smart Bank Statement now stable and slowly gaining users, we turned our focus to improving what we had and, if
possible, starting on a new project. He started addressing feature requests from one of the products while I started looking
into the other. During this time, my work involved browsing way more than I used to, as I was actively learning the part
of the software lifecycle that most devs don’t get to experience - selling it. The workflow was the same most days -
Google search, open up tabs, open a ChatGPT session, go here, copy this, paste that, fight ChatGPT over forgetting stuff,
re-find the same info again - it got annoying, monotonous and boring pretty soon. I wanted a ChatGPT in my browser which
had context across all my tabs and would also remember where we were when I came back the next day. I basically wanted a
retrieval agent IN THE BROWSER!</p>

<p>Well, we developers love a challenge. I set out to build it (hence the delay in this post). It started out as
a side panel that could only see the text you selected, save the memories you chose to keep and run a heuristic
retrieval system. With time, however, it evolved into an assistant that could remember what you saved across your sessions
and bring it up when necessary! Whenever I found something useful, I would just pin it or save it as a memory. Later,
when I was prepping and needed a reference, I would just ask it and that piece of info, saved god knows how many tabs and hours
ago, would just show up! I named it <a href="https://tabmate.org">TabMate</a> out of love (and because I couldn’t think of anything else).</p>

<p>I pitched this idea to Rupam. My goal was to contribute to the pot. If I dare say, to me, our partnership sounds like the faint
whispers of an institute of products in the making. He has already contributed three
products; it’s only fair for me to pitch in and pull my weight. He was skeptical and cross-questioned the idea, just
like a rigorous co-founder should. One thing he said really stuck with me - “We, devs, build tools for ourselves then,
we think everyone will find it useful. But most often, that is not the case.” I realized that he was right - the tool
started out as a dev’s assistant but dev workflows are pretty niche and vary from dev to dev. I couldn’t pitch a
generic dev assistant at the browser level; I had to find the proper group of people whose work involves scouring
across websites and living in “tabland”.</p>

<p>He let me take time to think it through and build it out and, after a lot of brainstorming and researching across the internet (lol),
I finally managed to build a stable MVP. Yeah, it took longer than usual, as I had to iterate and tune the retrieval loops
and user flows. I was finally able to release it publicly on 27th April 2026. Now begins the hard game. The dopamine rush
of building and shipping is over.</p>

<p>As I sit here now, putting into action the marketing strat I have for TabMate (trust me, it adapts every day), so many
thoughts are playing across my mind. Did I do the right thing? When will conversions actually start? What else can I do to
improve its reach and distribution? How do I tune my strategy? Do I run ads? Do I give it all up and go back to
working for someone else? I read a quote on IndieHackers which went - “I built and I failed and I kept building.”
Now, that guy sits at over $30K/month in revenue. While stories and quotes are motivating, the human mind is a prison when
left alone to contemplate all possibilities. Mostly, it tends to converge on the negatives. There are
times I think it’s best to keep TabMate for myself and concentrate on the products we already have. I guess I am
still human. Yet, the thought of having real users for a system I built with my own hands is really seductive, and I keep
doing what is necessary - ethical and fair, but necessary.</p>

<p>It’s early days and yes, there are a lot of uncertainties. TabMate might live for a while and then be integrated into
browsers, get shelved and stay as my personal tool, or truly live up to its potential. I mean, this context-switching pain
is something everyone must be feeling; I just have to get to the right kind of people.
I might pivot and build other ideas - let’s see how it goes. For now, the dopamine rush of building has settled and the
dread of marketing has set in. I have to find ways to get a dopamine rush out of this phase too. Maybe take some
programmatic help? Hmmm, let’s see.</p>
  <p>TL;DR: This was not my idea originally. Rupert had already started exploring the space when I got involved. Once I joined, we looked harder at the market, the actual workflow pain and the existing tools. What started as a broader finance direction became a much narrower product: take messy bank statement PDFs and turn them into structured, usable data. From there, we worked together to get the MVP out.</p>
</blockquote>

<h2 id="how-i-came-into-it">How I came into it</h2>

<p>An old college friend and roommate of mine, Rupert, had already started thinking in this space before I joined. So this was not one of those stories where two people sit down on day one with a blank page and magically arrive at the final product. The motion had already started. I entered after that, and once I did, my role became less about “coming up with the idea” and more about pressure-testing it, sharpening it and helping move it toward something that could become a real product.</p>

<p>At the time, the idea-space was wider. Like many things around finance, it is very easy to drift toward the flashy layer first: dashboards, summaries, personal finance views, spending insights, charts and all the things that look good in a demo. On paper, that feels like the obvious direction. People do want visibility into their money, after all.</p>

<p>The problem is that this part of the market is crowded and, more importantly, the pain is softer. There is a difference between a problem people find interesting and a problem they are willing to pay to make disappear. The more we looked at it, the more it felt like the “analyzer” route sat closer to the first category. Useful, maybe. Attractive, sure. But harder to build a serious business around unless there is a very strong edge.</p>

<h2 id="where-the-idea-started-to-tighten">Where the idea started to tighten</h2>

<p>So we kept looking.</p>

<p>The more practical side of the workflow started standing out. Not the part where somebody wants prettier insights. The part where somebody already has the data locked inside a bank statement PDF and needs it in a usable format for real work.</p>

<p>That was more interesting.</p>

<p>Because on the surface, converting bank statements to Excel sounds solved. It sounds like one of those dull utility problems the internet has already handled ten times over. But once you look at the actual inputs people deal with, the ugliness shows up quickly. Scanned statements, inconsistent layouts, broken table structure, different debit-credit conventions, weird balance columns, low-quality images, multi-page files, sometimes even multiple accounts in the same document. Suddenly this “simple conversion” problem stops being simple.</p>

<p>And that is before the downstream pain even begins.</p>

<p>Getting rows out of a PDF is not the same as getting reliable data. That distinction matters a lot more in accounting and bookkeeping workflows than it does in casual consumer use-cases. If the extraction is only mostly correct, somebody still has to sit there and verify the output line by line. One shifted row, one wrong amount, one broken balance trail and the time savings start collapsing. In these workflows, “almost correct” is not a nice middle ground. It is often just another form of manual work.</p>

<p>That was the point where the product started becoming more real to me.</p>

<h2 id="what-changed-once-i-joined">What changed once I joined</h2>

<p>Once we teamed up, the conversation changed from “what can we build in finance?” to “what painful workflow exists here that people actually need solved?” That is a much better question, because it forces you to stop thinking in vague product language and start looking at where time is genuinely being lost.</p>

<p>We looked at existing players too. There were already tools in the market, obviously. Some looked dated. Some were too broad or enterprise-heavy. Some could handle easy statements but would struggle as soon as the documents became messy. Some could extract data, but still left enough cleanup and checking on the user that the problem was not really solved.</p>

<p>That gap mattered.</p>

<p>To me, the opportunity was never “nobody is doing this.” That is usually the wrong lens anyway. The real opportunity was that the problem was still painful enough, despite existing tools, that there was room for a more focused and more accurate product.</p>

<p>So the idea got narrower.</p>

<p>Not a personal finance dashboard. Not a generic document AI platform. Not a bloated accounting suite. Just a focused workflow: upload a bank statement PDF and get back structured output that is usable enough to save real time.</p>

<p>That kind of narrowing is easy to say and much harder to do.</p>

<h2 id="the-build-and-the-mvp">The build and the MVP</h2>

<p>Once the direction became sharper, the implementation questions also became sharper. You stop thinking only in terms of OCR and start thinking about statement variance, normalization, row structure, balances, reconciliation, scanned versus digital PDFs, error detection and the difference between extraction that merely looks plausible and extraction that can actually be trusted.</p>

<p>That distinction shaped how we approached the product.</p>

<p>The goal could not just be “convert PDF to Excel.” There are too many ways to technically do that while still dumping the messy part back onto the user. The output had to be clean enough that it reduced work, not just moved work to a different stage.</p>

<p>That is what we built the MVP around.</p>

<p>Rupert had the initial seed. I joined once things were already underway. From there, together, we took it through the more difficult but more valuable phase: questioning the original direction, tightening the scope, understanding the market better and actually getting a usable MVP built instead of staying stuck in idea-land.</p>

<h2 id="why-i-like-this-story-more-than-the-polished-version">Why I like this story more than the polished version</h2>

<p>A lot of startup stories get rewritten after the fact to sound cleaner than they were. Two founders see a giant market, spot a perfect gap, align instantly and start executing with full clarity. Real life is usually more uneven than that.</p>

<p>This one certainly was.</p>

<p>The idea was already in motion before I came in. The initial space was broader than where we ended up. The clearer version of the product only emerged after spending more time with the pain, the market and the workflow details.</p>

<p>But honestly, I prefer that kind of story.</p>

<p>It feels more real. Better products often come out of that process: not from trying to sound ambitious from the beginning, but from being honest enough to keep narrowing until the pain becomes sharp and the value becomes obvious.</p>

<p>That is how I got into building Smart Bank Statement.</p>

<p>Not by inventing the idea from zero, but by joining an old friend, helping pressure-test it and then building with Rupert toward something much more grounded than where it began.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TL;DR: This was not my idea originally. Rupert had already started exploring the space when I got involved. Once I joined, we looked harder at the market, the actual workflow pain and (aakashh242.github.io). What started as a broader finance direction became a much narrower product: take messy bank statement PDFs and turn them into structured, usable data. From there, we worked together to get the MVP out.]]></summary></entry><entry><title type="html">Dev Blog - Proving the Remote MCP Adapter’s Security Guardrails part 1</title><link href="https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1-evidences.html" rel="alternate" type="text/html" title="Dev Blog - Proving the Remote MCP Adapter’s Security Guardrails part 1" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1-evidences</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1-evidences.html"><![CDATA[<blockquote>
  <p><strong>TL;DR:</strong> In the last post, I said v0.3.0 would harden the Remote MCP Adapter against poisoned tool metadata and weak session semantics. This post is the proof. I built a mutable mock MCP server, ran live adapter instances against it, and captured evidence for four security controls: tool-definition pinning, metadata sanitization, description minimization, and session-integrity binding.</p>
</blockquote>

<p>In the <a href="https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1.html">last post</a>, I wrote about four security issues I wanted to tackle in the <a href="https://github.com/aakashh242/remote-mcp-adapter">Remote MCP Adapter</a>.</p>

<p>That post was about design and intent.</p>

<p>This one is about evidence.</p>

<p>I did not want to stop at unit tests and say “trust me, it works.” So I built a local test harness around a mutable FastMCP server and used it to exercise the adapter end to end. The result is a set of reproducible evidence artifacts showing what the adapter actually does when tool metadata changes, when descriptions are too verbose, and when a session is reused under the wrong authenticated context.</p>

<h2 id="what-i-tested">What I tested</h2>

<p>I ran four live security scenarios:</p>

<ol>
  <li>Tool-definition pinning and drift detection</li>
  <li>Tool metadata sanitization</li>
  <li>Tool description truncation and stripping</li>
  <li>Session integrity hardening for stateful flows</li>
</ol>

<p>Each scenario produced:</p>

<ul>
  <li>the exact adapter config used for the run</li>
  <li>raw upstream tool snapshots</li>
  <li>raw adapter tool snapshots</li>
  <li>error payloads</li>
  <li>process logs</li>
  <li>SQLite state snapshots where relevant</li>
  <li>a per-scenario summary</li>
</ul>

<p>So this was not a mocked “assert function returned true” setup. It was a real adapter process talking to a real mutable MCP server.</p>

<h2 id="the-test-setup">The test setup</h2>

<p>The setup was simple on purpose:</p>

<ul>
  <li>a mutable FastMCP upstream server running locally</li>
  <li>one or more local adapter instances with scenario-specific config</li>
  <li>a runner script that switched upstream revisions, called the adapter, and saved the results</li>
</ul>

<p>The upstream could change its catalog on demand. That made it possible to test:</p>

<ul>
  <li>a benign first tool catalog</li>
  <li>a changed description later in the same session</li>
  <li>dirty metadata</li>
  <li>very long descriptions</li>
  <li>reused sessions after an auth-context change</li>
</ul>

<p>This was exactly the kind of thing I wanted to prove before claiming the adapter had become a safer boundary.</p>

<h2 id="1-tool-definition-pinning">1. Tool-definition pinning</h2>

<p>This was the most important one.</p>

<p>The adapter was configured to:</p>

<ul>
  <li>pin the first visible tool catalog for a session</li>
  <li>block mid-session drift</li>
  <li>invalidate the session when drift is detected</li>
</ul>

<h3 id="what-happened">What happened</h3>

<p>The first <code class="language-plaintext highlighter-rouge">tools/list</code> call established the session baseline.</p>

<p>Then I changed the upstream tool description.</p>

<p>The next <code class="language-plaintext highlighter-rouge">tools/list</code> in the same session failed with a drift message. The adapter invalidated that session and refused to keep using it. Reusing the same session again resulted in <code class="language-plaintext highlighter-rouge">409 Conflict</code>. Starting a fresh session succeeded and picked up the upgraded catalog.</p>
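
<p>For intuition, the core of such a check can be thought of as pinning a fingerprint of the model-visible catalog on first sight and comparing it on every later listing. The sketch below is illustrative only, with made-up names (<code class="language-plaintext highlighter-rouge">ToolDriftError</code>, the <code class="language-plaintext highlighter-rouge">session</code> object), not the adapter’s actual internals:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import json

class ToolDriftError(Exception):
    """Raised when the tool catalog changes mid-session (hypothetical name)."""

def catalog_fingerprint(tools):
    # Hash only the model-visible parts of each tool, in a stable order.
    canonical = json.dumps(
        sorted(
            [
                {
                    "name": t["name"],
                    "title": t.get("title"),
                    "description": t.get("description"),
                    "inputSchema": t.get("inputSchema"),
                }
                for t in tools
            ],
            key=lambda t: t["name"],
        ),
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_catalog(session, tools):
    fp = catalog_fingerprint(tools)
    if session.pinned_fingerprint is None:
        session.pinned_fingerprint = fp      # first exposure pins trust for the session
        return tools
    if fp != session.pinned_fingerprint:
        session.invalidate()                 # further reuse is rejected (409 Conflict)
        raise ToolDriftError("tool catalog changed mid-session")
    return tools
</code></pre></div></div>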

<h3 id="what-that-proves">What that proves</h3>

<p>This closes the “rug pull” path where an upstream server can look safe during the initial review and then quietly mutate the tool surface after trust has already been established.</p>

<p>The trust boundary becomes:</p>

<ul>
  <li>first catalog exposure pins trust for that session</li>
  <li>mid-session tool drift is not silently accepted</li>
  <li>a new session is required to accept upstream changes</li>
</ul>

<p>That is exactly the behavior I wanted.</p>

<h2 id="2-tool-metadata-sanitization">2. Tool metadata sanitization</h2>

<p>The second scenario targeted dirty model-visible metadata.</p>

<p>The mock upstream exposed tool metadata with:</p>

<ul>
  <li>decomposed Unicode</li>
  <li>zero-width characters</li>
  <li>dirty schema descriptions</li>
</ul>

<p>I ran the adapter twice:</p>

<ul>
  <li>once with sanitization enabled</li>
  <li>once with sanitization set to block</li>
</ul>

<h3 id="what-happened-1">What happened</h3>

<p>With sanitization enabled, the adapter cleaned the visible metadata before forwarding it.</p>

<p>The differences were visible in the captured tool snapshots:</p>

<ul>
  <li>dirty title -&gt; normalized title</li>
  <li>dirty description -&gt; normalized description</li>
  <li>dirty schema property description -&gt; normalized schema property description</li>
</ul>

<p>With <code class="language-plaintext highlighter-rouge">block</code> mode enabled, the dirty tool disappeared from the adapter-visible catalog entirely.</p>
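
<p>Mechanically, the cleaning step is along these lines. This is a simplified sketch of the idea, not the adapter’s exact rule set, and the real code also has to walk nested schema descriptions:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import unicodedata

# Characters a human reviewer will not see but a model will.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def sanitize_text(value):
    """NFKC-normalize and drop zero-width and control characters."""
    if value is None:
        return None
    normalized = unicodedata.normalize("NFKC", value)
    return "".join(
        ch for ch in normalized
        if ch not in ZERO_WIDTH
        and (ch in "\n\t" or unicodedata.category(ch) != "Cc")
    )

def sanitize_tool(tool, mode="sanitize"):
    """Clean model-visible fields; in block mode, drop the tool if anything changed."""
    cleaned = dict(tool)
    for field in ("title", "description"):
        if field in cleaned:
            cleaned[field] = sanitize_text(cleaned[field])
    if mode == "block" and cleaned != tool:
        return None   # the tool disappears from the adapter-visible catalog
    return cleaned
</code></pre></div></div>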

<h3 id="what-that-proves-1">What that proves</h3>

<p>The adapter no longer has to behave like a naive tunnel for model-visible metadata.</p>

<p>It can:</p>

<ul>
  <li>clean suspicious text conservatively</li>
  <li>or refuse to forward a tool whose metadata had to be changed</li>
</ul>

<p>That gives operators a real first layer of defense against poisoned tool metadata.</p>

<h2 id="3-tool-description-truncation-and-stripping">3. Tool description truncation and stripping</h2>

<p>The third scenario was about description surface minimization.</p>

<p>This is different from metadata sanitization.</p>

<p>Sanitization cleans obviously suspicious text. Description policy answers a different question:</p>

<blockquote>
  <p>How much tool prose should the model see at all?</p>
</blockquote>

<p>The mock upstream exposed very long tool descriptions and very long nested schema descriptions.</p>

<p>I ran the adapter in two modes:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">truncate</code></li>
  <li><code class="language-plaintext highlighter-rouge">strip</code></li>
</ul>

<h3 id="what-happened-2">What happened</h3>

<p>In truncate mode:</p>

<ul>
  <li>the top-level tool description was shortened to the configured limit</li>
  <li>the nested schema description was also shortened</li>
</ul>

<p>In strip mode:</p>

<ul>
  <li>the top-level tool description was removed</li>
  <li>the nested schema description was removed too</li>
</ul>

<h3 id="what-that-proves-2">What that proves</h3>

<p>This is not just a UI tweak on the top-level tool description.</p>

<p>The policy applies to the model-visible description surface more broadly, including nested schema prose. That matters because otherwise an upstream could just move the same persuasive or poisoned text from the tool description into schema field descriptions.</p>
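
<p>A rough sketch of that recursive part, assuming the policy is just a character limit (or <code class="language-plaintext highlighter-rouge">None</code> for strip) - the adapter’s real configuration surface may differ:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def apply_description_policy(node, max_len):
    """Walk a tool definition or JSON schema and truncate or strip every
    'description' field. max_len=None means strip; otherwise cut to max_len."""
    if isinstance(node, list):
        return [apply_description_policy(item, max_len) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "description" and isinstance(value, str):
            if max_len is None:
                continue                  # strip mode: drop the prose entirely
            out[key] = value[:max_len]    # truncate mode
        else:
            out[key] = apply_description_policy(value, max_len)
    return out
</code></pre></div></div>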

<p>So this control now works the way it should.</p>

<h2 id="4-session-integrity-hardening">4. Session integrity hardening</h2>

<p>The fourth scenario focused on session integrity.</p>

<p>This one matters because the adapter is stateful. It stores uploads, artifacts, tombstones, and other per-session state. That means session handling is part of the product’s security posture, not just a transport detail.</p>

<p>For this test I:</p>

<ul>
  <li>enabled adapter auth</li>
  <li>used disk-backed state persistence</li>
  <li>established one session with token A</li>
  <li>restarted the adapter against the same persisted state</li>
  <li>tried to reuse the old session with token B</li>
</ul>

<h3 id="what-happened-3">What happened</h3>

<p>The old session had already been bound to the first authenticated context.</p>

<p>When I tried to reuse it under the rotated token, the adapter rejected it with <code class="language-plaintext highlighter-rouge">409 Conflict</code>.</p>

<p>Then I started a fresh session under token B, and that worked.</p>

<p>The persisted SQLite state showed separate trust-context fingerprints for the old and new sessions.</p>

<h3 id="what-that-proves-3">What that proves</h3>

<p>Knowing or reusing an <code class="language-plaintext highlighter-rouge">Mcp-Session-Id</code> is not enough on its own.</p>

<p>When auth is enabled, the adapter now binds the session to the authenticated context that created it. A stale session cannot just be picked up under a different auth context and treated as valid.</p>
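
<p>Conceptually, that binding is just a fingerprint of the authenticated context stored alongside the session record. Another illustrative sketch, with hypothetical names rather than the adapter’s real schema:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib

class SessionNotFound(Exception):
    pass

class SessionConflict(Exception):
    """Surfaces to the client as 409 Conflict (hypothetical name)."""

def trust_context_fingerprint(auth):
    # Fingerprint whatever identifies the authenticated context; exactly which
    # claims go in here is a policy choice (the fields below are illustrative).
    material = f"{auth.issuer}|{auth.subject}|{auth.client_id}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def load_session(store, session_id, auth):
    row = store.get(session_id)
    if row is None:
        raise SessionNotFound(session_id)
    if row.trust_fingerprint != trust_context_fingerprint(auth):
        # Known session ID, different authenticated context: reject it.
        raise SessionConflict(session_id)
    return row
</code></pre></div></div>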

<p>That is the right direction for a stateful gateway.</p>

<h2 id="what-the-evidence-bundle-contains">What the evidence bundle contains</h2>

<p>I saved the full evidence bundle and will attach it as a ZIP artifact.</p>

<blockquote>
  <p><strong>Evidence bundle:</strong> <a href="/assets/evidence-pack-remote-mcp-adapter-v0.3.0.zip">Download the full ZIP artifact</a></p>
</blockquote>

<p>It includes:</p>

<ul>
  <li>a top-level report per scenario</li>
  <li>a machine-readable summary</li>
  <li>per-scenario configs</li>
  <li>per-scenario logs</li>
  <li>raw upstream and adapter snapshots</li>
  <li>persisted SQLite state snapshots</li>
</ul>

<p>So anyone interested can inspect the actual evidence instead of relying on screenshots or paraphrases.</p>

<h2 id="one-honest-note">One honest note</h2>

<p>There is one small wrinkle in the captured client behavior.</p>

<p>When a blocked session is retried, the FastMCP client sometimes surfaces the failure as a generic <code class="language-plaintext highlighter-rouge">409 Conflict</code> instead of preserving the full response body each time.</p>

<p>That does not weaken the result, because the evidence still shows:</p>

<ul>
  <li>the initial detailed block message</li>
  <li>the repeated <code class="language-plaintext highlighter-rouge">409</code> response</li>
  <li>the persisted invalidation or trust-binding state</li>
  <li>the fresh-session success path</li>
</ul>

<p>Still, it is worth calling out plainly.</p>

<h2 id="why-this-release-matters">Why this release matters</h2>

<p>The Remote MCP Adapter started as a way to make remote MCP servers more practical by handling uploads, artifacts, and stateful mediation.</p>

<p>That is still true.</p>

<p>But once a gateway starts mediating tool metadata and storing session state, it can no longer pretend it is just a dumb transport wrapper. It is part of the security boundary whether it wants to be or not.</p>

<p>That is what v0.3.0 is really about.</p>

<p>Not security theater.
Not vague “hardened mode” marketing.</p>

<p>Actual controls.
Actual live tests.
Actual evidence.</p>

<h2 id="what-comes-next">What comes next</h2>

<p>I am not done with the security work yet.</p>

<p>But this release crosses an important line: the adapter is now starting to defend the boundary it creates, instead of just expanding it.</p>

<p>That was the goal.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TL;DR: In the last post, I said v0.3.0 would harden the Remote MCP Adapter against poisoned tool metadata and weak session semantics. This post is the proof. I built a mutable mock MCP server, ran live adapter instances against it, and captured evidence for four security controls: tool-definition pinning, metadata sanitization, description minimization, and session-integrity binding.]]></summary></entry><entry><title type="html">Dev Blog - Securing the Remote MCP Adapter</title><link href="https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1.html" rel="alternate" type="text/html" title="Dev Blog - Securing the Remote MCP Adapter" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/03/16/dev-blog-securing-remote-mcp-adapter-1.html"><![CDATA[<blockquote>
  <p><strong>TL;DR:</strong> I built Remote MCP Adapter to solve remote file and artifact handling. Then I realized the same adapter could also be abused unless it actively defends against poisoned tool metadata and session misuse. So v0.3.0 is about turning that middleware into a safer boundary, not just a convenient one.</p>
</blockquote>

<p>After finishing off the core work for the <a href="https://github.com/aakashh242/remote-mcp-adapter">Remote MCP Adapter</a>, I took
a step back and started sharing it in forums to see how others are solving the same issue. You can read more about the
inspiration behind it in <a href="https://aakashh242.github.io/blog/2026/03/05/remote-mcps-as-local.html">this blog</a>.</p>

<p>I read this <a href="https://dev.to/luckypipewrench/your-mcp-servers-tool-descriptions-are-an-attack-surface-37pj">blog on dev.to</a>
that talks about how the MCP protocol has an attack layer via tool descriptions. This issue might not affect folks 
running MCP servers locally but becomes a headache for teams and organizations wanting to host them centrally. An
attacker can essentially poison tool descriptions or manipulate tool arguments to make the Agents perform sinister
stuff! The author of that post has built a tool, <a href="https://github.com/luckyPipewrench/pipelock">Pipelock</a>, that acts as
a firewall for AI agents. Do show some love to his work!</p>

<h2 id="the-realization">The realization</h2>

<p>While pondering over the article, I realized that I had created a monster that could easily be used to infiltrate
systems. And it’s out in the wild! So I’m taking the next logical step - learn from the blog, explore MCP attack
surfaces and provide built-in defenses against them.</p>

<p>For the v0.3.0 release, I am focusing on addressing four issues:</p>

<ol>
  <li><a href="https://github.com/aakashH242/remote-mcp-adapter/issues/22">Tool definition pinning and drift detection</a> - when enabled, the adapter will baseline tools during the first <code class="language-plaintext highlighter-rouge">list_tools</code> call for a session. Any drifts or changes in tool titles, schemas and descriptions in subsequent calls will be detected and either warned or blocked entirely.</li>
  <li><a href="https://github.com/aakashH242/remote-mcp-adapter/issues/23">Normalize and sanitize tool schemas before forwarding</a> - add a metadata preprocessing feature be able to apply a conservative sanitization step to model-visible tool metadata before that metadata reaches the client or model.</li>
  <li><a href="https://github.com/aakashH242/remote-mcp-adapter/issues/24">Tool description minimization/stripping</a> - allow users to minimize or remove tool descriptions altogether - for those extra-secure environments.</li>
  <li><a href="https://github.com/aakashH242/remote-mcp-adapter/issues/25">Harden adapter-managed session semantics for stateful HTTP/SSE flows</a> - once the adapter stores uploads, artifacts, cancellation state, or other per-session data, session integrity is no longer just an MCP transport concern. It becomes part of the product’s own security posture. The goal is to make sure the adapter’s own stateful features cannot be misused just because a session ID is known.</li>
</ol>

<p>I can see a long night up ahead as I gear up to write and test these guardrails out. I shall publish my test results here
in a new blog once I have them ready.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TL;DR: I built Remote MCP Adapter to solve remote file and artifact handling. Then I realized the same adapter could also be abused unless it actively defends against poisoned tool metadata and session misuse. So v0.3.0 is about turning that middleware into a safer boundary, not just a convenient one.]]></summary></entry><entry><title type="html">Crossing limits</title><link href="https://aakashh242.github.io/blog/2026/03/11/crossing-limits.html" rel="alternate" type="text/html" title="Crossing limits" /><published>2026-03-11T00:00:00+00:00</published><updated>2026-03-11T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/03/11/crossing-limits</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/03/11/crossing-limits.html"><![CDATA[<blockquote>
  <p><strong>TL;DR:</strong> Too many MCP tools in the context window slow agents down and worsen tool selection. GitHub tackles this with clustering and embedding-guided routing. In <code class="language-plaintext highlighter-rouge">remote-mcp-adapter</code>, Code-Mode avoids the problem by letting agents discover tools progressively instead of loading them all upfront.</p>
</blockquote>

<p>The <a href="https://modelcontextprotocol.io/docs/getting-started/intro">Model Context Protocol</a> has been both a boon and a
curse for agentic workflows. Yes, it allows you to connect your agent with diverse systems without needing to write
custom integrations for every one of them. But as your tasks grow in breadth and complexity, you need more integrations and
hence run more MCP servers. This eventually results in a large number of tools being shoved into
your agent’s context window. Maybe the agent needs just 4-5 tools to perform the task, but you still end up paying
the price for those extra tokens in the context window.</p>

<h2 id="why-is-it-harmful">Why is it harmful</h2>

<p>If you look at it from the Agent’s perspective, it sees</p>

<ul>
  <li>the system instructions</li>
  <li>task specific instructions</li>
  <li>previous tool call results (if any)</li>
  <li>tool descriptions and schemas</li>
  <li>and finally, your task or the next task at hand</li>
</ul>

<p>Do you see the problem? The task comes at the end, and a lot of unrelated information beforehand can confuse even the
best of LLMs. You’ll notice increased latency, a tendency to choose the wrong tools, lost context midway,
half-finished tasks claimed as complete, and an overall decrease in quality and consistency.</p>

<p>A simple way to mitigate this issue is to limit the number of tools that can be active in a request. GitHub Copilot
used to restrict to <a href="https://github.com/microsoft/vscode/issues">128 tools</a> per request but after a lot of flak from
the community, they decided to remove that limit. So how did they solve the too-many tools problem? You can read about
it <a href="https://github.blog/ai-and-ml/github-copilot/how-were-making-github-copilot-smarter-with-fewer-tools/">here</a>.
In a nutshell, GitHub improved tool selection in Copilot by grouping tools into clusters and using 
embeddings to pre-select the most relevant ones, so the model doesn’t have to reason over hundreds of tools every time.</p>
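
<p>The general shape of that approach - not GitHub’s actual implementation - is a straightforward nearest-neighbour filter over tool descriptions. In the sketch below, <code class="language-plaintext highlighter-rouge">embed</code> is a stand-in for whatever embedding model you use:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def preselect_tools(task, tools, embed, k=10):
    """Keep only the k tools whose descriptions sit closest to the task text.
    `embed` is assumed to return a unit-normalized vector for a string."""
    task_vec = embed(task)
    scored = []
    for tool in tools:
        tool_vec = embed(f"{tool['name']}: {tool['description']}")
        scored.append((float(np.dot(task_vec, tool_vec)), tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:k]]
</code></pre></div></div>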

<h2 id="a-conscious-trade-off">A conscious trade-off</h2>

<p>While working on my <a href="https://github.com/aakashh242/remote-mcp-adapter">remote-mcp-adapter</a>, I came to the realization
that my adapter would contribute to increased token usage.</p>

<p>One of the adapter’s functions is to override tools that require file uploads from clients. While
overriding the tool, it also appends instructions on how to perform a staged upload to the original description.
This is done so that the model knows about the original semantics and constraints but also aligns with the staged-upload 
procedure. This means for every upload-type tool configured, more tokens are sent in a <code class="language-plaintext highlighter-rouge">list_tools</code> call.
Although each upstream has its own MCP mount path, clients configured to connect to all upstreams would eventually get
hit by context bloat.</p>

<p>I did not want to replace the tool description entirely with the upload-staging instruction and risk losing
semantics, so I went with keeping only the first sentence of the upstream tool description, trimmed to 50 tokens. I figured
the savings in tokens should make up for sacrificing a bit of semantics.</p>
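
<p>The trimming itself is nothing fancy. A minimal sketch, assuming the 50-token budget is approximated with whitespace tokens (the adapter’s exact tokenization may differ):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def trim_description(description, max_tokens=50):
    """Keep only the first sentence, capped at roughly max_tokens
    whitespace-separated tokens, before the staged-upload note is appended."""
    first_sentence = description.split(". ")[0].strip()
    words = first_sentence.split()
    return " ".join(words[:max_tokens])
</code></pre></div></div>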

<h2 id="enter-code-mode">Enter Code-Mode</h2>

<p>After reading that <a href="https://www.jlowin.dev/blog/fastmcp-3-1-code-mode">FastMCP 3.1.0 brought support for Code-Mode</a>, I
was ecstatic. My adapter uses FastMCP so I could just implement a config toggle to enable code-mode.</p>

<p>Code-Mode allows the Agent to progressively discover the tools it needs without having to bloat the context window with
all tool definitions. Instead of all your 1000 tools, it surfaces 5 tools - <code class="language-plaintext highlighter-rouge">search</code>, <code class="language-plaintext highlighter-rouge">tags</code>, <code class="language-plaintext highlighter-rouge">list_tools</code>, <code class="language-plaintext highlighter-rouge">get_schema</code>
and <code class="language-plaintext highlighter-rouge">execute</code>. The Agent searches for certain keywords, discovers tools matching those, decides which ones to use, gets
their schemas, then triggers an execute call. The video below demonstrates Agent behavior without vs with Code-Mode.</p>

<video controls="">
  <source src="/assets/videos/code-mode-demo.mp4" type="video/mp4" />
</video>

<h2 id="limits-bypassed">Limits bypassed</h2>

<p>With Code-Mode’s progressive discovery integrated into <strong>remote-mcp-adapter</strong>, teams can configure as many upstreams
as they want (within their infra limits, of course). The latest 
<a href="https://github.com/aakashH242/remote-mcp-adapter/releases/tag/v0.2.0">v0.2.0</a> release of 
<a href="https://github.com/aakashH242/remote-mcp-adapter">remote-mcp-adapter</a> now includes Code-Mode. Let’s see how it fares.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TL;DR: Too many MCP tools in the context window slow agents down and worsen tool selection. GitHub tackles this with clustering and embedding-guided routing. In remote-mcp-adapter, Code-Mode avoids the problem by letting agents discover tools progressively instead of loading them all upfront.]]></summary></entry><entry><title type="html">Remote MCPs as local</title><link href="https://aakashh242.github.io/blog/2026/03/05/remote-mcps-as-local.html" rel="alternate" type="text/html" title="Remote MCPs as local" /><published>2026-03-05T00:00:00+00:00</published><updated>2026-03-05T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/03/05/remote-mcps-as-local</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/03/05/remote-mcps-as-local.html"><![CDATA[<blockquote>
  <p><strong>TL;DR:</strong> Check out <a href="https://github.com/aakashh242/remote-mcp-adapter">remote-mcp-adapter</a>, which provides stateful proxies for upstream MCP servers and handles the file-exchange interaction with “file-touching” tools.</p>
</blockquote>

<p>Recently, a use-case came along where we were tasked with hosting a central MCP platform. The idea was to bring all MCP
servers under a common umbrella, apply guardrails and governance to them and make it the go-to place for all things
MCP in the org. And it made sense, given the dangers unverified MCP servers in the wild pose. If we could provide
most of the tools teams needed and enable a secure self-service model to add more servers,
we could push teams to use the org-approved MCP platform.</p>

<h2 id="the-problem">The problem</h2>

<p>Although mainly built for local usage, most MCP servers did implement the Streamable HTTP transport, allowing them to
be hosted remotely. We did not run into many hiccups till we got around to adding servers that work with files -
either consuming or producing them. An example is the <a href="https://github.com/microsoft/playwright-mcp">Playwright MCP</a> server,
which can produce artifacts in the form of console logs, screenshots and saved PDFs, and consume local files for browser
uploads.</p>

<p>While other tools worked as expected, problems arose with the file-touching tools. Since the server and the agent
did not share a filesystem, generated artifacts would never reach the agent and, whenever the agent needed to upload
files, the server would not find them.</p>

<h2 id="mcp-constructs-to-the-rescue">MCP constructs to the rescue</h2>

<p>The MCP specs define a construct called <a href="https://modelcontextprotocol.io/specification/2025-06-18/server/resources">resources</a>
to share data that provides context to language models. They are perfect for sharing the server generated artifacts with
the agents. However, file uploads require special handling too as the MCP specification
<a href="https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1306">does not support</a> file uploads via <a href="https://modelcontextprotocol.io/specification/draft/client/elicitation">elicitation</a> yet.</p>

<p>I initially wrote a wrapper that acted as a proxy between the agent and the Playwright MCP server but pretty soon, the need
arose to host many more of these “file-touching” MCP servers centrally. As a result, instead of writing separate
proxies for each, I wrote the <a href="https://github.com/aakashh242/remote-mcp-adapter">remote-mcp-adapter</a>, which provides
stateful proxies for upstream MCP servers and handles the file-exchange interaction with “file-touching” tools.</p>

<p>Being a consumer of opensource, I am hopeful it will be beneficial to the community.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TLDR; Check out remote-mcp-adapter which provides stateful proxies for upstream MCP servers and handles the file exchange interaction with “file-touching” tools.]]></summary></entry><entry><title type="html">A skill issue</title><link href="https://aakashh242.github.io/blog/2026/02/27/a-skill-issue.html" rel="alternate" type="text/html" title="A skill issue" /><published>2026-02-27T00:00:00+00:00</published><updated>2026-02-27T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/02/27/a-skill-issue</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/02/27/a-skill-issue.html"><![CDATA[<blockquote>
  <p>TL;DR: LLMs can accelerate development dramatically, but they also widen the ownership gap if you don’t understand your system deeply. Without discipline, agents tend to duplicate logic, bloat control files, and create subtle technical debt. Tooling like persistent “skills” helps, but clean architecture is still a developer responsibility.</p>
</blockquote>

<p>Do you remember the old days when there were no LLMs and you wrote all the code by hand? It took time to deliver, but
you had the bragging rights of knowing the entire system inside out. When something broke, you didn’t just troubleshoot
blindly; you already had a suspicion of where the bug could be.</p>

<p>With the advent of the AI bubble, the pace at which code can be spewed out increased drastically! Now, more people can
<em>produce code</em>. It was a revolution indeed - now, non-tech folks could get a taste of the dopamine hits of seeing their
code work and the frustrations when it didn’t. I mean, it got the average person to a terminal, what more can you ask for? 😄</p>

<p>I’m not going to lie - I use LLMs and coding agents in my workflow. While they have helped me close gaps in my architecture
and ship faster, they have also increased the time I spend on code review, reading through code and refactoring. As this
<a href="https://dev.to/ismail9k/once-upon-a-time-writing-code-was-fun-62">article</a> puts it,
<strong>The Ownership Gap in Production</strong> becomes a real problem if you do not understand entirely what your codebase does
and I want to avoid it at all costs. I need to know how my systems work.</p>

<h2 id="repeats-and-monoliths">Repeats and Monoliths</h2>

<p>My experience with AI-assisted development has been mixed. Yes, I enjoy the depth I can go to when hardening architecture
and the pace at which I can implement once I have locked in a spec. The rush of laying out the groundwork and then seeing
your feature get implemented is great. Tests work too, awesome! But now, review the additions and changes and compare them
with the specs - that’s the boring part.</p>

<p>Even if the implementation matched the spec, what I observed was that the changes would be limited to a few files and a lot
of logic would be duplicated across them. Be it improving or fixing an existing feature or adding a new one, the few major
files containing control logic would grow into monoliths with huge functions all over, spaghetti-like call stacks and
repetition of the same logic that could have been generalized. The time I saved in writing the whole thing would now be spent on
cleaning up. I tried prompting the agent to stay DRY and modular but, thanks to context compression, the behavior
returned within a few turns.</p>

<h2 id="agent-skills"><a href="https://agentskills.io/home">Agent Skills</a></h2>

<p>Since prompts didn’t help much, next step was to look at constructs that stay active in the agent’s working context
window. I looked at Agent Skills and found the <a href="https://github.com/ertugrul-dmr/clean-code-skills/">clean-code-skills</a>
that enforce Robert C. Martin’s <em>Clean Code</em> principles. Unlike prompts that get compressed out of context, 
skills remain active in the agent’s working memory, influencing structure across turns.</p>

<p>I found it really useful and experienced the difference first-hand in code quality and technical debt when using
it on a fresh project. However, at times, I found that on existing or larger codebases, agents would still tend to
duplicate functionality and turn existing files into giants by cramming as much as possible into them.</p>

<p>I added an extra skill to target this specific problem - I called it <code class="language-plaintext highlighter-rouge">clean-features</code> - and I found it useful.
I thought of contributing it back and have put in a pull request with the owner. 🤞</p>

<p>Meanwhile, if you are interested, you can use my <a href="https://github.com/aakashH242/clean-dry-code-skills">fork</a>. 
I shall keep it synced with the owner’s upstream as I use this skill in my workflow.</p>

<h2 id="parting-thoughts">Parting Thoughts</h2>

<p>In the end, this isn’t about AI. It’s about skills. Agents will focus on finishing tasks, not on making sense. 
If you don’t understand modularity, separation of concerns, and long-term maintenance, you will just create technical 
debt more quickly. AI doesn’t replace discipline; it highlights the problems that come from not having it.</p>

<p><strong>PS</strong> - I have also started using <a href="https://github.com/peteromallet/desloppify">desloppify</a> to monitor and maintain
code quality.</p>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TL;DR: LLMs can accelerate development dramatically, but they also widen the ownership gap if you don’t understand your system deeply. Without discipline, agents tend to duplicate logic, bloat control files, and create subtle technical debt. Tooling like persistent “skills” helps, but clean architecture is still a developer responsibility.]]></summary></entry><entry><title type="html">The curse of context windows</title><link href="https://aakashh242.github.io/blog/2026/02/20/the-curse-of-context-windows.html" rel="alternate" type="text/html" title="The curse of context windows" /><published>2026-02-20T00:00:00+00:00</published><updated>2026-02-20T00:00:00+00:00</updated><id>https://aakashh242.github.io/blog/2026/02/20/the-curse-of-context-windows</id><content type="html" xml:base="https://aakashh242.github.io/blog/2026/02/20/the-curse-of-context-windows.html"><![CDATA[<blockquote>
  <p><strong>TL;DR</strong>: Large-document extraction with LLMs fails less from “bad reasoning” and more from hard output limits. JSON structured outputs waste tokens on repeated keys and still truncate on big PDFs. Switching to CSV reduces overhead but doesn’t fix truncation—your output can still cut off silently. The reliable fix is chunking the document into page batches, processing chunks asynchronously with strict concurrency limits (semaphores), and stitching results back in order; run summarization as a separate pass.</p>
</blockquote>

<p>I was working on a problem to extract structured information from large documents on very lean infrastructure.
The input documents were either PDF, CSV or Excel. For CSV and Excel, extraction was pretty straightforward but PDFs
posed a separate challenge of their own.</p>

<p>The PDFs we ingested were mostly a mix of digital and scanned. The moment scanned PDFs come into the picture, one can
imagine the various edge cases that come with them - image quality, noise, orientation, spillovers etc.</p>

<p>OCR was the obvious option and, with so many open-source libraries available, we were spoilt for choice. I
wanted to use <a href="https://docling-project.github.io/docling/">Docling</a> as my prior experience with it had been good so
far (I shall write a separate blog on those use-cases) but we were constrained by the infra.</p>

<p>Docling uses deep learning to extract structure - something very costly on CPU alone. I ran a few tests and it was
evident that we needed another approach - we just could not wait hours for a document to process.</p>

<h2 id="iteration-1---llm-with-structured-outputs">Iteration 1 - LLM with Structured Outputs</h2>

<p>The latest LLMs are now capable of processing documents, so why not use them? After evaluating the major providers and models,
we chose <strong>Gemini 2.5 Flash</strong>. It ticked all the checkboxes we had -</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Large context window to support huge PDFs (1 million tokens)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Fast output speed</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Controllable reasoning</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Affordable at scale</li>
</ul>

<p>Our first approach was using structured outputs. The data to extract had to follow a certain schema and also include
a kind of “summary”. It seemed a perfect use-case for outputs following a strict JSON schema.</p>

<p>Well, it worked well for smaller documents. As soon as documents grew beyond a certain size, the output parsing would
fail. Why? Because the output exceeded the model’s maximum allowed output tokens. Setting that limit to its maximum did not help
either.</p>

<h2 id="iteration-2---why-waste-tokens-on-repeated-extra-characters">Iteration 2 - Why waste tokens on repeated extra characters?</h2>

<p>We realized that a lot of the output tokens were being consumed by JSON field names and structural characters.
Rows of data with the same schema mean the same identifiers repeated for every row. A real waste, in my view.
So we took a different route.</p>

<p>Since the extracted data had to follow a certain schema, we took a two-step approach:
make the LLM output raw CSV in one pass and generate the summary in another, parallel pass.
CSV is much more condensed than JSON and should solve the parsing problem, right?</p>

<p>Well, it did kind of eliminate the parsing problems, and the CSV output would parse successfully too. But closer
inspection revealed that the problem had not been solved - just masked by the CSV parser. The LLM would return
complete rows, but for large documents the rows would suddenly stop. The output was still truncated - we just didn’t catch
it this time because there were no parsing errors.</p>

<h2 id="iteration-3---divide-and-conquer">Iteration 3 - Divide and conquer!</h2>

<p>Taking a leaf from the old-school world of batch automations, we thought - instead of the entire document,
why not pass batches of pages to the LLM? This would let us kill two birds with one stone:</p>

<ul>
  <li>the issue of truncation</li>
  <li>really slow speeds when processing large documents (the more the LLM has to output, the longer the wait time)</li>
</ul>

<p>So we implemented a pipeline that chunks documents into batches of pages, processes
them asynchronously, collects the results and performs order-aware stitching to produce the final output. The separate
summarization pass stayed.</p>
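
<p>Stripped of the Gemini-specific bits, the pipeline is roughly the following. This is a minimal sketch, where <code class="language-plaintext highlighter-rouge">extract_batch</code> stands in for the actual model call and the batch size and concurrency limit are illustrative:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio

async def process_document(pages, extract_batch, batch_size=10, max_concurrency=4):
    """Split pages into batches, extract each batch concurrently under a
    semaphore, then stitch the per-batch rows back together in document order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(index, batch):
        async with semaphore:                  # cap concurrent LLM calls
            rows = await extract_batch(batch)  # bounded output size per call
            return index, rows

    batches = [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]
    results = await asyncio.gather(*(run(i, b) for i, b in enumerate(batches)))

    stitched = []
    for _, rows in sorted(results, key=lambda pair: pair[0]):   # order-aware stitching
        stitched.extend(rows)
    return stitched
</code></pre></div></div>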

<p>We tested this on large documents and found that the truncation problem was gone! Now we were able to process large
documents at respectable speeds without the fear of losing data.</p>

<h2 id="what-i-learned">What I Learned</h2>

<ul>
  <li>Structured outputs are good, but only when the output size stays within limits</li>
  <li>For large-scale data extraction using LLMs, not the reasoning capability but the output token limit is your enemy</li>
  <li>Asynchronous processing of I/O-heavy tasks can run away and cause all sorts of issues - from rate limits to memory spikes. Always use semaphores</li>
  <li>Thinking tokens matter - especially for large documents:
    <ul>
      <li>For one-to-one copy-paste extraction, with a few prompt engineering tricks, you can get the LLM to follow order and output the copy exactly as the source without using thinking tokens</li>
      <li>For extractions that involve copy-paste plus transformations and data inference, running with vs without thinking tokens produces noticeable differences in the transformed and inferred data</li>
    </ul>
  </li>
</ul>]]></content><author><name>Aakash</name></author><category term="blog" /><summary type="html"><![CDATA[TL;DR: Large-document extraction with LLMs fails less from “bad reasoning” and more from hard output limits. JSON structured outputs waste tokens on repeated keys and still truncate on big PDFs. Switching to CSV reduces overhead but doesn’t fix truncation—your output can still cut off silently. The reliable fix is chunking the document into page batches, processing chunks asynchronously with strict concurrency limits (semaphores), and stitching results back in order; run summarization as a separate pass.]]></summary></entry></feed>