A note to the people who own the outcome
This one is for the business owners, delivery leads, and engineering managers behind every solution built on Virto Commerce — not only the developers.
When production is degraded, the clock is the enemy. Customers are checking out right now, a GoLive is on the line, and every hour matters. In those moments, the single biggest predictor of how fast we resolve the issue together is not the severity of the bug — it’s how we work the problem. That’s a process you own as a team, and it’s worth investing in before the incident, not during it.
So consider this the opening article in a series.
The goal is simple: help you and your team get to the real cause faster, work effectively with Virto Commerce and the Virto Team, and stop spending energy on the one activity that never fixes anything — deciding whose fault it is.
The mindset that actually resolves incidents
There’s a pattern I’ve seen more than once across our ecosystem. Let me describe a character (entirely fictional; any resemblance to a real sprint is purely coincidental).
A senior developer is convinced the platform is broken. They spend two weeks — call it 200 hours — building a case. Beautiful screenshots. Annotated diagrams. A theory. At the end of it the conclusion is: “It’s a Virto Commerce problem.” The incident is still open. Customers are still affected. And now we’ve lost two weeks that could have been spent on diagnosis.
Here’s the thing: even if the platform were the cause, the 200 hours spent proving it didn’t fix anything. The energy went into transferring responsibility instead of removing the problem.
The mindset that works is the opposite:
- What can we stabilize right now? (Buy back headroom, isolate the workload, restore service.)
- What is the actual cause? (Platform, Custom Code, or Data — proven, not assumed.)
- What is the correct long-term fix? (So it doesn’t come back next month.)
Notice “whose fault is it” is not on that list. We’ll find out where the cause lives as a byproduct of the analysis — but blame is never the objective. Virto Team is on your side here. We want the same thing you do: a healthy production system and a fast path to it.
Do the homework before the fire
Modern tooling (Virto Commerce, Virto Cloud, Azure Monitor, profilers, distributed tracing) gives you almost everything you need to find a root cause. But tools only help the team that already knows how to use them. The worst time to discover you don’t have access to Application Insights is during a Priority 1 (P1) incident: a critical, high-impact business disruption requiring an immediate, all-hands response.
So, in calm weather:
- Understand your architecture today. Virto Commerce is composable and it evolves with modern requirements — scalability options, background processing, indexation strategies, caching. Know how your specific solution is wired and where it can flex.
- Verify your access and tooling now. Application Insights, the Azure portal, database access on non-prod, logs, the ability to spin up a copy of production. If any of these is missing, that’s a ticket to open this week, not mid-incident.
- Know how to reproduce and observe. A team that can reliably reproduce an issue on a non-prod copy is a team that resolves it.
Preparation isn’t overhead. It’s the difference between a two-hour incident and a two-week one.
The platform is not your project (and that distinction matters)
This is the part I most want partners to internalize.
Virto Commerce is the platform. Your solution is the platform plus everything you and your integrators added on top — custom modules, custom code, third-party dependencies, configuration, integrations, and your data. The platform is heavily tested, runs across 100+ companies, and is exercised every day. Your customizations are unique to you. That’s not a criticism — it’s the whole point of a composable platform. But it means “the platform is broken” should be a conclusion, not a starting assumption.
How do you tell whose layer a problem lives in? A few practical examples:
- Reproduce it on clean Virto Commerce. This is the single most powerful move. Take the same scenario (same request, same data shape) and run it on a vanilla install (`start-local` gives you the full stack on your machine in one command). If it doesn’t reproduce on clean Virto, the cause is very likely in your customizations or data, not the platform.
- Read the stack trace. If the exception originates in `YourCompany.CustomModule.*`, that’s a strong signal. If it’s deep in a platform call but only under your specific data, suspect data or configuration.
- Change one variable. Same code, same queries, more CPU headroom → problem disappears? Then it was resource saturation, not a code defect (more on this below).
- Check what changed. A module upgrade, a config change, a data import, a new integration — incidents usually have a trigger. Find the diff.
When in doubt, Virto Team can help reproduce a scenario on clean Virto Commerce to settle the question objectively. We’re glad to. But the fastest answer usually comes from the team that already has the context.
Collecting the right information is 90% of the solution
I’ll say that again because it’s the most important sentence in this article: gathering the right information correctly is about 90% of solving the problem. A clean diagnosis almost always resolves itself once the data is on the table. A messy one drags on for weeks regardless of how smart the people are.
Here’s what “the right information” looks like.
1. Trust your data — and the right tools
Use the real instruments, and learn to read them correctly:
- Azure Monitor / Application Insights — request and dependency durations, exceptions, performance counters.
- Error details and dependency calls — the actual exception, the actual SQL, the actual downstream call.
- CPU, memory, thread pool, and GC metrics — always together, never in isolation.
One critical caveat, because it bites everyone eventually: when CPU is above ~85–90%, Application Insights durations become misleading. Under CPU saturation the thread pool can’t schedule work fast enough, so Application Insights reports wall-clock time (execution plus scheduling delay), not real execution time. A dependency that looks like it took 120 ms may have spent 110 ms waiting for CPU and 10 ms doing actual work. The database didn’t get slow; the host ran out of CPU. I wrote this up in detail here, and it’s required reading before you interpret any duration chart:
Do Not Blindly Trust Application Insights Durations When CPU Is Overloaded
The rule that follows: scale first, optimize second. Stabilize CPU headroom so your telemetry becomes trustworthy again, then profile and optimize.
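To make the caveat concrete, here is a minimal sketch that splits telemetry samples into those recorded with CPU headroom and those recorded under saturation. The `Sample` fields and the 85% cut-off are assumptions for illustration (the cut-off is simply the low end of the range mentioned above); this is not an Application Insights API.

```python
from dataclasses import dataclass

# Assumed cut-off: the low end of the ~85-90% range discussed above.
CPU_SATURATION_PERCENT = 85.0


@dataclass
class Sample:
    """One telemetry data point. Field names are hypothetical."""
    timestamp: str
    cpu_percent: float
    dependency_ms: float


def split_by_trust(samples, threshold=CPU_SATURATION_PERCENT):
    """Separate samples taken with CPU headroom from those taken under saturation."""
    trusted = [s for s in samples if s.cpu_percent < threshold]
    suspect = [s for s in samples if s.cpu_percent >= threshold]
    return trusted, suspect


# Illustrative data only.
samples = [
    Sample("10:00", 42.0, 12.0),
    Sample("10:01", 97.0, 120.0),
    Sample("10:02", 55.0, 11.0),
]
trusted, suspect = split_by_trust(samples)
print("trusted:", [s.timestamp for s in trusted])
print("suspect:", [s.timestamp for s in suspect])
```

Durations in the suspect group are not necessarily wrong, but they should not be read as execution time until the same workload has been observed with CPU headroom.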
2. Versions — what changed
Record the platform version and the version of every module, and note what changed recently. “It worked last week” is a clue; “we upgraded module X and changed appsettings on Tuesday” is half the answer.
Virto Commerce has native tools for exporting the current version of the installed platform and modules.
- Click on platform version
- Click either Download manifest, Download package or Copy
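Once you have two version snapshots (say, last week and today), finding "what changed" is a mechanical diff. The sketch below assumes each snapshot has already been reduced to a plain mapping of module id to version string; that mapping, the module names, and the version numbers are hypothetical, and the code does not parse the actual exported manifest format.

```python
def diff_versions(before, after):
    """Return {module_id: (old_version, new_version)} for every module that differs.

    None marks a module that is absent from that snapshot.
    """
    changes = {}
    for module_id in sorted(set(before) | set(after)):
        old, new = before.get(module_id), after.get(module_id)
        if old != new:
            changes[module_id] = (old, new)
    return changes


# Hypothetical snapshots; ids and versions are made up for illustration.
last_week = {"Example.Catalog": "3.800.0", "Example.Cart": "3.810.0"}
today = {
    "Example.Catalog": "3.800.0",
    "Example.Cart": "3.812.0",
    "YourCompany.CustomModule": "1.4.0",
}

for module_id, (old, new) in diff_versions(last_week, today).items():
    print(f"{module_id}: {old} -> {new}")
```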
3. Reproduction and expected result
Write down: how to reproduce the problem, and what the expected result is. A request that “is slow” is not a report. “POST to graphql/AddOrUpdateCart takes 867 ms p95 under N concurrent users, we expect <150 ms” is a report someone can act on.
Application Insights can export data in several formats. My favourite is Copy data, which exports the end-to-end transaction as a JSON document.
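If you have raw request durations rather than a ready-made chart, a p95 figure like the one in the report example can be computed directly. This is a small sketch using the nearest-rank definition of a percentile (other tools may interpolate and give slightly different numbers); the sample durations are invented.

```python
import math


def percentile_nearest_rank(values, percent):
    """Nearest-rank percentile: the smallest value with at least `percent`% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative request durations in milliseconds.
durations_ms = [110, 95, 130, 867, 120, 105, 98, 140, 125, 101]
p95 = percentile_nearest_rank(durations_ms, 95)
expected_ms = 150  # the target from the report, stated explicitly

print(f"p95 = {p95} ms, expected < {expected_ms} ms")
```

Stating the measured percentile next to the expected value is what turns "it is slow" into something a reader can verify.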
4. Capture the live session, not a screenshot
When the problem is “a page or request is slow,” the most useful artifact you can hand us is the actual network session — not a photo of the timings.
A HAR file (HTTP Archive) is the de-facto standard for recording a full web session: every request and response, headers, payloads, and precise timings, all in one file your browser produces natively. It’s replayable, measurable, and removes the guesswork — which is exactly why providing one to support tends to expedite resolution dramatically.
We wrote up the how-to here:
Capture Web Session Traffic
It works in Chrome, Edge, or Firefox — Chrome DevTools is marginally the simplest:
- Right-click the page → Inspect.
- Open the Network tab.
- Reproduce the slow action, then click the download / Export HAR button (the tooltip reads “Export HAR”). Choosing “Save all as HAR with content” includes the request/response bodies.
- Name the file and Save — all requests on the page are captured into that one file.
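A HAR file is plain JSON, so you can take a first look yourself before sending it. The sketch below ranks entries by their total `time` field (milliseconds in the HAR format) and prints the slowest ones. The inline sample and its URLs are invented; with a real capture you would load the file with `json.load` instead.

```python
def slowest_entries(har, top=5):
    """Return (time_ms, method, url) for the slowest requests in a parsed HAR document."""
    entries = har["log"]["entries"]
    ranked = sorted(entries, key=lambda e: e["time"], reverse=True)
    return [
        (round(e["time"]), e["request"]["method"], e["request"]["url"])
        for e in ranked[:top]
    ]


# Minimal hand-written HAR fragment for illustration only.
sample_har = {
    "log": {
        "entries": [
            {"time": 35, "request": {"method": "GET", "url": "https://example.test/"}},
            {"time": 867, "request": {"method": "POST", "url": "https://example.test/graphql"}},
            {"time": 140, "request": {"method": "GET", "url": "https://example.test/api/catalog"}},
        ]
    }
}

for time_ms, method, url in slowest_entries(sample_har, top=2):
    print(f"{time_ms:>6} ms  {method}  {url}")
```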
A few habits make the capture far more valuable:
- Reproduce, then record. Start the capture, perform only the slow action, then export. A focused HAR beats a noisy one full of unrelated traffic.
- Mind the secrets. A HAR contains headers, cookies, and tokens — capture from a test account where you can, and treat the file as sensitive when sharing.
- Add the context. Note the timestamp, the environment, what you did, and the expected-vs-actual result, so the timings in the HAR map to a story.
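For the "mind the secrets" habit, one option is to redact the obvious credentials before sharing. This is a sketch under stated assumptions, not a complete sanitizer: it blanks cookie values and a short, assumed list of sensitive headers, and it does not inspect request bodies or query strings, which can also carry tokens.

```python
import copy

# Assumed list of headers to blank out; extend it for your own auth scheme.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}


def redact_har(har):
    """Return a copy of a parsed HAR document with sensitive header and cookie values replaced."""
    cleaned = copy.deepcopy(har)
    for entry in cleaned["log"]["entries"]:
        for side in ("request", "response"):
            message = entry.get(side, {})
            for header in message.get("headers", []):
                if header["name"].lower() in SENSITIVE_HEADERS:
                    header["value"] = "REDACTED"
            for cookie in message.get("cookies", []):
                cookie["value"] = "REDACTED"
    return cleaned


# Hand-written fragment for illustration only.
captured = {
    "log": {
        "entries": [
            {
                "request": {
                    "headers": [
                        {"name": "Authorization", "value": "Bearer abc123"},
                        {"name": "Accept", "value": "application/json"},
                    ],
                    "cookies": [{"name": "session", "value": "s3cr3t"}],
                },
                "response": {
                    "headers": [{"name": "Set-Cookie", "value": "session=s3cr3t"}],
                    "cookies": [],
                },
            }
        ]
    }
}

safe = redact_har(captured)
print(safe["log"]["entries"][0]["request"]["headers"])
```

The function works on a deep copy, so the original capture stays intact for your own analysis. Still treat the redacted file as sensitive when sharing.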
5. Let the AI assistants do the heavy lifting
This is genuinely changing how fast we diagnose issues, so use it. Virto Commerce now ships AI assistance across the platform, and our documentation is available to AI coding assistants directly:
- Virto OZ provides context-aware assistance and a developer copilot across the platform — it can locate, aggregate, and summarize platform data and support technical investigation in plain language. See Getting Started → AI Assistance in the Platform Dev Docs.
- Our docs are in Context7, so if you use Claude Code, Cursor, or any MCP-compatible assistant, you can have it reason over the current Virto Commerce documentation instead of relying on stale model memory: Virto Commerce Docs Are Now Available in Context7.
An AI assistant that can read your stack trace and the current docs will often point you at the right module in minutes. Use it as your first responder.
If you’re stuck, hand the problem over the right way
Sometimes you genuinely can’t tell where the issue lives, and you bring it to Virto Team. Wonderful — that’s what we’re here for. But how you hand it over determines how fast we can help. Context transfer is everything.
I’m attaching some real anonymized examples of how problems reach us, because the contrast is instructive. (Names removed — this is about the pattern, not the people. Every one of us has done the “wrong” version at some point, myself included.)
How not to hand over a problem
- A screenshot of a data grid instead of the data. A photo of an end-to-end transaction table tells me a request was slow. The exported transaction data tells me why. When the export exists and we get a screenshot of it, we’ve lost the most useful artifact.
- “The database is slow” with no evidence — or with evidence that actually points elsewhere. I’ve seen reports conclude “SQL is the suspect” and “maybe Elastic too,” sitting right next to a CPU-at-100% chart. As we just covered, at 100% CPU the durations are lying to you. The data was there; it was read backwards.
- Cause and effect reversed. “A simple entity update took 2 minutes, so the database is broken” — when the entity update was slow because the application was paused waiting for CPU. The symptom got promoted to a cause.
- A link only one team can open. Pasting an Azure portal link that support can’t access (“the link content is inaccessible to me, but maybe you have access”) moves the work back to us to even see the problem.
I’ll be candid about why this matters beyond convenience: when a handover is mostly screenshots and a conclusion, it often reads — fairly or not — as closing the ticket on our side rather than solving the problem. Even when that’s not the intent, it has the same effect: it slows everyone down. The fix isn’t more effort, it’s the right artifacts.
How to hand over a problem
- Export the data, don’t photograph it. The actual transaction details file, the JSON, the query results. If it exists as data, send the data.
- The exact reproduction. The exact request (or a request that reproduces it), inputs, and the environment it happened on.
- The full exception — message and stack trace as text, not a cropped screenshot.
- The metrics in context — CPU/memory alongside the durations, for the same time window, so nobody reads a saturated host as a slow database.
- Versions and recent changes.
- Access that works for us, or the content extracted and shared directly.
A few extra suggestions while we’re here: give us a non-prod environment we can poke at rather than only screenshots of prod; state your expected vs. actual explicitly; and if you have a hypothesis, tell us what you already ruled out and how. That turns a guessing game into a collaboration.
Meet us with an active stance
When you follow Virto Team’s recommendations, we move fast. We’ll get on a call, we’ll dig into it together, we’ll reproduce on clean Virto if needed. But this is a two-way street, and I want to be honest about the friction I sometimes see.
On a genuine P1, the rhythm too often goes: we send a recommendation or offer a call slot within the hour — and the reply comes two days later. On a P1, a two-day round trip isn’t a process, it’s an outage extended by latency. We can bring urgency, expertise, and tooling. What we can’t bring is your active engagement on your own production system. Meet us halfway and these incidents close in hours, not weeks.
Know where it is? Contribute the fix instead of waiting
Here’s a genuinely empowering option that partners underuse: if your investigation lands on something in the platform — a bug, a missing extensibility point, a small improvement — you don’t have to wait for us to schedule it. Virto Commerce is open source. Branch from dev, open a PR, sign the CLA on your first contribution, and every PR builds an Alpha release you can test before merge.
Contributing the fix is frequently faster than waiting in a queue, it gets the improvement to the whole ecosystem, and — selfishly for you — it means the next platform upgrade already contains your fix instead of re-breaking your patch. Here’s how:
How to Contribute to Virto Commerce
Production is a daily practice, not a one-time launch
The last shift I’ll ask for is the most strategic. Running in production is a daily discipline, not a milestone you pass at GoLive. Your project grows — new clients, new features, more data — and your technical team should see and understand what’s happening: the trends, what’s approaching a limit, what will need a change or an improvement and roughly when.
Virto Commerce’s architecture forgives a lot and simplifies a lot — but it can only help a team that’s watching. And please don’t forget to plan upgrades to stable Virto Commerce platform versions as part of that practice. We are improving the platform constantly — performance, scalability, tooling, AI assistance — and staying current means many problems get fixed for you before you ever hit them.
Bottom line
We are open, and we are here to help. We know the platform deeply. What we don’t know are the unique details and nuances of your specific solution — and that’s exactly the knowledge that resolves most incidents. We invest heavily in automation, tooling, and AI assistance to close that gap from our side.
So let’s make a deal: we’ll stop at nothing to help you succeed, and in return, let’s not spend our shared energy moving responsibility back and forth. Let’s point all of it at the problem. Collect the right data, hand it over cleanly, meet us with urgency, contribute when you can, and watch your production like the living system it is.
That’s how these get solved in hours instead of weeks. Every time.








