How Pendo used LangSmith to trace Novus from user behavior to code fixes

Zain Lakhani
July 1, 2026

Guest post by Zain Lakhani, Chief AI Officer, Pendo

Novus is a product agent that detects usability issues in live applications, fixes the underlying code, and improves the user experience. It achieves a 90%+ success rate on PM-reviewed evals, and we shipped it to production in days, not months. LangSmith is a core reason we could do both.

AI coding tools sped up shipping, but left the product feedback loop behind 

Our users have traditionally been product managers who look at dashboards, talk to users, and write PRDs based on what they discover. Now they're product engineers shipping code every day, and our existing platform wasn’t built for that speed.

Everyone in the market is focused on helping people become developers. AI coding tools let teams run four tickets at once, effectively solving the "coding and shipping" problem. But that velocity ignores the second half of the end-to-end product lifecycle: the vital partnership between the developer and the product manager. The developer ships quickly, the PM collects feedback and provides context on what to iterate on, and the developer keeps shipping.

The result is a broken feedback loop. Code reaches production without the user acceptance testing that used to be common, so much of what ships is difficult-to-use software that struggles to meet its adoption and retention goals.

Novus exists to close the full cycle: you've shipped something, users are struggling, and we fix it right after. Keep going fast, and we'll catch and address issues before they become a problem.

Novus turns product analytics and session replays into code fixes 

A user links their codebase and installs a Novus snippet that monitors all user clicks and records session replays. Novus aggregates this behavioral data and uses AI to interpret it, surfacing concrete, actionable issues continuously. It might say: "We noticed a 3% funnel conversion drop-off from checkout to order confirmation on a page that gets a thousand visits a day."
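
To make the funnel example concrete, here is a minimal sketch of how a step-to-step conversion rate could be computed from raw page-view events and compared against a baseline. It is not Novus's implementation: the event shape (`(session_id, page)` pairs), the step names, the `FunnelIssue` record, and the 2-point `threshold` are all hypothetical, and the sketch ignores the order in which pages were visited.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FunnelIssue:
    """Hypothetical issue record: conversion between two steps fell below baseline."""
    from_step: str
    to_step: str
    baseline_rate: float
    current_rate: float


def step_conversion(events: list[tuple[str, str]], from_step: str, to_step: str) -> float:
    """Share of sessions that visited from_step and also visited to_step.

    Simplification: visit order is ignored; a session counts as converted
    if it touched both pages at any point.
    """
    pages_by_session: dict[str, set[str]] = {}
    for session_id, page in events:
        pages_by_session.setdefault(session_id, set()).add(page)
    entered = [pages for pages in pages_by_session.values() if from_step in pages]
    if not entered:
        return 0.0
    converted = sum(1 for pages in entered if to_step in pages)
    return converted / len(entered)


def detect_drop(
    events: list[tuple[str, str]],
    from_step: str,
    to_step: str,
    baseline_rate: float,
    threshold: float = 0.02,  # hypothetical: flag drops of 2 points or more
) -> FunnelIssue | None:
    """Return a FunnelIssue if conversion fell at least `threshold` below baseline."""
    current = step_conversion(events, from_step, to_step)
    if baseline_rate - current >= threshold:
        return FunnelIssue(from_step, to_step, baseline_rate, current)
    return None
```

For example, if four sessions reach `checkout` and three of them reach `order_confirmation`, conversion is 0.75; against a 0.80 baseline that is a 5-point drop, which the sketch flags.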

The agent's intelligence lies in the end-to-end analysis: using session replays to diagnose the root cause (e.g., identifying rage clicks), correlating that behavior with the specific code files involved, and generating a suggested fix.
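
Rage clicks are commonly described as several rapid clicks on the same element. A minimal sliding-window detector might look like the following; the `Click` shape, the CSS-selector field, and the defaults of three clicks within one second are illustrative assumptions, not Novus's actual heuristics.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Click(NamedTuple):
    timestamp_ms: int
    selector: str  # hypothetical: CSS selector of the clicked element


def find_rage_clicks(clicks: list, min_clicks: int = 3, window_ms: int = 1000) -> list:
    """Return selectors that received at least `min_clicks` clicks within any `window_ms` span."""
    recent = defaultdict(deque)  # selector -> timestamps of recent clicks on it
    flagged = set()
    for click in sorted(clicks, key=lambda c: c.timestamp_ms):
        window = recent[click.selector]
        window.append(click.timestamp_ms)
        # Drop clicks on this element that fall outside the sliding window.
        while click.timestamp_ms - window[0] > window_ms:
            window.popleft()
        if len(window) >= min_clicks:
            flagged.add(click.selector)
    return sorted(flagged)
```

Mapping a flagged selector back to the component and files that render it is the step that needs access to the linked codebase; that mapping is not shown here.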

That cycle has a lot of moving parts. When something goes wrong (a tool call returns unexpected data, a subagent goes sideways, a prompt change degrades output quality) you need to see exactly what happened. That's why we shipped LangSmith tracing to production as part of the Claude Agent SDK integration in Novus. It's now our primary window into how the system behaves.

LangSmith debugs Novus in production

LangSmith has been our agent observability platform from the first design-partner conversation through production. What we look at has shifted as Novus has matured, but LangSmith has remained a constant foundation.

Traces showed how users interacted with Novus and which use cases to prioritize

During the design-partner phase, we lived in LangSmith’s trace view. First thing every morning, we'd open it and read through individual conversations (what people asked the agent and how it responded), and that's how we picked out our use cases. We saw what users actually did, straight from production, with no guessing and no unverified assumptions. Over time, those use cases became the suggested prompts we shipped at open beta, and then the backbone of our eval sets.

In production, traces still do the obvious job. Every run generates a full trace tree—inputs, outputs, tool calls, subagent invocations, token counts, cost data—so when a customer tells us a generated PR didn't address the right issue, we pull up the trace and walk through every decision the agent made. The nested structure maps to how the agent is organized, so it's straightforward to see where a reasoning step went wrong.
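
A nested trace tree can be pictured as a recursive structure. The sketch below uses a hypothetical `Run` dataclass (loosely mirroring the parent/child run shape a trace view displays, not LangSmith's SDK types) to render a tree with indentation and to find the path from the root to the first failing step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for one node in a trace tree."""
    name: str
    run_type: str  # e.g. "chain", "llm", "tool"
    total_tokens: int = 0
    error: str | None = None
    children: list[Run] = field(default_factory=list)


def render(run: Run, depth: int = 0) -> list[str]:
    """Indent each run under its parent, the way a trace view nests steps."""
    status = f" ERROR: {run.error}" if run.error else ""
    lines = [f"{'  ' * depth}{run.run_type}:{run.name} ({run.total_tokens} tokens){status}"]
    for child in run.children:
        lines.extend(render(child, depth + 1))
    return lines


def first_error_path(run: Run) -> list[str] | None:
    """Depth-first search for the first failing run; returns names from root to it."""
    if run.error:
        return [run.name]
    for child in run.children:
        path = first_error_path(child)
        if path is not None:
            return [run.name, *path]
    return None
```

Not every problem surfaces as an error (a PR can target the wrong issue with every step succeeding), so a path like this is a starting point for reading the trace, not a substitute for it.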

Trace tags connect support issues, customer activity, and cost

We tag every trace with username, conversation ID, and organization. Those tags route any support or engineering issue straight to the relevant trace instead of leaving us to hunt through logs. They also let us monitor cost per organization. We want token spend directed to the smartest models, but we still need to know what it costs and where. The tags show which organizations and workflows are expensive, so we can track spend without giving up quality.
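
In the LangSmith Python SDK, the `traceable` decorator accepts `tags` and `metadata` arguments, which is one way to attach fields like these to each trace. Once traces carry them, routing and cost reporting reduce to filtering and grouping. The sketch below works on hypothetical exported trace records (plain dicts with a `metadata` dict and a `total_cost` field); those field names and the `workflow` key are assumptions, not LangSmith's schema.

```python
from __future__ import annotations

from collections import defaultdict


def traces_for_conversation(traces: list[dict], conversation_id: str) -> list[dict]:
    """Support routing: jump straight to the traces for one conversation."""
    return [t for t in traces if t.get("metadata", {}).get("conversation_id") == conversation_id]


def cost_by(traces: list[dict], *keys: str) -> dict[tuple[str, ...], float]:
    """Sum cost per combination of metadata keys, most expensive first."""
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for trace in traces:
        meta = trace.get("metadata", {})
        group = tuple(str(meta.get(key, "unknown")) for key in keys)
        totals[group] += trace.get("total_cost", 0.0)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Grouping by `("organization",)` gives per-org spend; adding a second key such as a workflow name shows which workflows drive that spend within each org.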

Usage data shows how each customer gets value from Novus

That same tagging shows us which organizations are leaning on which use cases, which has turned into one of the most valuable aspects of LangSmith. It tells us how each customer is actually getting value from Novus, and we tailor our outreach and customer-engineering engagements around it. We've never had this clean a view into how customers use the product from an AI perspective.

Thread view indicates whether multi-turn conversations reach a resolution

Novus is multi-turn. A developer might ask follow-up questions about a detected issue before a PR gets generated. Thread view lets us see the full conversation trajectory instead of isolated turns, which is key when you're trying to tell whether the agent actually guided someone toward a resolution or just produced output.
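
LangSmith groups traces into threads by a shared metadata key (its docs list `session_id`, `thread_id`, and `conversation_id`). The grouping itself is simple; here is a minimal sketch over hypothetical turn records that also applies an illustrative resolution check. The record fields and the "some turn produced a PR URL" criterion are assumptions, not how Novus actually judges resolution.

```python
from __future__ import annotations

from collections import defaultdict


def build_threads(turns: list[dict]) -> dict[str, list[dict]]:
    """Group turns by conversation ID and order each thread by start time."""
    threads: dict[str, list[dict]] = defaultdict(list)
    for turn in turns:
        threads[turn["metadata"]["conversation_id"]].append(turn)
    for thread in threads.values():
        thread.sort(key=lambda turn: turn["start_time"])
    return dict(threads)


def is_resolved(thread: list[dict]) -> bool:
    """Hypothetical criterion: some turn in the conversation produced a PR."""
    return any(turn.get("outputs", {}).get("pr_url") for turn in thread)


def resolution_rate(turns: list[dict]) -> float:
    """Fraction of conversations that reached a resolution."""
    threads = build_threads(turns)
    if not threads:
        return 0.0
    return sum(is_resolved(thread) for thread in threads.values()) / len(threads)
```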

Feedback scores capture how users respond to Novus outputs 

We view feedback scores on runs directly in LangSmith, which gives us a signal of how outputs land with real users, not just in testing.
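
Feedback is recorded per run; in the LangSmith Python SDK, `Client.create_feedback` attaches a keyed score to a run ID. Summarizing those scores is then a small aggregation. The sketch below assumes feedback has been exported as plain dicts with `key` and `score` fields (a hypothetical shape, as are the example keys in the usage) and skips entries without a score, such as comment-only feedback.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def summarize_feedback(feedback: list[dict]) -> dict[str, tuple[int, float]]:
    """Return {feedback_key: (count, mean_score)}, ignoring entries without a score."""
    scores_by_key: dict[str, list[float]] = defaultdict(list)
    for item in feedback:
        score = item.get("score")
        if score is not None:
            scores_by_key[item["key"]].append(float(score))
    return {key: (len(scores), mean(scores)) for key, scores in scores_by_key.items()}
```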

LangSmith traces showed when Novus used analytics or code context, instead of both 

Very early on, by watching traces in LangSmith, we noticed that the agent usually took either the product analytics data or the code-based context into account, but rarely both. We tuned our prompts to make it explicit that the power of Novus comes from combining the two. Relying solely on product analytics or solely on code-based context brings us back to the pre-Novus era.
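
A check like this can also be run over exported traces rather than spotted by eye. The sketch below labels each trace by which kinds of context its tool calls touched; the tool names in `ANALYTICS_TOOLS` and `CODE_TOOLS` are hypothetical placeholders, not Novus's actual tools.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical tool names, grouped by the kind of context they supply.
ANALYTICS_TOOLS = frozenset({"query_funnel", "get_session_replay"})
CODE_TOOLS = frozenset({"read_file", "search_code"})


def context_mix(tool_calls: list[str]) -> str:
    """Label one trace by which context sources its tool calls used."""
    used = set(tool_calls)
    analytics = bool(used & ANALYTICS_TOOLS)
    code = bool(used & CODE_TOOLS)
    if analytics and code:
        return "both"
    if analytics:
        return "analytics_only"
    if code:
        return "code_only"
    return "neither"


def mix_report(traces: list[list[str]]) -> Counter:
    """Count traces per label; a low "both" count is the pattern to watch for."""
    return Counter(context_mix(calls) for calls in traces)
```

Comparing the share of `both` before and after a prompt change gives a rough, trace-level read on whether the change had the intended effect.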

Results

  • 25% less time spent identifying and evaluating new use cases, compared with previous products
  • 60% of AI problems caught in traces before customers ran into them

Novus is built for product teams shipping faster than they can observe 

Novus is built for product engineers: teams responsible for both shipping velocity and product usage. As AI coding tools keep compressing the time between idea and production, the gap between what's deployed and what's understood will only grow. Our job is to close that gap automatically, within minutes of a user session.

Pendo provides product analytics that help companies understand user behavior and drive product adoption. Novus is Pendo's product agent for automatically detecting usability issues, fixing the underlying code, and improving the user experience—closing the loop between behavioral data and better software.
