Measure performance for any use case
Getting the most out of your models starts with understanding and benchmarking performance. Without it, you're flying blind.
Q&A correctness
Retrieval relevance
Agent effectiveness
Data extraction quality
Chatbot helpfulness
and more
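As a rough illustration of what benchmarking Q&A correctness can look like, here is a minimal sketch in plain Python. It does not use the LangSmith SDK; the `toy_model` function, the example questions, and the exact-match metric are all assumptions chosen for illustration, and a real evaluation would likely use a more forgiving scorer.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 when the normalized strings are equal, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def evaluate(model: Callable[[str], str], examples: list[dict]) -> float:
    """Average exact-match score of `model` over {"question", "answer"} examples."""
    if not examples:
        raise ValueError("examples must not be empty")
    scores = [exact_match(model(ex["question"]), ex["answer"]) for ex in examples]
    return sum(scores) / len(scores)


# Hypothetical stand-in for a real model call.
def toy_model(question: str) -> str:
    answers = {"What is the capital of France?": "paris"}
    return answers.get(question, "I don't know")


# Hypothetical dataset of question/answer pairs.
examples = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2 + 2?", "answer": "4"},
]

print(evaluate(toy_model, examples))  # 0.5
```

The same loop works for the other use cases in the list by swapping `exact_match` for a scorer suited to the task, such as a relevance judgment for retrieval.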
Curate datasets
LangSmith makes it easy to build custom datasets. Upload your own, generate them manually, or pull them in directly from logs.
Capture feedback in the flow
Get the most out of your models by incorporating user feedback. Log feedback to the associated traces. Identify places where your systems are underperforming and then iterate. All in one place.
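To make the "identify, then iterate" step concrete, the sketch below groups feedback scores by trace and flags the traces that score poorly. It is plain Python rather than LangSmith API code; the record shape (`trace_id`, `score`) and the 0.5 threshold are assumptions made for the example.

```python
from collections import defaultdict


def underperforming_traces(feedback: list[dict], threshold: float = 0.5) -> list[str]:
    """Return trace IDs whose mean feedback score is below `threshold`, worst first."""
    scores = defaultdict(list)
    for item in feedback:
        scores[item["trace_id"]].append(item["score"])
    means = {trace_id: sum(vals) / len(vals) for trace_id, vals in scores.items()}
    flagged = [trace_id for trace_id, mean in means.items() if mean < threshold]
    return sorted(flagged, key=lambda trace_id: means[trace_id])


# Hypothetical feedback records, e.g. thumbs up (1.0) or thumbs down (0.0) from users.
feedback = [
    {"trace_id": "trace-a", "score": 1.0},
    {"trace_id": "trace-a", "score": 1.0},
    {"trace_id": "trace-b", "score": 0.0},
    {"trace_id": "trace-b", "score": 1.0},
    {"trace_id": "trace-c", "score": 0.0},
]

print(underperforming_traces(feedback))  # ['trace-c']
```

Here `trace-b` averages exactly 0.5 and so is not flagged; lowering or raising the threshold changes how aggressively traces are surfaced for review.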
Easily swap and study model providers
Set yourself up for our many-model world. Make data-informed decisions about which models to use and when.
Resources
Learn more about Evaluation best practices
Webinar
Blog Post