EVALUATION

Ship with confidence through evaluation

Measure performance for any use-case

Getting the most out of your models starts with understanding and benchmarking their performance. Without it, you're flying blind.

Q&A correctness

Retrieval relevance

Agent effectiveness

Data extraction quality

Chatbot helpfulness

and more

Curate datasets

LangSmith makes it easy to build custom datasets. Upload your own, create them manually, or pull them in directly from logs.

Capture feedback in the flow

Improve your models by incorporating user feedback. Log feedback against the associated traces, identify where your systems are underperforming, and then iterate, all in one place.

Easily swap and compare model providers

Set yourself up for a many-model world. Make data-informed decisions about which models to use, and when.