LangSmith is a powerful LLMOps platform, but its cost and cloud reliance can be drawbacks. LangFuse and Lunary.ai offer open-source, self-hostable alternatives. This guide compares their features to help you choose the best fit for your needs.
⚠️ Note: This information is accurate as of December 2024. The LLMOps landscape evolves rapidly, so updates within the next three months are likely. Also, I'm focusing on TypeScript and Node.js integrations and tooling, not Python.
For testing, I’ve integrated these tools into my LangGraph.js demo project, which mirrors common production tasks:
- Nested execution flow (subgraphs).
- Gemini and OpenAI LLM calls.
- Input parameter handling.
- Data retrievals from Qdrant.
- Tagging for trace organization.
- Conditional map-reduce branching.
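To make the nested execution flow concrete, here is a minimal sketch of the span tree these platforms render for a run like the one above. None of these types or names come from any SDK; they are purely illustrative.

```typescript
// Illustrative only: a hand-rolled span tree mirroring what LangSmith,
// LangFuse, and Lunary render for a nested LangGraph.js run.
interface Span {
  name: string;
  startMs: number;
  endMs: number;
  tags?: string[];
  children: Span[];
}

// Render the tree as indented lines with per-span durations,
// roughly the shape of the trace view in all three UIs.
function render(span: Span, depth = 0): string[] {
  const line = `${"  ".repeat(depth)}${span.name} (${span.endMs - span.startMs}ms)`;
  return [line, ...span.children.flatMap((c) => render(c, depth + 1))];
}

const trace: Span = {
  name: "agent-run",
  startMs: 0,
  endMs: 1200,
  tags: ["demo"],
  children: [
    { name: "qdrant-retrieval", startMs: 10, endMs: 180, children: [] },
    {
      name: "subgraph",
      startMs: 200,
      endMs: 1150,
      children: [{ name: "openai-call", startMs: 210, endMs: 1100, children: [] }],
    },
  ],
};

console.log(render(trace).join("\n"));
```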
Let’s explore how these platforms stack up against LangSmith.
TL;DR for the Busy Reader
- Observability Only:
  - Free and self-hosted: Choose LangFuse.
  - Cloud-based: Opt for Lunary, which is more cost-effective.
- Full-Feature LLMOps Suite:
  - Using LangChain/LangGraph? Stick with LangSmith.
  - Exploring other frameworks? Go with LangFuse.
- Special Conditions: Non-profits, educational institutions, and open-source projects can negotiate favorable terms with LangFuse and LangSmith.
Traces
Traces are vital for effective LLM observability. All three platforms excel in:
- Tree structure for trace visualization.
- Metadata tagging for better trace organization.
- Role-based conversation history (e.g., assistant, user, system).
- Visual clarity with a polished UI.
- Duration tracking for operations.
- Pricing details for calls.
- Error tracking for debugging.
- Session-based data organization.
- Support for Base64 images.
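Base64 support in practice means you can inline an image into a trace as a data URI. A quick sketch using only the Node standard library; the four bytes below are just the PNG magic number standing in for real image data.

```typescript
// Sketch: inline an image into trace input/output as a Base64 data URI,
// the format all three platforms can display. Placeholder bytes only.
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // PNG magic number
const dataUri = `data:image/png;base64,${pngBytes.toString("base64")}`;
console.log(dataUri); // → data:image/png;base64,iVBORw==
```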
Platform-Specific Limitations
- Image/Attachment Display: Neither LangFuse nor Lunary supports displaying images or attachments via URLs.
- Lunary: Automatic PII masking requires an Enterprise license, with no manual masking options.
- LangFuse: Automatic masking is unavailable, but basic manual masking is supported.
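Manual masking can be as simple as scrubbing payloads before they leave your process. The `maskPII` helper below is hypothetical, not part of the LangFuse SDK; it's the kind of function you would wire into whatever masking hook your client exposes.

```typescript
// Hypothetical helper, not an SDK API: redact email addresses from a
// string before it is sent to the tracing backend.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function maskPII(text: string): string {
  return text.replace(EMAIL_RE, "<redacted-email>");
}

console.log(maskPII("Contact jane.doe@example.com for access"));
// → Contact <redacted-email> for access
```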
Overall, all three products offer a similarly strong set of observability features.
Metadata and Search
All platforms offer robust search capabilities by date, metadata, IDs, status, and duration. No notable differences here.
Monitoring
Monitoring features are strong across all three platforms:
- LangSmith: Offers custom charts for deeper insights.
- LangFuse & Lunary: Provide built-in dashboards with filtering options.
- Lunary: Advanced monitoring is available in the Team tier.
Datasets
- LangSmith: Fully featured dataset support.
Key Differences
- Lunary: Doesn’t support adding traces to datasets directly.
- Exporting:
  - LangFuse lacks export options, such as those needed for OpenAI fine-tuning.
  - Lunary requires a Team license for exports.
Playground
A playground is one of the must-have features for debugging and improving agent prompts.
- Lunary: Offers limited usage in the basic tier.
- LangFuse: Requires a $100/user Pro license for self-hosted deployments.
LangFuse Playground Configuration
Lunary Playground Configuration
It's hard to compete with LangSmith here.
Deployment
- Lunary: Docker/Kubernetes deployment requires an Enterprise license; otherwise you have to install and maintain the product on your own. I can't remember the last time I ran anything outside of containers.
- LangFuse: The free self-hosted Docker version lacks a playground and LLM-as-judge evaluators.
Prompt Experiments
All platforms perform well in this area, offering robust tools for testing and refining prompts.
Integrations
- LangFuse & Lunary: Compatible with LangChain, LangGraph, LlamaIndex, DSPy, and more.
- LangSmith: Limited to LangChain and LangGraph.
Evaluators
Evaluators are essential for scoring traces and running tests on datasets.
- LangSmith: Full evaluator suite.
- LangFuse: Full evaluator suite, though LLM-as-judge requires a paid license. Not limited to LangChain.
- Lunary: Lacks built-in evaluators.
Both LangSmith and LangFuse offer advanced scoring and evaluation features, an extremely useful toolset for any LLM application.
Documentation
All three platforms provide complete documentation with plenty of examples.
Pricing
For a small team of 3 users:
| Platform | Self-Hosted | Cloud |
|---|---|---|
| Lunary | Free | $20/user/month |
| LangSmith | N/A | $39/user/month (50% off for startups) |
| LangFuse | Free (observability), $100/user/month (LLMOps) | $60/user/month (50% off for startups) |
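Illustrative arithmetic only: the monthly cloud bill for the three-person team above at the listed per-seat prices, before any startup discount. Prices are from the table and may be outdated.

```typescript
// Monthly cloud cost for 3 seats at the table's per-user prices.
const seats = 3;
const perSeat: Record<string, number> = { Lunary: 20, LangSmith: 39, LangFuse: 60 };

for (const [platform, price] of Object.entries(perSeat)) {
  console.log(`${platform}: $${price * seats}/month`);
}
```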
⚠️ Note: This information is accurate as of December 2024
More platforms
- OpenLLMetry SDK + Traceloop: A great platform, but I was searching for a self-hosted product.
- Phoenix by Arize AI: One of the most powerful LLMOps tools, with observability and evaluation features. However, it has poor LangChain.js and other TypeScript framework integrations, and it focuses on ML in general rather than just LLMs, which makes the UI less intuitive when developing chatbots or LLM agents.
Summary: Choosing an LLMOps Platform
- Best for Observability:
  - LangFuse: Free self-hosted option.
  - Lunary: Affordable cloud-based solution.
- Best for Full Features:
  - LangSmith: Comprehensive LLMOps suite, ideal for LangChain/LangGraph users.
  - LangFuse: Good for other frameworks.