DEV Community

Joel Etse

Introducing Humiris MoAI Basic: A New Way to Build Hybrid AI Models

Today, we’re excited to introduce Humiris MoAI Basic, an AI infrastructure designed to help AI engineers and developers seamlessly mix multiple LLMs into tailored, high-performance AI solutions. With MoAI Basic, you’re not confined to a single model’s strengths and weaknesses.
Instead, you can tune your AI by mixing models that excel in speed, cost-efficiency, quality, sustainability, or data privacy, enabling you to create a uniquely optimized model for your organization’s needs.

Modern AI applications often face complex and shifting requirements. Some projects demand near-instant responses at scale, while others need to adhere to strict data compliance laws or curb computational overhead for environmental responsibility.
Traditional single-model approaches often force trade-offs, but MoAI Basic changes the equation. By blending and balancing multiple LLMs, you have the freedom to align your model configurations directly with your evolving objectives, all without getting locked into a single provider or architectural limitation.

Why MoAI Basic?

Existing LLMs are powerful but come with trade-offs. High-end models deliver remarkable depth but can be expensive and slower, while lightweight, open-source models offer speed and affordability at the expense of sophistication. MoAI Basic bridges these gaps by orchestrating a diverse set of models behind the scenes.
It selects the right combination at the right moment, optimizing for your chosen criteria without locking you into a single model’s limitations.

How It Works

At its core is a “gating model”: a specialized AI model trained to evaluate each incoming query and decide which LLMs to involve. For example, a complex research request might tap into a more advanced model, while a quick, routine query might lean on a cost-efficient one. Over time, this system refines its approach based on real-world performance data, making your AI experience progressively more aligned with your goals.

When a query is received, the gating model begins by analyzing its characteristics to understand its requirements. This process involves:

Intent Recognition: Identifying the type of task (e.g., creative writing, technical analysis, summarization).
Complexity Assessment: Determining how complex the query is and whether it requires deep reasoning or factual precision.
Domain Identification: Understanding the subject matter to ensure the query is routed to a model with expertise in that field.

For example:
A query like “What is the capital of France?” is classified as simple factual retrieval.
A query like “Analyze the economic implications of AI adoption on labor markets.” is marked as complex and multidisciplinary.
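
The query-analysis step above can be sketched with a few toy heuristics. This is a hypothetical illustration of the idea, not Humiris’s actual gating model; the keyword rules and thresholds are invented for clarity.

```python
# Hypothetical sketch of the gating model's query-analysis step.
# Keyword rules and the length threshold are illustrative assumptions.

def analyze_query(query: str) -> dict:
    """Classify a query by intent, complexity, and domain (toy heuristics)."""
    words = query.lower().split()
    # Complexity Assessment: long or analytical queries count as complex.
    complexity = "complex" if len(words) > 8 or "analyze" in words else "simple"
    # Intent Recognition: match a few indicative keywords.
    if any(w in words for w in ("write", "story", "poem")):
        intent = "creative_writing"
    elif any(w in words for w in ("analyze", "evaluate", "implications")):
        intent = "technical_analysis"
    else:
        intent = "factual_retrieval"
    # Domain Identification: a single toy rule for the example below.
    domain = "economics" if "economic" in query.lower() else "general"
    return {"intent": intent, "complexity": complexity, "domain": domain}

print(analyze_query("What is the capital of France?"))
print(analyze_query("Analyze the economic implications of AI adoption on labor markets."))
```

A production gating model would of course be a trained classifier rather than keyword rules, but the output shape (intent, complexity, domain) is the same signal the router needs.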

Mix-Tuning: Customizing Model Behavior with Mix-Instruction Parameters

Mix-Tuning (or mix instructions) in MoAI Basic allows users to define how the gating model selects and orchestrates models based on their specific goals. This feature empowers the gating model to prioritize and balance parameters such as cost, speed, quality, privacy, and environmental impact.

Through mix instructions, users can fine-tune how queries are processed, ensuring that the system adapts to both the complexity of the task and the operational priorities.

Core Parameters for Mix-Tuning

  • Cost Optimization

Objective: Minimize expenses while maintaining acceptable response quality.
Use Case: Applications with budget constraints or large-scale deployments.
Behavior:
Simple queries are routed to lightweight, cost-efficient models.
Complex queries may involve higher-cost models, traded off against quality thresholds.

Example Instruction:
"Minimize cost by 50% while keeping 70% response quality."

  • Performance

Objective: Achieve the highest-quality and most accurate responses.
Use Case: Research, critical decision-making, or high-stakes applications.
Behavior:
Prioritizes high-performance models, regardless of cost or speed.
Aggregates responses from multiple models to ensure depth and precision.
Example Instruction:
"Optimize for 90% performance, regardless of cost."

  • Speed

Objective: Minimize latency for time-sensitive tasks.
Use Case: Real-time applications such as customer support or emergency systems.
Behavior:
Routes queries to the fastest models, even at the expense of quality or cost.
Limits the involvement of models with high latency.
Example Instruction:
"Maximize speed to 80%, even if it sacrifices 20% performance."

  • Privacy

Objective: Ensure secure handling of sensitive data.
Use Case: Healthcare, finance, and confidential data processing.
Behavior:
Utilizes secure, open-source models or private servers.
Excludes external APIs for privacy-critical queries.
Example Instruction:
"Guarantee 100% privacy, even if speed and cost are compromised."

  • Environmental Impact

Objective: Reduce energy consumption and carbon footprint.
Use Case: Green AI initiatives or sustainability-focused organizations.
Behavior:
Prefers energy-efficient models and infrastructure.
Avoids models with a high computational load.
Example Instruction:
"Reduce carbon footprint by 70% while maintaining 60% performance."
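
One way to picture the five parameters together is as a small weighted configuration that the gating model can normalize and compare. The dataclass below is a hypothetical sketch of that idea, not the actual Humiris API.

```python
# Hypothetical representation of a mix instruction as parameter weights.
# The class, field names, and normalization are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class MixInstruction:
    cost: float = 0.0         # weight on cost optimization
    performance: float = 0.0  # weight on response quality
    speed: float = 0.0        # weight on low latency
    privacy: float = 0.0      # weight on secure handling
    environment: float = 0.0  # weight on energy efficiency

    def normalized(self) -> dict:
        """Scale the weights so they sum to 1, making trade-offs comparable."""
        fields = ("cost", "performance", "speed", "privacy", "environment")
        values = {k: getattr(self, k) for k in fields}
        total = sum(values.values()) or 1.0
        return {k: v / total for k, v in values.items()}


# "Minimize cost by 50% while keeping 70% response quality."
instr = MixInstruction(cost=0.5, performance=0.7)
print(instr.normalized())
```

Normalizing makes it explicit that this instruction weights quality slightly above cost while ignoring the other three axes.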

Customizable Mix-Instructions

  • Simple Mix-Instructions: Single-parameter optimization directives that focus on one priority.
    "Minimize cost by 50%."
    "Ensure responses within 100 milliseconds."
    "Optimize for performance at 85% quality."

  • Compound Mix-Instructions: Complex directives that balance multiple parameters.
    "Optimize for 60% speed and 70% privacy."
    "Minimize cost by 50% while maintaining 80% performance."
    "Ensure 90% privacy and 70% speed, even at increased costs."
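
A compound instruction like those above has to be decomposed into per-parameter targets before the gating model can act on it. Below is a hypothetical, deliberately naive parser that pairs each known parameter with the nearest percentage in the instruction text; real instruction parsing would presumably use an LLM or a trained extractor.

```python
import re

# Hypothetical parser turning a mix-instruction string into parameter targets.
# The parameter vocabulary and nearest-percentage heuristic are assumptions.
PARAMS = ("cost", "speed", "performance", "privacy", "quality")


def parse_mix_instruction(text: str) -> dict:
    """Map each mentioned parameter to the percentage closest to its mention."""
    text = text.lower()
    # Collect (position, value) for every "N%" in the instruction.
    pcts = [(m.start(), int(m.group(1))) for m in re.finditer(r"(\d+)%", text)]
    targets = {}
    for param in PARAMS:
        i = text.find(param)
        if i >= 0 and pcts:
            nearest = min(pcts, key=lambda p: abs(p[0] - i))
            targets[param] = nearest[1] / 100.0
    return targets


print(parse_mix_instruction("Optimize for 60% speed and 70% privacy."))
print(parse_mix_instruction("Minimize cost by 50% while maintaining 80% performance."))
```

The heuristic handles both phrase orders in the examples ("60% speed" and "cost by 50%"), which is exactly the ambiguity a compound directive introduces.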

Examples of Mix-Tuning in Action

Scenario 1: Speed-Centric Query
Mix Instruction: "Maximize speed at 80%, allow up to 20% quality reduction."
Gating System Action:
Selects fast models like Llama 3.1 8B.
Avoids slower, high-quality models like Claude 3.5 Sonnet.

Scenario 2: Privacy-First Query
Mix Instruction: "Ensure 100% privacy with 60% performance."
Gating System Action:
Routes queries to secure, open-source models like Gemma 2B on private infrastructure.
Excludes external APIs or commercial closed models.

Scenario 3: Balanced Optimization
Mix Instruction: "Reduce costs by 40%, improve speed by 60%, and maintain 70% quality."
Gating System Action:
Combines a lightweight proposer model (e.g., Llama 3.1 8B) with a high-quality aggregator (e.g., Claude 3.5 Sonnet).
Dynamically adjusts resource allocation to achieve the balance.
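
The routing decisions in these scenarios can be sketched as a weighted score over candidate models. The scores below (0 to 1, higher is better) are invented for illustration and are not benchmarks of the named models; the selection logic is a toy stand-in for the gating system.

```python
# Toy model selection by weighted criteria. All scores are made-up
# illustration values, not real measurements of these models.
CANDIDATES = {
    "llama-3.1-8b":      {"cost": 0.90, "speed": 0.90, "quality": 0.60},
    "claude-3.5-sonnet": {"cost": 0.30, "speed": 0.40, "quality": 0.95},
    "gemma-2b":          {"cost": 0.95, "speed": 0.90, "quality": 0.40},
}


def pick_model(weights: dict) -> str:
    """Return the candidate whose criteria scores best match the weights."""
    def score(scores: dict) -> float:
        return sum(weights.get(k, 0.0) * v for k, v in scores.items())
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name]))


# Scenario 1's speed-centric mix: a fast, lightweight model is chosen.
print(pick_model({"speed": 0.8, "quality": 0.2}))
# A quality-first mix: the high-quality aggregator is chosen instead.
print(pick_model({"quality": 1.0}))
```

Changing only the weight vector flips the routing decision, which is the core mechanism mix instructions expose.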

Real-World Applications

  • Cost-Effective AI for Enterprises
    A customer support platform uses MoAI Basic to handle common queries with lightweight models, reducing operational costs while reserving powerful models for complex issues.

  • Real-Time Decision-Making
    In financial trading, MoAI Basic leverages fast models for instant responses, ensuring latency doesn’t impact profitability.

  • Privacy-First Healthcare Solutions
    A telemedicine provider routes patient data exclusively to secure, open-source models, ensuring compliance with strict privacy regulations.

  • Green AI Initiatives
    MoAI Basic powers applications that minimize energy usage, contributing to corporate sustainability goals.

Looking Ahead: MoAI Advanced

For organizations with even more demanding needs, MoAI Advanced takes the concept further. It enables collaborative interactions between multiple LLMs for highly nuanced outputs. With features like parallel processing, sequential thought chains, and iterative refinement, MoAI Advanced opens new horizons in AI capabilities.

Join the Revolution

With MoAI Basic, Humiris is democratizing access to customizable, efficient, and sustainable AI. Whether you’re a startup looking to optimize costs or an enterprise aiming for cutting-edge performance, MoAI Basic is your gateway to the next generation of AI solutions.

Learn more about how you can harness the power of MoAI Basic and redefine what’s possible with AI at humiris.ai.
