Lightning Developer

How to Easily Share OpenLLM API Online

Deploying and Exposing Self-Hosted AI Models with OpenLLM and Pinggy

As generative AI adoption grows, developers increasingly seek ways to self-host large language models (LLMs) for enhanced control over data privacy and model customization. OpenLLM is an excellent framework for deploying models like Llama 3 and Mistral locally, but exposing them over the internet can be challenging. Enter Pinggy, a tunneling solution that allows secure remote access to self-hosted LLM APIs without complex infrastructure.

This guide walks you through the process of deploying an OpenLLM instance and sharing it with a public URL using Pinggy—making your AI services accessible in just a few minutes.

Why Self-Host LLMs?

The Rise of Local AI Deployment

Many developers prefer to host LLMs locally due to:

  • Data Privacy: Avoid sending sensitive data to third-party API providers.
  • Cost Efficiency: Reduce API usage costs associated with cloud-based services.
  • Customization: Fine-tune and optimize models based on specific needs.

However, a major drawback of self-hosting is that the models remain confined to a local machine, limiting access for collaboration, integration, and testing. This is where Pinggy simplifies remote exposure.

Why Use Pinggy for Tunneling?

Pinggy provides a lightweight, secure, and efficient solution for exposing local services over the internet. Compared to other tunneling tools, it offers:

  • Free HTTPS URLs with minimal setup
  • No rate limits on free-tier usage
  • Persistent URLs with the Pinggy Pro plan
  • Built-in web debugger to monitor incoming requests

By integrating Pinggy, you can share your OpenLLM API remotely without complex networking configurations.

Step-by-Step Guide to Deploy and Share OpenLLM

Step 1: Install OpenLLM & Deploy a Model

Prerequisites:

  • Python installed
  • pip package manager

Install OpenLLM:

pip install openllm

Start a Model Server:

To launch an LLM, use the following command (replace llama3.2:1b-instruct-ggml-fp16-linux with your preferred model):

openllm serve llama3.2:1b-instruct-ggml-fp16-linux

Supported Models: Mistral, Falcon, Qwen, Dolly-v2, and more.

At this point, OpenLLM is running on localhost:3000 but inaccessible outside your machine. Let’s expose it using Pinggy.
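
Before exposing it, you can confirm the server responds locally. OpenLLM exposes an OpenAI-compatible HTTP API, so a quick Python check (a minimal sketch, assuming the default port 3000) verifies the model is loaded:

# Quick local check: list the models the OpenLLM server has loaded.
# Assumes the default port 3000; adjust if the server was started elsewhere.
import requests

resp = requests.get("http://localhost:3000/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))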

Step 2: Expose OpenLLM API via Pinggy

Create a Secure Tunnel:

Run the following command to create a secure remote tunnel:

ssh -p 443 -R0:localhost:3000 a.pinggy.io

Upon execution, Pinggy will generate a public URL that allows remote access to your model. For example:

https://xyz123.pinggy.link

Access API Endpoints:

Once exposed, use the provided URL to interact with OpenLLM (a Python client sketch follows these examples):

  • Check API Status:

    curl https://xyz123.pinggy.link/
    


  • Access the OpenLLM chat WebUI (open this URL in a browser):

    https://xyz123.pinggy.link/chat


  • List Available Models:

    curl https://xyz123.pinggy.link/v1/models
    

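Beyond curl, any OpenAI-compatible client can now reach the model through the public URL. Below is a minimal sketch, assuming the placeholder URL above and the model tag served earlier; check /v1/models for the exact id your deployment reports:

# Chat with the exposed OpenLLM server via the Pinggy URL using the OpenAI client.
# The base_url and model id are placeholders from this guide; substitute your own.
from openai import OpenAI

client = OpenAI(
    base_url="https://xyz123.pinggy.link/v1",
    api_key="na",  # no key needed unless you add authentication to the tunnel
)

reply = client.chat.completions.create(
    model="llama3.2:1b-instruct-ggml-fp16-linux",
    messages=[{"role": "user", "content": "Summarize what OpenLLM does in two sentences."}],
)
print(reply.choices[0].message.content)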

Advanced Configuration and Security

Secure Your API with Authentication:

To restrict access, append a username and password to your SSH command:

ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password

This adds an authentication layer, ensuring only authorized users can access the endpoint.
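
Clients then need to send these credentials with every request. A minimal sketch, assuming the placeholder URL and the username/password set on the tunnel above:

# Call the basic-auth-protected endpoint; credentials must match b:username:password.
# The URL, username, and password are placeholders from this guide.
import requests

resp = requests.get(
    "https://xyz123.pinggy.link/v1/models",
    auth=("username", "password"),
    timeout=30,
)
resp.raise_for_status()
print(resp.json())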

With Pinggy Pro, you can configure a custom domain for your LLM service, improving branding and ease of access.

Real-World Use Cases

  1. Collaborative AI Development
     Teams can share an OpenLLM instance for testing and model fine-tuning.
     Remote developers can integrate AI models into applications without local installations.

  2. AI-Powered Customer Support & Content Generation
     Expose OpenLLM’s API to build chatbots for businesses.
     Use LLMs for automated content creation in marketing and social media.

  3. Academic & Research Workflows
     Researchers can collaborate on AI models without exposing internal infrastructure.
     OpenLLM can be used for real-time experiments and AI benchmarking.

Troubleshooting & Optimization

Model Loading Issues?

Ensure your machine meets the hardware requirements (RAM/GPU availability).

Try using a lower-precision model:

openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4

Connection Timeouts?

For unstable networks, wrap the tunnel in a simple auto-reconnect loop so it restarts whenever the connection drops:

while true; do
  ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:3000 a.pinggy.io;
  sleep 10;
done
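
To notice quickly when the tunnel goes down, a small watchdog can poll the public URL. This is an illustrative sketch; the URL is the placeholder from this guide and the 60-second interval is arbitrary:

# Watchdog: poll the public endpoint and report when the tunnel becomes unreachable.
# Replace the URL with your own Pinggy address.
import time
import requests

PUBLIC_URL = "https://xyz123.pinggy.link/v1/models"

while True:
    try:
        r = requests.get(PUBLIC_URL, timeout=10)
        print("tunnel OK" if r.ok else f"unexpected status: {r.status_code}")
    except requests.RequestException as exc:
        print(f"tunnel unreachable: {exc}")
    time.sleep(60)  # poll once a minute; adjust as needed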

Conclusion:
Combining OpenLLM for model deployment with Pinggy for secure remote access creates a straightforward and effective solution for AI developers. It enables full control over models, remote access without infrastructure complexity, and enhanced security with authentication and custom domains.
