Deploying and Exposing Self-Hosted AI Models with OpenLLM and Pinggy:
As generative AI adoption grows, developers increasingly seek ways to self-host large language models (LLMs) for enhanced control over data privacy and model customization. OpenLLM is an excellent framework for deploying models like Llama 3 and Mistral locally, but exposing them over the internet can be challenging. Enter Pinggy, a tunneling solution that allows secure remote access to self-hosted LLM APIs without complex infrastructure.
This guide walks you through the process of deploying an OpenLLM instance and sharing it with a public URL using Pinggy—making your AI services accessible in just a few minutes.
Why Self-Host LLMs?
The Rise of Local AI Deployment
Many developers prefer to host LLMs locally due to:
- Data Privacy: Avoid sending sensitive data to third-party API providers.
- Cost Efficiency: Reduce API usage costs associated with cloud-based services.
- Customization: Fine-tune and optimize models based on specific needs.
However, a major drawback of self-hosting is that the models remain confined to a local machine, limiting access for collaboration, integration, and testing. This is where Pinggy simplifies remote exposure.
Why Use Pinggy for Tunneling?
Pinggy provides a lightweight, secure, and efficient solution for exposing local services over the internet. Compared to other tunneling tools, it offers:
- Free HTTPS URLs with minimal setup
- No rate limits on free-tier usage
- Persistent URLs with the Pinggy Pro plan
- Built-in web debugger to monitor incoming requests
By integrating Pinggy, you can share your OpenLLM API remotely without complex networking configurations.
Step-by-Step Guide to Deploy and Share OpenLLM
Step 1: Install OpenLLM & Deploy a Model
Prerequisites:
- Python installed
- pip package manager
Install OpenLLM:
pip install openllm
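To verify the installation, a quick check from Python works; this uses only the standard library, so there is nothing extra to install:
from importlib.metadata import version

# Confirm the openllm package is installed and show which release you have.
print("OpenLLM version:", version("openllm"))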
Start a Model Server:
To launch an LLM, use the following command (replace llama3.2:1b-instruct-ggml-fp16-linux with your preferred model):
openllm serve llama3.2:1b-instruct-ggml-fp16-linux
Supported Models: Mistral, Falcon, Qwen, Dolly-v2, and more.
At this point, OpenLLM is running on localhost:3000 but inaccessible outside your machine. Let’s expose it using Pinggy.
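Before sharing the server, it is worth sanity-checking the local endpoint. The snippet below is a minimal sketch that assumes the /v1/models route OpenLLM serves on port 3000 (the same endpoint used later over the tunnel) and the requests package (pip install requests):
import requests

# Ask the local OpenLLM server which models it is currently serving.
resp = requests.get("http://localhost:3000/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))

If this prints your model id, the server is ready to be exposed.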
Step 2: Expose OpenLLM API via Pinggy
Create a Secure Tunnel:
Run the following command to create a secure remote tunnel:
ssh -p 443 -R0:localhost:3000 a.pinggy.io
Upon execution, Pinggy will generate a public URL that allows remote access to your model. For example:
https://xyz123.pinggy.link
Access API Endpoints:
Once exposed, use the provided URL to interact with OpenLLM:
- Check API Status: curl https://xyz123.pinggy.link/
- Access the OpenLLM WebUI: open https://xyz123.pinggy.link/chat in a browser
- List Available Models: curl https://xyz123.pinggy.link/v1/models
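Because OpenLLM exposes an OpenAI-compatible API, any OpenAI client library can talk to the tunneled endpoint. Here is a minimal sketch using the openai Python package (pip install openai) and the placeholder URL above; the model id is an assumption and should match whatever /v1/models returns:
from openai import OpenAI

# Point the client at the public Pinggy URL; OpenLLM does not enforce an
# API key by default, but the client requires a non-empty value.
client = OpenAI(base_url="https://xyz123.pinggy.link/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="llama3.2:1b-instruct-ggml-fp16-linux",  # replace with the id from /v1/models
    messages=[{"role": "user", "content": "Explain what OpenLLM does in one sentence."}],
)
print(completion.choices[0].message.content)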
Advanced Configuration and Security
Secure Your API with Authentication:
To restrict access, append a username and password to your SSH command:
ssh -p 443 -R0:localhost:3000 -t a.pinggy.io b:username:password
This adds an authentication layer, ensuring only authorized users can access the endpoint.
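Clients then need to send those credentials with each request. Below is a minimal sketch using the requests package, assuming the placeholder URL and the username/password from the command above:
import requests
from requests.auth import HTTPBasicAuth

# Pinggy enforces the b:username:password pair as HTTP Basic Auth on the
# public URL, so every request must carry these credentials.
resp = requests.get(
    "https://xyz123.pinggy.link/v1/models",
    auth=HTTPBasicAuth("username", "password"),
    timeout=10,
)
print(resp.status_code)
print(resp.json())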
Set Up a Custom Domain:
With Pinggy Pro, you can configure a custom domain for your LLM service, improving branding and ease of access.
Real-World Use Cases
Collaborative AI Development
- Teams can share an OpenLLM instance for testing and model fine-tuning.
- Remote developers can integrate AI models into applications without local installations.
AI-Powered Customer Support & Content Generation
- Expose OpenLLM’s API to build chatbots for businesses.
- Use LLMs for automated content creation in marketing and social media.
Academic & Research Workflows
- Researchers can collaborate on AI models without exposing internal infrastructure.
- OpenLLM can be used for real-time experiments and AI benchmarking.
Troubleshooting & Optimization
Model Loading Issues?
Ensure your machine meets the hardware requirements (RAM/GPU availability).
Try using a lower-precision model:
openllm run llama3.2:1b-instruct-ggml-fp16-linux --quantize int4
Connection Timeouts?
For unstable networks, wrap the tunnel command in a simple retry loop so it reconnects automatically:
while true; do
ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:3000 a.pinggy.io;
sleep 10;
done
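If you would rather supervise the tunnel from Python alongside the rest of your tooling, a rough equivalent of the shell loop above looks like this (same host, port, and retry delay; everything else is just a sketch):
import subprocess
import time

# Re-establish the Pinggy tunnel whenever the ssh process exits,
# mirroring the shell loop above.
TUNNEL_CMD = [
    "ssh", "-p", "443",
    "-o", "StrictHostKeyChecking=no",
    "-R", "0:localhost:3000",
    "a.pinggy.io",
]

while True:
    subprocess.run(TUNNEL_CMD)  # blocks until the tunnel drops
    time.sleep(10)              # short pause before reconnecting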
Conclusion:
Combining OpenLLM for model deployment with Pinggy for secure remote access creates a straightforward and effective solution for AI developers. It enables full control over models, remote access without infrastructure complexity, and enhanced security with authentication and custom domains.