Using Cloudflare Tunnel to Expose Ollama on the Internet

Daily short news for you
  • Privacy Guides is a non-profit project that educates users about their privacy rights while recommending best practices and tools to help reclaim privacy online.

    There are many great articles there. Let me take as an example three concepts that are often confused or misrepresented: Privacy, Security, and Anonymity. Many people who oppose privacy argue that a person does not need privacy if they have 'nothing to hide.' 'This is a dangerous misconception, as it creates the impression that those who demand privacy must be deviant, criminal, or wrongdoers.' - Why Privacy Matters.

    » Read more
  • If you're looking for a wonderful place to learn - or if you're stuck thinking there's nothing left to learn - the comments over at Hacker News are just for you.

    Y Combinator, the company behind Hacker News, focuses on venture capital investment in Silicon Valley startups, so it's no surprise that many brilliant minds comment there. Their casual discussions hand us keywords that can open up plenty of new insights.

    Don't believe it? Just scroll a bit, click on a post that matches your interests, check out the comments, and don’t forget to grab a cup of coffee next to you ☕️

    » Read more
  • Just got played by my buddy Turso. The server suddenly crashed, and checking the logs revealed a lot of errors:

    Operation was blocked LibsqlError: PROXY_ERROR: error executing a request on the primary

    Suspicious, I went to the Turso admin panel and saw the statistics showing that I had executed over 500 million write commands!? At that moment, I was like, "What the heck? Am I being DDoSed? But there's no way I could have written 500 million."

    Turso gives free users monthly limits of 1 billion read requests and 25 million write requests, yet supposedly I had written over 500 million. Doesn't that sound absurd to you? 😆 Meanwhile the server was down - should I really spend money to bring it back online? A rough calculation put 500M writes at about $500.

    After that, I went to their Discord channel to ask for help. Someone jumped in to assist very quickly, and just a few minutes later they told me the error was on their side and restored the service. Truly, in the midst of misfortune there is fortune; what I love most about this service is quick support like that 🙏

    » Read more

Problem

Hello readers of 2coffee.dev. Tet is just around the corner; have you prepared anything for yourself and your family yet? It seems to me that as the year ends, everyone gets busier. Since the beginning of the month, traffic to the blog has dropped significantly. Sometimes it makes me anxious, because I don't know where my readers have gone. Maybe they are taking an early Tet break, maybe chatbots have gotten too good, or maybe the content just isn't engaging enough anymore. 😥

I must admit that these last few weeks I have been in the mindset of a busy person, without much time to write regularly. Perhaps it's the nature of the job, combined with many issues to handle, that has left me no mental space to relax. But it's okay; today I successfully configured Cloudflare Tunnel in conjunction with Ollama to expose an API endpoint on the Internet - something I couldn't do a few weeks ago. I figured many people would need this, so I decided to write an article about it right away.

At first, I intended to write a short post in the Threads section, but then I realized it had been too long since I wrote a full-length article, so I changed my mind. Can you believe it? A long article can be condensed into just a few short lines, and conversely, a short article can easily be padded with "flowery" prose into a lengthy piece that many would dread. So why strive to write longer?

Wow! If I didn't say it, perhaps no one would know the reason. Writing is a way for me to relieve stress. By writing, I can connect with my readers, share, chat, and weave in the stories and lessons I have learned. In other words, writing serves both as a form of relaxation and a means of interacting with everyone.

Since launching the short-article section Threads, I never expected so many people to be interested in it. Well, saying I didn't expect it would be an exaggeration, because I did a lot of research before implementing the feature. "Coding" a feature isn't hard; the challenge lies in operating it. Threads must keep an uninterrupted posting frequency: if I posted only once in a while, would anyone even come back to check for updates? This quietly creates pressure to gather and summarize interesting, notable news for readers. Some days I got too busy and forgot to write, and sure enough, the next day I had to publish a make-up post to keep my credibility intact. 😆

I know that many people enjoy reading, and I am one of those who loves writing. Reading isn't always done in the mindset of being "chased by a deadline," hunting for a solution, or learning something new... I believe that for many people, reading is like writing: it's for relaxation. Relaxing while gaining knowledge and experience is a two-for-one deal, isn't it? 😁

I've talked too much already; let's get to the main point. Today I successfully configured Cloudflare Tunnel along with Ollama to expose an API endpoint on the Internet. From there, anyone can access it, no longer confined to the local machine (localhost). After reviewing Ollama's documentation, it turned out to be simpler than I thought!

Cloudflare Tunnel & Ollama

If you don't know about Cloudflare Tunnel, refer back to the article Adding a "Tunnel Locally" Tool - Bringing Local Servers to the Internet. This is a tool that maps local servers to the Internet, effectively turning your computer into a server that anyone can reach through an IP address or a domain name.

Ollama is a tool that lets us run large language models (LLMs) on our own computers with just a single command. It simplifies the installation and use of models. Its standout feature is an API compatible with OpenAI's.
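
As a quick illustration, here is what that can look like locally. Note that llama3.2 is just an example model, and the /v1/chat/completions route assumes a reasonably recent Ollama version with the OpenAI-compatible endpoints:

$ ollama run llama3.2

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "Hello!" }]
}'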

In a previous article, I mentioned creating a Tunnel through a six-step process - a bit lengthy, right? In fact, Cloudflare Tunnel has a much quicker startup process, requiring only the installation of cloudflared and then using a single command:

$ cloudflared tunnel --url http://localhost:11434  
...  
Your quick Tunnel has been created! Visit it at (it may take some time to be reachable):  
https://tennis-coordination-korea-wv.trycloudflare.com  
....  

Immediately, you will see a random address that cloudflared has generated. It maps to the address http://localhost:11434 on your computer. When accessed from another machine at https://tennis-coordination-korea-wv.trycloudflare.com, we see the same result as accessing http://localhost:11434 on the local machine.

The above is just an example of mapping an arbitrary port on your machine to the Internet; for Ollama, and many other tools, the Host header also needs to be configured. The Ollama documentation instructs:

$ cloudflared tunnel --url http://localhost:11434 --http-host-header="localhost:11434"  
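
Once the tunnel is up with the rewritten Host header, a quick sanity check from any machine should work. Ollama's root endpoint normally answers with a short status line (the trycloudflare URL here is the example one from above; yours will differ):

curl https://tennis-coordination-korea-wv.trycloudflare.com
Ollama is running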

After that, try calling the API using the new URL. Note that you must run the llama3.2 model from Ollama beforehand.
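
If the model isn't on your machine yet, pulling it first is enough:

$ ollama pull llama3.2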

curl https://tennis-coordination-korea-wv.trycloudflare.com/api/generate -d '{  
  "model": "llama3.2",  
  "prompt": "Why is the sky blue?"  
}'  
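
By default, /api/generate streams its answer as newline-delimited JSON, so the output looks roughly like this (abbreviated, with illustrative values):

{"model":"llama3.2","created_at":"...","response":"The","done":false}
{"model":"llama3.2","created_at":"...","response":" sky","done":false}
...
{"model":"llama3.2","created_at":"...","response":"","done":true}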

Wonderful! At this point, everything is done, and you have an API endpoint pointing to Ollama on the local server that anyone can access. However, if you have a domain in Cloudflare and want to maintain a fixed address like api-ollama.2coffee.dev, you need to configure it according to the six steps.

Keeping a Fixed Domain

It's very simple; after completing step 4 in the article Adding a "Tunnel Locally" Tool - Bringing Local Servers to the Internet, modify the contents of the config.yml file as follows:

tunnel: <tunnel-uuid>  
credentials-file: path/to/.cloudflared/<tunnel-uuid>.json  

ingress:  
  - hostname: api-ollama.2coffee.dev  
    service: http://localhost:11434  
    originRequest:  
      httpHostHeader: "localhost:11434"  
  - service: http_status:404  

Then run:

$ cloudflared tunnel run <tunnel-uuid>  
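
If the DNS record for the hostname doesn't exist yet, cloudflared can create it for you, assuming the domain's zone is already managed in your Cloudflare account:

$ cloudflared tunnel route dns <tunnel-uuid> api-ollama.2coffee.dev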

Although this method can give you an API address of your own, similar to OpenAI's ChatGPT, it has clear limitations: everything depends on your machine's hardware and the model you run. By default, Ollama handles only one query at a time, so continuous or simultaneous requests will not be served efficiently.
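
If your hardware has headroom, one knob worth knowing is the OLLAMA_NUM_PARALLEL environment variable, which recent Ollama versions use to serve several requests to a model in parallel. A minimal sketch - check your version's documentation before relying on it:

$ OLLAMA_NUM_PARALLEL=4 ollama serve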



Comments (1)

Anonymous, 2 weeks ago:
Nice one, this is exactly what I needed, many thanks!

Xuân Hoài Tống, 2 weeks ago:
Glad it helped you 🙏