Using GPT-4 for Free Through Github Models (with Limitations)

Daily short news for you
  • countless.dev is an interesting website that compares the cost of using LLMs across different providers.

    Here you can see all the popular large language models from providers such as OpenAI, Azure, Mistral..., along with the price per 1M input/output tokens. You can even compare them side by side to find the cheapest provider or model for your use case.

    » Read more
  • A year or two ago, Kubernetes (k8s) suddenly came up everywhere as a phenomenon, probably because it was so powerful that everyone wanted to learn and use it. It is a tool for "Automating deployment, scaling, and management of containerized applications" - Yes! Sounds impressive, right? 🤤

    Back then, I was passionate about Docker, especially Docker Swarm, which is similar to k8s but on a smaller scale and much less complex. That suited me well: it met my needs at the time while reducing complexity and hassle.

    However, in the last month or two, articles titled "Do you really need Kubernetes?" have been popping up more frequently. Indeed, k8s is very powerful but also overly complex. Why use an ox-butchering knife to kill a chicken, unless you have anticipated that complexity before adopting the technology? On top of that, k8s consumes a tremendous amount of resources and power; getting it up and running is not just a matter of setup, it also demands a lot of knowledge 😨.

    Oh, perhaps part of it is because the "big players" are focusing on Serverless, cutting operational complexity to focus on application development instead.

    Besides that, the name WASM is also being mentioned a lot 🤔

    Do you really need Kubernetes in your company/startup? | dev.to

    » Read more
  • I used to praise Serverless endlessly, claiming it can bring the cost of maintaining a blog down to 0 VND. That is true! But Serverless also has dark sides worth paying attention to!

    The other day, I spent a whole day tracking down and fixing an issue caused by a built-in function of Cloudflare KV: the list function with a limit of 1000, which should mean a single call returns 1,000 KV keys. However, life is not as dreamy as it seems. The number 1,000 is only theoretical; sometimes it returns a few hundred, sometimes a few dozen, and at times barely a handful. This stalled the whole system. Well, not congestion exactly; rather, the system sat "idle" with nothing to do when it should have been processing hundreds of thousands of keys 🥲 (see the sketch right after this news list for the safer pattern).

    » Read more
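
For the curious, here is a minimal TypeScript sketch of that safer pattern in a Cloudflare Worker: page with the cursor and list_complete flag that KV returns, instead of trusting that each call yields the full limit. The MY_KV binding is hypothetical, and the KVNamespace type comes from @cloudflare/workers-types.

// Paging through Cloudflare KV without assuming each list() call
// returns the full `limit` of keys. `MY_KV` is a hypothetical binding.
export default {
  async fetch(_req: Request, env: { MY_KV: KVNamespace }): Promise<Response> {
    const keys: string[] = [];
    let cursor: string | undefined;
    while (true) {
      // `limit: 1000` is only an upper bound; a call may return far fewer keys.
      const page = await env.MY_KV.list({ limit: 1000, cursor });
      keys.push(...page.keys.map((k) => k.name));
      // Rely on `list_complete` and `cursor`, never on `page.keys.length === 1000`.
      if (page.list_complete) break;
      cursor = page.cursor;
    }
    return new Response(`Listed ${keys.length} keys`);
  },
};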

The Issue

As a social media user, it is not hard to come across posts asking whether anyone wants to share a ChatGPT Plus account. By subscribing to Plus, you can use the latest and most powerful models, such as gpt-4, gpt-4o, or o1-preview... This shows how high the demand for these models is. Many people use them for research and study, or simply to try something new and see what stands out compared to what they already know.

At $20 a month, I am sure many people hesitate. We all have too many bills to pay each month and more important things to spend on. So if someone is willing to "chip in" and share the cost, all the better: with five people sharing, each only pays $4. However, this creates new problems of its own. ChatGPT currently limits the number of queries within a certain time frame, so if someone "accidentally" uses up all the queries, the next person has to wait until the limit resets. Not to mention the financial risk of sharing with strangers. It is best to share only with people you know.

Previously, I wrote an article about how to use GPT-4 for free through Github Copilot. In the end, though, that was just an unofficial "trick," and I warned readers that their Copilot accounts risked being "flagged." As of now, it is uncertain whether that method still works, but readers should no longer follow it. According to my statistics, that article still draws many readers. Wow! The allure of GPT-4 is remarkable!

Now that Github has just released Github Models in public preview, anyone with a Github account can access its models, including many GPT models and other well-known ones like Llama, Mistral... In this article, I will show you how to use GPT-4 for free in a way that is as close to ChatGPT as possible. Of course, it is not as good as the official version and comes with many limitations, but it is still worth experiencing.

Github Models

Github Models is a feature through which Github lets developers use large language models (LLMs) for free for testing purposes - as they put it. After the testing phase, users are expected to deploy their applications with other providers, because the trial models come with many limitations on speed and context. For purely personal use, however, it is perfectly acceptable.

We all know ChatGPT as the website where everyone - especially free users - can chat with gpt-4o, 4o-mini... After a few exchanges, you run out of queries and are "downgraded" to the gpt-3.5 model. ChatGPT saves the conversation history, letting users review past conversations. It is also the easiest way for users to interact with GPT models.

If you are not a developer - or are one but have not noticed - OpenAI, the company behind ChatGPT, provides an API for interacting with its models. Simply put, instead of using the web chat, you can call the API to do exactly the same thing. For example, a conversation through the API looks like this:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      },
      {
        "role": "assistant",
        "content": "Hi there! How can I assist you today?"
      },
      {
        "role": "user",
        "content": "What is the capital of Vietnam?"
      }
    ]
  }'

You will see that a single API call includes the entire content of the conversation. This means that to continue a conversation, you need to send back everything previously exchanged between you and GPT. That is what the context of the conversation is: the API is stateless, so the only way to help the model understand what has been discussed is to resend all the previous content.
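
To make this concrete, here is a minimal TypeScript sketch of what a chat client does behind the scenes, assuming the same endpoint and response shape as the curl example above (error handling omitted, OPENAI_API_KEY read from the environment):

type Message = { role: "system" | "user" | "assistant"; content: string };

// The whole history lives on the client; the API itself stores nothing.
const history: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
];

async function ask(question: string): Promise<string> {
  history.push({ role: "user", content: question });
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    // Every request carries the entire conversation so far.
    body: JSON.stringify({ model: "gpt-4o", messages: history }),
  });
  const data = await res.json();
  const answer: string = data.choices[0].message.content;
  // Append the reply so the next question carries full context.
  history.push({ role: "assistant", content: answer });
  return answer;
}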

The special thing is that by calling the API, you can use the latest models without subscribing to the $20 Plus package. However, API calls cost money, and the billing works very differently from the web version. You pay more the more often you call, and longer conversations cost more as well, because the API charges by input/output tokens. Not to mention that input/output prices differ depending on the model you use.

"Wow, this is really confusing. Just calculating the costs is already so complicated that it might be easier to just subscribe to Plus 😆." Surely many people will think like that after reading this, but it's true! You spend money to buy satisfaction; that is the pinnacle of sales art. If you spend a little time and get to use it for free, then it forces us to read and experiment more.

Below is the pricing table based on input/output tokens for the gpt-4o mini model.

gpt-4o mini pricing

Input tokens measure the text you send, while output tokens measure the text the model responds with. The more you send, or the more the model responds, the more you pay. Tokens are counted by a few rules; as a rough rule of thumb, 1 token is about 3/4 of an English word, so 4 tokens correspond to roughly 3 words. For exact counts, see Tokenizer | OpenAI Platform.
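
To make the math concrete, below is a rough TypeScript estimate. The per-token prices are assumptions based on OpenAI's published gpt-4o mini rates at the time of writing (about $0.15 per 1M input tokens and $0.60 per 1M output tokens); always check the current pricing page before relying on them.

// Assumed gpt-4o mini rates in USD per 1M tokens - verify against the pricing page.
const INPUT_USD_PER_1M = 0.15;
const OUTPUT_USD_PER_1M = 0.6;

// Very rough heuristic: 1 token ≈ 3/4 of an English word,
// so tokens ≈ words * 4/3. Use OpenAI's Tokenizer for exact counts.
const estimateTokens = (text: string): number =>
  Math.ceil(text.trim().split(/\s+/).length * (4 / 3));

function estimateCostUSD(input: string, output: string): number {
  const inTokens = estimateTokens(input);
  const outTokens = estimateTokens(output);
  return (
    (inTokens / 1_000_000) * INPUT_USD_PER_1M +
    (outTokens / 1_000_000) * OUTPUT_USD_PER_1M
  );
}

console.log(estimateCostUSD("What is the capital of Vietnam?", "The capital of Vietnam is Hanoi."));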

After registering a Github account, open Marketplace Models and you will see the list of models available for use. Click any model - for example, OpenAI GPT-4o mini - then click the Playground button at the top right of the screen. You will see a chat interface very similar to ChatGPT. Try starting a conversation as usual. Congratulations, you are now using gpt-4o mini for free.

gpt-4o mini Playground

However, this method is a bit inconvenient as it does not save the conversation history, and the interface is not very user-friendly. Next, I will guide you through a much more "professional" way.

Lobe Chat

Lobe Chat is an open-source project that provides an interface and features for interacting with large language models. Simply put, it gives you a chat interface similar to ChatGPT.

Lobe Chat helps us manage conversations. It also tries to integrate many other features, but they are beyond the scope of this article. Lobe Chat talks to LLMs via API, so you need to configure the connection so that calls reach the corresponding model server. Here I will show you how to configure it for Github Models.

First, you need a Github token for basic authentication. Go to Personal access tokens | Github, click "Generate new token (classic)," enter a name for the token, and choose its expiration time as you wish. You do not need to select any scopes. Click Generate token at the bottom and save the token you receive.

Github Tokens

Next, install Lobe Chat on your computer. Because Lobe Chat is a web application, you can deploy it to many cloud services like Vercel, Zeabur... with just one click. Or, if you use Docker, simply run one command:

$ docker run -d -p 3210:3210 --name lobe-chat lobehub/lobe-chat

Access http://localhost:3210 to see the interface of Lobe Chat. Next, we need to set up the API configuration for it to work.

Lobe Chat Welcome

Click the Lobe Chat avatar at the top left, select "Settings," open the "Language Model" tab, and you will see the OpenAI setup screen. You need to fill in "API Key," "API Proxy Address," and "Model List." The API Key is the Github token you just created, and the API Proxy Address at the time of writing is:

https://models.inference.ai.azure.com

In the "Model List" box, enter the names of the models you want to use. To know the exact name of the model, visit each model on Github Models, click on Playground, and switch to the "Code" tab. For example, here I enter "gpt-4o," "gpt-4o-mini." Click on the "Check" button to verify if the configuration is correct. If "Check Passed" appears, it has been successful.

Lobe Chat Settings
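
If the "Check" button complains, you can also verify the token and proxy address outside Lobe Chat. Below is a minimal TypeScript sketch (runnable with Node 18+ and its built-in fetch), assuming the OpenAI-compatible request shape that the "Code" tab on the model page shows:

// Sanity-check a Github token against the Github Models endpoint.
const GITHUB_TOKEN = process.env.GITHUB_TOKEN; // the classic token created above

async function check(): Promise<void> {
  const res = await fetch("https://models.inference.ai.azure.com/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${GITHUB_TOKEN}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Hello!" }],
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();
  console.log(data.choices[0].message.content);
}

check();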

Now let's try to create a new conversation.

Chattings

Wonderful!

Limitations

There are a few notable limitations that you need to be aware of while using it.

First are the limits on chat speed and the number of API calls per day. Github assigns each model a label that determines these limits; see the table below.

Models Limits

These labels are found in each model's details. For example, gpt-4o is labeled "High," which means you can make only 50 requests per day - equivalent to 50 questions - and you are limited to 10 requests per minute and at most 2 concurrent requests.

Labels Limits

Also pay attention to "Tokens per request," the number of input/output tokens allowed per request. For gpt-4o it is "8000 in, 4000 out," meaning each request can send at most 8,000 tokens and the model's response is capped at 4,000 tokens - while gpt-4o itself normally supports up to 131k input tokens and 16k output tokens. Clearly, there are a lot of restrictions. Hopefully, Github will "loosen" these limits in the future.
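
If you script against these models yourself, one workaround for the 8,000-token input cap is to trim the oldest turns before each request. A minimal sketch, reusing the rough words × 4/3 estimate from earlier (a real client should use a proper tokenizer):

type Message = { role: string; content: string };

// Drop the oldest exchanges until the estimated input fits the cap.
function trimHistory(history: Message[], maxInputTokens = 8000): Message[] {
  const tokens = (m: Message) =>
    Math.ceil(m.content.split(/\s+/).length * (4 / 3));
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + tokens(m), 0);
  // Keep the system prompt (index 0) and the latest message; drop the oldest turns in between.
  while (total > maxInputTokens && trimmed.length > 2) {
    const [removed] = trimmed.splice(1, 1);
    total -= tokens(removed);
  }
  return trimmed;
}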

You also cannot generate images or audio. And Lobe Chat stores conversations only on your own machine, so you can review a conversation only on the machine where it was created, unless you deploy it to a cloud service with server-side storage.

Although many limitations remain, for those who only want to experiment or use it occasionally, this is a "lifesaver." If you exhaust one model's limits, you can switch to another and keep going. Besides GPT, many other powerful models are waiting for you to explore.

Do you find this method interesting, or do you have any methods you want to share? Please leave your comments below the article. Thank you.
