Full-text search with Redisearch

Problem

Searching data quickly and accurately is a perennial problem for developers. Depending on the purpose, the nature of the problem, and the available resources, we can choose different tools and methods.

For example, with a small dataset you can use the LIKE operator in SQL. As the data grows, LIKE stops being a good option, and you can switch to the full-text search module built into your database system. These modules are often only a stopgap, however, because they may lack features offered by dedicated full-text search tools.

Elasticsearch and Apache Solr are two very powerful search engines that are widely used by the community. However, their hardware requirements are far from modest, which puts them out of reach for projects with limited budgets.

Redisearch, built on Redis, provides a powerful search engine with low resource consumption that is easy to integrate into your projects. For those who don't know, Redis is a key-value database that keeps data in random-access memory (RAM) for fast access and is often used as a cache. Although it is small, Redisearch holds its own against those larger tools. In this article, let's see what Redisearch can do.

Creating an index

The first thing to do is to create an index for searching. The index serves as a way to declare to the search engine how your data should be processed for optimal performance.

Creating an index is very simple in Redisearch. For example, I create an index for searching articles with three fields: title, content, and created_at, corresponding to the title, content, and creation date of the article.

FT.CREATE article ON HASH PREFIX 1 article: SCHEMA title TEXT WEIGHT 5.0 content TEXT created_at NUMERIC SORTABLE

My index is named "article". In the "title" field, I set WEIGHT = 5 to prioritize search results in the "title" field over the "content" field. "created_at" is declared as SORTABLE to enable sorting of search results. If SORTABLE is not declared, you won't be able to sort the search results.
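As a rough sketch of how the same index definition could be assembled from application code, the Python helper below builds the FT.CREATE arguments as a list. The function name build_create_index_args and the tuple layout for fields are my own illustration, not part of Redisearch. The resulting list is meant to be handed to a Redis client that can send raw commands (for example execute_command(*args) in redis-py, assuming that is the client you use).

```python
def build_create_index_args(index, prefix, fields):
    """Return the FT.CREATE command as a list of arguments.

    `fields` is a list of (name, type, options) tuples. This layout is a
    hypothetical convention for this sketch, not a Redisearch API.
    """
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for name, field_type, options in fields:
        args += [name, field_type, *options]
    return args


args = build_create_index_args(
    "article",
    "article:",
    [
        ("title", "TEXT", ["WEIGHT", "5.0"]),     # matches in title rank higher
        ("content", "TEXT", []),
        ("created_at", "NUMERIC", ["SORTABLE"]),  # declared sortable for SORTBY
    ],
)
print(" ".join(args))
```

Joined with spaces, the list reproduces the command shown above, which makes it easy to compare what the application sends with what you would type in redis-cli.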

Alright, after creating the index, let's learn how to search the data.

Search principles

Before getting into the search syntax, you need to know some search principles in Redisearch:

When you search for several words, for example hello world, you are looking for documents that contain both "hello" AND "world".
To search for the exact phrase, put it in double quotes, for example "hello world".
To search for documents that contain either "hello" OR "world", separate the words with the | character, for example hello|world.
To express NOT, use the - character. For example, hello -world finds items that contain "hello" but not "world". You can also negate several words at once by combining NOT with OR: -@title:(hello|world) finds items whose title contains neither "hello" nor "world".
By default, if no field is specified, Redisearch searches all TEXT fields of the index. To restrict the search to one field, use the syntax @field:query, for example @title:hello. The field modifier applies to the expression directly after it, so wrap multiple words in parentheses: @title:(hello world).
A NUMERIC field is searched with a range in the form @field:[min max], for example @created_at:[1630245601 1630245602].
A TAG field is searched with the form @field:{tag1 | tag2 | ...}.
Fuzzy matching finds terms that are close to the one you typed (within a small edit distance), which helps tolerate typos. The syntax wraps the term in % characters, for example %hello%. This is different from prefix matching, written hello*, which matches words that start with the given prefix.
...

There are a few more principles that you can refer to at Search Query Syntax.
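To make these rules easier to reuse, here is a minimal Python sketch that composes query strings from small helper functions. All function names are hypothetical and exist only for this example. The "category" TAG field is also an assumption, since the "article" index in this post has no TAG field. The fuzzy helper wraps the term in % on both sides, which is how I understand the query syntax reference.

```python
def phrase(text):
    """Exact phrase: wrap the text in double quotes."""
    return f'"{text}"'


def all_of(*terms):
    """AND: terms separated by spaces, grouped in parentheses."""
    return "(" + " ".join(terms) + ")"


def any_of(*terms):
    """OR: terms separated by |, grouped in parentheses."""
    return "(" + "|".join(terms) + ")"


def not_(expr):
    """NOT: prefix the expression with -."""
    return "-" + expr


def in_field(field, expr):
    """Restrict an expression to a single field: @field:expr."""
    return f"@{field}:{expr}"


def numeric_range(field, low, high):
    """Range filter on a NUMERIC field: @field:[min max]."""
    return f"@{field}:[{low} {high}]"


def tags(field, *values):
    """Filter on a TAG field: @field:{tag1 | tag2}."""
    return f"@{field}:{{" + " | ".join(values) + "}"


def fuzzy(term):
    """Fuzzy match: wrap the term in % characters."""
    return f"%{term}%"


print(in_field("title", all_of("hello", "world")))
print(not_(in_field("title", any_of("hello", "world"))))
print(numeric_range("created_at", 1630245601, 1630245602))
print(tags("category", "news", "tech"))  # "category" is a hypothetical TAG field
print(fuzzy("helo"))
```

Grouping every AND and OR in parentheses is a deliberate choice: it avoids relying on operator precedence, which is easy to get wrong when field modifiers and negation are mixed.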

And finally, a handy cheatsheet that compares some common search conditions in SQL with their Redisearch equivalents:

Search syntax

First, let's add some data to Redis for the "article" index we created above. To keep the results easy to read, I will add only a few small records.

HSET article:1 url "url-1" title "article number one" content "content of article number one" created_at 1630245601
HSET article:2 url "url-2" title "article number two" content "content of article number two" created_at 1630245602
HSET article:3 url "url-3" title "article number three" content "content of article number three" created_at 1630245603

Search for all records containing the term "article":

FT.SEARCH article "article"

Search for all records where the content contains the term "content":

FT.SEARCH article "@content:content"

Search for all records where the title contains the term "article" and the content does not contain "number one". The parentheses keep both words inside the field modifier and the negation; without them, only the first word would be tied to the field:

FT.SEARCH article "@title:article -@content:(number one)"

Search for all records where the title contains "number one" or "number two" and the content does not contain "number three", sorted in descending order of "created_at". Each alternative is grouped explicitly so the query does not depend on operator precedence:

FT.SEARCH article "@title:((number one) | (number two)) -@content:(number three)" SORTBY created_at DESC
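When these commands are sent from application code, the raw reply usually needs to be reshaped. The sketch below assumes the classic flat reply layout (total count first, then each key followed by a flat list of field/value pairs). The function name parse_search_reply is hypothetical, and the sample reply is hand-written in that assumed shape, trimmed to two fields for brevity, not captured from a real server. Newer protocol versions or client libraries may return a different structure.

```python
def parse_search_reply(reply):
    """Convert a flat FT.SEARCH reply into (total, list of documents).

    Assumed shape: [total, key1, [field, value, ...], key2, [...], ...].
    """
    total = reply[0]
    docs = []
    for i in range(1, len(reply), 2):
        key, flat = reply[i], reply[i + 1]
        # Pair up field names (even positions) with values (odd positions).
        fields = dict(zip(flat[0::2], flat[1::2]))
        docs.append({"id": key, **fields})
    return total, docs


# Hand-written sample in the assumed shape, for illustration only.
sample_reply = [
    2,
    "article:2", ["title", "article number two", "created_at", "1630245602"],
    "article:1", ["title", "article number one", "created_at", "1630245601"],
]

total, docs = parse_search_reply(sample_reply)
for doc in docs:
    print(doc["id"], doc["title"], doc["created_at"])
```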

Stop Words

Stop words are terms that Redisearch will ignore in the search as they are too common and do not provide value in the search. For example, a, is, the... If these words are indexed, they take up a lot of storage space and consume CPU resources during search.

Out of the box, Redisearch only ships with a default list of English stop words. You can, however, supply your own list, for example Vietnamese stop words, or any other words that you don't want to be searchable.

Stop words are declared when creating the index. In the example below, I'm adding 2 words "thì" and "là" to the stop words of the "article" index:

FT.CREATE article ON HASH PREFIX 1 article: STOPWORDS 2 thì là SCHEMA title TEXT WEIGHT 5.0 content TEXT created_at NUMERIC SORTABLE

Note: Because stop words are set when the index is created, an existing index must be dropped and re-created to change them. Use the FT.DROPINDEX command to delete the index. By default this removes only the index and leaves the indexed data (the hashes) in place, so you can then re-create the index as usual.

If you no longer want to use stop words, set STOPWORDS 0 in the index creation command.
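Since the number after STOPWORDS must match the number of words that follow, it can be convenient to derive it from a list. The Python sketch below builds the drop and re-create commands described above. The helper names and the SCHEMA constant are my own, and the STOPWORDS clause is placed after PREFIX, following the order of the FT.CREATE synopsis as I understand it.

```python
SCHEMA = [
    "title", "TEXT", "WEIGHT", "5.0",
    "content", "TEXT",
    "created_at", "NUMERIC", "SORTABLE",
]


def stopwords_clause(words):
    """Build the STOPWORDS clause; an empty list yields STOPWORDS 0."""
    words = list(words)
    return ["STOPWORDS", str(len(words)), *words]


def recreate_index_commands(index, prefix, stopwords):
    """Return the two commands needed to change stop words on an index."""
    drop = ["FT.DROPINDEX", index]  # without DD, the hashes are kept
    create = [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
        *stopwords_clause(stopwords),
        "SCHEMA", *SCHEMA,
    ]
    return [drop, create]


for command in recreate_index_commands("article", "article:", ["thì", "là"]):
    print(" ".join(command))
```

Passing an empty list produces STOPWORDS 0, which corresponds to disabling stop words.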

Tokenization and Escaping

Tokenization is how Redisearch splits text into individual terms (tokens), both when indexing documents and when parsing queries. Escaping lets you keep characters that would otherwise be treated as separators. Here are some tokenization rules in Redisearch:

The characters ,.<>{}[]"':;!@#$%^&*()-+=~ and whitespace split the text into tokens for indexing. For example, the text "hello-world...1" is tokenized as [hello world 1].
To bypass this rule, that is, to make Redisearch index a special character or a space as part of a token, add a backslash (\) before each such character. For example, to index "hello-world" as a single token, store the text as hello\-world, and search for hello\-world as well.
The underscore (_) character is not a separator, so it is kept as part of a token.
Repeated whitespace and repeated separator characters from the first rule are dropped. To keep them, add a backslash before each one.
Latin characters (A-Z a-z) are converted to lowercase.

These rules apply to TEXT fields. TAG fields behave somewhat differently, which I will discuss in a future article.
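The rules above can be approximated in a few lines of Python. This is only a simulation of the listed rules to help build intuition, not Redisearch's actual tokenizer, and the names SEPARATORS, tokenize, and escape are hypothetical. The escape helper shows one way to prepare text such as "hello-world" before storing or searching it.

```python
# Separator characters from the first rule, plus common whitespace.
SEPARATORS = set(",.<>{}[]\"':;!@#$%^&*()-+=~ \t\n")


def tokenize(text):
    """Split text on separators, honoring backslash escapes, then lowercase."""
    tokens, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # The character after a backslash is kept literally.
            current.append(next(chars, ""))
        elif ch in SEPARATORS:
            # A run of separators ends the current token only once.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    # lower() also affects non-Latin letters; close enough for a sketch.
    return [token.lower() for token in tokens]


def escape(text):
    """Prefix every separator character with a backslash."""
    return "".join("\\" + ch if ch in SEPARATORS else ch for ch in text)


print(tokenize("hello-world...1"))
print(escape("hello-world"))
print(tokenize(escape("hello-world")))
```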

Highlighting Result

The Highlighting API lets you mark up the parts of the result that matched the query, for example by inserting extra characters around them.

To wrap each matched term in an HTML tag, with an opening tag before it and a closing tag after it, use the HIGHLIGHT option:

FT.SEARCH article "article" HIGHLIGHT TAGS <b> </b>

Matched terms in every field are wrapped in <b> and </b>. To highlight only specific fields, add the FIELDS option:

FT.SEARCH article "article" HIGHLIGHT FIELDS 1 title TAGS <b> </b>

In addition, Redisearch can return only the context around the matched terms instead of the whole field. For example, if a long content field contains the sentence "estacks is a programming blog", a search for "blog" returns a fragment such as "...estacks is a programming blog...".

FT.SEARCH article "article" SUMMARIZE FIELDS 1 content

You can also combine both HIGHLIGHT and SUMMARIZE in one query.
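As a minimal sketch of such a combined query, the Python helper below assembles the FT.SEARCH arguments with both options. The function name build_search_args and its parameters are hypothetical. SUMMARIZE is placed before HIGHLIGHT, following the order of the FT.SEARCH synopsis as I understand it.

```python
def build_search_args(index, query, summarize_fields=(), highlight_fields=(), tags=None):
    """Return FT.SEARCH arguments with optional SUMMARIZE and HIGHLIGHT."""
    args = ["FT.SEARCH", index, query]
    if summarize_fields:
        args += ["SUMMARIZE", "FIELDS", str(len(summarize_fields)), *summarize_fields]
    if highlight_fields or tags:
        args.append("HIGHLIGHT")
        if highlight_fields:
            args += ["FIELDS", str(len(highlight_fields)), *highlight_fields]
        if tags:
            open_tag, close_tag = tags
            args += ["TAGS", open_tag, close_tag]
    return args


args = build_search_args(
    "article",
    "article",
    summarize_fields=["content"],   # return fragments of the content field
    highlight_fields=["title"],     # wrap matches in the title field
    tags=("<b>", "</b>"),
)
print(" ".join(args))
```

Keeping the arguments as a list rather than one string avoids quoting problems with characters such as < and > when the command is passed to a client library.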

Conclusion

Through this article, I hope you understand what Redisearch is used for, whether it is suitable or necessary for your upcoming projects, and the basic commands to get started. Keep learning new tools to have more ways to solve problems!
