Searching data as quickly and accurately as possible is a question developers never stop asking. Depending on the goal, the problem, and the available resources, we can choose different tools and methods.
For example, with a small dataset you can use the LIKE operator in SQL. As the data grows, however, LIKE stops being a good fit. At that point, you can switch to the full-text search module of the database system you are using. Those modules are only a stopgap, though, because they usually lack many of the powerful features that dedicated full-text search tools offer.
Elasticsearch and Apache Solr are two very powerful search engines that are widely used by the community. However, their hardware requirements are far from "modest", which makes them hard to adopt for projects with a limited budget.
Redisearch, a module built on top of Redis, provides a very powerful search engine with minimal resource consumption that you can easily integrate into your projects. For those who don't know, Redis is a key-value database that keeps its data in random-access memory (RAM) for fast access and is often used as a cache. Although it is lightweight, Redisearch is no less powerful than these heavier engines. In this article, let's see what Redisearch can do.
The first thing to do is to create an index for searching. The index serves as a way to declare to the search engine how your data should be processed for optimal performance.
Creating an index in Redisearch is very simple. For example, the command below creates an index for searching articles with three fields, title, content, and created_at, which hold the title, content, and creation date of each article:
FT.CREATE article ON HASH PREFIX 1 article: SCHEMA title TEXT WEIGHT 5.0 content TEXT created_at NUMERIC SORTABLE
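The index is named "article", and PREFIX 1 article: tells Redisearch to index every hash whose key starts with article:. The "title" field has WEIGHT 5.0 (the default weight is 1.0), so matches in "title" count more toward a result's score than matches in "content". "created_at" is declared SORTABLE so that search results can be sorted by it efficiently; the Redisearch documentation recommends declaring every field you sort on as SORTABLE, at the cost of some extra memory.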
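If you generate schemas from code, it can help to build the command as data first. The sketch below is a hypothetical helper (the names Field and build_ft_create are made up for illustration, not part of any Redis client) that assembles the same FT.CREATE arguments as the command above without contacting a server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """One SCHEMA entry (hypothetical helper type, not part of any Redis client)."""
    name: str
    kind: str                       # "TEXT", "NUMERIC", "TAG", ...
    weight: Optional[float] = None  # WEIGHT only applies to TEXT fields
    sortable: bool = False          # adds SORTABLE so the field can be sorted on cheaply


def build_ft_create(index: str, prefixes: List[str], fields: List[Field]) -> List[str]:
    """Return FT.CREATE as a list of arguments; nothing is sent to a server."""
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", str(len(prefixes)), *prefixes, "SCHEMA"]
    for f in fields:
        args += [f.name, f.kind]
        if f.weight is not None:
            args += ["WEIGHT", str(f.weight)]
        if f.sortable:
            args.append("SORTABLE")
    return args


schema = [
    Field("title", "TEXT", weight=5.0),
    Field("content", "TEXT"),
    Field("created_at", "NUMERIC", sortable=True),
]
cmd = build_ft_create("article", ["article:"], schema)
print(" ".join(cmd))
```

With the redis-py client, such a list could be sent as is with redis.Redis().execute_command(*cmd); keeping it as plain data also makes the schema easy to test without a running Redis.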
Alright, after creating the index, let's learn how to search the data.
Before getting into the search syntax, you need to know some search principles in Redisearch:
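- OR: separate alternatives with a vertical bar, for example hello|world.
- NOT: put a minus sign in front of the term to exclude, for example hello -world. You can also exclude several words at once by combining NOT with OR; for example, to find items whose title contains neither "hello" nor "world": -@title:(hello|world).
- Field search: restrict a query to one field with @field:query, for example, @title:hello world. In the default query dialect a field modifier applies only to the expression right after it, so this matches "hello" in the title and "world" in any field; write @title:(hello world) to require both words in the title.
- Numeric range: filter numeric fields with the @field:[min max] syntax.
- Tags: match TAG fields with the @field:{tag1 | tag2 | ...} syntax.
- Fuzzy matching: surround a term with percent signs, as in %text%, to also match terms within a Levenshtein distance of 1.
There are a few more rules, which you can find in the Search Query Syntax documentation.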
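These operators behave like set operations on the documents matching each term. The toy model below, over a made-up four-document corpus, is plain Python and not how Redisearch is implemented; it assumes the default query dialect, where a field modifier covers only the expression right after it.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical mini-corpus: each document has a title and a content field.
DOCS: Dict[str, Dict[str, str]] = {
    "d1": {"title": "hello world", "content": "greetings"},
    "d2": {"title": "hello there", "content": "latest news"},
    "d3": {"title": "goodbye", "content": "nothing here"},
    "d4": {"title": "hello again", "content": "around the world"},
}
ALL: Set[str] = set(DOCS)


def terms(text: str) -> Set[str]:
    """Rough tokenizer for this sketch: lowercase runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def match(term: str, field: Optional[str] = None) -> Set[str]:
    """IDs of documents containing `term` in `field`, or in any field if field is None."""
    hits = set()
    for doc_id, doc in DOCS.items():
        fields = [field] if field is not None else list(doc)
        if any(term in terms(doc[f]) for f in fields):
            hits.add(doc_id)
    return hits


# hello|world            -> union
or_result = match("hello") | match("world")
# hello -world           -> difference
not_result = match("hello") - match("world")
# -@title:(hello|world)  -> complement of a union: the title has neither word
neither_in_title = ALL - (match("hello", "title") | match("world", "title"))
# @title:hello world     -> the modifier covers only "hello"; "world" may be in any field
title_hello_any_world = match("hello", "title") & match("world")
# @title:(hello world)   -> both words must be in the title
both_in_title = match("hello", "title") & match("world", "title")

for name, ids in [("hello|world", or_result), ("hello -world", not_result),
                  ("-@title:(hello|world)", neither_in_title),
                  ("@title:hello world", title_hello_any_world),
                  ("@title:(hello world)", both_in_title)]:
    print(f"{name:24} {sorted(ids)}")
```

The last two lines show why parentheses matter: d4 has "hello" in its title but "world" only in its content, so it matches the first form but not the second.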
And finally, a handy cheatsheet comparing some common data search commands in SQL and Redisearch:
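Several SQL predicates translate mechanically into Redisearch syntax. As a sketch, the hypothetical helpers below (numeric_filter, between and starts_with are illustrative names) follow the range rules from the Redisearch query syntax documentation: bounds in [min max] are inclusive, a "(" prefix makes a bound exclusive, and -inf/+inf leave a side open; LIKE 'prefix%' becomes a prefix query.

```python
from typing import Union

Number = Union[int, float]


def numeric_filter(field: str, op: str, value: Number) -> str:
    """Translate the SQL predicate `field <op> value` into a Redisearch numeric range.

    Bounds in [min max] are inclusive; a "(" prefix makes a bound exclusive,
    and -inf / +inf leave one side open.
    """
    v = str(value)
    ranges = {
        "=": f"[{v} {v}]",
        ">=": f"[{v} +inf]",
        ">": f"[({v} +inf]",
        "<=": f"[-inf {v}]",
        "<": f"[-inf ({v}]",
    }
    if op not in ranges:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"@{field}:{ranges[op]}"


def between(field: str, low: Number, high: Number) -> str:
    """SQL `field BETWEEN low AND high`, which is inclusive on both ends."""
    return f"@{field}:[{low} {high}]"


def starts_with(field: str, prefix: str) -> str:
    """SQL `field LIKE 'prefix%'` as a Redisearch prefix query."""
    return f"@{field}:{prefix}*"


print(numeric_filter("created_at", ">", 1630245601))  # @created_at:[(1630245601 +inf]
print(between("created_at", 1630245601, 1630245602))  # @created_at:[1630245601 1630245602]
print(starts_with("title", "art"))                    # @title:art*
```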
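First, let's add some data to Redis. Because the "article" index we created above covers every hash whose key starts with article:, these hashes are indexed automatically as soon as they are written. For simplicity and easy observation, I will add just a few small records.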
HSET article:1 url "url-1" title "article number one" content "content of article number one" created_at 1630245601
HSET article:2 url "url-2" title "article number two" content "content of article number two" created_at 1630245602
HSET article:3 url "url-3" title "article number three" content "content of article number three" created_at 1630245603
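When loading data from code instead of typing it, the same HSET commands can be generated from plain dictionaries. This is a minimal sketch with a hypothetical build_hset helper; it only builds argument lists, and the key prefix article: is what makes the index pick the hashes up.

```python
from typing import Dict, List, Union

Value = Union[str, int, float]


def build_hset(key: str, fields: Dict[str, Value]) -> List[str]:
    """Flatten a dict into HSET arguments: HSET key f1 v1 f2 v2 ..."""
    args = ["HSET", key]
    for name, value in fields.items():
        args += [name, str(value)]  # Redis stores every hash value as a string
    return args


# The same sample data as above, generated instead of typed by hand.
words = ["one", "two", "three"]
articles = {
    f"article:{i}": {
        "url": f"url-{i}",
        "title": f"article number {w}",
        "content": f"content of article number {w}",
        "created_at": 1630245600 + i,
    }
    for i, w in enumerate(words, start=1)
}
commands = [build_hset(key, fields) for key, fields in articles.items()]
for c in commands:
    print(c)
```

With redis-py, each record could instead be written directly with redis.Redis().hset(key, mapping=fields).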
Search for all records containing the term "article":
FT.SEARCH article "article"
Search for all records where the content contains the term "content":
FT.SEARCH article "@content:content"
Search for all records where the title contains the term "article" and the content does not contain the phrase "number one" (quotes mark an exact phrase; without them, -@content: would apply only to the word "number"):
FT.SEARCH article '@title:article -@content:"number one"'
Search for all records where the title contains the phrase "number one" or "number two" and the content does not contain the phrase "number three", sorted by "created_at" in descending order:
FT.SEARCH article '@title:("number one" | "number two") -@content:"number three"' SORTBY created_at DESC
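To check what these searches should return on the three sample hashes, here is a small pure-Python model. It assumes the default query dialect, in which a minus sign or field modifier applies only to the next term, parenthesized group, or quoted phrase, so -@content:number one means "content lacks number" AND "one" appears somewhere. It ignores stemming, stop words and scoring, so treat it as an illustration rather than Redisearch's actual behavior.

```python
import re
from typing import Dict, List, Union

# The three sample hashes; only the TEXT fields (title, content) are searched.
DOCS: Dict[str, Dict[str, Union[str, int]]] = {
    f"article:{i}": {
        "title": f"article number {w}",
        "content": f"content of article number {w}",
        "created_at": 1630245600 + i,
    }
    for i, w in enumerate(["one", "two", "three"], start=1)
}
TEXT_FIELDS = ("title", "content")


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def has_term(doc: dict, field: str, term: str) -> bool:
    return term in tokens(doc[field])


def has_phrase(doc: dict, field: str, phrase: str) -> bool:
    """True if the words of `phrase` appear consecutively in the field."""
    words, toks = phrase.split(), tokens(doc[field])
    return any(toks[i:i + len(words)] == words for i in range(len(toks) - len(words) + 1))


def anywhere(doc: dict, term: str) -> bool:
    return any(has_term(doc, f, term) for f in TEXT_FIELDS)


# @title:article -@content:"number one"
q_phrase = [k for k, d in DOCS.items()
            if has_term(d, "title", "article") and not has_phrase(d, "content", "number one")]

# @title:article -@content:number one   (the minus covers only "number")
q_unquoted = [k for k, d in DOCS.items()
              if has_term(d, "title", "article")
              and not has_term(d, "content", "number") and anywhere(d, "one")]

# @title:("number one" | "number two") -@content:"number three" SORTBY created_at DESC
q_sorted = sorted(
    (k for k, d in DOCS.items()
     if (has_phrase(d, "title", "number one") or has_phrase(d, "title", "number two"))
     and not has_phrase(d, "content", "number three")),
    key=lambda k: DOCS[k]["created_at"],
    reverse=True,
)

print(q_phrase)    # ['article:2', 'article:3']
print(q_unquoted)  # []
print(q_sorted)    # ['article:2', 'article:1']
```

The unquoted form returns nothing because every content field contains the word "number".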
Stop words are terms that Redisearch ignores when indexing and searching because they are so common that they add little value to a search, for example a, is, the... Indexing them would take up a lot of storage space and consume CPU during search.
Out of the box, Redisearch ships only a default list of English stop words. You can replace it with your own list, for example Vietnamese stop words, or words you simply don't want to be searchable.
Stop words are declared when creating the index. In the example below, the stop word list of the "article" index is set to the two Vietnamese words "thì" and "là"; note that a custom list replaces the default English list rather than extending it:
FT.CREATE article STOPWORDS 2 thì là ON HASH PREFIX 1 article: SCHEMA title TEXT WEIGHT 5.0 content TEXT created_at NUMERIC SORTABLE
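The effect of a custom list can be sketched in a few lines. This is a toy model, not Redisearch's tokenizer: it splits on whitespace only, DEFAULT_STOPWORDS holds just a handful of entries from the default English list, and index_terms is a hypothetical name. It illustrates that an explicit list replaces the default one.

```python
from typing import Iterable, List, Optional

# A handful of entries from Redisearch's default English stop word list (not the full list).
DEFAULT_STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}


def index_terms(text: str, stopwords: Optional[Iterable[str]] = None) -> List[str]:
    """Words that would be indexed for `text` in this sketch.

    stopwords=None -> use the default list (like omitting STOPWORDS)
    any iterable   -> use exactly that list instead (like STOPWORDS n w1 ... wn);
                      an empty one (STOPWORDS 0) keeps every word.
    """
    stops = DEFAULT_STOPWORDS if stopwords is None else {w.lower() for w in stopwords}
    return [w for w in text.lower().split() if w not in stops]


english = "estacks is a blog about programming"
vietnamese = "estacks là blog về lập trình"  # the same sentence in Vietnamese

print(index_terms(english))                 # ['estacks', 'blog', 'about', 'programming']
print(index_terms(english, ["thì", "là"]))  # custom list replaces the default one
print(index_terms(vietnamese, ["thì", "là"]))
```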
Note: Since stop words can only be set when the index is created, an existing index must be dropped and re-created to change them. Use FT.DROPINDEX article to drop it; by default this deletes only the index, not the indexed hashes (add the DD option if you want to delete the documents as well). Then re-create the index as usual, and Redisearch indexes the existing hashes that match the prefix again.
If you don't want to use stop words at all, set STOPWORDS 0 in the index creation command.
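The three cases (default list, custom list, STOPWORDS 0) and the drop-then-create sequence can be captured in a small sketch. stopwords_clause and recreate_index are hypothetical helper names; they only build argument lists, and dropping without DD keeps the hashes.

```python
from typing import List, Optional, Sequence


def stopwords_clause(words: Optional[Sequence[str]]) -> List[str]:
    """STOPWORDS arguments for FT.CREATE.

    None  -> no clause at all: the default English list is used
    []    -> STOPWORDS 0: no stop words
    words -> STOPWORDS <count> <words...>: this list replaces the default
    """
    if words is None:
        return []
    return ["STOPWORDS", str(len(words)), *words]


def recreate_index(index: str, words: Optional[Sequence[str]], rest: List[str]) -> List[List[str]]:
    """Drop the index (hashes are kept without DD), then create it again."""
    return [
        ["FT.DROPINDEX", index],
        ["FT.CREATE", index, *stopwords_clause(words), *rest],
    ]


rest = "ON HASH PREFIX 1 article: SCHEMA title TEXT WEIGHT 5.0 content TEXT created_at NUMERIC SORTABLE".split()
for command in recreate_index("article", ["thì", "là"], rest):
    print(" ".join(command))
```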
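Tokenization and escaping describe how Redisearch breaks indexed text and queries into terms. Data passed to Redisearch goes through a processing step that, for example, strips whitespace and special characters. Here are some tokenization rules in Redisearch:
- Punctuation marks and whitespace split text into separate terms, so foo-bar.baz is indexed as the terms foo, bar and baz.
- A backslash in front of a separator escapes it, so it stays part of the term. For example, to index "hello-world" as a single term, I have to store it as hello\-world, and when searching, I also have to use hello\-world.
Those are some principles of the TEXT field type. TAG fields work somewhat differently, which I will discuss in a future article.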
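A simplified Python version of these rules helps when deciding how to escape user input before putting it into a query. The separator set follows the one listed in the Redisearch tokenization documentation, but tokenize and escape are hypothetical helpers and only an approximation (no stemming, no special Unicode handling).

```python
from typing import List

# Separator characters as listed in the Redisearch tokenization docs; whitespace also separates.
SEPARATORS = set(",.<>{}[]\"':;!@#$%^&*()-+=~")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase terms; a backslash keeps the next character in the term."""
    terms: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])  # escaped separator becomes part of the term
            i += 2
            continue
        if ch.isspace() or ch in SEPARATORS:
            if current:  # consecutive separators produce no empty terms
                terms.append("".join(current).lower())
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        terms.append("".join(current).lower())
    return terms


def escape(text: str) -> str:
    """Backslash-escape separator characters so user input stays a single term."""
    return "".join("\\" + ch if ch in SEPARATORS else ch for ch in text)


print(tokenize(r"hello\-world hello-world"))  # ['hello-world', 'hello', 'world']
print(tokenize("foo-bar.baz...bag"))          # ['foo', 'bar', 'baz', 'bag']
print(escape("hello-world"))                  # hello\-world
```

Keep in mind that inside most programming-language string literals the backslash itself must be escaped, so the query text hello\-world is written as "hello\\-world" in Python source.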
The Highlighting API lets us mark up the parts of the data that matched the search, for example by inserting extra characters around matched terms so they stand out.
To wrap matched terms in HTML tags, that is, put an opening and a closing tag around each of them, use the HIGHLIGHT option:
FT.SEARCH article "article" HIGHLIGHT TAGS <b> </b>
Without FIELDS, the matched terms in every returned field are wrapped in <b> and </b>. To highlight only specific fields, add the FIELDS option:
FT.SEARCH article "article" HIGHLIGHT FIELDS 1 title TAGS <b> </b>
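To picture what HIGHLIGHT does to a result, here is a toy client-side version: it wraps whole-word, case-insensitive matches in tags, either in every field or only in the listed ones, mirroring FIELDS. highlight and highlight_doc are hypothetical names; Redisearch works from its own index, so its output (for example around stemmed matches) can differ.

```python
import re
from typing import Dict, Iterable, Optional


def highlight(text: str, terms: Iterable[str], open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap whole-word, case-insensitive matches of `terms` in open/close tags."""
    words = sorted({t.lower() for t in terms}, key=len, reverse=True)  # longest first
    if not words:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


def highlight_doc(doc: Dict[str, str], terms: Iterable[str],
                  fields: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Highlight every field (fields=None) or only the listed ones, like FIELDS n ..."""
    terms = list(terms)
    chosen = None if fields is None else set(fields)
    return {name: (highlight(value, terms) if chosen is None or name in chosen else value)
            for name, value in doc.items()}


doc = {"title": "article number one", "content": "content of article number one"}
print(highlight_doc(doc, ["article"]))
print(highlight_doc(doc, ["article"], fields=["title"]))
```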
Redisearch can also show just the context around the matches instead of the whole field, using the SUMMARIZE option. For example, if a long content field contains the sentence "estacks is a blog about programming", searching for "blog" returns a short fragment such as "...estacks is a blog about programming..." instead of the full text.
FT.SEARCH article "article" SUMMARIZE FIELDS 1 content
You can also combine both HIGHLIGHT and SUMMARIZE in one query.
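The idea behind combining the two can be sketched as a toy summarizer that keeps a few words around each match, joins the fragments with a separator, and optionally wraps the matches in tags. This is not Redisearch's algorithm (the real option has its own FRAGS, LEN and SEPARATOR settings and chooses fragments itself); summarize and its parameters are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def summarize(text: str, terms: Iterable[str], context: int = 3, separator: str = "... ",
              tags: Optional[Tuple[str, str]] = None) -> str:
    """Keep only the words around matches of `terms`, optionally highlighting them.

    context:   words kept on each side of a match
    separator: string placed between fragments
    tags:      optional (open, close) pair wrapped around each match
    """
    targets = {t.lower() for t in terms}
    words = text.split()
    hits = [i for i, w in enumerate(words) if w.lower().strip(".,;:!?") in targets]

    # Build [start, end) windows around each hit, merging windows that overlap or touch.
    windows: List[List[int]] = []
    for i in hits:
        start, end = max(0, i - context), min(len(words), i + context + 1)
        if windows and start <= windows[-1][1]:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])

    if tags is not None:
        hit_set = set(hits)
        open_tag, close_tag = tags
        words = [f"{open_tag}{w}{close_tag}" if i in hit_set else w for i, w in enumerate(words)]

    return separator.join(" ".join(words[start:end]) for start, end in windows)


text = ("estacks is a blog about programming. We write about Redis, Elasticsearch "
        "and many other tools, and this blog is updated weekly.")
print(summarize(text, ["blog"]))
print(summarize(text, ["blog"], tags=("<b>", "</b>")))
```

In Redisearch itself, both options go into one query, for example FT.SEARCH article "article" SUMMARIZE FIELDS 1 content HIGHLIGHT FIELDS 1 content TAGS <b> </b>.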
I hope this article has shown what Redisearch is used for, helped you judge whether it suits your upcoming projects, and given you the basic commands to get started. Keep learning new tools so you have more ways to solve problems!