While Our Elasticsearch Queries Gently Weep

Tuğrul Bayrak
Published in Trendyol Tech · 7 min read · Mar 29, 2022


Hello everyone! Here, Melek Bilgin Tamtürk and I are going to share the mistakes we’ve made when using Elasticsearch. We hope we can light your journey and prevent you from making the same mistakes we did. So, let’s jump in!

Our journey with Elasticsearch

As the Trendyol Order Management Team, our journey with Elasticsearch goes back to ancient times, like 2017. In a fast-paced e-commerce environment like Trendyol, four years is enough time for code to become legacy. We had a legacy project that served part of the order process, and that’s where we used Elasticsearch as our main data source. Unfortunately, the ES experience was not satisfying when the time came for maintenance. Each year, the order count increased exponentially, and maintenance became harder and longer. To make matters worse, our document model consisted of nested JSON, which reduced our search performance.

In our new projects, we turned to Couchbase as a data source, but as our data grew, the load on the index nodes increased, and we decided to do our filtering in Elasticsearch. Using Couchbase’s Elasticsearch connector, we transferred the data we needed from Couchbase to Elasticsearch for filtering. If you are curious about the details of this process, you can take a look at this article :)

Full-text search failure

When we decided to use CBES and created an index in ES, we copied exactly the same settings as in our legacy application. Since we would be working with much smaller documents, we did not think we would experience any performance loss. This was a huge mistake.

The settings of the old index

At first, everything was fine. The performance of the service increased, and we were all satisfied. Not much later, a new feature was requested by our business team: customers would be able to search their orders by product name, brand name, or seller name. Since we were already using Elasticsearch, a full-text search feature like this pleased us.

First, we looked into searching across multiple fields. At that time, the most suitable option seemed to be the multi-match query. A sprint later, the development was completed and the tests passed, so we deployed to the production environment. And voilà! We were able to search for a specific product, seller, or brand in past orders… in one and a half seconds.
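Here is a minimal sketch of the kind of multi-match query we started with; the index name and field names (customer-orders, products.name, products.brandName, products.sellerName) are hypothetical, not our actual schema:

```json
GET /customer-orders/_search
{
  "query": {
    "multi_match": {
      "query": "yedek fırça",
      "fields": ["products.name", "products.brandName", "products.sellerName"]
    }
  }
}
```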

The average response time in two days

Even though the throughput was low, response times were quite high. That was a huge disappointment; we didn’t expect to see that result. In the staging environment, everything had been working smoothly. What could have changed in the production environment to make it worse? THE DOCUMENT COUNT! Order counts differed wildly between the staging and production environments. Because of that huge difference, we couldn’t observe this low performance on staging.

Improvements

After the full-text search failure, we questioned both our queries and our indexes. Since we were using the old index pattern, we didn’t consider that it might not be fulfilling our needs. But that was the problem.

As we shared above, we were using an edge_ngram analyzer. After some research, we understood that it wasn’t the best choice for full-text search. It is mostly used for autocompletion, and in our case it was not merely a poor choice, it was the wrong choice. But why?

Turkish is an agglutinative language, so when you attach a suffix to a word, the word can take on a completely different meaning. Let’s examine the word “kolluk”, meaning inflatable armbands, and the adjective “kollu”, meaning “sleeved”.

Analysis of the word “kolluk” by the edge_ngram analyzer
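You can reproduce this kind of analysis with a self-contained _analyze call; the min_gram and max_gram values below are assumptions, not our real settings:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [{ "type": "edge_ngram", "min_gram": 2, "max_gram": 10 }],
  "text": "kolluk"
}
```

With these assumed settings, the output tokens are roughly ko, kol, koll, kollu, and kolluk. Notice that “kollu” is among them, so a query for “kolluk” analyzed the same way overlaps with the tokens of “kollu” products.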

So when someone looks for an inflatable armband, she could find long-sleeved t-shirts… Yes, my friend, we learned it the hard way.

After agreeing that our analyzer choice was wrong, we built a custom analyzer with a shingle filter.

the new index settings with shingle filter

Shingle filters work like edge n-grams; the only difference is that while edge_ngram creates new tokens from letters, shingles create tokens from words. Using the shingle filter not only allowed us to search correctly but also helped us give more relevant answers to searches containing adjective phrases.
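A minimal sketch of what such index settings could look like; the index, analyzer, and filter names, as well as the shingle sizes, are assumptions rather than our exact configuration:

```json
PUT /customer-orders-v2
{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle_filter"]
        }
      }
    }
  }
}
```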

The moment of index change
edge_ngram index search latency
Shingle filter search latency

More Improvements for Product Search

After the first improvement, the business team requested a new feature that lets customers filter their orders. So, we had to create a new index while the old one was still serving the production environment. But then another problem occurred: there was no disk space for the new index. Naturally, we requested more resources from our SRE team, but they gently refused, since the data center we requested them from had no more resources to give.

Since additional resources were no longer an option, we had to give up the shingle filter on the new index, and we created the new index without any replicas. Later, we took some risks and performed a nighttime operation: first, we deleted the replicas of the old indexes; then, when disk space became available, we changed the replica count of the new index to one; and after the replicas were created, we switched the alias, completing the operation successfully. As you can see, my friend, that was stressful! While we were happy that no incident had occurred during the nighttime operation, we faced the truth: without shingle filters, our search performance was awful again.
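The whole operation boils down to two kinds of calls. Here is a sketch with hypothetical index and alias names:

```json
# Drop the old index's replicas to free disk space:
PUT /customer-orders-v1/_settings
{ "index": { "number_of_replicas": 0 } }

# Later, give the new index a replica:
PUT /customer-orders-v2/_settings
{ "index": { "number_of_replicas": 1 } }

# Finally, swap the alias atomically:
POST /_aliases
{
  "actions": [
    { "remove": { "index": "customer-orders-v1", "alias": "customer-orders" } },
    { "add": { "index": "customer-orders-v2", "alias": "customer-orders" } }
  ]
}
```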

There were two areas we could improve: product search within orders, and filtering orders by their statuses. Let’s continue with the product search improvement.

As we shared above, we had started using the shingle filter as an analyzer. The shingle filter creates tokens from words, as shown below. That means more data at indexing time and more terms to search.

Analysis of “Cleanmaximizer Teknolojili Yedek Fırça Başlığı” by the shingle analyzer
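You can reproduce this kind of output with a self-contained _analyze call; the shingle sizes below are assumptions:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 2 }
  ],
  "text": "Cleanmaximizer Teknolojili Yedek Fırça Başlığı"
}
```

Besides the five single words, this emits every adjacent two-word pair (“cleanmaximizer teknolojili”, “teknolojili yedek”, “yedek fırça”, “fırça başlığı”), which is exactly the extra data that inflates the index.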

We decided not to use an analyzer at all and to solve the problem in the search query instead. To do that, we changed our query to a wildcard query, which returns documents that contain terms matching a pattern.

changing product search query

When you type “yed fır”, it matches “yedek fırça”, which is included in the product name. Since we don’t use an analyzer, we saved gigabytes of data and improved our search performance!
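A minimal sketch of such a wildcard-based search, assuming the product name is indexed as a keyword field and each whitespace-separated term of the user’s input becomes its own clause (the field name is hypothetical, and the case_insensitive parameter requires Elasticsearch 7.10+):

```json
GET /customer-orders/_search
{
  "query": {
    "bool": {
      "must": [
        { "wildcard": { "products.name": { "value": "*yed*", "case_insensitive": true } } },
        { "wildcard": { "products.name": { "value": "*fır*", "case_insensitive": true } } }
      ]
    }
  }
}
```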

The moment of index & search query change

Improving the order status filter using mapping & query changes

Another improvement besides product search is the status filter of orders. Customers can filter their orders by status, like “shipping orders” or “cancels”.

We keep fields such as customerId and packet status to filter orders by their statuses. Here we saw two filtering improvements we could implement.

First, we learned that we need to query in the filter context instead of the query context for fields whose answer is yes/no and which do not need score calculation. In Elasticsearch, queries run in two contexts, the query context and the filter context, and we can improve performance by running the clauses that do not require score calculation in the filter context. In addition, by default, queries made in the filter context are cached for performance. For a more detailed explanation, see the official documentation.
changing query from query context to filter context
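A sketch of the change, with hypothetical field names: the term clauses move from the scoring must array into the non-scoring, cacheable filter array:

```json
GET /customer-orders/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "customerId": "123456" } },
        { "term": { "packetStatus": "Shipped" } }
      ]
    }
  }
}
```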

Secondly, we learned that it is better for performance to map the fields we only filter on as ‘keyword’ instead of ‘text’. Text fields are analyzed before they are stored in the inverted index; if you never run full-text searches on a field, there is no need to analyze it, so changing it to ‘keyword’ improves performance.

changing mapping
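A sketch of what the keyword mapping could look like; the index and field names are hypothetical. Note that an existing field’s type cannot be changed in place, so a change like this means creating a new index with the updated mapping and reindexing into it:

```json
PUT /customer-orders-v3
{
  "mappings": {
    "properties": {
      "customerId": { "type": "keyword" },
      "packetStatus": { "type": "keyword" }
    }
  }
}
```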
filter orders by status endpoint change

After changing the query and the mapping, the status endpoint response time was reduced by about 40%.

final Elasticsearch latency

Conclusion

Just like the great guitarist George Harrison said, “with every mistake, we must surely be learning!” We learned from our mistakes and improved our Elasticsearch queries iteration by iteration. As data grows, we may encounter more problems, and we know this will not be the last improvement. We hope our experience has been useful for you.
