Elasticsearch Highlighting with Kotlin

Manserpatrice
Published in Nerd For Tech
4 min read, Oct 6, 2022

This week, I had to add highlighting to the search field of my project. Currently, clicking a search result in the frontend opens the corresponding documentation page. These pages are really long, so it takes a long time to find the section you are looking for. To solve this, I needed to highlight the best matches in the documentation so that you can see them at first sight.

Highlighting with KT-Search

KT-Search is a great Kotlin client for Elasticsearch. Although it is not an official client, it provides most features of the official Java client. Sadly, at the time of writing, kt-search does not support highlighting. Therefore, we need to write the functionality ourselves.

The end result of this tutorial will be an Elasticsearch hit that contains information about the highlighted terms.

The solution to this problem is structured in three parts:

  1. Creating the Query
  2. Receiving the Response
  3. Deserializing the SearchResponse

Creating the Query

First of all, we need a way to send a request to Elasticsearch that contains the query with highlighting. Since kt-search has no dedicated functionality for highlighting, we have to use its rawBody feature.

Since this query can get pretty long, I created a separate function for it.

Please keep an eye on the lower part with the highlight block:

  • The pre- and post-tags wrap around the matching terms. If you don’t define them, Elasticsearch uses <em> and </em> by default.
  • The fields block defines which fields the best matches are returned from.
    ElasticDocument is a data class with a property called content. I reference the property instead of hard-coding its name, so that I can refactor via IntelliJ and don’t have to change the String manually every time I rename this attribute.
  • fragment_size defines the maximum length of the fragments in characters. Fragments are a combination of the matched term with some context around it.
  • number_of_fragments defines the maximum number of fragment Strings returned per field for each hit.
  • The fragmenter and type should be chosen based on the Elasticsearch highlighting documentation.
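
The original query function is not reproduced here, so the following is only a minimal sketch of what a createQuery() along these lines could look like. The match query, the <highlighted> tag names, the concrete values for fragment_size and number_of_fragments, and the escapeJson helper are assumptions made for illustration. The plain highlighter type is used because the fragmenter setting only applies to it.

```kotlin
// Hypothetical stand-in for the article's data class; only `content` is named in the text.
data class ElasticDocument(val content: String)

// Minimal escaping: backslashes and double quotes only, so the term
// cannot terminate the JSON string literal early.
fun escapeJson(value: String): String =
    value.replace("\\", "\\\\").replace("\"", "\\\"")

fun createQuery(term: String): String {
    // Property reference instead of a hard-coded "content":
    // renaming the property in the IDE updates the query as well.
    val field = ElasticDocument::content.name
    return """
        {
          "query": {
            "match": { "$field": "${escapeJson(term)}" }
          },
          "highlight": {
            "pre_tags": ["<highlighted>"],
            "post_tags": ["</highlighted>"],
            "fields": {
              "$field": {
                "fragment_size": 150,
                "number_of_fragments": 3,
                "fragmenter": "span",
                "type": "plain"
              }
            }
          }
        }
    """.trimIndent()
}
```

Calling createQuery("kotlin client") returns a JSON String that can be handed to rawBody as-is.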

Receiving the response

Now that we have the query for highlighting, we could just use the search() function of the library, right? No.

The SearchResponse of the search function does not contain highlighting. This means that we have to send the request and read the response ourselves.

This function simply returns the response JSON as a String. It uses the restClient of the SearchClient to create a new POST request. The path is composed of the name of the index and the endpoint that we want to call on Elasticsearch. In our case, it is _search.

The rawBody uses the JSON String that we defined in the createQuery() function.
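
The kt-search call itself is not reproduced here. To show the shape of the request independently of the library, the sketch below swaps in the JDK's java.net.http.HttpClient (Java 11+) in place of the restClient. The function names, the base URL, and the index name are placeholders chosen for illustration.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Builds POST {baseUrl}/{index}/_search with the raw JSON query as the body.
fun buildSearchRequest(baseUrl: String, index: String, queryJson: String): HttpRequest =
    HttpRequest.newBuilder()
        .uri(URI.create("${baseUrl.trimEnd('/')}/$index/_search"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(queryJson))
        .build()

// Sends the request and returns the response body untouched, as a JSON String.
fun searchRaw(baseUrl: String, index: String, queryJson: String): String {
    val response = HttpClient.newHttpClient()
        .send(buildSearchRequest(baseUrl, index, queryJson), HttpResponse.BodyHandlers.ofString())
    check(response.statusCode() in 200..299) { "Search failed with HTTP ${response.statusCode()}" }
    return response.body()
}
```

With a local cluster, searchRaw("http://localhost:9200", "docs", query) would return the same JSON that curl prints, including the highlight section of every hit.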

Deserializing the SearchResponse

Lastly, we need to deserialize the String with the search results into a list of hits. Normally this would be handled by kt-search, but since its default SearchResponse doesn’t contain the highlighting info, we need to write it ourselves.

These are the Serializable classes that are needed for the deserialization.

Note that the Hit class filters out the highlighted terms, so that you don’t have the <highlighted> wrappers in your search results.
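
How the Hit class does this is not shown here. One way such filtering could work, assuming the query used <highlighted> and </highlighted> as pre- and post-tags, is sketched below with two hypothetical helper functions that only need the Kotlin standard library.

```kotlin
// Assumed tag names; they must match the pre_tags/post_tags sent in the query.
const val PRE_TAG = "<highlighted>"
const val POST_TAG = "</highlighted>"

// Returns the terms that Elasticsearch wrapped in the highlight tags.
fun highlightedTerms(fragment: String): List<String> =
    Regex(Regex.escape(PRE_TAG) + "(.*?)" + Regex.escape(POST_TAG))
        .findAll(fragment)
        .map { it.groupValues[1] }
        .toList()

// Returns the fragment as plain text, without the wrapper tags.
fun stripTags(fragment: String): String =
    fragment.replace(PRE_TAG, "").replace(POST_TAG, "")
```

Keeping the tags in a single place avoids a mismatch between what the query asks for and what the parsing code expects.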

Now we can combine all these steps into our own search function, so that we can call it just like we would normally do.

Since we annotated all the classes from above with @Serializable, we can now call the decodeFromString method of the DEFAULT_JSON object provided by kt-search.

This gives us back the SearchResponse with its nested hits. We can access them by calling searchResult.hits.
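
As a rough sketch of the classes and the decoding step, assuming the standard shape of an Elasticsearch search response: the class names, the property names, and the parseHits function are hypothetical. A plain Json instance stands in for DEFAULT_JSON so that the example depends only on kotlinx.serialization, which needs its compiler plugin enabled.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class ElasticDocument(val content: String)

@Serializable
data class Hit(
    @SerialName("_id") val id: String,
    @SerialName("_source") val source: ElasticDocument,
    // Field name -> highlighted fragments; defaults to empty when the hit has no highlight section.
    val highlight: Map<String, List<String>> = emptyMap(),
)

@Serializable
data class Hits(val hits: List<Hit> = emptyList())

@Serializable
data class HighlightSearchResponse(val hits: Hits)

// Skips response fields that the classes above do not model (took, _score, ...).
val lenientJson = Json { ignoreUnknownKeys = true }

// Decodes the raw response String and returns the nested list of hits.
fun parseHits(responseJson: String): List<Hit> =
    lenientJson.decodeFromString(HighlightSearchResponse.serializer(), responseJson).hits.hits
```

Each returned Hit then carries both the original document and, in highlight, the fragments for every highlighted field.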

Finally, we have hits that contain highlighting information.

Reflection

What went well

In my opinion, the actual implementation of the self-written logic to retrieve highlighting information went pretty well. I did not have many problems, since I could test the behaviour with unit tests.

What needs improvement

The biggest problem that I had was understanding that the SearchResponse of kt-search really does not contain any information about highlighting. I was confused because I sent a query with a highlighting request but didn’t get any result from it. I finally understood it when I sent a request via curl and saw that it had something to do with the SearchResponse of kt-search. Next time, I would read the JavaDoc of the response object first, so that I can see at first sight that the field I’m looking for is not contained in it.
