Cold Start In Search

Anshu Kumar
4 min read · Sep 10, 2022

Cold Start and Approaches to handle cold start in Search

In this article I will try to cover the following:

  • What Cold Start is and why it is a problem.
  • What the sources of Cold Start are.
  • Approaches to handle Cold Start.

Let me set the context before jumping to the definition of Cold Start.

ML-based solutions try to model either users (actions/behaviors) or items (features). Learning is always dominated by the majority signal in the data (if trained properly 😉). Cold start items have few impressions (views, add-to-cart events, orders), so the model learns very little about them.

Cold Start: When a user has not performed enough actions, or an item does not have enough impressions, we call it a Cold Start User or a Cold Start Item, respectively. A search system would not be able to surface these items. Why?

A search system is composed of Recall (Solr/Elasticsearch) + Ranker (LTR, learning to rank). Cold start items can be selected at the recall stage based on content relevance. But because they lack impressions (views, add-to-cart events, orders), the ranker keeps these items low in the search results.

Now can you smell a bigger problem? A feedback loop (rich getting richer, a causality dilemma): these items never make it to the top results and hence never get more impressions. The rest you know 😉
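To make the two stages concrete, here is a minimal sketch of a recall + ranker pipeline. Everything in it is an assumption for illustration: the `Item` class, the term-overlap `recall` function (a stand-in for Solr/Elasticsearch), and the conversion-rate `behavioral_score` (a stand-in for a real LTR model). It only shows how an item with no impressions passes recall but lands at the bottom after ranking.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    title: str
    views: int = 0
    orders: int = 0


def recall(query, catalog):
    """Recall stage: keep items whose title shares at least one term with the query."""
    terms = set(query.lower().split())
    return [it for it in catalog if terms & set(it.title.lower().split())]


def behavioral_score(item):
    """Toy ranker score: conversion rate, which is 0.0 for items with no views."""
    return item.orders / item.views if item.views else 0.0


def search(query, catalog):
    """Recall by content, then rank by behavioral signal (highest first)."""
    candidates = recall(query, catalog)
    return sorted(candidates, key=behavioral_score, reverse=True)


# Hypothetical catalog; the numbers are made up for the example.
catalog = [
    Item("old-1", "red running shoes", views=1000, orders=50),
    Item("old-2", "blue running shoes", views=800, orders=24),
    Item("new-1", "lightweight running shoes"),  # cold start: no impressions yet
    Item("old-3", "leather wallet", views=500, orders=30),
]

recalled = [it.item_id for it in recall("running shoes", catalog)]
results = [it.item_id for it in search("running shoes", catalog)]
print(results)
```

The cold item is just as relevant to the query as the others, yet a ranker driven by behavioral signals has nothing to score it with.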
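The loop can be seen in a small deterministic simulation. This is only a sketch under simplifying assumptions: only the top `top_k` items by observed CTR are shown each round, every shown item receives the same `traffic`, and clicks are added as expected values instead of being sampled. All names and numbers are hypothetical.

```python
def simulate_feedback_loop(impressions, clicks, true_ctr, top_k=2, rounds=5, traffic=100):
    """Each round, show only the top_k items by observed CTR and add expected clicks."""
    impressions, clicks = dict(impressions), dict(clicks)

    def observed_ctr(item):
        # Items with no impressions have no evidence, so they score 0.0.
        return clicks[item] / impressions[item] if impressions[item] else 0.0

    for _ in range(rounds):
        shown = sorted(impressions, key=observed_ctr, reverse=True)[:top_k]
        for item in shown:
            impressions[item] += traffic
            clicks[item] += traffic * true_ctr[item]  # expected clicks, no randomness
    return impressions


# The new item is actually the best one (true CTR 0.10), but nobody ever sees it.
final_impressions = simulate_feedback_loop(
    impressions={"a": 1000, "b": 800, "new": 0},
    clicks={"a": 50, "b": 32, "new": 0},
    true_ctr={"a": 0.05, "b": 0.04, "new": 0.10},
)
print(final_impressions)
```

The established items keep collecting impressions while the new item stays at zero, no matter how good it really is.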

Cold Start is broadly classified into these types:

  • Item Cold Start (Product): A new item is listed on the platform.
  • User Cold Start: A new user has signed up, and we cannot show them personalized search results.
  • New Company/Community: Both items and users are new to the platform.

Sources of Cold Start:

  • Listing of new items.
  • Absence of Semantic Search: A poor item description may lead to a discoverability problem. Because of weak lexical content relevance, these items do not make it into recall and hence are never ranked.
  • Feedback Loop: No behavioral data causes poor ranking, which in turn reduces the likelihood that new products accrue behavioral data.

Now we know that cold start can happen when a user or an item is new, or when the whole platform is new.

Interestingly, cold start can also happen if user interests keep changing frequently. Ever-changing interests are difficult to model because there are not enough user impressions across items or item categories.

User interest is frequently changing

Approaches to handle Cold Start

The beauty of the Cold Start problem is that we can use three flavors of Machine Learning to address it.

The thing to keep in mind is that cold start items are new and do not have enough impressions, but we still want to show them in search results.

Unsupervised Techniques

Item Side Information (Similar Item)
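One way to read "item side information" is to borrow behavioral signals from similar items that already have history. The sketch below is an assumption about how that could look, not a specific method from any library: it builds bag-of-words vectors from titles, finds the `k` most similar warm items by cosine similarity, and uses their similarity-weighted average CTR as an estimate for the cold item. The function names and the CTR values are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very small content representation: term counts of the lowercased title."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def borrowed_ctr(cold_title, warm_items, k=2):
    """Similarity-weighted average CTR of the k warm items most similar to the cold item."""
    cold_vec = bag_of_words(cold_title)
    scored = [(cosine(cold_vec, bag_of_words(title)), ctr) for title, ctr in warm_items]
    top = sorted(scored, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * ctr for sim, ctr in top) / total if total else 0.0


# (title, observed CTR) pairs for items with history; values are made up.
warm_items = [
    ("red running shoes", 0.05),
    ("blue running shoes", 0.03),
    ("leather wallet", 0.08),
]
estimate = borrowed_ctr("lightweight running shoes", warm_items)
print(round(estimate, 4))
```

In practice the titles would likely be replaced by richer embeddings (text, image, category), but the idea of borrowing from neighbors stays the same.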

Reinforcement Learning Techniques

Explore-Exploit: Cold start has been tackled using this category of techniques. The idea is to mix cold start items into the search results. Using RL, we try to find what percentage of cold items should be mixed in.

Too low a percentage would not solve the cold start problem; too high a percentage of cold items would hurt user experience and revenue.
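The mixing step itself can be sketched as follows. This is a minimal illustration with a fixed `cold_fraction`; in the setup described above, that fraction is the knob a bandit or RL policy would tune. The function name, the slot-placement rule, and the `page_size` default are assumptions made for the example.

```python
def mix_results(ranked_warm, cold_items, cold_fraction, page_size=10):
    """Reserve round(cold_fraction * page_size) slots for cold items, spread down the page."""
    n_cold = min(len(cold_items), round(cold_fraction * page_size))
    n_warm = page_size - n_cold
    page = list(ranked_warm[:n_warm])
    if n_cold:
        step = page_size // n_cold
        for i, item in enumerate(cold_items[:n_cold]):
            # Place the i-th cold item at the end of the i-th block of `step` slots.
            page.insert(min(len(page), (i + 1) * step - 1), item)
    return page


warm = [f"w{i}" for i in range(10)]   # already ranked by the ranker
cold = ["c0", "c1", "c2"]             # cold start candidates from recall

page = mix_results(warm, cold, cold_fraction=0.2)
print(page)
```

With `cold_fraction=0.0` the page is the plain ranker output; raising it trades some proven items for exploration.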

Popular RL algorithms that have been used are the following:

  • Multi-armed bandit algorithms to balance the explore-exploit trade-off.
  • Epsilon-Greedy, Contextual Epsilon-Greedy.
  • UCB, Thompson Sampling.
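As a rough illustration of two of the algorithms listed above, here is a sketch of Beta-Bernoulli Thompson Sampling and the UCB1 score, treating each item as an arm and a click as the reward. The item names, starting counts, and true CTRs are made up, and the simulation is a toy: a real system would also need to handle position bias and context.

```python
import math
import random


def thompson_pick(stats, rng):
    """Sample a CTR for each item from Beta(1 + clicks, 1 + misses) and pick the largest."""
    samples = {item: rng.betavariate(1 + c, 1 + m) for item, (c, m) in stats.items()}
    return max(samples, key=samples.get)


def ucb1_score(clicks, impressions, total_impressions):
    """UCB1: observed mean plus an exploration bonus that shrinks as impressions grow."""
    if impressions == 0:
        return math.inf  # unseen items are always tried first
    return clicks / impressions + math.sqrt(2 * math.log(total_impressions) / impressions)


def run_thompson(stats, true_ctr, rounds=2000, seed=0):
    """Simulate `rounds` impressions; returns how often each item was shown."""
    rng = random.Random(seed)
    stats = {item: list(counts) for item, counts in stats.items()}
    pulls = {item: 0 for item in stats}
    for _ in range(rounds):
        item = thompson_pick(stats, rng)
        pulls[item] += 1
        clicked = rng.random() < true_ctr[item]
        stats[item][0 if clicked else 1] += 1  # index 0 = clicks, index 1 = misses
    return pulls


# [clicks, misses] per item; the cold item starts with no history at all.
history = {"warm_a": [50, 950], "warm_b": [30, 770], "cold_c": [0, 0]}
pulls = run_thompson(history, true_ctr={"warm_a": 0.05, "warm_b": 0.10, "cold_c": 0.40})
print(pulls)
```

Because the cold item's Beta posterior is wide, it gets sampled early; if it performs well it keeps being shown, and if not, the posterior narrows and it fades out.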

Supervised Technique

Train a model on the features of current items. The learned model then predicts the relevant ranking features for new/long-tail products.

Given item features X, we want to learn y (ranking features).

X = [Title, Product description, Category, content quality…]

y = [predicted conversion, predicted impression, predicted revenue]

y would then be the input features to the ranker model.

The trained model predicts these features for ranking.
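A minimal sketch of this idea, assuming X has already been turned into numeric features: here each item is described by two hypothetical numbers, a content quality score and a normalized description length, and y is a single target, conversion rate. The model is a plain linear regression fitted by gradient descent so the example stays dependency-free; a real setup would more likely use a gradient-boosted or neural model. The training values are synthetic and exist only to make the example runnable.

```python
def train_linear(X, y, lr=0.5, epochs=5000):
    """Fit y ≈ w·x + b by batch gradient descent on squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Errors are computed once per epoch so all weights update simultaneously.
        errors = [sum(wj * xj for wj, xj in zip(w, x)) + b - t for x, t in zip(X, y)]
        for j in range(d):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(w, b, x):
    """Predicted ranking feature (here: conversion rate) for one item."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Warm items: [content_quality, normalized_description_length] -> observed conversion.
X_warm = [[0.2, 0.5], [0.4, 0.1], [0.6, 0.9], [0.8, 0.3], [0.9, 0.7], [0.3, 0.6]]
y_warm = [0.028, 0.028, 0.052, 0.048, 0.060, 0.034]

w, b = train_linear(X_warm, y_warm)

# A new item has content features but no behavioral history.
predicted_conversion = predict(w, b, [0.5, 0.4])
print(round(predicted_conversion, 3))
```

The predicted value can then be passed to the ranker in place of the missing observed conversion for the cold item.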

Some implementations I found useful

Final Words:

The cold start problem is well researched and exists at almost every organization.

It has a huge impact on customer experience, merchant retention, and revenue.

I find this problem interesting. Have you solved Cold Start differently? Let me know in the comments.

Thank you for reading this far!

Written by Anshu Kumar

Data Scientist, Author. Building Semantic Search and Recommendation Engine.
