Cold Start in Search
Cold Start and Approaches to Handle It in Search
In this article I will try to cover the following:
- What Cold Start is and why it is a problem.
- What the sources of Cold Start are.
- Approaches to handle Cold Start.
Let me set the context before jumping to the definition of Cold Start.
ML-based solutions try to model either users (actions/behaviors) or items (features). Learning is always dominated by the majority signal in the data (if trained properly 😉). Cold start items have weak behavioral signals: few impressions (views, add-to-carts, orders).
Cold Start: When a user has not performed enough actions, or an item has not received enough impressions, we call it a Cold Start User or a Cold Start Item, respectively. A search system struggles to surface these items. Why?
A search system is composed of Recall (Solr/Elasticsearch) + Ranker (LTR, i.e. learning to rank). Cold start items can be selected at the recall stage based on content relevance. But because they lack impressions (views, add-to-carts, orders), the ranker keeps these items low in the search results.
Now can you smell a bigger problem? A feedback loop (rich getting richer, a causality dilemma): these items never make it to the top results and hence never get more impressions. The rest you know 😉
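To make the ranking behaviour above concrete, here is a tiny sketch. The scoring formula, the weight and the item names are all made up for illustration; a real LTR model is learned from data and is not a hand-written formula like this one.

```python
import math

def ranker_score(content_relevance, orders, behavior_weight=0.5):
    # Hypothetical ranker: content relevance plus a bonus that grows with past orders.
    return content_relevance + behavior_weight * math.log1p(orders)

# Both items passed the recall stage; only one of them has behavioral history.
candidates = [
    {"id": "established", "content_relevance": 0.70, "orders": 500},
    {"id": "new_listing", "content_relevance": 0.90, "orders": 0},
]

ranked = sorted(
    candidates,
    key=lambda c: ranker_score(c["content_relevance"], c["orders"]),
    reverse=True,
)
print([c["id"] for c in ranked])
```

Even though the new listing matches the query better on content, the behavioral bonus pushes the established item above it. Fewer views today means fewer orders tomorrow, which is exactly the feedback loop.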
Cold Start is broadly classified into these types:
- Item Cold Start (Product): A new item is listed on the platform.
- User Cold Start: New users have signed up, and the system cannot show them personalized search results.
- New Company/Community: Both items and users are new to the platform.
Sources of Cold Start:
- Listing of New Items.
- Absence of Semantic Search: Poor item descriptions may lead to a discoverability problem. Because of weak lexical content relevance, these items do not make it through recall and hence never reach the ranker.
- Feedback Loop: No behavioral data causes poor ranking, which in turn reduces the likelihood that new products accrue behavioral data.
Now we know cold start can happen when a user or an item is new, or when the whole platform is new.
Interestingly, cold start can also happen if user interests keep changing frequently. Ever-changing interests are difficult to model because there are not enough user impressions across items or item categories.
Approaches to handle Cold Start
The beauty of the Cold Start problem is that we can use three flavors of Machine Learning to address it.
The thing to keep in mind is that cold start items are new and do not have enough impressions, but we still want to show them in search results.
Unsupervised Techniques
- Treating Cold Start in Product Search by Priors: This approach predicts "prior" values for the behavioral features of new products. These priors serve as the initial values of the behavioral features when a new product is introduced to the search index. [https://www.amazon.science/publications/treating-cold-start-in-product-search-by-priors]
- Item Side Information: For a given long-tail product, pre-populate the features from the most similar items. [https://ieeexplore.ieee.org/document/9178343]
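The idea of pre-populating behavioral features from similar items can be sketched in a few lines. This is a simplified illustration and not the method from either linked paper: I am assuming Jaccard similarity over title tokens and a plain average of the neighbors' click-through rates, and the catalog is invented.

```python
from statistics import mean

def tokenize(text):
    return set(text.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prior_from_neighbors(new_title, catalog, k=2):
    """Average the behavioral feature of the k most similar established items.

    catalog: list of (title, click_through_rate) pairs (hypothetical data).
    """
    new_tokens = tokenize(new_title)
    by_similarity = sorted(
        catalog,
        key=lambda item: jaccard(new_tokens, tokenize(item[0])),
        reverse=True,
    )
    return mean(ctr for _, ctr in by_similarity[:k])

catalog = [
    ("red cotton t-shirt", 0.10),
    ("blue cotton t-shirt", 0.20),
    ("steel water bottle", 0.50),
]

# The new item borrows its starting CTR from the two t-shirts, not the bottle.
prior_ctr = prior_from_neighbors("green cotton t-shirt", catalog, k=2)
print(round(prior_ctr, 3))
```

In practice the similarity would more likely come from embeddings or category data than from raw title tokens, but the shape of the solution stays the same: the new item enters the index with a borrowed value instead of zero.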
Reinforcement Learning Techniques
Explore-Exploit: Cold start has been tackled using this category of techniques. The idea is to mix cold start items into the search results. Using RL, we try to find what percentage of cold items should be mixed in.
Too low a percentage would not solve the cold start; too high a percentage of cold items would hurt user experience and revenue.
Popular RL algorithms that have been used are the following:
- Multi-armed bandit algorithms to balance the explore-exploit trade-off.
- Epsilon Greedy, Contextual Epsilon Greedy.
- UCB, Thompson Sampling.
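Here is a minimal Thompson Sampling sketch for a single result slot, to show how a bandit gives a cold item a fair chance. Everything in it is an assumption made for illustration: the item names, the click histories, the "true" click rates used by the simulation, and the Beta(1, 1) starting prior.

```python
import random

# Hypothetical history per item: (clicks, skips) observed so far.
HISTORY = {"established_a": (10, 90), "established_b": (12, 88), "cold_item": (0, 0)}
# Hypothetical true click rates, unknown to the algorithm, used only to simulate users.
TRUE_CTR = {"established_a": 0.10, "established_b": 0.12, "cold_item": 0.30}

def thompson_sampling(true_ctr, history, rounds=3000, seed=0):
    rng = random.Random(seed)
    # Beta(1 + clicks, 1 + skips) posterior per item; a cold item starts at Beta(1, 1).
    posterior = {item: [1 + clicks, 1 + skips] for item, (clicks, skips) in history.items()}
    pulls = {item: 0 for item in true_ctr}
    for _ in range(rounds):
        # Draw one plausible click rate per item and show the item with the best draw.
        sampled = {item: rng.betavariate(a, b) for item, (a, b) in posterior.items()}
        chosen = max(sampled, key=sampled.get)
        pulls[chosen] += 1
        # Simulate the user and update the posterior of the item that was shown.
        clicked = rng.random() < true_ctr[chosen]
        posterior[chosen][0 if clicked else 1] += 1
    return pulls, posterior

pulls, posterior = thompson_sampling(TRUE_CTR, HISTORY)
print(pulls)
```

The cold item has a wide posterior, so its sampled rate is often high and it gets shown early on. If it really converts well it keeps its slot; if it does not, its posterior tightens around a low value and the exposure fades. The share of exposure given to cold items is therefore learned from feedback and not fixed by hand.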
Supervised Technique
Train a model on the features of current items. The learned model then predicts the relevant behavioral features for new/long-tail products.
Given item features X, we want to learn y (ranking features).
X = [Title, Product description, Category, content quality…]
y = [predicted conversion, predicted impression, predicted revenue]
The predicted y values then serve as input features to the ranker model.
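A rough sketch of this supervised setup, using a small linear model fitted with gradient descent in plain Python. The two numeric features (a content quality score and a "popular category" flag), the conversion values, and the choice of a linear model are all assumptions for illustration; a real setup would encode title and description text and would probably use a stronger model.

```python
def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_linear(X, y, lr=0.1, steps=20000):
    """Fit y ≈ w·x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        errors = [predict(w, b, x) - target for x, target in zip(X, y)]
        for j in range(d):
            w[j] -= lr * 2 / n * sum(e * x[j] for e, x in zip(errors, X))
        b -= lr * 2 / n * sum(errors)
    return w, b

# Established items (hypothetical): X = [content_quality, in_popular_category]
X_train = [[0.2, 0], [0.8, 0], [0.5, 1], [0.9, 1], [0.1, 1]]
# Observed conversion rate of each established item (made-up values)
y_train = [0.04, 0.10, 0.12, 0.16, 0.08]

w, b = fit_linear(X_train, y_train)

# A new item has content features but no behavioral data yet.
new_item = [0.7, 1]
predicted_conversion = predict(w, b, new_item)
print(round(predicted_conversion, 3))
```

The predicted conversion would be written into the slot where the ranker expects the observed conversion, and replaced by the real number once the item has collected enough impressions.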
Some implementations I found useful:
- Thompson Sampling: https://analyticsindiamag.com/thompson-sampling-explained-with-python-code/
Final Words:
The cold start problem is well researched, and it exists at almost every organization.
It has a huge impact on customer experience, merchant retention, and revenue.
I find this problem interesting. Have you solved Cold Start differently? Let me know in the comments.
Thank you for reading this far!!