Cohort Analysis for Marketplace Brands Without First-Party Data
Marketplaces hide the customer from you, but they cannot hide every signal of whether that customer came back.
Every retention playbook written for D2C assumes one thing you do not have on a marketplace. The customer. On your own store you have an email, a phone number, an order history tied to a person. On Amazon or Flipkart you have a settlement report and a wall. The platform owns the buyer, guards the identity, and hands you aggregates. So most marketplace brands quietly conclude that cohort analysis is a luxury for people with first-party data, and they stop looking.
That conclusion is wrong, and it is expensive. You cannot run a textbook cohort table keyed to individual customers. But the marketplace leaks enough repeat-behaviour signal that you can build cohorts that are directionally true and good enough to change what you spend, what you stock, and what you launch. The trick is to stop trying to reconstruct the customer and start reading the signals the platform cannot hide.
What you are actually missing, and what you are not
Be precise about the gap. The thing you lose on a marketplace is identity resolution. You cannot reliably say buyer 4471 purchased in January and again in April. Amazon’s brand tooling gives you a new-to-brand flag and some repeat-purchase aggregates, Flipkart gives you less, and quick commerce gives you almost nothing at the person level. None of it is a clean customer ID you can build a classic cohort grid on.
What you keep is more than people assume. You keep the new-to-brand percentage on advertised sales. You keep subscribe-and-save enrolment and churn if you sell consumables. You keep total repeat-purchase rate at the account or SKU level where the platform reports it. You keep your own units-per-order and the gap between gross units sold and unique-ish demand. And you keep time. Every one of those is a retention signal. Cohort analysis without first-party data is the discipline of assembling those signals into a defensible view of whether buyers come back, even when you can never name a single one.
You are not reconstructing the customer. You are reconstructing the curve. The curve is what actually drives the decision.
Build cohorts on proxies, not people
If you cannot cohort by customer, cohort by the next best thing. The most useful unit is the acquisition window. Group everything by the month a buyer most plausibly entered the brand, then watch how the brand’s behaviour evolves against that window. You will not have per-person retention, but you will have a brand-level repeat signal that moves with the cohort.
Three proxy cohorts do most of the work for marketplace brands in India.
- Launch-month cohorts. Tag the month a SKU or variant went live. Track repeat-purchase rate and subscribe enrolment for that SKU over the following months. A consumable launched in March that shows a rising repeat rate by June is building a base. One that sells hard on launch and flatlines is renting demand from ads.
- Promo cohorts versus organic cohorts. Split the months you ran a deep Big Billion or Great Indian Festival push from the quiet months. Buyers acquired during a heavy discount window almost always repeat worse than buyers acquired at full price. The platform will not tell you this per customer, but the repeat-rate trend across promo-heavy and promo-light periods will.
- Subscribe cohorts. If you sell anything repeatable, subscribe-and-save is the closest thing to a real customer cohort the marketplace will ever give you. Enrolments by month, and how many are still active three and six months later, is a near-clean retention curve. Guard it like the asset it is.
Each of these is a cohort in the way that matters. It groups demand by when and how it was acquired, then measures whether it persisted. That is the entire point of cohort analysis. Identity is a convenience, not a requirement.
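As a minimal sketch of the subscribe cohort, assume you can export one row per enrolment with the month it started and the month it lapsed. The field names, the sample rows, and the rule for counting someone as "active" are all illustrative assumptions, not a platform report format.

```python
from datetime import date

# Hypothetical subscribe-and-save export: one row per enrolment.
# "lapsed" is the month the subscription ended, or None if still active.
ENROLMENTS = [
    {"enrolled": date(2025, 1, 1), "lapsed": date(2025, 3, 1)},
    {"enrolled": date(2025, 1, 1), "lapsed": date(2025, 6, 1)},
    {"enrolled": date(2025, 1, 1), "lapsed": None},
    {"enrolled": date(2025, 1, 1), "lapsed": None},
    {"enrolled": date(2025, 10, 1), "lapsed": None},
    {"enrolled": date(2025, 10, 1), "lapsed": date(2025, 11, 1)},
]


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def subscribe_curve(enrolments, as_of, checkpoints=(3, 6)):
    """Share of each enrolment-month cohort still active at each checkpoint.

    Assumption: a subscriber counts as active at month k if they never
    lapsed, or lapsed more than k months after enrolling. Cohorts younger
    than k months get None instead of a misleading number.
    """
    cohorts = {}
    for row in enrolments:
        cohorts.setdefault(row["enrolled"], []).append(row)

    curve = {}
    for month in sorted(cohorts):
        rows = cohorts[month]
        curve[month] = {}
        for k in checkpoints:
            if months_between(month, as_of) < k:
                curve[month][k] = None  # too early to measure
                continue
            active = sum(
                1 for r in rows
                if r["lapsed"] is None or months_between(month, r["lapsed"]) > k
            )
            curve[month][k] = active / len(rows)
    return curve


curve = subscribe_curve(ENROLMENTS, as_of=date(2025, 12, 1))
for month, points in curve.items():
    print(month.strftime("%Y-%m"), points)
```

Returning None for cohorts that are too young is deliberate. A cohort that has not yet reached its checkpoint should show as unmeasured, not as a perfect or a zero retention figure.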
The new-to-brand number is your acquisition denominator
Amazon’s new-to-brand metric is underused as a cohort input. Read alongside total orders, it tells you what share of this period’s sales came from buyers the brand had probably never seen. A high new-to-brand share with flat total sales means you are acquiring and leaking in equal measure. A falling new-to-brand share with rising sales means existing demand is carrying you, which is healthy until it is stagnant. Tracked monthly, new-to-brand becomes the front edge of every acquisition cohort you build.
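The two readings above can be sketched as a simple period-over-period check. The thresholds for "flat" sales and a "high" new-to-brand share are assumptions you would tune to your own category, and the input fields are hypothetical rather than a real report schema.

```python
def ntb_share(period):
    """New-to-brand sales as a share of total sales for one period."""
    return period["ntb_sales"] / period["sales"]


def read_ntb(prev, curr, flat_band=0.05, high_share=0.5):
    """Classify a period against the prior one.

    flat_band and high_share are illustrative thresholds, not benchmarks.
    """
    growth = curr["sales"] / prev["sales"] - 1
    share_now = ntb_share(curr)
    share_change = share_now - ntb_share(prev)

    if abs(growth) <= flat_band and share_now >= high_share:
        return "acquiring and leaking"
    if growth > flat_band and share_change < 0:
        return "existing demand carrying growth"
    return "no clear read"


# Hypothetical monthly figures, in any consistent currency unit.
march = {"sales": 100.0, "ntb_sales": 60.0}
april = {"sales": 102.0, "ntb_sales": 63.0}
may = {"sales": 130.0, "ntb_sales": 65.0}

print(read_ntb(march, april))  # flat sales, high new-to-brand share
print(read_ntb(april, may))    # rising sales, falling new-to-brand share
```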
Repeat-purchase rate is the signal to defend
If you only instrument one thing, instrument repeat-purchase rate at the SKU level and watch its trend. It is the cleanest retention proxy a marketplace gives most brands, and it answers the question that actually pays. Are we building something, or are we buying the same sale again every month?
The danger is reading it as a snapshot. A 22 percent repeat rate means nothing in isolation. The same number rising across three launch cohorts means your product and your post-purchase experience are earning a second order. The same number falling while ad spend climbs is the warning that you are acquiring worse buyers, or that a competitor undercut the reorder. This is the same trap we flag when teams stare at a flattering blended figure instead of the trend, and it is why a per-SKU read matters more than an account average. We have argued the same logic from the cost side in looking at profitability one SKU at a time, and retention and margin are the two halves of whether a SKU deserves its shelf.
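To make the snapshot-versus-trend point concrete, here is a rough sketch that fits a straight-line slope to a SKU's monthly repeat rate and to its ad spend. Both sample SKUs end on the same 22 percent, and all figures and labels are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to read a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def read_repeat_trend(repeat_rates, ad_spend):
    """Label a SKU by the direction of its repeat rate and ad spend."""
    repeat_slope = trend_slope(repeat_rates)
    spend_slope = trend_slope(ad_spend)
    if repeat_slope > 0:
        return "rising: earning a second order"
    if repeat_slope < 0 and spend_slope > 0:
        return "warning: repeat falling while ad spend climbs"
    return "flat or unclear"


# Hypothetical monthly series. Both SKUs finish at a 22 percent repeat rate.
sku_a = {"repeat": [0.16, 0.18, 0.20, 0.22], "spend": [100, 100, 100, 100]}
sku_b = {"repeat": [0.28, 0.26, 0.24, 0.22], "spend": [100, 120, 140, 160]}

for name, sku in (("sku_a", sku_a), ("sku_b", sku_b)):
    print(name, read_repeat_trend(sku["repeat"], sku["spend"]))
```

A straight-line fit is a crude choice, and four months is a short series. The point is only that the label comes from the direction of travel, not from the latest value.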
Repeat rate also reframes acquisition. A SKU with a strong, rising repeat signal can justify a worse acquisition cost, because the second and third orders pay it back. A SKU that never repeats has to win on the first order or not at all. Without cohorts you cannot tell these two apart, and you end up funding both as if they were the same business.
From cohorts to the number that matters
A repeat curve is not the destination. It is the input to value. Once you have a defensible repeat-purchase trend and an average order value, you can build a rough estimate of what a marketplace buyer is worth over time, even though you can never see that buyer again. The estimate will be a range, not a point, and that is correct. It is still enough to decide whether a category is worth scaling.
This is where cohorts feed directly into estimating customer LTV on marketplaces. The repeat rate gives you the probability of a next order, the order value gives you its size, and the cohort trend tells you whether that probability is improving or decaying. Stack those and you have a lifetime-value band built entirely from signals the platform did not mean to give you. It will be coarser than a D2C model. It will also be the difference between guessing and reasoning.
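One hedged way to stack those inputs is below. It assumes the repeat rate applies as a constant probability before each further order and caps the horizon at a fixed number of orders. Both are simplifying assumptions, the figures are hypothetical, and the result is a revenue band, not margin.

```python
def expected_orders(repeat_rate, max_orders=6):
    """Expected orders per buyer over a capped horizon.

    Assumption: after every order the buyer places another with the same
    probability, so order k happens with probability repeat_rate ** k.
    """
    return sum(repeat_rate ** k for k in range(max_orders))


def ltv_band(aov, repeat_low, repeat_high, max_orders=6):
    """Low and high revenue-per-buyer estimates from a repeat-rate range."""
    low = aov * expected_orders(repeat_low, max_orders)
    high = aov * expected_orders(repeat_high, max_orders)
    return low, high


# Hypothetical inputs: Rs 500 average order value, and a repeat rate
# somewhere between 20 and 30 percent depending on the cohort.
low, high = ltv_band(aov=500, repeat_low=0.20, repeat_high=0.30, max_orders=3)
print(f"Revenue per buyer: Rs {low:.0f} to Rs {high:.0f}")
```

Feeding in the lowest and highest repeat rates seen across recent cohorts is what makes the output a range. If the cohort trend is decaying, the low end is the one to plan against.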
Cohorts also tell you where to spend retargeting effort
The cohorts that repeat well are the audiences worth chasing back. Even without customer identity, the platforms let you reach lookalikes and prior viewers, and knowing which cohort actually returns tells you which intent is worth paying to re-engage. That feeds straight into retargeting marketplace shoppers when you do not own their data, where the whole game is spending re-engagement budget on the demand most likely to convert again rather than spraying it evenly.
What changed recently
The case for reading retention sideways got stronger over the last year, because the platforms themselves stopped pretending a subscription badge equals loyalty. In February 2026 Zepto quietly shut down Zepto Daily, its loyalty and subscription programme, ahead of its IPO, and Swiggy Instamart’s own chief described the market as so irrationally competitive that customers switch platforms without any real loyalty, per Inc42. The lesson for a brand is direct. A subscribe enrolment is still your cleanest cohort, but enrolment is not retention. Track how many of each month’s subscribers are still active at three and six months, because the platform will happily sign people up and just as happily let them lapse.
The second shift is where the platforms are actually investing their reporting effort, which is the ad side. Quick commerce ad spend on Blinkit, Zepto and Instamart jumped to roughly Rs 4,000 crore in 2025 and is projected near Rs 6,000 crore in 2026, and the platforms now hand brands granular shopper signals like basket mix, order timing, locality and purchase frequency to feed that machine, as Inc42 documents. Purchase-frequency reporting is a cohort input the moment you stop reading it as a vanity number and start grouping it by acquisition window. The data exists. Whether it changes a decision is on you.
The third shift is cost. Blinkit and Zepto hiked commissions through 2025 to push toward profitability, which means the bar for a SKU to deserve its slot just went up, per Business Standard. Higher take rates make the repeat curve the deciding variable, not a nice-to-have. A SKU that only ever wins the first order cannot absorb a richer commission. One with a real, rising repeat signal can. The math we used to treat as analysis is now the difference between a profitable line and a subsidised one, which is exactly the discipline behind quick-commerce unit economics after platform fees.
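The commission argument can be sketched with a few lines of arithmetic. Every number here is hypothetical, including the two commission rates, which are placeholders and not actual platform fees. The repeat model assumes a constant repeat probability over a capped number of orders.

```python
def contribution_per_order(price, unit_cost, commission_rate):
    """What one order leaves after the platform's cut and the unit cost."""
    return price * (1 - commission_rate) - unit_cost


def profit_per_buyer(price, unit_cost, commission_rate,
                     acquisition_cost, repeat_rate, max_orders=3):
    """Contribution across expected orders, minus the cost to acquire.

    Assumption: order k happens with probability repeat_rate ** k.
    """
    orders = sum(repeat_rate ** k for k in range(max_orders))
    per_order = contribution_per_order(price, unit_cost, commission_rate)
    return per_order * orders - acquisition_cost


# Hypothetical SKU economics, in rupees.
base = dict(price=400, unit_cost=240, acquisition_cost=100)

for commission in (0.15, 0.20):  # placeholder rates, not real fees
    one_shot = profit_per_buyer(commission_rate=commission, repeat_rate=0.0, **base)
    repeater = profit_per_buyer(commission_rate=commission, repeat_rate=0.4, **base)
    print(f"commission {commission:.0%}: "
          f"first-order-only {one_shot:.1f}, repeating {repeater:.1f}")
```

Under these assumed numbers the first-order-only SKU goes from break-even to a loss when the commission rises, while the repeating SKU stays positive. Your own inputs will move the crossover point.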
Make it survive contact with leadership
The honest weakness of marketplace cohorts is that they are proxies, and proxies invite an easy dismissal. Someone in the room will say this is not real cohort data, and they will be technically correct and practically unhelpful. Pre-empt it. State the proxy plainly, show the trend over enough months that noise washes out, and tie every cohort to one decision it changed. A cohort view that does not move a budget or a launch is decoration.
Three habits keep these cohorts credible.
- Always show the trend, never the single number. One repeat rate is an opinion. Six months of the same cohort metric is evidence.
- Name the proxy out loud. Say "repeat-purchase rate as a stand-in for retention", not "retention". The honesty is what makes leadership trust the rest.
- Cut to the SKU and the channel. Blended cohorts hide the variance that is the entire reason to look. A healthy account average can sit on a hero SKU that repeats and a long tail that never does.
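The last habit is easy to show with made-up numbers. The sketch below compares a blended repeat rate with the per-SKU rates underneath it. The SKU names and counts are hypothetical.

```python
# Hypothetical per-SKU counts for one period: total orders and repeat orders.
SKUS = {
    "hero-sku": {"orders": 1000, "repeat_orders": 350},
    "tail-sku-1": {"orders": 500, "repeat_orders": 10},
    "tail-sku-2": {"orders": 500, "repeat_orders": 15},
}


def repeat_rate(orders, repeat_orders):
    """Repeat orders as a share of all orders."""
    return repeat_orders / orders


# Blended view: pool every SKU into one account-level figure.
blended = repeat_rate(
    sum(s["orders"] for s in SKUS.values()),
    sum(s["repeat_orders"] for s in SKUS.values()),
)

# Per-SKU view: the variance the blended figure hides.
per_sku = {
    name: repeat_rate(s["orders"], s["repeat_orders"])
    for name, s in SKUS.items()
}

print(f"blended: {blended:.1%}")
for name, rate in per_sku.items():
    print(f"{name}: {rate:.1%}")
```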
Getting this in front of decision-makers without burying them in tabs is its own craft, and it is exactly what we mean by a dashboard leadership will actually read. The cohort table is useless if it lives in a spreadsheet nobody opens.
The short version
Not owning the customer does not mean you cannot see retention. It means you have to read it sideways, from new-to-brand share, subscribe curves, and repeat-purchase trends grouped by how and when demand was acquired. Those proxies will never be as clean as a D2C cohort grid. They are clean enough to tell you which products earn a second order, which promos buy disposable buyers, and which categories deserve more money.
Our Analytics & Reporting work exists to assemble exactly these signals into cohorts a brand can act on, and our Marketplace Performance teams use them to decide where acquisition spend actually compounds. The customer is hidden from you. The curve is not. Build on the curve.