Vernacular and Voice Search: How Real India Searches Marketplaces

Sit next to a first-time online shopper in a tier-2 town and watch how she actually finds a product. She does not type the clean, dictionary English your keyword sheet assumes. She types “kurti for ladies cotton”, or she taps the mic and says the word out loud in a mix of Hindi and English, or she half-spells a brand the way she heard it, not the way it is registered. The intent is real. The wallet is open. And on most listings, that search returns the wrong product or nothing at all.

This is the quiet gap in Indian catalogue work. Brands and agencies build keyword sets in fluent, urban English, then wonder why a chunk of obvious demand never converts. The demand did not vanish. It is searching in a language your listing does not speak. Bharat shops in Hinglish and increasingly by voice, and an English-only listing is invisible to a pool of buyers who are ready to buy right now.

How real India actually types and talks

The mental model of a single clean search term is wrong for most of the country. A buyer searching for sandals might phrase it in several ways in the same week, and the platform treats each as a different query. You are not optimising for one keyword. You are optimising for the messy reality of how the word arrives.

  • Hinglish, transliterated. Hindi or regional words typed in Roman script. Chappal, jhumka, chunni, kadai, dupatta. The buyer never reaches for the English equivalent because the Hindi word is the real word in her head.
  • English spelled by ear. Lehnga, kurtis, payjama, nighty. Spellings that no style guide approves but that thousands of buyers actually enter.
  • Voice, in full sentences. Spoken queries are longer and more conversational. “Show me red cotton saree under 500” arrives whole, not as tidy tokens. Voice is how a buyer who is not confident typing on a small keyboard gets to the product.
  • Mixed-script and code-switched. “Cotton wali saree”, “kids ke liye shoes”. English nouns wrapped in Hindi connective tissue. This is the default register for a huge number of shoppers, not an edge case.

None of this is sloppy searching. It is the natural language of the buyer. Treating it as noise to be ignored is how you hand that demand to whoever bothered to capture it.

The buyer is not searching wrong. Your keyword set is listening in the wrong language.

Why English-only keyword sets bleed demand

Marketplace search is a literal matching engine before it is anything clever. If the buyer types chappal and your title, bullets, and backend terms only ever say slippers, the platform has little reason to surface you for that query. You are not outranked. You are absent. And absence does not show up in your reports as a loss, which is exactly why it goes unfixed for so long.

This is the same trap we keep flagging in listing keyword research for Indian marketplaces. Borrowing a Google SEO mindset, or worse a global English keyword set, produces clean terms that read well to a brand manager in a metro office and miss how the country actually searches. The platform does not reward grammar. It rewards match.

The cost compounds in exactly the markets you most want to grow. A confident urban buyer will often code-switch into English to get a result. A first-time buyer in a smaller town will not. She searches in her own words and accepts the first relevant thing she finds. If your listing does not speak her language, you are systematically losing the newer, faster-growing customer. That is the buyer we map in our piece on tapping tier-2 and tier-3 demand, and vernacular coverage is the unglamorous mechanism that unlocks her. The scale here is no longer marginal. Bain’s How India Shops Online 2026 report puts tier-2 and smaller cities at roughly half of all incremental orders in 2025, even though shopper penetration there still trails the metros. The next wave of buyers is already in the funnel, and most of them do not search in textbook English.

Voice search changes the shape of the query, not just the input

Voice is not typing with your mouth. It changes what the query looks like. Spoken searches are longer, more natural, and often phrased as a full request with constraints baked in. A typed query might be two words. The voice version of the same intent is a sentence with a colour, a fabric, and a price ceiling.

That has a direct consequence for catalogue work. Listings optimised only for short head terms will under-match long, conversational queries. The fix is not to stuff your title. It is to make sure the natural-language phrases a buyer would speak appear somewhere the platform indexes, in backend search terms, in bullets, in honest descriptive copy. You are widening the surface area of how the listing can be matched, not making it louder.

What this looks like in practice

Concretely, vernacular and voice coverage means a few disciplined habits applied to every SKU.

  • Map the local word for the product, not just the catalogue word. If buyers say jhumka, the listing needs jhumka, not only drop earrings.
  • Capture the common misspellings and ear-spellings in backend terms where they do no harm to the visible copy.
  • Write at least one bullet in the plain, spoken phrasing a real buyer would use, so conversational and voice queries find a match.
  • Include the Hinglish connectors buyers actually attach, like wali, ke liye, for ladies, where they read naturally.
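
These habits can be checked mechanically rather than by eye. Here is a minimal sketch in Python, assuming a hypothetical listing record (title, bullets, backend terms) and a hand-built map from the catalogue word to the forms buyers type. The field names, the sample spellings, and the assumption that the platform indexes all three fields are illustrative, not any marketplace's actual schema.

```python
import re

# Hypothetical listing record; field names are assumptions, not a platform schema.
listing = {
    "title": "Women's Gold-Plated Drop Earrings",
    "bullets": ["Lightweight drop earrings for ladies, festive wear"],
    "backend_terms": "earrings for women traditional",
}

# Catalogue word -> vernacular and ear-spelled forms (illustrative examples only).
VARIANTS = {
    "drop earrings": ["jhumka", "jhumki", "jumka"],
    "for women": ["for ladies", "ladies ke liye"],
}


def normalise(text):
    """Lowercase and reduce text to space-separated alphanumeric words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def indexed_text(item):
    """Join every field the platform is assumed to index."""
    parts = [item["title"], *item["bullets"], item["backend_terms"]]
    return normalise(" ".join(parts))


def missing_variants(item, variant_map):
    """Return, per catalogue word, the buyer forms found nowhere in the listing."""
    haystack = f" {indexed_text(item)} "
    gaps = {}
    for catalogue_word, forms in variant_map.items():
        absent = [f for f in forms if f" {normalise(f)} " not in haystack]
        if absent:
            gaps[catalogue_word] = absent
    return gaps


gaps = missing_variants(listing, VARIANTS)
```

Run across a catalogue, the output is a per-SKU list of words to place in backend terms or a bullet. It says nothing about whether the visible copy still reads well, which stays a human call.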

The line between coverage and keyword stuffing

This is where it goes wrong if you are careless. Vernacular coverage is not licence to cram a hundred transliterated terms into a title. A title that reads like a search dump kills trust the instant a buyer sees it, and it drags down the one thing that actually closes the sale. We are blunt about this in the catalogue mistakes that quietly kill conversion. A keyword the buyer finds but then bounces from is worth less than no keyword at all.

The discipline is simple to state and harder to hold. Visible copy stays clean, human, and readable. The vernacular and long-tail breadth lives in the backend search fields, in genuinely useful bullets, and in honest description text. You expand what the listing can be found for without degrading what the buyer sees when they arrive. Coverage and conversion are not in tension when you put each in its right place.

And coverage only matters if the landing experience holds up. Surfacing for “cotton wali saree” is wasted if the image and the first bullet do not immediately confirm the buyer found the right thing. The instinct behind our advice to test the image, not the bullet, applies here too. Vernacular search gets the right buyer to the door. The listing still has to close.

How to find the words your buyers actually use

You do not guess vernacular terms from a metro desk. You harvest them from where buyers already reveal them. The raw material is sitting in plain sight if you go looking.

  1. Your own search-term reports. The platform tells you the exact strings that led to a sale or a click. The Hinglish and misspelled entries are right there, already proven to convert. Start with what the data hands you.
  2. The platform search bar autosuggest. Type the product and watch what India is already searching. The suggestions are a free, live map of real phrasing, including the vernacular forms.
  3. Question and review language. How buyers describe the product in their own reviews and questions is how they will search for it. Mine that vocabulary directly.
  4. Read it aloud. Say the product the way a buyer would speak it into the mic. If your listing contains none of those spoken phrases, you have found your gap.
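
The first source, your own search-term report, lends itself to a quick script. The sketch below assumes a hypothetical export of query strings with order counts (real report columns differ by platform) and totals the orders behind every query word the listing text never uses.

```python
import re
from collections import Counter

# Hypothetical rows from a search-term report: (query string, orders).
report = [
    ("cotton wali saree", 14),
    ("red cotton saree under 500", 9),
    ("saree for ladies", 6),
    ("sari cotton", 4),
]

# Hypothetical visible and backend copy, flattened into one string.
listing_text = "Pure Cotton Saree for Women, Red, with Blouse Piece"


def tokens(text):
    """Lowercase alphanumeric words in the text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def uncovered_terms(rows, text):
    """Total the orders behind each query word the listing text does not contain."""
    covered = set(tokens(text))
    totals = Counter()
    for query, orders in rows:
        for word in set(tokens(query)) - covered:
            totals[word] += orders
    return totals.most_common()


harvest = uncovered_terms(report, listing_text)
```

The result is a ranked shortlist, not a to-do list. Words like "wali", "ladies" and "sari" are coverage gaps worth placing. Words like "under" and "500" are price constraints that no keyword field will satisfy, so a person still has to read the list before anything goes into the listing.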

This is patient, unglamorous catalogue work, and it is exactly the kind of edge that does not show up in a flashy deck but does show up in sales from markets your competitors wrote off. The brands that win the next wave of Indian buyers are not the ones with the cleanest English. They are the ones whose listings answer the buyer in her own words.

What changed recently: AI discovery raises the stakes

The vernacular gap used to be a search-bar problem. It is now a discovery problem across a wider surface, because the buyer increasingly asks a chatbot instead of typing two words into a marketplace box. India is now the second-largest market for ChatGPT, with the user base growing roughly four and a half times in 2025 to more than 160 million monthly users, and Bain’s How India Shops Online 2026 report names conversational commerce one of the two trends reshaping Indian e-retail, alongside quick commerce. Early use is still mostly research and comparison rather than checkout, but the discovery moment is already moving.

The platforms have noticed. Per Business Standard, Amazon India began testing chatbot-focused search optimisation in select categories after the Diwali sale, and Flipkart has been in talks with firms specialising in generative engine optimisation, the practice of structuring listings so AI assistants like ChatGPT, Perplexity and Gemini select and recommend them. The thing those models reward is not new to anyone who has done this work properly. It is buyer-language alignment, contextual completeness and honest, verifiable detail. In other words, the same discipline that wins vernacular and voice search wins AI discovery too.

This is the part to internalise. A listing written only in clean metro English was already invisible to a chunk of typed and spoken demand. Now it is also thin material for the AI layer a growing share of buyers ask first. The fix does not change. Cover the words real buyers actually use, place breadth where the index lives and not where the buyer is put off, and keep the visible copy human. The brands doing that today were ready for voice, and they are ready for whatever asks the question next.

The short version

Real India searches in Hinglish, in ear-spelled English, and increasingly by voice in full spoken sentences, and a growing share now asks an AI assistant before they ever touch the marketplace search bar. An English-only keyword set cannot see most of that demand, and the loss never appears in your reports because absence is invisible. The fix is not louder titles. It is deliberate coverage of the words buyers actually use, placed where the platform indexes but the buyer is not put off, paired with a listing that still converts once they arrive.

This is the heart of our Catalog & Listing Optimization work, and it sits alongside Marketplace Performance and Conversion Rate Optimization for a reason. Getting found in vernacular, voice and now AI-led discovery is only half the job. Closing the buyer once she lands is the other half. Speak the buyer’s language, then earn the click. The demand has been waiting for a listing that listens.

A Catalog Data Quality Score Your Whole Team Can Rally Around

Ask three people on a brand team how good the catalog is and you get three answers. The category manager says it is fine. The performance lead says it is the reason ads underperform. The founder has not looked in months. Everyone has an opinion and nobody has a number. That gap is where listing debt lives, quietly, for quarters at a time. The fix is not another audit deck that gets read once and forgotten. It is a single score, calculated the same way every week, that the whole team can rally around.

We are not talking about a vanity metric. We mean a catalog data quality score that is decomposable into fixable parts, owned by named people, and tracked over time like any other operating number. Once you have it, vague complaints about the catalog turn into a backlog with line items. That shift, from feeling to figure, is the entire point.

Why listing debt stays invisible

The trouble with a broken catalog is that nothing throws an error. A listing with a blank material field is live. A product with three images instead of seven still ranks, just lower. A size chart that does not match Indian fit still sells, just with more returns. None of this trips an alarm. The dashboard says complete. So the debt compounds in silence, and the only signal you get is a slow, unattributable drag on conversion and discoverability.

We have walked through this in detail before, because so much of the damage hides in fields buyers never consciously read. If you have not seen how backend attributes and image order quietly bleed performance, start with our breakdown of the catalog mistakes that kill conversion. The scoring system in this piece is the operational answer to that diagnosis. It takes the qualitative problems and makes them countable.

A catalog without a score is not a healthy catalog. It is an unmeasured one, which is a very different thing.

What a good score actually measures

A score is only useful if it maps to things a person can change this week. We avoid a single opaque number that nobody can decompose. Instead we build the score from weighted components, each one a concrete dimension of listing health. The weights shift by category, but the skeleton holds across Amazon, Flipkart, Myntra, and the quick-commerce platforms.

  • Attribute completeness. What share of the category’s available structured fields are filled and valid. This is the engine of on-platform discovery, so it carries heavy weight.
  • Image coverage and sequence. Whether the listing has enough images, in the right order, obeying the platform’s main-image rules. A hero shot plus six supporting frames scores higher than two stray photos.
  • Content depth. Title, bullets, and description present, on-spec, and free of the obvious failures like missing keywords or banned characters.
  • Variation integrity. Whether parent-child structure is correct so reviews and ranking signals pool instead of fragmenting.
  • Compliance and stability. GST and GTIN configured, MRP consistent, inventory signals reliable, no suppression flags.
  • Enhanced content presence. A+ content or rich media where the category and margin justify it.

Each listing gets a sub-score per component, and the components roll up into one catalog-level number. The detail is what makes it actionable. A catalog at 72 is not just a 72. It is 94 on content, 51 on attributes, and 60 on images, which tells you exactly where the week’s work goes.
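
The roll-up itself is a weighted average. In the sketch below, the content, attribute and image sub-scores are the ones quoted above. The weights and the other three sub-scores are invented for illustration, chosen so the total lands on 72. Real weights would be set per category.

```python
# Hypothetical weights in percent; real weights shift by category.
WEIGHTS = {
    "attributes": 30,
    "images": 20,
    "content": 20,
    "variation": 10,
    "compliance": 15,
    "enhanced_content": 5,
}


def catalog_score(sub_scores, weights=WEIGHTS):
    """Weighted average of 0-100 component sub-scores."""
    if set(sub_scores) != set(weights):
        raise ValueError("sub_scores must cover exactly the weighted components")
    total_weight = sum(weights.values())
    return sum(weights[name] * sub_scores[name] for name in weights) / total_weight


def weakest_components(sub_scores, count=2):
    """Names of the lowest-scoring components, worst first."""
    return sorted(sub_scores, key=sub_scores.get)[:count]


example = {
    "attributes": 51,        # from the example above
    "images": 60,            # from the example above
    "content": 94,           # from the example above
    "variation": 85,         # invented for illustration
    "compliance": 90,        # invented for illustration
    "enhanced_content": 78,  # invented for illustration
}

score = catalog_score(example)
focus = weakest_components(example)
```

Keeping the sub-scores next to the headline number is the design choice that matters. The 72 tells you how the catalog is doing, and the two weakest components tell you what to do about it.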

Keep the rubric ruthlessly objective

The fastest way to kill a scoring system is to make it subjective. If two reviewers can look at the same listing and disagree on its score, the number is dead on arrival. So every check must be binary or counted, never judged. Attribute filled or blank. Image present or not. Seven images or four. Resist the urge to score copy quality on a feel-based scale. You can grade whether keywords from your research are present, which is checkable, but not whether the prose is elegant. Note that on-platform keyword logic is its own discipline, distinct from web search, and your scoring rules should reflect that as we argue in our piece on listing keyword research for Indian marketplaces.
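
One way to hold that line is to write every rule as a function that returns true or false. This is a minimal sketch. The required attributes, the seven-image target and the keyword list are hypothetical stand-ins for whatever your category and research specify.

```python
# Hypothetical rubric inputs; swap in your category's fields and research output.
REQUIRED_ATTRIBUTES = ["material", "colour", "occasion"]
MIN_IMAGES = 7
RESEARCH_KEYWORDS = ["kurti", "cotton"]


def run_checks(listing):
    """Return {check_name: passed}. Every check is binary or counted, never judged."""
    attrs = listing.get("attributes", {})
    text = " ".join([listing.get("title", ""), *listing.get("bullets", [])]).lower()
    checks = {f"attribute:{name}": bool(attrs.get(name)) for name in REQUIRED_ATTRIBUTES}
    checks["images:count"] = len(listing.get("images", [])) >= MIN_IMAGES
    for keyword in RESEARCH_KEYWORDS:
        checks[f"keyword:{keyword}"] = keyword in text
    return checks


def pass_rate(checks):
    """Share of checks passed, as a whole number out of 100."""
    return round(100 * sum(checks.values()) / len(checks))


sample = {
    "title": "Women's Cotton Kurti",
    "bullets": [],
    "attributes": {"material": "cotton", "colour": ""},
    "images": ["1.jpg", "2.jpg", "3.jpg", "4.jpg"],
}

results = run_checks(sample)
sample_score = pass_rate(results)
```

Two reviewers running this on the same listing get the same answer, which is the property the rubric needs. Note that the keyword check only tests presence. It makes no claim about whether the copy reads well.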

From score to assignable backlog

A number on a slide changes nothing. The score earns its keep when it generates a queue of work. The mechanism is simple. Every listing below the threshold on a given component produces a task, and that task has an owner, a fix, and a point value equal to the score it will recover.

This reframes the whole conversation. Instead of a manager saying the catalog needs improvement, the standup says there are forty listings missing the occasion attribute, that is six points of catalog health, and it is assigned to the content team for Thursday. Listing debt becomes a sprint backlog. People can see what they own and what it is worth. The score moving up each week is the proof that the work mattered.
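
Turning failed checks into that kind of line item takes a few lines. The sketch below makes two assumptions that are ours, not a platform rule: the catalog score is the unweighted share of all checks passed, and owners are routed by the prefix of the check name.

```python
from collections import defaultdict

# Hypothetical routing of check families to owning teams.
OWNERS = {"attribute": "content team", "images": "studio team"}


def build_backlog(results):
    """results: {sku: {check_name: passed}}. Returns one task per failing check."""
    failing = defaultdict(list)
    for sku, checks in results.items():
        for name, passed in checks.items():
            if not passed:
                failing[name].append(sku)
    total_checks = sum(len(checks) for checks in results.values())
    tasks = []
    for name, skus in failing.items():
        tasks.append({
            "check": name,
            "skus": sorted(skus),
            "owner": OWNERS.get(name.split(":")[0], "unassigned"),
            # Catalog points recovered if every listed SKU passes this check,
            # under the unweighted-score assumption stated above.
            "points": round(100 * len(skus) / total_checks, 1),
        })
    return sorted(tasks, key=lambda task: task["points"], reverse=True)


check_results = {
    "SKU-1": {"attribute:occasion": False, "images:count": True},
    "SKU-2": {"attribute:occasion": False, "images:count": False},
    "SKU-3": {"attribute:occasion": True, "images:count": True},
}

backlog = build_backlog(check_results)
```

Each task carries the three things the standup needs: which SKUs, who owns the fix, and how many points it is worth. A due date would be added wherever the team already tracks work.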

Prioritisation falls out naturally too. You do not fix the catalog alphabetically. You fix the highest-revenue listings with the lowest scores first, because that is where recovered points convert to recovered sales fastest. A cheap, low-traffic SKU at 40 can wait. A hero product at 65 cannot.
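
A simple way to encode that ordering is to rank by revenue multiplied by the share of score still missing. This is one heuristic among several, and the SKUs and revenue figures below are invented for illustration.

```python
def priority(listing):
    """Monthly revenue times the fraction of the score still missing."""
    return listing["monthly_revenue"] * (100 - listing["score"]) / 100


# Hypothetical listings; revenue in rupees per month.
listings = [
    {"sku": "LOW-TRAFFIC-01", "monthly_revenue": 8_000, "score": 40},
    {"sku": "HERO-01", "monthly_revenue": 900_000, "score": 65},
    {"sku": "MID-01", "monthly_revenue": 120_000, "score": 80},
]

queue = sorted(listings, key=priority, reverse=True)
order = [item["sku"] for item in queue]
```

With these numbers the hero product at 65 goes to the front of the queue and the low-traffic SKU at 40 goes to the back, even though its score is far worse.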

How the score connects to revenue

The objection we hear is fair. Is a catalog score just hygiene theatre, or does the number actually move money? The honest answer is that the score is a leading indicator, not a guarantee. A higher score does not promise more sales the way a discount does. What it does is remove the structural reasons a listing cannot convert, which is a precondition for everything downstream.

This is why the catalog score and your conversion work belong on the same table. Once a listing is structurally sound, the real optimisation begins, and that is a different experiment entirely. We are firm that the highest-leverage test is usually the image, not the bullet, which we make the case for in our argument on conversion rate optimisation for listings. The score gets you to the start line. CRO is the race after it.

The connection to revenue becomes legible when you put the score next to outcomes leadership already watches. Track catalog health alongside conversion rate and ad efficiency on the same view, and the correlation tells its own story over a few months. If you are building that view, the principles for a report executives will actually open carry over directly from our take on a marketplace reporting dashboard leadership will read.

Running the score as a habit, not a project

The most common failure is treating the score as a one-time cleanup. The team rallies, the number jumps from 68 to 88 over a month, everyone celebrates, and then it drifts back down. New SKUs launch with half their attributes blank. Platform schema changes add fields nobody fills. Entropy is the default state of a catalog.

So the score has to be a recurring measurement with a standing owner, not a quarterly heroics exercise. The cadence that holds in practice:

  1. Recalculate the catalog score on a fixed weekly schedule, automatically where the platform data allows.
  2. Set a non-negotiable launch threshold so no new listing goes live below a minimum score.
  3. Review the component breakdown in the weekly operating meeting, not a separate catalog meeting nobody attends.
  4. Convert every gap into an owned task with a point value and a due date.
  5. Track the trendline, not the snapshot, so you catch drift before it becomes debt again.
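
Steps 2 and 5 are the easiest to automate. The sketch below uses a hypothetical launch threshold of 80 and a hypothetical drift tolerance of 3 points. The weekly history starts from the 68-to-88 climb described above and then adds an invented slide.

```python
LAUNCH_THRESHOLD = 80  # hypothetical minimum score for a new listing to go live
DRIFT_TOLERANCE = 3    # hypothetical: points lost from the peak that trigger a review


def can_launch(listing_score, threshold=LAUNCH_THRESHOLD):
    """Launch gate: a new listing goes live only at or above the threshold."""
    return listing_score >= threshold


def drift_from_peak(weekly_scores):
    """Points lost between the highest score seen and the latest reading."""
    if not weekly_scores:
        raise ValueError("need at least one weekly score")
    return max(weekly_scores) - weekly_scores[-1]


def needs_review(weekly_scores, tolerance=DRIFT_TOLERANCE):
    """True when the latest score has slipped from the peak by the tolerance or more."""
    return drift_from_peak(weekly_scores) >= tolerance


# Weekly catalog scores: the climb from 68 to 88, then an invented decline.
history = [68, 74, 81, 88, 87, 86, 84]

drift = drift_from_peak(history)
flagged = needs_review(history)
```

Comparing against the peak rather than last week's number is deliberate. A catalog that loses one point a week never looks alarming week to week, but it is four points down before anyone notices.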

What changed recently

Two shifts in the last year make the score harder to treat as optional. The first is on Amazon itself. The platform has moved required attributes from a soft suggestion to an enforced gate, expanding the structured fields you must supply to create or edit a listing and tightening attribute usage and enumeration values across product types in its listing requirement changes. Translation for your rubric: attribute completeness is no longer just a discovery lever you choose to pull. It is increasingly a precondition for the listing existing in valid form at all, which means the weight you put on that component should go up, not down.

The second shift is where the money is moving, and it is the stronger argument for taking catalog quality seriously this year. Quick-commerce platforms have turned into serious ad networks, and a listing that is not structurally complete cannot earn the placements brands are now paying hard for. Zepto’s advertising revenue grew about 151 percent to roughly Rs 1,636 crore in FY26, per figures in its draft prospectus reported by Storyboard18. A Datum Intelligence estimate cited by Storyboard18 projects that Blinkit, Zepto, and Instamart together could pull nearly Rs 4,900 crore in advertising revenue in 2026. FMCG and impulse brands are said to be shifting between 10 and 25 percent of their digital performance budgets onto these platforms. When that much spend rides on a listing, a blank attribute or a missing image is not a hygiene problem. It is wasted media against an incomplete product page. The catalog score is what stops you from buying traffic to a listing that was never ready to convert it. If you are deciding where that spend goes first, our view on quick-commerce unit economics after platform fees is the companion read.

This is the unglamorous discipline behind Catalog & Listing Optimization, and it is deliberately mechanical. The score does not need to be clever. It needs to be consistent, objective, and visible enough that the whole team trusts it. Pair it with steady Marketplace Account Management so the launch threshold actually gets enforced, and with Marketplace SEO so the discoverability gains from a complete catalog show up where buyers search.

The teams that win at marketplace catalogs are not the ones with the most opinions about quality. They are the ones who turned quality into a number, gave the number an owner, and watched it climb. Give your catalog a score this week. The debt you have been ignoring will finally have a name.
