Store URL Architecture Overview

Here’s a scenario a lot of store owners recognize: you’ve put real work into your product pages, your copy is solid, and you’ve done everything right on the content side. And yet your rankings just won’t stabilize. Pages you want to rank keep getting overlooked while Google seems strangely fixated on URL versions you never intended to prioritize.

The frustrating truth is that this usually isn’t a content problem at all. It’s a structure problem. When your store generates multiple URLs that point to the same or very similar content, Google has to make judgment calls about which version to treat as the “real” one. And Google doesn’t always guess in your favor. While that duplication exists (and whenever Google consolidates to the wrong version), your ranking signals are split across multiple competing pages instead of concentrated where you want them.

There’s another quiet issue that makes this worse: inconsistency. URL paths are case-sensitive, meaning /Apple and /apple are two completely different URLs in Google’s eyes. At scale, that alone can create a mess of duplicate candidates without you realizing it. On top of that, Google recommends hyphens (-) instead of underscores (_) in URLs because hyphens are read as word separators while underscores aren’t. Add filters, sort options, and tracking parameters into the mix and you’ve got a catalog where ranking authority fragments instead of builds.
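
To make that concrete, here’s a minimal TypeScript sketch of the kind of normalization a storefront or middleware layer can enforce before a URL ever goes live. The tracking-parameter list and example URLs are invented for illustration; adapt them to your own stack.

```typescript
// Minimal URL normalizer sketch. The tracking parameters listed here are
// common examples (utm_*, gclid, fbclid); your own list will differ.
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"];

function normalizeStoreUrl(raw: string): string {
  const url = new URL(raw);

  // Paths are case-sensitive to Google: /Apple and /apple are different URLs.
  url.pathname = url.pathname.toLowerCase();

  // Hyphens, not underscores, so slugs are read as separate words.
  url.pathname = url.pathname.replace(/_/g, "-");

  // Strip tracking parameters so they never create crawlable variants.
  for (const param of TRACKING_PARAMS) url.searchParams.delete(param);

  return url.toString();
}

// Both variants collapse to the same canonical candidate:
console.log(normalizeStoreUrl("https://example.com/Mens_Boots?utm_source=mail"));
console.log(normalizeStoreUrl("https://example.com/mens-boots"));
// -> https://example.com/mens-boots (both)
```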

The good news is that all of this is fixable once you understand what’s happening. Think of your URL structure as an SEO system rather than a cosmetic detail, and the goal becomes simple: make one URL the obvious choice every single time so Google never has to guess.

How Google Actually Crawls Your Store

One thing worth understanding right away is that Google crawls URL by URL, not store by store. It doesn’t see your brand vision or your intended hierarchy. It sees a list of addresses, and it has to decide which ones to visit, how often, and which ones actually matter.

The problem for eCommerce stores is that a typical platform quietly multiplies URLs through categories, product variants, search pages, marketing tags, and filter states. Every one of those extra URLs is competing for Google’s attention, even when they’re all pointing at essentially the same inventory.

Your site navigation and internal linking work like a priority map for Google. Pages that are easy to reach through strong, consistent links get crawled more often and are more likely to make it into the index. Pages that are buried deep or only reachable through weak paths get crawled less and struggle to compete. On a large catalog, this matters enormously.

Google has what’s called a crawl budget: basically a limit on how much of your site it will crawl in a given period. Bigger and more frequently updated sites feel this constraint most. If Google is burning through your crawl budget on hundreds of near-identical filter URLs, it’s spending less time on the product and category pages you actually want ranked.

XML sitemaps are one of the best tools you have here. They don’t replace good internal linking, but they give Google a clear list of the URLs you consider important, which helps cut through the noise when your platform is generating lots of alternatives.

One more thing to watch for: orphan pages. These are URLs that exist on your site but have no internal links pointing to them. Google effectively can’t find them, which means revenue-driving pages can be technically live but practically invisible in search.
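
If you want a rough way to surface orphan candidates, a sketch like this works: diff the URLs you know should exist (from a sitemap or product export) against the URLs a crawl actually reached through internal links. The input shapes here are assumptions; plug in whatever your crawl tool exports.

```typescript
// Hypothetical inputs: `knownUrls` from your sitemap or product export,
// `linkedUrls` from a site crawl that followed internal links only.
function findOrphans(knownUrls: string[], linkedUrls: string[]): string[] {
  const linked = new Set(linkedUrls.map((u) => u.toLowerCase()));
  // A known URL that no internal link reaches is an orphan candidate.
  return knownUrls.filter((u) => !linked.has(u.toLowerCase()));
}

const orphans = findOrphans(
  ["https://example.com/products/red-widget", "https://example.com/products/old-widget"],
  ["https://example.com/products/red-widget"],
);
console.log(orphans); // -> ["https://example.com/products/old-widget"]
```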

Three things to focus on:

  1. Reinforce internal links to your highest-revenue categories and core product pages from your navigation, category hubs, and relevant on-page content.
  2. Audit your XML sitemaps to make sure they only list the URLs you actually want indexed, not duplicates or dead ends.
  3. Reduce duplicate URL generation by limiting how many crawlable versions filter, sort, and tracking parameters create.

Once you start looking at your store through that lens, one culprit shows up more than almost anything else: faceted navigation that turns a small set of category pages into thousands of URL variations.

Filters, Facets, and the Duplicate URL Problem

Faceted navigation (those filter sidebars that let shoppers narrow by size, color, price, brand, and so on) is genuinely useful for shoppers. For SEO, it can be a disaster if left unmanaged. A modest product catalog with a handful of filter options can quietly generate millions of unique URLs, and Google will treat all of them as part of your architecture unless you take control.

[Image: Googlebot Crawling Paths]

The failure pattern is predictable: your category page gets re-published in thousands of slightly different versions depending on which filters are active. Sort orders, filter combinations, and view toggles all reshuffle the same product set. Google recognizes them as near-duplicates, picks one to index, and consolidates the rest. You lose control over which version gets the credit.

It’s also worth knowing that the old Search Console URL Parameters tool that used to help manage this is gone. Google deprecated it in March 2022 and it was fully shut down by late April 2022. Any strategy that still references it is out of date.

The approach that works is a whitelist: only allow a facet combination to be indexed if it genuinely earns it. That means checking all four of these boxes:

  • Real search demand: people are actually searching for this specific combination (like “men’s waterproof hiking boots”).
  • Unique inventory cut: the filtered results meaningfully change what’s being shown, not just the order.
  • Stable over time: the combination won’t empty out or change dramatically from week to week.
  • Merchandisable: you can add real content to the page, like a heading, a short description, or an FAQ, that makes it a genuine destination rather than just a thin product grid.

Most filter combinations don’t meet this bar. Price sliders, multi-select size and color mixes, and hyper-specific attribute stacks create near-duplicate pages at massive scale without adding any real search value. These are typically set to noindex, while the small number of high-intent facet pages are made indexable intentionally.
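
To illustrate how those four checks might be encoded in a facet-management layer, here’s a hedged TypeScript sketch. The fields and the demand threshold are invented for the example, not a standard.

```typescript
// Invented shape for a facet combination; adapt to your own analytics data.
interface FacetPage {
  monthlySearchDemand: number; // e.g. from keyword research
  changesResultSet: boolean;   // changes *what* is shown, not just the order
  stableInventory: boolean;    // stays populated week to week
  hasMerchandising: boolean;   // heading, description, FAQ, etc.
}

// All four boxes must be checked before a facet page earns indexation.
function shouldIndexFacet(page: FacetPage, minDemand = 100): boolean {
  return (
    page.monthlySearchDemand >= minDemand &&
    page.changesResultSet &&
    page.stableInventory &&
    page.hasMerchandising
  );
}

// "men's waterproof hiking boots": demand + unique cut + stable + merchandised
console.log(shouldIndexFacet({
  monthlySearchDemand: 800,
  changesResultSet: true,
  stableInventory: true,
  hasMerchandising: true,
})); // -> true
```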

Here’s a breakdown of the different levers you can use and what they actually do:

  • rel=canonical targeting
    Indexation outcome: consolidates duplicates to a preferred URL when pages are truly similar; the hint is ignored when pages are too dissimilar.
    Crawl impact: keeps alternates discoverable; does not, by itself, reduce crawling of heavily linked variants.

  • meta robots noindex
    Indexation outcome: prevents the URL from being indexed.
    Crawl impact: noindex does not prevent crawling. If variants stay prominent in internal links or XML sitemaps, they still consume crawl budget.

  • robots.txt disallow
    Indexation outcome: blocks crawling of matching URL patterns.
    Crawl impact: a direct crawl-budget lever for parameter patterns you never want crawled; use carefully, because blocked URLs cannot pass on-page signals via crawling.

  • Internal link pruning
    Indexation outcome: reduces discovery and importance of low-value variants.
    Crawl impact: directly reduces crawl demand by removing repeated links to endless permutations (filters, sort options, “view all” states).

  • Sitemap inclusion or exclusion
    Indexation outcome: signals what you consider canonical, index-worthy inventory.
    Crawl impact: keeps crawl focus on whitelisted facet templates; sitemap-listed facet URLs will keep getting crawled even if they are noindex.

  • Site design constraints
    Indexation outcome: prevents creation of junk URLs in the first place.
    Crawl impact: limits combinatorial explosion by restricting multi-select facets, collapsing “sort” to client-side state, or enforcing a small set of prebuilt filtered landing pages.

The policy that works is straightforward: index a small, intentional set of facet pages and treat everything else as throwaway.

  1. Whitelist the few facet combinations with proven, stable demand and real merchandising potential, and include only those in your XML sitemap.
  2. Canonicalize near-duplicates back to the closest true category or approved facet page, and noindex low-value filters that need to exist for users but don’t deserve to rank.
  3. Remove internal links to non-whitelisted variants and keep them out of your sitemap so crawl budget stays focused on products and revenue-driving categories.
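
One way to operationalize that three-step policy is a single function that every category and facet template calls to decide its directives. This is a sketch under assumed classifications, not platform-specific code; note that a noindexed page keeps a self-canonical, because mixing noindex with a cross-URL canonical sends conflicting signals.

```typescript
type FacetClass = "whitelisted" | "near-duplicate" | "low-value";

interface Directives {
  metaRobots: "index,follow" | "noindex,follow";
  canonicalHref: string; // what <link rel="canonical"> should point at
  inSitemap: boolean;
}

// `categoryUrl` is the closest true category page for this filtered view.
function directivesFor(cls: FacetClass, selfUrl: string, categoryUrl: string): Directives {
  switch (cls) {
    case "whitelisted":
      // Earns indexation: self-canonical plus a sitemap entry.
      return { metaRobots: "index,follow", canonicalHref: selfUrl, inSitemap: true };
    case "near-duplicate":
      // Same inventory as the parent: consolidate with a canonical.
      return { metaRobots: "index,follow", canonicalHref: categoryUrl, inSitemap: false };
    case "low-value":
      // Needed for users but shouldn't rank: noindex, self-canonical, no sitemap.
      return { metaRobots: "noindex,follow", canonicalHref: selfUrl, inSitemap: false };
  }
}
```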

With facets under control, the next thing to check is pagination, which is simpler but just as easy to get wrong.

Pagination: Making Sure Google Can Reach Your Full Catalog

Pagination sounds boring, but it has a direct impact on how many of your products Google can actually find. If Google can’t reach page 5 of a category, it can’t consistently discover and index the products living there. Those SKUs simply stop contributing to your visibility.

[Image: Facets and Duplicate URL Chaos]

One thing to clear up right away: the old rel=prev/rel=next tags that used to signal a paginated series are no longer an indexing signal at all. Google confirmed in 2019 that it had already stopped using them years earlier, so paginated pages need to earn their place through proper crawlable links and self-contained page structure.

The standard to aim for: every page in a paginated series needs its own unique URL, and that URL must be reachable through a real HTML link, not just a JavaScript button. If page 3 can only be reached by clicking something that triggers a script, Google may not treat it as a real, discoverable page at all.

Each paginated page should also carry enough context on its own. Keep the same template, headings, and category navigation across all pages so Google understands what it’s looking at without needing to have read page 1 first.

When your paginated pages are meant to be indexed, use a self-referencing canonical on each one. That means page 2’s canonical points to page 2, not page 1. Canonicalizing every page in a series to page 1 is essentially telling Google to ignore pages 2, 3, and 4, which is usually not what you want.
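
As an illustration, here’s a TypeScript sketch of a paginated template emitting a self-referencing canonical and plain crawlable links. The ?page=N URL scheme is an assumption; substitute your platform’s pattern.

```typescript
// Sketch of a paginated category template emitting plain HTML strings.
function pageUrl(baseUrl: string, page: number): string {
  return page === 1 ? baseUrl : `${baseUrl}?page=${page}`;
}

// Self-referencing canonical: page 2's canonical points at page 2, not page 1.
function paginatedCanonical(baseUrl: string, page: number): string {
  return `<link rel="canonical" href="${pageUrl(baseUrl, page)}">`;
}

// Real <a href> links, not JavaScript buttons, so crawlers can follow them.
function paginationLinks(baseUrl: string, page: number, lastPage: number): string {
  const links: string[] = [];
  if (page > 1) links.push(`<a href="${pageUrl(baseUrl, page - 1)}">Previous</a>`);
  if (page < lastPage) links.push(`<a href="${pageUrl(baseUrl, page + 1)}">Next</a>`);
  return links.join(" ");
}

console.log(paginatedCanonical("https://example.com/boots", 2));
// -> <link rel="canonical" href="https://example.com/boots?page=2">
```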

Infinite scroll is another common issue. Google can’t scroll the way a human does, so if your products only load as users scroll down, many items simply won’t get found. The fix is to implement infinite scroll as a progressive enhancement: users can still scroll, but the site also maintains a proper paginated series with individual URLs and crawlable links underneath. Google specifically recommends this hybrid approach using the HTML5 History API so the URL updates to a paginated state as content loads.
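
Here’s a minimal browser-side sketch of that hybrid pattern in TypeScript. The container and sentinel IDs are hypothetical; the key moves are fetching the next real paginated URL and updating the address bar with history.replaceState so every scroll state maps to a fetchable page.

```typescript
// Progressive-enhancement infinite scroll sketch (browser code).
// Assumes each page also exists at a real URL like /boots?page=N with
// plain <a> pagination links in the HTML for crawlers and no-JS users.
let currentPage = 1;

async function loadNextPage(grid: HTMLElement): Promise<void> {
  const nextPage = currentPage + 1;
  const nextUrl = `${location.pathname}?page=${nextPage}`;

  const res = await fetch(nextUrl);
  if (!res.ok) return; // no more pages

  const html = await res.text();
  const doc = new DOMParser().parseFromString(html, "text/html");
  const items = doc.querySelector("#product-grid"); // hypothetical container id
  if (!items) return;

  grid.insertAdjacentHTML("beforeend", items.innerHTML);
  currentPage = nextPage;

  // Point the address bar at the paginated URL so the current scroll
  // state corresponds to a fetchable, linkable page.
  history.replaceState({ page: nextPage }, "", nextUrl);
}

// Load more when a sentinel element near the bottom becomes visible.
const grid = document.querySelector<HTMLElement>("#product-grid");
const sentinel = document.querySelector("#load-more-sentinel");
if (grid && sentinel) {
  new IntersectionObserver((entries) => {
    if (entries[0].isIntersecting) void loadNextPage(grid);
  }).observe(sentinel);
}
```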

  1. Verify that page-to-page pagination links exist in the HTML and are crawlable.
  2. Confirm self-referencing canonicals on each indexable paginated URL.
  3. Ensure infinite scroll updates to real paginated URLs that crawlers can fetch directly.

Getting Your Technical Signals to Agree

Here’s something that trips up even well-managed stores: having your canonicals, redirects, sitemaps, and robots rules all pointing in slightly different directions. When that happens, Google follows the mess rather than your intent, and crawl effort gets wasted on duplicates instead of the pages you care about.

[Image: Pagination and Canonical Signals]

Canonicals are hints, not commands. They work well when everything else reinforces the same choice: your internal links, your canonical tags, and your sitemap all nominate the same preferred URL. When they disagree (for example, your canonical says one thing but your navigation keeps linking to a different version), Google gets conflicting instructions and often makes its own call.

Redirects are how you clean up old URLs during migrations, redesigns, and catalog changes without throwing away the authority those pages built up over time. Good redirect hygiene means avoiding chains (where URL A redirects to URL B which redirects to URL C) and loops. Every extra hop wastes crawl resources, and loops waste them entirely. The cleanest redirect is always a single step from old URL directly to the final preferred URL.
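
For a quick way to spot chains and loops, a script can follow redirects one hop at a time. This Node-flavored TypeScript sketch is simplified (HEAD requests, no retries); in practice you’d usually work from a crawler export.

```typescript
// Follow redirects one hop at a time and report the path taken.
// More than one hop = a chain to collapse; a revisited URL = a loop.
async function traceRedirects(startUrl: string, maxHops = 10): Promise<string[]> {
  const path = [startUrl];
  let current = startUrl;

  for (let hop = 0; hop < maxHops; hop++) {
    const res = await fetch(current, { method: "HEAD", redirect: "manual" });
    const location = res.headers.get("location");
    if (res.status < 300 || res.status >= 400 || !location) break;

    const next = new URL(location, current).toString(); // resolve relative Location
    if (path.includes(next)) {
      path.push(next);
      throw new Error(`Redirect loop: ${path.join(" -> ")}`);
    }
    path.push(next);
    current = next;
  }
  return path; // length > 2 means old URL -> ... -> final took multiple hops
}

// Usage: traceRedirects("https://example.com/old-category").then(console.log);
```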

Your XML sitemap is a strong signal to Google about which URLs you consider primary. If your sitemap includes parameterized or alternate versions while your canonicals and internal links point elsewhere, you’re sending Google contradictory instructions. Keep your sitemaps clean, current, and consistent with the rest of your signals.
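
The safest way to keep a sitemap clean is to generate it from your preferred URL set rather than from crawl output. A minimal sketch, with hand-rolled XML and stand-in URLs:

```typescript
// Build a sitemap strictly from preferred URLs, never from whatever the
// crawler happened to discover. Escaping here is minimal (ampersands only).
function buildSitemap(preferredUrls: string[]): string {
  const entries = preferredUrls
    .map((u) => `  <url><loc>${u.replace(/&/g, "&amp;")}</loc></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n</urlset>`;
}

console.log(buildSitemap([
  "https://example.com/mens-boots",
  "https://example.com/mens-boots/waterproof", // whitelisted facet page
]));
```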

Robots directives are a more blunt instrument, but a useful one. Blocking the wrong URLs can cut off discovery, but leaving everything open means crawlers get flooded with low-value variants. Use robots rules intentionally to protect crawl resources for your most important templates.

Don’t overlook server performance either. Slow response times and server errors can reduce how often Google is willing to crawl your site. If your category or product pages time out intermittently, Google backs off and your consolidation signals take longer to settle.

  1. Pick one preferred URL per template (category, product, brand) and keep it stable.
  2. Enforce that preference in internal links so navigation and modules never promote alternates.
  3. Match canonicals to the same preferred URL on every variant that still resolves.
  4. Publish only the preferred URLs in XML sitemaps and remove retired variants.
  5. Redirect old and duplicate URLs in a single hop using 301s, with zero chains or loops.
  6. Monitor uptime, latency, and server errors so crawling doesn’t throttle during peak traffic.

BigCommerce vs Shopify: What Your Platform Can and Can’t Do

Your platform isn’t just where your store lives. It also defines the boundaries of what URL decisions are even possible. Understanding those constraints upfront saves a lot of frustration and explains why some SEO recommendations are easy to implement and others require custom work.

Shopify’s fixed URL prefixes like /products/ and /collections/ are baked into the platform and can’t be changed. That’s mostly fine, but it creates friction when your merchandising strategy wants a clean category hierarchy and your URLs don’t naturally mirror it. Shopify also generates duplicate URLs when a product is accessed through a collection path (for example, /collections/{collection}/products/{product} versus the primary /products/{product} URL). Shopify’s canonical behavior typically defaults to the clean /products/ version, which is the right call, but the problem is that internal links from apps, themes, and navigation can keep pushing Google toward the collection-product version. Canonical tags help but aren’t a guaranteed fix if Google decides to override your declared preference. For a fuller picture of how these constraints play out, the broader Shopify SEO constraints guide covers collection handling in more depth.
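
If you attack the internal-link side of this, the rewrite rule itself is simple. Here’s an illustrative TypeScript version (Shopify themes are written in Liquid, so treat this as the logic to port, not drop-in theme code):

```typescript
// Rewrite /collections/{collection}/products/{product} to /products/{product}
// so internal links consistently reinforce the primary product URL.
function toPrimaryProductPath(path: string): string {
  const match = path.match(/^\/collections\/[^/]+(\/products\/[^/?#]+.*)$/);
  return match ? match[1] : path;
}

console.log(toPrimaryProductPath("/collections/boots/products/trail-runner"));
// -> /products/trail-runner
console.log(toPrimaryProductPath("/products/trail-runner")); // unchanged
```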

BigCommerce gives you more flexibility. URL structure settings let you customize how product URLs are generated, and you can set custom URLs on a per-product basis. That makes it much easier to align your URLs with your actual category structure. The tradeoff is that the flexibility cuts both ways: BigCommerce will auto-generate URLs across content types, and bulk changes are easy via export and re-import, so sprawl becomes a process problem rather than a technical one. If you’re also working through broader BigCommerce optimization considerations, URL governance fits naturally into that work.

On either platform, apps and integrations are a consistent source of new duplicate paths. Filter UIs, campaign tracking, wishlist endpoints, quickview modals, and search results pages all have the potential to create parameterized URLs that pollute your sitemap and confuse your canonical setup. The three things you can always control are where your internal links point, whether your canonicals stay consistent with that intent, and whether app-generated URLs end up in your sitemaps.

  1. Identify the URL patterns your platform imposes that you can’t change.
  2. Standardize internal linking destinations so navigation, breadcrumbs, and widgets always point to the URL you want indexed.
  3. Audit apps and customizations for duplicate paths, then adjust settings or templates to stop generating indexable copies.
  4. Validate canonical outputs and sitemap entries against your chosen preferred URLs so everything points the same direction.

A Practical URL Audit You Can Actually Follow

Knowing the theory is one thing. Turning it into a prioritized fix list is another. Here’s a practical process that produces real outcomes rather than an overwhelming spreadsheet of edge cases. For a real-world example of what this looks like in practice, this case study walks through how URL architecture issues were identified and resolved after a problematic eCommerce migration.

Step 1: Pull your crawl data and group by template. Export a full crawl and group URLs by template type, then look for duplicate signals: canonical mismatches, duplicate titles or descriptions, parameter patterns, orphan URLs, and redirect chains. Rank the clusters by business impact (top category and product templates first) and by crawl cost (how many URLs the pattern is generating).
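
As a starting point for Step 1, here’s a rough TypeScript sketch that buckets crawled URLs into templates with path patterns. The patterns are examples; swap in whatever your platform actually produces.

```typescript
// Bucket URLs by template so duplicate patterns become visible at a glance.
// Expects absolute URLs, e.g. from a crawler export.
const TEMPLATES: Array<[name: string, pattern: RegExp]> = [
  ["product", /^\/products\//],
  ["category", /^\/collections\/[^/]+\/?$/],
  ["faceted", /^\/collections\/.+[?&]/],
  ["search", /^\/search/],
];

function groupByTemplate(urls: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const raw of urls) {
    const u = new URL(raw);
    const key = u.pathname + u.search;
    const hit = TEMPLATES.find(([, re]) => re.test(key));
    const name = hit ? hit[0] : "other";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return counts;
}

// A "faceted" count that dwarfs "category" is your crawl-budget red flag.
```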

Step 2: Apply your facet whitelist. Map out which facet combinations meet your earlier framework and have real search demand. Everything else gets blacklisted by pattern so you can fix at the template level instead of URL by URL.

Step 3: Validate canonicals at scale. Sample each cluster and check for mismatches between the canonical you declared and the canonical Google actually selected. In Google Search Console, URL Inspection gives you per-URL diagnostics including index status, last crawl date, and canonical selection, which makes it the fastest way to spot intent conflicts on representative URLs.
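
For the declared-canonical half of that comparison, a small script can fetch sampled URLs and extract the rel=canonical from the HTML (the Google-selected canonical still has to come from URL Inspection). A rough sketch:

```typescript
// Fetch a sampled URL and pull out the declared rel=canonical.
// The regex assumes the common attribute order; a real audit would use
// an HTML parser or a crawler's canonical column instead.
async function declaredCanonical(url: string): Promise<string | null> {
  const res = await fetch(url);
  const html = await res.text();
  const match = html.match(
    /<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i
  );
  return match ? match[1] : null;
}

async function checkSample(url: string, expected: string): Promise<void> {
  const declared = await declaredCanonical(url);
  if (declared !== expected) {
    console.warn(`Mismatch on ${url}: declared=${declared}, expected=${expected}`);
  }
}
```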

Step 4: Clean up redirects and internal links. Collapse any redirect chains to a single hop and eliminate loops. Then update internal links in navigation, breadcrumbs, faceted links, and product grids so they point directly to preferred URLs, not variants.

Step 5: Rebuild and validate your sitemaps. Align your XML sitemaps to the preferred URL set only. Compare submitted versus discovered URLs in Search Console to catch leakage from parameters, legacy paths, and internal linking that’s still pointing the wrong direction.

In Search Console, use Crawl Stats to confirm crawl demand is shifting, Page Indexing to track coverage changes, the Sitemaps report to monitor submitted versus discovered URLs, and URL Inspection to confirm canonical and indexing signals per template. Treat “Crawled – currently not indexed” entries with some caution: they can spike for a variety of reasons and are best judged as trends rather than isolated data points, ideally corroborated with crawl exports and log data.

If you want a rough timeline, a focused sprint looks something like this:

  • Day 1: Pull crawl data, GSC exports, and a revenue-ranked template list.
  • Days 2-3: Fix signal conflicts first (canonical intent, redirect chains, internal links) on the top templates.
  • Days 4-5: Lock facet rules to your whitelist and remove parameter-driven index bloat.
  • Days 6-7: Rebuild sitemaps to the preferred set, then re-check Search Console reports for directional movement before expanding.

Wrapping Up

Getting your URL architecture right is one of those investments that keeps paying off over time. It improves how efficiently Google crawls your store, stabilizes which pages get indexed, and concentrates ranking signals on the URLs you actually want to win. The goal is simple: one preferred hierarchy, one set of indexable URLs, and no ambiguity about which version should rank.

Start with the highest-leverage fixes: enforce one hierarchy with consistent casing and separators, govern your facets by whitelisting only the combinations worth indexing, keep pagination fully discoverable so deeper products get found, and align your canonicals, redirects, and sitemaps so every signal points the same direction. Work within your platform constraints rather than against them, because consistency beats clever workarounds every time.

After major URL changes, don’t expect overnight results. Google commonly recrawls and reprocesses over days to weeks, and sometimes longer. The right way to track progress is through Search Console URL Inspection and indexing reports, not by refreshing your rankings every morning.

  1. Implement your hierarchy and parameter rules, then lock them in with consistent canonicals, redirects, and sitemap outputs.
  2. Submit and validate priority templates and representative URLs in Search Console until Google is consistently selecting your intended canonicals.
  3. Monitor three KPIs weekly: fewer “Crawled – not indexed” entries, fewer parameter URLs in Crawl Stats, and a higher indexation rate for intended category and product URLs.
  4. Iterate based on what Google is actually crawling and indexing, not what you hoped it would do.

If you want an independent architecture audit and an implementation plan tailored to BigCommerce or Shopify, MAK Digital Design can help.

Written by Marina Lippincott

Tech-savvy and innovative, Marina is a full-stack developer with a passion for crafting seamless digital experiences. From intuitive front-end designs to rock-solid back-end solutions, she brings ideas to life with code. A problem-solver at heart, she thrives on challenges and is always exploring the latest tech trends to stay ahead of the curve. When she's not coding, you'll find her brainstorming the next big thing or mentoring others to unlock their tech potential.

Ask away, we're here to help!

Here are quick answers related to this post to clarify key points and help you apply the ideas.

  • Why is Google indexing "weird" URL versions instead of my main product or category pages?

    When multiple URLs serve the same or very similar content, Google clusters them and chooses a canonical by weighing a range of signals, including internal links, redirects, sitemap inclusion, and declared canonical tags. If your store is generating lots of URL variations through filters, sorting, or tracking parameters, that consolidation process can split your internal link and relevance signals across competing versions, leaving your intended pages under-crawled and less trusted than they should be.

  • Do uppercase and lowercase URLs create duplicate content issues for eCommerce SEO?

    Yes. URLs are case-sensitive, so /Apple and /apple are treated as two different pages by Google, which can create duplicate candidates across a large catalog without you realizing it. Hyphens (-) are also recommended over underscores (_) in URLs because Google recognizes hyphens as word separators while underscores are not treated the same way.

  • What is crawl budget and what affects it on large eCommerce sites?

    Crawl budget is essentially the limit on how much of your site Google will crawl in a given period. It's made up of two things: crawl rate limit (influenced by server performance) and crawl demand (driven by your architecture and internal linking). On a large catalog, burning crawl budget on hundreds of near-identical filter URLs means less time spent on the product and category pages you actually want ranked.

  • How should I control faceted navigation so filters don't create thousands of duplicate URLs?

    Use a whitelist approach: only allow a facet combination to be indexed if it has clear search demand, a unique inventory cut, stable results over time, and enough content to make it a real destination page. Everything else should be canonicalized to a primary page, noindexed where needed, and removed from internal links and XML sitemaps so it stops consuming crawl budget.

  • Does adding noindex to filtered or parameter URLs stop Google from crawling them?

    No, and this is a common misconception. Noindex prevents a URL from being indexed, but it doesn't stop Google from crawling it. If those noindex variants are still prominent in your internal links or included in your sitemaps, they'll keep consuming crawl budget. To actually reduce crawling, you need to remove internal links pointing to those variants and, for patterns you never want crawled at all, use robots.txt disallow.

  • How should eCommerce pagination be set up so Google can discover deeper product pages?

    Every paginated page needs its own unique URL and has to be reachable through a real HTML link, not just a JavaScript button or scroll trigger. If the pages are meant to be indexed, each one should have a self-referencing canonical pointing to itself, not to page 1. Canonicalizing all paginated pages to page 1 is effectively asking Google to ignore everything past the first page of results, which is usually the opposite of what you want.

  • BigCommerce vs Shopify: what URL limitations affect SEO and duplicate URLs?

    Shopify uses fixed URL prefixes like /products/ and /collections/ that can't be changed, and it can generate duplicate URLs when a product is accessed through a collection path versus the primary product path. BigCommerce gives you more flexibility through URL structure settings and per-product custom URLs, but that flexibility means sprawl becomes a process and governance problem rather than a technical one. On both platforms, apps and integrations are a consistent source of new duplicate paths that need to be audited regularly.

  • How long does it take to see results after fixing URL architecture issues?

    There's no guaranteed timeline, and anyone who gives you a specific number is guessing. After major URL changes, Google typically recrawls and reprocesses over a period of days to weeks, but it can take longer on large catalogs or sites with a complicated history. The best way to track progress is through Google Search Console: watch for fewer "Crawled - not indexed" entries, fewer parameter URLs showing up in Crawl Stats, and a higher indexation rate for your intended category and product URLs. Judge trends over time rather than checking daily for movement.