About 75% lower total cost, despite the vendor's 40% per-row discount
A real estate data company that aggregates property listings for brokers, investment platforms, and market intelligence tools was paying for far more data than it could use. The vendor's volume discount made the per-row price look 40% cheaper than ours. But the vendor was billing for 6.7× as many rows as the data actually contained, so the vendor's total came to roughly 4× what we charge. Cheaper per row, far more expensive in practice.
Property listing directories commonly surface results through zip code proximity searches. When a scrape iterates across a dense grid of zip codes, the search areas overlap, so the same property can appear in the results for a dozen adjacent zip codes.
The vendor delivered every occurrence and charged for every row. Vendors incentivised by row count have no reason to deduplicate before delivery. In fact, deduplication reduces their billing.
Here is the counterintuitive part. Because the row count was so high, the vendor offered a volume discount that made their per-row rate look about 40% cheaper than ours. But the high row count was the problem, not a benefit: the vendor delivered 6.7× as many rows as there were unique listings. At 60% of our rate on 6.7× the volume (0.6 × 6.7 ≈ 4.0), their total came to roughly 4× what we charge. The discount was real. It just applied to rows the client should never have been billed for.
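The arithmetic behind that comparison can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the inputs are the rounded figures quoted above (a 40% per-row discount and 6.7× row inflation), not actual invoice data.

```python
# Illustrative arithmetic using the rounded figures from the text.
VENDOR_DISCOUNT = 0.40   # vendor's per-row rate looks ~40% cheaper
ROW_INFLATION = 6.7      # rows delivered per unique listing


def relative_vendor_total(discount: float, inflation: float) -> float:
    """Vendor's total bill as a multiple of a bill based on unique records."""
    return (1.0 - discount) * inflation


ratio = relative_vendor_total(VENDOR_DISCOUNT, ROW_INFLATION)
savings = 1.0 - 1.0 / ratio  # how much lower the unique-record bill is

print(f"vendor total is {ratio:.2f}x ours; ours is {savings:.0%} lower")
```

The per-row price itself cancels out of the comparison, which is why only the discount and the duplication factor matter.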
We deduplicate before delivery and bill on unique records. The client paid for the ~60,000 listings that existed, not the ~400,000 rows the search traversal produced. Same data, about 75% lower cost. Storage, compute, and billing all scale with the number of unique records, not with how many times the search returned them.
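As a rough sketch of what deduplicating before delivery means, the snippet below keeps the first occurrence of each listing and counts only those for billing. It assumes every row carries a stable identifier; the field names `listing_id` and `zip_searched` are hypothetical, and real listings without a stable ID would need a composite key such as a normalised address.

```python
from typing import Iterable


def dedupe_listings(rows: Iterable[dict], key: str = "listing_id") -> list[dict]:
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        identifier = row[key]
        if identifier in seen:
            continue  # same property surfaced by an overlapping zip search
        seen.add(identifier)
        unique.append(row)
    return unique


# The same property (A1) appears in three adjacent zip code searches.
rows = [
    {"listing_id": "A1", "zip_searched": "10001"},
    {"listing_id": "A1", "zip_searched": "10002"},
    {"listing_id": "B7", "zip_searched": "10002"},
    {"listing_id": "A1", "zip_searched": "10003"},
]

unique = dedupe_listings(rows)
billable = len(unique)
print(f"{len(rows)} rows scraped, {billable} unique records billed")
```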
A commercial property listings database was being scraped nationwide. The client needed data for specific regional markets only. The majority of what was delivered covered markets outside their operating area and had no use in their product. The client was billed by row count for all of it.
The structural incentive is the same: a vendor charging per row has no reason to narrow scope. A scoped, regional scrape produces fewer rows and less revenue for them. We scoped to the relevant markets. The client received in-region listings only.
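A minimal sketch of regional scoping, assuming each listing carries a `state` field and that the client's markets can be expressed as a set of state codes. Both the field name and the `TARGET_MARKETS` values are hypothetical; in practice the same test would ideally decide which searches are run at all, not just which rows are kept afterwards.

```python
# Hypothetical set of markets the client actually operates in.
TARGET_MARKETS = {"TX", "OK"}


def in_scope(listing: dict, markets: set) -> bool:
    """True when the listing falls inside one of the client's markets."""
    return listing.get("state") in markets


listings = [
    {"listing_id": "C1", "state": "TX"},
    {"listing_id": "C2", "state": "NY"},
    {"listing_id": "C3", "state": "OK"},
    {"listing_id": "C4"},  # missing state: treated as out of scope
]

scoped = [item for item in listings if in_scope(item, TARGET_MARKETS)]
print(f"{len(scoped)} of {len(listings)} listings are in region")
```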
This pattern is not unique to real estate. Any nationwide scrape run for a regionally focused use case has the same structural waste embedded in it.
Row counts dropped. Costs dropped. Data relevance increased. The client was no longer paying to receive, store, and process data they couldn't use.
Are you paying for rows you can't use?
Start a Gap Audit →