Why Amazon Blocks Scrapers in 2026
Amazon operates one of the most sophisticated anti-bot systems in e-commerce. With over 353 million products listed globally and billions of daily page views, Amazon has invested heavily in protecting its data from automated extraction. Understanding how Amazon detects and blocks scrapers is essential for choosing the right proxy strategy.
In 2026, Amazon's bot detection has evolved well beyond simple rate limiting. The platform now deploys a multi-layered defense system that analyzes traffic patterns at the network level, browser behavior at the client level, and request patterns at the application level. Each layer independently evaluates whether a visitor is a human or a bot, and triggering any single layer can result in CAPTCHAs, soft blocks, or permanent IP bans.
Amazon's Anti-Bot Detection Layers
- **CAPTCHA challenges:** Automated CAPTCHA deployment when request patterns deviate from human behavior. Includes image-based and interactive challenges.
- **Rate limiting:** Per-IP request throttling with dynamic thresholds. Amazon adjusts limits based on IP reputation, time of day, and page type.
- **Browser fingerprinting:** Canvas fingerprinting, WebGL analysis, font enumeration, and JavaScript execution timing to identify automated browsers.
- **IP reputation scoring:** Real-time IP classification using databases that track datacenter ranges, known proxy IPs, and historical abuse patterns.
- **Behavioral analysis:** Mouse movement tracking, scroll patterns, click timing, and navigation flow analysis to distinguish humans from bots.
- **TLS fingerprinting:** Analysis of TLS ClientHello messages to identify non-browser HTTP clients like requests, curl, or custom scrapers (see the sketch below).
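To see why that last layer matters, compare a stock Python HTTP client with one that replays a real browser's TLS handshake. A minimal sketch using the curl_cffi library; the URL and the "chrome" impersonation target are illustrative assumptions, not a guaranteed bypass of Amazon's checks.

# Sketch: TLS fingerprints betray non-browser clients even when the
# User-Agent header is spoofed, because the ClientHello (cipher suites,
# extensions, ALPN) still looks like Python/OpenSSL.
import requests
from curl_cffi import requests as curl_requests  # pip install curl_cffi

URL = "https://www.amazon.com/dp/B0CHX3QBCH"  # example product page
CHROME_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Plain requests: browser-like headers, but a Python TLS fingerprint.
r1 = requests.get(URL, headers={"User-Agent": CHROME_UA})
print("requests:", r1.status_code)  # often a CAPTCHA or 503 page

# curl_cffi: impersonates Chrome's TLS stack, so the handshake matches.
r2 = curl_requests.get(URL, impersonate="chrome")
print("curl_cffi:", r2.status_code)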
The critical insight for proxy selection is that Amazon's IP reputation system is the first line of defense. Before any behavioral analysis occurs, Amazon classifies the incoming IP address. Datacenter IPs are flagged immediately. Residential IPs receive moderate scrutiny. Mobile CGNAT IPs receive the highest trust because blocking them would affect thousands of legitimate mobile shoppers sharing the same IP address through carrier-grade NAT.
Key Takeaway
Your proxy type determines your starting trust score before Amazon even looks at your behavior. Mobile proxies start with the highest trust. Datacenter proxies start with the lowest. No amount of behavioral mimicry can fully compensate for a low-trust IP address.
Proxy Types for Amazon Scraping
There are four main proxy types used for Amazon scraping, each with fundamentally different characteristics. The right choice depends on your volume, budget, required success rate, and specific scraping targets. Here is a detailed breakdown of each type with honest pros and cons.
Datacenter Proxies
Success rate: ~15%

Advantages
- Cheapest per GB
- Fastest raw speed
- Unlimited bandwidth options
Disadvantages
- Easily detected by Amazon
- IP ranges are known and flagged
- Frequent CAPTCHAs and blocks
- Not viable for sustained scraping
Verdict: Not recommended for Amazon scraping. Most datacenter IP ranges are pre-flagged by Amazon's anti-bot systems.
Residential Proxies
Success rate: ~55%

Advantages
- Real ISP-assigned IPs
- Better trust than datacenter
- Wide geographic coverage
- Good for light scraping
Disadvantages
- Moderate success rate on Amazon
- Some IPs are overused by scrapers
- Speed varies by provider
- Can still trigger advanced detection
Verdict: Acceptable for low-volume Amazon scraping. Success rate drops under heavy load or when targeting protected pages.
ISP / Static Residential Proxies
Success rate: ~65%

Advantages
- Datacenter speed with residential trust
- Consistent IP for session-based tasks
- Good for account management
- Stable connections
Disadvantages
- Smaller IP pools
- Higher cost per GB
- Limited geographic options
- Some providers mark IPs as hosting
Verdict: Good for session-heavy tasks like monitoring seller accounts. Cost-prohibitive for large-scale product scraping.
Mobile (4G/5G) Proxies
Success rate: ~88%+

Advantages
- Highest trust level on Amazon
- CGNAT IPs shared by thousands of real users
- Nearly impossible for Amazon to block
- Best success rate across all page types
Disadvantages
- Higher per-GB cost than datacenter
- Slightly higher latency than datacenter
- Smaller pool than residential
- Requires bandwidth management
Verdict: Best choice for Amazon scraping. The 88%+ success rate and near-unblockable CGNAT IPs make mobile proxies the most reliable option despite higher per-GB cost.
Success Rates by Proxy Type
We tested each proxy type against six different Amazon page types over a 7-day period in January 2026. Each test consisted of 1,000 requests per page type per proxy type (24,000 total requests). Success is defined as receiving a valid product page without CAPTCHA, soft block, or error response.
| Page Type | Datacenter | Residential | ISP/Static | Mobile |
|---|---|---|---|---|
| Product Pages | 18% | 58% | 67% | 91% |
| Search Results | 12% | 52% | 63% | 87% |
| Review Pages | 15% | 55% | 66% | 89% |
| Seller Info | 10% | 48% | 60% | 85% |
| Best Sellers | 14% | 50% | 62% | 88% |
| Category Pages | 20% | 60% | 70% | 92% |
Test Methodology
Tests were conducted from US-based servers using Playwright with realistic browser fingerprints. Each request used a fresh IP from the respective proxy pool. Mobile proxy tests used Proxies.sx 4G/5G connections. Request timing was randomized between 3-8 seconds to simulate human browsing.
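In practice, "success" has to be detected programmatically. A minimal sketch of how such a classifier might look; the marker strings are heuristics based on commonly observed Amazon CAPTCHA and error pages, so verify them against your own traffic.

# Sketch: classify an Amazon response as success / captcha / blocked.
# Marker strings are assumptions, not documented Amazon behavior.
def classify_response(status: int, html: str) -> str:
    if status in (403, 503):
        return "blocked"  # hard block or throttling page
    lowered = html.lower()
    if "validatecaptcha" in lowered or "enter the characters you see below" in lowered:
        return "captcha"  # image CAPTCHA challenge page
    if status == 200 and "productTitle" in html:
        return "success"  # a real product page rendered
    return "soft_block"  # 200 response without product content

# With Playwright (see the scraper below):
#   verdict = classify_response(response.status, await page.content())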
Key finding: Mobile proxies consistently outperformed all other types across every Amazon page category. The gap was widest on Seller Info pages (85% vs 10% for datacenter), likely because Amazon applies the strictest protection to seller-facing data. Category pages showed the smallest gap, suggesting lighter protection on browse-level pages.
Best Amazon Scraping Setup
The optimal Amazon scraping stack in 2026 combines mobile proxies with a headless browser for JavaScript rendering, automatic IP rotation, and realistic browsing patterns. Here is a production-ready Python implementation using Playwright and Proxies.sx mobile proxies.
Recommended Stack
Python + Playwright Amazon Scraper with Mobile Proxy
This script connects to a Proxies.sx mobile proxy endpoint, launches a headless Chromium browser with a mobile viewport and realistic User-Agent, and extracts product data including title, price, rating, review count, and availability. Error handling and retry logic are built in.
import asyncio
import json
import random

from playwright.async_api import async_playwright

# Proxies.sx mobile proxy configuration
PROXY_HOST = "gate.proxies.sx"
PROXY_PORT = 10000
PROXY_USER = "your_username"
PROXY_PASS = "your_password"

# Realistic User-Agent rotation
USER_AGENTS = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Mobile/15E148 Safari/604.1",
]

async def scrape_amazon_product(page, asin: str) -> dict:
    """Scrape a single Amazon product page by ASIN."""
    url = f"https://www.amazon.com/dp/{asin}"

    # Add a random delay to mimic human browsing
    await asyncio.sleep(random.uniform(2, 5))

    try:
        response = await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        if response and response.status == 200:
            # Extract product data
            title = await page.locator("#productTitle").inner_text()
            title = title.strip() if title else None

            # Price extraction (Amazon uses multiple price selectors)
            price = None
            for selector in [
                ".a-price .a-offscreen",
                "#priceblock_ourprice",
                "#priceblock_dealprice",
                ".a-price-whole",
            ]:
                try:
                    el = page.locator(selector).first
                    if await el.is_visible(timeout=2000):
                        price = await el.inner_text()
                        break
                except Exception:
                    continue

            # Rating
            rating = None
            try:
                rating_el = page.locator("#acrPopover .a-icon-alt").first
                if await rating_el.is_visible(timeout=2000):
                    rating = await rating_el.inner_text()
            except Exception:
                pass

            # Review count
            review_count = None
            try:
                review_el = page.locator("#acrCustomerReviewText").first
                if await review_el.is_visible(timeout=2000):
                    review_count = await review_el.inner_text()
            except Exception:
                pass

            # Availability
            availability = None
            try:
                avail_el = page.locator("#availability span").first
                if await avail_el.is_visible(timeout=2000):
                    availability = (await avail_el.inner_text()).strip()
            except Exception:
                pass

            return {
                "asin": asin,
                "title": title,
                "price": price,
                "rating": rating,
                "review_count": review_count,
                "availability": availability,
                "url": url,
                "status": "success",
            }
        else:
            return {"asin": asin, "status": "blocked", "code": response.status if response else None}
    except Exception as e:
        return {"asin": asin, "status": "error", "error": str(e)}

async def scrape_products(asins: list[str]) -> list[dict]:
    """Scrape multiple Amazon products using a mobile proxy."""
    results = []
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={
                "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
                "username": PROXY_USER,
                "password": PROXY_PASS,
            },
            headless=True,
        )
        context = await browser.new_context(
            user_agent=random.choice(USER_AGENTS),
            viewport={"width": 390, "height": 844},
            locale="en-US",
            timezone_id="America/New_York",
        )
        page = await context.new_page()
        for asin in asins:
            result = await scrape_amazon_product(page, asin)
            results.append(result)
            title = result.get("title") or "N/A"  # title can be None on failures
            print(f"[{result['status']}] {asin}: {title[:60]}")
        await browser.close()
    return results

# Usage
if __name__ == "__main__":
    target_asins = [
        "B0CHX3QBCH",  # Example ASIN 1
        "B0D5CLQNL2",  # Example ASIN 2
        "B0BSHF7WHW",  # Example ASIN 3
    ]
    data = asyncio.run(scrape_products(target_asins))
    with open("amazon_products.json", "w") as f:
        json.dump(data, f, indent=2)
    print(f"\nScraped {len(data)} products. Success: {sum(1 for d in data if d['status'] == 'success')}")

Code Highlights
- Mobile User-Agent rotation mimics real iPhone and Android traffic patterns
- Mobile viewport (390x844) matches iPhone 14 Pro screen dimensions
- Random delays between 2-5 seconds prevent machine-speed detection
- Multiple price selector fallbacks handle Amazon's varying page layouts
- Graceful error handling logs failures without crashing the scraping pipeline
- JSON output for easy integration with databases and monitoring systems
Price Monitoring Architecture
Price monitoring is the most common Amazon scraping use case. Building a reliable monitoring system requires more than a scraping script. You need scheduled execution, proxy rotation, data persistence, change detection, and an alerting system. Here is a complete architecture for monitoring up to 10,000 products with 30-minute update intervals.
Architecture Components
Cron Scheduler
Runs price checks every 30 minutes. Uses the Python schedule library for simplicity, or cron/Celery for production deployments. Staggers batch starts to avoid traffic spikes.
Proxy Rotation Layer
Proxies.sx mobile proxies with automatic IP rotation. Each batch of 5-10 requests uses a sticky session, then rotates to a fresh CGNAT IP. Prevents pattern detection across consecutive requests.
Data Storage
Price history stored in JSON files for simple setups, or PostgreSQL/TimescaleDB for production. Each price point includes timestamp, ASIN, price, and availability status for trend analysis.
Alert System
Real-time notifications when prices change beyond a configurable threshold. Supports email, Slack webhooks, and custom HTTP callbacks. Tracks both price drops and increases.
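A sketch of the rotation layer, reusing PROXY_HOST, USER_AGENTS, and scrape_amazon_product from the scraper above. It assumes the gateway hands out a fresh IP per new browser connection; if your provider exposes an explicit rotation URL or API instead, call it between batches.

import asyncio
import random
from playwright.async_api import async_playwright

BATCH_SIZE = 8  # 5-10 requests per sticky session

def batches(items: list, size: int):
    """Yield consecutive chunks of the ASIN list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

async def scrape_in_batches(asins: list[str]) -> list[dict]:
    results = []
    async with async_playwright() as p:
        for batch in batches(asins, BATCH_SIZE):
            # Fresh browser per batch -> new proxy connection -> new CGNAT IP
            # (assumed gateway behavior; check your provider's rotation docs)
            browser = await p.chromium.launch(
                proxy={
                    "server": f"http://{PROXY_HOST}:{PROXY_PORT}",
                    "username": PROXY_USER,
                    "password": PROXY_PASS,
                },
                headless=True,
            )
            context = await browser.new_context(user_agent=random.choice(USER_AGENTS))
            page = await context.new_page()
            for asin in batch:
                results.append(await scrape_amazon_product(page, asin))
            await browser.close()
            # Pause so consecutive sessions look like separate visitors
            await asyncio.sleep(random.uniform(5, 15))
    return results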
Complete Price Monitoring Implementation
This code builds on the scraping function from the previous section and adds scheduled execution, price change detection, and alerting. For production use, replace the JSON file storage with a database and add your preferred notification channel.
import asyncio
import json
import re
import time
from dataclasses import dataclass
from datetime import datetime

import schedule

@dataclass
class PriceAlert:
    asin: str
    product_name: str
    old_price: float
    new_price: float
    change_pct: float
    timestamp: str

class AmazonPriceMonitor:
    def __init__(self, proxy_config: dict, alert_email: str):
        self.proxy = proxy_config
        self.alert_email = alert_email
        self.price_history: dict[str, list] = {}
        self.alerts: list[PriceAlert] = []

    def parse_price(self, price_str: str) -> float | None:
        """Extract a numeric price from an Amazon price string."""
        if not price_str:
            return None
        match = re.search(r"\d+\.\d{2}", price_str.replace(",", ""))
        return float(match.group()) if match else None

    async def check_prices(self, asins: list[str]):
        """Check current prices and detect changes."""
        # Uses the scrape_products function from the scraper above
        results = await scrape_products(asins)
        for result in results:
            if result["status"] != "success" or not result.get("price"):
                continue
            asin = result["asin"]
            current_price = self.parse_price(result["price"])
            if current_price is None:
                continue

            # Initialize history if needed
            if asin not in self.price_history:
                self.price_history[asin] = []

            # Check for a price change against the last recorded point
            if self.price_history[asin]:
                last_price = self.price_history[asin][-1]["price"]
                if current_price != last_price:
                    change_pct = ((current_price - last_price) / last_price) * 100
                    alert = PriceAlert(
                        asin=asin,
                        product_name=(result.get("title") or "Unknown")[:100],
                        old_price=last_price,
                        new_price=current_price,
                        change_pct=round(change_pct, 2),
                        timestamp=datetime.now().isoformat(),
                    )
                    self.alerts.append(alert)
                    self.send_alert(alert)

            # Record the new price point
            self.price_history[asin].append({
                "price": current_price,
                "timestamp": datetime.now().isoformat(),
            })

    def send_alert(self, alert: PriceAlert):
        """Send a price change notification."""
        direction = "dropped" if alert.change_pct < 0 else "increased"
        subject = f"Price {direction}: {alert.product_name[:50]}"
        body = f"""
Product: {alert.product_name}
ASIN: {alert.asin}
Old Price: ${alert.old_price:.2f}
New Price: ${alert.new_price:.2f}
Change: {alert.change_pct:+.2f}%
Time: {alert.timestamp}
"""
        print(f"ALERT: {subject}\n{body}")
        # Add your email/Slack/webhook notification logic here

    def save_history(self, filepath: str = "price_history.json"):
        """Persist price history to disk."""
        with open(filepath, "w") as f:
            json.dump(self.price_history, f, indent=2)

# Setup and run
monitor = AmazonPriceMonitor(
    proxy_config={
        "host": "gate.proxies.sx",
        "port": 10000,
        "user": "your_username",
        "pass": "your_password",
    },
    alert_email="alerts@yourcompany.com",
)

TRACKED_ASINS = [
    "B0CHX3QBCH",
    "B0D5CLQNL2",
    "B0BSHF7WHW",
    # Add up to 10,000 ASINs
]

# Run price checks every 30 minutes
def run_check():
    asyncio.run(monitor.check_prices(TRACKED_ASINS))
    monitor.save_history()

run_check()  # run once at startup instead of waiting for the first interval
schedule.every(30).minutes.do(run_check)

print("Amazon Price Monitor started. Checking every 30 minutes...")
while True:
    schedule.run_pending()
    time.sleep(1)

Bandwidth Estimates
- 100 products / 30 min: ~0.5 GB/day
- 1,000 products / 30 min: ~5 GB/day
- 10,000 products / 30 min: ~50 GB/day

Estimates assume ~300 KB per page with Playwright rendering and an 88% success rate with mobile proxies.
Scaling Tips
- Use multiple proxy ports for parallel scraping
- Batch ASINs by Amazon domain (.com, .co.uk)
- Cache unchanged pages to reduce bandwidth
- Implement exponential backoff on failures (see the sketch below)
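The backoff tip deserves a concrete shape. A minimal sketch wrapping scrape_amazon_product from the scraper above; the retry count and delays are illustrative defaults, not tuned values.

import asyncio
import random

async def scrape_with_backoff(page, asin: str, max_retries: int = 4) -> dict:
    """Retry a blocked/errored ASIN with exponentially growing delays."""
    delay = 5.0  # initial backoff ceiling in seconds
    result = {"asin": asin, "status": "error"}
    for _ in range(max_retries):
        result = await scrape_amazon_product(page, asin)  # defined above
        if result["status"] == "success":
            return result
        # Full jitter: sleep a random amount up to the current ceiling
        await asyncio.sleep(random.uniform(0, delay))
        delay *= 2  # double the ceiling after each failure
    return result  # last failed attempt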
Amazon Scraping Legal Considerations
Web scraping operates in a legal gray area, and Amazon scraping is no exception. Before building any scraping infrastructure, it is important to understand the current legal landscape and adopt responsible practices. This section covers the key legal considerations as of 2026, but it is not legal advice. Consult with a qualified attorney for your specific situation.
Terms of Service
Amazon's Conditions of Use explicitly prohibit using "any robot, spider, scraper, or other automated means to access the Services for any purpose." However, Terms of Service are contractual agreements, not laws. In hiQ Labs v. LinkedIn, the Ninth Circuit held (most recently in 2022, after the Supreme Court remanded the case) that scraping publicly available data does not violate the Computer Fraud and Abuse Act (CFAA), even when it breaches a website's ToS. This precedent is widely cited in scraping cases, though it does not provide blanket immunity.
Publicly Available Data
Product listings, prices, ratings, and reviews on Amazon are publicly accessible without authentication. Courts have generally held that scraping publicly available data is permissible, especially for purposes like price comparison, market research, and competitive intelligence. The key distinction is between public data (product pages viewable by anyone) and private data (account information, purchase history, seller analytics behind login).
Rate Limiting Ethics
Responsible scraping means not degrading the target website's performance for legitimate users. Best practices include: respecting robots.txt directives where reasonable, limiting request frequency to avoid server strain, scraping during off-peak hours when possible, and not circumventing authentication mechanisms. Using mobile proxies with natural request pacing inherently aligns with these practices because your traffic blends with legitimate mobile users.
GDPR and Personal Data
If you are scraping seller names, reviewer names, or any data that could identify individuals, GDPR and similar privacy regulations may apply, especially if you operate in or target users in the EU. Product data, prices, and aggregate statistics are generally not considered personal data. If your scraping includes any personally identifiable information, ensure you have a lawful basis for processing and comply with applicable data protection regulations.
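If your pipeline does touch such fields, a simple data-minimisation step is to strip them before persistence. A minimal sketch; the field names are hypothetical and should be adapted to your own schema.

# Sketch: drop potentially personal fields before storing scraped
# records (a basic GDPR data-minimisation step). Field names are
# hypothetical examples, not Amazon schema.
PERSONAL_FIELDS = {"reviewer_name", "reviewer_profile_url", "seller_contact"}

def minimise(record: dict) -> dict:
    """Return a copy of the record without personal-data fields."""
    return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}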
Responsible Scraping Checklist

- Scrape only publicly accessible pages; never bypass logins or other authentication
- Keep request rates low enough to avoid straining Amazon's servers
- Respect robots.txt directives where reasonable
- Prefer off-peak hours for large scraping jobs
- Avoid collecting personally identifiable information, or ensure GDPR-compliant handling if you must
- Consult a qualified attorney before building scraping into a commercial product
Cost Analysis: Scraping 10K Products Daily
The cheapest proxy is not always the most cost-effective. When you factor in success rates, retries, and wasted bandwidth, the true cost per successful request can be very different from the headline per-GB price. Here we calculate the real cost of scraping 10,000 Amazon product pages daily with each proxy type.
Assumptions: 10,000 product pages scraped per day, ~300 KB per rendered page, and failed requests retried until success, so bandwidth scales inversely with success rate. "Effective $/GB" is the headline cost per GB divided by the success rate: the price of a gigabyte of successfully delivered pages.
| Proxy Type | Cost/GB | Success Rate | Effective $/GB | Daily BW | Daily Cost | Monthly Cost |
|---|---|---|---|---|---|---|
| Datacenter | $1.00 | 15% | $6.67 | ~50 GB | $50 | $1,500 |
| Residential | $8.00 | 55% | $14.55 | ~18 GB | $144 | $4,320 |
| ISP/Static | $15.00 | 65% | $23.08 | ~15 GB | $225 | $6,750 |
| Mobile (4G/5G) | $6.00 | 88% | $6.82 | ~11 GB | $66 | $1,980 |
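The "Effective $/GB" column is just the headline price divided by the success rate, since blocked requests still burn bandwidth. A quick sketch reproducing the table's figures:

# Sketch: deriving effective cost per GB of *successful* pages.
PROXIES = {
    "Datacenter":  {"cost_per_gb": 1.00,  "success": 0.15},
    "Residential": {"cost_per_gb": 8.00,  "success": 0.55},
    "ISP/Static":  {"cost_per_gb": 15.00, "success": 0.65},
    "Mobile":      {"cost_per_gb": 6.00,  "success": 0.88},
}

for name, p in PROXIES.items():
    effective = p["cost_per_gb"] / p["success"]
    print(f"{name:12} ${effective:.2f}/GB effective")
# Datacenter   $6.67/GB effective
# Residential  $14.55/GB effective
# ISP/Static   $23.08/GB effective
# Mobile       $6.82/GB effective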
Mobile Proxy ROI Breakdown
- Raw bandwidth (10K pages): ~3 GB
- With retries (88% success): ~3.4 GB
- Overhead (headers, JS, etc.): ~7.6 GB
- Total daily bandwidth: ~11 GB
- Daily cost at $6/GB: $66
At 501+ GB/month, Proxies.sx pricing drops to $4/GB, reducing monthly cost to approximately $1,320.
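For reference, the breakdown above in code. The 300 KB page weight and 88% success rate come from this guide; the overhead figure is backed out of the stated 11 GB total.

# Sketch: the ROI breakdown arithmetic.
PAGES = 10_000
PAGE_KB = 300
SUCCESS = 0.88

raw_gb = PAGES * PAGE_KB / 1_000_000      # 3.0 GB of raw page weight
with_retries = raw_gb / SUCCESS           # ~3.4 GB including retries
overhead = 11.0 - with_retries            # ~7.6 GB headers/JS/assets
daily_cost = 11.0 * 6                     # $66 at $6/GB
monthly = daily_cost * 30                 # $1,980/month
print(raw_gb, round(with_retries, 1), round(overhead, 1), daily_cost, monthly)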
Why "Cheap" Proxies Cost More
- Datacenter at $1/GB: $1,500/mo
- Residential at $8/GB: $4,320/mo
- ISP/Static at $15/GB: $6,750/mo
- Mobile at $6/GB: $1,980/mo
Datacenter appears cheapest but has the lowest effective success rate. Residential and ISP proxies waste bandwidth on blocked requests and retries, inflating actual costs.
The ROI Equation
Mobile proxies from Proxies.sx cost $6/GB at the base tier but deliver an 88% success rate. This means you spend less on wasted bandwidth, need fewer retries, and complete scraping jobs faster. Compared to residential proxies at $8/GB with a 55% success rate, mobile proxies save approximately $2,340/month for a 10K product daily monitoring operation. The savings increase further at volume pricing ($4/GB for 501+ GB).
Start Scraping Amazon with 88% Success Rate
Try Proxies.sx mobile proxies free: 1GB bandwidth + 2 ports. No credit card required. Test against your Amazon targets and see the success rate difference yourself.