In Python 3 crawling workflows, proxies are commonly used to prevent IP bans and improve scraping throughput by distributing requests across multiple IP addresses. Proxies generally fall into two groups: free proxies, which are usually unstable, and paid proxies, which are typically more reliable.
Common Python 3 crawler proxy use cases include:
Preventing IP bans: many websites enforce request-rate limits, and when a single IP crosses that limit it can be blocked. Proxies reduce that risk.
Increasing crawl speed: proxies let you open multiple connections in parallel so you can collect data faster.
Bypassing geo restrictions: some sites expose different content in different regions. If you need region-specific data, a proxy can help you reach it.
In short, proxy IPs are an important part of Python 3 crawling. Because proxy use also introduces security considerations, you should choose providers carefully and follow applicable security and compliance rules.
1. Preparation
First, you need a working proxy. A proxy is simply an IP address and port combined in the format ip:port. If the proxy requires authentication, you will also need a username and password.
On my machine, a local proxy tool exposes an HTTP proxy on port 7890, which means the proxy is `127.0.0.1:7890`. It also exposes a SOCKS proxy on port 7891, so that proxy is `127.0.0.1:7891`. Once either proxy is configured, my machine routes traffic through the connected upstream server IP instead of the local IP.
In the examples below, I use those local proxies to demonstrate the setup. You can replace them with your own working proxy details.
After configuring a proxy, use http://httpbin.org/get as a quick test URL. The response includes request metadata, and the `origin` field shows the client IP so you can verify whether the proxy is active.
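Since this verification step just compares httpbin's `origin` field against your real IP, it can be scripted. Here is a minimal sketch (the `proxy_is_active` helper and the abbreviated sample body are my own illustration, not part of httpbin's API):

```python
import json

def proxy_is_active(response_body: str, real_ip: str) -> bool:
    """Return True if the `origin` httpbin reports differs from our real IP,
    meaning requests are being routed through the proxy."""
    origin = json.loads(response_body)['origin']
    return origin != real_ip

# Abbreviated example of an httpbin.org/get response body
sample = '{"args": {}, "origin": "210.173.1.204", "url": "https://httpbin.org/get"}'
print(proxy_is_active(sample, '203.0.113.10'))  # True: reported origin is not our real IP
```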
With that ready, let’s walk through proxy configuration for each request library.
Get Python 3 Crawler Proxies
Some websites monitor repeated access and actively block suspicious traffic. A proxy server distributes request sources, reduces the chance of detection, and improves crawl success rates.
Best US Static Proxy IP
IPRoyal is a proxy provider known for accessible residential proxy plans and broad international availability.
Cheapest Static Proxy
Proxy-seller is a datacenter proxy provider that remains popular with smaller internet marketers.
Best-Value Static Proxy
Shifter.io is a well-known proxy provider focused on privacy protection and a smoother internet experience.
2. Proxy Setup in `urllib`
Let’s start with the most basic option, `urllib`, and look at how proxy configuration works there:
```python
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener

proxy = '127.0.0.1:7890'
proxy_handler = ProxyHandler({
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
})
opener = build_opener(proxy_handler)
try:
    response = opener.open('https://httpbin.org/get')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
```
The output looks like this:
```json
{
  "args": {},
  "headers": {
    "Accept-Encoding": "identity",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.7",
    "X-Amzn-Trace-Id": "Root=1-60e9a1b6-0a20b8a678844a0b2ab4e889"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
Here we use `ProxyHandler` to configure the proxy. Its argument is a dictionary where the keys are protocols and the values are proxy addresses. You must include the scheme in the proxy value, such as http:// or https://. When the target URL is HTTP, `urllib` uses the `http` key. When the target URL is HTTPS, it uses the `https` key. In this example, both keys point to an HTTP proxy, so both HTTP and HTTPS traffic are routed through that proxy.
After creating the `ProxyHandler`, pass it into `build_opener()` to create an opener that already knows how to route requests through the proxy. Then call `open()` on that opener to fetch the target URL.
The response body is JSON, and the `origin` field shows the client IP. Because that IP matches the proxy instead of the real local IP, the proxy was configured successfully.
If the proxy requires authentication, configure it like this:
```python
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener

proxy = 'username:password@127.0.0.1:7890'
proxy_handler = ProxyHandler({
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
})
opener = build_opener(proxy_handler)
try:
    response = opener.open('https://httpbin.org/get')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
```
The only change is the `proxy` variable: add the username and password before the host. For example, if the username is `foo` and the password is `bar`, the proxy string becomes `foo:bar@127.0.0.1:7890`.
If the proxy is SOCKS5, configure it like this:
```python
import socks
import socket
from urllib import request
from urllib.error import URLError

socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 7891)
socket.socket = socks.socksocket
try:
    response = request.urlopen('https://httpbin.org/get')
    print(response.read().decode('utf-8'))
except URLError as e:
    print(e.reason)
```
This example requires the `socks` module, which you can install with:

```bash
pip3 install PySocks
```
This assumes a local SOCKS5 proxy is running on port `7891`. When it works correctly, the output matches the HTTP proxy example above:
```json
{
  "args": {},
  "headers": {
    "Accept-Encoding": "identity",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.7",
    "X-Amzn-Trace-Id": "Root=1-60e9a1b6-0a20b8a678844a0b2ab4e889"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
Again, the `origin` field shows the proxy IP, so the proxy setup is working.
3. Proxy Setup in `requests`
In `requests`, proxy setup is straightforward: you only need to pass the `proxies` parameter.
Using the local proxy from this machine as an example, the HTTP proxy configuration looks like this:
```python
import requests

proxy = '127.0.0.1:7890'
proxies = {
    'http': 'http://' + proxy,
    'https': 'http://' + proxy,
}
try:
    response = requests.get('https://httpbin.org/get', proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print('Error', e.args)
```
The output looks like this:
```json
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0",
    "X-Amzn-Trace-Id": "Root=1-5e8f358d-87913f68a192fb9f87aa0323"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
Like `urllib`, `requests` uses the `http` proxy for HTTP URLs and the `https` proxy for HTTPS URLs. In this example, both are routed through the same HTTP proxy.
If the `origin` value in the response matches the proxy IP, the proxy is configured correctly.
If the proxy requires authentication, prepend the username and password like this:
```python
proxy = 'username:password@127.0.0.1:7890'
```

Just replace `username` and `password` with your own credentials.
If you need a SOCKS proxy, use this configuration instead:
```python
import requests

proxy = '127.0.0.1:7891'
proxies = {
    'http': 'socks5://' + proxy,
    'https': 'socks5://' + proxy
}
try:
    response = requests.get('https://httpbin.org/get', proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print('Error', e.args)
```
For this, you need to install the extra package `requests[socks]`:

```bash
pip3 install "requests[socks]"
```
The output is the same:
```json
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0",
    "X-Amzn-Trace-Id": "Root=1-5e8f364a-589d3cf2500fafd47b5560f2"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
There is also another approach that uses the `socks` module directly. It requires the same `socks` dependency installed earlier:
```python
import requests
import socks
import socket

socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 7891)
socket.socket = socks.socksocket
try:
    response = requests.get('https://httpbin.org/get')
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print('Error', e.args)
```
This method also works for SOCKS proxies and produces the same result. Compared with the first method, this one changes socket behavior globally, so choose based on your use case.
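Because the patch is global, it is worth restoring `socket.socket` once the proxied requests are done. One way is a small context manager; this is a sketch of my own (the `patched_socket` name is illustrative, not part of PySocks):

```python
import socket
from contextlib import contextmanager

@contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket, always restoring the original on exit."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        socket.socket = original

# With PySocks installed, usage would look like:
#   socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 7891)
#   with patched_socket(socks.socksocket):
#       response = requests.get('https://httpbin.org/get')
```

This way, only connections opened inside the `with` block go through the SOCKS proxy.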
4. Proxy Setup in `httpx`
`httpx` works a lot like `requests`, so it also uses a `proxies` argument. The main difference is that the keys must be `http://` and `https://` instead of just `http` and `https`. (Note that recent `httpx` releases deprecate the `proxies` argument in favor of `proxy` and `mounts`; the examples below follow the older API.)
For an HTTP proxy, use this setup:
```python
import httpx

proxy = '127.0.0.1:7890'
proxies = {
    'http://': 'http://' + proxy,
    'https://': 'http://' + proxy,
}
with httpx.Client(proxies=proxies) as client:
    response = client.get('https://httpbin.org/get')
    print(response.text)
```
If the proxy requires authentication, just change the `proxy` value:
```python
proxy = 'username:password@127.0.0.1:7890'
```

Replace `username` and `password` with your actual credentials.
The output is similar to the `requests` example:
```json
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-httpx/0.18.1",
    "X-Amzn-Trace-Id": "Root=1-60e9a3ef-5527ff6320484f8e46d39834"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
For SOCKS proxies, install the `httpx-socks` package:

```bash
pip3 install "httpx-socks[asyncio]"
```
This installs support for both synchronous and asynchronous usage.
For synchronous mode, configure it like this:
```python
import httpx
from httpx_socks import SyncProxyTransport

transport = SyncProxyTransport.from_url('socks5://127.0.0.1:7891')
with httpx.Client(transport=transport) as client:
    response = client.get('https://httpbin.org/get')
    print(response.text)
```
Here we create a `transport` object, point it at the SOCKS proxy, and pass that transport into `httpx.Client()`. The result is the same as before.
For asynchronous mode, use this version:
```python
import httpx
import asyncio
from httpx_socks import AsyncProxyTransport

transport = AsyncProxyTransport.from_url('socks5://127.0.0.1:7891')

async def main():
    async with httpx.AsyncClient(transport=transport) as client:
        response = await client.get('https://httpbin.org/get')
        print(response.text)

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
```
The only difference from synchronous mode is that we use `AsyncProxyTransport` instead of `SyncProxyTransport`, and `AsyncClient` instead of `Client`. Everything else stays the same.
5. Proxy Setup in `Selenium`
`Selenium` can also use proxies. Here the examples use Chrome.
For a non-authenticated proxy, configure it like this:
```python
from selenium import webdriver

proxy = '127.0.0.1:7890'
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=http://' + proxy)
browser = webdriver.Chrome(options=options)
browser.get('https://httpbin.org/get')
print(browser.page_source)
browser.close()
```
The output looks like this:
```json
{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-5e8f39cd-60930018205fd154a9af39cc"
  },
  "origin": "210.173.1.204",
  "url": "http://httpbin.org/get"
}
```
The `origin` field again matches the proxy IP, so the proxy is configured correctly.
If the proxy requires authentication, the setup is more involved:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

ip = '127.0.0.1'
port = 7890
username = 'foo'
password = 'bar'

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy", "tabs", "unlimitedStorage", "storage",
        "<all_urls>", "webRequest", "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""

background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}
chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)

options = Options()
options.add_argument("--start-maximized")
options.add_extension(plugin_file)
browser = webdriver.Chrome(options=options)
browser.get('https://httpbin.org/get')
print(browser.page_source)
browser.close()
```
This approach builds a small Chrome extension: a `manifest.json` file plus a `background.js` script that configures the proxy and answers its authentication challenge. When the code runs, it packages both files into `proxy_auth_plugin.zip` and loads that extension into the browser.
The result is the same as the previous example: the `origin` field shows the proxy IP.
SOCKS proxy setup is simpler. Change the protocol to `socks5`, like this non-authenticated example:
```python
from selenium import webdriver

proxy = '127.0.0.1:7891'
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=socks5://' + proxy)
browser = webdriver.Chrome(options=options)
browser.get('https://httpbin.org/get')
print(browser.page_source)
browser.close()
```
The result is the same.
6. Proxy Setup in `aiohttp`
In `aiohttp`, you can configure a proxy directly through the `proxy` parameter. For an HTTP proxy:
```python
import asyncio
import aiohttp

proxy = 'http://127.0.0.1:7890'

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://httpbin.org/get', proxy=proxy) as response:
            print(await response.text())

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
```
If the proxy needs a username and password, use the same pattern as `requests`:
```python
proxy = 'http://username:password@127.0.0.1:7890'
```

Replace `username` and `password` as needed.
For SOCKS proxies, install the `aiohttp-socks` helper library:

```bash
pip3 install aiohttp-socks
```
You can then use `ProxyConnector` from that package to configure the SOCKS proxy:
```python
import asyncio
import aiohttp
from aiohttp_socks import ProxyConnector

connector = ProxyConnector.from_url('socks5://127.0.0.1:7891')

async def main():
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get('https://httpbin.org/get') as response:
            print(await response.text())

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
```
The result is the same.
This library also supports SOCKS4, HTTP proxies, and proxy authentication. See its official docs for the full set of options.
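One detail worth noting for authenticated proxies: `from_url`-style helpers parse credentials embedded in the URL, so any special characters in the username or password should be percent-encoded first. A hypothetical helper sketching this (the `proxy_url` function is my own, not part of `aiohttp-socks`):

```python
from urllib.parse import quote

def proxy_url(host, port, username=None, password=None, scheme='socks5'):
    """Build a proxy URL, percent-encoding credentials so characters
    like '@' or ':' survive URL parsing."""
    auth = ''
    if username and password:
        auth = quote(username, safe='') + ':' + quote(password, safe='') + '@'
    return f'{scheme}://{auth}{host}:{port}'

print(proxy_url('127.0.0.1', 7891))                 # socks5://127.0.0.1:7891
print(proxy_url('127.0.0.1', 7891, 'foo', 'p@ss'))  # socks5://foo:p%40ss@127.0.0.1:7891
```

The resulting string can then be handed to `ProxyConnector.from_url()` directly.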
7. Proxy Setup in `Pyppeteer`
`Pyppeteer` uses a Chromium browser similar to Chrome, so its setup looks much like Selenium’s. For a non-authenticated HTTP proxy, pass the proxy via the `args` launch option:
```python
import asyncio
from pyppeteer import launch

proxy = '127.0.0.1:7890'

async def main():
    browser = await launch({'args': ['--proxy-server=http://' + proxy], 'headless': False})
    page = await browser.newPage()
    await page.goto('https://httpbin.org/get')
    print(await page.content())
    await browser.close()

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
```
The output looks like this:
```json
{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3494.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-5e8f442c-12b1ed7865b049007267a66c"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
Again, the proxy is clearly active.
SOCKS proxies work the same way. Just change the scheme to `socks5`:
```python
import asyncio
from pyppeteer import launch

proxy = '127.0.0.1:7891'

async def main():
    browser = await launch({'args': ['--proxy-server=socks5://' + proxy], 'headless': False})
    page = await browser.newPage()
    await page.goto('https://httpbin.org/get')
    print(await page.content())
    await browser.close()

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
```
The result is the same.
8. Proxy Setup in `Playwright`
Compared with Selenium and Pyppeteer, `Playwright` makes proxy configuration easier because it exposes a dedicated `proxy` parameter when launching the browser.
For an HTTP proxy, use this setup:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        'server': 'http://127.0.0.1:7890'
    })
    page = browser.new_page()
    page.goto('https://httpbin.org/get')
    print(page.content())
    browser.close()
```
When calling `launch()`, pass a `proxy` dictionary. The required field is `server`, which should contain the HTTP proxy address.
The output looks like this:
```json
{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Sec-Ch-Ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"92\"",
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4498.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-60e99eef-4fa746a01a38abd469ecb467"
  },
  "origin": "210.173.1.204",
  "url": "https://httpbin.org/get"
}
```
For a SOCKS proxy, the setup is identical. Just change the `server` value to the SOCKS proxy address:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        'server': 'socks5://127.0.0.1:7891'
    })
    page = browser.new_page()
    page.goto('https://httpbin.org/get')
    print(page.content())
    browser.close()
```
The output is the same as before.
If the proxy requires authentication, Playwright also keeps that simple. Add `username` and `password` to the `proxy` object. For example, if the credentials are `foo` and `bar`:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        'server': 'http://127.0.0.1:7890',
        'username': 'foo',
        'password': 'bar'
    })
    page = browser.new_page()
    page.goto('https://httpbin.org/get')
    print(page.content())
    browser.close()
```
That’s all you need to enable authenticated proxies in Playwright.
9. Summary
This guide covered proxy configuration across several common request libraries. The setup patterns are similar, and once you understand them, adding proxies becomes an easy way to handle IP bans and rate limits in future scraping work.
By routing traffic through proxies in different locations, you can simulate requests from specific regions and collect localized data. Proxies also hide the crawler’s real IP address, which helps reduce blocking, protect privacy, and improve success rates when sites try to detect repeated requests from a single source.
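As a closing sketch, many crawlers combine these pieces by rotating through a pool of proxies instead of reusing a single one. A minimal illustration using the `proxies` dictionary format from the `requests` section (the pool addresses are placeholders, not working proxies):

```python
import random

# Placeholder pool; replace these with your own working proxy addresses.
PROXY_POOL = [
    'http://127.0.0.1:7890',
    'http://203.0.113.5:8080',
    'http://203.0.113.6:8080',
]

def pick_proxies():
    """Choose a random proxy from the pool and format it for
    the `proxies` argument accepted by requests."""
    proxy = random.choice(PROXY_POOL)
    return {'http': proxy, 'https': proxy}

# Usage: requests.get('https://httpbin.org/get', proxies=pick_proxies())
```

A production pool would also drop proxies that fail the httpbin `origin` check, but the rotation idea stays the same.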