DEV Community

Markus

Developing a Python Script to Bypass GeeTest CAPTCHA: A Full-Throttle Guide to Solving GeeTest v4 & v3

Introduction – Why GeeTest CAPTCHA Is in a League of Its Own and How to Build a GeeTest Solver

Hey tech enthusiasts, welcome to an adrenaline-pumping exploration of one of the internet’s most notorious security puzzles: the GeeTest CAPTCHA. These days, Chinese tech has found its way into every corner of the digital universe, and while some products might raise a smile with nostalgic ‘90s references like “Glasses, do you need ‘em?”, nothing quite compares to the fierce challenge posed by GeeTest. Unlike those old-school novelties, GeeTest has refined its craft to a high art, leaving many SEO gurus and automation hackers in despair as they try to crack it.


So what makes this CAPTCHA such a beast? In essence, it’s a cutting-edge system designed to fend off bots by presenting users with a dynamic slider puzzle: you slide a missing piece into its slot while the system meticulously records your every move. Intrigued by this mechanism, I dug into its inner workings to uncover the hidden challenges and share some tips on building your own solver. And yes, I’m relying entirely on a CAPTCHA-solving service, specifically 2Captcha, to do the heavy lifting.

How GeeTest CAPTCHA Works – Why Outwitting It Isn’t as Simple as Bypassing reCAPTCHA

Geetest CAPTCHA isn’t just a simple test; it’s a double-layered fortress. Let’s break down its two core components:

Dynamic Image Generation

Every single request triggers the server to create a completely unique background, complete with a “hole” and its corresponding puzzle piece. This constantly shifting landscape means there’s no one-size-fits-all solution that you can simply pre-pack and deploy.

The Interactive Slider Challenge

Next, you’re asked to drag the puzzle piece until it aligns perfectly with the gap. But here’s the kicker: the system is secretly logging every detail:

  • Final Placement: It records where you drop the piece.

  • Movement Path: Every twist and turn of your drag is noted.

  • Timing: The intervals between your actions are measured to ensure authenticity.

This isn’t an add-on feature; it’s an intrinsic part of the CAPTCHA. The entire interaction is monitored, capturing even the smallest, subconscious mouse movements. After you finish your drag, your browser bundles up all these metrics and sends them to the server for a rigorous human-behavior analysis. No wonder bots find this almost impossible to mimic!

GeeTest v4 takes these tactics to the next level with an invisible mode and more advanced behavioral tracking, but its predecessor, GeeTest v3, was already a tall order, just without the extra stealth features. In short, whether it’s v3 or v4, cracking GeeTest is a far cry from the relatively simple reCAPTCHA process, even if, to be fair, GeeTest has not yet made huge inroads in Europe.

The Intricacies of Solving GeeTest – The Real Challenge Behind the Curtain

When you’re up against reCAPTCHA, the task is usually straightforward: identify a few static parameters on the page, send them to a solving service, and voilà, problem solved. GeeTest is a different beast altogether. It mixes static and dynamic elements, and the dynamic ones must be captured fresh every time the CAPTCHA loads.


Let’s look at the two versions:

Geetest v3

For v3, you need two static details:

  • websiteURL: The exact URL of the page where the CAPTCHA is hosted.

  • gt: A site-specific identifier for the GeeTest configuration that stays the same across page loads.

On top of these comes the dynamic challenge parameter, which is regenerated on every page load. Get it wrong or reuse a stale one, and the whole process crumbles.

Geetest v4

For v4, the approach shifts. Instead of separate tokens, the parameters are bundled into an initParameters object, and the star of the show is captcha_id, the identifier of the site’s CAPTCHA configuration.

But here’s the twist: these parameters don’t sit in the static HTML. They are generated only when you start interacting with the CAPTCHA, so beyond scraping the page you have to drive the widget in a real browser. That kind of automated interaction is exactly what GeeTest’s defenses watch for, which is why proxies are often needed to mask it. Every extra step adds complexity, so while our demo may run smoothly without proxies, don’t be surprised if real-world sites demand extra layers of stealth.
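To make the v3/v4 difference concrete, here is a minimal sketch of the task object each version needs, following the field names the script below sends to 2Captcha (websiteURL, gt, challenge for v3; version and initParameters.captcha_id for v4). The helper name build_geetest_task and all sample values are placeholders of our own, not real site parameters.

```python
import json


def build_geetest_task(website_url, gt=None, challenge=None, captcha_id=None):
    """Return a proxyless 2Captcha task object for GeeTest v3 or v4 (hypothetical helper).

    v3 uses flat gt/challenge fields; v4 nests captcha_id inside initParameters.
    """
    if captcha_id:
        # GeeTest v4: a single configuration identifier inside initParameters
        return {
            "type": "GeeTestTaskProxyless",
            "websiteURL": website_url,
            "version": 4,
            "initParameters": {"captcha_id": captcha_id},
        }
    if gt and challenge:
        # GeeTest v3: static gt plus a challenge that must be fresh for this page load
        return {
            "type": "GeeTestTaskProxyless",
            "websiteURL": website_url,
            "gt": gt,
            "challenge": challenge,
        }
    raise ValueError("need either captcha_id (v4) or both gt and challenge (v3)")


if __name__ == "__main__":
    # Placeholder values only
    print(json.dumps(build_geetest_task("https://example.com/login", gt="GT_PLACEHOLDER",
                                        challenge="FRESH_CHALLENGE_PLACEHOLDER"), indent=2))
    print(json.dumps(build_geetest_task("https://example.com/login",
                                        captcha_id="0123456789abcdef0123456789abcdef"), indent=2))
```

Raising early when neither parameter set is complete mirrors the script’s own checks in main(), where a missing gt/challenge or captcha_id stops the run before any task is created.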

Getting Set Up to Build Your Geetest CAPTCHA Solver

After our deep dive into the technical rabbit hole, it’s time to roll up our sleeves and get into the nitty-gritty of creating your own bypass script. Here’s your checklist:

What You’ll Need:

  • Python 3: Download the installer from python.org, follow the setup instructions, and make sure you add Python to your PATH.

  • pip Package Manager: This usually comes with Python. To verify, run:

pip --version

  • Essential Python Libraries (requests and selenium): These libraries are your workhorses:

    • requests: For sending HTTP requests to the 2Captcha API.
    • selenium: For automating browser interactions with Chrome.

    Install them with:

pip install requests selenium

  • ChromeDriver: This lets Selenium control Google Chrome. First, check your Chrome version (via “About Chrome”), then download the matching ChromeDriver from the official site. Extract the executable and either place it in a folder on your system’s PATH or point Selenium at it directly. Selenium 4.10 and later no longer accept the old executable_path argument, so pass a Service object instead (since Selenium 4.6, Selenium Manager can usually find or download a matching driver for you if you skip this step):

from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'), options=options)
  • 2Captcha API Key: Keep this handy, as it’s crucial for integrating the CAPTCHA-solving service into your script.

Now, let’s jump into the complete script. I’ll guide you through every segment, explaining the functionality and significance of each part.

#!/usr/bin/env python3
import re
import time
import json
import argparse
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Replace with your actual 2Captcha API key
API_KEY = "INSERT_YOUR_API_KEY"

# 2Captcha API endpoints
CREATE_TASK_URL = "https://api.2captcha.com/createTask"
GET_TASK_RESULT_URL = "https://api.2captcha.com/getTaskResult"

# Timeout (seconds) for each HTTP call to 2Captcha, so a stalled request cannot hang the script
REQUEST_TIMEOUT = 30

def extract_geetest_v3_params(html):
    """
    Attempt to extract parameters for GeeTest V3 (gt and challenge) from HTML.
    (Used if the parameters are available in the page source; the demo flow in
    auto_extract_params below uses static values instead.)
    """
    gt_match = re.search(r'["\']gt["\']\s*:\s*["\'](.*?)["\']', html)
    challenge_match = re.search(r'["\']challenge["\']\s*:\s*["\'](.*?)["\']', html)
    gt = gt_match.group(1) if gt_match else None
    challenge = challenge_match.group(1) if challenge_match else None
    return gt, challenge

def extract_geetest_v4_params(html):
    """
    Extracts captcha_id for GeeTest V4 from HTML.
    Looks for a string in the form: captcha_id=<32 hexadecimal characters>
    If extra characters are found after captcha_id, they are discarded.
    """
    match = re.search(r'captcha_id=([a-f0-9]{32})', html)
    if match:
        return match.group(1)
    # Fallback: take everything up to the next &, quote, or tag
    match = re.search(r'captcha_id=([^&"\']+)', html)
    if match:
        captcha_id_raw = match.group(1)
        captcha_id = captcha_id_raw.split("<")[0]
        return captcha_id.strip()
    return None

def get_geetest_v3_params_via_requests(website_url):
    """
    For the GeeTest V3 demo page, return static parameters as specified in the examples
    (PHP, Java, Python). This prevents errors where split() might return the entire HTML.
    Note: despite the name, no HTTP request is made. On a real site the challenge is
    regenerated on every page load and must be captured fresh, not hardcoded.
    """
    gt = "f3bf6dbdcf7886856696502e1d55e00c"
    challenge = "12345678abc90123d45678ef90123a456b"
    return gt, challenge

def auto_extract_params(website_url):
    """
    If the URL contains "geetest-v4", work with V4 (using Selenium to extract captcha_id).
    If the URL contains "geetest" (without -v4), assume it is GeeTest V3 and use static parameters.
    Returns a tuple: (driver, version, gt, challenge_or_captcha_id)
    """
    if "geetest-v4" in website_url:
        options = Options()
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=options)
        driver.get(website_url)
        time.sleep(3)
        try:
            # Wait for the widget placeholder, then click it so the v4 parameters load
            wait = WebDriverWait(driver, 10)
            element = wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "#embed-captcha .gee-test__placeholder"))
            )
            driver.execute_script("arguments[0].click();", element)
            time.sleep(5)
        except Exception as e:
            print("Error loading V4 widget:", e)
        html = driver.page_source
        captcha_id = extract_geetest_v4_params(html)
        return driver, "4", None, captcha_id
    elif "geetest" in website_url:
        # For the GeeTest V3 demo page, use static parameters
        gt, challenge = get_geetest_v3_params_via_requests(website_url)
        options = Options()
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=options)
        driver.get(website_url)
        return driver, "3", gt, challenge
    else:
        return None, None, None, None

def create_geetest_v3_task(website_url, gt, challenge, proxyless=True, proxy_details=None):
    """
    Create a task for GeeTest V3 using the 2Captcha API.
    Required parameters: websiteURL, gt, challenge.
    """
    task_type = "GeeTestTaskProxyless" if proxyless else "GeeTestTask"
    task = {
        "type": task_type,
        "websiteURL": website_url,
        "gt": gt,
        "challenge": challenge
    }
    if not proxyless and proxy_details:
        task.update(proxy_details)
    payload = {
        "clientKey": API_KEY,
        "task": task
    }
    response = requests.post(CREATE_TASK_URL, json=payload, timeout=REQUEST_TIMEOUT)
    return response.json()

def create_geetest_v4_task(website_url, captcha_id, proxyless=True, proxy_details=None):
    """
    Create a task for GeeTest V4 using the 2Captcha API.
    Required parameters: websiteURL, version (4) and initParameters with captcha_id.
    """
    task_type = "GeeTestTaskProxyless" if proxyless else "GeeTestTask"
    task = {
        "type": task_type,
        "websiteURL": website_url,
        "version": 4,
        "initParameters": {
            "captcha_id": captcha_id
        }
    }
    if not proxyless and proxy_details:
        task.update(proxy_details)
    payload = {
        "clientKey": API_KEY,
        "task": task
    }
    response = requests.post(CREATE_TASK_URL, json=payload, timeout=REQUEST_TIMEOUT)
    return response.json()

def get_task_result(task_id, retry_interval=5, max_retries=20):
    """
    Poll the 2Captcha API until a result is obtained.
    """
    payload = {
        "clientKey": API_KEY,
        "taskId": task_id
    }
    for i in range(max_retries):
        response = requests.post(GET_TASK_RESULT_URL, json=payload, timeout=REQUEST_TIMEOUT)
        result = response.json()
        if result.get("status") == "processing":
            print(f"Captcha not solved yet, waiting... {i+1}")
            time.sleep(retry_interval)
        else:
            return result
    return {"errorId": 1, "errorDescription": "Timeout waiting for solution."}

def main():
    parser = argparse.ArgumentParser(
        description="Solve GeeTest CAPTCHA using 2Captcha API with automatic parameter extraction"
    )
    parser.add_argument("--website-url", required=True, help="URL of the page with the captcha")
    # Optional parameters for using a proxy
    parser.add_argument("--proxy-type", help="Proxy type (http, socks4, socks5)")
    parser.add_argument("--proxy-address", help="Proxy server IP address")
    parser.add_argument("--proxy-port", type=int, help="Proxy server port")
    parser.add_argument("--proxy-login", help="Proxy login (if required)")
    parser.add_argument("--proxy-password", help="Proxy password (if required)")
    args = parser.parse_args()

    # A proxy is used only when type, address, and port are all supplied
    proxyless = True
    proxy_details = {}
    if args.proxy_type and args.proxy_address and args.proxy_port:
        proxyless = False
        proxy_details = {
            "proxyType": args.proxy_type,
            "proxyAddress": args.proxy_address,
            "proxyPort": args.proxy_port
        }
        if args.proxy_login:
            proxy_details["proxyLogin"] = args.proxy_login
        if args.proxy_password:
            proxy_details["proxyPassword"] = args.proxy_password

    print("Loading page:", args.website_url)
    driver, version, gt, challenge_or_captcha_id = auto_extract_params(args.website_url)
    if driver is None or version is None:
        print("Failed to load page or extract parameters.")
        return

    print("Detected GeeTest version:", version)
    if version == "3":
        if not gt or not challenge_or_captcha_id:
            print("Failed to extract gt and challenge parameters for GeeTest V3.")
            driver.quit()
            return
        print("Using parameters for GeeTest V3:")
        print("gt =", gt)
        print("challenge =", challenge_or_captcha_id)
        create_response = create_geetest_v3_task(
            website_url=args.website_url,
            gt=gt,
            challenge=challenge_or_captcha_id,
            proxyless=proxyless,
            proxy_details=proxy_details
        )
    elif version == "4":
        captcha_id = challenge_or_captcha_id
        if not captcha_id:
            print("Failed to extract captcha_id for GeeTest V4.")
            driver.quit()
            return
        print("Using captcha_id for GeeTest V4:", captcha_id)
        create_response = create_geetest_v4_task(
            website_url=args.website_url,
            captcha_id=captcha_id,
            proxyless=proxyless,
            proxy_details=proxy_details
        )
    else:
        print("Unknown version:", version)
        driver.quit()
        return

    if create_response.get("errorId") != 0:
        print("Error creating task:", create_response.get("errorDescription"))
        driver.quit()
        return

    task_id = create_response.get("taskId")
    print("Task created. Task ID:", task_id)
    print("Waiting for captcha solution...")
    result = get_task_result(task_id)
    if result.get("errorId") != 0:
        print("Error retrieving result:", result.get("errorDescription"))
        driver.quit()
        return

    solution = result.get("solution")
    print("Captcha solved. Received solution:")
    print(json.dumps(solution, indent=4))

    # Inject the received data into the page
    if version == "3":
        # For GeeTest V3, expected fields: challenge, validate, seccode
        js_script = """
        function setOrUpdateInput(id, value) {
            var input = document.getElementById(id);
            if (!input) {
                input = document.createElement('input');
                input.type = 'hidden';
                input.id = id;
                input.name = id;
                document.getElementById('geetest-demo-form').appendChild(input);
            }
            input.value = value;
        }
        setOrUpdateInput('geetest_challenge', arguments[0]);
        setOrUpdateInput('geetest_validate', arguments[1]);
        setOrUpdateInput('geetest_seccode', arguments[2]);
        document.querySelector('#embed-captcha').innerHTML =
            '<div style="padding:20px; background-color:#e0ffe0; border:2px solid #00a100; font-size:18px; color:#007000; text-align:center;">' +
            'Captcha successfully solved!<br>' +
            'challenge: ' + arguments[0] + '<br>' +
            'validate: ' + arguments[1] + '<br>' +
            'seccode: ' + arguments[2] +
            '</div>';
        """
        challenge_sol = solution.get("challenge")
        validate_sol = solution.get("validate")
        seccode_sol = solution.get("seccode")
        driver.execute_script(js_script, challenge_sol, validate_sol, seccode_sol)
    elif version == "4":
        # For GeeTest V4, only a confirmation message is shown; the token is not injected
        js_script = """
        document.querySelector('#embed-captcha').innerHTML =
            '<div style="padding:20px; background-color:#e0ffe0; border:2px solid #00a100; font-size:18px; color:#007000; text-align:center;">GeeTest V4 captcha successfully solved!</div>';
        """
        driver.execute_script(js_script)

    print("Solution injected into page. The browser will remain open for 30 seconds for visual verification.")
    time.sleep(30)
    driver.quit()

if __name__ == "__main__":
    main()

Geetest Solver Code Overview – Breaking Down the Script’s Magic

What Does the GeeTest Solver Script Do?

The script is a well-oiled machine that navigates through multiple stages to crack the CAPTCHA:

  1. Importing Libraries & Setting Up Constants
  • Module Imports:
    The script begins by importing standard Python modules—re for regex wizardry, time for managing delays, json for handling data formats, and argparse for parsing command-line arguments. It also brings in requests for HTTP communications and selenium for browser automation.

  • Constants:
    Key constants are defined, such as the 2Captcha API key (API_KEY) and the endpoints for creating tasks (CREATE_TASK_URL) and retrieving results (GET_TASK_RESULT_URL). These are the backbone of your communication with the 2Captcha service.
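Hardcoding API_KEY is fine for a demo, but it is easy to commit by accident. A minimal sketch of an alternative, assuming you store the key in an environment variable; the name TWOCAPTCHA_API_KEY and the helper load_api_key are our own choices, not something 2Captcha or the script defines.

```python
import os


def load_api_key(env_var="TWOCAPTCHA_API_KEY"):
    """Read the 2Captcha key from an environment variable (variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail fast instead of sending an empty clientKey to the API
        raise RuntimeError(f"Set {env_var} before running the solver")
    return key
```

In the script, API_KEY = "INSERT_YOUR_API_KEY" would then become API_KEY = load_api_key().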

  2. Extracting the CAPTCHA Parameters

The script employs dedicated functions to dig out the critical parameters:

  • For Geetest v3:
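    The function extract_geetest_v3_params(html) uses regex to locate the gt token and the dynamic challenge string in the HTML and returns them for later use. (The demo flow in auto_extract_params doesn’t actually call it; it uses static values instead, as described below.)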

  • For Geetest v4:
    A separate function, extract_geetest_v4_params(html), combs through the HTML for the vital captcha_id, first looking for a 32-character hexadecimal value after captcha_id= and falling back to a looser pattern if necessary.

  • auto_extract_params Function:
    This intelligent function evaluates the URL to determine which Geetest version is in play:

    • For Geetest v4: If the URL indicates “geetest-v4”, it launches a Chrome session with GPU disabled and sandbox mode off, loads the page, waits for the placeholder element (#embed-captcha .gee-test__placeholder) to show up, simulates a click to trigger the CAPTCHA, and then extracts the captcha_id from the page’s HTML.
    • For GeeTest v3: If the URL contains “geetest” (but not “geetest-v4”), it takes hardcoded demo values for gt and challenge from get_geetest_v3_params_via_requests (which, despite its name, makes no HTTP request), then starts Chrome with the same options and loads the page.
  • If neither condition applies, the function returns None values, halting the process.
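The routing and extraction steps above can be pulled into small, testable functions. This is a sketch under our own assumptions: detect_version reproduces the script’s substring checks, while captcha_id_from_url is a hypothetical alternative to the regex that reads captcha_id from a URL query string with urllib.parse, assuming you have already found a resource URL that carries it as a query parameter. The example URLs are placeholders.

```python
import re
from urllib.parse import urlparse, parse_qs

HEX32 = re.compile(r"[a-f0-9]{32}")


def detect_version(website_url):
    """Mirror auto_extract_params: check 'geetest-v4' before the broader 'geetest'."""
    if "geetest-v4" in website_url:
        return "4"
    if "geetest" in website_url:
        return "3"
    return None


def captcha_id_from_url(resource_url):
    """Return captcha_id from a URL's query string if it is exactly 32 hex characters."""
    values = parse_qs(urlparse(resource_url).query).get("captcha_id", [])
    for value in values:
        if HEX32.fullmatch(value):
            return value
    return None
```

The order of the checks matters: every “geetest-v4” URL also contains “geetest”, so testing the broader substring first would misroute v4 pages to the v3 branch.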

  3. Creating Tasks for the 2Captcha API

With the parameters in hand, the script then forms a JSON package tailored to the Geetest version:

  • For Geetest v3:
    The function create_geetest_v3_task(website_url, gt, challenge, proxyless=True, proxy_details=None) packages the URL, gt, and challenge into a JSON payload. If you’re not using proxies, the task type is “GeeTestTaskProxyless”; otherwise, it’s “GeeTestTask”, with optional proxy details included.

  • For Geetest v4:
    Similarly, create_geetest_v4_task(website_url, captcha_id, proxyless=True, proxy_details=None) sets “version”: 4 and packages the captcha_id inside an initParameters object.

These tasks are sent off via a POST request to the 2Captcha API, with the JSON response determining your next steps.
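Both create_* functions repeat the same proxy logic. A sketch of how it could be factored out, using the field names the script already sends (proxyType, proxyAddress, proxyPort, and so on) and the proxy types listed in its --proxy-type help text; the helper name and the up-front proxyType validation are our own additions, not part of the original script.

```python
ALLOWED_PROXY_TYPES = {"http", "socks4", "socks5"}


def build_create_task_payload(api_key, task, proxy=None):
    """Wrap a GeeTest task for createTask, switching to GeeTestTask when a proxy is given."""
    task = dict(task)  # copy so the caller's dict is not mutated
    if proxy:
        if proxy.get("proxyType") not in ALLOWED_PROXY_TYPES:
            raise ValueError(f"unsupported proxyType: {proxy.get('proxyType')!r}")
        task["type"] = "GeeTestTask"
        task.update(proxy)  # proxyType, proxyAddress, proxyPort, optional login/password
    else:
        task["type"] = "GeeTestTaskProxyless"
    return {"clientKey": api_key, "task": task}
```

The result can be posted exactly as the script does, e.g. requests.post(CREATE_TASK_URL, json=payload, timeout=30).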

  4. Polling for the CAPTCHA Solution

After dispatching the task, the script enters a loop that periodically polls GET_TASK_RESULT_URL with the task ID and API key. While the response status is “processing”, it prints a status update and waits retry_interval seconds (5 by default) before checking again. The loop ends when a final result arrives or after max_retries attempts (20 by default, roughly 100 seconds), at which point it returns a timeout error.
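The polling loop is easier to test if the HTTP call and the sleep are passed in. A sketch assuming the getTaskResult response shape the script relies on (errorId, status “processing” or a final result); unlike the original loop, this version also stops immediately when errorId is non-zero. The function name and its parameters are our own.

```python
import time


def poll_task_result(fetch, interval=5.0, max_retries=20, sleep=time.sleep):
    """Call fetch() until 2Captcha stops reporting 'processing'.

    fetch is any zero-argument callable returning the decoded getTaskResult JSON, e.g.
    lambda: requests.post(GET_TASK_RESULT_URL, json=payload, timeout=30).json()
    """
    for attempt in range(1, max_retries + 1):
        result = fetch()
        if result.get("errorId", 0) != 0:
            return result  # API-level error: no point in polling further
        if result.get("status") != "processing":
            return result  # final result (or an unexpected status) ends the loop
        print(f"Captcha not solved yet, waiting... {attempt}")
        sleep(interval)
    return {"errorId": 1, "errorDescription": "Timeout waiting for solution."}
```

Injecting sleep lets a test replace real waiting with a recorder, so the retry logic can be checked in milliseconds.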

  5. Injecting the CAPTCHA Solution into the Webpage

Once a solution is received, the script leverages JavaScript injection via driver.execute_script to input the solution into the page:

  • For Geetest v3:
    The script creates or updates hidden form fields (such as geetest_challenge, geetest_validate, and geetest_seccode) with the returned values, and modifies the content of the #embed-captcha element to display a success message along with the parameters.

  • For Geetest v4:
    This step is even simpler: the script only replaces the content of #embed-captcha with a confirmation that the CAPTCHA was solved. The v4 solution itself is printed to the console but not written into the page.
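For v3, the mapping from solution fields to hidden inputs can be checked before any JavaScript runs. A sketch using only the field names from the script (challenge, validate, seccode mapped to geetest_challenge, geetest_validate, geetest_seccode); the helper and the check for missing fields are our own additions.

```python
V3_FIELD_MAP = {
    "challenge": "geetest_challenge",
    "validate": "geetest_validate",
    "seccode": "geetest_seccode",
}


def v3_solution_to_form_fields(solution):
    """Map a 2Captcha GeeTest v3 solution onto the hidden-input names the script injects."""
    missing = [key for key in V3_FIELD_MAP if not solution.get(key)]
    if missing:
        raise KeyError(f"solution is missing: {', '.join(missing)}")
    return {field: solution[key] for key, field in V3_FIELD_MAP.items()}
```

Because Python dicts keep insertion order, driver.execute_script(js_script, *fields.values()) would pass the values in the same challenge, validate, seccode order that the injected script reads as arguments[0..2].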

  6. Final Delay and Cleanup

After the solution is injected, the script pauses for roughly 30 seconds, giving you ample time to observe the successful bypass in action before gracefully closing the browser session. There’s even a hint of future excitement as I plan to test another solver, SolveCaptcha, to see how it competes with Geetest’s robust defenses.
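The script calls driver.quit() before each early return, and an unexpected exception would skip all of them and leave Chrome running. A minimal sketch of a context manager that guarantees cleanup; the name quitting is hypothetical, and it only assumes the driver object has a quit() method, as Selenium’s WebDriver does.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call quit(), even if solving fails midway."""
    try:
        yield driver
    finally:
        driver.quit()
```

In main(), everything after auto_extract_params returns a driver could sit inside with quitting(driver):, and the scattered driver.quit() calls could then be dropped.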

Conclusion

In this high-energy review, we’ve peeled back the layers of GeeTest CAPTCHA, from its dynamic image generation and interactive slider challenge to the behavioral data analysis that sets it apart. With basic Python skills and a reliable solving service like 2Captcha, you can build a script capable of getting past this formidable security mechanism. But be warned: every parameter matters, and a tiny oversight can leave you wrestling with an ever-changing challenge for hours. Get ready for a thrilling coding adventure, and happy hacking!
