This blog was originally posted on the Crawlbase Blog.
Temu is a fast-growing e-commerce platform known for its huge selection of products at competitive prices. Covering everything from electronics to fashion and home goods, Temu has become a go-to destination for online shoppers. Its dynamic, JavaScript-rendered pages make data scraping challenging with traditional methods, but with the right tools, it's still achievable.
In this guide, we'll show you how to scrape data from Temu using the Crawlbase Crawling API, designed to handle CAPTCHAs and JavaScript-rendered pages. Whether you're looking to gather product information for analysis, price comparison, or market research, this blog will cover all the essential steps to extract data effectively. You'll learn how to set up your Python environment, create Temu scrapers, handle Temu SERP pagination, and store data in a CSV file for easy access.
By the end of this article, you'll have working scrapers for extracting valuable data from Temu's search listings and product pages. Let's get started!
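Before running the code below, make sure the two libraries the scripts depend on are installed. Both are published on PyPI, so pip install crawlbase beautifulsoup4 will set them up. You'll also need a Crawlbase JavaScript (JS) token, which you can get from your Crawlbase dashboard.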
Complete Code Example to Scrape Temu Search Listings
Here’s the full script for scraping Temu search listings, handling pagination, and saving data to CSV:
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import csv

# Initialize Crawlbase API with your JS Token
crawling_api = CrawlingAPI({ 'token': 'CRAWLBASE_JS_TOKEN' })

# Extract product details from the HTML
def extract_product_info(soup):
    products = []
    for item in soup.select('div.js-search-goodsList > div.autoFitList > div.EKDT7a3v'):
        title = item.select_one('h2._2BvQbnbN').text.strip() if item.select_one('h2._2BvQbnbN') else ''
        price = item.select_one('span._2de9ERAH').text.strip() if item.select_one('span._2de9ERAH') else ''
        image_url = item.select_one('img.goods-img-external')['src'] if item.select_one('img.goods-img-external') and item.select_one('img.goods-img-external').has_attr('src') else ''
        product_url = 'https://www.temu.com' + item.select_one('a._2Tl9qLr1')['href'] if item.select_one('a._2Tl9qLr1') else ''
        products.append({
            'title': title,
            'price': price,
            'image_url': image_url,
            'product_url': product_url
        })
    return products

# Function to scrape listings with pagination
def scrape_temu_with_pagination(url):
    products = []
    response = crawling_api.get(url, {
        'ajax_wait': 'true',
        'page_wait': '5000',
        'css_click_selector': 'div.R8mNGZXv[role="button"]'
    })
    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        soup = BeautifulSoup(html_content, 'html.parser')
        products.extend(extract_product_info(soup))
    return products

# Save scraped data to CSV
def save_to_csv(data, filename='temu_products.csv'):
    with open(filename, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['title', 'price', 'image_url', 'product_url'])
        writer.writeheader()
        for item in data:
            writer.writerow(item)

# Example usage
products = scrape_temu_with_pagination('https://www.temu.com/search?q=your_search_query')
save_to_csv(products)
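A note on the request options: ajax_wait and page_wait give Temu's JavaScript time to finish rendering before the HTML is captured, while css_click_selector tells the Crawling API to click the matched element first, in this case the div.R8mNGZXv button, presumably Temu's control for loading more results onto the page. These are Crawlbase Crawling API parameters; check the official documentation for the current option names and their limits.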
[Snapshot: temu_products.csv output]
Scraping Temu Product Pages
After collecting a list of product URLs from Temu’s search listings, the next step is to scrape the details from each product page. This lets us gather more specific information, such as detailed descriptions, specifications, and reviews. Here’s how to do it.
Complete Code Example
Here’s the full script to scrape multiple product pages from Temu, using the URLs from the search listings, and save the data into a CSV file.
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import csv
import re

# Initialize Crawlbase API with your JS Token
crawling_api = CrawlingAPI({ 'token': 'CRAWLBASE_JS_TOKEN' })

# Function to scrape a single product page
def scrape_product_page(url):
    response = crawling_api.get(url, {
        'ajax_wait': 'true',
        'page_wait': '5000'
    })
    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        soup = BeautifulSoup(html_content, 'html.parser')
        # Extract product details
        title = re.sub(r'\s+', ' ', soup.select_one('div._2rn4tqXP').text.strip())
        price = soup.select_one('div._1vkz0rqG span:last-child').text.strip()
        description = re.sub(r'\s+', ' ', soup.select_one('div.B_OB3uj0').text.strip())
        images_url = [img['src'] for img in soup.select('div[role="button"] img.wxWpAMbp')]
        # Return product details as a dictionary
        return {
            'title': title,
            'price': price,
            'description': description,
            'images_url': images_url,
            'product_url': url
        }
    else:
        print(f"Failed to fetch page. Status: {response['headers']['pc_status']}")
        return None

# Function to save data to CSV
def save_product_data_to_csv(data, filename='temu_product_details.csv'):
    with open(filename, mode='w', newline='', encoding='utf-8') as file:
        # Field names must match the keys returned by scrape_product_page
        writer = csv.DictWriter(file, fieldnames=['title', 'price', 'description', 'images_url', 'product_url'])
        writer.writeheader()
        for item in data:
            writer.writerow(item)

# Scrape multiple product pages and save to CSV
product_urls = [
    'https://www.temu.com/pk-en/goods-detail-g-601099527865713.html',
    'https://www.temu.com/pk-en/goods-detail-g-601099537192760.html',
    # Add more product URLs here
]

all_products = []
for url in product_urls:
    product_data = scrape_product_page(url)
    if product_data:
        all_products.append(product_data)

# Save all product data to CSV
save_product_data_to_csv(all_products)
[Snapshot: temu_product_details.csv output]
Final Thoughts
Scraping product data from Temu helps with analyzing market trends, tracking competitors, and studying pricing changes. This guide covered setting up a scraper for search listings and product pages, handling pagination, and saving data to a CSV file.
The Crawlbase Crawling API handles the JavaScript-heavy content for you, simplifying data collection. Remember to review Temu’s terms of service before scraping, and keep your request volume reasonable, as excessive scraping can put load on their servers.
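If you extend the product-page loop to a long list of URLs, a simple delay between requests is an easy way to keep your request volume polite. A minimal sketch based on the loop above (the one-second pause is an arbitrary choice, not a value required by Temu or Crawlbase):

import time

for url in product_urls:
    product_data = scrape_product_page(url)
    if product_data:
        all_products.append(product_data)
    time.sleep(1)  # pause between requests to avoid hammering the site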
Test and update your code regularly, as website structures can change, requiring adjustments in CSS selectors or logic.
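One way to make those selector changes easier to catch is to route lookups through a small helper that logs when a selector stops matching, instead of letting the script crash on a missing element. A minimal sketch (safe_text is an illustrative helper, not part of the code above):

def safe_text(soup, selector, default=''):
    # Return stripped text for the first match, or a default
    # if the selector no longer matches the page structure
    element = soup.select_one(selector)
    if element is None:
        print(f"Selector not found (page layout may have changed): {selector}")
        return default
    return element.text.strip()

# Example usage: title = safe_text(soup, 'div._2rn4tqXP')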