DEV Community

sportsfire

PHP code to scrape links

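To scrape links from a webpage using PHP, you can fetch the HTML content with the file_get_contents function (fetching a URL this way requires the allow_url_fopen setting to be enabled) and then parse it with the DOMDocument class. Here's a simple example that collects the href attribute of every <a> element on the page.

Site: SportsFire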

<?php

// Function to scrape links from a given URL
function scrapeLinks($url) {
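    // Get the HTML content of the webpage
    $html = file_get_contents($url);

    // file_get_contents() returns false on failure, and loadHTML() rejects an empty string
    if ($html === false || $html === '') {
        return [];
    }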

    // Create a new DOMDocument instance
    $dom = new DOMDocument();

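    // Suppress warnings caused by malformed HTML, remembering the previous setting
    $previousSetting = libxml_use_internal_errors(true);

    // Load the HTML content
    $dom->loadHTML($html);

    // Discard the collected parse errors and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previousSetting);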

    // Create an array to hold the links
    $links = [];

    // Get all <a> elements
    $anchors = $dom->getElementsByTagName('a');

    // Loop through the anchors and collect the href attributes
    foreach ($anchors as $anchor) {
        $href = $anchor->getAttribute('href');
        // Add the link to the array unless the attribute is missing or blank
        if (trim($href) !== '') {
            $links[] = $href;
        }
    }

    return $links;
}

// Example usage
$url = 'https://www.example.com'; // Change this to the URL you want to scrape
$links = scrapeLinks($url);

// Print the scraped links
foreach ($links as $link) {
    echo $link . PHP_EOL;
}
?>
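Note that scrapeLinks() returns each href exactly as it appears in the page, so relative links such as /about or page2.html come back without a scheme or host. The following is a minimal sketch of one way to make them absolute. The resolveHref() function is hypothetical (it is not part of the original post), and it only covers the common cases: absolute, protocol-relative, root-relative, query-only and directory-relative hrefs. It does not collapse ./ or ../ segments, so it is not a complete RFC 3986 resolver.

```php
<?php

// Hypothetical helper (not part of the original post): convert an href found
// on $baseUrl into an absolute URL. Simplified: it does not collapse "./" or
// "../" segments, so it is not a full RFC 3986 resolver.
function resolveHref(string $baseUrl, string $href): ?string
{
    $href = trim($href);

    // Skip empty values, in-page fragments and non-HTTP schemes
    if ($href === '' || $href[0] === '#' || preg_match('/^(mailto|javascript|tel):/i', $href)) {
        return null;
    }

    // Already absolute: it starts with a scheme such as https://
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }

    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null;
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';

    // Protocol-relative: //cdn.example.com/app.js
    if (substr($href, 0, 2) === '//') {
        return $base['scheme'] . ':' . $href;
    }

    // Root-relative: /about
    if ($href[0] === '/') {
        return $origin . $href;
    }

    // Query-only: ?page=2 keeps the current path
    if ($href[0] === '?') {
        return $origin . $path . $href;
    }

    // Anything else is resolved against the directory of the current path
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

Inside the foreach loop of scrapeLinks(), you could call resolveHref($url, $href) and keep only the non-null results, which also filters out fragment, mailto: and javascript: links.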


Top comments (1)

Ravavyr

Make this useful by using recursion and letting it crawl every link it finds too.
Of course, from there it gets to be more fun, as you then have to watch memory limits, write the returned data to files, etc., plus you can parse out useful information like SEO tags and meta/Open Graph data.
Crawling is fun, until you get blocked :)
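The crawling idea from this comment can be sketched as follows. This is a hypothetical example, not code from the post: it uses a queue (breadth-first traversal) instead of recursion so the call stack does not grow with crawl depth, stays on the starting host, and stops after $maxPages pages to keep memory bounded. The crawl() name, the $getLinks callable and the default limit of 50 are assumptions; $getLinks is expected to return absolute URLs for a given page, for example a wrapper around scrapeLinks() that also turns relative hrefs into absolute ones.

```php
<?php

// Hypothetical crawler sketch: breadth-first and iterative (a queue instead of
// recursion), limited to the starting host and capped at $maxPages pages.
// $getLinks(string $url): array must return absolute URLs found on $url.
function crawl(string $startUrl, callable $getLinks, int $maxPages = 50): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $visited = [];          // URL => true, for constant-time "already seen" checks
    $queue = [$startUrl];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;       // the same URL may have been queued more than once
        }
        $visited[$url] = true;

        foreach ($getLinks($url) as $link) {
            // Stay on the starting host and skip pages already crawled
            if (parse_url($link, PHP_URL_HOST) === $host && !isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }

    // Crawled URLs in the order they were visited
    return array_keys($visited);
}
```

Passing the link fetcher in as a callable keeps the traversal testable without network access. A real crawler should also respect robots.txt and pause between requests (for example with usleep()), which this sketch does not do.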