PHP code for scraping links

#webdev #php

To scrape links from a webpage using PHP, you can fetch the HTML with the file_get_contents function and then parse it with the DOMDocument class. Fetching a URL this way requires allow_url_fopen to be enabled in php.ini. Here's a simple example:

<?php

// Function to scrape links from a given URL
function scrapeLinks($url) {
    // Get the HTML content of the webpage; return no links if the request fails
    $html = @file_get_contents($url);
    if ($html === false || trim($html) === '') {
        return [];
    }

    // Create a new DOMDocument instance
    $dom = new DOMDocument();

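    // Suppress warnings caused by malformed HTML
    $previous = libxml_use_internal_errors(true);

    // Load the HTML content
    $dom->loadHTML($html);

    // Clear the collected errors and restore the previous error handling
    libxml_clear_errors();
    libxml_use_internal_errors($previous);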

    // Create an array to hold the links
    $links = [];

    // Get all <a> elements
    $anchors = $dom->getElementsByTagName('a');

    // Loop through the anchors and collect the href attributes
    foreach ($anchors as $anchor) {
        $href = $anchor->getAttribute('href');
        // Add the link to the array if it's not empty
        // (a strict comparison is used because empty() would also drop href="0")
        if ($href !== '') {
            $links[] = $href;
        }
    }

    return $links;
}

// Example usage
$url = 'https://www.example.com'; // Change this to the URL you want to scrape
$links = scrapeLinks($url);

// Print the scraped links
foreach ($links as $link) {
    echo $link . PHP_EOL;
}
?>
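
The href values collected above are raw attribute values, so the list can contain relative paths, in-page anchors, mailto: links, and duplicates. Below is a minimal sketch of one way to tidy the result, assuming you only want unique absolute http(s) URLs. The helper name `normalizeLinks` is hypothetical, and it only handles absolute, protocol-relative, and root-relative hrefs rather than full RFC 3986 resolution:

```php
<?php

// Hypothetical helper: keep unique absolute http(s) URLs from raw href values.
// Simplification: other relative forms (e.g. "page.html", "../x") are skipped.
function normalizeLinks(array $hrefs, string $baseUrl): array {
    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return [];
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    $result = [];
    foreach ($hrefs as $href) {
        // Drop the fragment and any surrounding whitespace
        $href = trim(explode('#', $href, 2)[0]);
        if ($href === '') {
            continue;
        }
        if (strpos($href, '//') === 0) {
            // Protocol-relative: reuse the scheme of the base URL
            $href = $base['scheme'] . ':' . $href;
        } elseif ($href[0] === '/') {
            // Root-relative: prepend the scheme and host of the base URL
            $href = $origin . $href;
        }
        // Keep only http(s) URLs; this also drops mailto:, javascript:, etc.
        if (preg_match('#^https?://#i', $href) === 1) {
            $result[$href] = true; // array keys deduplicate
        }
    }
    return array_keys($result);
}
```

It could be called as `normalizeLinks(scrapeLinks($url), $url)` to clean up the list before printing it.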

Top comments (1)

Ravavyr • Oct 17

Make this useful by using recursion and letting it crawl every link it finds too.
Of course, from there it gets to be more fun, as you then have to watch memory limits, write the returned data to files, etc., plus you can parse out useful information like SEO tags and meta/Open Graph data.
Crawling is fun, until you get blocked :)
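
One way to follow the suggestion above is a depth-limited crawl that remembers which URLs it has already visited. The sketch below is an illustration, not a production crawler. The names `extractLinks` and `crawl`, the `$fetch` callback, and the `$maxDepth` and `$maxPages` limits are all assumptions. It only follows absolute http(s) links on the starting host, and it ignores robots.txt and rate limiting, which a real crawler should respect.

```php
<?php

// Hypothetical helper: return the non-empty href values found in an HTML string.
function extractLinks(string $html): array {
    if (trim($html) === '') {
        return [];
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Recursively crawl same-host pages, returning [url => links found on that page].
// $fetch is any callable that takes a URL and returns the HTML, or false on failure.
function crawl(string $url, callable $fetch, int $maxDepth = 2, int $maxPages = 50, array &$visited = []): array {
    // Stop at the depth limit, on already-visited URLs, or at the page cap
    if ($maxDepth < 0 || isset($visited[$url]) || count($visited) >= $maxPages) {
        return $visited;
    }
    $html = $fetch($url);
    if (!is_string($html)) {
        $visited[$url] = []; // remember failures so they are not retried
        return $visited;
    }
    $visited[$url] = extractLinks($html);

    $host = parse_url($url, PHP_URL_HOST);
    foreach ($visited[$url] as $link) {
        // Follow only absolute http(s) links that stay on the same host
        if (preg_match('#^https?://#i', $link) === 1 && parse_url($link, PHP_URL_HOST) === $host) {
            crawl($link, $fetch, $maxDepth - 1, $maxPages, $visited);
        }
    }
    return $visited;
}
```

To run it against a live site, pass a fetcher such as `function ($u) { return @file_get_contents($u); }`. Injecting the fetcher also lets you exercise the crawl with canned HTML, and the `$maxPages` cap is one simple guard against the memory growth mentioned above.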
