Sunday, March 4, 2018

Extract URL Contents With PHP and jQuery

How do you extract the contents of a URL? This tutorial shows how to build a link preview like the ones Facebook, Twitter, and Google generate: given a URL, it retrieves the page's title, description, and an image.

Let's start with the HTML that holds all the elements. jQuery sends the submitted URL to extract-contents.php, which validates it with a regular expression and then reads the title, description, and first image of the page through the DOM.

index.php

<!DOCTYPE html>
<html>
    <head>
        <title>Extract URL Contents with PHP and jQuery Demo</title>
        <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
        <script type="text/javascript" src="js/jquery-3.1.1.min.js"></script>
        <script type="text/javascript" src="js/javascript.js"></script>
        <link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
        <link rel="stylesheet" href="css/style.css" />
    </head>
    <body>
        <div class="main-container">
            <div class="extract-wrapper section">
                <label>Enter an absolute URL like http://www.codestacked.info</label>
                <form class="url-form">
                    <div class="fields-container">
                        <div class="loader">
                            <i class="fa fa-spinner fa-spin"></i>
                        </div>
                        <input type="url" class="form-control url-input" value="" required="required" placeholder="Enter a URL to extract contents" />
                        <button type="submit">Extract</button>
                    </div>
                </form>
                <div class="content-wrapper" id="content-wrapper"></div>
            </div>
        </div>
    </body>
</html>


extract-contents.php

<?php
if(!empty($_POST["url"])){
    $post = $_POST;
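    $url = trim($post["url"]);
    //=== Prepend http:// only when the url does not already start with a scheme
    $url = preg_match("/^(https?|ftp):\/\//i",$url) ? $url : "http://$url";

    //=== regular expression to validate url (case-insensitive, so the path keeps its original case)
    $regEx = "/^((https?|ftp):\/\/)(www\.)?[\w\-]+\.[a-z]{2,4}\/?[\w\/\-]*(\.[a-z]{2,4})?$/i";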

    preg_match($regEx,$url,$hostname);

    //=== Check if url is a valid url
    if(preg_match($regEx,$url)){
        //=== Get contents of url
        $content = @file_get_contents($url);

        //=== If failed to get contents show an error
        if(!$content){
            die('<div class="error">Error parsing the submitted URL.</div>');
        }
        $title = $description = "";

        $images_arr = [];

        //=== Open new dom document object
        $dom = new DOMDocument("1.0", "UTF-8");

        //=== Load url content to dom document object
        @$dom->loadHTML($content);

        //=== Get images from dom document
        $images = $dom->getElementsByTagName("img");

        //=== Loop through images and push them to images array
        foreach ($images as $image)
        {
            $src = parse_url($image->getAttribute("src"));
            if(!empty($src["path"]))
                $images_arr[]=$image->getAttribute("src");
        }

        //=== Open xpath object for current dom document
        $xPath = new DOMXPath($dom);
        $og_title = $xPath -> query("//meta[@property='og:title']");
        $og_description = $xPath -> query("//meta[@property='og:description']");
        $og_image = $xPath -> query("//meta[@property='og:image']");

        $meta_description = @$xPath -> query("//meta[@name='description']");
        $meta_title = @$xPath -> query("//title");

        //=== Prepare title of document
        if($og_title->length){
            $title = $og_title -> item(0)->getAttribute("content");
        }elseif($meta_title->length){
            $title = $meta_title -> item(0)->textContent;
        }

        //=== Prepare description of document
        if($og_description->length){
            $description = $og_description -> item(0)->getAttribute("content");
        }elseif($meta_description->length){
            $description = $meta_description -> item(0)->getAttribute("content");
        }

        //=== Prepare image of document
        if($og_image->length){
            $image = $og_image -> item(0)->getAttribute("content");
        }elseif($images_arr){
            $image = reset($images_arr);
        }?>
        <div class="url-info-box">
            <?php
            if(!empty($image)){
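                //=== Make relative image paths absolute using the scheme and host of the submitted url
                $url_parts = parse_url($url);
                $base = $url_parts["scheme"]."://".$url_parts["host"];
                $image = preg_match("/^(https?:)?\/\//",$image) ? $image : $base."/".ltrim($image,"/");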
            ?>
            <div class="image">
                <img src="<?php echo htmlspecialchars($image, ENT_QUOTES, "UTF-8");?>" class="img-responsive" />
            </div>
            <?php } ?>
            <div class="data">
                <div class="title">
                    <?php echo htmlspecialchars($title, ENT_QUOTES, "UTF-8");?>
                </div>
                <div class="description"><?php echo htmlspecialchars($description, ENT_QUOTES, "UTF-8"); ?></div>
            </div>
        </div>
        <?php
    }else{
        echo '<div class="error">Invalid URL submitted.</div>';
    }
}
?>
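
The fallback order used in extract-contents.php (Open Graph tag first, then the plain title or description tag) can be tried out without fetching anything. The snippet below is only a sketch: extract_preview() is a hypothetical helper that is not part of the original script, the sample markup is made up, and it assumes PHP 7 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper (not part of the original script): returns the title
// and description found in an HTML string, preferring Open Graph tags.
function extract_preview(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Return the "content" attribute of the first node matching the query, or "".
    $meta = function (string $query) use ($xpath): string {
        $nodes = $xpath->query($query);
        return $nodes->length ? trim($nodes->item(0)->getAttribute("content")) : "";
    };

    $title_nodes = $xpath->query("//title");
    $page_title = $title_nodes->length ? trim($title_nodes->item(0)->textContent) : "";

    return [
        // Open Graph values win; the plain tags are the fallback.
        "title" => $meta("//meta[@property='og:title']") ?: $page_title,
        "description" => $meta("//meta[@property='og:description']") ?: $meta("//meta[@name='description']"),
    ];
}

// Sample markup made up for this example.
$html = '<html><head><title>Plain title</title>'
      . '<meta property="og:title" content="OG title">'
      . '<meta name="description" content="Plain description">'
      . '</head><body></body></html>';

$preview = extract_preview($html);
echo $preview["title"], "\n";        // OG title
echo $preview["description"], "\n";  // Plain description
```

Here the page has an og:title but no og:description, so the title comes from the Open Graph tag and the description falls back to the regular meta tag.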

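The image path found in a page is not always absolute, so the script has to prefix it before it can be used in the preview. A slightly fuller treatment distinguishes four cases: absolute, protocol-relative, root-relative, and relative to the page's directory. The function below is a rough sketch of that idea. absolute_image_url() is a hypothetical name, example.com is a placeholder, and segments such as "../" are left untouched.

```php
<?php
// Hypothetical helper: turn an image "src" into an absolute URL using the page URL.
// Assumes $page_url is a valid absolute URL with a scheme and a host.
function absolute_image_url(string $src, string $page_url): string
{
    $page = parse_url($page_url);
    $port = isset($page["port"]) ? ":" . $page["port"] : "";
    $origin = $page["scheme"] . "://" . $page["host"] . $port;

    if (preg_match("/^https?:\/\//i", $src)) {
        return $src;                          // already absolute
    }
    if (strpos($src, "//") === 0) {
        return $page["scheme"] . ":" . $src;  // protocol-relative
    }
    if (strpos($src, "/") === 0) {
        return $origin . $src;                // root-relative
    }

    // Relative to the directory of the page ("/blog/post.html" gives "/blog/").
    $path = isset($page["path"]) ? $page["path"] : "/";
    $dir = substr($path, 0, strrpos($path, "/") + 1);
    return $origin . $dir . $src;
}

$page = "https://example.com/blog/post.html";
echo absolute_image_url("/img/logo.png", $page), "\n";            // https://example.com/img/logo.png
echo absolute_image_url("img/logo.png", $page), "\n";             // https://example.com/blog/img/logo.png
echo absolute_image_url("//cdn.example.com/a.png", $page), "\n";  // https://cdn.example.com/a.png
```
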
javascript.js

$(document).ready(function(){
    $(".url-form").on("submit",function(e){
        e.preventDefault();
        var url = $(".url-input").val();
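        $(".content-wrapper").hide();
        if(url != ''){
            $(".loader").fadeIn();
            $.ajax({
                url: "extract-contents.php",
                type: "POST",
                data:{
                    url: url
                },
                success: function(data){
                    $(".content-wrapper").html(data).slideDown();
                },
                error: function(){
                    //=== Show a message when the request itself fails
                    $(".content-wrapper").html('<div class="error">Request failed. Please try again.</div>').slideDown();
                },
                complete: function(){
                    //=== Hide the loader whether the request succeeded or failed
                    $(".loader").fadeOut();
                }
            });
        }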
    });
});

style.css

*{
    box-sizing: border-box;
}
html,body{
    margin: 0px;
    padding: 0px;
}
body{
    background: #f0f0f0;
    font: normal normal 14px "Open Sans", Verdana, Arial, sans-serif;
}
.main-container{
    max-width: 1024px;
    margin: 0px auto;
}
.extract-wrapper{
    margin-bottom: 20px;
}
.fields-container {
    position: relative;
    margin-bottom: 20px;
    display: flex;
}
.loader{
    position: absolute;
    font-size: 30px;
    background: rgba(150,150,150,0.5);
    width: 100%;
    height: 100%;
    z-index: 5;
    padding: 0px 10px;
    display: none;
    color: #006699;
    text-align: center;
}
.url-form button{
    background: teal;
    color: #fff;
    border: none;
    padding: 11px 15px;
    font-weight: bold;
    cursor: pointer;
}
input.form-control{
    border: 1px solid #ddd;
    padding: 10px;
    color: #444;
    font-size: 15px;
    width: 100%;
}
.url-info-box{
    background: #fefefe;
    border: 1px solid #fefefe;
    overflow: hidden;
    font-size: 13px;
    display: grid;
    align-items: stretch;
    grid-template-columns: auto auto;
}
.content-wrapper .error{
    padding: 10px;
    background: #e95454;
    color: #fff;
}
.img-responsive{
    height: auto;
    max-width: 100%;
    display: block;
}
.content-wrapper .data{
    padding: 15px;
}
.content-wrapper .title{
    font-weight: bold;
    max-height: 35px;
    overflow: hidden;
    color: #3778cd;
}