Extract URL Preview Content with PHP and jQuery

This post demonstrates how to fetch the meta title, description and image of a submitted URL, similar to the link previews shown on social media, and how to build a small link preview tool with jQuery, AJAX and PHP.

Sometimes you want to show users a rich preview of a URL so they can see some of its metadata before visiting it. In this post you will learn how to fetch a URL's title, description and image with PHP and jQuery. We will create the following files:

  • index.html: Contains HTML form that will allow us to submit a URL for extraction.
  • extract-contents.php: Contains the code to fetch required data from submitted URL.
  • javascript.js: Contains the code to send AJAX request to extract-contents.php.
  • style.css: Contains all the style formatting for our HTML page and URL data preview box.

Most of the work happens in extract-contents.php, which performs the following steps:

  • Prepare Regular Expression to Validate URL.
  • Validate the URL and fetch the URL content.
  • Open a new DOM document and load the fetched content into DOM.
  • Search the content for the title and description tags and for the first image.
  • Prepare the HTML preview container and return the response.

Prepare HTML Form and Link Preview Container

Prepare a web page containing an HTML form with an input field and a submit button, which serves as the user interface for entering the URL. Submitting the form triggers an AJAX request to the PHP script that extracts the metadata from the URL.

index.html

<!DOCTYPE html>
<html>
<head>
    <title>Extract URL Preview Content with PHP and jQuery - Demo</title>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
    <meta content="width=device-width, initial-scale=1, maximum-scale=1" name="viewport" />
    <link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
    <link rel="stylesheet" href="css/style.css" />
    <script type="text/javascript" src="js/jquery-3.1.1.min.js"></script>
    <script type="text/javascript" src="js/javascript.js"></script>
</head>
<body>
    <section class="section py-4">
        <div class="container">
            <div class="extract-wrapper">
                <label>Enter an absolute URL like https://www.codestacked.info</label>
                <form class="url-extract-form">
                    <div class="input-group">
                        <input type="url" class="form-control url-input" value="" required="required" placeholder="Enter a URL to extract contents" />
                        <button type="submit" class="btn btn-green">Extract</button>
                    </div>
                    <div class="loader">
                        <i class="fa fa-spinner fa-spin"></i>
                    </div>
                </form>
                <div class="content-wrapper" id="content-wrapper"></div>
            </div>
        </div>
    </section>
</body>
</html>

Send AJAX Request to PHP Script

This code block prevents the default form submission and instead sends the URL to the PHP script via jQuery AJAX. The HTML returned by the script is inserted into the preview container.

javascript.js

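$(document).ready(function () {
    $(".url-extract-form").on("submit", function (e) {
        // Stop the browser from submitting the form and reloading the page
        e.preventDefault();

        var url = $(".url-input").val().trim();
        $(".content-wrapper").hide();

        if (url !== "") {
            $(".loader").fadeIn();

            $.ajax({
                url: "extract-contents.php",
                type: "POST",
                data: { url: url },
                success: function (data) {
                    // The PHP script returns ready-made HTML for the preview
                    $(".content-wrapper").html(data).slideDown();
                },
                error: function () {
                    // Show a message when the request itself fails
                    $(".content-wrapper")
                        .html('<div class="error">Request failed. Please try again.</div>')
                        .slideDown();
                },
                complete: function () {
                    // Hide the loader whether the request succeeded or failed
                    $(".loader").fadeOut();
                }
            });
        }
    });
});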
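
The submit handler only checks that the field is not empty. As an optional extra, the value can be normalized in the browser before it is sent, so that obviously malformed input never reaches the server. The following is a minimal sketch and not part of the original script: normalizeUrl is a hypothetical helper name, and it assumes an environment that provides the standard URL constructor (modern browsers and Node.js). The PHP script still has to validate the URL on its own.

```javascript
// Hypothetical helper (not in the original post): returns an absolute
// http(s) URL string, or null when the input cannot be parsed as one.
function normalizeUrl(input) {
    var value = String(input || "").trim();
    if (value === "") {
        return null;
    }

    // Prepend https:// when the user left out the scheme
    if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(value)) {
        value = "https://" + value;
    }

    try {
        var parsed = new URL(value);
        // Only allow web URLs; reject ftp:, file: and so on
        if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
            return null;
        }
        return parsed.href;
    } catch (e) {
        // The URL constructor throws a TypeError on unparsable input
        return null;
    }
}
```

Inside the submit handler, this could replace the empty-string check: pass the input value to normalizeUrl() and only send the AJAX request when the result is not null.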
 

Extract URL Metadata for Link Preview in PHP

This PHP script validates the submitted URL, fetches its HTML content, and uses DOM and XPath to find the title, description and image metadata needed for the link preview. The fetched content is loaded into a DOMDocument object, and a DOMXPath object is then used to access elements of the loaded document with XPath queries. The script works through these steps:

  • Create a regular expression to validate the submitted URL.
  • If the URL is not valid, prepare and return an error response.
  • If the URL is valid, fetch its contents.
  • Initialize the title and description variables with empty values.
  • Prepare an array for images. If no Open Graph image is found on the page, the first image from the document is used instead.
  • Open a new DOM document and load the fetched content into it.
  • Loop through all images in the document and add them to the $images_arr array.
  • Get the title with an XPath query and assign it to $title. Look for the Open Graph meta title first; if it does not exist, use the content of the document's title tag.
  • Get the description with an XPath query and assign it to $description. Look for the Open Graph meta description first; if it does not exist, use the meta tag with name="description".
  • Get the image with an XPath query and assign it to $image. Look for the Open Graph meta image first; if it does not exist, use the first image from the $images_arr array.
  • Finally, return the response as a link preview in HTML format.
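
The fallback order in the steps above is the core of the extraction: Open Graph values win, standard tags come second, and the first image in the document is the last resort. Purely as an illustration of that selection logic, here is a small sketch written in JavaScript to match the client-side code in this post. The function name pickPreview and the shape of its found argument are hypothetical; the actual extraction happens in the PHP script.

```javascript
// Hypothetical illustration of the fallback order; "found" holds whatever
// values were located in the page, with missing ones left undefined or empty.
function pickPreview(found) {
    var images = found.images || [];
    return {
        // og:title first, then the <title> tag
        title: found.ogTitle || found.docTitle || "",
        // og:description first, then <meta name="description">
        description: found.ogDescription || found.metaDescription || "",
        // og:image first, then the first <img> in the document
        image: found.ogImage || (images.length ? images[0] : "")
    };
}
```

For a page that has no Open Graph title but does have an Open Graph description, the result takes the title from the title tag and the description from the Open Graph tag, so each field falls back independently of the others.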

extract-contents.php

<?php
if (!empty($_POST['url'])) {
    // Keep the original case: URL paths can be case-sensitive
    $url = trim($_POST['url']);

    // Default to https:// when no scheme was submitted
    $url = preg_match('/^(https?|ftp):\/\//i', $url) ? $url : 'https://' . $url;

    // Regular expression to validate the URL
    $regex = '/^((https?|ftp):\/\/)(www\.)?[\w\-]+\.[a-z]{2,4}\/?[\w\/\-]*(\.[a-z]{2,4})?$/i';

    // Check if the URL is valid
    if (preg_match($regex, $url)) {
        // Scheme and host of the submitted URL, used to resolve relative image paths
        $parts = parse_url($url);
        $origin = $parts['scheme'] . '://' . $parts['host'];

        // Get contents of the URL
        $content = @file_get_contents($url);

        // If fetching failed, show an error
        if (!$content) {
            die('<div class="error">Error parsing the submitted URL.</div>');
        }

        $title = $description = $image = "";
        $images_arr = [];

        // Open a new DOM document object
        $dom = new DOMDocument('1.0', 'UTF-8');

        // Load the URL content into the DOM document. Real-world HTML is rarely
        // valid, so collect parser warnings instead of printing them
        libxml_use_internal_errors(true);
        $dom->loadHTML($content);
        libxml_clear_errors();

        // Get images from the DOM document
        $images = $dom->getElementsByTagName('img');

        // Loop through images and push those with a usable src to the images array
        foreach ($images as $img) {
            $src = parse_url($img->getAttribute('src'));
            if (!empty($src['path'])) {
                $images_arr[] = $img->getAttribute('src');
            }
        }

        // Open an XPath object for the current DOM document
        $xpath = new DOMXPath($dom);
        $og_title = $xpath->query('//meta[@property="og:title"]');
        $og_description = $xpath->query('//meta[@property="og:description"]');
        $og_image = $xpath->query('//meta[@property="og:image"]');

        $meta_description = $xpath->query('//meta[@name="description"]');
        $meta_title = $xpath->query('//title');

        // Prepare title: Open Graph title first, then the <title> tag
        if ($og_title->length) {
            $title = $og_title->item(0)->getAttribute('content');
        } elseif ($meta_title->length) {
            $title = $meta_title->item(0)->textContent;
        }

        // Prepare description: Open Graph description first, then the meta description
        if ($og_description->length) {
            $description = $og_description->item(0)->getAttribute('content');
        } elseif ($meta_description->length) {
            $description = $meta_description->item(0)->getAttribute('content');
        }

        // Prepare image: Open Graph image first, then the first <img> in the document
        if ($og_image->length) {
            $image = $og_image->item(0)->getAttribute('content');
        } elseif (!empty($images_arr)) {
            $image = reset($images_arr);
        }

        $width = $height = null;
        if (!empty($image)) {
            // Make the image URL absolute
            if (preg_match('/^\/\//', $image)) {
                // Protocol-relative URL: reuse the scheme of the submitted URL
                $image = $parts['scheme'] . ':' . $image;
            } elseif (!preg_match('/^https?:\/\//i', $image)) {
                // Relative path: resolved against the site root, which is a simplification
                $image = $origin . '/' . ltrim($image, '/');
            }

            // getimagesize() returns false when the image cannot be read
            $size = @getimagesize($image);
            if ($size) {
                list($width, $height) = $size;
            }
        }
        ?>
<div class="url-info-box">
    <?php if (!empty($image)) { ?>
    <div class="image">
        <img src="<?= htmlspecialchars($image); ?>" class="img-responsive"<?php if ($width && $height) { ?> width="<?= (int) $width; ?>" height="<?= (int) $height; ?>"<?php } ?> alt=""/>
    </div>
    <?php } ?>
    <div class="data">
        <div class="title">
            <?= htmlspecialchars($title); ?>
        </div>
        <div class="description"><?= htmlspecialchars($description); ?></div>
    </div>
</div>
<?php
    } else {
        echo '<div class="error">Invalid URL submitted.</div>';
    }
}
 

Add CSS Styles for Form, Loader and Preview Container

Add CSS styles for the whole page, including the form input field, the loading spinner and the URL preview container.

style.css

* {
    box-sizing: border-box;
}
html,body {
    margin: 0;
    padding: 0;
}
body {
    background-color: #f6f6f6;
    font-family: "Segoe UI", "Roboto", "Helvetica", sans-serif;
    font-size: 15px;
    font-weight: normal;
    font-style: normal;
}
.py-4 {
    padding-top: 1rem;
    padding-bottom: 1rem;
}
.container {
    max-width: 1024px;
    margin: 0 auto;
    padding-left: 15px;
    padding-right: 15px;
}
.url-extract-form {
    position: relative;
    margin-bottom: 1rem;
}
.extract-wrapper label {
    display: inline-block;
    margin-bottom: 0.25rem;
}
.input-group {
    position: relative;
    display: flex;
    flex-wrap: wrap;
    align-items: stretch;
    width: 100%;
}
.form-control {
    border: 1px solid #ddd;
    padding: 10px;
    position: relative;
    font-size: inherit;
    flex: 1 1 auto;
    width: 1%;
    min-width: 0;
}
.form-control:focus {
    border-color: #00c0ef;
    outline: 0;
}
.loader {
    position: absolute;
    inset: 0;
    font-size: 1.75rem;
    background: rgba(150,150,150,0.5);
    z-index: 5;
    padding: 0px 10px;
    display: none;
    color: #006699;
    text-align: center;
}
.url-extract-form button {
    display: inline-block;
    padding: 5px 10px;
    cursor: pointer;
    font: inherit;
    background: #00a65a;
    border: 1px solid #009549;
    color: #fff;
    margin-left: -1px;
}
.content-wrapper .error {
    padding: 10px;
    background: #e95454;
    color: #fff;
}
.url-info-box {
    background: #fefefe;
    border: 1px solid #fefefe;
    overflow: hidden;
    font-size: 13px;
    max-width: 300px;
}
.img-responsive {
    max-width: 100%;
    height: auto;
    display: block;
    margin: 0 auto;
}
.url-info-box .data {
    padding: 15px;
    background: #efefef;
}
.url-info-box .title {
    font-weight: bold;
    max-height: 35px;
    overflow: hidden;
    color: #3778cd;
}