I'm working on a Website Directory project which needs to get and display the Title, Description and Keywords of any website, plus some information about its geographical location, such as Hostname, Country, Region, City, Postal Code, Latitude, Longitude, ISP and Organization, based on the website's IP address.
In a previous article, I shared how to get the geographical location from an IP address; you can read it at: http://4rapiddev.com/internet/free-online-tools-get-ip-address-location-organization-isp-hostname-country/. Today, I will show how to parse the Title, Description and Keywords from a website using a PHP script.
The main steps of the PHP script are:
- Take a URL as input
- Get HTML content of the URL by using file_get_contents
- Parse Title, Description and Keywords from the content by using preg_match and preg_match_all PHP functions
- Return an array which holds the Title and the meta tags (including Description and Keywords) as array items
PHP script
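Before the full script, the parsing step can be sketched on its own. This is a minimal illustration, not the script itself: the `$html` string is made-up sample data, and the patterns are simplified on the assumption that attributes are double-quoted and `name` comes before `content`.

```php
<?php
// Sample HTML used only for illustration; no network request is made.
$html = '<html><head><title>Example Page</title>'
      . '<meta name="description" content="A short description">'
      . '<meta name="keywords" content="php,regex">'
      . '</head><body></body></html>';

// preg_match stops at the first match: capture the text between the <title> tags.
$title = null;
if (preg_match('/<title>([^<]*)<\/title>/si', $html, $match)) {
    $title = trim($match[1]);
}

// preg_match_all collects every <meta name="..." content="..."> pair.
// Simplified pattern (assumption): double quotes, name before content.
$metaTags = [];
$pattern = '/<meta\s+name="([^"]*)"\s+content="([^"]*)"\s*\/?>/si';
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $metaTags[strtolower($m[1])] = $m[2];
    }
}

print_r(['title' => $title, 'metaTags' => $metaTags]);
```

`preg_match` is enough for the title because a page should have only one, while `preg_match_all` is needed for the meta tags because there are usually several.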
<?php
/**
 * Fetch a URL and return its title and meta tags,
 * or false if the page could not be retrieved.
 */
function getUrlData($url)
{
    $result = false;
    $contents = getUrlContents($url);

    if (isset($contents) && is_string($contents)) {
        $title = null;
        $metaTags = null;

        // Extract the page title.
        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) > 0) {
            $title = strip_tags($match[1]);
        }

        // First pass: meta tags written as name="..." content="...".
        preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . '[lang="]*[^>"]*["]*' . '[\s]*content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) == 3) {
            $originals = $match[0];
            $names = $match[1];
            $values = $match[2];
            if (count($originals) == count($names) && count($names) == count($values)) {
                $metaTags = array();
                for ($i = 0, $limiti = count($names); $i < $limiti; $i++) {
                    // Normalize the tag name so it can be used as an array key.
                    $metaname = strtolower($names[$i]);
                    $metaname = str_replace("'", '', $metaname);
                    $metaname = str_replace("/", '', $metaname);
                    $metaTags[$metaname] = array(
                        'html' => htmlentities($originals[$i]),
                        'value' => $values[$i]
                    );
                }
            }
        }

        // Second pass: if nothing was found, try content="..." name="...".
        // empty() also covers the case where $metaTags is still null.
        if (empty($metaTags)) {
            preg_match_all('/<[\s]*meta[\s]*content="?' . '([^>"]*)"?[\s]*' . '[lang="]*[^>"]*["]*' . '[\s]*name="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
            if (isset($match) && is_array($match) && count($match) == 3) {
                $originals = $match[0];
                $names = $match[2];
                $values = $match[1];
                if (count($originals) == count($names) && count($names) == count($values)) {
                    $metaTags = array();
                    for ($i = 0, $limiti = count($names); $i < $limiti; $i++) {
                        $metaname = strtolower($names[$i]);
                        $metaname = str_replace("'", '', $metaname);
                        $metaname = str_replace("/", '', $metaname);
                        $metaTags[$metaname] = array(
                            'html' => htmlentities($originals[$i]),
                            'value' => $values[$i]
                        );
                    }
                }
            }
        }

        $result = array(
            'title' => $title,
            'metaTags' => $metaTags
        );
    }

    return $result;
}

/**
 * Download the HTML of a URL, following <meta http-equiv="refresh">
 * redirects up to $maximumRedirections times (null means no limit).
 */
function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
    $result = false;
    $contents = file_get_contents($url);

    if (isset($contents) && is_string($contents)) {
        // Look for a meta refresh tag and capture its target URL.
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1) {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections) {
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }
            // Redirect limit reached: give up.
            $result = false;
        } else {
            $result = $contents;
        }
    }

    // false on download failure or when the redirect limit is exceeded.
    return $result;
}
?>
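The regular expressions in the script depend on attribute order and double quotes. As a possible alternative (not what the script above does), the same data could be read with PHP's built-in DOM extension. This is only a sketch: `parseHeadWithDom` is a hypothetical name, the `$html` string is made-up sample data, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper (not part of the original script): extracts the title
// and named meta tags from an HTML string using the DOM extension.
function parseHeadWithDom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $title = null;
    $titleNodes = $doc->getElementsByTagName('title');
    if ($titleNodes->length > 0) {
        $title = trim($titleNodes->item(0)->textContent);
    }

    $metaTags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // getAttribute() returns an empty string when the attribute is absent.
        $name = strtolower(trim($meta->getAttribute('name')));
        if ($name !== '') {
            $metaTags[$name] = $meta->getAttribute('content');
        }
    }

    return ['title' => $title, 'metaTags' => $metaTags];
}

// Sample input: single quotes and content-before-name are both handled.
$html = '<html><head><title>Example Page</title>'
      . "<meta content='A short description' name='Description'>"
      . '<meta name="keywords" content="php,dom"></head><body></body></html>';

$data = parseHeadWithDom($html);
echo $data['title'], "\n";
echo $data['metaTags']['description'], "\n";
echo $data['metaTags']['keywords'], "\n";
```

Because the parser handles quoting and attribute order, a single loop replaces the two regex passes, at the cost of requiring the DOM extension.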
Usage
<?php
$result = getUrlData("http://4rapiddev.com/php/php-parse-title-description-keywords-from-a-website/");

// Use a placeholder when a field is missing or empty. empty() also avoids
// warnings when getUrlData() returns false or a meta tag is absent.
$title = empty($result['title'])
    ? "No Data Available"
    : $result['title'];
$description = empty($result['metaTags']['description']['value'])
    ? "No Data Available"
    : $result['metaTags']['description']['value'];
$keywords = empty($result['metaTags']['keywords']['value'])
    ? "No Data Available"
    : $result['metaTags']['keywords']['value'];

echo "Title: " . $title . "<br>";
echo "Description: " . $description . "<br>";
echo "Keywords: " . $keywords . "<br>";
?>
Output
Title: PHP Parse Title Description Keywords From A Website | 4 Rapid Development
Description: I'm working on a Website Directory project which need to get and display Title, Description and Keywords of any website and some information regarding Geographical Location such as: Hostname, Country, Region, City, Postal Code, Latitude, Longitude, ISP, Organization via the website IP Address.
Keywords: file_get_contents,preg_match,preg_match_all,php
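The fallback to "No Data Available" is repeated for each field, so it could be folded into a small helper. The sketch below assumes the array shape returned by getUrlData; `metaValue` is a hypothetical name that is not part of the original script, and `$sample` is hand-written data rather than a real response.

```php
<?php
// Hypothetical helper: returns the value of a meta tag from the array shape
// produced by getUrlData(), or a default when it is missing or empty.
function metaValue($result, string $name, string $default = 'No Data Available'): string
{
    // ?? yields '' for any missing level without raising a warning.
    $value = is_array($result) ? ($result['metaTags'][$name]['value'] ?? '') : '';
    return $value !== '' ? $value : $default;
}

// Hand-written sample in the same shape as getUrlData()'s return value.
$sample = [
    'title' => 'Example Page',
    'metaTags' => [
        'description' => [
            'html' => '&lt;meta name=&quot;description&quot; content=&quot;A short description&quot;&gt;',
            'value' => 'A short description',
        ],
    ],
];

echo metaValue($sample, 'description'), "\n"; // tag present
echo metaValue($sample, 'keywords'), "\n";    // tag absent, default is used
echo metaValue(false, 'keywords'), "\n";      // failed fetch, default is used
```

Passing the whole result into the helper means a failed fetch (where getUrlData returns false) and a missing tag are handled in one place.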
Click here to download the source code.
(*) I copied the PHP script from somewhere, but I have completely forgotten where. Thanks and appreciation to whoever created this script.