
PHP Parse Title Description Keywords From A Website

I’m working on a Website Directory project that needs to fetch and display the Title, Description and Keywords of any website, along with geographical information derived from the website's IP address: Hostname, Country, Region, City, Postal Code, Latitude, Longitude, ISP and Organization.

In a previous article, I shared how I get the geographical location from an IP address; you can read more at: http://4rapiddev.com/internet/free-online-tools-get-ip-address-location-organization-isp-hostname-country/. Today, I will show how I parse the Title, Description and Keywords from a website using a PHP script.

The main ideas of the PHP script are:

  • Take a URL as input
  • Get HTML content of the URL by using file_get_contents
  • Parse Title, Description and Keywords from the content by using preg_match and preg_match_all PHP functions
  • Return an Array which includes Title, Description and Keywords as Array Items
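The parsing and return steps above can be tried in isolation before any network code is involved. The following is only a minimal sketch: the sample HTML string is made up, and the simplified patterns assume double-quoted attributes with `name` before `content`, which the full script below does not require.

```php
<?php
// Hypothetical sample page; a real run would fetch this with file_get_contents($url).
$html = '<html><head><title>Sample Page</title>'
      . '<meta name="description" content="A short summary.">'
      . '<meta name="keywords" content="php,regex">'
      . '</head><body></body></html>';

// Capture the text between <title> and </title>.
$title = null;
if (preg_match('/<title>([^<]*)<\/title>/si', $html, $m)) {
    $title = trim($m[1]);
}

// Capture every name/content pair from <meta name="..." content="..."> tags.
$meta = array();
if (preg_match_all('/<meta\s+name="([^"]*)"\s+content="([^"]*)"/si', $html, $m, PREG_SET_ORDER)) {
    foreach ($m as $pair) {
        $meta[strtolower($pair[1])] = $pair[2];
    }
}

// Return everything as one array, keyed by what was parsed.
$parsed = array('title' => $title, 'meta' => $meta);
print_r($parsed);
```

With `PREG_SET_ORDER`, each element of `$m` holds one complete match, so the name and its content stay paired without any index bookkeeping.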

PHP script

<?php
function getUrlData($url)
{
	$result = false;
	$contents = getUrlContents($url);
 
	if (isset($contents) && is_string($contents))
	{
		$title = null;
		$metaTags = null;
 
		// Capture the title text; [^<]* stops at the first tag, so the closing </title> ends the match.
		preg_match('/<title>([^<]*)<\/title>/si', $contents, $match );
 
		if (isset($match) && is_array($match) && count($match) > 0)
		{
			$title = strip_tags($match[1]);
		}
 
		preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' .'[lang="]*[^>"]*["]*'.'[\s]*content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
		if (isset($match) && is_array($match) && count($match) == 3)
		{
			$originals = $match[0];
			$names = $match[1];
			$values = $match[2];
 
			if (count($originals) == count($names) && count($names) == count($values))
			{
				$metaTags = array();
 
				for ($i=0, $limiti=count($names); $i < $limiti; $i++)
				{
					$metaname=strtolower($names[$i]);
					$metaname=str_replace("'",'',$metaname);
					$metaname=str_replace("/",'',$metaname);
					$metaTags[$metaname] = array (
					'html' => htmlentities($originals[$i]),
					'value' => $values[$i]
					);
				}
			}
		}
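		// Fall back to tags written with content before name. empty() also covers
		// the null case, where sizeof() throws a TypeError on PHP 8.
		if (empty($metaTags)) {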
			preg_match_all('/<[\s]*meta[\s]*content="?' . '([^>"]*)"?[\s]*' .'[lang="]*[^>"]*["]*'.'[\s]*name="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
 
			if (isset($match) && is_array($match) && count($match) == 3)
			{
				$originals = $match[0];
				$names = $match[2];
				$values = $match[1];
 
				if (count($originals) == count($names) && count($names) == count($values))
				{
					$metaTags = array();
 
					for ($i=0, $limiti=count($names); $i < $limiti; $i++)
					{
						$metaname=strtolower($names[$i]);
						$metaname=str_replace("'",'',$metaname);
						$metaname=str_replace("/",'',$metaname);
						$metaTags[$metaname] = array (
							'html' => htmlentities($originals[$i]),
							'value' => $values[$i]
						);
					}
				}
			}
		}
 
		$result = array (
			'title' => $title,
			'metaTags' => $metaTags
		);
	}
 
	return $result;
}
 
function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
	$result = false;
	$contents = file_get_contents($url);
 
	if (isset($contents) && is_string($contents))
	{
		preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
 
		if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
		{
			if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
			{
				return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
			}
 
			$result = false;
		}
		else
		{
			$result = $contents;
		}
	}
 
	// Return $result (not $contents) so that exceeding the redirection limit yields false.
	return $result;
}
?>
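Called with only a URL, `file_get_contents` falls back on PHP's default settings, which can leave a directory crawler waiting a long time on slow hosts. One possible refinement is to pass a stream context. The sketch below is an assumption, not part of the original script: the helper name `fetchHtml`, the ten-second timeout and the `DirectoryBot/0.1` user agent are all made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original script.
function fetchHtml($url, $timeoutSeconds = 10)
{
    $context = stream_context_create(array(
        'http' => array(
            'timeout'       => $timeoutSeconds,    // give up on slow hosts
            'user_agent'    => 'DirectoryBot/0.1', // made-up name; some servers may refuse requests without one
            'max_redirects' => 5,                  // cap HTTP 3xx redirects
        ),
    ));

    // @ suppresses the warning file_get_contents() raises on failure;
    // the false return value is used as the error signal instead.
    $html = @file_get_contents($url, false, $context);

    return is_string($html) ? $html : false;
}
```

A call such as `fetchHtml('http://example.com/')` returns the page body, or `false` when the request fails. Note that `max_redirects` only limits HTTP-level redirects; the meta refresh redirects that `getUrlContents` follows are a separate mechanism.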


Usage

<?php
 
	$result = getUrlData("http://4rapiddev.com/php/php-parse-title-description-keywords-from-a-website/");
 
	// getUrlData() returns false on failure and a meta tag may be absent;
	// empty() covers both cases without raising undefined-index notices.
	$title = !empty($result['title']) ? $result['title'] : "No Data Available";
	$description = !empty($result['metaTags']['description']['value'])
		? $result['metaTags']['description']['value']
		: "No Data Available";
	$keywords = !empty($result['metaTags']['keywords']['value'])
		? $result['metaTags']['keywords']['value']
		: "No Data Available";
 
	echo "Title: " . $title . "<br>";
	echo "Description: " . $description . "<br>";
	echo "Keywords: " . $keywords . "<br>";
?>


Output

Title: PHP Parse Title Description Keywords From A Website | 4 Rapid Development
Description: I'm working on a Website Directory project which need to get and display Title, Description and Keywords of any website and some information regarding Geographical Location such as: Hostname, Country, Region, City, Postal Code, Latitude, Longitude, ISP, Organization via the website IP Address.
Keywords: file_get_contents,preg_match,preg_match_all,php
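Because the title and meta values come from a third-party page, it may be worth escaping them before echoing them into your own HTML. The sketch below assumes a hypothetical helper named `displayValue`, which is not in the original script; it combines the "No Data Available" fallback with `htmlspecialchars`.

```php
<?php
// Hypothetical helper: returns a fallback for missing values and escapes the rest.
function displayValue($value, $fallback = 'No Data Available')
{
    if (!is_string($value) || trim($value) === '') {
        return $fallback;
    }
    // Convert &, <, > and quotes so the scraped text cannot inject markup.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo "Title: " . displayValue('Tips & <b>Tricks</b>') . "<br>\n";
echo "Keywords: " . displayValue(null) . "<br>\n";
```

The first line prints the title with its markup neutralised, and the second prints the fallback text because no keywords value was supplied.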


Click here to download the source code.

(*) I copied this PHP script from somewhere, but I have completely forgotten where. Thank you to whoever created it; I appreciate it.

Apr 17, 2011 by Hoan Huynh
Hoan Huynh is the founder and head of 4rapiddev.com.