PHP Classes

It looks like each function refetches the URL.

Subject:It looks like each function...
Summary:Package rating comment
Messages:16
Author:Patrice
Date:2007-11-16 14:04:24
Update:2007-12-28 08:03:05
 

Patrice rated this package as follows:

Utility: Not sure
Consistency: Not sure
Examples: Sufficient

  1. It looks like each function...
Patrice - 2007-11-16 14:04:24
It looks like each function refetches the URL. I guess it can be optimized.

  2. Re: It looks like each function...
Manel Zaera - 2007-11-16 22:13:08 - In reply to message 1 from Patrice
Yes, perhaps it would be better to read the first element of the document and then process it. I'll create some private methods to check the XML, so that the class fetches the URL only when isFeed, isRss1, isRss2 or isAtom is called directly.
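
As a minimal sketch of the approach described above (parse the document once, then run every check against the same root element), the following standalone function classifies an XML string that has already been downloaded. The function name classifyFeedXml and its string return values are assumptions made for illustration; they are not part of FeedFinder.

```php
<?php
// Hypothetical helper, not part of FeedFinder: classify XML that was fetched earlier.
function classifyFeedXml(string $xml): string
{
    if ($xml === '') {
        return 'none';
    }
    $reader = new XMLReader();
    // Load from a string; parser errors and warnings are silenced via libxml flags.
    if (!$reader->XML($xml, null, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return 'none';
    }
    $type = 'none';
    // Advance past comments, doctype and whitespace until the first element.
    while (@$reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name === 'rss') {
            $type = 'rss2';
        } elseif ($reader->name === 'rdf:RDF'
            && $reader->getAttribute('xmlns') === 'http://purl.org/rss/1.0/') {
            $type = 'rss1';
        } elseif ($reader->localName === 'feed'
            && $reader->namespaceURI === 'http://www.w3.org/2005/Atom') {
            $type = 'atom';
        }
        break; // Only the root element is inspected.
    }
    $reader->close();
    return $type;
}
```

A caller could fetch the URL once, for example with file_get_contents, and pass the result to this function, so that testing for RSS 1.0, RSS 2.0 and Atom costs a single request.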

  3. Re: It looks like each function...
Manel Zaera - 2007-11-16 23:08:22 - In reply to message 1 from Patrice
Ok, I updated the class.

  4. Re: It looks like each function...
Patrice - 2007-11-17 21:04:31 - In reply to message 3 from Manel Zaera
Thanks Manel. It's much better now.

Could you also save the feed type in the array? The array would contain array['url'] and array['type'].

  5. Re: It looks like each function...
Manel Zaera - 2007-11-20 08:48:25 - In reply to message 4 from Patrice
Ok, I'll post it this week, I hope.

  6. Re: It looks like each function...
Patrice - 2007-11-20 10:44:57 - In reply to message 5 from Manel Zaera
Hi Manel,

Here are the changes I made in discoverFeeds:

if ($aRel == 'alternate' && ($aType == self::MIME_RSS || $aType == self::MIME_ATOM))
{
    $new_item['rel'] = $aRel;
    $new_item['type'] = $aType;
    $new_item['title'] = $aTitle;
    $new_item['href'] = $aHref;
    $aFeeds[] = $new_item;
}

Please note the bug fix: the condition must use "$aRel == 'alternate'", not "$aRel = 'alternate'".
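
To illustrate why the single "=" matters, here is a small standalone sketch. Because && binds more tightly than =, the buggy condition assigns the result of the type test to $aRel and never compares the rel attribute at all. The function names matchesBuggy and matchesFixed and the top-level constants are made up for this example; the constant values mirror the ones in the class.

```php
<?php
const MIME_RSS = 'application/rss+xml';
const MIME_ATOM = 'application/atom+xml';

// Buggy form: parsed as $aRel = ('alternate' && (...)), so rel is overwritten, not tested.
function matchesBuggy($aRel, $aType)
{
    if ($aRel = 'alternate' && ($aType == MIME_RSS || $aType == MIME_ATOM)) {
        return true;
    }
    return false;
}

// Fixed form: rel is compared with ==, so both conditions must hold.
function matchesFixed($aRel, $aType)
{
    if ($aRel == 'alternate' && ($aType == MIME_RSS || $aType == MIME_ATOM)) {
        return true;
    }
    return false;
}

// A link with the wrong rel but a feed MIME type slips through the buggy check.
var_dump(matchesBuggy('stylesheet', MIME_RSS)); // bool(true)
var_dump(matchesFixed('stylesheet', MIME_RSS)); // bool(false)
```

In practice the buggy version accepts any link element whose type attribute is a feed MIME type, whatever its rel value is.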

  7. Re: It looks like each function...
Patrice - 2007-11-20 10:50:25 - In reply to message 6 from Patrice
The complete function:

private function discoverFeeds($aUrl) {
    $aFeeds = array();
    try {
        $aDocument = new DOMDocument();
        $aDocument->loadHTMLFile($aUrl);
        $aDocument->normalize();
        $aElements = $aDocument->getElementsByTagName('link');
        foreach ($aElements as $aElement) {
            $aRel = $aElement->getAttribute('rel');
            $aType = $aElement->getAttribute('type');
            $aTitle = $aElement->getAttribute('title');
            $aHref = $aElement->getAttribute('href');
            if ($aRel == 'alternate' && ($aType == self::MIME_RSS || $aType == self::MIME_ATOM)) {
                $new_item['rel'] = $aRel;
                $new_item['type'] = $aType;
                $new_item['title'] = $aTitle;
                $new_item['href'] = $aHref;
                $aFeeds[] = $new_item;
            }
        }
    } catch (Exception $aEx) {
        // None
    }
    return $aFeeds;
}

  8. Re: It looks like each function...
Manel Zaera - 2007-11-20 23:24:49 - In reply to message 7 from Patrice
I've made more changes. Tomorrow I'll upload the class.

<?php
/*
* FeedFinder
* Author: Manel Zaera (manelzaera@gmail.com)
* Description: Singleton class to find syndication feeds in
* a website.
* Creation date: 2007-11-06
*
* Modified by:
* 2007-11-16 - Manel Zaera (manelzaera@gmail.com) - Reduce number of URL fetches
* 2007-11-21 - Manel Zaera (manelzaera@gmail.com) - Return array of feeds indicating the type of each feed
*
* This work is published under the GPL license (http://www.gnu.org/copyleft/gpl.html)
*
*/
class FeedFinder {
private static $sInstance = null;
const MIME_RSS = 'application/rss+xml';
const MIME_ATOM = 'application/atom+xml';
const XMLNS_RSS1 = 'http://purl.org/rss/1.0/';
const XMLNS_ATOM = 'http://www.w3.org/2005/Atom';

// Feed array element constants
const FEED_FIELD_TYPE = 'type';
const FEED_FIELD_URL = 'url';

// Feed type constants
const FEED_TYPE_NONE = 0;
const FEED_TYPE_RSS1 = 1;
const FEED_TYPE_RSS2 = 2;
const FEED_TYPE_ATOM = 3;

private function __construct() {

}

/**
* Gets the unique class instance
*/
public static function getInstance() {
if (self::$sInstance == null) {
self::$sInstance = new FeedFinder();
}
return self::$sInstance;
}

/**
* Get the feeds discovered from a URL
* @param $aUrl URL that can be a feed
* or that contains one or more
* feed references
*
* @return Array of found feeds, empty if none. The array elements are pairs of 'url' and 'type' data.
*/
public function getFeeds($aUrl) {
$aaFeeds = array();
$aType = $this->typeOf($aUrl);
if ($aType!=self::FEED_TYPE_NONE) {
$aFeed = array(self::FEED_FIELD_TYPE=>$aType, self::FEED_FIELD_URL=>$aUrl);
$aaFeeds[] = $aFeed;
} else {
// Not a feed URL -> find feeds in document
$aaFeeds = $this->discoverFeeds($aUrl);
}
return $aaFeeds;
}

/**
* Check if a URL is a feed URL
* @param $aUrl URL to analyze
*/
public function isFeed($aUrl) {
try {
$aDoc = $this->prepareXmlReader($aUrl);
$zIsFeed = ($this->isRssDoc($aDoc) || $this->isAtomDoc($aDoc));
} catch (Exception $aEx) {
$zIsFeed = false;
}
return $zIsFeed;
}

/**
* Check if a URL is RSS 1.0 or RSS 2.0 feed
* @param $aUrl URL to analyze
*/
public function isRss($aUrl) {
try {
$aDoc = $this->prepareXmlReader($aUrl);
$zIsRss = $this->isRssDoc($aDoc);
} catch (Exception $aEx) {
$zIsRss = false;
}
return $zIsRss;
}

/**
* Check if a URL is a RSS 1.0 feed
* @param $aUrl URL to analyze
*/
public function isRss1($aUrl) {
try {
$aDoc = $this->prepareXmlReader($aUrl);
$zIsRss1 = $this->isRss1Doc($aDoc);
} catch (Exception $aEx) {
$zIsRss1 = false;
}
return $zIsRss1;
}

/**
* Check if a URL is a RSS 2.0 feed
* @param $aUrl URL to analyze
*/
public function isRss2($aUrl) {
try {
$aDoc = $this->prepareXmlReader($aUrl);
$zIsRss2 = $this->isRss2Doc($aDoc);
} catch (Exception $aEx) {
$zIsRss2 = false;
}
return $zIsRss2;
}

/**
* Check if a URL is an Atom feed
* @param $aUrl URL to analyze
*/
public function isAtom($aUrl) {
try {
$aDoc = $this->prepareXmlReader($aUrl);
$zIsAtom = $this->isAtomDoc($aDoc);
} catch (Exception $aEx) {
$zIsAtom = false;
}
return $zIsAtom;
}

/*
* Look for feeds within a document
* @param $aUrl URL of document
*
* @return Array of feed URLs
*/
private function discoverFeeds($aUrl) {
$aFeeds = array();
try {
$aDocument = new DOMDocument();
$aDocument->loadHTMLFile($aUrl);
$aDocument->normalize();
$aElements = $aDocument->getElementsByTagName('link');
foreach ($aElements as $aElement) {
$aRel = $aElement->getAttribute('rel');
$aAttrType = $aElement->getAttribute('type');
$aHref = $aElement->getAttribute('href');
if ($aRel = 'alternate' && ($aAttrType == self::MIME_RSS || $aAttrType == self::MIME_ATOM)) {
$aType = $this->typeOf($aHref);
if ($aType != self::FEED_TYPE_NONE) {
$aFeed = array(self::FEED_FIELD_TYPE=>$aType, self::FEED_FIELD_URL=>$aHref);
$aFeeds[] = $aFeed;
}
}
}
} catch (Exception $aEx) {
// None
}
return $aFeeds;
}

/**
* Get the type of feed for a Url
* @param $aUrl URL to check
*
* @return Number One of the constant values in this class related to feed types
*/
public function typeOf($aUrl) {
try {
$aDoc = $this->prepareXmlReader($aUrl);
$aType = $this->typeOfDoc($aDoc);
} catch (Exception $aEx) {
$aType = self::FEED_TYPE_NONE;
}
return $aType;
}

/*
* Get the type of a loaded document
* @param $aDoc Loaded XML document, positioned at first element
*/
private function typeOfDoc($aDoc) {
$aType = self::FEED_TYPE_NONE;
if ($this->isRss1Doc($aDoc)) {
$aType = self::FEED_TYPE_RSS1;
} elseif ($this->isRss2Doc($aDoc)) {
$aType = self::FEED_TYPE_RSS2;
} elseif ($this->isAtomDoc($aDoc)) {
$aType = self::FEED_TYPE_ATOM;
}
return $aType;
}

/*
* Check if a loaded document is RSS 1.0 or RSS 2.0 feed
* @param $aDoc Loaded XML document, positioned at first element
*/
private function isRssDoc($aDoc) {
return ($this->isRss2Doc($aDoc) || $this->isRss1Doc($aDoc));
}

/*
* Check if a loaded document is a RSS 2.0 feed
* @param $aDoc Loaded XML document, positioned at first element
*/
private function isRss2Doc($aDoc) {
return ($aDoc->name == 'rss');
}

/*
* Check if a loaded document is a RSS 1.0 feed
* @param $aDoc Loaded XML document, positioned at first element
*/
private function isRss1Doc($aDoc) {
return ($aDoc->name == 'rdf:RDF' && $aDoc->getAttribute('xmlns') == self::XMLNS_RSS1);
}

/*
* Check if a loaded document is an Atom feed
* @param $aDoc Loaded XML document, positioned at first element
*/
private function isAtomDoc($aDoc) {
return ($aDoc->name == 'feed' && $aDoc->namespaceURI == self::XMLNS_ATOM);
}

/*
* Get an XMLReader object from a URL and position it at the first element
* of the document
* @param $aUrl URL to get the XMLReader object from
*/
private function prepareXmlReader($aUrl) {
$aDoc = new XMLReader();
$aDoc->open($aUrl);
$aDoc->read();
return $aDoc;
}
}
?>

  9. Re: It looks like each function...
Patrice - 2007-11-21 09:46:44 - In reply to message 8 from Manel Zaera
Hi Manel,

I believe there is a bug in discoverFeeds.
if ($aRel = 'alternate' && ($aAttrType == self::MIME_RSS || $aAttrType == self::MIME_ATOM))

It should be:
$aRel == 'alternate' and not $aRel = 'alternate'

Why not also keep the title attribute ($aElement->getAttribute('title'))?

Thanks for your nice class!



  10. Re: It looks like each function...
Manel Zaera - 2007-11-24 00:15:59 - In reply to message 9 from Patrice
Hi,

I updated the class. Now you have the feed title in the returned array elements.

Manel
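
The updated class itself is not shown in this thread. As a rough sketch of what feed discovery with titles can look like, the standalone function below parses an HTML string and returns one entry per feed link. The function name findFeedLinks and the array keys 'url', 'type' and 'title' are assumptions for illustration, and 'type' here holds the MIME type from the link element rather than the numeric feed type constants used by FeedFinder.

```php
<?php
// Hypothetical standalone function, not the updated FeedFinder code.
function findFeedLinks(string $html): array
{
    $feedTypes = ['application/rss+xml', 'application/atom+xml'];
    $feeds = [];

    // Real-world HTML is rarely valid, so collect libxml errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $html !== '' && $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $feeds;
    }

    foreach ($doc->getElementsByTagName('link') as $link) {
        $rel = strtolower($link->getAttribute('rel'));
        $type = strtolower($link->getAttribute('type'));
        // Note the comparison operator: rel must really be "alternate".
        if ($rel === 'alternate' && in_array($type, $feedTypes, true)) {
            $feeds[] = [
                'url' => $link->getAttribute('href'),
                'type' => $type,
                'title' => $link->getAttribute('title'), // empty string when absent
            ];
        }
    }
    return $feeds;
}
```

Working on a string instead of a URL keeps the sketch independent of the network; a caller would download the page first and pass its contents in.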

 