dahui Posted September 13, 2005

Hi, I have Google XML Sitemap v1.3 and the cronjob works nicely. Is there any way to have other pages indexed automatically as well? I know I could add them manually, but I would rather have that done automatically and submitted by the cronjob too. Thanks for your 2 cents.

dahui
dahui Posted September 13, 2005

My PHP skills are too limited. I am starting to be able to read and understand code, but that is about it. Basically I would approach it like this. For example, this entry

    <url>
      <loc>http://elflein-kosmetik.de/chiral-a-anti-a...trate-p-31.html</loc>
      <lastmod>2005-09-13</lastmod>
      <changefreq>weekly</changefreq>
      <priority>1.0</priority>
    </url>

is part of sitemapproducts.xml, which is generated by this code in sitemap.class.php:

    function GenerateProductSitemap(){
        $sql = "SELECT products_id as pID, products_date_added as date_added, products_last_modified as last_mod, products_ordered FROM " . TABLE_PRODUCTS . " WHERE products_status='1' ORDER BY products_ordered DESC";
        if ( $products_query = $this->DB->Query($sql) ){
            $this->debug['QUERY']['PRODUCTS']['STATUS'] = 'success';
            $this->debug['QUERY']['PRODUCTS']['NUM_ROWS'] = $this->DB->NumRows($products_query);
            $container = array();
            $number = 0;
            $top = 0;
            while( $result = $this->DB->FetchArray($products_query) ){
                $top = max($top, $result['products_ordered']);
                $location = $this->hrefLink(FILENAME_PRODUCT_INFO, 'products_id=' . $result['pID'], 'NONSSL', false);
                $lastmod = $this->NotNull($result['last_mod']) ? $result['last_mod'] : $result['date_added'];
                $changefreq = GOOGLE_SITEMAP_PROD_CHANGE_FREQ;
                $ratio = $top > 0 ? $result['products_ordered']/$top : 0;
                $priority = $ratio < .1 ? .1 : number_format($ratio, 1, '.', '');
                $container[] = array('loc' => htmlspecialchars(utf8_encode($location)),
                                     'lastmod' => date ("Y-m-d", strtotime($lastmod)),
                                     'changefreq' => $changefreq,
                                     'priority' => $priority
                                    );
                # a sitemap file may hold at most 50000 URLs, so start a new one
                if ( sizeof($container) >= 50000 ){
                    $type = $number == 0 ? 'products' : 'products' . $number;
                    $this->GenerateSitemap($container, $type);
                    $container = array();
                    $number++;
                }
            } # end while
            $this->DB->Free($products_query);
            # was "> 1", which silently dropped a final batch holding a single entry
            if ( sizeof($container) > 0 ) {
                $type = $number == 0 ? 'products' : 'products' . $number;
                return $this->GenerateSitemap($container, $type);
            } # end if
        } else {
            $this->debug['QUERY']['PRODUCTS']['STATUS'] = 'false';
            $this->debug['QUERY']['PRODUCTS']['NUM_ROWS'] = '0';
        }
    } # end function

    /**
     * Function to generate category sitemap data
     * @author Bobby Easland
     * @version 1.1
     * @return boolean
     */
    function GenerateCategorySitemap(){
        $sql = "SELECT categories_id as cID, date_added, last_modified as last_mod FROM " . TABLE_CATEGORIES . " ORDER BY parent_id ASC, sort_order ASC, categories_id ASC";
        if ( $categories_query = $this->DB->Query($sql) ){
            $this->debug['QUERY']['CATEOGRY']['STATUS'] = 'success';
            $this->debug['QUERY']['CATEOGRY']['NUM_ROWS'] = $this->DB->NumRows($categories_query);
            $container = array();
            $number = 0;
            while( $result = $this->DB->FetchArray($categories_query) ){
                $location = $this->hrefLink(FILENAME_DEFAULT, 'cPath=' . $this->GetFullcPath($result['cID']), 'NONSSL', false);
                $lastmod = $this->NotNull($result['last_mod']) ? $result['last_mod'] : $result['date_added'];
                $changefreq = GOOGLE_SITEMAP_CAT_CHANGE_FREQ;
                $priority = .5;
                $container[] = array('loc' => htmlspecialchars(utf8_encode($location)),
                                     'lastmod' => date ("Y-m-d", strtotime($lastmod)),
                                     'changefreq' => $changefreq,
                                     'priority' => $priority
                                    );
                if ( sizeof($container) >= 50000 ){
                    $type = $number == 0 ? 'categories' : 'categories' . $number;
                    $this->GenerateSitemap($container, $type);
                    $container = array();
                    $number++;
                }
            } # end while
            $this->DB->Free($categories_query);
            # was "> 1", which silently dropped a final batch holding a single entry
            if ( sizeof($container) > 0 ) {
                $type = $number == 0 ? 'categories' : 'categories' . $number;
                return $this->GenerateSitemap($container, $type);
            } # end if
        } else {
            $this->debug['QUERY']['CATEOGRY']['STATUS'] = 'false';
            $this->debug['QUERY']['CATEOGRY']['NUM_ROWS'] = '0';
        }
    } # end function

After the while loop over all products or categories, I would like to add some code that appends my 'other pages to be indexed' to the end of the XML file being created, e.g.

    <url>
      <loc>http://elflein-kosmetik.de/information.php</loc>
      <lastmod>2005-09-13</lastmod>
      <changefreq>weekly</changefreq>
      <priority>1.0</priority>
    </url>

The idea is to maintain that 'other pages' code by hand in sitemap.class.php, adding or deleting entries when files change or are added, and to let the cronjob submit those 'other pages' automatically as well. I know that arrays can be extended (with $arrayname[] = ..., whereas .= only appends to strings), and that maybe this line

    return $this->GenerateSitemap($container, $type);

is where to start, but it is too heavy for me. Any input is highly appreciated.

dahui
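One possible way to approach the idea in the post above, as a sketch under assumptions rather than a tested patch for the contribution: build entries for the extra pages in the same array shape the class uses for its $container rows (loc, lastmod, changefreq, priority). The helper name buildOtherPageEntries() and all of its parameters are hypothetical; lastmod is taken from each file's modification time instead of being typed in by hand.

```php
<?php
/**
 * Hypothetical helper (not part of the contribution): builds sitemap entries
 * for static pages in the same shape as the rows stored in $container.
 *
 * $pages   file names relative to the shop root, e.g. 'information.php'
 * $baseUrl shop base URL with trailing slash
 * $docRoot filesystem path of the shop root with trailing slash
 */
function buildOtherPageEntries($pages, $baseUrl, $docRoot, $changefreq = 'weekly', $priority = '0.5')
{
    $entries = array();
    foreach ($pages as $page) {
        $file = $docRoot . $page;
        if (!is_file($file)) {
            continue; // skip pages that no longer exist on disk
        }
        $entries[] = array(
            'loc'        => htmlspecialchars($baseUrl . $page),
            'lastmod'    => date('Y-m-d', filemtime($file)), // file modification date
            'changefreq' => $changefreq,
            'priority'   => $priority,
        );
    }
    return $entries;
}

// Demo in a throwaway directory; domain and file names are placeholders.
$root = sys_get_temp_dir() . '/sitemap_demo_' . uniqid() . '/';
mkdir($root);
touch($root . 'information.php');

$otherEntries = buildOtherPageEntries(
    array('information.php', 'missing.php'),
    'http://mydomain.com/',
    $root
);
print_r($otherEntries);
```

Inside GenerateProductSitemap() the result could presumably be merged with `$container = array_merge($container, $otherEntries);` just before the final GenerateSitemap() call; that placement is an assumption and has not been checked against the contribution.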
dahui Posted September 13, 2005

Does nobody have an idea where and how to insert something like the following code into sitemap.class.php, so that it gets appended to either sitemapproducts.xml or sitemapcategories.xml?

    $otherpages  = '<url>' . "\n" .
                   '  <loc>http://elflein-kosmetik.de/information.php</loc>' . "\n" .
                   '  <lastmod>2005-09-13</lastmod>' . "\n" .
                   '  <changefreq>weekly</changefreq>' . "\n" .
                   '  <priority>1.0</priority>' . "\n" .
                   '</url>' . "\n";
    $otherpages .= '<url>' . "\n" .
                   '  <loc>http://elflein-kosmetik.de/impressum.php</loc>' . "\n" .
                   '  <lastmod>2005-09-13</lastmod>' . "\n" .
                   '  <changefreq>weekly</changefreq>' . "\n" .
                   '  <priority>1.0</priority>' . "\n" .
                   '</url>' . "\n";

dahui
dahui Posted September 13, 2005

I seem to be the only one on this, so either

i) I made a mistake and posted this in the wrong board,
ii) I am completely wrong, not in my coding ideas but in my intention in general, or
iii) this is already solved some other way. If so, please point me in the right direction.

I tried to work out something that might do the job. I manually created a file named sitemapothers.xml and placed it in the same directory as

    sitemapindex.xml
    sitemapcategories.xml
    sitemapproducts.xml

which in my case is the root of mydomain.com. The file has to be maintained manually for the moment and looks e.g. like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
      <url>
        <loc>http://mydomain.com/anysite_to_be_indexed_1.php</loc>
        <lastmod>2005-08-07</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
      <url>
        <loc>http://mydomain.com/anysite_to_be_indexed_2.php</loc>
        <lastmod>2005-08-07</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
      <url>
        <loc>http://mydomain.com/anysite_to_be_indexed_3.php</loc>
        <lastmod>2005-08-07</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
    </urlset>

sitemapindex.xml (mentioned above) is created dynamically in sitemap.class.php. It is the file the cronjob submits, and it in turn points to

    sitemapothers.xml
    sitemapcategories.xml
    sitemapproducts.xml

so that the content of those files gets indexed. I therefore modified sitemap.class.php as follows (very basic code, bear with me please, any assistance here is highly appreciated):

    /**
     * Function to generate sitemap index file
     * @author Bobby Easland
     * @version 1.1
     * @return boolean
     */
    function GenerateSitemapIndex(){
        $content = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
        $content .= '<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
        $pattern = defined('GOOGLE_SITEMAP_COMPRESS')
            ? (GOOGLE_SITEMAP_COMPRESS == 'true' ? "{sitemap*.xml.gz}" : "{sitemap*.xml}")
            : "{sitemap*.xml}";
        foreach ( glob($this->savepath . $pattern, GLOB_BRACE) as $filename ) {
            # skip the index file itself (originally eregi(), which was removed in PHP 7)
            if ( stripos($filename, 'index') !== false ) continue;
            $content .= "\t" . '<sitemap>' . "\n";
            $content .= "\t\t" . '<loc>'.$this->base_url . basename($filename).'</loc>' . "\n";
            $content .= "\t\t" . '<lastmod>'.date ("Y-m-d", filemtime($filename)).'</lastmod>' . "\n";
            $content .= "\t" . '</sitemap>' . "\n";
        } # end foreach
        # added by dahui: hard-coded entry for sitemapothers.xml
        $content .= "\t" . '<sitemap>' . "\n";
        $content .= "\t\t" . '<loc>'.$this->base_url.'sitemapothers.xml</loc>' . "\n";
        $content .= "\t\t" . '<lastmod>2005-09-01</lastmod>' . "\n";
        $content .= "\t" . '</sitemap>' . "\n";
        $content .= '</sitemapindex>';
        return $this->SaveFile($content, 'index');
    } # end function

I am still working on the hard-coded date in this line:

    $content .= "\t\t" . '<lastmod>2005-09-01</lastmod>' . "\n";

The resulting sitemapindex.xml, BEFORE:

    <sitemapindex>
      <sitemap>
        <loc>http://mydomain.com/sitemapcategories.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://mydomain.com/sitemapproducts.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://mydomain.com/sitemapothers.xml</loc>
        <lastmod>2005-09-01</lastmod>
      </sitemap>
    </sitemapindex>

AFTER:

    <sitemapindex>
      <sitemap>
        <loc>http://mydomain.com/sitemapcategories.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://mydomain.com/sitemapothers.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://mydomain.com/sitemapproducts.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://mydomain.com/sitemapothers.xml</loc>
        <lastmod>2005-09-01</lastmod>
      </sitemap>
    </sitemapindex>

So what should happen now? The cronjob submits sitemapindex.xml, and instead of two files, three (now including sitemapothers.xml) should be passed on to Google.

I would really appreciate it if someone more senior could have a look at this and tell me whether it will work or not. I tested it and so far I see no errors, but who am I :P

dahui
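The duplicate sitemapothers.xml entry in the AFTER listing is consistent with how the glob() pattern in GenerateSitemapIndex() behaves: sitemap*.xml already matches a file named sitemapothers.xml once it exists in the save path, so the hard-coded block lists it a second time. A minimal sketch that reproduces the matching in a temporary directory follows. The helper listSitemapFiles() and the demo directory are made up for illustration, the pattern is simplified (no GLOB_BRACE, no .gz variant), and the 'index' check is applied to the base name only so that a directory name containing "index" cannot interfere with the demo.

```php
<?php
// Hypothetical helper mirroring the loop in GenerateSitemapIndex():
// collect every sitemap*.xml in $savepath except the index file itself.
function listSitemapFiles($savepath)
{
    $names = array();
    $matches = glob($savepath . 'sitemap*.xml');
    if ($matches === false) {
        return $names; // glob() signals an error with false
    }
    foreach ($matches as $filename) {
        if (stripos(basename($filename), 'index') !== false) {
            continue; // the index must not list itself
        }
        $names[] = basename($filename);
    }
    sort($names);
    return $names;
}

// Demo: create empty placeholder files in a throwaway directory.
$dir = sys_get_temp_dir() . '/sitemap_glob_' . uniqid() . '/';
mkdir($dir);
$files = array('sitemapindex.xml', 'sitemapcategories.xml', 'sitemapproducts.xml', 'sitemapothers.xml');
foreach ($files as $f) {
    touch($dir . $f);
}

$found = listSitemapFiles($dir);
print_r($found);
```

If the contribution behaves like this sketch, the hand-written file is picked up without any change to GenerateSitemapIndex(), and its lastmod comes from filemtime() like the other entries.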
Guest Posted September 14, 2005

Basically I think many people will appreciate what you have done but don't have a clue how to help. Certainly, if this works I would use the code! I will try to test it today on my test server.
dahui Posted September 14, 2005

Wait, I will post an update.
dahui Posted September 14, 2005

After going back and forth I found out that it works fine just by doing this: manually create a file named sitemapothers.xml and place it in the same directory as

    sitemapindex.xml
    sitemapcategories.xml
    sitemapproducts.xml

No change to sitemap.class.php is needed. The file has to be maintained manually and should look e.g. like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
      <url>
        <loc>http://mydomain.com/anysite_to_be_indexed_1.php</loc>
        <lastmod>2005-08-07</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
      <url>
        <loc>http://mydomain.com/anysite_to_be_indexed_2.php</loc>
        <lastmod>2005-08-07</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
      <url>
        <loc>http://mydomain.com/anysite_to_be_indexed_3.php</loc>
        <lastmod>2005-08-07</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
    </urlset>

To try it out, open your catalog/sitemapindex.xml in the browser. By default the content should look like this:

    <sitemapindex>
      <sitemap>
        <loc>http://yrdomain.com/sitemapcategories.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://yrdomain.com/sitemapproducts.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
    </sitemapindex>

Now simply run catalog/googlesitemap/index.php in your browser and then open catalog/sitemapindex.xml again. It should now look like this:

    <sitemapindex>
      <sitemap>
        <loc>http://yrdomain.com/sitemapcategories.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://yrdomain.com/sitemapothers.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://yrdomain.com/sitemapproducts.xml</loc>
        <lastmod>2005-09-14</lastmod>
      </sitemap>
    </sitemapindex>

ERGO: all you have to do is create sitemapothers.xml and maintain it manually. If you have set up a cronjob, nothing else is needed. If you submit sitemapindex.xml to Google manually, make sure that you have called catalog/googlesitemap/index.php in your browser at least once, so that the new sitemapothers.xml is included in sitemapindex.xml. It can be as easy as that.

dahui
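The manual maintenance step could in principle be scripted. The following is only a sketch, assuming a hand-kept list of entries: renderUrlset() is a hypothetical function that is not part of the contribution, the URLs are placeholders, and the namespace is the 0.84 one used in the example file above.

```php
<?php
// Hypothetical helper: render a list of entries (same keys as the
// contribution's $container rows) as a sitemap <urlset> document.
function renderUrlset($entries)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
    foreach ($entries as $entry) {
        $xml .= "\t<url>\n";
        // escape &, < and > so query strings do not break the XML
        $xml .= "\t\t<loc>" . htmlspecialchars($entry['loc']) . "</loc>\n";
        $xml .= "\t\t<lastmod>" . $entry['lastmod'] . "</lastmod>\n";
        $xml .= "\t\t<changefreq>" . $entry['changefreq'] . "</changefreq>\n";
        $xml .= "\t\t<priority>" . $entry['priority'] . "</priority>\n";
        $xml .= "\t</url>\n";
    }
    $xml .= '</urlset>' . "\n";
    return $xml;
}

// Demo with one placeholder page; the target path is a temp file,
// not the real shop root.
$entries = array(
    array(
        'loc'        => 'http://mydomain.com/information.php?a=1&b=2',
        'lastmod'    => '2005-09-13',
        'changefreq' => 'weekly',
        'priority'   => '0.5',
    ),
);
$xml = renderUrlset($entries);
$target = sys_get_temp_dir() . '/sitemapothers_' . uniqid() . '.xml';
file_put_contents($target, $xml);
echo $xml;
```

Run from the same cronjob before googlesitemap/index.php, a script like this would refresh the file so that the index picks up its new modification date; the ordering of the two steps is an assumption.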
dahui Posted September 14, 2005 (edited)

The only problem I have is that my cronjob will not work :( The mail message reads:

    Warning: main(includes/configure.php): failed to open stream: No such file or directory in /home/httpd/vhosts/elflein-kosmetik.de/httpdocs/googlesitemap/index.php on line 38

    Fatal error: main(): Failed opening required 'includes/configure.php' (include_path='.:/usr/share/php') in /home/httpd/vhosts/elflein-kosmetik.de/httpdocs/googlesitemap/index.php on line 38

I searched the board and found one similar problem but no solution. :( I have tried everything: all kinds of permissions, paths, users and owners. I cannot sort out what it is. Any ideas? Is your cron running? As which user? With which path? I would love to have that job running.

dahui
Guest Posted September 14, 2005

No, the cronjob isn't working for me, but that's because I haven't asked for shell access... yet.
dahui Posted September 14, 2005

Here we go, everything is up and running! Google and Froogle are fed automatically by cronjobs.

i) Indexing 'other' pages works brilliantly as described above. At the moment the pages have to be maintained manually, but that seems OK to me, as these pages will not change too often, and the lastmod date of the files is a minor issue for the moment. Both are on my to-do list.

ii) Cronjobs: that took me some time. I have my own VPS with Linux, and the problem was not the permissions of the files and directories. It is due to the fact that cron runs as a user whose base directory is the root of the VPS and not the root of the virtual host's httpdocs. Ergo: in e.g. /googlesitemap/index.php or /catalog/admin/froogle.php, all references and includes that are relative to the vhost root (httpdocs) do not work, which means none of the includes can be loaded.

Solution: for the files that need to be run by a cronjob on a virtual server with Plesk, change the includes from

    require_once(DIR_WS_INCLUDES . 'filenames.php');

to:

    require_once('/the/absolute/path/on/your/server/to/includes/filenames.php');

I hope my whole day of investigation will help someone else with their crons on a VPS.

dahui

BTW, any input on how to solve the lastmod date or how to create sitemapothers.xml automatically is of course very welcome. It might in the end result in an addition to the contribution ;)
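An alternative to hard-coding absolute paths, sketched under the assumption that the cron-run script sits one directory below the shop root (as googlesitemap/index.php does in the error message quoted earlier in the thread): derive the root from the script's own location, so that includes resolve no matter which directory cron starts in. resolveFromShopRoot() is a hypothetical helper and the example path is a placeholder.

```php
<?php
/**
 * Hypothetical helper: turn a path relative to the shop root into an
 * absolute one, based on the location of the calling script rather than
 * on the current working directory (which under cron is usually not the
 * web root).
 */
function resolveFromShopRoot($scriptFile, $relativePath)
{
    // Assumption: the script lives one level below the shop root,
    // e.g. <root>/googlesitemap/index.php
    $shopRoot = dirname(dirname($scriptFile));
    return $shopRoot . '/' . ltrim($relativePath, '/');
}

// Demo with a placeholder path; in a real script the first argument
// would be __FILE__.
$example = resolveFromShopRoot(
    '/home/httpd/vhosts/example.com/httpdocs/googlesitemap/index.php',
    'includes/configure.php'
);
echo $example, "\n";
```

In the script itself this would read `require_once(resolveFromShopRoot(__FILE__, 'includes/configure.php'));`. A shorter variant with the same effect may be a single `chdir(dirname(dirname(__FILE__)));` before the first include, which leaves the existing relative paths untouched.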
Guest Posted September 25, 2005

(quoting dahui's post of September 14, 2005 above in full)

I am using jail shell and keep getting:

    /usr/local/cpanel/bin/jailshell: line 1: PHP: command not found
dahui Posted September 25, 2005

    I am using jail shell and keep getting: /usr/local/cpanel/bin/jailshell: line 1: PHP: command not found

As for how to set up cron in your environment, I cannot assist, sorry. Please contact your host. Any questions concerning osC are welcome ;)

dahui