Possible to recache all sites automatically after clear cache?

Tuesday 11 November 2003 12:17:23 am - 7 replies

Modified on Tuesday 11 November 2003 12:23:17 am by Valentin Svelland

Alex Jones

Tuesday 11 November 2003 6:15:47 am

You may want to try running a link checker or spider against the site, as it will go through and hit every page, assuming every page is linked from somewhere. You can also check out programs like HTTrack (www.HTTrack.com) or Wget (www.gnu.org/software/wget/), which allow you to download an entire site locally. You can set the programs up to not actually keep anything, so they will go through the site and hit every page as well.
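As a rough sketch of the Wget approach, assuming a reasonably recent Wget and using the placeholder URL from this thread: the function name `warm_cache` is made up for illustration. It uses `--delete-after`, which fetches each page and then discards the local copy. `--spider` is the other option, but whether it follows links recursively may depend on the Wget version.

```shell
# Hypothetical helper: request every linked page once so the CMS rebuilds its cache.
# The default URL is only the placeholder used in this thread.
warm_cache() {
    site_url="${1:-http://www.mysite.com/ez/}"
    # --recursive     follow the links found in each page
    # --level=inf     no depth limit (Wget's default depth is 5)
    # --no-parent     never climb above the starting directory
    # --delete-after  remove each file locally once it has been fetched
    # --no-verbose    print one short line per request
    wget --recursive --level=inf --no-parent --delete-after --no-verbose "$site_url"
}
```

Call it as `warm_cache http://www.mysite.com/ez/` after clearing the cache. Pages that nothing links to will not be reached this way.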

Alex

Alex
[ bald_technologist on the IRC channel (irc.freenode.net): #eZpublish ]

When in doubt, clear the cache.

Valentin Svelland

Thursday 13 November 2003 5:59:01 am

Thanks for your reply. Could you suggest a good web spider or a similar free online service?

Alex Jones

Thursday 13 November 2003 6:24:25 am

Either of the two that I mentioned above would work. You might also want to look into using Xenu (http://home.snafu.de/tilman/xenulink.html).

Alex

Valentin Svelland

Friday 14 November 2003 12:17:36 am

I'm trying to rebuild the whole cache by running wget through my site. However, I'm new to this command and not quite sure how to use it properly. I run the command below, but it fails because there is no index.html file. The eZ front page is index.php.

wget --spider http://www.mysite.com/ez/

Valentin

Alex Jones

Friday 14 November 2003 6:16:48 am

That is odd. The fact that your index file is PHP shouldn't make a difference... What is the actual command you are using?

Is your site hosted on a third-party server, or is it possible that another developer/sysadmin could have set up the server to block spidering scripts?
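On the index.html point: as far as I know, Wget simply uses index.html as the default local file name when a URL ends in a slash, so that name does not mean it asked the server for index.html. A quick way to see what the server really answers is to print the response headers. This is a minimal sketch; the function name `check_front_page` is made up, and the URL is the placeholder from this thread.

```shell
# Hypothetical helper: ask for the front page without saving it and show the server's reply.
check_front_page() {
    site_url="${1:-http://www.mysite.com/ez/}"
    # --spider           do not save the page
    # --server-response  print the HTTP status line and headers sent by the server
    wget --spider --server-response "$site_url"
}
```

If no status line is printed at all, the request never reached the web server, which points at the network or the host rather than at eZ Publish.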

Alex

Valentin Svelland

Friday 14 November 2003 10:13:55 am

Well, my site is hosted on a third-party server with wget preinstalled, and it could be that there are some sysadmin limits on how the wget command can be used. I'll check that on Monday.

How do I run the command? Well, just like I wrote in the previous posting, only with a different URL of course. As I mentioned, I've never used wget before, so it could be that I'm just not using it correctly:

wget --spider http://www.mysite.com/ez/
-----------------------------------------
returns:
wget --spider www.mysite.com/ez/
--19:08:05-- http://www.mysite.com:80/ez/
=> `index.html.1'
Connecting to www.mysite.com:80...
Connection to www.mysite.com:80 refused.
------------------------------------------

By the way, I've achieved somewhat better performance by including my custom-made views (sitemap2, sitemap3 and so on) in the site.ini.append of my siteaccess. Still, it seems the cache for my site only lasts so long. I wish I could make the cache permanent unless an article is republished, and then run wget to re-cache the site now and then if necessary.

Thanks for your feedback, Alex. It's useful!

Valentin

Resources on the net:
http://www.lns.cornell.edu/public/COMP/info/wget/wget_2.html
http://www.lns.cornell.edu/public/COMP/info/wget/wget_7.html
http://www.gnu.org/manual/wget-1.8.1/html_chapter/wget_4.html

Alex Jones

Friday 14 November 2003 11:20:46 am

Sorry Valentin, I missed your earlier posting of the command. I'm not sure why that wouldn't work. The output you posted seems to indicate that the server is indeed blocking access. Perhaps you could try spoofing the user agent in Wget to see if that works. Try adding something like:

--user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5)"

This will tell the server that Wget is Mozilla running on Windows XP. The value needs to be enclosed in quotes, since it contains spaces, semicolons and parentheses that the shell would otherwise interpret. If this works, you can experiment with setting the user agent to whatever you want. A distinctive value will also help you filter out Wget when analyzing your traffic logs.
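Put together, the full command might look like the sketch below. The function name `warm_cache_as_browser` is made up, and the URL is the placeholder from this thread. One caveat: "Connection refused" is normally reported when the connection itself cannot be opened, before any HTTP headers are sent, so if that message persists the user agent is probably not the cause.

```shell
# Hypothetical helper: request the front page while identifying as a browser.
warm_cache_as_browser() {
    site_url="${1:-http://www.mysite.com/ez/}"
    # Single quotes keep the spaces, semicolons and parentheses away from the shell.
    user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5)'
    # Double quotes around the expansion pass the whole option as one argument.
    wget --spider --user-agent="$user_agent" "$site_url"
}
```

Run it as `warm_cache_as_browser http://www.mysite.com/ez/`. Changing `user_agent` to a name of your own makes these requests easy to spot in the access log.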

Alex
