bots not obeying mod_rewrite


Guest

this isn't a problem with the code or the .htaccess rules, as only certain bots behave like this.

 

case in point: jeeves / ask.com

Name: mozilla/2.0 (compatible; ask jeeves/teoma; +http://about.ask.com

IP Address: 65.214.44.74

User Agent: mozilla/2.0 (compatible; ask jeeves/teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)

 

 

my rewrite rules as follows:

 

category: /category/

product: /product

 

every other bot follows these rules, but jeeves keeps requesting things like /category/images/blabla.gif, which of course does not exist, because with rewrites the folders never need to be created. is there a way to force jeeves to obey the rewrite? it has no problem crawling both the category and product pages; it just tries to fetch the images and stylesheet at paths like the one above, and i cannot figure out why.


i suppose there's no way to find out if jeeves has previously indexed these pages?

 

i also notice random guests requesting similar URLs, but always for a different product or category, never the same one.


I don't think there's really a way to know for sure. Certainly, if you have only just put the mod_rewrite rules in place, you can expect some hits on old links.

 

I'd be a little more concerned about customers hitting those old links. Can you find out more about them? Where they're coming from, etc. If it's the first page they're getting to from a search engine, all you can do is wait. If it's an external static link, you will need to get the link changed.
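One way to see where those requests come from is to tally the Referer header on the 404s in the access log. The following is only a rough sketch: it assumes the server writes Apache's combined log format, and the two sample lines (IP, date, host, sizes) are made up for illustration, not taken from the poster's logs.

```python
import re
from collections import Counter

# Assumes Apache "combined" format:
# ... "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)

def referers_for_404s(lines):
    """Count (requested path, referer) pairs for requests that returned 404."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("status") == "404":
            counts[(m.group("path"), m.group("referer"))] += 1
    return counts

# Hypothetical sample lines; replace with lines read from the real access log.
sample = [
    '203.0.113.5 - - [01/Jan/2006:10:00:00 +0000] "GET /manufacturers/images/titles/categories.gif HTTP/1.1" 404 512 "http://example.com/manufacturers/" "Mozilla/4.0"',
    '203.0.113.5 - - [01/Jan/2006:10:00:01 +0000] "GET /images/titles/categories.gif HTTP/1.1" 200 2048 "http://example.com/" "Mozilla/4.0"',
]

for (path, referer), n in referers_for_404s(sample).most_common():
    print(n, path, "<-", referer)
```

If the referer on the failing requests is one of your own pages, the bad link is on your site rather than on a search engine or an external page.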


the links jeeves (and a few customers) are hitting have never existed. they're simply malformed URLs.

 

here's one on my site right now:

/manufacturers/images/titles/categories.gif

 

/manufacturers/ is a valid url, BUT /manufacturers/images/titles/categories.gif isn't.

that's SUPPOSED TO BE /images/titles/categories.gif

 

i can't tell what would make someone's browser disobey the rewrite like this, but here's their browser data: User Agent: mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; maxthon; sv1; .net clr 1.1.4322)


i can't tell what would make someone's browser disobey the rewrite like this, but here's their browser data: User Agent: mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; maxthon; sv1; .net clr 1.1.4322)

 

It's not the browser. That's not how rewriting works. Your pages hand the browser the rewritten-style URLs, and when the browser requests one, the webserver translates it internally to the real script. The browser can end up requesting a URL your rules don't cover, but it is not doing anything "wrong" when that happens; old URLs or bad links can cause this. Parsing and rewriting URLs is entirely the webserver's job. The browser cannot obey mod_rewrite because it doesn't know it exists!
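To make the server-side nature of this concrete, here is a toy model of what a rewrite layer does with each incoming path. It is a sketch in Python, not Apache configuration, and the two patterns and their target scripts are assumptions: the thread only gives the /category/ and /product prefixes, never the actual rules.

```python
import re

# Hypothetical rules; the real patterns and targets are not shown in the thread.
REWRITE_RULES = [
    (re.compile(r"^/category/(\d+)/?$"), r"/index.php?cPath=\1"),
    (re.compile(r"^/product/(\d+)/?$"), r"/product_info.php?products_id=\1"),
]

def rewrite(path):
    """Return the internal path for a requested path, or None if no rule matches."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return None  # falls through to the filesystem, typically ending in a 404

print(rewrite("/category/12/"))                # /index.php?cPath=12
print(rewrite("/category/images/blabla.gif"))  # None
```

The client only ever supplies the path string. Whether a rule matches, and what it maps to, is decided entirely on the server.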

 

It's like a telephone call. The browser is on the other end of the line, and it cannot know whether I used my speed dial function or typed in the number manually. Nor does it care.

 

Someone's browser tried to download a picture at that URL because 1) there is a bad link that directed the browser to that URL, or 2) your rewrite rules don't match the expression encountered when the webserver parsed the URL.
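One common source of a bad link of the first kind is the page itself. If the HTML references an image with a relative path, every client resolves that path against the URL of the page it is viewing, which is now the rewritten URL. The sketch below uses Python's standard urljoin to show the resolution. It assumes the templates use relative image paths, which the thread does not confirm, and example.com is a placeholder host.

```python
from urllib.parse import urljoin

page_url = "http://example.com/manufacturers/"  # placeholder host, rewritten page URL

# Relative reference: resolved against the page's "directory".
relative = urljoin(page_url, "images/titles/categories.gif")

# Root-relative reference: resolved against the site root.
root_relative = urljoin(page_url, "/images/titles/categories.gif")

print(relative)       # http://example.com/manufacturers/images/titles/categories.gif
print(root_relative)  # http://example.com/images/titles/categories.gif
```

If that is what is happening, referencing images and stylesheets with a leading slash (or a full URL) would make the requested path independent of how deep the rewritten page URL appears to be.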


Archived

This topic is now archived and is closed to further replies.
