
osCommerce

The e-commerce.

Robot and Spider


smithveg

Posted

In a thread I read, people said we should remove the 'Disallow: /admin' line from the robots file in the root if the admin directory is password protected. Is that right?

I am a little confused about robots.txt. How do I create it? How do the search engines know whether I have created it or not? How do they look for it?

Do I just need to create a file like the one below in Notepad and save it as a .txt file?

User-agent: *
Disallow: /shop/adminbackups
Disallow: /shop/includes
Disallow: /shop/thumbnails
Disallow: /shop/tmp
Disallow: /shop/account.php
Disallow: /shop/account_edit.php
Disallow: /shop/account_history.php
Disallow: /shop/account_history_info.php
Disallow: /shop/account_newslette

 

If I have created a robots.txt, do I also need a spider? What is a spider for?

****

Hello World! ^.^ I'm a Internet naive. Browse my working profile

Malaysia Web Services - OPerion Website Marketing System

Posted

Yes, that is how you create a robots file. You shouldn't remove the disallow admin line. The robots file should go in your web root directory. The search engines know to look for it there. If they find it, they should use it. The spiders file is in your includes directory and is used to identify the search engines when they visit your site. They will be treated differently from a regular visitor.

 

Jack
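
For readers who want to see how a well-behaved crawler applies a robots file, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the paths and the "ExampleBot" agent name are assumptions made up for illustration, modelled on the examples in this thread. They are not taken from any real shop.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, modelled on the examples posted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /includes
Disallow: /login.php
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that obeys the rules may fetch the given path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)
for path in ("/index.php", "/admin/login.php", "/includes/configure.php"):
    verdict = "allowed" if is_allowed(parser, path) else "disallowed"
    print(path, "->", verdict)
```

A Disallow line matches by prefix, so "Disallow: /admin" covers everything under that directory. This only describes crawlers that choose to obey the file. It does not protect anything from a visitor or a badly behaved bot.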

Support Links:

For Hire: Contact me for anything you need help with for your shop: upgrading, hosting, repairs, code written, etc.

All of My Addons

Get the latest versions of my addons

Recommended SEO Addons

Posted

Wait, Jack!

You have confused me now.

Do you mean the search engines will automatically look for my robots file? Are search engines that clever?

May I know how many robots files I should create for my site? Can you tell me which directories I should put them in, as on your sites?

And if so, I disallow those directories? What does disallow mean? Does it stop search engines from looking at them, or does it have some other purpose?

 

smithveg


Posted

Yes, the SE's will automatically look at it, at least the SE's that follow the rules. They are supposed to read it and stay away from any file or directory that has been disallowed, but not all of them do. The major ones, like Google, Yahoo and MSN, will though. You only need one robots file, in the main root of your hosting account (wherever you end up when you go to www.yoursite.com). You should disallow (ask the SE's to ignore) any files and directories you don't want listed, like the includes directory. You can see any site's robots file by going to www.thatsite.com/robots.txt. So find a few osCommerce shops and look at their robots files to get an idea of what is needed.

 

Jack
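
As a small illustration of the "one file in the root" point, the sketch below works out where a crawler would look for the robots file, whichever page of the site it starts from. The function name and the example addresses are hypothetical. The only fact it relies on is that the file lives at /robots.txt at the root of the host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_url: str) -> str:
    """Return the robots.txt address for the host that serves site_url."""
    # Assume plain http when the address is typed without a scheme.
    if "://" not in site_url:
        site_url = "http://" + site_url
    parts = urlsplit(site_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("www.yoursite.com"))
print(robots_url("http://www.yoursite.com/shop/product_info.php?products_id=1"))
```

Both calls give the same address, which is why a shop installed in a subdirectory such as /shop/ still uses the robots file in the web root, with the subdirectory written into each Disallow path.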


Posted

Jack, can you give me a complete robots file?

I'm not sure how to create it. I'm afraid I will cause some problem and stop the search engines from indexing my site.

I have seen example robots files. They disallow this and that. Why must we disallow? If I allow everything, will my site rank higher? What do I disallow them for? In my mind, 'disallow' limits the chances for search engines to index and find my site. Am I right?

 

smithveg


Posted

Do you want your includes/configure.php file listed in the search engines so anyone can see it? Probably not. If your shop is set up correctly they couldn't see it anyway, but there are other files in there they can see. You don't want those listed. The SE's should only be able to get to a page if a real visitor to your site can get to it.

 

Jack


Posted

Hi Jack

 

This is from a contribution (suggestion)

 

# This says to apply these settings to ALL search engine spiders/crawlers
User-agent: *
# These settings will keep spiders from indexing your unwanted pages
# This assumes that your OSC install is in your web site's ROOT directory
# ie: http://www.yoursite.com/index.php <- Use if this brings up your OSC main page
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php

 

I've read not to put the admin line in as it is password protected anyhow, but I see /includes isn't there. Would you just substitute admin for includes & is there anything else you would add or take away?

 

Thank you for your help

Julie

Posted

If you can go to your site, type something in the URL and see a page displayed, then you may want the SE's to visit that page. Try going to http://www.yoursite.com/includes/ (substitute your domain name, of course). Is whatever displays something you want showing up in the SE listings? If it is, then leave it out of your robots file. If it isn't, disallow it.

 

Jack


Posted

I still don't understand why we must disallow. Is the purpose to keep hackers out?

In my mind, disallow means 'not allowing search engines' to find my site. Why must we do it? Will that decrease the chances of being listed? Or is there something else behind it?

One more thing I'm not clear about: what must I save my robots file as?

-robot.txt?

-robots.txt?

Thanks for your help.

smithveg


Posted

I'm not sure how else to explain it. You never answered my question about whether you would want your configure file listed in the search engine listings. From your way of thinking, you would want it listed. Do you see the problem with that?

 

As for the robots file, as I mentioned before, just go to any oscommerce shop (there are many listed in the above menu), and see what they have for a robots file. You are free to copy them as you wish.

 

Jack


Posted

Thanks Jack

 

I have 403 permission denied. I appear to have includes password protected, so I guess I do not need to worry about that either. I guess I'll go with something like I had. I presume something is better than nothing?

 

Thanks for your help :thumbsup:

Julie

Posted

I also don't know how to answer you, Jack, sorry about that. I was confused.

You asked me whether I want that configure file listed or not. Did you list yours? If I list it, will that help my ranking?

If I disallow too many things (files), will that decrease the chances of spiders finding me?

Then what do we disallow them for?

I just want a good ranking for my site.

Sorry, Jack. I am bringing you problems... ;)

smithveg


Posted

You don't have to worry about it, but it should still be in the robots file. The SE's will only spend so much time on your site. If they waste it following links to places they can't get into, it can only hurt you.

 

Jack


Posted

OK Thanks Jack

 

I will do as above & add

Disallow: /includes

 

Just starting, so need all the help I can get getting more customers. :lol:

 

Julie
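
For anyone who would rather generate the file than type it in Notepad, here is a rough sketch of a helper that assembles the text of a robots file from a list of paths. The function name is made up, and the path list simply combines the contribution sample quoted earlier in this thread with the extra /includes line. It assumes the shop is installed in the web root.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text: one User-agent line, then one Disallow line per path."""
    # dict.fromkeys drops repeated paths while keeping the original order.
    unique_paths = list(dict.fromkeys(disallowed_paths))
    lines = [f"User-agent: {user_agent}"]
    lines.extend(f"Disallow: {path}" for path in unique_paths)
    return "\n".join(lines) + "\n"


# Paths from the contribution sample in this thread, plus /includes.
PATHS = [
    "/admin",
    "/account.php",
    "/advanced_search.php",
    "/checkout_shipping.php",
    "/create_account.php",
    "/login.php",
    "/password_forgotten.php",
    "/popup_image.php",
    "/shopping_cart.php",
    "/includes",
]

print(build_robots_txt(PATHS), end="")
```

The printed text would be saved as robots.txt (plural) in the web root. Keeping the Disallow lines together, with no blank lines between them, keeps them all in one group under the User-agent line.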

Posted

No need to apologize. I think you are missing the basic point. Not all files should be looked at by the search engines. Any file in the includes directory would not have any effect on your listing results with the search engines, since those files are not set up for the search engines. So, even if you could list them, it would not help you. Likewise, why have the checkout_success page listed in the search engines, for example? It isn't optimized for any keywords, so it won't help anyone find your site. It will just waste the search engines' time, listing pages that shouldn't be listed. The more time they waste on those pages, the less they spend on the pages you want listed. The result is that your overall listing placement will suffer. Do it however you want, but you are making a mistake if you don't disallow the items that shouldn't show.

 

Jack


Posted

Hi Jack

 

I have just been trawling through the online shops, & you will be surprised just how many do not have a robots.txt

 

Anyhow below is a list of what the rest seem to have. I do not know what some of these folders involve as I haven't had to deal with them yet, & I am learning as I go.

 

Do you think these need to be added to the sample given in the robots.txt contribution?

 

User-agent: *
Disallow: /cgi_bin/
Disallow: /admin/
Disallow: /images/
Disallow: /tmp/
Disallow: /cache/
Disallow: /download/
Disallow: /pub/

 

What are these for (except admin & includes) please? Might help me understand what I am not letting the robots see.

 

Thank you very much

Julie

Posted

Normally you only need to disallow pages that are open for spiders to visit and which you do not want indexed.

That includes: login.php, cookie_usage.php, checkout_shipping.php and shopping_cart.php.

Pages like checkout_confirmation and checkout_success are not visible to spiders, as you need to be signed in to ever see links to them, so you do not need to disallow those.

The same applies to the admin directory: the spiders do not do a directory search on your filesystem, so as long as you have no link to your admin side on your web pages, spiders will not go there anyway.

Treasurer MFC
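
The point about spiders only following links can be sketched with a toy crawl over an in-memory map of pages. Everything here is hypothetical: the page names, the links between them and the disallowed list are invented for illustration, and a real spider is far more involved.

```python
from collections import deque

# Hypothetical site map: each page and the links that appear on it.
SITE_LINKS = {
    "/index.php": ["/product_info.php", "/login.php", "/shopping_cart.php"],
    "/product_info.php": ["/index.php", "/shopping_cart.php"],
    "/login.php": ["/index.php"],
    "/shopping_cart.php": ["/index.php", "/login.php"],
    "/admin/index.php": ["/admin/orders.php"],  # no public page links here
    "/admin/orders.php": [],
}

# Hypothetical Disallow prefixes taken from a robots file.
DISALLOWED = ("/login.php", "/shopping_cart.php")


def crawl(start, links, disallowed):
    """Breadth-first walk that follows links and skips disallowed prefixes."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target in seen or target.startswith(disallowed):
                continue
            seen.add(target)
            queue.append(target)
    return visited


print(crawl("/index.php", SITE_LINKS, DISALLOWED))
print(crawl("/index.php", SITE_LINKS, ()))
```

In both runs the admin pages are never reached, because nothing the crawler can see links to them. The disallowed list only changes what happens to pages that are linked.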

Posted

Those are directories used by the shop but are not meant for display, except for the images directory. Some sites sell images, so they don't want those listed on the Internet, while others do want them listed. You have to decide which is best for you.

 

Jack


Posted

I agree in principle. But if a link is present, like when the Dynamic Sitemap contribution is installed and not set up properly, the SE's will have a way to the pages. It doesn't cost anything to add the pages and may save some problems so I think it is best to add them.

 

Jack


Posted

http://www.asp101.com/articles/chris/spider/default.asp

From the address above, I don't understand the sentence quoted below. Does it mean that after the spider finds my site and reads all the relevant links, my site's paths are stored in the spider's (Yahoo spider / Google spider) database, and then it works out my listing rank? Does that make sense? Am I correct?

By now I understand a little of what we use this for. Is it to minimize the chance of confusing the SE, and to limit the SE to the particular, important pages (e.g. the main page content)?

"Pretty soon, you'll end up with thousands of pages and bits of information in your database. This web of paths is where the term 'spider' comes from."

Jack, can you point me to any relevant articles here?

Thanks


Archived

This topic is now archived and is closed to further replies.
