osCommerce

Not getting indexed by Google


Project2016

Hi - I've spent quite a while building a site and it's all working fine, but now that it's live it's not getting indexed by Google. I think I've got includes/configure.php, .htaccess, robots.txt and spiders.txt all correct, but they're shown below; if anyone's got five minutes to have a look I'd be eternally grateful. I've got SEO URLs 5 installed plus the metatags contrib, and I've submitted the site to Google Webmaster Tools. I have also set the session preferences as follows:

 

Session Directory: /tmp

Force Cookie Use: False

Check SSL Session ID: True

Check User Agent: False

Check IP Address: False

Prevent Spider Sessions: True

Recreate Session: True

 

Any advice greatly appreciated.

 

project2016

 

--------------------------------

includes/configure.php (cleansed)

--------------------------------

<?php

define('HTTP_SERVER', 'http://www.MYSITE.co.uk');

define('HTTPS_SERVER', 'https://www.MYSITE.co.uk');

define('ENABLE_SSL', true);

define('HTTP_COOKIE_DOMAIN', 'http://www.MYSITE.co.uk');

define('HTTPS_COOKIE_DOMAIN', 'https://www.MYSITE.co.uk');

define('HTTP_COOKIE_PATH', '/');

define('HTTPS_COOKIE_PATH', '/');

define('DIR_WS_HTTP_CATALOG', '/');

define('DIR_WS_HTTPS_CATALOG', '/');

define('DIR_WS_IMAGES', 'images/');

define('DIR_WS_ICONS', DIR_WS_IMAGES . 'icons/');

define('DIR_WS_INCLUDES', 'includes/');

define('DIR_WS_BOXES', DIR_WS_INCLUDES . 'boxes/');

define('DIR_WS_FUNCTIONS', DIR_WS_INCLUDES . 'functions/');

define('DIR_WS_CLASSES', DIR_WS_INCLUDES . 'classes/');

define('DIR_WS_MODULES', DIR_WS_INCLUDES . 'modules/');

define('DIR_WS_LANGUAGES', DIR_WS_INCLUDES . 'languages/');

 

define('DIR_WS_DOWNLOAD_PUBLIC', 'pub/');

define('DIR_FS_CATALOG', '/home/content/97/XXXXXXXXXXX/html/');

define('DIR_FS_DOWNLOAD', DIR_FS_CATALOG . 'download/');

define('DIR_FS_DOWNLOAD_PUBLIC', DIR_FS_CATALOG . 'pub/');

 

define('DB_SERVER', 'XXXXXXXXXXXXXXXXXXX');

define('DB_SERVER_USERNAME', 'XXXXXXXX');

define('DB_SERVER_PASSWORD', 'XXXXXXXX');

define('DB_DATABASE', 'XXXXXXXX');

define('USE_PCONNECT', 'false');

define('STORE_SESSIONS', 'mysql');

?>

 

--------------------

.HTACCESS

--------------------

 

# $Id: .htaccess 1739 2007-12-20 00:52:16Z hpdl $

#

# This is used with Apache WebServers

#

# For this to work, you must include the parameter 'Options' to

# the AllowOverride configuration

#

# Example:

#

# <Directory "/usr/local/apache/htdocs">

# AllowOverride Options

# </Directory>

#

# 'All' will also work. (This configuration is in the

# apache/conf/httpd.conf file)

 

# The following makes adjustments to the SSL protocol for Internet

# Explorer browsers

 

<IfModule mod_setenvif.c>

<IfDefine SSL>

SetEnvIf User-Agent ".*MSIE.*" \

nokeepalive ssl-unclean-shutdown \

downgrade-1.0 force-response-1.0

</IfDefine>

</IfModule>

 

# If Search Engine Friendly URLs do not work, try enabling the

# following Apache configuration parameter

 

# AcceptPathInfo On

 

# Fix certain PHP values

# (commented out by default to prevent errors occurring on certain

# servers)

 

# php_value session.use_trans_sid 0

# php_value register_globals 1

 

 

Options +FollowSymLinks

<IfModule mod_rewrite.c>

RewriteEngine On

 

# RewriteBase instructions

# Change RewriteBase dependent on how your shop is accessed as below.

# http://www.mysite.com = RewriteBase /

# http://www.mysite.com/catalog/ = RewriteBase /catalog/

# http://www.mysite.com/catalog/shop/ = RewriteBase /catalog/shop/

 

# Change RewriteBase using the instructions above

RewriteBase /

 

RewriteRule ^(.*)-p-([0-9]+).html$ product_info.php?products_id=$2&%{QUERY_STRING}

RewriteRule ^(.*)-c-([0-9_]+).html$ index.php?cPath=$2&%{QUERY_STRING}

RewriteRule ^(.*)-m-([0-9]+).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}

RewriteRule ^(.*)-pi-([0-9]+).html$ popup_image.php?pID=$2&%{QUERY_STRING}

RewriteRule ^(.*)-pr-([0-9]+).html$ product_reviews.php?products_id=$2&%{QUERY_STRING}

RewriteRule ^(.*)-pri-([0-9]+).html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING}

# Articles contribution

RewriteRule ^(.*)-t-([0-9_]+).html$ articles.php?tPath=$2&%{QUERY_STRING}

RewriteRule ^(.*)-a-([0-9]+).html$ article_info.php?articles_id=$2&%{QUERY_STRING}

# Information pages

RewriteRule ^(.*)-i-([0-9]+).html$ information.php?info_id=$2&%{QUERY_STRING}

# Links contribution

RewriteRule ^(.*)-links-([0-9_]+).html$ links.php?lPath=$2&%{QUERY_STRING}

# Newsdesk contribution

RewriteRule ^(.*)-n-([0-9]+).html$ newsdesk_info.php?newsdesk_id=$2&%{QUERY_STRING}

RewriteRule ^(.*)-nc-([0-9]+).html$ newsdesk_index.php?newsPath=$2&%{QUERY_STRING}

RewriteRule ^(.*)-nri-([0-9]+).html$ newsdesk_reviews_info.php?newsdesk_id=$2&%{QUERY_STRING}

RewriteRule ^(.*)-nra-([0-9]+).html$ newsdesk_reviews_article.php?newsdesk_id=$2&%{QUERY_STRING}

</IfModule>

 

 

# this code added from xss shield

Options +FollowSymLinks

RewriteEngine On

RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]

RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]

RewriteCond %{QUERY_STRING} (\<|%3C).*iframe.*(\>|%3E) [NC,OR]

RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]

RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})

RewriteRule ^(.*)$ index_error.php [F,L]

RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)

RewriteRule .* - [F]

 

 

 

# Deny domain access to spammers and other scumbags

 

RewriteEngine on

SetEnvIfNoCase User-Agent "^libwww-perl*" block_bad_bots

Deny from env=block_bad_bots

 

 

# Redirect index.php to domain.com

# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/

# RewriteRule ^index\.php$ http://www.MYSITE.co.uk/ [R=301,L]

 

 

RewriteBase /

# filter for most common exploits

 

RewriteCond %{HTTP_USER_AGENT} libwww-perl [OR]

RewriteCond %{QUERY_STRING} tool25 [OR]

RewriteCond %{QUERY_STRING} cmd.txt [OR]

RewriteCond %{QUERY_STRING} cmd.gif [OR]

RewriteCond %{QUERY_STRING} r57shell [OR]

RewriteCond %{QUERY_STRING} c99 [OR]

 

# ban spam bots

 

RewriteCond %{HTTP_USER_AGENT} almaden [OR]

RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]

RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]

RewriteCond %{HTTP_USER_AGENT} ^attach [OR]

RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]

RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]

RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]

RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]

RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]

RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]

RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]

RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]

RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]

RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]

RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]

RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]

RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]

RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]

RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]

RewriteCond %{HTTP_USER_AGENT} ^DA [OR]

RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]

RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]

RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]

RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]

RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]

RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]

RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]

RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]

RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]

RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]

RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]

RewriteCond %{HTTP_USER_AGENT} email [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]

RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]

RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]

RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]

RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]

RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]

RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]

RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]

RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]

RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]

RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]

RewriteCond %{HTTP_USER_AGENT} ^Go\!Zilla [OR]

RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]

RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]

RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]

RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]

RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]

RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]

RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]

RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]

RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]

RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]

RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]

RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]

RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]

RewriteCond %{HTTP_USER_AGENT} ^Indy*Library [OR]

RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]

RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]

RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]

RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]

RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]

RewriteCond %{HTTP_USER_AGENT} ^JBH*agent [OR]

RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]

RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]

RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]

RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]

RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]

RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]

RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]

RewriteCond %{HTTP_USER_AGENT} ^Link*Sleuth [OR]

RewriteCond %{HTTP_USER_AGENT} ^likse [OR]

RewriteCond %{HTTP_USER_AGENT} ^Link [OR]

RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]

RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]

RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]

RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]

RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mozilla*MSIECrawler [OR]

RewriteCond %{HTTP_USER_AGENT} ^MS\ FrontPage* [OR]

RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]

RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]

RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]

RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]

RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]

RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]

RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]

RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]

RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]

RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]

RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]

RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]

RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]

RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]

RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]

RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]

RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]

RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]

RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]

RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]

RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]

RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]

RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]

RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]

RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]

RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]

RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]

RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]

RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]

RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]

RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]

RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]

RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]

RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]

RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]

RewriteCond %{HTTP_USER_AGENT} ^sproose [OR]

RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]

RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]

RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]

RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]

RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]

RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]

RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]

RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]

RewriteCond %{HTTP_USER_AGENT} ^URLSpiderPro [OR]

RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]

RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]

RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]

RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]

RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[bb]andit [OR]

RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]

RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]

RewriteCond %{HTTP_USER_AGENT} ^Website [OR]

RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]

RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]

RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]

RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]

RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]

RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]

RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]

RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]

RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]

RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]

RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]

RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]

RewriteCond %{HTTP_USER_AGENT} ^Zeus

 

# WHAT DOES THIS DO?

# RewriteRule ^.* - [F,L]

# RewriteCond %{HTTP_REFERER} ^http://www.MYSITE.co.uk$

# RewriteRule !^http://[^/.]\.MYSITE.co.uk.* - [F,L]

 

# deny most common except .php

 

<FilesMatch "\.(inc|tpl|h|ihtml|sql|ini|conf|class|bin|spd|theme|module|exe)$">

</FilesMatch>

 

 

# Disable .htaccess viewing from browser

 

<Files ~ "^\.ht">

 

Order allow,deny

Deny from all

Satisfy All

 

</Files>

 

 

# Disable access to config.php

 

<Files ~ "includes\configure.php$">

deny from all

 

</Files>
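
As a rough check on the "ban spam bots" list above, the following Python sketch runs a handful of the RewriteCond patterns against a user-agent string. Several things here are assumptions: the Googlebot string is just a commonly seen form of its user agent, the pattern list is only a small subset of the posted file, and Python's re module stands in for Apache's regex engine (it should behave the same for patterns this simple). Note also that in the file as posted, the RewriteRule that would follow the chain is commented out, so the conditions do not appear to be attached to any rule.

```python
import re

# (pattern, case_insensitive) pairs copied from a few of the posted
# RewriteCond lines; [NC] in .htaccess corresponds to case_insensitive=True.
BAN_PATTERNS = [
    ("libwww-perl", False),
    ("almaden", False),
    ("^DA", False),
    ("email", True),
    ("FrontPage", True),
    (".*httrack.*", True),
    ("^Link", False),
    ("^Wget", False),
    ("^Zeus", False),
]

# Assumed example of a Googlebot user-agent string (not taken from the thread).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def matching_patterns(user_agent):
    """Return the ban patterns that match the given user-agent string."""
    hits = []
    for pattern, nocase in BAN_PATTERNS:
        flags = re.IGNORECASE if nocase else 0
        # RewriteCond patterns are unanchored unless they start with ^,
        # which is what re.search gives us.
        if re.search(pattern, user_agent, flags):
            hits.append(pattern)
    return hits


if __name__ == "__main__":
    print("Googlebot:", matching_patterns(GOOGLEBOT_UA))
    print("Wget:", matching_patterns("Wget/1.21.3"))
```

With this subset, the Googlebot string matches nothing, while a string starting with Wget matches ^Wget. Extending BAN_PATTERNS with the rest of the list is the obvious next step if a particular crawler is suspected of being blocked.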

 

-----------------------------------

ROBOTS.TXT

-----------------------------------

User-agent: *

 

Disallow: /cgi-bin/

Disallow: /_db_backups/

Disallow: /.hcc.thumbs/

Disallow: /banned/

Disallow: /ext/

Disallow: /images/

Disallow: /includes/

Disallow: /lightbox/

Disallow: /phpThumb/

Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/

Disallow: /pub/

Disallow: /stats/

 

Disallow: /account.php

Disallow: /account_edit.php

Disallow: /account_history.php

Disallow: /account_history_info.php

Disallow: /account_notifications.php

Disallow: /account_newsletters.php

Disallow: /account_password.php

Disallow: /add_checkout_success.php

Disallow: /address_book.php

Disallow: /address_book_process.php

Disallow: /advanced_search.php

Disallow: /advanced_search_result.php

Disallow: /blocked.php

Disallow: /checkout.php

Disallow: /checkout_confirmation.php

Disallow: /checkout_payment.php

Disallow: /checkout_payment_address.php

Disallow: /checkout_process.php

Disallow: /checkout_shipping.php

Disallow: /checkout_shipping_address.php

Disallow: /checkout_success.php

Disallow: /conditions.php

Disallow: /contact_us.php

Disallow: /cookie_usage.php

Disallow: /create_account.php

Disallow: /create_account_success.php

Disallow: /download.php

Disallow: /gdform.php

Disallow: /index_error.php

Disallow: /info_shopping_cart.php

Disallow: /login.php

Disallow: /logoff.php

Disallow: /oscthumb.php

Disallow: /password_forgotten.php

Disallow: /php.ini

Disallow: /popup_image.php

Disallow: /popup_search_help.php

Disallow: /shopping_cart.php

Disallow: /product_reviews.php

Disallow: /product_reviews_write.php

Disallow: /product_reviews_info.php

Disallow: /products_new.php

Disallow: /redirect.php

Disallow: /reviews.php

Disallow: /shipping.php

Disallow: /shopping_cart.php

Disallow: /slide.js

Disallow: /specials.php

Disallow: /ssl_check.php

Disallow: /tell_a_friend.php

 

# stops google image viewer from indexing the site

User-agent: Googlebot-Image

Disallow: /
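
To see what a robots.txt like the one above actually allows, a minimal sketch using Python's standard urllib.robotparser can help. The rules below are a shortened copy of the posted file, and the product URL /some-product-p-12.html is a hypothetical example of an SEO-style URL, not one taken from the site. This only shows how one parser reads the rules; real crawlers may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the posted rules (assumption: single-spaced, as the
# file would normally be stored on the server).
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
Disallow: /images/
Disallow: /login.php
Disallow: /shopping_cart.php
Disallow: /product_reviews.php

User-agent: Googlebot-Image
Disallow: /
"""


def can_fetch(user_agent, path, robots_txt=ROBOTS_TXT,
              base="http://www.MYSITE.co.uk"):
    """Return True if user_agent may fetch base + path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, base + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/"),
        ("Googlebot", "/some-product-p-12.html"),  # hypothetical URL
        ("Googlebot", "/login.php"),
        ("Googlebot-Image", "/some-product-p-12.html"),
    ]:
        print(agent, path, can_fetch(agent, path))
```

Under these rules the ordinary Googlebot is allowed to fetch the home page and the hypothetical product page, and is only kept out of the listed scripts and directories; Googlebot-Image is blocked everywhere.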

 

----------------------------------------

SPIDERS.TXT

----------------------------------------

crawl

slurp

spider

ebot

obot

abot

dbot

hbot

kbot

lbot

mbot

nbot

pbot

rbot

sbot

tbot

ubot

vbot

ybot

zbot

bot.

bot/

_bot

.bot

/bot

-bot

:bot

(bot

accoona

adressendeutschland

appie

architext

asterias

atlocal

atomz

augurfind

bannana_bot

baypup

bdfetch

biglotron

blaiz

blo.

blog

boitho

booch

ccubee

cfetch

charlotte

comagent

combine

csci

curl

dataparksearch

daumoa

depspid

digger

ditto

dmoz

docomo

dtaagent

ebingbong

ejupiter

falcon

findlinks

gazz

genieknows

goforit

googlebot

gralon

grub

gulliver

harvest

helix

heritrix

holmes

homer

htdig

ia_archiver

ichiro

iconsurf

iltrovatore

indexer

ingrid

ivia

jakarta

jetbot

kit_fireball

knowledge

kretrieve

lachesis

larbin

libwww

lwp

mantraagent

mapoftheinternet

mediapartners

mercator

metacarta

microsoft url control

minirank

miva

mj12

mnogo

moget/

multitext

muscatferret

myweb

najdi

nameprotect

ncsa beta

netmechanic

netresearchserver

ng/

nokia6682/

npbot

noyona

nutch

objectssearch

omni

onetszukaj

openintelligencedata

osis-project

pagebull

page_verifier

panscient

pear.

pogodak

poirot

pompos

poppelsdorf

psycheclone

publisher

python

rambler

salty

sbider

scooter

scoutjet

scrubby

seeker

seek.

shopwiki

sidewinder

silk

smartwit

sna-

snappy

sohu

sphere

sphider

spinner

spyder

steeler/

sygol

szukacz

tarantula

t-h-u-n-d-e-r-s-t-o-n-e

/teoma

theophrastus

tutorgig

twiceler

twisted

updated

vagabondo

volcano

voyager/

voyager-hc

w3c_validator

walker

wauuu

wavefire

websitepulse

wget

wire

worldlight

worm

wwwster

xenu

xirq

yandex

yanga

yeti

yodao

zao/

zippp

zyborg

....
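
For anyone wondering how a list like spiders.txt is used: the sketch below assumes that "Prevent Spider Sessions" works by lower-casing the visitor's user agent and checking whether any line of spiders.txt occurs in it as a substring. That matching rule is an assumption about the shop's behaviour rather than something shown in this thread, and the token list and user-agent strings are illustrative only.

```python
# A few entries taken from the spiders.txt list above.
SPIDER_TOKENS = ["crawl", "slurp", "spider", "googlebot", "yandex", "wget"]


def is_spider(user_agent, tokens=SPIDER_TOKENS):
    """Return True if any non-empty token occurs in the lower-cased user agent."""
    ua = user_agent.lower()
    for token in tokens:
        token = token.strip().lower()
        if token and token in ua:
            return True
    return False


if __name__ == "__main__":
    # Assumed example user-agent strings.
    print(is_spider("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(is_spider("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0"))
```

If matching does work this way, the "googlebot" entry would be enough for Google's crawler to be recognised as a spider and served pages without a session ID.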


This won't help your problem but take this line out:

 

Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/

It does no good at all to rename the admin then tell "the bad guys" where to find it...

:blush:

If I suggest you edit any file(s) make a backup first - I'm not perfect and neither are you.

 

"Given enough impetus a parallelogramatically shaped projectile can egress a circular orifice."

- Me -

 

"Headers already sent" - The definitive help

 

"Cannot redeclare ..." - How to find/fix it

 

SSL Implementation Help

 

Like this post? "Like" it again over there >


You can simplify your robots.txt by using <meta name="robots" content="noindex, nofollow"> in files such as create_account.php and login.php instead of making them known to possible hackers. If your renamed admin folder is password protected with .htaccess then there is no need to reference it in your robots.txt.
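
To confirm that a page really carries the robots meta tag after editing it, something along these lines could be used. It is a minimal sketch with Python's standard html.parser; the class and function names are made up for illustration, and the HTML strings are stand-ins for the real page source.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(robots_directives(page))
```

One thing to keep in mind: a crawler can only see this tag on pages it is allowed to fetch, so a page that is also disallowed in robots.txt will never have its meta tag read.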


You don't get indexed before some site already in Google links to you. Adding the site through Webmaster Tools may take forever, if you get listed at all.

Make sure you have some quality links coming into your site from places that are already listed, and wait until the next time those sites are crawled and the link to yours is found. It still takes some time.


"You don't get indexed before some site already in Google links to you."

This is incorrect. A link to a site is not required to get it indexed.

 

Graham - if your site has been up for almost any time at all, it is most likely already indexed. Go to Google and type in site:your.domain_name to check it.

Support Links:

For Hire: Contact me for anything you need help with for your shop: upgrading, hosting, repairs, code written, etc.

Get the latest versions of my addons

Recommended SEO Addons


"This is incorrect. A link to a site is not required to get it indexed."

 

My fault, I didn't mean it as harshly as that. A site can certainly get indexed, but it may take a long time before it is crawled or visible in the index if it has no inbound links.

At least that's my experience with it... :)


"This is incorrect. A link to a site is not required to get it indexed."

A search engine (e.g., Google) needs to learn about your site before it can crawl the site. They don't go around trying every possible domain name to see what works (there are a huge number of potential domain names). Some methods that they may use:

 

* follow a link to your site from an existing, indexed site (e.g., in your "signature" in a forum posting)

* you explicitly submit your site to the search engine, asking to be indexed

* a hosting service may publish a list of sites on its servers (I wouldn't like that)

* the search engine does a reverse DNS lookup on a known server, finding the sites on it (it needs to know about the server, first)

* grab sites from a service that lists them (there are lots of sites out there that for one reason or another list all your "neighbors" on a server -- they need to start with a known site or server, and then probably do a reverse DNS lookup)

 

If your server is not already known to the world, and there are no links to any site on it, and no site on it is already indexed, it might stay hidden. Some search engines may go to the trouble of trawling the DNS information to discover new servers.


This won't help your problem but take this line out:

 

Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/

It does no good at all to rename the admin then tell "the bad guys" where to find it...

:blush:

 

Fair point, so as long as /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/ is protected by .htaccess, Google won't be able to index it? Is that right? That's why it was in robots.txt.

 

thanks,


Fair point, so as long as /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/ is protected by .htaccess google won't be able to index it? Is that right? Thats why it was in robots.txt.

 

thanks,

It's not Google you need to worry about.

There are conditions under which the admin can be compromised even behind fully functional .htaccess protection.

 

I've seen it happen.

 

My point was don't tell the hackers where to find it.

 

Adherence to robots.txt rules is not mandatory.


Hi Project2016 :-)

 

I'm no OSC expert and really appreciate everything posted here by those who know, but my experience with Google suggests that, for starters, you would be well advised to open a Google Analytics account:

 

http://www.google.com/analytics/

 

This is straightforward and free. The benefit is that you get tracking code to add to what will be your landing page. You can check that it is correctly installed, and then Google knows who you are and that you expect them to watch over the visitors.

Also check out the advertising programme: http://www.google.com/intl/en/ads/

If you don't mind their ads on your site then AdSense is one choice, but do start a low-budget AdWords campaign. I have had one for ages, limited to £2 per day, which ticks over and has helped to establish a "good name", or Quality Score. In fact your host may have a promotional programme that gives you a credit of £75 or more to get started (necessary because, as a newbie, it costs a lot to buy clicks when your score is low: maybe more than £1.50 per click, rather than less than £0.30 when your Quality Score is high). BTW, a lot of patience is required with ad approval!

All of these require code on your site which Google looks for, so you can be sure they will visit you on a regular basis (check the cached view in the Google search results to see when that page of your site was last visited by their "bot").


Some people just don't understand how web bots work.

 

Normal web bots just follow links.

 

If you don't have any links to your admin (whatever you named it) they'll NEVER end up there.

 

Only the bad bots are trying to find your admin and they don't give sh*t what you "disallow" in the robots.txt file.


Archived

This topic is now archived and is closed to further replies.
