Project2016 Posted February 7, 2010

Hi - I've spent quite a while building a site and it's all working fine, but now it's live it's not getting indexed by the big G (Google). I think I've got includes/configure.php, .htaccess, robots.txt and spiders.txt all correct, but they're shown below; if anyone's got 5 mins to have a look I'd be eternally grateful.

I've got SEO URLs 5 installed plus the metatags contrib, and I've submitted the site to Google Webmaster Tools etc. I have also made sure to set the sessions preferences as follows:

Session Directory: /tmp
Force Cookie Use: False
Check SSL Session ID: True
Check User Agent: False
Check IP Address: False
Prevent Spider Sessions: True
Recreate Session: True

Any advice greatly appreciated

project2016

--------------------------------
includes/configure.php (cleansed)
--------------------------------
<?php
define('HTTP_SERVER', 'http://www.MYSITE.co.uk');
define('HTTPS_SERVER', 'https://www.MYSITE.co.uk');
define('ENABLE_SSL', true);
define('HTTP_COOKIE_DOMAIN', 'http://www.MYSITE.co.uk');
define('HTTPS_COOKIE_DOMAIN', 'https://www.MYSITE.co.uk');
define('HTTP_COOKIE_PATH', '/');
define('HTTPS_COOKIE_PATH', '/');
define('DIR_WS_HTTP_CATALOG', '/');
define('DIR_WS_HTTPS_CATALOG', '/');
define('DIR_WS_IMAGES', 'images/');
define('DIR_WS_ICONS', DIR_WS_IMAGES . 'icons/');
define('DIR_WS_INCLUDES', 'includes/');
define('DIR_WS_BOXES', DIR_WS_INCLUDES . 'boxes/');
define('DIR_WS_FUNCTIONS', DIR_WS_INCLUDES . 'functions/');
define('DIR_WS_CLASSES', DIR_WS_INCLUDES . 'classes/');
define('DIR_WS_MODULES', DIR_WS_INCLUDES . 'modules/');
define('DIR_WS_LANGUAGES', DIR_WS_INCLUDES . 'languages/');
define('DIR_WS_DOWNLOAD_PUBLIC', 'pub/');
define('DIR_FS_CATALOG', '/home/content/97/XXXXXXXXXXX/html/');
define('DIR_FS_DOWNLOAD', DIR_FS_CATALOG . 'download/');
define('DIR_FS_DOWNLOAD_PUBLIC', DIR_FS_CATALOG . 'pub/');
define('DB_SERVER', 'XXXXXXXXXXXXXXXXXXX');
define('DB_SERVER_USERNAME', 'XXXXXXXX');
define('DB_SERVER_PASSWORD', 'XXXXXXXX');
define('DB_DATABASE', 'XXXXXXXX');
define('USE_PCONNECT', 'false');
define('STORE_SESSIONS', 'mysql');
?>

--------------------
.HTACCESS
--------------------
# $Id: .htaccess 1739 2007-12-20 00:52:16Z hpdl $
#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' with also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

<IfModule mod_setenvif.c>
<IfDefine SSL>
SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0
</IfDefine>
</IfModule>

# If Search Engine Friendly URLs do not work, try enabling the
# following Apache configuration parameter

# AcceptPathInfo On

# Fix certain PHP values
# (commented out by default to prevent errors occuring on certain
# servers)

# php_value session.use_trans_sid 0
# php_value register_globals 1

Options +FollowSymLinks
<IfModule mod_rewrite.c>
RewriteEngine On

# RewriteBase instructions
# Change RewriteBase dependent on how your shop is accessed as below.
# http://www.mysite.com = RewriteBase /
# http://www.mysite.com/catalog/ = RewriteBase /catalog/
# http://www.mysite.com/catalog/shop/ = RewriteBase /catalog/shop/

# Change RewriteBase using the instructions above
RewriteBase /

RewriteRule ^(.*)-p-([0-9]+).html$ product_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-c-([0-9_]+).html$ index.php?cPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-m-([0-9]+).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pi-([0-9]+).html$ popup_image.php?pID=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pr-([0-9]+).html$ product_reviews.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pri-([0-9]+).html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING}
# Articles contribution
RewriteRule ^(.*)-t-([0-9_]+).html$ articles.php?tPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-a-([0-9]+).html$ article_info.php?articles_id=$2&%{QUERY_STRING}
# Information pages
RewriteRule ^(.*)-i-([0-9]+).html$ information.php?info_id=$2&%{QUERY_STRING}
# Links contribution
RewriteRule ^(.*)-links-([0-9_]+).html$ links.php?lPath=$2&%{QUERY_STRING}
# Newsdesk contribution
RewriteRule ^(.*)-n-([0-9]+).html$ newsdesk_info.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nc-([0-9]+).html$ newsdesk_index.php?newsPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nri-([0-9]+).html$ newsdesk_reviews_info.php?newsdesk_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-nra-([0-9]+).html$ newsdesk_reviews_article.php?newsdesk_id=$2&%{QUERY_STRING}
</IfModule>

# this code added from xss shield
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*iframe.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ index_error.php [F,L]
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
RewriteRule .* - [F]

# Deny domain access to spammers and other scumbags
RewriteEngine on
SetEnvIfNoCase User-Agent "^libwww-perl*" block_bad_bots
Deny from env=block_bad_bots

# Redirect index.php to domain.com
# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
# RewriteRule ^index\.php$ http://www.MYSITE.co.uk/ [R=301,L]

RewriteBase /

# filter for most common exploits
RewriteCond %{HTTP_USER_AGENT} libwww-perl [OR]
RewriteCond %{QUERY_STRING} tool25 [OR]
RewriteCond %{QUERY_STRING} cmd.txt [OR]
RewriteCond %{QUERY_STRING} cmd.gif [OR]
RewriteCond %{QUERY_STRING} r57shell [OR]
RewriteCond %{QUERY_STRING} c99 [OR]

# ban spam bots
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} email [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go\!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy*Library [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MS\ FrontPage* [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^sproose [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLSpiderPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus

# WHAT DOES THIS DO?
# RewriteRule ^.* - [F,L]
# RewriteCond %{HTTP_REFERER} ^http://www.MYSITE.co.uk$
# RewriteRule !^http://[^/.]\.MYSITE.co.uk.* - [F,L]

# deny most common except .php
<FilesMatch "\.(inc|tpl|h|ihtml|sql|ini|conf|class|bin|spd|theme|module|exe)$">
</FilesMatch>

# Disable .htaccess viewing from browser
<Files ~ "^\.ht">
Order allow,deny
Deny from all
Satisfy All
</Files>

# Disable access to config.php
<Files ~ "includes\configure.php$">
deny from all
</Files>

-----------------------------------
ROBOTS.TXT
-----------------------------------
User-agent: *
Disallow: /cgi-bin/
Disallow: /_db_backups/
Disallow: /.hcc.thumbs/
Disallow: /banned/
Disallow: /ext/
Disallow: /images/
Disallow: /includes/
Disallow: /lightbox/
Disallow: /phpThumb/
Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/
Disallow: /pub/
Disallow: /stats/
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_notifications.php
Disallow: /account_newsletters.php
Disallow: /account_password.php
Disallow: /add_checkout_success.php
Disallow: /address_book.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /advanced_search_result.php
Disallow: /blocked.php
Disallow: /checkout.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment.php
Disallow: /checkout_payment_address.php
Disallow: /checkout_process.php
Disallow: /checkout_shipping.php
Disallow: /checkout_shipping_address.php
Disallow: /checkout_success.php
Disallow: /conditions.php
Disallow: /contact_us.php
Disallow: /cookie_usage.php
Disallow: /create_account.php
Disallow: /create_account_success.php
Disallow: /download.php
Disallow: /gdform.php
Disallow: /index_error.php
Disallow: /info_shopping_cart.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /oscthumb.php
Disallow: /password_forgotten.php
Disallow: /php.ini
Disallow: /popup_image.php
Disallow: /popup_search_help.php
Disallow: /shopping_cart.php
Disallow: /product_reviews.php
Disallow: /product_reviews_write.php
Disallow: /product_reviews_info.php
Disallow: /products_new.php
Disallow: /redirect.php
Disallow: /reviews.php
Disallow: /shipping.php
Disallow: /shopping_cart.php
Disallow: /slide.js
Disallow: /specials.php
Disallow: /ssl_check.php
Disallow: /tell_a_friend.php

# stops google image viewer from indexing the site
User-agent: Googlebot-Image
Disallow: /

----------------------------------------
SPIDERS.TXT
----------------------------------------
crawl slurp spider ebot obot abot dbot hbot kbot lbot mbot nbot pbot rbot sbot tbot ubot vbot ybot zbot bot. bot/ _bot .bot /bot -bot :bot (bot accoona adressendeutschland appie architext asterias atlocal atomz augurfind bannana_bot baypup bdfetch biglotron blaiz blo.
blog boitho booch ccubee cfetch charlotte comagent combine csci curl dataparksearch daumoa depspid digger ditto dmoz docomo dtaagent ebingbong ejupiter falcon findlinks gazz genieknows goforit googlebot gralon grub gulliver harvest helix heritrix holmes homer htdig ia_archiver ichiro iconsurf iltrovatore indexer ingrid ivia jakarta jetbot kit_fireball knowledge kretrieve lachesis larbin libwww lwp mantraagent mapoftheinternet mediapartners mercator metacarta microsoft url control minirank miva mj12 mnogo moget/ multitext muscatferret myweb najdi nameprotect ncsa beta netmechanic netresearchserver ng/ nokia6682/ npbot noyona nutch objectssearch omni onetszukaj openintelligencedata osis-project pagebull page_verifier panscient pear. pogodak poirot pompos poppelsdorf psycheclone publisher python rambler salty sbider scooter scoutjet scrubby seeker seek. shopwiki sidewinder silk smartwit sna- snappy sohu sphere sphider spinner spyder steeler/ sygol szukacz tarantula t-h-u-n-d-e-r-s-t-o-n-e /teoma theophrastus tutorgig twiceler twisted updated vagabondo volcano voyager/ voyager-hc w3c_validator walker wauuu wavefire websitepulse wget wire worldlight worm wwwster xenu xirq yandex yanga yeti yodao zao/ zippp zyborg ....
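One way to sanity-check the robots.txt above is to run it through a parser and ask whether a given crawler may fetch a given URL. The sketch below uses Python's standard urllib.robotparser on an abbreviated copy of the posted rules. The domain www.example.co.uk and the product URL are placeholders, and Python's matching is only an approximation of how Google itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the posted robots.txt (only a few of the Disallow lines).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /includes/
Disallow: /login.php
Disallow: /shopping_cart.php

# stops google image viewer from indexing the site
User-agent: Googlebot-Image
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.co.uk"  # placeholder domain
    checks = [
        ("Googlebot", "/"),
        ("Googlebot", "/widget-p-12.html"),  # hypothetical SEO-style product URL
        ("Googlebot", "/login.php"),
        ("Googlebot-Image", "/widget-p-12.html"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(agent, base + path))
```

Under this parser the general Googlebot is still allowed to fetch the home page and product pages; only Googlebot-Image is shut out of the whole site.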
germ Posted February 7, 2010

This won't help your problem but take this line out:

Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/

It does no good at all to rename the admin then tell "the bad guys" where to find it... :blush:

If I suggest you edit any file(s) make a backup first - I'm not perfect and neither are you.
"Given enough impetus a parallelogramatically shaped projectile can egress a circular orifice." - Me -
"Headers already sent" - The definitive help
"Cannot redeclare ..." - How to find/fix it
SSL Implementation Help
Hotclutch Posted February 7, 2010

You can simplify your robots.txt by using <meta name="robots" content="noindex, nofollow"> in files such as create_account.php and login.php instead of making them known to possible hackers.

If your renamed admin folder is password protected with .htaccess then there is no need to reference it in your robots.txt.
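To confirm that a page really carries the robots meta tag suggested above, its HTML can be scanned for it. This is a minimal sketch using Python's standard html.parser; the sample markup is made up for illustration and is not taken from an osCommerce template.

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html: str) -> List[str]:
    """Return the robots meta directives found in html, in document order."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    # Hypothetical page head, for illustration only.
    sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(robots_directives(sample))
```

An empty result means the page has no robots meta tag, so crawlers fall back to whatever robots.txt and the default indexing behaviour allow.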
www.in.no Posted February 7, 2010

You don't get indexed before some site already in Google links to you. Submitting your link to Webmaster Tools may take forever, if you get listed at all. Make sure you have some quality links going into your site from places already listed, and wait until the next time they crawl their site and find the link to yours. It still takes some time.
Jack_mcs Posted February 7, 2010

www.in.no said: "You don't get indexed before some site already in Google links to you."

This is incorrect. A link to a site is not required to get it indexed.

Graham - if your site has been up for almost any time at all, it is most likely already indexed. Go to Google and type in site:your.domain_name to check it.

Support Links:
For Hire: Contact me for anything you need help with for your shop: upgrading, hosting, repairs, code written, etc.
Get the latest versions of my addons
Recommended SEO Addons
www.in.no Posted February 7, 2010

Jack_mcs said: "This is incorrect. A link to a site is not required to get it indexed."

My fault, I didn't mean it as harshly as that. A site can get indexed for sure, but it may take a long time before it is crawled or visible in the index if it has no links coming in. At least that's my experience with it... :)
MrPhil Posted February 7, 2010

Jack_mcs said: "This is incorrect. A link to a site is not required to get it indexed."

A search engine (e.g., Google) needs to learn about your site before it can crawl the site. They don't go around trying every possible domain name to see what works (there are a huge number of potential domain names). Some methods that they may use:

* follow a link to your site from an existing, indexed site (e.g., in your "signature" in a forum posting)
* you explicitly submit your site to the search engine, asking to be indexed
* a hosting service may publish a list of sites on its servers (I wouldn't like that)
* the search engine does a reverse DNS lookup on a known server, finding the sites on it (it needs to know about the server first)
* grab sites from a service that lists them (there are lots of sites out there that for one reason or another list all your "neighbors" on a server; they need to start with a known site or server, and then probably do a reverse DNS lookup)

If your server is not already known to the world, and there are no links to any site on it, and no site on it is already indexed, it might stay hidden. Some search engines may go to the trouble of trawling the DNS information to discover new servers.
Project2016 Posted February 11, 2010

germ said: "This won't help your problem but take this line out: Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/ It does no good at all to rename the admin then tell "the bad guys" where to find it... :blush:"

Fair point. So as long as /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/ is protected by .htaccess, Google won't be able to index it? Is that right? That's why it was in robots.txt.

thanks,
germ Posted February 11, 2010

Project2016 said: "Fair point. So as long as /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/ is protected by .htaccess, Google won't be able to index it? Is that right? That's why it was in robots.txt."

It's not Google you need to worry about.

There are conditions under which the admin can be compromised even behind a fully functional .htaccess protection. I've seen it happen.

My point was don't tell the hackers where to find it. Adherence to robots.txt rules is not mandatory.
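The point about disclosure is easy to see for yourself: robots.txt is a public file, and pulling the Disallow paths out of it takes only a few lines. A rough sketch in Python follows; the sample rules and the admin directory name in it are hypothetical.

```python
from typing import List


def disclosed_paths(robots_txt: str) -> List[str]:
    """Return every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" at the first colon.
        line = raw_line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Hypothetical robots.txt body; the admin directory name is made up.
    sample = (
        "User-agent: *\n"
        "Disallow: /includes/\n"
        "Disallow: /hypothetical-admin-name/  # renamed admin\n"
        "Disallow: /login.php\n"
    )
    for path in disclosed_paths(sample):
        print(path)
```

Anything that shows up in this list is visible to every visitor, well-behaved or not, which is why listing a renamed admin directory there undoes the renaming.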
RareandOrganic Posted February 13, 2010

Hi Project2016 :-)

I'm no OSC expert and really appreciate all the stuff posted here by those who know ... but my experience with G suggests that, for starters, you would be well advised to open a Google Analytics account: http://www.google.com/analytics/

This is straightforward and free ... the benefit is that you get tracking code to add to what will be your landing page. You can check that it is correctly installed, and then Google knows who you are and that you expect them to watch over the visitors.

Also check out the Advertising Programme ... http://www.google.com/intl/en/ads/

If you don't mind their ads on your site then AdSense is one choice, but do start a low-budget AdWords campaign - I have had one for ages set to a limit of £2 per day, which ticks over and has helped to establish a "good name" or Quality Score. In fact your host may have a promotional programme whereby you can get a £75 or more credit to get started (necessary since as a newbie it costs a lot to buy clicks when your score is low ... maybe more than £1.50 per click rather than less than £0.30 when your QS is high). BTW - a lot of patience is required with ad approval!

All of these require code on your site which Google looks for, so you can be sure they will visit you on a regular basis (check the cached view in the Google search results to see when that page of your site was last visited by their "bot").
Peper Posted February 14, 2010

Disallow: /MY-RANDOMLY-NAMED-ADMIN-DIRECTORY/

Put a separate robots.txt file inside the admin directory:

User-agent: *
Disallow: /

Getting the Phoenix off the ground
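One caveat on the suggestion above: crawlers that follow the Robots Exclusion Protocol request robots.txt from the root of the host only, so a copy placed inside a subdirectory is generally not consulted. The sketch below shows, under that assumption, which robots.txt URL applies to a page; the domain and admin path are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Placeholder domain and hypothetical admin path.
    print(robots_txt_url("http://www.example.co.uk/hypothetical-admin/login.php?x=1"))
```

Whatever directory the page lives in, the result is the same root-level file, so rules meant for the admin area would have to go there if they are used at all.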
germ Posted February 14, 2010

Some people just don't understand how web bots work.

Normal web bots just follow links. If you don't have any links to your admin (whatever you named it) they'll NEVER end up there.

Only the bad bots are trying to find your admin and they don't give a sh*t what you "disallow" in the robots.txt file.
Archived
This topic is now archived and is closed to further replies.