Monday, September 3, 2012

Aboundex bot

This is my first post, with a story that may help you. Enjoy!
Over the last month or so, I've seen a lot of bots (around a thousand) scraping a forum I administer. After some digging, I found that most of them come from the 173.192.x.x segment.
A whois lookup showed that the segment belongs to SoftLayer's data centers:
NetRange:       173.192.0.0 - 173.193.255.255
CIDR:           173.192.0.0/15
OriginAS:       AS36351
NetName:        SOFTLAYER-4-8
NetHandle:      NET-173-192-0-0-1
Parent:         NET-173-0-0-0-0
NetType:        Direct Allocation
Comment:        SoftLayer provides on-demand IT infrastructure, dedicated servers and cloud resources.
RegDate:        2009-07-21
Updated:        2012-03-09
Ref:            http://whois.arin.net/rest/net/NET-173-192-0-0-1
What the? My site is hosted in Israel and it's in Hebrew, so there's no reason for them to scan it.
But after googling around, I found this:
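If you want to check whether an address from your own logs falls inside that range, here is a minimal sketch using Python's standard ipaddress module. The only fact taken from the whois output is the CIDR block 173.192.0.0/15; the function name in_softlayer_range and the sample addresses are made up for illustration.

```python
import ipaddress

# CIDR block copied from the whois output above.
SOFTLAYER_NET = ipaddress.ip_network("173.192.0.0/15")


def in_softlayer_range(ip: str) -> bool:
    """Return True if ip lies in 173.192.0.0 - 173.193.255.255."""
    return ipaddress.ip_address(ip) in SOFTLAYER_NET


# Sample addresses, chosen only to show both outcomes.
print(in_softlayer_range("173.192.0.1"))      # True
print(in_softlayer_range("173.193.255.255"))  # True (last address of the /15)
print(in_softlayer_range("173.194.0.1"))      # False (just outside the block)
```

Note that a /15 covers two adjacent /16 ranges, so 173.193.x.x belongs to the same allocation as 173.192.x.x.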
The Aboundex Crawler is a bot from Aboundex Search, currently operating out of the Softlayer network with the IP Address 173.192.34.95.
Reports about the Aboundex crawler claim it ignores rules in robots.txt, and is a fast page scraper which may switch IP's when blocked from spidering pages.
According to this, the Aboundex crawler ignores the robots.txt file. So why not just ban them?
Well, I think that if a new search engine (or whatever it is) wants to build a good reputation, it has to follow some simple rules, and of course one of them is respecting robots.txt. So maybe something is wrong with my site? Let's check what the Aboundex site suggests.
The site doesn't seem to be working, as it says "under construction" when you try to search for something, but there is an about page (the only link on the site) with this info:
How do i stop Aboundexbot from indexing my website? If you have a concern about Aboundexbot, we hope you give us a chance to address it via the email below but if you need to block Aboundexbot, the robots.txt file will allow you to accomplish that goal.

To block Aboundexbot from your entire web site you add this to your robots.txt file:

User-agent: Aboundexbot
Disallow: /  
I guess it's worth a try. What do you think? I'll add it to the forum and post an update later.
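Before putting the rule live, you can sanity-check it locally. This is only a sketch using Python's standard urllib.robotparser; the example.com URL and the is_allowed helper are placeholders, not anything from my forum. It also only shows how a well-behaved parser reads the rule, and says nothing about whether the Aboundex bot actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The two lines suggested on the Aboundex about page.
ROBOTS_TXT = """\
User-agent: Aboundexbot
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder URL, not the real forum address.
print(is_allowed("Aboundexbot", "http://example.com/forum/"))  # False
print(is_allowed("Googlebot", "http://example.com/forum/"))    # True
```

Other crawlers stay unaffected because the file has no rule for them, so they are allowed by default.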

Hope you enjoyed :)
