Bots vs Browsers - database of 172,463 user agents and growing
The Latest:
2007.10.25 - The Creature Post
After looking at the types of bots we've had drop in lately, plus keeping with the Halloween spirit, we figured it was time we had a "Creature" post.
- To start off the creatures, we were visited by a squid / clam - no kidding: SquidClamAV_Redirector 1.8.2. This seems to be some sort of proxy / redirector bot. Not sure what redirecting has to do with squids or clams, but the idea is interesting.
- Next up, we have the dreadful Yeti/0.01, a creature so fierce that they even have a ride at Disney devoted to it. As fierce as the Yeti may be, the user agent string does kindly inform you that it does "check robots.txt daily and follow it".
- genieBot paid us a visit, but did not offer us three wishes - just a contact email for the bot. Bummer.
- Okay, so this one may be a stretch, but CookieMonster is still a creature, right? Maybe we should save this one for a Sesame Street post. We are calling this one a bot until we see otherwise.
- To round off the list, we have a classic bot that hits us quite a bit, but in a slightly different form this time - you guessed it, Python-urllib/1.17.
Happy Halloween everyone - this brings our site to 110,099 user agents and 1,293 bots.
|
2007.9.7 - iTunes, Facebook, Google Earth, and more
We have introduced some new categories for some of the various bots and user agents we are seeing for the first time.
- Google Earth and iTunes have browsers built into their systems, so their user agents will be tracked in these two new categories.
- We've seen several RSS-related user agents - mainly bots - and we'll be tracking these under this category going forward.
- Finally, we have started tracking Sitemap user agents - some of these are from us evaluating sitemap tools, and some are visitors that we did not invite.
These tools usually build sitemaps for Google or Yahoo!, both of which offer sitemap services that let webmasters publish all of their URLs directly to the search engines so the web crawlers can do a thorough job.
We hope that the new categories help with all of your bot research - enjoy!
|
2007.7.22 - Hot Summer Bots
The bots continue to roll in this summer, so here's a catch-up on the last few weeks:
- Spock Crawler visited us on its "mission to index every single human being on the planet" (no, seriously - this was taken straight from their website). Whatever their service is, it is invitation only for now.
- Pete-Spider/1.1 is a new spider that dropped in, and the top Google results for this user agent point to our site. Since even we have no idea what this spider does, it appears nobody does.
- Another new bot, ilial/Nutch-0.9, dropped in. It runs on the Nutch-0.9 engine and belongs to Ilial, Inc., which describes itself as "a Los Angeles based Internet startup company." According to their site, ilial is a new venture-backed search engine with technology to redefine the online advertising industry.
- Don't forget our new iPhone category - it now has a grand total of 17 user agents. If you have an iPhone, please visit our site so we can capture the user agent and add it to the list.
This leaves us at 86,589 total user agents in the database, and 1,102 are bots.
|
2007.7.6 - The Apple iPhones have landed!
With the new release of the Apple iPhone, we already have a few iPhone user agents in the logs. We even dedicated a new category to these user agents, since we anticipate many more, and hope this is useful to our readers.
In other news, we have discontinued our Internet Explorer 7 category, since it had grown to the point that it was no longer of any use. Our search feature is a much better way to research IE 7 browsers in a more targeted fashion.
This week leaves us with 83,074 user agents and 1,078 bots.
|
2007.6.13 - Twiceler calls off the bots
Twiceler was more than helpful with my request to slow down on our sites. Within half an hour of my email, here is their reply:
Sorry for any inconvenience. We should not be crawling
your sites that fast. I will stop crawling everything in your
IP space immediately.
I will check that we understand your robots.txt, but robots.txt
often has significant time delay as the standard says that you
should cache the robots.txt file for seven days.
Thanks, twiceler! That was surprisingly helpful, and a very fast response.
|
2007.6.13 - Twiceler Strikes!
Our sites have been covered up in traffic from twiceler robots this month. I finally had to break down and add an entry to our robots.txt file to disallow them from the root. For those of you who don't know, we've always been very liberal with our robots.txt file here on Bots vs Browsers.com. That encourages a variety of robots to visit us and leave their information, and it lets us observe bot behavior in the wild without any disturbances.
In addition, I wrote an email to the twiceler address mentioned on their website asking them to stop hitting our sites. I'll make sure to post an update on their reply to show how responsive they are. The next step is for me to ban their IP and move on, but I like to give people the chance to do the right thing.
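For anyone who wants to try the same thing, here is a rough sketch of what such a robots.txt entry could look like and how to sanity-check it with Python's standard `urllib.robotparser`. The exact entry we used is not reproduced here: the `twiceler` token, the `example.com` URLs, and the `is_allowed` helper are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler from the site root and
# leave everything open to all other robots. The "twiceler" token is
# an assumption, not necessarily the string the crawler matches on.
ROBOTS_TXT = """\
User-agent: twiceler
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("twiceler", "http://example.com/"))
    print(is_allowed("Python-urllib/1.17", "http://example.com/page"))
```

Keep in mind that robots.txt is only a request. A polite crawler will honor it (possibly after a caching delay), while a rude one has to be blocked some other way, such as by IP.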
|
2007.6.3 - Land of 1,000 bots
Wow, a whole month without posting. In that time, we broke the 1,000 bot mark, added new features for researching user agents by IP, and added a nifty Donation button to the top of the screen for anyone that feels compelled to help keep the lights on here. We have a lot to report on, so here are the high points:
- We spent some time enhancing our directory to allow user agent research by IP address. We did this by launching our new User Agent IP Directory. When viewing an IP's details, we provide a listing of user agents that have been detected from that IP, as well as an ARIN WHOIS lookup on the IP to see who owns the actual IP or IP range. In addition, when viewing a user agent's details, you can click known IPs for that user agent to see other user agents from that IP.
- The mysterious VadixBot also visited us. Unfortunately, we don't have any information about this bot. In fact, when we google it, our site is the first result, and that was less than helpful.
- Here is a variety of Nutch bots that have passed by in the last month - there's more to each user agent string, so click the link on each to see more details:
This puts us at 76,859 user agents and 1,020 bots.
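The IP directory described in this post boils down to a two-way lookup: user agents seen from an IP, and IPs seen for a user agent. Below is a minimal in-memory sketch of that idea. It is not our actual implementation; the `UserAgentIpIndex` class, its method names, and the sample addresses (taken from the reserved documentation range) are all hypothetical, and the WHOIS lookup is left out.

```python
from collections import defaultdict


class UserAgentIpIndex:
    """Hypothetical two-way index between IP addresses and user agents."""

    def __init__(self):
        self._agents_by_ip = defaultdict(set)
        self._ips_by_agent = defaultdict(set)

    def record(self, ip: str, user_agent: str) -> None:
        # Store the sighting in both directions so either lookup is cheap.
        self._agents_by_ip[ip].add(user_agent)
        self._ips_by_agent[user_agent].add(ip)

    def agents_for_ip(self, ip: str) -> list:
        # .get() avoids creating empty entries for unknown keys.
        return sorted(self._agents_by_ip.get(ip, ()))

    def ips_for_agent(self, user_agent: str) -> list:
        return sorted(self._ips_by_agent.get(user_agent, ()))


if __name__ == "__main__":
    index = UserAgentIpIndex()
    index.record("192.0.2.10", "VadixBot")
    index.record("192.0.2.10", "Python-urllib/1.17")
    index.record("192.0.2.20", "Python-urllib/1.17")
    print(index.agents_for_ip("192.0.2.10"))
    print(index.ips_for_agent("Python-urllib/1.17"))
```

Clicking from a user agent to one of its IPs, and from there to the other user agents on that IP, is just `ips_for_agent` followed by `agents_for_ip`.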
|
2007.4.22 - April Bots Bring May Traffic... (we hope)
April bots continue to hit our sites, and hopefully the bandwidth we loan to these robots this month will bring us more traffic in May.
One can hope, right? For all we know, these bots are just ripping off content from our sites, and taking it for their own.
Anyway, here are the highlights for the last week's worth of bots:
- Dan Matan's site attempted to inject HTML for a link to his website. As usual, we don't permit freeloading links or script from user agents on our site. We recommend you protect your site from hijack attempts by stripping or encoding any HTML or script that clients can input.
This leaves us with 70,300 user agents and 941 bots.
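As a rough illustration of the "encoding" advice above, the sketch below escapes a user agent string before it is written into a page, using `html.escape` from the Python standard library. The `render_user_agent` helper and the injected sample value are made up for this example and are not taken from our logs.

```python
import html


def render_user_agent(raw_user_agent: str) -> str:
    """Return a list item that shows the user agent as inert text."""
    # html.escape converts &, < and > to entities, and by default also
    # both quote characters, so injected tags are displayed, not rendered.
    return "<li>" + html.escape(raw_user_agent) + "</li>"


if __name__ == "__main__":
    # Hypothetical injected value, invented for illustration.
    injected = '<a href="http://example.com/">visit me</a>'
    print(render_user_agent(injected))
```

Escaping on output like this keeps the original string intact in the database, which matters for a site whose whole point is recording what user agents actually sent.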
|
2007.4.16 - Spring Cleaning in the Web Traffic Logs
Springtime brings spring cleaning, so we're going to try to clean out the user agent logs since it's been 3 weeks (or more) since our last post. Here's what we found lurking around in our absence:
- As far as mythical creatures in the traffic logs, we saw a gnome-vfs/2.16.2 neon/0.25.4 digging around and a Yeti/0.01 crawling through. The Yeti's user agent told us that it will "check robots.txt daily and follow it", and the bot is a variation of Naver.
- We added 2 variations of the Larbin Web Crawler to our database this week - here's one and here's the other.
- This one was unique - the facebookscraper/1.0 visited our sites, even though we are not Facebook. I guess it got lost and scraped up on some other sites?
- Someone is actually keeping a meta tags directory and is harvesting their content using metatagsdir/0.7. Seems like an odd hobby, but then again, running a blog for web robots isn't exactly normal.
This update puts us at 69,429 user agents and 939 bots.
|
2007.3.24 - March Bot Madness
The last couple of weeks have brought us over 20 bots and a few other points of interest. Here they are:
- If you see contype in your user agent logs, this is not a bot. Microsoft has published a Knowledge Base article about this problem - in a nutshell, older versions of IE (4 to 5.5) send multiple requests to retrieve an unknown plugin. One of the multiple requests uses the user agent "contype" to request just the content-type from the server. Click here to find out more.
- In other news, Quantcastbot/1.0 visited us this week. Quantcast provides ratings and demographic information to web sites for free.
- A new version of Blaiz-Bee visited us - Blaiz-Bee/2.00.8315. This particular version crawls for a search engine made entirely of Windows 98 servers using sub-standard hardware. It's good to know that all those licenses of 10-year-old Windows have a home and can feel useful.
We now have 65,484 user agents and 898 bots.
|