Bots vs Browsers - database of 311,536 user agents and growing
The Latest:
2009.5.31 - Summer Bots make a splash!
With summer beginning, we saw a big splash from some new bots, and from new versions of some familiar old bots.
-
One interesting new bot this month was from the search engine Caret Byte (a.k.a. ^Byte).
The Caret Byte search engine is yet another search engine setting out to serve the global community by crawling free data and re-presenting it for a profit.
According to their site, they respect robots.txt files, so if you do not want to participate in serving the global community, at least you have a choice.
-
In one of the most astounding feats of software development known to man, the rassler bot advanced from v0.6 to v0.13 in the span of one month.
We saw a new version every few days, so we assume the rassler development crew had a busy month putting out so many builds.
Here are the versions, along with their debut dates:
2009-05-04 - rassler/0.6
2009-05-08 - rassler/0.7
2009-05-13 - rassler/0.8
2009-05-14 - rassler/0.9
2009-05-24 - rassler/0.10
2009-05-25 - rassler/0.11
2009-05-26 - rassler/0.12
2009-05-29 - rassler/0.13
-
Heeii/0.3.2 is another new bot this month.
It comes from the Heeii toolbar, a downloadable browser add-on that lets users recommend links to others.
-
Yandex is still one of the most active bots in our logs, and this month it debuted a new bot named Yandex/2.01.000.
-
Nutch made the second-biggest splash this month in terms of new bots, with several new versions showing up in our logs.
-
By far the biggest splash of the month came from bots related to tdmsic.org: we saw 23 new bots from them this month.
Tdmsic stands for "The Danish Main Security Intelligence Network", a new community that entered beta this year.
According to their site, "TDMSIC engages in research and development of high-leverage computersystem-technology for public and non-public purposes."
This puts us dangerously close to the 300,000 mark with 297,455 user agents and 3,233 bots.
|
|
2009.4.18 - Where did Palm go?
Recently I was helping a client set up their phone for Google Apps mail (just Gmail with a different face).
I had gone through some very helpful support articles with instructions for phones running Android, Windows Mobile 5 and 6, iPhone, BlackBerry, and the list goes on.
The client had told me they were running a Treo 700, which I assumed was running Windows Mobile 5 or 6.
After some research, I found that their version was actually a Treo 700p running Palm's OS, not the Treo 700w running Windows Mobile.
When I looked for instructions on setting this up, help was scarce, even in Google's extensive mobile help knowledge base.
Considering their former prominence in the mobile market, one would think that legacy support for Palm's operating systems would still be strong.
Granted, I have not used a Palm in over 10 years, and back then it was a basic Palm Pilot PDA in black and white, but for many years they were the prevailing mobile device manufacturer.
Given their market dominance in handheld devices between 2000 and 2005, I became curious about how they disappeared so quickly.
After a little research, it seems that Palm simply missed the boat on smartphones.
They had an early competitive advantage, but were late to market and slow to innovate when the smartphone revolution arrived.
A PDA without a built-in phone seems redundant these days, the only real exception being the iPod Touch.
Palm is all but dead these days, both in the phone market and in the smaller PDA market.
The strangest part about Palm's lingering PDA presence is their online Palm shop.
They still list three models of phone-less PDAs, but when you check their availability, all three report that "This product is not currently available on the Palm Store".
In memory of Palm's former market presence, I reviewed our user agent logs to get an idea of what we've seen over the years.
Over the last 4 years, we have captured about 70 variations of Palm user agents, few of which ran Palm OS.
|
2009.2.18 - Happy Belated 3rd Birthday to Us!
It's been a while since we posted our progress, so here goes:
We've been busy, and so have our web traffic logs - we've discovered 320 bots since our last post, and found over 31,000 other new user agents since then.
There's a lot of noise in those logs, so here are the high points that we hope will interest you:
-
Yacybot has become one of our most active bots recently.
We now have 90 variations of this bot, and we have seen a fair amount of traffic from them.
These bots are from Yacy.net, a "distributed web search" that can be downloaded and run locally as "a scalable personal web crawler and web search engine".
-
Sensis Web Crawler dropped in. According to their site, "Sensis Corporation's Purpose is to provide distinctively elegant, innovative technical solutions in the service of humanity."
-
Over the past two months, many flavors of Nutch have passed through our logs.
-
Where do all the bad bots go? It turns out there is in fact a "Bot Hell" (Bothell, WA), located about 12 miles northeast of Seattle, Washington.
We're pretty sure it's *not* pronounced "bot-hell", but it's still funny. Maybe bots like devil in disguise (v. 2.0.1a) and HellBoundHackerOS live there...
-
Here are a few random bots from the last month that interested us:
-
mixi-crawler/2.00 is from Japan. I spent some time on their site, but my Japanese is non-existent, and even with the help of Google Translate, I could not figure out what the site does.
-
ZONGOLBOT the Web Spider is a bot from the UK search engine of the same name.
-
flatlandbot/allspark is a spider from Flatland Industries. To their credit, they provide not only robots.txt support and a contact email address, but also a phone number to call if you are having trouble with one of their bots.
-
Two new versions of Zontirbot showed up recently.
This brings us to 252,674 user agents and 2,828 bots.
|
2008.11.23 - Bots should be thankful
This Thanksgiving, all the bots out there should give thanks for the masses of bandwidth they chew through each day at the expense of webmasters around the world.
If our logs are any indication, few bots bring enough human traffic to make up for their crawls, so there are a lot of freeloading bots out there.
Anyway, we had over 200 new faces this month.
This month brings us to 221,187 user agents and 2,508 bots. Thanks for visiting!
|
2008.10.22 - Would you say I have a plethora of new bots?
We have a new category and over 100 new bots this month. Let's cut to the chase:
-
In honor of Google's new browser "Google Chrome", we have created a new category to classify user agents from this browser.
To date, we have 47 user agents in this category, and growing. Be sure to bring your Google Chrome user agent by for a spin to add it to our category.
-
Yacybot got busy in our web logs this month, showing us 13 new user agents. Variations include OS details, version numbers, and processor architectures.
-
While Yacybot was busy with our logs, DomainCrawler/1.0 was even busier, bringing in 34 new user agents this month.
The variations on this bot are less interesting: the user agent differs from domain to domain only because the bot adds each domain to it.
-
We also recorded some of the most interesting Nutch bots ever this month.
-
Some students at some university unleashed some bot on the web, but we may never know who / what / why this bot even exists:
University_Bot_Beta_1 (compatible; MSIE 6.0;)
Here's a helpful hint: put more information on your user agent if you want notoriety for your bot.
-
Yet another script injection hack attempt: this one triggers a JavaScript alert telling you your site has been "HACKED":
<script>alert("HACKED")</script>
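User agents like the one above are just request headers, so anyone who logs them can flag the obvious injection attempts before storing or displaying them. Below is a minimal Python sketch of that kind of heuristic check. The function name and the patterns are our own illustration, not the filter this site actually runs. Expect false positives, for example from an email address in angle brackets.

```python
import re

# Hypothetical heuristic patterns: HTML-looking tags, javascript: URLs,
# and inline event handlers such as onload= or onclick=.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<\s*/?\s*[a-z][^>]*>", re.IGNORECASE),
    re.compile(r"javascript\s*:", re.IGNORECASE),
    re.compile(r"\bon[a-z]+\s*=", re.IGNORECASE),
]


def looks_like_injection(user_agent: str) -> bool:
    """Return True if the user agent contains markup or script-like content."""
    return any(pattern.search(user_agent) for pattern in SUSPICIOUS_PATTERNS)


for ua in ['<script>alert("HACKED")</script>', "curl/7.18.0"]:
    print(ua, "->", looks_like_injection(ua))  # True, then False
```

A check like this only sorts suspicious strings for review. Escaping on output is still what actually keeps them from running.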
-
A couple of Pathtraq bots dropped in: Pathtraq/0.1 and Pathtraq/0.1 Gungho/0.09007.
It's worth noting that the second one also identifies itself as Gungho.
-
Finally, BitvoUserAgent made its debut in our logs.
This bot is a media crawler that is downloadable and available to the public.
This installment brings us to 212,592 user agents and 2,376 bots.
|
2008.9.6 - 200,000 User Agents!
The last month was very exciting, since we broke 200,000 user agents!
It was almost exactly a year ago that we broke 100,000 user agents, and almost 2 years since we broke 50,000 user agents.
How time flies.
Here are some bots that seemed interesting to us this month as we head further into September:
-
The Internet Ninja 6.0 leads off the list for this month of new bots.
However, a good ninja would visit a website without leaving footprints in the web logs.
-
This bot has some of the best movie references: they managed to squeeze "Terminator" and "2001: A Space Odyssey" references into one user agent:
HAL 9000; Cyberdyn X3; T1000 autonomous
-
Another new bot, CydralSpider/3.0, crawls for a "Visual Search Engine" named Cydral.
-
The ShrinkTheWeb.com Crawler v1.0 bot crawls pages on the web to produce a thumbnail of each page.
According to their site, "ShrinkTheWeb is the most powerful, free website thumbnail provider".
In the fine print, they allow up to 250,000 thumbnails per month for free, with paid services beyond that.
-
We had 2 toasters drop in. The first one, WebToaster V0.9 Alpha, was nice enough, but the second one attempted HTML injection on our site to hijack a free link:
</a><a href='http://www.webtoaster.com'>WebToaster</a> - WebToaster V0.9 Alpha.
Shame, shame, we know your name, and now everyone that reads our site does too.
-
Nutch made a big splash in our logs this month with quite a few new user agent variations.
-
We also got some new curls this month: Perl interface for libCURL, curl/7.16.4, curl/7.18.0, and curl/7.15.3.
-
We are now at 200,000 user agents and 2,194 bots.
|
2008.8.3 - Here comes August...
The last couple of weeks have been pretty routine around here.
In our logs, we've sifted through a growing number of script injection hacks via the user agent, some of which are getting quite creative with their HTML markup and JavaScript technique.
We've also seen some new bots, and some old bots that have been very active lately.
Here's what we've seen:
-
In close relation to the robot WALL-E, his cousin swish-e turned up in our logs this week.
Unlike its cousin, swish-e doesn't clean up 700 years of trash on what is left of Earth; it is an open source system for indexing web pages.
The acronym "swish-e" stands for "Simple Web Indexing System for Humans - Enhanced".
-
More and more user agents for the Nintendo Wii have been showing up lately, so we decided to create a new category for them.
The Nintendo Wii category is our latest category of tracked user agents, so check it out to see any Wii-related user agents that we've encountered in our logs.
-
We've noticed a great deal of activity recently from the Russian search engine bot Yandex, in particular from IP 77.88.25.28.
They hit our sites over 15,000 times today, but their hit rate per domain and per time interval stayed just below our threshold for banning.
In the two years that we've been tracking this bot, we've never seen the traffic rates this high.
We'll keep a close eye on them over the next few weeks, as this trend may affect our readers as well.
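What does a threshold on hits "per domain and per time interval" look like in practice? Below is a minimal Python sketch of a sliding-window check keyed on (IP, domain). The class name and the `max_hits` / `window_seconds` values are hypothetical and are not our real banning rules. The sketch also assumes hits are recorded in timestamp order.

```python
from collections import defaultdict, deque


class HitRateMonitor:
    """Hypothetical sliding-window monitor: an IP is over the limit when its
    hits on a single domain within `window_seconds` exceed `max_hits`."""

    def __init__(self, max_hits: int, window_seconds: float):
        self.max_hits = max_hits
        self.window = window_seconds
        # (ip, domain) -> timestamps of recent hits, oldest first
        self.hits = defaultdict(deque)

    def record(self, ip: str, domain: str, timestamp: float) -> bool:
        """Record one hit and return True if this (ip, domain) pair is over the limit."""
        recent = self.hits[(ip, domain)]
        recent.append(timestamp)
        # Discard hits that have slid out of the time window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.max_hits


monitor = HitRateMonitor(max_hits=3, window_seconds=60)
for t in (0, 1, 2, 3):
    over = monitor.record("198.51.100.7", "example.com", t)
print(over)  # True: 4 hits on one domain inside 60 seconds
```

Counts are kept per domain, so a bot that spreads its hits across many domains and a long period can stay under the limit everywhere. That matches the behavior we saw from Yandex.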
-
Several of our consulting clients have recently needed help ridding their sites of SQL injections.
One thing we have noticed in common when running our cleanup tools on their databases is that most of them end up with a table called "t_jiaozhu".
After Googling the term, we found countless others with the same story: an SQL injection hack, followed by a table with the weird name "t_jiaozhu".
The point is, make sure your site is SQL injection attack-proof.
If you think you may have been hacked or just aren't sure, check your database for the table "t_jiaozhu".
Depending on your web architecture, make sure that all SQL calls are scrubbed, either through common framework-level cleansing or by home-grown means.
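Below is a short Python sketch using the built-in sqlite3 module. It does two things: it checks whether a table named t_jiaozhu exists, and it runs a parameterized query. A parameterized query hands user input to the database as data instead of pasting it into the SQL text, which is a stronger defense than scrubbing strings by hand. SQLite and the `users` table are assumptions chosen so the example runs on its own. On MySQL, PostgreSQL, or SQL Server you would look in INFORMATION_SCHEMA.TABLES instead of sqlite_master.

```python
import sqlite3


def has_suspicious_table(conn: sqlite3.Connection, name: str = "t_jiaozhu") -> bool:
    """Return True if a table with the given name exists (SQLite-specific check)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a parameterized query; quotes or SQL fragments
    in `username` are treated as a literal value, never executed."""
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


# Demo against a throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
print(has_suspicious_table(conn))       # False
print(find_user(conn, "x' OR '1'='1"))  # []: the classic quote trick matches nothing
```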
Once you have secured your database from SQL injection, make sure you don't forget to protect yourself from script injection attempts as well.
We've documented many of the script injection attempts that appear in user agents on this site.
Anyway, off the soap box and back to the bots!
-
BobCrawl/Nutch-0.9 is a new form of Nutch that appeared in our logs, claiming to be a "Test/Development crawler".
On a side note, in an effort to inform us that its URL and email are not available, the user agent misspells the word and says "notavalable".
Leave it to us to get caught up in the details.
-
Flatland Industries sent their web spider flatlandbot.
Their website claims that the bot follows robots.txt exclusion standards, so if you don't want them around, be sure to let them know.
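If you want to keep a crawler like flatlandbot out, the usual way is a robots.txt entry that names it. Python's standard urllib.robotparser module lets you check how a compliant bot should read your rules. The rules below are an illustrative example, not a recommendation. Remember that robots.txt only works for bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block flatlandbot entirely, keep /private/ off-limits to everyone else.
ROBOTS_TXT = """\
User-agent: flatlandbot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("flatlandbot/allspark", "http://www.example.com/index.html"))    # False
print(parser.can_fetch("SomeOtherBot/1.0", "http://www.example.com/index.html"))        # True
print(parser.can_fetch("SomeOtherBot/1.0", "http://www.example.com/private/page.html")) # False
```

The parser matches on the part of the user agent before the first "/". That is why the full "flatlandbot/allspark" string still falls under the "flatlandbot" rule.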
-
Here's a strange one - Blubberlutsch/1.0.
There is absolutely no information currently on Google for this user agent.
Searches for the word itself turn up everything from "Donald Duck" to "Star Wars: Attack of the Clones".
My best guess is that it's German slang for something, so in the meantime, we'll call it a bot, and check back on it later.
-
We had our first close encounter / UFO sighting this week -
UFO/77.7 (CoSMoS; Z; Pearl 256; peep) F!R3F0>< P\/\/NS y0!.
-
Another new search engine bot, Isidorus/2.0, appeared as well.
-
Quite possibly the strangest user agent of the year, Nintendo64/1.0 (SuperMarioOS with Cray-II Y-MP Emulation), paid us a visit recently.
Opening up August, we have 192,894 user agents and 2,070 bots.
Thanks for dropping by, and remember - only you can prevent injection hacks!
|
2008.7.13 - 2,000 bots!
We set a new milestone this week for our user agent database - we broke 2,000 bots!
This week's logs consisted mostly of new versions of bots we already know about.
There are a few new faces mixed in there too - here's the latest:
-
On the browser side, one new face in our logs is from iGetter.
This is a new download utility that looks pretty interesting.
-
Thumbshots is another new bot on our site; we witnessed 3 unique bot user agents from them this week: thumbshots-de-bot, KFSW/1.0.0.0.4 thumbshot-de-bot, and KFSW/1.0.0.0.1 ThumbShots-de-bot.
-
Exooba crawler/0.5.1 debuted in our server logs this week.
According to their site, it is a test crawler, and the data it gathers is used strictly for testing purposes.
Their site also informed us that they fully comply with robots.txt exclusion standards.
-
Here's an interesting one - Daruma/1.08 (Windows 98; U) [en].
We were unable to find anything about this user agent in our research.
When we Google this bot, our site is the only result so far.
However, according to Wikipedia, the word "Daruma" is commonly used to describe "Daruma Dolls", also known as dharma dolls.
These are hollow, round Japanese wish dolls, modelled after Zen patriarch Bodhidharma.
This just goes to show that you can learn a lot more than techie facts on this blog.
-
CatchBot/1.0 is another new face on the site.
-
Metaspinner/0.01 is a bot from MetaSpinner, yet another search engine crawling the web for data.
-
OmniExplorer_Bot/1.0x Job Crawler is a new version of the OmniExplorer bot family.
OmniExplorer has a long-standing pedigree of bots on our site.
-
Yacybot is another long-standing bot family on our site, and it turned up a new version this week as well.
-
We posted about our first OOZBOT sighting last week, and now we have a new user agent from them as well: OOZBOT.
Mid-July is approaching fast, and we've got 187,746 user agents and 2,018 bots. Thanks for visiting!
|
2008.7.6 - Dog Days of Summer Bots
We had quite a few new faces in the logs last week, from established sites like Yahoo! all the way to some new startup search engines that are sending their bots out into the wild to harvest searchable content.
Here are the high points of our week:
-
Yahoo! has a new YahooSeeker-Testing/v3.9 bot that dropped in for a visit.
-
TheRarestParser/0.4b is a new version sent from TheRarestBlog.
This bot crawls web pages to find the rarest words used on the web.
-
Watch out, Google: here are two startup search companies that are crawling the web.
OOZBOT/0.17 from Setooz - their site is currently "Under Construction", but a search engine will likely appear there someday.
Zook.in looks like a startup that sent their Knight/0.2 bot by for a visit.
Judging from their site, they aren't quite ready for prime time, since there isn't really a site there yet.
-
Slurpem Search sent a bot informing us that they are "Sucken Up The Web".
I hope Slurpem is thirsty, because there's a lot of web out there to suck up.
-
Firebat 2.7.13 is a new bot on our site as well.
Firebat does not have a good reputation on the web from what our research has turned up.
They are mentioned in many places in connection with large denial of service attacks and several other shady activities.
-
EmailWolf 1.00 prowled our sites looking for emails to chew on, but we don't post emails, so the email wolf went hungry.
-
Someone or something at IP 84.0.233.218 was kind enough to attempt a script injection that calls a JavaScript alert warning the calling browser of XSS (Cross Site Scripting) vulnerabilities on the site.
Of course, we are not vulnerable to XSS, but nice try.
This puts us at 75 unique script injection attempts.
When you view your web stats, make sure scripts embedded in user agents cannot execute; otherwise an attacker could easily steal your cookie and hijack your web stats portal.
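One way to follow that advice is to HTML-escape every user agent before it is written into a stats page. Below is a minimal Python sketch using the standard html.escape function. The render_user_agent_row helper is hypothetical; it stands in for whatever your stats package uses to build its tables.

```python
import html


def render_user_agent_row(user_agent: str, hits: int) -> str:
    """Build one table row for a stats page. The user agent is escaped so any
    markup inside it is displayed as text instead of being executed."""
    return "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(user_agent), hits)


row = render_user_agent_row('<script>alert("HACKED")</script>', 1)
print(row)
# <tr><td>&lt;script&gt;alert(&quot;HACKED&quot;)&lt;/script&gt;</td><td>1</td></tr>
```

With escaping in place, the injected script shows up as harmless text. Escaping covers HTML element and attribute content. It does not make it safe to drop a user agent into a `<script>` block or a URL, which need their own encoding.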
We're heading into the second week in July now at 185,694 user agents and 1,993 bots - thanks for visiting!
|
2008.6.23 - Another one bites the dust
Just a quick update - we had a few new interesting bots, and another menacing crawl from Wget...
-
Wget/1.11.3 returned with a vengeance today, this time from IP 131.107.0.106.
It burned through 66,000 page views in just a few hours.
As usual, we banned this IP, and life will go on.
-
SearchHub Bot is a bot sent from inSynchro, a Malaysian business intelligence company established in 1999 that specializes in developing web-based enterprise solutions, specifically in the area of project management.
-
Thanks to Google's "Translate this page" feature, I was able to scout out Semager/1.2.
Their site is in German, and from what I can tell in the English translation, the Semager bot feeds the Semager Search Engine.
Nothing sinister here, just another search engine bot. We already have a few Semager user agents in our database.
Bottom line: 182,499 user agents and 1,959 bots.