What's the user-agent header? When a web browser requests a page from a web server, it sends along a set of headers carrying additional information about the request. One of these is the "user-agent" string, which tells the server what kind of web browser is making the request.
At least it was supposed to.
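To make the idea concrete, here is a minimal sketch of a client attaching a user-agent header to a request, using Python's standard urllib. The URL and the "ExampleBot" value are made up for illustration, and nothing is actually sent over the network; the request object is only built and inspected.

```python
from urllib.request import Request

# Hypothetical URL and user-agent value, chosen only for illustration.
request = Request(
    "http://example.com/",
    headers={"User-Agent": "Mozilla/5.0 (compatible; ExampleBot/1.0)"},
)

# urllib stores header names in capitalized form ("User-agent"),
# so that is the spelling to use when reading the header back.
user_agent = request.get_header("User-agent")
print(user_agent)
```

A server receives this value as an ordinary request header and is free to log it, ignore it, or try to interpret it.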
Let's look at some example user-agent strings:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322)
Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.1 (like Gecko)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
msnbot/0.9 (+http://search.msn.com/msnbot.htm)
XmlRssTimingBot/2.03 (libwww-perl/5.800)
The first six are different kinds of web browsers; the last four are spiders.
By now you should be wondering: Who's Mozilla?
Mozilla was the codename for the original web browser to end all web browsers, Netscape. There were earlier web browsers like Mosaic, but Netscape is what really made the World Wide Web popular.
Whenever Netscape made a request for a web page, it sent along a user-agent header identifying itself as "Mozilla", followed by a slash and its version number.
Not long after that, Microsoft decided to make its own web browser, which we know today as Microsoft Internet Explorer. For some reason, some dark, hooded person at Microsoft decided it would be better for Internet Explorer if it misidentified itself as Netscape. Internet Explorer, therefore, sent a header along the lines of "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)". Since then, user-agent strings have become almost worthless. It's now very difficult to tell what kind of browser people are using.
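A quick way to see how little the leading "Mozilla" tells you is to pull out just the name before the first slash in each of the example strings listed earlier. This is only a sketch; the helper name leading_product is my own, not part of any library.

```python
from collections import Counter

# The ten example user-agent strings from the list earlier in this article.
EXAMPLES = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7",
    "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322)",
    "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.1 (like Gecko)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "msnbot/0.9 (+http://search.msn.com/msnbot.htm)",
    "XmlRssTimingBot/2.03 (libwww-perl/5.800)",
]


def leading_product(user_agent: str) -> str:
    """Return the name before the first slash of the first space-separated token."""
    first_token = user_agent.split(" ", 1)[0]
    return first_token.split("/", 1)[0]


# Count how many of the examples claim each leading product name.
counts = Counter(leading_product(ua) for ua in EXAMPLES)
print(counts["Mozilla"], "of", len(EXAMPLES), "examples claim to be Mozilla")
```

Eight of the ten examples, including two search-engine spiders, introduce themselves as Mozilla, so the first token alone says almost nothing about what is actually making the request.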
I'll tell you what browsers I think these user-agents represent:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7 -- Firefox running on Windows XP
Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) -- The Internet Explorer 6.0 edition used through AOL
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98) -- a very old web browser; Internet Explorer 5.5 running on Windows 98
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8 -- The Safari web browser, running on an Intel Mac
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322) -- Internet Explorer 6.0 running on Windows XP with the "FunWebProducts" toolbar installed
Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.1 (like Gecko) -- the Konqueror web browser running on Linux
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) -- Google's spider
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) -- Yahoo's spider
msnbot/0.9 (+http://search.msn.com/msnbot.htm) -- MSN's spider
XmlRssTimingBot/2.03 (libwww-perl/5.800) -- some custom spider someone built
As you can see, there is no single format that user-agent strings reliably follow. In addition, toolbars and spyware often add their own tokens to a browser's user-agent string. This makes automatic parsing of user-agent strings complicated, and many programs do it poorly.
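To give a feel for what such parsing involves, here is a rough sketch of the kind of guesswork a program has to do. The function names and the hint list are my own assumptions and are deliberately incomplete; this is not how any particular log analyzer works, and it only handles strings shaped like the examples above.

```python
import re


def comment_tokens(user_agent: str) -> list:
    """Return the semicolon-separated tokens of the first parenthesized group."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    if match is None:
        return []
    return [token.strip() for token in match.group(1).split(";") if token.strip()]


# Hypothetical, incomplete list of substrings that suggest a spider.
SPIDER_HINTS = ("bot", "slurp", "spider", "crawler")


def guess_agent(user_agent: str) -> str:
    """Make a best-effort guess at what sent this user-agent string."""
    lowered = user_agent.lower()
    if any(hint in lowered for hint in SPIDER_HINTS):
        return "spider"
    # Order matters: Konqueror and Safari both mention "Gecko",
    # so they are checked by their own names first.
    if "konqueror" in lowered:
        return "Konqueror"
    if "safari" in lowered:
        return "Safari"
    if "firefox" in lowered:
        return "Firefox"
    match = re.search(r"MSIE (\d+(?:\.\d+)?)", user_agent)
    if match:
        return "Internet Explorer " + match.group(1)
    return "unknown"


toolbar_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322)"
print(guess_agent(toolbar_ua))
print(comment_tokens(toolbar_ua))
```

Even in this tiny sketch the order of the checks matters, and extra tokens like "FunWebProducts" show up mixed in with the real browser and operating system information. Every new browser, toolbar, or spider means another special case.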