
Technorati Blog

Blogosphere

We’re asked all the time how platforms like Twitter and Facebook are impacting the Blogosphere. This is what we’re seeing: Twitter is not replacing blogs, but it has evolved as a major awareness vehicle for bloggers and people who read blogs (same goes for Facebook status updates).

So the idea behind Twittorati was to let people see the topical zeitgeist: what the most influential bloggers are tweeting and blogging about.

We say this is where the Blogosphere meets the Twittersphere, but what does that mean? Twittorati shows what top bloggers are tweeting about, and how these trends compare to Blogosphere trends. You’ll be able to filter tweets by topic, see the most tweeted blog posts, and compare leading Blogosphere and Twitter trends.

We’re featuring the Technorati Top 100 Bloggers at launch, but we’re going to expand very soon to include many more authors in the active Blogosphere. You can find the Twitter content most relevant to them, compare the day’s top blog tags and hashtags to see the hottest topics in both spheres, and see the most popular links being tweeted and which blogs are linking to them. You can also dig into the information source: writer pages display each tweeter’s blogs, Twitter information and Technorati Authority.

Please reach out and let us know what you think.
And special thanks to Sawhorse Media, publisher of Muckrack.com and VentureMaven.com, who helped us produce the site.

The Technorati Attention Index: These are the top sites with the highest number of blogs linking to them in the past 30 days. This time around, in addition to rank, we've added Attention numbers. Attention is the number of blogs (not the number of links) that have linked to the site in the past 30 days. Here are the mainstream media gainers and losers in the blogosphere:
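Counting blogs rather than links means one prolific blog can't inflate a site's number by linking to it over and over. A minimal sketch of that distinction, assuming a hypothetical list of (linking blog, target site, date) records; this is an illustration of the definition, not Technorati's actual data model or code:

```python
from datetime import date, timedelta

# Hypothetical link record: (linking_blog, target_site, date_of_link).
Link = tuple[str, str, date]


def attention(links: list[Link], site: str, as_of: date, window_days: int = 30) -> int:
    """Count distinct blogs (not individual links) that linked to `site`
    during the `window_days` days ending on `as_of`."""
    start = as_of - timedelta(days=window_days)
    linking_blogs = {
        blog for blog, target, when in links
        if target == site and start < when <= as_of
    }
    return len(linking_blogs)


def raw_link_count(links: list[Link], site: str, as_of: date, window_days: int = 30) -> int:
    """For comparison: every link counts, so repeat links from one blog add up."""
    start = as_of - timedelta(days=window_days)
    return sum(1 for _, target, when in links if target == site and start < when <= as_of)


if __name__ == "__main__":
    today = date(2009, 6, 1)
    links = [
        ("blog-a.example", "youtube.com", date(2009, 5, 20)),
        ("blog-a.example", "youtube.com", date(2009, 5, 25)),  # same blog again
        ("blog-b.example", "youtube.com", date(2009, 5, 30)),
        ("blog-c.example", "youtube.com", date(2009, 4, 1)),   # outside the window
    ]
    print(attention(links, "youtube.com", today))       # 2
    print(raw_link_count(links, "youtube.com", today))  # 3
```

Ranking sites is then just a matter of sorting them by this count.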


New to the top 50

The Dallas Morning News

San Jose Mercury News

Star Tribune


Out of the top 50

US News & World Report

Rolling Stone

Christian Science Monitor

International Herald Tribune (now part of nytimes.com)


5 biggest gains in Rank

PBS

The Houston Chronicle

Google News

NY Post

Slate


5 biggest losses in Rank

The Economist

Chicago Tribune

The White House

Financial Times

Newsweek


5 biggest gains in attention

YouTube

The Wall Street Journal

CNN

LA Times

Wired


5 biggest losses in attention

Reuters

Telegraph.co.uk

The Boston Globe

Financial Times

The Economist


Overall Rankings and Attention

 
1.   YouTube 60,644
2.   The New York Times 17,374
3.   guardian.co.uk 8,039
4.   The Wall Street Journal 7,513
5.   The Washington Post 6,891
6.   CNN 6,330
7.   Telegraph.co.uk 5,380
8.   Yahoo! News 5,070
9.   MSNBC 5,036
10. The Los Angeles Times 4,536
11. Reuters 4,314
12. FOX News 4,001
13. The Boston Globe 3,838
14. USA Today 3,619
15. Daily Mail 3,530
16. Time 3,524
17. BBC News 3,399
18. NPR 3,189
19. NY Daily News 2,588
20. Forbes 2,534
21. San Francisco Chronicle 2,420
22. Slate 2,187
23. CBS News 2,156
24. Google News 2,093
25. Wired 2,062
26. Financial Times 2,056
27. PBS 2,053
28. NY Post 2,025
29. San Francisco Examiner 1,968
30. BusinessWeek 1,949
31. The White House 1,929
32. Salon 1,928
33. Chicago Tribune 1,924
34. Newsweek 1,880
35. CNNMoney 1,712
36. CBC 1,696
37. Yahoo! Finance 1,642
38. The Economist 1,565
39. New York Magazine 1,550
40. philly.com 1,288 
41. The Houston Chronicle 1,120
42. Science Daily 1,093
43. MarketWatch 1,076
44. People 1,066
45. Miami Herald 1,049
46. The Seattle Times 1,049
47. Yahoo! Sports 1,047
48. The Dallas Morning News 939
49. San Jose Mercury News 879
50. Star Tribune 877

After reading the TechCrunch article today, I wanted to shed some light on the Technorati part of the equation. While it's true that the online conversation extends through platforms like Facebook and Twitter, the fluctuations in Technorati Authority are primarily due to improvements we've made to our data.

We have cleaned up our data set and there is definitely an impact on Technorati Authority numbers for many bloggers. We've seen some blogs drop in Authority and others rise.

Last year, we saw a major proliferation of spam blogs (splogs) - which tend to link to well known blogs to drive traffic and to disguise themselves. Since July, we've been putting major effort into purging splogs from our index and we continue to fight the onslaught every day. As we remove bad data and spam, authority numbers shift. Furthermore, as we continue to move our focus toward conversation and attention, links in blog posts become much more meaningful to the conversation than sidebar links.

Just looking at the Technorati Top 100 since February, we've actually seen a lot of the blogs in the lower end of the Top 100 make significant gains in Authority while some blogs at the upper end have slipped downward.

The following table shows re-calculated Technorati Authority from 3 months ago vs. today. Both time periods are calculated against the cleaner Technorati index.

Blog                  Authority 3 months ago   Authority today
huffingtonpost.com    29,018                   28,924
engadget.com          15,594                   16,292
techcrunch.com        17,590                   15,633
gizmodo.com           17,615                   12,241
boingboing.net        12,448                   12,473

You can see that two really haven't changed much, one has gone up and two have dropped. We'll continue our efforts to clean the index and to manage Technorati Authority to the high standard of integrity you expect.
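The "haven't changed much / gone up / dropped" reading of the table can be checked with simple percentage changes. A small sketch using the five rows from the table; the 1% cutoff for calling a change "flat" is an assumption chosen for illustration, not a Technorati rule:

```python
# Authority figures from the table: (3 months ago, today).
AUTHORITY = {
    "huffingtonpost.com": (29_018, 28_924),
    "engadget.com": (15_594, 16_292),
    "techcrunch.com": (17_590, 15_633),
    "gizmodo.com": (17_615, 12_241),
    "boingboing.net": (12_448, 12_473),
}


def pct_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


def classify(before: int, after: int, flat_threshold: float = 1.0) -> str:
    """Label a change as flat, up or down; the 1% threshold is an assumption."""
    change = pct_change(before, after)
    if abs(change) < flat_threshold:
        return "flat"
    return "up" if change > 0 else "down"


if __name__ == "__main__":
    for blog, (before, after) in AUTHORITY.items():
        print(f"{blog:20} {pct_change(before, after):+6.1f}%  {classify(before, after)}")
```

With these numbers, Huffington Post and Boing Boing move by less than half a percent, Engadget gains about 4.5%, TechCrunch drops about 11% and Gizmodo about 30.5%.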

On March 2nd, 2009, "Blog," more commonly known as the White House Blog, entered the Technorati Top 100 for the first time at #99.

The blog was started January 20th, 2009. 42 days later it had rocketed into the Top 100. I am pretty certain that is the fastest any blog has ever done that.

As of this writing, it sits at #96 right between Jalopnik and delicious:days, an auto blog and a food blog.

All I can say is, welcome to the Technorati Top 100, Mr. President. Well done.


"Companies should inform, with a real commitment to speed, the conversations among the new influencers — always under way on blogs, in discussion forums, and bulletin boards." 2009 Edelman Trust Barometer

Last week, Edelman released its annual Trust Barometer. No surprise: consumer trust in business and institutions is at an all-time low.

Trust in U.S. business, at 38% (down from 58% last year), is the lowest in the Barometer’s 10-year history. 77% of consumers said they refused to buy products or services from a distrusted company. Mainstream media fared worse: only 36% trust TV news, and only 34% trust newspapers. Who DO we trust? Experts and peers.

Where do you find experts and peers? The blogosphere. How to rebuild? The qualities companies need to embrace to succeed in 2009 are inherent to the blogosphere: transparency, word of mouth, and trust.

Bloggers themselves (60%) place the most trust in what they’re hearing in the blogosphere. 90% of them are writing about brands. Right now, bloggers (and their audiences) want straight talk and something to believe in, and they need to know you’re still in business.

If this is the year of getting back to basics, here are the most basic things you can do:

Listen: monitor what’s being said about your brand, your products, your competitors, and your blog posts.

Technorati advanced search
With advanced search you can track who is talking about you and your competitors in the same breath. You can see the blogosphere’s reaction to articles about you, or your blog posts.

Watchlists let you track any subject. Simply give Technorati a few words or website URLs you're interested in, and we'll tell you whenever they're mentioned.
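At its core, a watchlist checks each new post against a set of words and URLs. A minimal sketch, assuming a hypothetical `Watchlist` class that uses plain case-insensitive substring matching for words and simple prefix matching for URLs; Technorati's real matching rules aren't described here, and the notification step is left out:

```python
from dataclasses import dataclass, field


@dataclass
class Watchlist:
    """Hypothetical watchlist: words/phrases plus website URLs to watch for."""
    terms: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)

    def matches(self, post_text: str, post_links: list[str]) -> set[str]:
        """Return the watched terms and URLs that a single post mentions."""
        text = post_text.lower()
        hits = {t for t in self.terms if t.lower() in text}
        for url in self.urls:
            # Naive prefix match: good enough for a sketch, not for production.
            if any(link.startswith(url) for link in post_links):
                hits.add(url)
        return hits


if __name__ == "__main__":
    wl = Watchlist(terms=["Acme Widgets"], urls=["http://acme.example.com"])
    post = "Comparing Acme widgets with the competition."
    print(wl.matches(post, ["http://acme.example.com/products"]))
```

A non-empty result is the point at which a real service would send the "it's been mentioned" alert.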

Favorites let you track, organize and filter your favorite blogs.

Tag pages:
You’ll find these when you click on a tag. Or, to automatically generate a tag page about anything, simply enter Technorati.com/tag/ and any topic you can think of. Once you’re there, you’ll see all of the content in the blogosphere around that tag: posts, videos, images and a chart of the tag’s popularity over time. You can even plot multiple tags against each other.

Communicate: one to one and one to many, but go where the conversations are. Blog, comment, and comment back to the people who are commenting on you. Participate in the conversations that are already taking place in the blogosphere every day.

Or just be present in environments of relevance and trust. Advertising in the blogosphere has come a long way and now offers everything from display and rich media to formats such as conversational media, designed specifically to harness the power of the blogosphere.

On October 1st, we’re teaming up to support DonorsChoose.org in their 2nd Annual Blogger Challenge. DonorsChoose.org is dedicated to getting our kids the materials, resources and experiences they need to learn. They’re challenging the blogosphere to compete to see who can rally the most support for public schools. Across the blogosphere, bloggers are creating giving pages that list specific classroom requests in public schools, and then encouraging their readers to donate to those classroom requests.

Technorati is sponsoring the "generosity rankings," which also means that at the end of the challenge we’ll be broadcasting the results showing which bloggers drove the most generosity. You can see the current giving contest here.

During the last DonorsChoose.org Blogger Challenge, blog readers donated $420,000 toward classroom projects benefitting 75,000 students in low-income communities. This year, the need is even more urgent: the rough road ahead for the US economy means an even rougher road for public schools. With your participation, you and your readers can help thousands of public school kids. It’s easy:

HOW YOUR BLOG CAN HELP

1. Pick a few classroom requests posted on DonorsChoose.org and add them to a challenge page which takes 1-2 minutes to set up.

2. Do a post on October 1 encouraging your readers to donate to any of the classroom requests on your challenge page. Your readers can give as little as $5.

3. Publish a widget which pulls in the classroom requests you have selected and shouts out the readers who have donated to those requests. Simply select the category to which your blog belongs to grab the appropriate widget.

If you have additional questions or need help getting started, feel free to contact DonorsChoose.org directly at bloggers@DonorsChoose.org.

BACKGROUND ON THE CHARITY
DonorsChoose.org grew out of a high school in the Bronx where teachers saw their students going without the materials needed to learn. Our website provides an easy way for everyday people to address this problem. Public school teachers post project requests that range from a $100 classroom library, to a $600 digital projector, to a $1,000 trip to the zoo. People like you can choose which projects to fund and then get photos and thank-you letters from the classroom.

BACKGROUND ON THE 2007 DONORSCHOOSE.ORG BLOGGER CHALLENGE
In October of 2007, bloggers competed to see who could rally the most support for public schools via DonorsChoose.org. Blog readers gave $420,000 to classroom projects benefitting 75,000 students in low-income communities. While A-list bloggers like Engadget and TechCrunch inspired great generosity, smaller blogs with really engaged readers generated even more! In fact, it was a personal blog from Brooklyn, TomatoNation, that brought in a whopping $100,000.

Thank you so much for your support.

I’m very happy to announce that we released the 2008 State of the Blogosphere report this morning. If you missed my talk at Blog World Expo on Saturday, you can see the study here.

We’ve been publishing this report since Dave Sifry wrote the first one in 2004.
This year, we wanted to go beyond the numbers and deliver deeper insights into bloggers and the state of blogging today. In addition to analyzing the data from the Technorati Index, for the first time we’ve reached out to bloggers directly to understand the role of blogging in their lives; the tools, time and resources they use for their blogs; and how blogging has affected them personally, professionally and financially.

So what did Technorati measure this year and why?

There’s a wide range of estimates of the number of global blogs as well as blog readership (including ours), but all the numbers agree that blogs are a global phenomenon that has hit the mainstream. Further, as the blogosphere grows in size and influence, the lines between what is a blog and what is a mainstream site become less clear. Larger blogs are taking on more characteristics of mainstream sites and mainstream sites are incorporating styles and formats from the blogosphere.

We feel that the real story now lies with the Active Blogosphere. The trends, stories and behaviors here influence not only the rest of the blogosphere but mainstream media as well.

Technorati defines the Active Blogosphere as: The ecosystem of interconnected communities of bloggers and readers at the convergence of journalism and conversation.

So how do we determine who’s active? Some blogs are more integral to the blogosphere than others: How frequently does this blog post? Is this blog linking to others and are others linking to it? Does this blogger post original, opinion, or reactive content? All of these factor into a blog’s authority and determine its place in the active blogosphere.

In short, these are the bloggers that are making the space tick.

The study goes live over the course of this week:

Live today: Overview, and Who are the Bloggers?
Tuesday: The What and Why of Blogging
Wednesday: The How of Blogging
Thursday: Blogging for profit
Friday: Brands in the blogosphere

I was in Chicago last week to participate in ad-tech. The content and speakers struck me as particularly good this time around, with a major focus on social media.

The media shift of the past few years is fundamental (it’s hard to overstate this), and it’s critical that brands adapt to life in this new environment. There was definitely an air of urgency on the part of everyone present to figure it all out.

Overwhelmingly, the two main themes I heard were:

Brands need to be part of or at least adjacent to the conversation

Brands need to go where their audiences are versus trying to bring audiences to them

A few highlights and how-tos from the sessions I attended:

The six drivers of brand credibility in social media environments*

  • Trust
  • Authenticity
  • Transparency
  • Affirmation
  • Listening
  • Responsiveness
The commitment needs to permeate the entire company, not just the marketing organization.


The conversation is less about brands and more about the issues and topics that surround brands, or that are passion points for the audiences of those brands.

Every brand is different: You might need to blog, you might need to listen and interact or you might simply need to be present alongside the conversation.

Speaking of execution:

The microsite was declared dead. Rising up in its place are media that function as the microsite, but do it one better by putting that content and interactivity where your audiences ARE: conversational ads and channels, widgets.

Even the most universally loved brands have their critics. Look at this new era not as a problem to solve but as an unprecedented opportunity to truly know what people think about you, and to engage with them.

The long tail is where you find influence. Even if a blogger has a relatively small number of followers, the level of influence and trust is exponentially higher than with large, mainstream media.

And finally, don’t wait for a crisis to get started. The case studies are there: conversational strategies are working.


“We’re not serving them dinner anymore, we’re at the dinner party.”
- Richard Binhammer, Dell, Inc

*Pete Blackshaw, EVP of Nielsen Online

Nowhere have we seen a bigger impact of blogging and social media on the American political landscape than on the 2008 presidential election. Candidate appearances formerly confined to a small town are uploaded to YouTube and seen by millions. Conversations once shared by small groups spread instantly and globally. Facebook and MySpace are as important as New Hampshire and Iowa.

According to Yahoo, 51% of internet users will turn to blogs to gather information and communicate about politics. Citizen journalists are the ones posting the stories that break through the campaigning and ask the hard questions.

Authenticity is what plays with this audience. Spread misinformation or spin, and more than 30,000 political blogs (tagged politics in the Technorati index) are ready to call foul.

There's a brilliant application of Technorati data over at Tech President. (Disclosure: co-founder Micah Sifry's brother David Sifry is Technorati's founder.)

View Technorati election data profiles.

Taking a pulse of the blogosphere today, what do the numbers tell us? (Keep in mind that Technorati is indexing in real time, so the numbers can vary even by a few minutes.)

Barack Obama has pulled into the lead, at least in terms of attention in the blogosphere. Is 2008 truly the social media election, as has been posited? All signs point to yes. As the only Republican candidate, should John McCain be benefiting from a focus of attention, or will he rebound once the Democrats have picked a candidate?

Simple and telling: the tag cloud on Technorati's politics page.

Technorati Tag Cloud

Hillary Clinton

English posts that contain Hillary Clinton per day for the last 30 days.
Technorati Chart

Barack Obama

English posts that contain Obama per day for the last 30 days.
Technorati Chart

John McCain

English posts that contain McCain per day for the last 30 days.
Technorati Chart
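Each chart plots, for one candidate, how many English posts per day over the last 30 days mention the name. A rough sketch of computing such a series from (date, text) pairs; the input shape and the case-insensitive substring match are assumptions, and language detection is skipped entirely:

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable


def daily_mentions(
    posts: Iterable[tuple[date, str]], phrase: str, as_of: date, days: int = 30
) -> list[tuple[date, int]]:
    """Posts per day whose text contains `phrase` (case-insensitive),
    for each of the `days` days ending on `as_of`."""
    start = as_of - timedelta(days=days - 1)
    needle = phrase.lower()
    counts = Counter(
        day for day, text in posts
        if start <= day <= as_of and needle in text.lower()
    )
    # Emit every day in the window, including zero-count days, so the chart has no gaps.
    return [(start + timedelta(days=i), counts[start + timedelta(days=i)]) for i in range(days)]


if __name__ == "__main__":
    posts = [
        (date(2008, 3, 30), "McCain speaks in Ohio"),
        (date(2008, 3, 29), "Obama and Clinton debate"),
    ]
    for day, n in daily_mentions(posts, "McCain", date(2008, 3, 30))[-3:]:
        print(day, n)
```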

Hey, it's that time again, time to slow down, take a deep breath, and dig into the data!

About this Report, and the Obligatory Plug for Technorati

Technorati is known widely for its quarterly State of the Blogosphere reports, analyzing the trends around blogs and blogging. With this report, we expand on this tradition by introducing information and analysis relating to the broader range of social media on the Web -- what we and many others call the Live Web (another good definition). Technorati continues to grow well beyond its roots as the leading blog search engine; increasingly, we are the main aggregation point for all forms of social media on the Web, including blogs, of course, but also video, photos, audio such as podcasts and much more.

What makes this possible is the rise in the use of tags across all forms of social media and the increasing implementation of tags by the publishing platforms supporting each form of media. Increasingly, tags have become a lingua franca of the Live Web, helping to categorize social media while also indicating where people’s attention might be at any given moment. But because each form of media is published from unique platforms with their own established communities, the audience found itself hopping from platform to platform to get a sense of what might be hot at any given moment. Which is why our social media aggregation service -- made manifest on our tagged media pages -- is growing at a torrid pace.

While we still have substantial reporting on the State of the Blogosphere, we are now expanding the report to provide information about the State of Tags. Admittedly, the information we have on this new area of focus isn’t as deep or as expansive as our State of the Blogosphere data, and we expect that over time this and other new sections will expand, but we believe this is a good first step toward a more comprehensive snapshot of the Live Web.

OK, on to the numbers!

The State of the Blogosphere

The state of the Blogosphere is strong, and is maturing as an influential and important part of the web.

For nearly four years, we’ve been tracking and enabling the growth of this phenomenon, and there is much in our data to indicate that the medium is “growing up.”

Slide0005

Technorati is now tracking over 70 million weblogs, and we're seeing about 120,000 new weblogs being created worldwide each day. That's about 1.4 blogs created every second of every day.

Slide0007

Spam and splogs (spam blogs) continue to be a problem in the blogosphere, and there was a marked increase in splogs that coincided with the holiday season last year. Technorati has been tracking between 3,000 and 7,000 new splogs created each day, but there was a significant spike in splog creation in December, when we tracked over 11,000 new splogs per day, a total of 341,000 splogs that we removed from our indexes during that period.

Fortunately, spam rates have decreased somewhat since then, as blog hosting providers responded to the issue during January and February. My personal take on the issue of spam is that all healthy ecosystems have parasites; the only question is whether or not the system is structurally vulnerable to being overwhelmed. Thankfully, because of the accountability that is built into the web itself (the URL structure is fundamentally accountable), I believe that while the vulnerability of the live web to spam is real, it is manageable.

Slide0006

Since our last State of the Blogosphere report in October 2006, we’ve seen the blogosphere take longer to double in size. This shouldn't be surprising, as we're dealing with the law of large numbers: it takes a lot more growth to double from 35 million blogs to 70 million (which took about 320 days) than it did to double from 5 million to 10 million blogs (which took about 180 days).
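The law-of-large-numbers point is easy to see by converting each doubling time into an implied compound daily growth rate, r = 2^(1/T) - 1 for a doubling time of T days, and into the average number of blogs added per day. A small sketch using only the figures in the paragraph above:

```python
def daily_growth_rate(doubling_days: float) -> float:
    """Compound daily growth rate implied by doubling in `doubling_days` days."""
    return 2 ** (1 / doubling_days) - 1


def avg_new_per_day(start: float, end: float, days: float) -> float:
    """Average absolute increase per day over the period."""
    return (end - start) / days


if __name__ == "__main__":
    periods = [("5M -> 10M", 5e6, 10e6, 180), ("35M -> 70M", 35e6, 70e6, 320)]
    for label, start, end, days in periods:
        print(f"{label}: {daily_growth_rate(days):.3%} per day, "
              f"~{avg_new_per_day(start, end, days):,.0f} blogs per day")
    # 5M -> 10M: 0.386% per day, ~27,778 blogs per day
    # 35M -> 70M: 0.217% per day, ~109,375 blogs per day
```

The relative rate roughly halves, while the absolute number of new blogs per day roughly quadruples.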

We also see a slowing in growth in the rate of posts created per day; while there are spikes in blog posts during times of significant world crisis -- for instance, last summer’s conflict between Israel and Hezbollah -- the overall trend is that posting volume is growing more slowly, at about 1.5 million postings per day. That's about 17 posts per second. In October 2006, Technorati was tracking about 1.3 million postings per day, about 15 posts per second.
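The per-second and per-day figures in this section follow from dividing by the 86,400 seconds in a day (or by the 31 days of December for the splog total). A quick sketch of that arithmetic using the report's own numbers:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    figures = {
        "new blogs per day (this report)": 120_000,
        "posts per day (this report)": 1_500_000,
        "posts per day (October 2006)": 1_300_000,
    }
    for label, per_day in figures.items():
        print(f"{label}: {per_second(per_day):.1f} per second")
    # 1.4, 17.4 and 15.0 per second respectively

    # December splog spike: 341,000 splogs removed over 31 days.
    print(f"splogs per day in December: {341_000 / 31:,.0f}")  # 11,000
```

The 17 posts per second quoted above is 17.4 rounded down.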

Slide0008

Popularity of Blogs vs. the MSM

Slide0010
Slide0011

In previous reports, we looked at the popularity of mainstream media compared to blog sites. One interesting item to note in this April 2007 report: the number of blogs in the top 100 most popular sites has risen substantially. During Q3 2006 there were only 12 blogs in the Top 100 most popular sites.

In Q4, however, there were 22 blogs on the list -- further evidence of the continuing maturation of the Blogosphere. Blogs continue to become more and more viable news and information outlets. For instance, information not shown in our data but revealed in our own user testing in Q1 2007 indicates that the audience is less and less likely to distinguish a blog from, say, nytimes.com -- for a growing base of users, these are all sites for news, information, entertainment, gossip, etc. and not a “blog” or a “MSM site”.

Further, there is a wider diversity of languages represented here, specifically Farsi with TodayLink.ir, Persian Blog Fans Club, and Giliran.com making the Top 100. More on that in a moment, as we discuss the international growth of the Blogosphere.

The Global Blogosphere

Slide0013

In terms of blog posts by language, Japanese retakes the top spot from English, with 37% of posts (up from 33% in our last report), followed closely by English at 36% (down from 39%). Additionally, there was movement in the middle of the top 10 languages, highlighted by Italian overtaking Spanish for the number four spot.

The newcomer to the top 10 languages is Farsi, joining the list at #10. It has been very interesting to watch the growth of the blogging world in the Middle East, especially in countries like Iran, and it is reflected in the language distribution above.

Slide0015
Slide0014
Slide0016

English, Japanese and Chinese look almost identical to our last report in their posting distribution. With Italian overtaking Spanish, we get to see another language with a different distribution, which contrasts with both the extreme geographic correlation of the Asian languages and the relative lack of geographic correlation of English. Again, it would appear that both English and Spanish are more global languages based on the consistency of posting through a 24-hour period, whereas other top languages, specifically Japanese, Chinese, and Italian, are more geographically correlated. It would also appear that a significant number of people are blogging during work hours.

The State of Tags

The explosive growth that we see in the Technorati index is mirrored in social media sites throughout the Web, including Flickr, YouTube, and the like. This shared phenomenon allows us to marry the wealth of information in our index with the wealth of information stored on social media sites across the Live Web through the shared construct of tags.

For the uninitiated, a tag is a category or descriptor that someone (often the creator) assigns to a piece of media. This descriptor literally hangs off the media that’s published to the Web much in the same way a luggage tag hangs off your suitcase, easily identifying the bag.

The bottom line: we’re seeing explosive growth in the tags index. People are clicking on tags, people are using tags, and Google features tagged media in its results pages. Tag adoption has become a phenomenon across the Live Web, and we are seeing correspondingly explosive growth at Technorati.

On to the numbers:

Slide0023

Technorati is now tracking over 230 million posts using tags or categories, and the number of people who are using tags is growing:


Slide0022

As of February 2007, about 35% of all posts Technorati tracks use tags.

Slide0024

The number of bloggers that are using tags is also increasing month over month. About 2.5 million blogs posted at least one tagged post in February 2007.

Growth and Maturation

Back in 2002 when Technorati started tracking the blogosphere, social mores and community practices were still forming, and its growth was primarily through the written word. It was a fledgling medium that was initially reviled, then feared, and, now, embraced as mainstream.

The blogosphere started well before Technorati was founded, and its growth was fostered by many people and organizations that brought openness and cooperation to the medium. One of those people, Dave Winer, just celebrated the tenth anniversary of his weblog. Given this auspicious anniversary, I wanted to give my thanks and support to Dave and to all of the other early pioneers in the world of blogging, RSS, and the Live Web. Without Dave's efforts, the web wouldn't look the way it does today. His creation and support for systems like weblogs.com and open formats like RSS were critical in building the early infrastructure that Technorati relies upon and helps to support.

Thanks, Dave!

Wrapping it all Up

As a result of this work and the cultural mores of openness, we also have photo sharing, podcasting, online music publishing, online video publishing, user-generated games, and, increasingly, we have structured data-sharing such as upcoming events. All of this seething, lively activity constitutes the Live Web and Technorati is its hub -- thanks in large part to the growing use and ubiquity of tags. Through the social constructs of tags, we help people find unique voices and points of view. We also help social media publishers to find the people formerly known as their audience. And they all converge, as a result, on Technorati.

We’re proud of this position, of course, but also humbled by the responsibility it imposes.

As we continue to bring more and more of the Live Web to the fore, and to organize it and present it in ways that are useful, entertaining, and informative to you all, I hope you’ll continue to tell us your opinions (as if I could stop you!) and provide us your guidance. Our credo has been and will always remain: “Be of Service.” Your voice helps us to do this, so please continue to tell us what we can do better.

In summary:

  • 70 million weblogs
  • About 120,000 new weblogs each day, or...
  • 1.4 new blogs every second
  • 3,000-7,000 new splogs (fake, or spam blogs) created every day
  • Peak of 11,000 splogs per day last December
  • 1.5 million posts per day, or...
  • 17 posts per second
  • Growing from 35 to 70 million blogs took 320 days
  • 22 blogs among the top 100 most popular sites in Q4 2006 - up from 12 in the prior quarter
  • Japanese the #1 blogging language at 37%
  • English second at 36%
  • Chinese third at 8%
  • Italian fourth at 3%
  • Farsi a newcomer in the top 10 at 1%
  • English the most even in postings around-the-clock
  • Tracking 230 million posts with tags or categories
  • 35% of all February 2007 posts used tags
  • 2.5 million blogs posted at least one tagged post in February

Getting All the Reports

All of the State of the Blogosphere and State of the Live Web reports, going back to my first report in October 2004, are available at http://www.sifry.com/stateoftheliveweb/ and all of this material is licensed under a Creative Commons Attribution license. All I ask in addition is that you please keep the Technorati logo and links to the original reports in any use of the charts or data.


Today's screencast addresses an oft-overlooked feature on Technorati...the browser button! Browser buttons are little shortcuts that live in your browser's toolbar so that you are just one short click away from performing an action. Technorati's browser buttons let you favorite a blog while you're reading the blog, or do a search from a blog to see what other blogs are linking to it. But why am I even typing this out, when you can just watch it happen with your own eyes!



What are browser buttons?

There's been a blogosphere discussion about the River of News approach to keeping up with blogs - showing recent posts from all the blogs you read, newest first.

Dave Winer likes this approach and says other reader models force the user to delete articles to get them out of the way. To me this is dissonant: why would I want to delete someone else's article? In fact, I want to keep them all so I can search them.

Dave, do try out Technorati Favorites. It will import your OPML, and it lets you restrict searches to your favorite blogs, as well as showing a river of news.
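For readers who haven't met OPML: it is the XML format feed readers commonly use to export subscription lists, with each feed stored as an `<outline>` element carrying an `xmlUrl` attribute, often nested inside folder outlines. A minimal sketch of pulling the feed list out of such a file with Python's standard library; this illustrates the format only and is not how Technorati's importer actually worked:

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text: str) -> list[dict[str, str]]:
    """Collect every <outline> that has an xmlUrl attribute, including
    outlines nested inside folder outlines."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        xml_url = outline.get("xmlUrl")
        if xml_url:  # folder outlines have no xmlUrl and are skipped
            feeds.append({
                "title": outline.get("title") or outline.get("text", ""),
                "xmlUrl": xml_url,
                "htmlUrl": outline.get("htmlUrl", ""),
            })
    return feeds


SAMPLE = """<?xml version="1.0"?>
<opml version="1.0">
  <head><title>My subscriptions</title></head>
  <body>
    <outline text="Tech">
      <outline text="Example Blog" type="rss"
               xmlUrl="http://blog.example.com/rss.xml"
               htmlUrl="http://blog.example.com/"/>
    </outline>
  </body>
</opml>"""

if __name__ == "__main__":
    for feed in feeds_from_opml(SAMPLE):
        print(feed["title"], feed["xmlUrl"])
```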

Robert Scoble wants to put blogs in folders, which you can do by tagging your favorite blogs, so you can search or read a subset too. Give it a try, and let us know what you think.
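A river of news is essentially a k-way merge: each blog's posts are already newest first, so the combined stream can be produced lazily without re-sorting everything, and tag-based "folders" just pick which blogs feed the merge. A minimal sketch under those assumptions; the `Post` type and the tag mapping are hypothetical, not Technorati's data model:

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional


@dataclass(frozen=True)
class Post:
    blog: str
    title: str
    published: datetime


def river_of_news(
    feeds: dict[str, list[Post]],
    blog_tags: Optional[dict[str, set[str]]] = None,
    tag: Optional[str] = None,
) -> Iterator[Post]:
    """Merge posts from all favorite blogs into one newest-first stream.

    Each list in `feeds` must already be sorted newest first. If `tag` is
    given, only blogs carrying that tag (the "folder") contribute posts.
    """
    blog_tags = blog_tags or {}
    selected = [
        posts
        for blog, posts in feeds.items()
        if tag is None or tag in blog_tags.get(blog, set())
    ]
    # heapq.merge with reverse=True expects every input in descending order
    # and yields the combined stream lazily.
    return heapq.merge(*selected, key=lambda p: p.published, reverse=True)


if __name__ == "__main__":
    feeds = {
        "a": [Post("a", "a2", datetime(2006, 5, 3)), Post("a", "a1", datetime(2006, 5, 1))],
        "b": [Post("b", "b1", datetime(2006, 5, 2))],
    }
    print([p.title for p in river_of_news(feeds)])
```

Because the merge is lazy, a reader can show the first screenful of posts without touching the rest of the archive, which still stays around for searching.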

Those of you paying attention to the Technorati 100 will have noticed that it is getting more international, due to the explosion of non-English blogs as Dave noted in the State of the Blogosphere. With today's update, the number one spot has changed from Boing Boing by Xeni Jardin and friends to 老徐 徐静蕾 新浪BLOG (roughly "Old Xu, Xu Jinglei's Sina Blog") by Xu Jing Lei.
Evidently a name starting with X is a big help - perhaps Xiaxue will be next?

Yes, another quarter has passed, and it is time to take a look at the numbers!

For historical perspective, you can see earlier State of the Blogosphere reports from February 2006, July 2005, from March 2005, and from October 2004.

The State of the Blogosphere is strong.

I continue to marvel at it, but the blogosphere continues to grow at a quickening pace. Technorati currently tracks 35.3 Million weblogs, and the blogosphere we track continues to double about every 6 months, as the chart below shows:

Slide0002-3

The blogosphere is over 60 times bigger than it was only 3 years ago.
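The two headline growth claims are consistent with each other: doubling every 6 months for 3 years is six doublings, or 2^6 = 64 times, so "over 60 times bigger" is what steady doubling predicts. A small sketch of both directions of that calculation:

```python
import math


def growth_factor(years: float, doubling_months: float = 6) -> float:
    """Growth multiple after `years` if size doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


def implied_doubling_months(multiple: float, years: float) -> float:
    """Doubling time (in months) implied by growing `multiple` times in `years`."""
    return years * 12 / math.log2(multiple)


if __name__ == "__main__":
    print(growth_factor(3))                             # 64.0
    print(round(implied_doubling_months(60, 3), 1))     # 6.1
```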

New blog creation continues to grow. Technorati currently tracks over 75,000 new weblogs created every day, which means that on average, a new weblog is created every second of every day - and 19.4 million bloggers (55%) are still posting 3 months after their blogs are created. That's an increase in both absolute and relative terms over just 3 months ago, when only 50.5%, or 13.7 million blogs, were active. In other words, even though there's a reasonable amount of tire-kicking going on, blogging continues to grow as a habitual activity.
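The percentages in this paragraph can be cross-checked against the totals. A quick sketch using only the figures quoted here; the implied earlier total is derived arithmetic, not a number stated in this report:

```python
def active_share(active: float, total: float) -> float:
    """Fraction of tracked blogs that are still active."""
    return active / total


def implied_total(active: float, share: float) -> float:
    """Total blog count implied by an active count and its share."""
    return active / share


if __name__ == "__main__":
    print(f"{active_share(19.4e6, 35.3e6):.1%}")                 # 55.0%
    print(f"{implied_total(13.7e6, 0.505) / 1e6:.1f} million")   # 27.1 million
    print(f"{75_000 / 86_400:.2f} new blogs per second")         # 0.87
```

So 75,000 new blogs per day works out to slightly under one per second, and 13.7 million active blogs at 50.5% implies roughly 27.1 million blogs tracked three months earlier.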

In addition to that, about 3.9 million bloggers update their blogs at least weekly. Here's a chart of the number of new blogs created each day, from January 2004 to April 2006:

Technorati new blog creation

Spam, Splogs and Spings

Spam blogs and their cousins, spings (which I described in January's report), continue to present a challenge to infrastructure providers like Technorati, as more people rely on understanding the real-time web. There has been an increase in the overall noise level in the blogosphere during 2006, but aside from a few notable spam storms ("sporms"? Just how far can you take this naming system?), noted in red in the chart above, the volume of interesting, original content being created greatly outweighs the fake or duplicate content on splogs.

Posting Volume

A better indicator of the growth of the blogosphere than simply the number of new blogs created each day is the rate of postings to those blogs. Daily Posting Volume tracked by Technorati is now over 1.2 Million posts per day, which is about 50,000 posts per hour. The blogosphere also reacts to world events. I've pointed out a number of the spikes in posting volume that have accompanied major news events in the chart below of posting volume:

Daily posting volume measured by Technorati

I wasn't able to identify all of the spikes, but I did find some of the notable ones. For example, technology product launches certainly appear to attract great interest in the blogosphere: it seems that we just can't restrain our inner geekiness when products like the iPod Video and the Intel Macintoshes are launched. Posting volumes on those two days even eclipsed blog coverage and commentary on the Super Bowl and the 2006 State of the Union speech.

In summary:

  • Technorati now tracks over 35.3 Million blogs
  • The blogosphere is doubling in size every 6 months
  • It is now over 60 times bigger than it was 3 years ago
  • On average, a new weblog is created every second of every day
  • 19.4 million bloggers (55%) are still posting 3 months after their blogs are created
  • Technorati tracks about 1.2 Million new blog posts each day, about 50,000 per hour

Next: The growth of tagging, and the Blogosphere broken down by language

In Part 1 of the State of the Blogosphere report, I covered the overall growth of the blogosphere. Today I'm going to cover the growth of the blogosphere as media, and discuss some of the emerging trends that deal with handling information overload. In a world of over 50,000 postings per hour, and over 70,000 new weblogs created each day, keeping on top of and in tune with the most interesting and influential people and topics is the new frontier beyond search. I've also got some surprises for you at the end of this post, two new features that I hope you'll find useful. But first, let's get our hands dirty in the data!

MSM vs. Blogs

To start, let's look at how attention has been shifting in the blogosphere. In the chart below, the top news and media sites are charted according to the number of bloggers linking to them, and clearly, people are still paying a lot of attention to mainstream media stalwarts like The New York Times, CNN, and The Washington Post.

Blogs and MSM

Among the sites that sit on what I call the "Big Head" of the curve (as opposed to the now-famous "long tail"), four blogs show up: BoingBoing, Engadget, PostSecret, and Daily Kos. That may look like a smaller showing than in last August's data, but a quick look a bit further down the tail starts telling a more interesting story (note that I've flipped the axes so that you can see more data):

Blogs and MSM

As you continue down the media attention curve past the "big head," the number of blogs starts to grow.

The Long Tail

The chart below shows the attention curve once you get past the blogs that look just like mainstream media above. It is important to note how long the long tail really is, which this chart at this scale doesn't show: the long tail of the blogging world goes out to 27.8 million blogs. To give a sense of scale, if I kept the chart at the same scale and printed the entire long tail on regular 8.5 x 11 inch sheets of paper in landscape mode, the complete graph would run to about 120 pages, or about 110 feet!

Blogs by authority

Movement along the curve

With so many blogs and bloggers out there, one might think that it is a lost cause for new bloggers to achieve any significant audience, and that the power curve means there's no more room left at the top of the "A-List".

Fortunately, the data shows that this isn't the case.

Thanks to the Wayback Machine, here's a look at the Technorati Top 100 as it appeared on November 26, 2002 (bear with me if the Wayback Machine is slow). Then look at it as it appeared on December 5, 2003. And again on November 30, 2004. And again on April 1, 2005. And now look at it today.

Let's take a few examples. Have a look at PostSecret. It is the #3 site on the Technorati Top 100 today, with over 12,000 sites that have linked to it in the last 180 days. It didn't even exist on the chart in April 2005. Or look at The Huffington Post. It is #5 on the Top 100. It, too, didn't exist on the chart in April 2005. Or look at Baghdad Burning, the #47 blog in April 2005. It is still posting regularly, but has fallen to #304.

This isn't meant to imply that there are no network effects, or that a power law relationship doesn't exist in the Blogosphere. Of course there are network effects. But I want to go a level or two deeper than just thinking about the blogosphere as an A-List and The Long Tail: that's far too simplistic, and leaves out some of the most interesting blogs and bloggers out there.

The Magic Middle

This realm of publishing, which I call "The Magic Middle" of the attention curve, highlights some of the most interesting and influential bloggers and publishers, often writing about topical or niche subjects, like Chocolate and Zucchini on food, Wi-fi Net News on wireless networking, TechCrunch on Internet companies, Blogging Baby on parenting, Yarn Harlot on knitting, or Stereogum on music. These blogs are interesting, topical, and influential, and in some cases are radically changing the economics of trade publishing.

At Technorati, we define this group as the bloggers who have between 20 and 1,000 other people linking to them. As the chart above shows, there are about 155,000 people who fit in this group. And what strikes me is how interesting, exciting, informative, and witty these blogs often are. I've noticed that these blogs are often more topical or focused on a niche area, like gardening, knitting, nanotech, MP3s or journalism, and a great way to find them has been through Blog Finder.
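To make the Magic Middle boundary concrete, here is a small Python sketch that places a blog on the attention curve by its inbound-link count. Only the 20 to 1,000 band comes from the definition above; treating both bounds as inclusive, and the "head" and "long tail" labels for the ranges on either side, are assumptions for illustration. The sample counts are hypothetical, not Technorati data.

```python
from collections import Counter

MAGIC_MIDDLE_MIN, MAGIC_MIDDLE_MAX = 20, 1000  # inclusive bounds (assumed)


def tier(authority: int) -> str:
    """Place a blog on the attention curve by how many blogs link to it."""
    if authority > MAGIC_MIDDLE_MAX:
        return "head"  # label for the top of the curve is an assumption
    if authority >= MAGIC_MIDDLE_MIN:
        return "magic middle"
    return "long tail"


# Hypothetical authority counts, not real Technorati data.
sample = {"blog-a": 12_000, "blog-b": 350, "blog-c": 20, "blog-d": 3, "blog-e": 1_000, "blog-f": 1_001}
print(Counter(tier(n) for n in sample.values()))
```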

Explore: Dealing with Information Overload

Given that there are a lot of interesting topical posts by influential or authoritative bloggers in those topic areas, we formulated an idea: why not use these authoritative bloggers as a new kind of editorial board? Watch what they do, what they post about, and what they link to, and use that as input to a new kind of display: a piece of media that shows you the most interesting posts and conversations related to a topic area, like food, or technology, or politics, or PR. The idea is to use the bloggers that know the most about an area or topic to help spot the interesting trends that may never hit the "A-list". We call this new section Explore, and we've seeded it with some of the most interesting topics that we could find. But one of the nice things about Explore is that there are no gatekeepers: anyone who writes interesting topical blog posts can get included simply by tagging his blog and tagging his posts.

It's still pretty new, and occasionally an irrelevant post or two sneaks into the display. We're working on fixing that, but one of the new features we're launching today is the ability to subscribe to an RSS feed of any Explore category, so you can now read the most interesting posts in your favorite newsreader.

These middle-tier blogs also define communities of interest in the blogosphere. It's easy to think of the blogosphere as a cacophony of voices spread out over a big long tail distribution. But Blog Finder and Explore help resolve these thousands of blogs into topical, relevant communities of interest that interlink, refer to one another, and often wrestle with ideas, discuss them and move them along. People often ask, "What blogs should I read?" And often a good answer is, "You should read the posts from the leading blogs in topics that interest you." Blog Finder and Explore make this possible for the first time on a wide variety of topics, and in so doing, we hope, will make the blogosphere more approachable, useful, and comprehensible to more people than ever before.

Filter By Authority: Giving YOU the power to tune your searches

There's one more big feature that I wanted to write about tonight, our new Filter By Authority feature. You can see this on all keyword search results pages, looking like this:

Filter By Authority slider

Clicking on the green slider lets you easily refine your search results to show more or fewer matching blog posts. For some searches, you might want to see only posts from blogs that have been around a while and are highly influential, so pick "a lot of authority" as shown above. I've found this great for searches on highly trafficked topics, like "George Bush" or Olympics, or on topics that are known to get a lot of spam, like mortgage or refinance. I find that it often helps me answer the question, "Who is the most influential blogger talking about ___ this week, and what did she say?"

Clicking lower on the slider gives you the ability to see how different levels of filtering affect your search results. For my ego feeds, I always want to see every single mention, so I turn off filtering for those feeds. I also love looking at the charts on the left-hand side of each search result to see what changes when I change the filter, too.
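Conceptually, the slider is a minimum-authority threshold applied to search results. The Python sketch below shows that idea under stated assumptions: the `Post` record, its `authority` field, and every numeric cut-off are hypothetical, and only the "a lot of authority" label appears in this post. Setting the threshold to zero turns filtering off, as with the ego feeds mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Post:
    blog: str
    title: str
    authority: int  # blogs linking to this post's blog (hypothetical field)


# Hypothetical slider positions; only "a lot of authority" is named in the post,
# and the numeric cut-offs are invented for illustration.
SLIDER = {"any authority": 0, "some authority": 10, "a lot of authority": 500}


def filter_by_authority(posts: list[Post], setting: str) -> list[Post]:
    """Keep posts whose blog meets the slider's threshold; "any authority" keeps all."""
    threshold = SLIDER[setting]
    return [p for p in posts if p.authority >= threshold]


results = [
    Post("big-news-blog", "Olympics recap", 4_200),
    Post("niche-blog", "Olympics from the stands", 35),
    Post("new-blog", "My first post on the Olympics", 1),
]
print([p.blog for p in filter_by_authority(results, "a lot of authority")])
```

Filtering rather than re-ranking keeps the result order untouched, so moving the slider only adds or removes voices.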

As we implemented this feature, we spent a lot of time thinking about how to name it. We frequently use the term authority on our site when we talk about inbound links, as in "a link is a vote of authority." So, to maintain consistency, we called this new feature "filter by authority." But in no way should this imply a value judgment. More authority doesn't necessarily mean better or more interesting. In many instances, less authority yields more interesting results: a greater diversity of opinion, less mainstream thinking, more individual voices. The authority filter is a tool to fine-tune results, and it's a great way to zoom in on the voices that are commanding the most attention, and then zoom back out and listen to the whole diverse medium that is the blogosphere. With so many voices, we're happy to add a new tuning control!

This new feature is a beta feature, so we're looking for your feedback! Do you like it? Find it useful? Or is it confusing? What about the name? We tried a number of different names for the feature, but ended up picking "filter by authority" since we speak about a blogger's authority as being based on the number of links he gets from other people, but it isn't a perfect analogy. In the end, we decided that rather than having the perfect name, we'd much rather get the feature out there for all of you to try, and we'd listen intently to your feedback and comments.

In Summary
  • Blogging and Mainstream Media continue to share attention in bloggers' and readers' minds, but bloggers are climbing higher on the "big head" of the attention curve, with some bloggers getting more attention than sites including Forbes, PBS, MTV, and the CBC.
  • Continuing down the attention curve, blogs take a more and more significant position, as the economics of mainstream publishing models make it cost-prohibitive to build many niche sites and media.
  • Bloggers are changing the economics of the trade magazine space, with strong entries covering WiFi, Gadgets, Internet, Photography, Music, and other niche topic areas, making it easier to thrive, even on less aggregate traffic.
  • There is a network effect in the Technorati Top 100 blogs, with a tendency to remain highly linked if the blogger continues to post regularly and with quality content.
  • Looking at the historical data shows that the inertia in the Top 100 is very low: in other words, the number of new blogs jumping to the top of the Top 100, as well as the blogs that have fallen out of the Top 100, show that the network effect is relatively weak.
  • The Magic Middle is the 155,000 or so weblogs that have garnered between 20 and 1,000 inbound links. It is a realm of topical authority and significant posting and conversation within the blogosphere.
  • Technorati Explore is a new feature that uses the authoritative topical bloggers as a distributed editorial team, highlighting the most interesting blog posts and links in over 2,500 categories.
  • The new Filter By Authority slider makes it easy to refine a search and look for either a wider array of thoughts and opinions, or to narrow the search to only bloggers that have lots of other people linking to them. This gives you the power to decide how much filtering you want.

Technorati is now hosting GrabPERF, a blog performance monitoring project created by Stephen Pierzchala in a Silicon Valley basement. We've been big fans of GrabPERF's distributed monitoring service and want to make sure it continues to serve the web community. Stephen has posted his own announcement on his blog.

GrabPERF rack

Above is a picture of the old GrabPERF rack, complete with a small wine cellar and cable modem. GrabPERF is now hosted at 365 Main in San Francisco along with Technorati's existing hundreds of servers. The web server and database are now connected to a fatter bandwidth pipe, allowing Dave and other stat-loving geeks to refresh the page with quicker response times.

GrabPERF monitoring locations are spread throughout the world to emulate the load times and experiences of a variety of users. Technorati numbers reported by the service are collected from these various survey locations throughout the world and are not affected by the new hosting arrangement.

Thanks to everyone who helped keep the service alive and who continue to make GrabPERF, and web services in general, a success.

It's been 4 months since last October's State of the Blogosphere report, so it's time to update the numbers! For historical perspective, you can see earlier State of the Blogosphere reports from July 2005, from March 2005, and from October 2004.

The State of the Blogosphere is Strong

OK, I'm paraphrasing from a more famous speech that happened last week, but the truth is that the blogosphere continues to grow at a quickening pace. Technorati currently tracks 27.2 million weblogs, and the blogosphere we track continues to double about every 5.5 months, as the chart below shows:

Weblogs tracked by Technorati

The blogosphere is over 60 times bigger than it was only 3 years ago.

New blog creation continues to grow. We currently track over 75,000 new weblogs created every day, which means that on average, a new weblog is created every second of every day - and 13.7 million bloggers are still posting 3 months after their blogs are created. In other words, even though there's a reasonable amount of tire-kicking going on, blogging is growing as a habitual activity. In October of 2005, when Technorati was only tracking 19 million blogs, about 10.4 million bloggers were still posting 3 months after the creation of their blogs.

In addition to that, about 2.7 million bloggers update their blogs at least weekly. Here's a chart of the number of new blogs created each day, from January 2004 to January 2006:

Technorati new blog creation

Dealing with Spam

There has been an increase in the overall noise level in the blogosphere, most notably in the number of spam and fake pings that are sent - what I call "spings." These spam pings are fake or bogus notifications that a blog has been updated; in some cases, these spings can amount to a denial-of-service attack, and can sometimes account for as much as 60% of the total pings Technorati receives. However, we've built a sophisticated system that mitigates the spings, and helps to keep spam blogs out of our indexes. Beyond that, about 9% of new blogs are spam or machine generated, or are attempts to create link farms or click fraud. Technorati continues to take an ecosystem approach to solving this problem, working closely with other players such as Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Tucows, WordPress and Yahoo!, and there will be another Web Spam Squashing Summit this spring, building on the success of the previous two summits.

A News Cycle Measured in Megahertz

Moving beyond spam, the number of people reaching out and reaching each other continues to grow. Daily Posting Volume tracked by Technorati continues to grow, and the blogosphere also reacts to world events. I've pointed out a number of the spikes in posting volume that have accompanied major news events in the chart below of posting volume:

Daily posting volume measured by Technorati

We track about 1.2 million posts each day, which means that there are about 50,000 posts each hour. At that rate, it is literally impossible to read everything that is relevant to an issue or subject, and a new challenge has presented itself - how to make sense out of this monstrous conversation, and how to find the most interesting and authoritative information out there.

The Continued Rise of Tagging

In January 2005, Technorati launched its tagging service, based on the rel=tag microformat, which is a simple way for bloggers to associate their posts with topics, and to make it easy for people to find interesting posts on a given subject. Today, we have tracked over 81 million posts with tags or categories - and over 400,000 new tagged posts are created every day. The chart below shows the immense growth of tagging in the past year:

Tags tracked by Technorati
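The rel=tag microformat mentioned above is simple enough to read with a few lines of code: a post marks a link with `rel="tag"`, and the tag is the last segment of the linked URL's path, with any query string or fragment ignored. Below is a minimal Python sketch using only the standard library; the `extract_tags` helper and the sample markup are hypothetical, not Technorati's crawler.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links (rel-tag microformat)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, element, attrs):
        if element != "a":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="tag nofollow".
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" in rels and href:
            # The tag is the last path segment: query and fragment are ignored,
            # a trailing slash is dropped, and %-escapes are decoded.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


def extract_tags(html: str) -> list[str]:
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


SAMPLE_POST_HTML = (
    '<p>Filed under <a href="http://technorati.com/tag/food" rel="tag">food</a>, '
    '<a rel="tag" href="http://example.com/tags/zucchini/?ref=feed">zucchini</a>, '
    '<a rel="tag" href="http://technorati.com/tag/Du%20bist%20Deutschland">Du bist Deutschland</a> '
    'and <a href="http://example.com/about">about me</a>.</p>'
)
print(extract_tags(SAMPLE_POST_HTML))  # ['food', 'zucchini', 'Du bist Deutschland']
```

Because the tag lives in the URL rather than in the link text, a blogger can link to any tag space (Technorati's or their own category pages) and an aggregator still reads the same tag.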

Tags for Blogs

There was still a major problem, however - how to easily find the most interesting blogs on the subjects that you cared about. So, in September 2005, Technorati launched Blog Finder, a tags-based way for people to find the most authoritative blogs on a particular subject, allowing bloggers to tag their blogs with the subjects they felt were most relevant for themselves. In 4 months, over 850,000 blogs have been put into Blog Finder, making it the most comprehensive directory of blogs on the web. Over 2,500 categories have already attracted a critical mass of influential bloggers writing about them, from Politics and Technology to Gardening or Erotica. And more are created every day, making it easier for people to find the most interesting blogs in the topics they care about.

In summary:
  • Technorati now tracks over 27.2 Million blogs
  • The blogosphere is doubling in size every 5 and a half months
  • It is now over 60 times bigger than it was 3 years ago
  • On average, a new weblog is created every second of every day
  • 13.7 million bloggers are still posting 3 months after their blogs are created
  • Spings (Spam Pings) can sometimes account for as much as 60% of the total daily pings Technorati receives
  • Sophisticated spam management tools eliminate the spings and find that about 9% of new blogs are spam or machine generated
  • Technorati tracks about 1.2 million new blog posts each day, about 50,000 per hour
  • Over 81 million posts with tags since January 2005, increasing by 400,000 per day
  • Blog Finder has over 850,000 blogs, and over 2,500 popular categories have attracted a critical mass of topical bloggers

Tomorrow: Going beyond search and tags, to discovery.

You may have noticed that "Du bist Deutschland" ("You are Germany") has been the top search on Technorati for some time. Non-German-speaking readers likely aren't familiar with what's going on in the German blogosphere, so let me see if I can explain a bit.

A PR campaign with the slogan "Du bist Deutschland" has been running in Germany for some time, and it has been quite controversial there. Among other things, someone found a photo from around 1935 of Nazi propaganda using a similar phrase. As far as I can tell, this blog post first turned up the photo (Correction: it was first discovered on a forum.). Also, see this post for some more background (in English).

All of this controversy has been going on since at least last November, but has been reignited lately. Last week, Jean-Remy von Matt, an executive at the organization running the campaign, sent an email to his colleagues, which was soon posted to a blog by Jens Scholz (English translation on Jens' blog).

The second point of the email refers to blogs and has another English translation on Scott Hanson's blog.

In short, he called blogs the "bathroom walls of the Internet" and bloggers "people who only exude"*. As evidence of this filthiness, he suggested that people enter "Du bist Deutschland" into technorati.com to see what bloggers have said about the campaign. As a result, a large number of people have been searching for "Du bist Deutschland" on Technorati.

For more explanation, see also Der Spiegel's take on this controversy.

* It appears the German word used here, absondern, is a derogatory term for sharing one's thoughts, which can be translated as 'exude' or as one of several coarser verbs.

Many blog publishers rely on the ping relay service Ping-o-Matic to send blog update notifications to other services, including Technorati's ping server. Ping-o-Matic has suffered brief outages in the past, and the site was recently completely offline. To ensure your blog update notifications are received by Technorati, you should ping Technorati's servers directly.

If you use WordPress and have not changed your ping settings, your update notifications were not reaching any services, including Technorati and many online aggregators. To add Technorati's ping server to your list of update services, please follow the instructions on our WordPress ping configuration page.
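Pinging a server directly usually means sending the standard `weblogUpdates.ping` XML-RPC call with your blog's name and URL. Here is a minimal Python sketch using the standard library's `xmlrpc.client`. The endpoint `http://rpc.technorati.com/rpc/ping` is an assumption (the address commonly configured for Technorati at the time), and the `build_ping` and `send_ping` helpers are hypothetical; for WordPress, the configuration page mentioned above is still the simpler route.

```python
import xmlrpc.client

# Assumed endpoint: the address commonly configured for Technorati at the time.
# It may no longer respond; nothing below contacts it unless send_ping() is called.
TECHNORATI_PING_URL = "http://rpc.technorati.com/rpc/ping"


def build_ping(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping request (blog name, blog URL) as XML-RPC."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_ping(blog_name: str, blog_url: str, endpoint: str = TECHNORATI_PING_URL):
    """Send the ping and return the server's reply, typically a struct with
    flerror and message fields."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


print(build_ping("My Blog", "http://example.com/"))
```

Since the relay and the direct ping send the same request, pinging both only costs one extra HTTP call per update.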

Technorati has contributed hardware and some coding help to the Ping-o-Matic team in their efforts to provide a service to the blogging community. We will continue to work with Ping-o-Matic to make blog update notifications quick and simple, but we recommend that you add Technorati to your update notification settings to ensure the message reaches its final destination.

Update: As of Friday evening, the service is back online.

The blogosphere is abuzz with Google's launch of their Blog Search. So far things look pretty interesting, and having a big traditional search player like Google working on blog search is a validation moment for the entire blogosphere.

This will mark a major milestone for the World Live Web. At Technorati, we have a tremendous amount of respect for the Google team and for everything they've done in the world of search. I'm sure that they'll continue to improve over the coming months, perhaps including tags, recent images and links, zeitgeists, blogger tools, and other types of semistructured data. I'm sure that they'll also start indexing the full text of blog posts, not just the partial text found in most blog feeds.

I welcome the competition. We've got some tricks up our sleeves too - and there's no doubt that in the end, the competition will end up producing more innovation and better services for bloggers and readers.

Welcome to the party, Google!

Today I'll discuss the impact of weblogs on mainstream media, the impact of the A-List, and the power of the long tail. You can compare today's report with the one in March 2005 and October 2005.

First off, some terminology and an understanding of what we're measuring. The chart below illustrates a measure of influence or authority of a site or blog as measured by the number of people who are linking to it. Note that this is not a measure of page views or website "hits". Rather, Technorati looks at linking behavior as a proxy for attention and influence. In other words, the more people who link to a site or blog, the more influence it has on others.


Technorati blogs and MSM slide

As the chart above shows, the most influential media sites on the web are still well-funded mainstream media sites, like The New York Times, The Washington Post, and CNN. However, a lot of bloggers are achieving a significant amount of attention and influence. Blogs like bOingbOing, Daily Kos, and Instapundit are highly influential, especially among technology and political thought leaders, and sites like Gizmodo and Engadget are seeing as much influence as mainstream media sites like the LA Times. A note on counting: some organizations with multiple domains or highly syndicated content, like the Associated Press and Reuters, are underrepresented in this chart, given that their impact is not easily countable using our methods. An interesting statistic to note is the current placement of subscription sites like WSJ.com (the Wall Street Journal). While the WSJ has begun to offer some content outside of its subscriber-only site, the policy is clearly costing it some influence and attention in the blogosphere, as bloggers find it difficult to link to articles in the subscriber-only sections. Also interesting to note is that even though The New York Times and The Washington Post require free registration to view articles, bloggers are still linking to the stories, and this behavior hasn't changed much in the past 6 months.
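The measure described here counts people who link, not raw links or page views. A minimal Python sketch of that idea follows: it counts distinct linking blogs within a time window, so a blog that links ten times still counts once. The 180-day default echoes the Top 100 figures quoted elsewhere on this blog; the record format, the `authority` helper, and the sample data are assumptions, not Technorati's actual pipeline.

```python
from datetime import date, timedelta


def authority(links, target, today, window_days=180):
    """Count distinct blogs that linked to `target` within the window.

    `links` is an iterable of (source_blog, target_site, date) tuples. Several
    links from the same blog count once, so this measures blogs, not links.
    """
    cutoff = today - timedelta(days=window_days)
    return len({src for src, dst, when in links if dst == target and when >= cutoff})


# Hypothetical link records, not real data.
links = [
    ("blog-a", "nytimes.com", date(2005, 8, 1)),
    ("blog-a", "nytimes.com", date(2005, 8, 2)),   # same blog again: not counted twice
    ("blog-b", "nytimes.com", date(2005, 7, 15)),
    ("blog-c", "nytimes.com", date(2004, 12, 1)),  # outside the 180-day window
    ("blog-b", "cnn.com", date(2005, 8, 3)),
]
print(authority(links, "nytimes.com", today=date(2005, 8, 10)))  # 2
```

Counting distinct sources is also what blunts the effect of a single site linking over and over, which matters for the spam discussion below.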

More to come later in the week, including all of the underlying data...

Today I will write about some of the darker sides of the blogosphere, including the increase in spam and fake blogs, and comment and trackback spam. Along with the growth in the blogosphere (as reported in parts 1, 2 and 3 last week), Technorati has also been tracking an increase in the number of people who are trying to manipulate the blogosphere. First off, some definitions:

Spam Blogs

Spam blogs are blogs that are created in order to influence results on a search engine by filling the results with spam or fake postings. Sometimes this is done to influence page rank-type algorithms, which monitor the number of pages (in this case blog postings) that link to a page or a site. In the more general web sense, these are called "Link Farms". Sometimes it is to push higher rankings of those posts and blogs for certain keywords, also known as "keyword stuffing". Quite a bit has already been written about link farms and keyword stuffing; they are pretty well-known techniques used by some people to influence search ranking. They are also pretty easy to catch, and most search engines actively penalize or exclude these sites from their index. Here are some example spam blogs.

Fake Blogs

Fake Blogs are blogs that appear "blog-like" on the surface: they have numerous posts, usually around a particular area or subject, and at first glance look as if they were created by a person. However, these blogs are actually automated creatures created by programs, usually in order to get highly targeted AdSense advertising, or in some cases built to become a portal for affiliate systems like the Amazon Associates program. They are created to perpetrate click fraud, or sometimes as part of a "make money fast" scam on the internet, again by taking advantage of traffic brought to them by search engines and web rings. Here are some example fake blogs.

I should note that some fake blogs may very well contain interesting and relevant content, which opens a debate about how useful or valuable they are. This is why I don't lump fake blogs in with spam blogs (as defined above): it is debatable whether these systems actually provide readers some value.

Comment and Trackback Spam

Modern blogging systems allow for comments and trackbacks as ways of allowing readers or other bloggers to easily add their thoughts and comments to a post. Unfortunately, some spammers have been abusing these systems as well. Many hosting providers and tool makers have incorporated authentication mechanisms and captchas to make it more difficult to automate the tasks. They have also added moderation capabilities, and many vendors now turn moderation on by default for new blogs. Early this year, a number of search engines, including Technorati, adopted the rel="nofollow" microformat. This latest set of salvos has worked quite well in many cases, but there are thunderclouds on the horizon: research into defeating captcha systems has been effective, and I expect this to remain a battleground in the future.

So what's being done about it?

The people who build spam and fake blogs think that they can get some kind of advantage, usually additional search engine rankings or affiliate income. In essence, they believe that there is an economic incentive that spurs them on, and at Technorati, we've been working together with leading players to eliminate that incentive. We're working with the folks who run web advertising systems and major affiliate programs to alert them to spammers as quickly as possible. We've been building real-time systems to identify spammers and fake blogs, and sharing that information with other web search engines so that link farms and keyword stuffers see no increases in search rankings.

Now, that doesn't mean that some of these blogs won't slip through: it requires a lot of algorithms, deep thinking, and human intervention to build and monitor systems that deal with these problems. It is also an ongoing issue that needs time, care and attention as spammers come up with new and innovative ways to game search engines and affiliate networks. It would be disingenuous of me to proclaim that the folks at Technorati have got it all solved. We don't. But we've been putting a lot of time and effort into building those systems, and we're going to continue to innovate as well.

Technorati doesn't index comments or trackback content or links, and we also support the nofollow tag (you'll note I used it when linking to the example spam and fake blogs above) to give greater control to bloggers who want to point to spam or fake blogs without implicitly endorsing the site.
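Supporting nofollow means a link marked `rel="nofollow"` is not counted as an endorsement. A minimal Python sketch of that rule: collect only the links that do not carry `nofollow` among their space-separated `rel` values. The `followed_links` helper and the sample comment markup are hypothetical, not Technorati's link counter.

```python
from html.parser import HTMLParser


class FollowedLinkParser(HTMLParser):
    """Collect hrefs of links that do not carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, element, attrs):
        if element != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel is a space-separated list, so "external nofollow" also counts as nofollow.
        rels = (attrs.get("rel") or "").lower().split()
        if href and "nofollow" not in rels:
            self.followed.append(href)


def followed_links(html: str) -> list[str]:
    parser = FollowedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.followed


COMMENT_HTML = (
    '<p>Great post! <a href="http://spam.example/pills" rel="nofollow">cheap pills</a> '
    'See also <a href="http://example.com/related">my related post</a> and '
    '<a rel="external nofollow" href="http://example.org/x">this</a>.</p>'
)
print(followed_links(COMMENT_HTML))  # ['http://example.com/related']
```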

We've also been working on a number of social methods to help filter the blogosphere so that bloggers and readers can help separate the wheat from the chaff. Expect to see more from us on this in the coming months.

Web 2.0 Spam Squashing Summit

In February 2005, the first Web 2.0 Spam Squashing Summit was held in Silicon Valley. Key industry players such as AOL, Google, MSN, Six Apart and Yahoo were all in attendance at the standing-room-only event, and it engendered a lot of industry cooperation and communication.

Working together with the same group of folks, the second Web Spam Squashing Summit will be held in the second half of September in Silicon Valley again. Final details are still being arranged, but representatives from Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Tucows, and WordPress have all confirmed their plans to attend the event.

More to come, including an open invitation to others in the industry, in the next few weeks. Watch this space.

Summary:
  • Along with the explosive growth in the blogosphere, there has also been a growth in spam blogs and fake blogs.
  • These blogs are almost always created by automated programs, not by people.
  • They are usually created with an economic incentive - to get better search engine rankings, or to create affiliate or advertising revenue.
  • Technorati has been working closely with major toolmakers, search engines, and hosting providers to quickly identify and stamp out spam and fake blogs.
  • The key to reducing blog spam is to eliminate economic incentives, and we are working with major advertising and affiliate programs to create roadblocks for spammers and creators of fake blogs.
  • Industry players including Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Technorati, Tucows, WordPress, and others are getting together in the second half of September for the second Web 2.0 Spam Squashing Summit.

Coming next: Blogs and the Mainstream Media.

Today's post is going to cover some new ground: tags. This is new ground because Technorati only started tracking and displaying blog post tags in January 2005.

A brief introduction to tags:

Tags are simply categories or topics. Most blog tools make it easy to categorize your posts, and, working with the microformats community, Technorati implemented a simple way to track and aggregate blog posts, photos, and links that are all categorized, or "tagged," with the same name. Unlike rigid taxonomy schemes, which many people dislike using, tagging is easy to do for personal organization, and combined with social incentives it leads to a rich and discoverable system, often called a folksonomy. Intelligence is provided from the bottom up by real people to aid social discovery. And with the right tag search and navigation, a folksonomy may outperform more structured approaches to classification, as Clay Shirky points out:

This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture.

For those of you interested in a deeper explanation, you can get more information on tags and on Technorati's tagging implementation, including how it works, and browse the top 250 tags in Roman-alphabet languages as well as across all languages.
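The rel-tag microformat mentioned above marks a tag as an ordinary link with rel="tag", and the tag name is the last path segment of the link's URL. The Python sketch below shows how posts could be grouped by such tags. It is not Technorati's implementation: the helper names `tags_in` and `index_by_tag` and the sample blog URLs are hypothetical, and the technorati.com/tag/ link targets are used only as example tag URLs.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tag names from <a rel="tag"> links (rel-tag microformat)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "tag" in rel_values and href:
            # The tag name is the last path segment of the link target,
            # percent-decoded (e.g. "web%202.0" becomes "web 2.0").
            path = urlparse(href).path.rstrip("/")
            segment = path.rsplit("/", 1)[-1]
            if segment:
                self.tags.append(unquote(segment))


def tags_in(html):
    """Return the tags found in one post's HTML, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def index_by_tag(posts):
    """Map each tag to the URLs of the posts that carry it."""
    index = defaultdict(list)
    for url, html in posts.items():
        for tag in set(tags_in(html)):  # count each tag once per post
            index[tag].append(url)
    return dict(index)


posts = {
    "http://blog-a.example/1": '<a rel="tag" href="http://technorati.com/tag/tsunami">tsunami</a>',
    "http://blog-b.example/7": (
        '<a rel="tag" href="http://technorati.com/tag/tsunami">tsunami</a> '
        '<a rel="tag" href="http://technorati.com/tag/relief">relief</a>'
    ),
}
index = index_by_tag(posts)
print(index["tsunami"])  # ['http://blog-a.example/1', 'http://blog-b.example/7']
```

Because any author can mint a tag just by linking to a new URL, the index grows from the bottom up, which is the folksonomy property described above.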

Some Statistics:

First, a look at the total number of blog posts with tags. The pickup rate has been nothing short of remarkable: there are now over 25 million blog posts with categories or tags, as shown in the chart below.


Blog posts with tags July 2005

I can honestly say that no one at Technorati was expecting an adoption rate of that magnitude.

The chart below shows the number of tagged blog posts that we indexed each day from January through July of 2005:


Daily tag volume July 2005

Almost a third of each day's blog posts use tags or categories: just over 300,000 posts each day at the end of July. What is also interesting is that people are busily creating the "long tail" of tags every day. In other words, lots of people are creating new tags for specific purposes, like conferences or travelogues, and some are using tags to help build communities around a topic. There are even spammers who tag (more on that tomorrow, grr). Some bloggers are using tags to organize information around an area or topic, as in the folksonomy example cited above, and event organizers are encouraging this by suggesting tags for participants to use in their blog posts and photos and on social bookmarking services like del.icio.us and Furl. The chart below shows the number of brand-new tags tracked each day. Note how it starts off with a big spike: nearly 100,000 unique tags were tracked in the first week.


New tags per day

The numbers dipped somewhat once most common words had already been used as tags. However, growth in non-English languages, especially Asian languages such as Chinese and Japanese, has raised the average number of new tags to about 12,000 per day.
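The shape of the new-tags chart follows from how "new" is counted: a tag counts only on the first day it is ever seen, so the early days are full of first sightings and later days mostly repeat known words. A minimal Python sketch of that counting, assuming the input is simply the tags observed on each day (the function name and sample tags are hypothetical):

```python
from datetime import date


def new_tags_per_day(daily_tags):
    """Count tags seen for the first time on each day.

    daily_tags maps a date to the tags observed in that day's posts.
    Days are processed in chronological order.
    """
    seen = set()
    counts = {}
    for day in sorted(daily_tags):
        todays = set(daily_tags[day])
        fresh = todays - seen  # tags never observed on any earlier day
        counts[day] = len(fresh)
        seen |= fresh
    return counts


sample = {
    date(2005, 1, 10): ["iraq", "ipod", "tsunami"],
    date(2005, 1, 11): ["ipod", "tsunami", "relief"],
    date(2005, 1, 12): ["ipod", "relief"],
}
print(list(new_tags_per_day(sample).values()))  # [3, 1, 0]
```

A steady stream of new tags after the first weeks therefore signals genuinely new vocabulary, such as tags in newly active languages, rather than repeat use of common words.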

Of course, because tagging is such a new thing, predicting where it will go is anyone's guess. I believe that as long as the tagging system is set up to encourage accountability (e.g. link-based tags inside a blog post) and discourage gaming, the resulting folksonomy will continue to prove useful, helping even non-bloggers see a more organized world.

Summary:
  • Growth has been tremendous in the last 6 months: Technorati has tracked over 25 million tagged posts from January to July of 2005.
  • About 300,000 posts with tags were tracked each day at the end of July.
  • About a third of all blog postings use tags or categories.
  • People are tagging more than blog posts: popular services include tagging photos and links (social bookmarks).
  • About 12,000 unique tags are discovered each day.
  • Tagging is growing in languages outside of English as well, including high adoption rates in Asian languages like Chinese and Japanese.

Oh, and one more thing: thanks to the computer visualization whizzes at the School of Art at Carnegie Mellon University, we came up with a video that shows the growth of tags in the blogosphere. You can see the most popular tags tracked each day as time goes from January (when things were still on the workbench) to late June 2005, when Technorati had tracked a total of about 20 million tagged posts.

This is the video that was shown at the AlwaysOn conference last month, and we've had numerous requests to put it up on the internet. Thanks to a very generous donation of storage and bandwidth from our friends at Ourmedia.org and The Internet Archive, the video is now hosted on their servers. You can watch the 320x160 version or the full-size video.

Please note that the full-size video is 61 megabytes and the smaller video is 12.2 megabytes, so they may take a while to load on low-bandwidth connections. The video is licensed under a Creative Commons Attribution-NonCommercial license, so go ahead and remix, mash, and have fun with it; we had a blast making it.

Tomorrow: More on Spam...

Onwards and upwards! This is part 2 of the August 2005 State of the Blogosphere. Part 1 covered the overall growth of the blogosphere in terms of new blogs created. Today I'll discuss the number of posts made each day, also known as posting volume. Just to keep everyone updated on that set of statistics, here's what I wrote back in March 2005:

To expand on my post yesterday on the overall growth of the number of weblogs, today I'm going to look at another important measure of the growth of the blogosphere: posting volume. A post is a single entry in a weblog, whether a long essay or a short note, and posting volume is the aggregate number of posts per day. Just as it is important to note the growth in the number of weblogs out there, it is as important, if not more so, to see whether blogging is a fad or whether people are blogging at a sustained rate. The chart below shows that posting volume has been growing. (Compare with the chart from October 2004.)

Here's that same chart updated with data through to the end of July 2005 (Compare with the chart from March 2005):


Technorati posts per day annotated with world event milestones

As you can see from the black trend line, posting volume has followed a strong upward trend. After a brief dip last winter, the average posting rate has grown steadily, and at the end of July 2005 there were about 900,000 posts created each day. That's about 37,500 posts every hour, or 10.4 posts per second. Volume peaked at just over 1.1 million posts per day after the Live 8 concerts and Justice Sandra Day O'Connor's announcement of her resignation from the US Supreme Court.
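The hourly and per-second rates above are straight unit conversions from the daily figure. The short Python check below reproduces them; the peak-day rate is derived here from the "just over 1.1 million" figure and was not stated in the original post.

```python
POSTS_PER_DAY = 900_000        # end-of-July 2005 average quoted above
PEAK_POSTS_PER_DAY = 1_100_000  # peak day, "just over 1.1 million"
HOURS_PER_DAY = 24
SECONDS_PER_HOUR = 3_600

posts_per_hour = POSTS_PER_DAY / HOURS_PER_DAY
posts_per_second = posts_per_hour / SECONDS_PER_HOUR
peak_posts_per_second = PEAK_POSTS_PER_DAY / (HOURS_PER_DAY * SECONDS_PER_HOUR)

print(f"{posts_per_hour:,.0f} posts per hour")              # 37,500 posts per hour
print(f"{posts_per_second:.1f} posts per second")           # 10.4 posts per second
print(f"{peak_posts_per_second:.1f} posts per second peak")  # 12.7 posts per second peak
```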

In fact, posting volume more than doubled in the 7 months from the beginning of January 2005 to the end of July 2005. Partly this is due to the tremendous popularity of simple hosted blog services like MSN Spaces, AOL Journals, Blogger, and LiveJournal. We've also seen a lot of people take up blogging thanks to tools like post-from-IM, a feature available to AOL and MSN users that lets them post from their instant messaging clients. There's also been a significant jump in tools that make it easy to post to weblogs, including Flickr, TextAmerica, Buzznet, del.icio.us, and others, so posting can be as easy as tagging an interesting link or snapping a photo with your cameraphone.

I'd also like to point out that Technorati's median time from post to index has now dropped to under 5 minutes. That means at least half of all public blog posts are indexed by Technorati within 5 minutes of being created or modified, and are then available in our search and tag results. This is part of the recent performance and scaling work we've been doing.
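The median is the natural statistic for indexing latency because a handful of slow-to-fetch blogs can drag the mean far above what a typical post experiences. The Python sketch below uses the standard statistics module on invented delay values (they are not Technorati measurements) to show the gap.

```python
from statistics import mean, median

# Hypothetical delays, in minutes, between a post being published and it
# appearing in the index. Two slow outliers are included on purpose.
delays_minutes = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 30.0, 95.0]

print(median(delays_minutes))          # 3.5
print(round(mean(delays_minutes), 1))  # 16.2
```

Here the median stays under 5 minutes, and seven of the nine posts are indexed within 5 minutes, even though the mean is above 16 minutes.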

I always find it interesting to look at the spikes in posting volume and see what they tell us about the current events that caused a significant reaction in the blogosphere. I've marked a few of them on the chart above, including the US political conventions last summer, the Indian Ocean tsunami, the Super Bowl, Live 8, and the London bombings. Please note that the absolute number of posts does not indicate the importance of an event; remember that there are a lot more bloggers today than there were 6 months ago. However, it is very interesting to look at the percentage deviation from the norm that each spike represents: the bigger the relative spike, the more jarring the event was to the overall blogosphere.
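The post doesn't define "the norm," so the Python sketch below assumes it is the mean volume of the 7 days before the spike; the function name, the window length, and the daily volumes are all hypothetical. Measuring relative deviation this way lets a spike in a small blogosphere be compared fairly with one in a much larger blogosphere.

```python
def spike_deviation(daily_posts, day_index, window=7):
    """Percent by which one day's posting volume exceeds its baseline.

    Assumed definition: the baseline is the mean of the `window` days
    immediately before day_index.
    """
    if day_index < window:
        raise ValueError("not enough history for the baseline window")
    history = daily_posts[day_index - window:day_index]
    baseline = sum(history) / window
    return 100.0 * (daily_posts[day_index] - baseline) / baseline


# Hypothetical volumes: a quiet week followed by a news-driven spike.
volumes = [880_000, 900_000, 910_000, 890_000, 905_000, 895_000, 920_000, 1_100_000]
print(f"{spike_deviation(volumes, 7):.1f}%")  # 22.2%
```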

On the larger chart you can also see the effect that weekends have on posting volume: generally a drop of 5-10% from weekday volume. Not shown on this chart is intraday posting volume: we see the largest number of posts each day between 7 a.m. and noon Pacific time, which is between 10 a.m. and 3 p.m. Eastern time in the US.
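One way to express the weekend effect is to compare average weekend volume with average weekday volume. The Python sketch below does that with the standard datetime module; the function name and the week of volumes are hypothetical, chosen only to land inside the 5-10% range described above.

```python
from datetime import date, timedelta


def weekend_drop(start, daily_posts):
    """Percent by which average weekend volume falls below weekday volume.

    start is the date of daily_posts[0]; each later entry is the next day.
    """
    weekday, weekend = [], []
    for offset, count in enumerate(daily_posts):
        day = start + timedelta(days=offset)
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        (weekend if day.weekday() >= 5 else weekday).append(count)
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    return 100.0 * (weekday_avg - weekend_avg) / weekday_avg


# Hypothetical volumes for the week starting Monday, July 25, 2005.
week = [920_000, 930_000, 925_000, 935_000, 890_000, 840_000, 830_000]
print(f"{weekend_drop(date(2005, 7, 25), week):.1f}%")  # 9.2%
```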

Summary:
  • Technorati is tracking about 900,000 blog posts created every day.
  • That's about 10.4 blog posts per second, on average.
  • Median time from posting to inclusion in the Technorati index is under 5 minutes.
  • Significant increases in posting volume are due to increased mainstream use of easy hosted tools as well as simple posting interfaces like post-from-IM and moblogging tools.
  • Weekends tend to be slower posting days, with volume about 5-10% below weekday levels.
  • During the day, posting tends to peak between the hours of 7 a.m. and noon Pacific time (10 a.m. - 3 p.m. Eastern time).
  • Worldwide news events cause ripples through the blogosphere - not only in search volume, but also in posting volume.

More tomorrow - including the growth of tags.

Well, it is that time again! It has been almost 6 months since the last State of the Blogosphere, so the team at Technorati and I have put together some high-level information on what we've been tracking. Today I'll focus on the macro growth of the blogosphere, both in the number of bloggers out there and in the number of new blogs created per day. You can compare the chart below to the charts from October 2004 and March 2005.

Cumulative number of Weblogs Tracked by Technorati

As of the end of July 2005, Technorati was tracking over 14.2 million weblogs and over 1.3 billion links. That is just about double the 7.8 million blogs we were tracking in March 2005, 5 months earlier, which means the blogosphere continues to double about every 5.5 months.
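Assuming steady exponential growth, a doubling time follows from any two snapshots: T = t × ln 2 / ln(N_end / N_start), where t is the time between them. The Python sketch below applies that formula to only the two figures in this paragraph. It gives a little under 6 months, close to the 5.5-month figure; treat it as a rough check on the two-point numbers, not as a replacement for the longer-term trend.

```python
import math


def doubling_time(start_count, end_count, months_elapsed):
    """Months needed to double, assuming steady exponential growth."""
    growth_factor = end_count / start_count
    return months_elapsed * math.log(2) / math.log(growth_factor)


# Snapshots quoted above: 7.8 million blogs in March 2005 and
# 14.2 million at the end of July 2005, about 5 months apart.
print(f"{doubling_time(7.8e6, 14.2e6, 5):.1f} months")  # 5.8 months
```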

MSN Spaces, Blogger, LiveJournal, AOL Journals, and a number of international hosted services are growing quickly, and use of software like WordPress and Movable Type to run blogs continues to grow significantly. A growing number of WordPress-based hosting services are appearing, including Laughing Squid, DreamHost, and BlueHost, which marks an interesting trend: ISPs and hosting providers using the GPL'ed software as a differentiating feature of their services. Moblogging sites like Textamerica and Buzznet have also been growing as more people blog from their camera-enabled mobile phones. Growth has not occurred only in the US; there has also been a lot of blog growth in Japan, Korea, China, France, and Brazil, to name a few countries.

Here's a view of the number of new blogs created each day that Technorati is tracking, even after removing spam blogs from our index (more on that later in the week):


Blogs created per day

You can compare this with the charts from March 2005 and November 2004 to see how it is increasing, although all the data is included in the chart above. Technorati is now tracking about 80,000 new weblogs being created every day, which means a new weblog is created about every second. About 55% of all blogs are active, meaning they have had a posting in the last 3 months. In addition, 13% of all weblogs (currently 1.8 million blogs) are updated at least weekly.
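The "about every second" and "1.8 million" figures can be checked with simple arithmetic against the 86,400 seconds in a day and the 14.2 million blogs tracked. The Python lines below do that; the active-blog count is derived here and was not stated separately in the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
NEW_BLOGS_PER_DAY = 80_000       # figure quoted above
TOTAL_BLOGS = 14_200_000         # blogs tracked at the end of July 2005

seconds_between_new_blogs = SECONDS_PER_DAY / NEW_BLOGS_PER_DAY
active_blogs = 0.55 * TOTAL_BLOGS      # posted within the last 3 months
weekly_updaters = 0.13 * TOTAL_BLOGS   # updated at least weekly

print(f"one new blog every {seconds_between_new_blogs:.2f} s")   # 1.08 s
print(f"{active_blogs / 1e6:.1f} million active blogs")          # 7.8 million
print(f"{weekly_updaters / 1e6:.1f} million weekly updaters")    # 1.8 million
```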

Interestingly, the activity statistics have remained remarkably consistent over time: in November 2004, we reported that 55% of all blogs were active, just about the same proportion as today. I think this shows that even as the blogosphere grows at a geometric pace, the "stickiness" of the tools and the willingness to write haven't changed much at all.

Summary:
  • Technorati was tracking over 14.2 million weblogs, and over 1.3 billion links in July 2005.
  • The blogosphere continues to double about every 5.5 months.
  • A new blog is created about every second; about 80,000 blogs are created daily.
  • About 55% of all blogs are active, and that has remained a consistent statistic for at least a year.
  • About 13% of all blogs are updated at least weekly.

Tomorrow I'll give an update on posting volume, which is a better statistic for tracking the growth of blogging. Lots of people who start new blogs are just kicking the tires, so the numbers above could be indicative of a fad in progress; watching posting volume shows how many people are actually blogging on a day-by-day basis. I think that is a much better indicator that people are making blogging a habit and a part of their daily lives. Later in the week I'll also describe the rise of tags, the increase in spam (or fake) blogs and SEO, and give an update on the relative influence of blogs compared to the mainstream media.
