Multi-Language Planet Sysadmin               

          blogs for sysadmins, chosen by sysadmins...

February 20, 2009


7210 Review

InfoWorld reviews the 7210 NAS appliance - the ultimate NetApp killer :)

by milek at February 20, 2009 10:43 AM

Prawo Jazdy

BBC reports:

Details of how police in the Irish Republic finally caught up with the country's most reckless driver have emerged, the Irish Times reports.

He had been wanted from counties Cork to Cavan after racking up scores of speeding tickets and parking fines.

However, each time the serial offender was stopped he managed to evade justice by giving a different address.

But then his cover was blown.

It was discovered that the man every member of the Irish police's rank and file had been looking for - a Mr Prawo Jazdy - wasn't exactly the sort of prized villain whose apprehension leads to an officer winning an award.

In fact he wasn't even human.

"Prawo Jazdy is actually the Polish for driving licence and not the first and surname on the licence," read a letter from June 2007 from an officer working within the Garda's traffic division.

"Having noticed this, I decided to check and see how many times officers have made this mistake.

"It is quite embarrassing to see that the system has created Prawo Jazdy as a person with over 50 identities."

The officer added that the "mistake" needed to be rectified immediately and asked that a memo be circulated throughout the force.

In a bid to avoid similar mistakes being made in future relevant guidelines were also amended.

And if nothing else is learnt from this driving-related debacle, Irish police officers should now know at least two words of Polish.

As for the seemingly elusive Mr Prawo Jazdy, he has presumably become a cult hero among Ireland's largest immigrant population.

by milek at February 20, 2009 09:43 AM

Compound thinking

TG2 doc sprint this weekend

We’ve declared a final feature freeze on TurboGears in the run-up to the 2.0 release.

But that doesn’t mean there isn’t still lots of work to do. We’re working on updates to the TG website, radically improved documentation, and better marketing materials. So we’re having another TG2 sprint this weekend, focused on infrastructure and documentation. It will be the most newbie-friendly sprint yet, since one of the primary tasks is to make sure the docs make good sense to new users.

Documentation is hard because the people who write it are generally experts, and have forgotten what’s complicated or confusing to new users. So we really, really need fresh eyes and fresh perspectives.

And one of the most useful things you can do is read and comment on our docs:

If you have a few hours, please join us in #turbogears on Freenode IRC (see this tutorial if you’re new to IRC) and sign up for the sprint here:

by Mark Ramm at February 20, 2009 03:03 AM

Unix Admin Corner

the power of 3's

Time for another post. I have started another contracting assignment, but I'm going to keep posting as much as I can.

If you haven't noticed, most of the new subsystems in Solaris 10 are driven by just three commands or fewer. Here are a few examples; I'm not sure whether it was part of an initiative inside Sun or what, and I'm sure I have missed a few.

svccfg - import manifests and configure services
svcadm - enable, disable, restart, and refresh services
svcs - list services under SMF control
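A quick sketch of how those three fit together in practice (the SSH service is just an illustrative example; this only runs on a Solaris 10 or later box):

```shell
# Enable the SSH service and confirm it came online
svcadm enable svc:/network/ssh:default
svcs -l svc:/network/ssh:default   # long listing: state, dependencies, log file

# Explain any services that are not running and why
svcs -xv
```

The nice part is that svcadm handles dependencies for you, so enabling one service brings up whatever it needs first.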

zpool - configure, monitor, and maintain your storage pools
zfs - manage, monitor, and maintain your filesystems
zdb - get the hidden secret tidbits of your filesystems; hardly anyone uses it
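Here's a minimal sketch of the zpool/zfs pair in action (pool name and disk names are illustrative, and this assumes a Solaris/ZFS host):

```shell
# Create a mirrored pool named "tank" from two disks
zpool create tank mirror c0t0d0 c0t1d0

# Filesystems are cheap under ZFS; make one per purpose and tune it
zfs create tank/home
zfs set compression=on tank/home

# Check pool health
zpool status tank
```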

dtrace - it does it all.
lockstat - based on DTrace these days, but still just a glorified DTrace program
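To give a flavor of each (both need root or DTrace privileges, and only run on a DTrace-capable system):

```shell
# Classic DTrace one-liner: count system calls per executable until Ctrl-C
dtrace -n 'syscall:::entry { @[execname] = count(); }'

# lockstat: record kernel lock contention events for 5 seconds
lockstat -C sleep 5
```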

fmadm - fault management configuration tool
fmstat - report fault management module statistics
fmdump - fault management log viewer
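And a quick tour of the FMA trio on a Solaris 10 host:

```shell
fmadm config    # list the loaded fault-manager modules
fmadm faulty    # resources currently diagnosed as faulty, if any
fmstat          # per-module statistics (events seen, memory used)
fmdump -v       # verbose view of the fault event log
```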

by jamesd_wi at February 20, 2009 02:30 AM

LOPSA blogs

Tonight's Atlanta Linux Enthusiasts presentation

Tonight's presentation was by Derek Carter, aka Goozbach, on Cobbler. Derek is a Cobbler developer. You can get the presentation notes at his site.

by villyard at February 20, 2009 02:19 AM

February 19, 2009

O'Reilly Radar

ETech Preview: Science Commons Wants Data to Be Free

John Wilbanks has a passion for lowering the barriers that stand between scientists who want to share information. A graduate of Tulane University, Mr. Wilbanks started his career working as a legislative aide before moving on to work in bioinformatics, which included founding Incellico, a company that built semantic graph networks for use in pharmaceutical research and development. Mr. Wilbanks now serves as the Vice President of Science at Creative Commons and runs the Science Commons project. He will be speaking at the O'Reilly Emerging Technology Conference in March on the challenges and accomplishments of Science Commons, and he's joining us today to talk a bit about it. Good day.

John Wilbanks: Hi, James.

JT: So science is supposed to be a discipline where knowledge is shared openly, so that ideas can be tested and confirmed or rejected. What gets in the way of that process?

[Photograph by Fred Benenson, licensed to the public under the Creative Commons Attribution-Share Alike 3.0 license]

JW: Well, most of the systems that scientists have evolved to do that (sharing, confirming, and rejecting) evolved before we had the network. And they're very stable systems, unlike a lot of the systems that we have online now, like Facebook. For science to get on the Internet, it has to really disrupt a lot of existing systems. Facebook didn't have to disrupt an existing physical Facebook model. And the scientific and scholarly communication model is locked up by a lot of interlocking controls. One of them is the law. The copyright systems that we have tend to lock up the facts inside scientific papers and databases, which prevents a lot of the movement of scientific information that we take for granted with cultural information.

Frequently, contracts get layered on top of those copyright licenses, which prevent things like indexing and hyperlinking of scholarly articles. There's also a lot of incentive problems. Scientists and scholars tend to have an incentive to write very formally. And the Internet, blogging, email, these are all very informal modalities of communication.

JT: What role does Science Commons play in improving communication between scientists?

JW: Well, if we're successful, what we want to do is to get to the point where the only problems we have in scholarly communication are technical. This is inspired by Jim Gray from Microsoft, who years ago said, "May all of your problems be technical." If we can get the law out of the way, and let the traditional norms of science, which really are the ideals of community and of sharing of information apply, you can't really claim credit for something in science unless you publish it after all. If we can bring those norms into the Internet age with things like standard copyright licenses that Creative Commons has developed, with explorations of new ways to track impact, bringing ideas like trackback that came from the blog world to the scientific communication world. If we can help convince people that the public domain is something to be cherished, and not a thing to be avoided at all costs when it comes to things like data.

If we can make biological material and other sorts of physical research materials move around the world, the way that Amazon moves books around. And if we can make the web work for data the way it works for documents. Right? Those are the things that we really want to do. And if we can do those things, we think that the innate nature of science, which is publishing, which is community-based, which is about sharing information and remixing information, those norms are going to take over if we can simply get the resistance out of the way.

So what we are trying to do is to intervene in places where legal tools, technical tools, policy tools can lower those barriers that are currently preventing the real emergence of the traditional scholarly norms on the internet.

JT: I wanted to go back for a second just to something you had mentioned in your previous answer. Scientific papers tend to be written, as you mentioned, in a very specific dense, passive voice manner. And I've talked to my wife who has a science background about it. And she says, "Well, that's just the way that you present information in science." Do you think that gets in the way?

JW: I think so. I think there's a little bit of a Guild mentality, you know, in terms of the language and structure and flow of these papers. It's taken me some time to learn how to read them. And it's artificially idealized I think. Because you're trying to present what happened in the lab one day as some fundamental truth. And the reality is much more ambiguous. It's much more vague. But this is an artifact of the pre-network world.

There was no other way to communicate this kind of knowledge other than to compress it. Now when we think about compression, we think about zip algorithms. But in the old days, compression meant writing it down on physical paper and mailing it to someone. That was the only way to condense and convey the knowledge in anything representing an effective way. But the problem is, we've digitized that artifact of the analog world. And so the PDF is basically a digital version of paper with all of the attendant technical benefits.

So what we need to do is both think about the way that we write those papers, and the words and the tone and how that really keeps people out of science. It really reduces the number of scientists. But we can also think about the possibilities the Internet brings us, if we think about the article as simply a token for several years of research, which include data, which include lab notebooks, which include research materials, software. All of these things that are put into the advertisement that is the paper. Those things can be really very powerful on the Internet. And we can also begin to think about interpreters. You know, there's an org called ACAWIKI, which I sit on the board of, whose goal is to write human-readable summaries of scientific papers in a wiki format. So we can begin to use some of these tools to really crack the code, to a certain extent, of scholarly writing. And people don't have to change right away. We can change their works for them, as long as the rights are available.

But if those papers aren't available, we can't summarize them on ACAWIKI. We can't hyperlink to them all the materials in the data sets. We have to wait for the publishers to give us those services, or to sell us those services.

JT: So in some sense, like the IRC chat that the researchers had about their research might be as valuable as the final paper itself.

JW: It would be a piece of the story, you know? We're wired for storytelling as people. We're not wired for statistical uncertainty. We try to fit things into narratives. And being able to understand the back-story is very important. So what happens in the hallways at conferences is very analogous to what happens in IRC chats or what happens on blogging and comments on wikis. And we're starting to see, for the first time, the emergence of blogs and wikis, especially in the newer disciplines, as something that's very akin to the hallway conversations or the bar conversations at a conference.

But until now, we've really never had the ability to put all of that stuff together with the data. I mean can you imagine the human genome before a computer? Even if we could've sequenced it? Right? Someone just hands you page after page after page of As, Ts, Cs and Gs.

JT: The infamous Encyclopedia Britannica of genome.

JW: Right. I mean the computer enables all of these things, but the scientific practices are so stable that they're really resisting change. And so I think of it as the way that we communicate scientifically has evolved as this formal system. And like most good systems, it's stable against disruption. And it's stable against bad disruption. But it's also stable against good disruption. So in the entire system, each time you look through, is there a legal problem or an incentive problem or a workload problem, each time you sort of peel those layers, you typically find yourself back at the beginning. When you solve one round, you're back at a new legal problem.

And we sort of have to keep pushing the rock up the hill. And the digital Commons methodology, sort of broadly speaking, of standard contracts, distributed workloads like in Wikipedia, where there’s a lot of people doing it. And the incentive problem drops away if you have enough people, because it doesn't matter why any one person contributes to Wikipedia as long as enough do. That trio, which is the same trio of problems, can become a trio of solutions. And so we have to sort of focus -- at the same time we try to change the existing system, we have to cultivate the growth of these systems that use the same tools to build open systems.

JT: How does the desire to share knowledge coexist with the desire to monetize discoveries?

JW: That's a good question. Right now, they're pretty tightly linked in a lot of places. And that does create problems. The degree of those problems is a matter for very heated debate. I tend to fall on the side that I don't really have a problem with monetizing discoveries. I don't really have a problem with the idea of patents. I have a problem with the way patents sometimes work in day-to-day business. But for the most part, those are things that are tough to solve in licensing methodologies. Those need to be solved by patent reform, things like the crowd-sourcing of prior art for patents, which I think is a fantastic idea, which is going to solve a ton of problems. The USPTO is badly overworked. If the crowd can help them find prior art, then a lot of patents that aren't really novel won't make it through anymore.

And I think those are the -- because the debate over monetization tends to turn on patents. I have a much bigger problem with people who try to lock up taxpayer funded literature. The way that we have had this difficulty in getting the clinical trial papers that our tax dollars have funded, that really burns me, because that's monetizing at the granular level in a way that makes it very hard to do sort of A) first read the stuff you paid for with your tax dollars, but B) take that information and really make it digital. Hyperlink it. Convert it. Mix it. Rip it. Spindle. Mutilate. All of that good stuff that we do with data and information on the web. That's the stuff that I want people doing. I mean it depresses me that we have so much more innovative programming researchers going into Facebook than we do into clinical trial data. And into the life sciences. But in many ways, that's a function of the culture, of trying to attach value to every datum and every article, financial value. Instead of thinking about the collective value we could get if that stuff was a common resource.

When it comes to people getting patents out of that, if those are meaningful patents, if they're valid, I don't have that much of a problem with, as long as they're not licensed abusively.

JT: Right. I mean in one sense, a patent is kind of an open sharing of information because in patenting something, you have to kind of open kimono everything about it.

JW: Right. And we just launched a project today with Nike and Best Buy, which is trying to rebuild some of the traditional academic research exemptions and make them available to the patent portfolios of private companies, especially for sustainability technology. And as part of that, we're going to be exploring how to deal with one of the core patent problems, which is what people would call the hold-up problem or the thicket problem, which is when -- right now what happens is there is no research exemption, and there are no sort of click-and-download-and-sign contracts for patents. And that means that what happens is basically you go create value, but you create value while you're violating someone's patents. And then after you've created value, you have to go negotiate, which is the worst time to negotiate. Because they've done nothing and you've made it valuable and now you have to give them money.

Now if you could get to the point where before you tried to create value, you could simply scan the patent landscape, identify some of the technologies you wanted to work with, and acquire the rights before you created value, or at least understand what those rights would cost you, that significantly lowers a lot of the problems that we're dealing with. And if you could put that together with some meaningful crowd sourced review of prior art and some judicious narrowing of the scope of patents for things like genes, I think the patent system works a lot better right now than the copyright system does. I think it needs tweaking in a lot of ways, not sort of a dedicated common use licensing system like the GPL.

JT: In some ways, I was just thinking what you're really talking about is something more analogous to, if I go and say I want to build a .NET based application, I don't go build it and then Microsoft says, "Oh, that's pretty valuable so you owe us a million dollars. But if it isn't valuable, you only owe us $10,000."

JW: Yeah. It's not utterly dissimilar to that. I think that's the kind of thing that could make the patent system we have function a lot better. I mean could you imagine if every time you built software, then you had to go get the rights from Oracle to use the database? I mean it'd be great for Oracle.

JT: Yeah. I was going to say I think Oracle likes that business model, but --

JW: Well, Oracle might like that business model, but I would argue that the total number of people who'd be willing to write code to Oracle goes up as a function, right? So in the sciences, the example would be the polymerase chain reaction which is sort of the Xerox machine for DNA. This was a fundamental invention. It enabled the emergence of the biotech industry. But it was made available in a nonexclusive patent license that was standard. And everyone signed it. Whereas if you'd had to negotiate, if you'd had to go argue, only people who are really skilled at negotiating would've gone and gotten the rights.

And a lot of application -- a lot of that sort of churn of stuff that didn't turn out to be useful, that would've been chilled. And so maybe the one-in-100 idea that exploded and became a major technology wouldn't ever have been tried because no one would've believed it. It's very hard to quantify what doesn't happen as a result of this stuff. You can't get a number on it. And that's one of the hardest things economically about making the argument: you're trying to argue about how much innovation isn't happening.

JT: That's kind of like the how much music isn't sold because of piracy.

JW: Right. It's hard to figure that out. And that's why we tend to try to focus on the opportunities that are available to make money, as well as to have the open innovation happen. We want both of those things to happen. Markets are not an evil thing. But you want those markets to be operating on a set of rules that really trends towards social good, too. And the commons, the voluntary contractual private commons appears to be a pretty good way to get some of those market forces going in that direction.

JT: One area that is clearly under attack is the traditional model of the expensive scientific journal, through mechanisms like the Public Library of Science. How successful has that movement been?

JW: Well, I mean I would say that it's become an adolescent? Which means it's trying to steal dad's car, and it's acting up. It's made it out of early childhood, that's for sure. The Public Library of Science has become a very high-impact, very respected journal publisher. It's at the highest levels of scientific quality. And their business model is still developing. And I think that their new PLoS ONE venture, which is a new online only thing, and their upcoming hubs work which is going to build communities, those are going to be really interesting things to watch.

In terms of sort of proving itself from a business perspective, BioMed Central, which has nearly 250 journals, I believe, under Creative Commons licenses, was sold in December to Springer. My understanding is that BMC's annual revenues were in the 15 million pounds per year range. Again, not using any sort of copyright transfer when they were bought by Springer. And so that really was, I think, a vindication of the capability of a for-profit model that was open. And I love to point to Hindawi, which is in Egypt, which is also profitable, which has another few hundred journals under a CC BY license. So we're certainly seeing some proof points that this can be high-quality and this can be profitable. But there's still a lot of uncertainty as to how the existing journals adapt to that. It's much easier to start from scratch with a new model than it is to change midstream. I mean, if you think back to DEC and how hard the transition to the microcomputer and the personal computer was for DEC. This is no less fundamental of a change than the change from the mainframe to the microcomputer in terms of models for these publishers.

And so it's very scary. And I think we have to be open and honest and accept that as a valid emotion, and try to work with the existing publishers and especially with the scholarly societies who don't have the million dollars to invest in R&D on scientific publishing that a big publisher might have. These are operating to the bone. And so we need to work with them, and help them find ways to make the transition in a way that doesn't destroy them at their core.

The danger is that this was the way the music industry went. I mean that something like iTunes comes along and kills the industry. We don't want that to happen. We want a healthy ecosystem. We want a competitive market. We want robust publishing houses inside scholar societies. But we want to move that into a direction that allows this sort of remix and mash up of information in science. And so we just have to help them find models, publishing models, business models, legal models that help them make that transition and be part of the solution with them.

JT: The volumes of data that are being produced in science today are somewhat staggering. You have petabytes for the Large Hadron Collider, if they ever get the thing running, as well as huge amounts accumulating in genomic and bioinformatic databases. How can the scientific community effectively share these kinds of massive collections?

JW: You know, it's hard to say. I mean some of these telescopes that are going up in the next five years, they're talking about a petabyte a minute, you know? It's just staggering. And so that's stuff that from just a physical perspective, pushing those bits around is going to be slow. And so we're not going to have lots of copies of these things. But I think that we have a couple of things that have to happen. One is that the communities involved have got to come to some agreement on meaning. And by meaning, I mean sort of standard names for things and relationships between things. Ontologies. Hierarchies. Taxonomies.

Things like data models for the SQL database but at a global web scale. Because in the absence of those, these are just piles of numbers and letters. And that's hard to do. It's really hard if you expect there to ever be sort of a final agreement on that list of names and relationships. So we've got to find both technical ways and social ways to have lots of different points of view represented and evolving and integrating.

And so if everyone locks up their own point-of-view, if everyone locks up their ontology, if everyone locks up their data model, then it's unlikely that we're going to get to this world where you can sort of pop one in and pop one out. So I think that's the first thing we have to have, is that we've got to have a web of data. And a web of data means we need common names for things, common URLs, common ways to reference things. And that is beginning to happen. Those battles are being fought on obscure list serves and places like the Web Consortium and in different scientific disciplines. But those are going to become hot button topics outside the sort of core Semantic Web geek community in the next couple of years because they just have to. Otherwise, this data just becomes, again, a pile of numbers.

Another thing that has to happen is we're going to have to develop some meaningful ways to federate that data, so that it's not vulnerable to capture or to failure. And when you're talking about data at that scale, it becomes important to understand what to keep and what to forget. And we're not very good right now at forgetting stuff. It's not in our culture, because it just felt like we could just keep storing everything. But if you're talking about an Exabyte a day, and you have ten projects doing an Exabyte a day, the cost of storing and serving that is such that it's unlikely that it's going to be widely mirrored. So we have to find some way to either federate that stuff or to figure out what to forget. And I think figuring out what to forget might be the hardest part.

JT: Right. That's almost like the guy who's got a million boxes in his attic. And when you ask him if he'll throw it out, "Well, I might need it some day."

JW: Right. You never know. So if we're going to do that, then we have to federate. We have to figure out how to deal with preservation and federation because our libraries have been able to hold books for hundreds and hundreds and hundreds of years. But persistence on the web is trivial. Right? The assumption is well, if it's meaningful, it'll be in the Google cache or the internet archives. But from a memory perspective, what do we need to keep in science? What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data? Is it the normalized data? Is it the software used to normalize the data? Is it the interpretation of the normalized data? Is it the software we use to interpret the normalization of the data? Is it the operating systems on which all of those ran? What about genome data?

So we did a lot of serial analysis of gene expression, called SAGE. It was looking at one gene at a time and seeing what it did. We can now look at the entire human genome. Should we keep the old data? We have a new machine that gives us higher-resolution, more accurate data. Should we keep the old stuff? Right? No one's really dealing with these questions yet.

JT: Right. I mean you think about in astronomy data, you may have higher resolution data, but that old data may have the asteroid that was moving through at that particular time it was taken, and you'd really like to have that data.

JW: Right. So discipline by discipline, the norms over what to forget and what to keep are going to be very different. Right? Anything that measures time or moments in time, you know, geographic time, astronomic time, that's probably going to be really valuable. But 50-year-old physics experiments, maybe not so much. Fifty-year-old genomics experiments, maybe not so much.

JT: Unless it's drifting the genome, in which case, it would be.

JW: Right. I mean this is -- you and I could talk about this for about two hours. It's a really hard question. There's typically a really good argument on every side.

JT: Right. It's whoever has the best budget for disk drives.

JW: Right. And ideally, what we could say is, "Okay. Well, we can delete the old data if there's a physical sample." Right? Because we can go back and recreate the genome data from an even smaller piece of it to check genetic drift. Right? So suddenly, we've gone from the world of the digital back to the world of the physical samples as well.

JT: So is there some data that's just too sensitive or too potentially dangerous to share generally? Where do issues like privacy or national security factor into a project like Science Commons?

JW: So privacy is very important. There's a very strong privacy right in the United States on personal health information. And certainly, data about people that can be identified about their health, about their lives, that is stuff that really ought to be in the control of the individual. So from a legal perspective, from a copyright perspective, it would be in the public domain, but it would be subject to privacy regulation. And in my ideal world, people would be able to make informed choices about when and how that data got shared.

As it is -- and the informed part of that tends to be pretty hard to achieve. For a long time, your health records were less protected than your video rental records. So the regulation of data is sort of scattershot. But I really hope in particular that individuals are empowered to take charge of their own data. When it comes to national security, I tend to think that if the government collects data that's national security data, then it's sort of up to them whether or not they want to release that stuff. When it comes to things like infrastructure, it's pretty easy to find all the data you want about power plants just using Google Earth. And the genie is sort of out of the bottle in terms of geographic information and a lot of other information.

It's incredible what you can find on the web if you're relatively good at web searching. And so my instinct is that we should use the power of the community and the crowd to try to mitigate risk rather than trying to suppress information to prevent the emergence of risk. The risk is there, I think, for things like someone reconstructing the 1918 flu genome. So we need to share as much as we know about the flu genome so that if that happens, we can intervene.

JT: Right. I was, in fact, talking earlier tonight with someone who's involved in synthetic biology and you get into this question if those sequences are available and now you have commercial services where you basically send a sequence and get back DNA, that you're getting into the realm where it gets to be kind of easy to do that kind of stuff or easier.

JW: Yeah. And I mean I draw these statements from my conversations with Drew Endy, in particular, in the synthetic bio community, which is that that information is available. You can get -- it's like 40 cents a letter now, I think, to synthesize genes. You can get it done all over the world. You can get it done in Pakistan, you know? What we have to be is ready to deal with it. And that requires the information sharing being robust enough that if something happens, we can rapidly identify it, understand the risk, and intervene against it.

JT: All right. Back to your questions. What are the challenges ahead for Science Commons? And what do you think are the most significant short-term impacts the project will make?

JW: I mean the biggest challenge, I think -- I mean there's a constant challenge of funding, because funding non-profits is always hard. In this economy, it's almost lethally hard. But beyond that, I think the biggest challenge is what we started with, which is that the existing systems for science are pretty robust against disruption. And working against that, trying to get to a point where scientists see the value of sharing and indeed believe that they can outcompete other scientists if they share. But that's the biggest challenge, because you've got to do so many things simultaneously. You've got to deal with legal problems, both contract and intellectual property problems. You've got to deal with incentive problems. You've got to deal with workload and labor problems. You've got to deal with the Guild culture and the Guild communication systems, all of that at once. And that's really hard. So getting through this collective action problem is probably the hardest thing we've had to do from the beginning and will continue to be the hardest thing we have to do.

In terms of in the short-term, at ETech, we're going to announce a pretty major partnership around some of our technology work with a major technology company. So we're going to issue a big press release and all of that, so I won't get into detail. But we're going to be announcing the integration of our open source data integration project which we call the NeuroCommons, which we hope will become a major cornerstone of the Semantic Web and integration of that with one of the world's largest software companies. And that's going to be, I think, a big first step. We're also looking to get a collection of major biological materials available for the first time under something that looks an awful lot like a one-click system. So that no matter who you are or where you work, you'll be able to order the kinds of materials that were previously only available to members of sort of elite social networks. And that's also going to be coming out in the coming weeks.

JT: So I'll be able to get like Amazon Prime to deliver my restriction enzymes for me?

JW: Exactly. Exactly. Under a Science Commons contract. Under something that looks an awful lot like a Creative Commons contract. And then today, we announced a project with Best Buy and Nike around how to share patents and recreate the research exemption in the United States. And we expect that to be a pretty big deal this year as well. The last thing is that in the coming weeks, we expect -- and I don't know if this will happen before ETech or right around ETech -- we expect one of the world's largest pharmaceutical companies to make a dedication of hundreds of millions of dollars worth of data to the public domain.

Now, we can't take credit for that. They made this decision before us. But we are going to help them do it. And I think that's going to be something that is as big as IBM embracing Linux in the late 90s in terms of really building an open biology culture.

JT: So beyond the announcement you just mentioned, you're going to be speaking at the Emerging Technology Conference on your work. Can you give us a feel for what you're going to be talking about?

JW: Sure. I mean I'm going to go through a lot of the stuff that we've talked about here. I'm going to go through the ways we think the system is resistant to disruption, the reasons why a digital commons is a good tonic for that problem and some experience from the road. We're going to talk about what it's been like to go out and actually try to build the fundamental infrastructure for a Science Commons and give some examples of our work and then make that big announcement about the partnership with the technology company.

JT: Well, I've been speaking today with John Wilbanks who is the Vice President for Science at Creative Commons in charge of the Science Commons project. He'll be speaking at the Emerging Technology Conference in March. Thank you for taking the time to talk to us.

JW: Thanks a lot.

by James Turner at February 19, 2009 11:59 PM

DanT's Grid Blog

Omniture Summit '09

I'm out at the Omniture Summit '09 in Salt Lake City this week, and I'm very pleased to report that it doesn't suck. I was very worried that I was paying three grand for a two-day sales pitch, but I took a leap of faith and registered anyway. Turns out that feedback from previous years' conferences has inspired Omniture to dial back the product pitches in favor of user-oriented content. This year, there has been a wealth of useful information on Internet marketing in general and very few product sales pitches.

And as an added bonus, as I'm writing this post, I just won a Corsair vintage radio from Vintage Tub & Bath for being quick to raise my hand.

The big take-aways for me have been:

  • I think I finally get Twitter. I signed up for a Twitter account, just to play with it, but I hadn't quite comprehended how Twitter is useful for a company or organization. Twitter has been a big theme at this conference. Everyone is trying to figure out what to do with this untested new marketing channel. Look for more from me there in the near future.
  • An important point that I missed before is that to be successful in the brave new world of social networking, you have to be a full participant. It's not enough to just broadcast. The communication has to be 2-way.
  • Building a community is a lot like creating viral media. There's no formula. You can't just create a community. All you can do is seed the ground and hope something grows. You can, however, do a lot to encourage the right things to grow and to help things along. In the end, it really does still come down to content.
  • Brand has to be pervasive. Martin Lindstrom calls it "smashable brand," meaning that the brand should be recognizable by even the smallest fragment. Is your website obviously yours if you take away the logos and products? His new book, Buy-ology, looks pretty interesting. We all got free copies, so I'll let you know what I think after I've had a chance to read it.
  • Mobile is the next marketing frontier. Makes me glad I don't have a data plan.

by templedf at February 19, 2009 11:20 PM

Managing Product Development

Announcement: Project Portfolio Book is in Beta

The project portfolio book is in Beta! For those of you who have not heard of a Pragmatic Bookshelf Beta book, here are the limitations:

  • It’s not laid out correctly
  • Although I’ve looked for typos, I have not discovered and fixed them all
  • The bibliography is not yet included (it will be on later betas, but not this one)
  • Not all the chapters are there (yes, I have drafts, but the drafts are not ready for beta)
  • It could be wrong in places! (For example, Daniel pointed out to me the other day that I had defined double elimination incorrectly. Sigh.)

On the other hand, if you would like to see the book evolve, I invite you to participate in the beta. I already have errata which I will be fixing between the coughing and sneezing of my “vacation” cold.

by johanna at February 19, 2009 08:10 PM

Linux Poison

KDE 4.2 on Windows (Installation & Configuration)

The KDE on Windows Project is aimed at natively porting KDE applications to MS Windows. Currently Windows 2000, XP, 2003 and Vista are supported. The preferred way of installing KDE apps under Windows is the KDE Installer. When you run the KDE installer for the first time, you'll see the welcome screen. Since it's your first launch, leave the checkbox below unchecked and proceed to the next screen.

by Nikesh Jauhari ( at February 19, 2009 06:43 PM

LOPSA blogs

Update: Still ignoring

Yep, still ignoring this blog.

But no more than my other blogs.

by jdetke at February 19, 2009 06:11 PM


The Backup Tool

In my previous blog entry I wrote an overview of an in-house backup solution which seems to be a good enough replacement for over 90% of the backups currently done by Netbackup in our environment. I promised to show some examples of how it actually works. I can't give you output from a live system so I will show some examples from a test one. Let's go through a couple of examples.

Please keep in mind that it is still more of a working prototype than a finished product (and it will most certainly stay that way to some extent).

To list all backups (I run this on an empty system)
# backup -l

Let's run a backup for a client mk-archive-1.test

# backup -c mk-archive-1.test
Creating new file system archive-2/backup/mk-archive-1.test
Using generic rules file: /archive-2/conf/standard-os.rsync.rules
Using client rules file: /archive-2/conf/mk-archive-1.test.rsync.rules
Starting rsync
Creating snapshot archive-2/backup/mk-archive-1.test@rsync-2009-02-19_15:11--2009-02-19_15:14
Log file: /archive-2/logs/mk-archive-1.test.rsync.2009-02-19_15:11--2009-02-19_15:14

Above you can see that it uses two config files - one is a global file describing includes/excludes which are applied to all clients, and the second describes includes/excludes for that specific client. In many cases you don't need to create the client file - the tool will create an empty one for you.
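The rules files use rsync's filter syntax. A hypothetical standard-os.rsync.rules for the global excludes might look like this (the specific paths are examples, not taken from the tool):

```
# Hypothetical global rules file (rsync filter syntax):
# skip volatile and pseudo filesystems, back up everything else
- /proc/
- /sys/
- /dev/
- /tmp/
- /var/tmp/
```

A client-specific rules file would follow the same format, adding excludes (or includes) that only make sense for that one host.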

Let's list all our backups then.

# backup -lv
mk-archive-1.test 1.15G 1.15G 1.75x 35 (global)
mk-archive-1.test@rsync-2009-02-19_15:11--2009-02-19_15:14 1.15G 0 1.75x

The snapshot defines a backup and I put the start and end dates of the backup in its name.

If you want to schedule a backup from cron you do not need any verbose output - there is a "-q" option which keeps the tool quiet.

# backup -q -c mk-archive-1.test
# backup -lv
mk-archive-1.test 1.15G 1.16G 1.75x 35 (global)
mk-archive-1.test@rsync-2009-02-19_15:11--2009-02-19_15:14 1.15G 6.63M 1.75x
mk-archive-1.test@rsync-2009-02-19_15:16--2009-02-19_15:16 1.15G 0 1.75x
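The quiet mode makes the tool easy to drive from cron. A hypothetical crontab sketch (the client name, schedule, and install path are assumptions, not part of the tool):

```
# Nightly quiet backup at 02:30, then expiry of old backups at 04:00
30 2 * * * /usr/local/bin/backup -q -c mk-archive-1.test
0 4 * * *  /usr/local/bin/backup -E
```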

Now let's change the retention policy for the client to 15 days.

# backup -c mk-archive-1.test -e 15
# backup -lv
CLIENT NAME                                                 REFER USED  RATIO RETENTION
mk-archive-1.test                                           1.15G 1.16G 1.75x 15 (local)
mk-archive-1.test@rsync-2009-02-19_15:11--2009-02-19_15:14  1.15G 6.63M 1.75x
mk-archive-1.test@rsync-2009-02-19_15:16--2009-02-19_15:16  1.15G 0     1.75x

To start an expiry process of old backups (not that there is anything to expire on this empty system...):

# backup -E

Expiry started on : 2009-02-19_17:21
Expiry finished on : 2009-02-19_17:21
Global retention policy : 35
Total number of deleted backups : 0
Total number of preserved backups : 0
Log file : /archive-2/logs/retention_2009-02-19_17:21--2009-02-19_17:21

You can also expire all backups or for a specific client according to a global and a client specific retention policies, you can generate reports, list all currently active backups, etc. The current usage information for the tool looks like:

# backup -h

usage: backup {-c client_name} [-r rsync_destination] [-hvq]
       backup [-lvF]
       backup [-Lv]
       backup {-R date} [-v]
       backup {-E} [-v] [-n] [-c client_name]
       backup {-e days} {-c client_name}
       backup {-D backup_name} [-f]
       backup {-A} {-c client_name} [-n] [-f] [-ff]

This script starts a remote client backup using rsync.

-h  Show this message
-r  Rsync destination. If not specified then it will become Client_name/ALL/
-c  Client name (FQDN)
-v  Verbose
-q  Quiet (no output)
-l  List all clients in a backup
      -v   will also include all backups for each client
      -vF  will list only backups which are marked as FAILED
-e  Set a retention policy for a client
      if the number of days is zero then the client retention policy is set to global
      if client_name is "global" then set a global retention policy
-L  List all running backups
      -v   more verbose output
      -vv  even more verbose output
-R  Show a report for backups from a specified date ("today" and "yesterday" are also allowed)
      -v   list failed backups
      -vv  list failed and successful backups
-E  Expire (delete) backups according to a retention policy
      -c client_name  expire backups only for the specified client
      -v   more verbose output
      -n   simulate only - do not delete anything
-D  Delete the specified backup
      -f   force deletion of a backup - this is required to delete a backup if
           there are no more successful backups for the client
-A  Archive the specified client - only one backup is allowed in order to archive a client
      -c client_name  valid client name, this option is mandatory
      -n   simulate only - do not archive anything
      -f   delete all backups for a client except the most recent one and archive the client
      -ff  archive a client along with all backups
-I  Initialize file systems within a pool (currently: archive-1)


In order to immediately start a backup for a given client:

backup -c XXX.yyy.zz
backup -r XXX.yyy.zz/ALL/ -c XXX.yyy.zz

The above two commands do exactly the same thing - the first version is preferred.
The 2nd version is useful when doing backups over an ssh tunnel or via a dedicated backup interface,
when it is required to connect to a different address than the client name. For example, in order
to start a backup for a client XXX.yyy.zz via an ssh tunnel at localhost:5001 issue:

backup -r localhost:5001/ALL/ -c XXX.yyy.zz


backup -E - expire backups according to retention policy
backup -e 30 -c global - sets global retention policy to 30 days
backup -l - list all clients in backup including their retention policy

by milek ( at February 19, 2009 05:43 PM

mikas blog

Event: Computeranimation Live-Modelling / Medienkünstler im Gespräch - Graz

Wann: 19. März 2009
Wo: MedienKunstLabor im Kunsthaus Graz
Was: open source creative commons Gespräche Party Live-Modelling event Talk Blender

Der Art Director von BigBuckBunny ist in Graz und zeigt, was mit Blender machbar ist. Weitere Details unter

[via Dorian Santner]

by mika at February 19, 2009 05:11 PM

OReilly Radar

Ignite Show: Jason Grigsby on Cup Noodle

Today we are launching the first episode of the Ignite Show. The Ignite Show will feature a different speaker each week. This week's speaker is Jason Grigsby, doing a talk that was originally performed at Ignite Portland. Jason takes a fun look at how Cup Noodle was created and how the team had to embrace constraints and new ideas to create this new food.

Ignite will be released for free weekly. It's available on YouTube (user: Ignite), on our Ignite site (file) and via iTunes (we'll be in the store shortly). It is being released under Creative Commons Attribution-Share Alike 3.0 United States License.

Ignite has spread to over 20 cities in the past two years. The third Ignite Boulder happened last night. The fifth Ignite Portland will happen tonight and New York's third is on Monday. We want to highlight speakers from around the world with the show. If your town or city has lots of geeks throw an Ignite to bring them together!

The format of Ignite is 20 slides that auto-advance after 15 seconds. When you are on stage giving an Ignite talk this can be quite exhilarating (sometimes terrifying). The added adrenalin really adds to the presentation and I think that will come through on the small screen.

Thanks to Ignite Organizers, sponsors and attendees from around the world for making this show possible. Thanks to Social Animal for editing the show, Bre Pettis for co-hosting the first episode (and starting Ignite with me), and Sam Valenti of Ghostly Records for letting us use Michna's Swiss Glide. Thanks to everyone at O'Reilly who has supported Ignite through the years especially Mary, Jennifer, Laura A., Laura P., Cali, Roberta, Mike, pt, Jesse, Sara, Laura B and Tim.

by Brady Forrest at February 19, 2009 05:09 PM

Cheap hack

How to Write a Linux Virus

Everyone knows there aren't any viruses for Linux. Or something like that. Some go on to conclude that Linux is immune to such things because it's just so much better designed. You hear similar things about Macs. Blogger foobar on Geekzone had enough of such claims and has written a vague guide, "How to write a Linux virus in 5 easy steps." The article is less a how-to for writing Linux malware than a lesson in how malware in the real world actually works and how vulnerable Linux is to it.

February 19, 2009 04:17 PM


25 Tutorials To Get You Started With Blender

Blender UI

If you are an aspiring graphic designer you should already be familiar with Blender by now, but if you are thinking about becoming a game developer or a graphic designer, you should know that Blender is not only the best free and open source choice but also rivals all the commercial 3D applications out there. You can look at two very detailed comparisons here and here. Blender is the only free 3D application compared against heavyweight industry favorites like 3ds Max (which costs around $5000+) and the only one of them to support all three major operating systems.

Why Blender?

  • It's free. Better than paying $5000 for software.
  • It rivals commercial software in both functions and features.
  • Small installation footprint of 9 MB. AutoCAD needs 2 GB.
  • Large community support, with the last version generating 800,000+ downloads.
  • Intuitive user interface.
  • Python scripting API for game logic.
  • Third-party free plug-ins, textures, renderers and scripts available.

In simple words, and with all due respect to GIMP, Blender is closer to commercial products in terms of performance and features than GIMP is to Photoshop.

Hopefully these beginner Blender tutorials will start your journey towards becoming the next rockstar graphic designer.

  1. Blender 3D: Noob to Pro
  2. Creating a Heart in 10 Steps
  3. Modeling a realistic Human Body
  4. Making a Platenoid
  5. Creating a Die
  6. Positioning Image textures using empties
  7. Cutting through steel
  8. Bump maps for beginners
  9. Fundamentals of UVmapping
  10. Creating a logo
  11. Making a rain effect
  12. Introduction to the Game Engine
  13. Your First Animation (part 1, part 2)
  14. The Blender Sequence Editor
  15. Special Effect With Blender Sequence Editor
  16. Creating a Dolphin
  17. Volcano Tutorial
  18. Fluid Simulation (Video)
  19. Creating simple animation
  20. Creating Asteroids
  21. High-tech corridor
  22. UV mapping and textures
  23. Texturing tutorial
  24. Creating fireworks
  25. Toon Shading

I hope you enjoy these tutorials as much as I have enjoyed going through (most of) them.

If you liked this article, please share it on StumbleUpon or Digg. I’d appreciate it.


by Pavs at February 19, 2009 03:02 PM

Debian Admin

Download links for Debian Linux 5 Lenny ISO / CD / DVD Images

Debian GNU/Linux version 5.0 has been released after 22 months of constant development and is available for download in various media formats. Debian GNU/Linux is a free operating system which supports a total of twelve processor architectures and includes the KDE, GNOME, Xfce, and LXDE desktop environments. It also features compatibility with the FHS v2.3 and software developed for version 3.2 of the LSB.

Read the rest of Download links for Debian Linux 5 Lenny ISO / CD / DVD Images (200 words)

© Admin for Debian Admin, 2009.

by Admin at February 19, 2009 02:46 PM

Cheap hack

Do Windows Forensics in Windows FE

It's not exactly news, but it's news to me: There is a secret version of Windows for forensic testing called Windows FE (Forensic Environment). No links for you, but there are lots of forensics blogs talking about it. Microsoft isn't because it's not a Microsoft product. In fact, you can build your own Windows FE with the instructions in this Word document: How to Build Windows FE (Forensic Environment) with the Windows Preinstallation Environment 2.1. According to those instructions:
Only two registry modifications are necessary to change Windows PE into Windows FE—Windows Forensic Environment. With the exception of non-Microsoft drivers or forensic applications, there is nothing in Windows FE that is not contained in Vista SP1 or Windows 2008 installation media.
First you start by installing either the Windows OEM Preinstallation Kit or the Automated Installation Kit (AIK) for Windows Vista SP1 and Windows Server 2008. To get the OPK you need to be enrolled in the Microsoft Partner Program, but anyone can download the AIK, which is designed to assist in creating automated install images for enterprises. Either of these creates a WinPE boot disk that you can modify using the instructions to make a Windows forensic examination environment running in Windows itself.

Once you install the AIK, follow the instructions in the "How to Build Windows FE ..." Word document. You'll be mounting an image in the kit, modifying it, recommitting it to an image and burning it to disk.

There are a couple of black box sections to the instructions. First, if you're going to need particular device drivers for the forensic environment you'll need to load them into it. The instructions show you how, but you'll need to know what drivers you'll need. Second, the kit doesn't come with any actual forensic examination tools. You'll need to get some. The examples listed in the instructions are "Forensic Acquisition Utilities, X-Ways Forensics, Encase Forensics" and it implies that these have been tested in the WinFE environment.

I really want to do this myself, and I'll be building one as soon as I can.

February 19, 2009 02:08 PM


How to encrypt your Linux backups

Linux Security

We covered the creation and extraction of compressed archives such as tar on a Linux machine. A lot of Linux users use these compression formats for backup purposes. Although this compresses pretty well, it does not secure the backup. To do that you need to add a password, or to encrypt it. Let’s look at a simple way of securing your backup when you create an archive.

Note: these steps apply to files and folders of any kind - not just ‘backups’.

A quick recap of the compression and extraction of the tar.gz format. To compress a directory called todays_backup do the following:

# tar -zcf todays_backup.tar.gz todays_backup

This command will compress the directory todays_backup into the compressed file todays_backup.tar.gz. To decompress it use the following command:

# tar -zxf todays_backup.tar.gz

Now to the fun part. Let’s look at how we can add a basic level of encryption to the process we used above. To compress the directory todays_backup with protection do the following:

# tar -zcf - todays_backup|openssl des3 -salt -k yourpassword | dd of=todays_backup.des3

Replace yourpassword with a password of your own. Keep the password to yourself, and keep it safe. The above command will generate a file called todays_backup.des3. This file can only be decrypted using this password.

To extract your protected archive file todays_backup.des3 use the following command:

# dd if=todays_backup.des3 | openssl des3 -d -k yourpassword | tar -zxf -

Make note of the trailing - at the end. It is not a typo, but a requirement for this command to work. Replace yourpassword with the password you used while encrypting the file. Executing the above command will extract the compressed file todays_backup.des3 into a directory todays_backup. Use this encryption with care. As I said earlier, the only way you can retrieve your data once secured is by using the password, so do not lose this password under any circumstances.
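The two pipelines above can be wrapped in a pair of small helper functions. This is just a sketch of the article's commands (the function names are mine, not part of any tool); it assumes tar and openssl are on the PATH, and uses a plain shell redirection in place of dd, which does the same job here:

```shell
# encrypt_dir DIR PASSWORD: tar+gzip DIR and encrypt it with triple-DES,
# producing DIR.des3 (same pipeline as in the article).
encrypt_dir() {
  tar -zcf - "$1" | openssl des3 -salt -k "$2" > "$1.des3"
}

# decrypt_archive FILE PASSWORD: decrypt FILE and unpack the archive
# into the current directory.
decrypt_archive() {
  openssl des3 -d -k "$2" < "$1" | tar -zxf -
}
```

As with the raw commands, losing the password means losing the data. Note also that des3 is an aging cipher; for a new setup, a modern choice such as `openssl enc -aes-256-cbc -pbkdf2` is worth considering.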


by Sukrit Dhandhania at February 19, 2009 11:30 AM




The Bi-Cycle is a unique two-headed tandem bicycle conceived by industrial designer Elad Barouch. Both riders pedal. Both riders steer. It's like a mix of exercise and couples' counseling that allows two cooperating individuals to look sweet upon the seat.

I wasn't really joking about the couples' counseling part. In a Bike Hacks interview last December, Elad described his inspiration for the bike:

So the idea for final shape of the Bi-Cycle really came since it was the ultimate way to explain my observation about the way to resolve disputes which is in short, at first we need to establish trust and learn to communicate, then we can start moving forward, once we are moving we can master our communication skills and than that is left is pure fun. This is the reason why when riding the Bi-Cycle, both riders have complete control on the steering and the pedaling, making their influence on the riding the same thus creating the need to communicate.

Bike Hacks' Interview With Elad Barouch

by Jason Striegel at February 19, 2009 11:00 AM

Google Blog

It's Girls Day at Google

Today we're celebrating Introduce a Girl to Engineering Day, or Girls Day, as part of National Engineers Week (E-Week) in the U.S. For the second year in a row, we've partnered with the National Girl Scouts to bring girls to six Google offices around the country, where they'll participate in fun activities designed to educate them about engineering, specifically computer science. Googlers, many of them Google Women Engineers, are hosting the guests of honor and leading workshops covering all kinds of topics, including solar powered energy, image processing and a demo of Google Earth. At the end of the day, all of the participants will receive a limited edition "Introduce a Girl to E-Week" patch that they can add to their Scout sashes.

Introduce a Girl to Engineering Day participants in 2008.

Introduce a Girl to Engineering Day is just one important part of E-Week, which was founded by the National Society of Professional Engineers (NSPE) and is celebrating its 50th anniversary this year. By the end of the week, Google offices will have hosted more than 600 students at events designed to expose them to science, technology, engineering, and math (STEM). The students who participate in our E-Week events are from partner organizations that also focus on STEM education for girls, underrepresented minorities, and the economically disadvantaged. Here's hoping each of these students will walk away feeling inspired to pursue studies in these fields.

by A Googler ( at February 19, 2009 10:42 AM

OReilly Radar

Four short links: 19 Feb 2009

Art, astronomy and more fun for you in today's four short links:

  1. Found in Space -- there's an astronomy bot on Flickr that identifies stars in the night sky, and from the unique positions of the stars figures out what bit of the night sky is being looked at, then adds notes for interesting parts of the sky visible in the shot. A brilliant use of computer vision techniques to add value to existing data. (via Stinky).
  2. 99 Secrets Twittered -- Matt Webb is posting a secret a day from Carl Steadman's 99 Secrets, an early piece of art on the web. Matt's explanation is worth reading. Ze Frank really made me realize that every web app is a medium for art, for provoking human responses, and now I keenly watch for signs of art breaking out.
  3. Internet Ephemera -- a brief muse on "if we start with the assumption that everything we put online is ephemeral, how does that change what we put online?"
  4. Pockets of Potential (PDF) -- a 52-page PDF talking about opportunities for supporting learning with the mobile devices already in kids' lives (via Derek Wenmoth).

by Nat Torkington at February 19, 2009 10:27 AM

Aaron Johnson


Thoughts on Air Force Blocking Internet Access

Last year I wrote This Network Is Maintained as a Weapon System, in response to a story on Air Force blocks of blogging sites. Yesterday I read Air Force Unplugs Bases' Internet Connections by Noah Shachtman:

Recently, internet access was cut off at Maxwell Air Force Base in Alabama, because personnel at the facility "hadn't demonstrated — in our view at the headquarters — their capacity to manage their network in a way that didn't make everyone else vulnerable," [said] Air Force Chief of Staff Gen. Norton Schwartz.

I absolutely love this. While in the AFCERT I marvelled at the Marine Corps' willingness to take the same actions when one of their sites did not take appropriate defensive actions.

Let's briefly describe what needs to be in place for such an action to take place.

  1. Monitored. Those who wish to make a blocking decision must have some evidence to support their action. The network subject to cutoff must be monitored so that authorities can justify their decision. If the network to be cut off is attacking other networks, the targets of the attacks should also be monitored and use their data to justify action.

  2. Inventoried. The network to be cut off must be inventoried. The network must be understood so that a decision to block gateways A and B doesn't leave unknown gateways C and D free to continue conducting malicious activity.

  3. Controlled. There must be a way to implement the block.

  4. Claimed. The authorities must know the owners of the misbehaving network and be able to contact them.

  5. Command and Control. The authorities must be able to exercise authority over the misbehaving network.

You might notice the first four items are the first four elements of my Defensible Network Architecture 2.0 of a year ago.

Number five is very important. Those deciding to take blocking action must be able to exercise a block despite objections by the site. The site is likely to use terms like "mission critical," "business impact," "X dollars per hour," etc. The damage caused by leaving the malicious network able to attack the rest of the enterprise must exceed the impact of lost network connectivity to the misbehaving network.

It is usually much easier to wrap impact around a network outage than it is to determine the cost of sustaining and suffering network attacks. Loss of availability is usually easier to measure than losses of confidentiality or integrity. The easiest situation is one where downtime confronts downtime, i.e., cutting off a misbehaving site will allow its targets to restore their networks. This would be true of a malicious site conducting a DoS attack against others; terminating the offender denies his network availability but restores the victim's. That is why sites are most likely to allow network cutoffs when rogue code in one site is aggressively scanning or DoS'ing a target, resulting in the target losing services.

Does your enterprise have a policy that allows cutting off misbehaving subnets?

Richard Bejtlich is teaching new classes in Europe in 2009. Register by 1 Mar for the best rates.

by Richard Bejtlich ( at February 19, 2009 08:22 AM

OReilly Radar

Twitter Drives Traffic, Sales: A Case Study

Back in December, Dell reported that offers from its Dell Outlet Twitter account had led to more than $1 million in revenue. A small percentage for a company that books $16B in revenue annually--but a nice number nonetheless, particularly in a dreary economy.

Question is: are they the only ones?

I haven't yet found anyone else claiming to have micromessaged their way to a number with six zeroes. But I did have an interesting conversation recently with a company that used Twitter to drive a 20 percent increase in sales in December, and additional growth in February. Here's the story.

Namecheap, a 70-person company headquartered in LA, is a domain name registrar that's been in business for nine years. They rely mostly on word-of-mouth advertising and have just two people who do marketing (one of whom is devoted to SEO); almost everyone else provides customer service.

Michelle Greer, their sole marketing specialist, has been on Twitter personally since 2007, and she thought the service might be a good fit for Namecheap. To convince the CEO, she showed him what Tony Hsieh, the Zappos CEO, was doing on Twitter, including promotions. He gave her the go-ahead to experiment.

Michelle set up a Namecheap Twitter account, and in December, to launch it, she ran a contest: once an hour, she posted a Christmas-related trivia question (she used TweetLater to preschedule the posts and a book to help her come up with the 600+ trivia questions). To win, you had to be one of the first three @replies with the correct answer. The prize was credit for a one-year domain registration; to receive it, you needed a Namecheap account.

The company considers the contest a success. People got addicted to it, battling to get in the first replies. And they Twittered and blogged about it, helping Namecheap's follower count jump from 200 to over 4,000 in one month and bumping the company's PageRank, too.

So what about the actual business numbers? Namecheap's site traffic increased more than 10 percent in December, driving a 20 percent increase in domain registrations. In addition, Michelle says, "The increase in Twitter followers allowed us to see a 30 percent increase in traffic when we ran a Super Bowl promo on Twitter [in February]."

The contest had costs: primarily Michelle's time and intense attention for the whole month of December. Still, it's no surprise that Namecheap is trying more contests. And they're not the only ones. This week, our friends at Boing Boing launched a Tweet Week contest, giving away cool stuff to help build their Twitter followerships--which can, in turn, help drive blog traffic. Meanwhile, a consortium of four dog-focused businesses--Paw Luxury, Best Bully Sticks, Ask Spike Online and Four Legged Media--are starting the Barkhunt tonight, a scavenger hunt that will last for just one hour, with clues going up on Twitter every five minutes. (I swear, I didn't pick that one because I'm a dog person; if you know of a cat-related Twitter contest, add a comment or @reply me, and I'll update the post.)

Likely, it'll be a while before a contest drives $1 million in revenue through Twitter. But they're not the only way to make non-spam money through the service, and it's interesting to see companies experimenting with the medium.

by Sarah Milstein at February 19, 2009 07:29 AM

Donkey On A Waffle

Blackhat 2009 Papers and Presentations

The papers and presentations from Blackhat 2009 are becoming available as we speak. They can be found HERE. I plan to devour and comment on some of them this week... (assuming I get the time).

February 19, 2009 05:00 AM

OReilly Radar

Hulu's Superbowl Ad and the Boxee Fight

[A note to start. My company, Wesabe, is funded in part by a venture firm, Union Square Ventures, which is one of the funders of Boxee, a character in the drama described below. That said, I've never met or spoken with anyone from Boxee, and have only ever talked to Union Square about them to ask for an invite. I don't have any access to any inside information about Boxee. This post is based instead on the time I spent working at Lucasfilm from 1997 to 1999. Well, really, the following isn't based on Lucasfilm itself, but instead on my conversations with the major studios (of which Lucasfilm is not one -- Fox/Disney/etc., who control distribution, are) about this topic of video on the Internet, which was just starting to be hotly debated at the time.

Some of the comments below also come from participating in discussions about copy protection and Divx, which, if you'll remember, was at that time a self-destructing DVD-like format that would let the studios control how long you could watch their entertainment. No, seriously, it started to self-destruct when it was exposed to air and these people all thought it was certain to win over DVD. Wrong, but instructive.]

The secret to understanding why Hulu's "content providers" -- and boy do they love being called that -- have instructed Hulu to block Boxee users from their "content" -- again, not what they would call it -- isn't some big secret. In fact, it was broadcast during the Superbowl, in Hulu's excellent Superbowl ad:

[Update: I'm told you can't see that embedded video unless you live in the US. If you don't, can you see this YouTube copy? I'll laugh if that works. Let me know. Update again: Yeah, you can watch the Hulu ad from anywhere on YouTube. That's awesome. It's even Hulu's YouTube account that posted it!]

Here's the relevant part, as spoken by Alec Baldwin:

Hulu beams TV directly to your [sardonic gesture to the camera] portable computing devices, giving you more of the cerebral-gelatinizing shows you want, any time, anywhere, for free.

Emphasis added: portable computing devices. Not to your TV -- from your TV. To your dumb-ass laptop, you smelly, hairy, friendless, gamer-freak nerd. (Sorry, I hate to talk about you that way, but that's how they think of the Internet. I think you smell great.) To your TV is something completely different, and from the "content providers'" point of view, completely wrong. Aren't Apple and Tivo and YouTube bad enough as it is?

Boxee was featured in an awesome New York Times article one month ago, with a picture of their product on a big-screen TV, and Hulu's logo clearly visible in the upper right corner. I can almost hear some lawyer somewhere in Hollywood screaming, "I thought Hulu was a WEB SITE! I do NOT see a WEB BROWSER in this PHOTOGRAPH!" at the sight of it. Boxee's blog post on the controversy says they heard from Hulu about this two weeks ago; I'd bet Hulu heard from that lawyer two weeks before that -- the morning the article appeared. Those calls are fun.

You'd think the "content providers" would already know this -- Boxee -- would happen; even with Hulu gone from Boxee, I can still watch Hulu on my TV, albeit with a much lamer interface. Hooking a computer to a TV is easy enough. Maybe they did know, and just waited to see how Boxee would get along until it got too high-profile to ignore. I doubt it, though. Most entertainment lawyers don't go for the idea, "Let's allow it for a little while and see what happens." They instead argue, "Let's stop it immediately and see if we have a better option we can control more, later." I'd guess Hulu had a deal to show "content" on computers, and the "content providers" balked when those computers started talking to their precious televisions.

So why does Hulu exist at all? Hulu must have seemed like the "better option" for getting people to watch TV ads on their computers -- better, perhaps, than the iTunes Music Store selling the same "content" piecemeal and getting price control over video as they have for music. Or, perhaps, than YouTube, selling and showing ads without the studios necessarily involved in any way. Let's control ads on the Internet by putting them on our "content" through Hulu, an entertainment industry company, not a smelly nerd company. Great. It's a plan.

Maybe BitTorrent came up in the discussion, but I doubt it -- BitTorrent is for smelly nerds. This isn't about you folks. This is about the mass market. Those people can't be disrupting their TV watching with some WEB SITE they saw in the NEW YORK TIMES.

So that's my guess about why Hulu blocked Boxee: those ads you see on Heroes are higher margin when you see them on your TV than when you see them on Hulu, and the only reason they're on Hulu is to make money from Heroes when you watch it online, so Apple or Google doesn't make that money instead. They were meant for your "portable computing devices" and not your precious TV. Now go back to the couch until we call for you again.

I'm sure Hulu is totally pissed. They pretty much said just that in a somewhat more stilted way. The real insult, though, is calling the people who made them cut Boxee off "content providers." They might as well have told the studios they are the moral equivalent of the guy schlepping reels around the projector booth. Someone will win this war eventually, they seem to be saying, and you could have helped make it us. Now you have a choice: someone else -- not you, someone smart -- will win instead, or you can change your mind.

That's pretty much my view, too. DVDs (mentioned in the note at the start) became a big boon for the studios, once their crazy ideas about self-destructing Divx discs went the way of the Dodo. The studios have a very long history of betting against technology people want, and on technology people don't want. This is just another such case. The technology people want always wins in the end -- no duh -- and usually benefits the businesses who fought that technology to the death. Here's hoping the technology people want -- Boxee -- doesn't wind up benefiting the studios fighting it now.

by Marc Hedlund at February 19, 2009 04:56 AM

February 18, 2009

LOPSA blogs


We are happy to announce a LOPSA Birds of a Feather at SCaLE. This is your chance to hang with Jesse Trucks, Chris St. Pierre, and all the other crazy SAs representing LOPSA at SCaLE this year.

The BoF will be from 6:00 - 7:00PM on Friday night in the Logan Room. So be sure to wander over after the ZenOSS happy hour.

See for the details on time and locations.

If you have any questions about the BoF or our other activities at SCaLE this year, please email

by solarce at February 18, 2009 11:45 PM


Tab navigating the Mac UI

It was driving me nuts when I got my MacBook that I could not use the tab button to navigate across a form. Let alone forms, on a browser page, I could only tab between the address bar and the search field. To go anywhere else, I had to reach for the mouse. Aaargh!! Surely the operating system that was God's gift to humanity could do better than that! It took a fair amount of Googling for me

by Sri Sankaran ( at February 18, 2009 09:36 PM

Google Webmasters

State of the Index: my presentation from PubCon Vegas

It seems like people enjoyed it when I recreated my Virtual Blight talk from the Web 2.0 Summit late last year, so we decided to post another video. This one recreates the "State of the Index" talk that I gave at PubCon in Las Vegas, also late last year.

Here's the video of the presentation:

and if you'd like to follow along, here are the slides:

You can also access the presentation directly. Thanks again to Wysz for recording this video and splicing the slides into the video.

by Michael Wyszomierski ( at February 18, 2009 04:23 PM

Donkey On A Waffle

Vulnerability Discovery - A popularity contest

I just read a new blog post, The Top Ten Vulnerability Discoverers of All Time, by Gunter Ollman at the Frequency X Blog. I have the utmost respect for the X-Force folks; many of the best researchers and security practitioners in the world today have come from this camp over the course of the last 15+ years. And to be completely honest, I understand why this information would be of interest to the blog's readers (I probably would have published it as well had I owned it). However, I hate what it represents...

At one point in history, vulnerability research and discovery was about fixing the bugs and stopping the bad guys from abusing the holes. Somewhere along the line it became a game of "I'm cooler... I found the most interesting flaw!". And finally, as if that wasn't bad enough, it appears the latest bragging right is "I found the MOST flaws!". My thought on this is: "Who cares?!". Let's get back to fixing things because it's the right thing to do. Let's get back to working with the vendors to make the computing world safer. Let's stop worrying about flaw counts and who's the most uber. Sadly, I don't think we can go back in time - R.I.P. the good old days.

February 18, 2009 03:00 PM


Simple Help Survey: Win a $25 Amazon Gift Certificate

I would be insanely grateful if you could take 3 minutes (I timed it) to complete this survey. If you enter your email address at the end of the survey, you’ll be entered into a contest to win a $25 USD Amazon Gift Certificate. It’s also worth mentioning that you do not need to enter your name or email address to complete this survey, and absolutely no personally identifiable information is gathered. If you do opt to enter the contest, your email address will never be sold, used to spam you, solicit information etc. All of the email addresses will be deleted after the contest winner has been selected. Thanks very much in advance!!

Related Articles at Simple Help:

Simple Help Survey: Win a $25 Amazon Gift Certificate - Simple Help

by Ross McKillop at February 18, 2009 02:00 PM

Linux Poison

Bluetooth Device Manager - Blueman

Blueman is a GTK+ Bluetooth manager, designed to provide a simple yet effective means of controlling the BlueZ API and simplifying bluetooth tasks such as:

  * Connecting to 3G/EDGE/GPRS via dial-up
  * Connecting to/creating bluetooth networks
  * Connecting to input devices
  * Connecting to audio devices
  * Sending/receiving/browsing files via OBEX
  * Pairing

Blueman also integrates with

by Nikesh Jauhari ( at February 18, 2009 01:43 PM


How to resolve the ‘/bin/rm: Argument list too long’ error


root@dwarf /var/spool/clientmqueue # rm spam-*
/bin/rm: Argument list too long.

Ever seen this error on Linux when a directory holds so many files that you can't delete them with a simple rm -rf *? It happens because the shell expands the wildcard into one long argument list, and the kernel limits how large that list can be (ARG_MAX), so the command is rejected before rm even runs. I have run into this problem a number of times, and after a bit of research online I came across a neat way to work around it.

find . -name 'spam-*' | xargs rm

In the above instance the command deletes all files whose names begin with spam-. Note that find recurses by default, so this covers subdirectories too, not just the current directory. You can replace spam-* with any pattern you like, or with just * if you want to remove all files in the folder.

find . -name '*' | xargs rm

We have covered the Linux find command in great detail earlier. xargs is a Linux command that makes it easy to pass a large number of arguments to another command, batching them so that no single invocation exceeds the argument-list limit.
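One caveat: a bare find | xargs pipeline mangles filenames containing spaces or newlines. A safer sketch, using a NUL-separated pipeline (the directory and file names here are made up for the demo):

```shell
# set up a throwaway directory with a few "spam" files (hypothetical names)
mkdir -p /tmp/mqueue-demo
cd /tmp/mqueue-demo
touch spam-1 "spam-2 with space" keep.txt

# NUL-separated pipeline: safe for awkward filenames;
# -maxdepth 1 stops find from descending into subdirectories
find . -maxdepth 1 -name 'spam-*' -print0 | xargs -0 rm --

ls   # only keep.txt remains
```

GNU find can also delete matches itself, with no argument-list limit at all: find . -maxdepth 1 -name 'spam-*' -delete.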

Related Articles at Simple Help:

How to resolve the ‘/bin/rm: Argument list too long’ error - Simple Help

by Sukrit Dhandhania at February 18, 2009 01:00 PM

The Daily ACK

AppViz

Helps iPhone developers download and visualize their application sales.

by aallan at February 18, 2009 12:11 PM

OReilly Radar

Four short links: 18 Feb 2009

A day of optimism (or is it pessimism), mobile, and local. Enjoy!

  1. How Are You Coping With Anxiety Collapse? (BoingBoing) -- the economy, collapse, potential Depression, world to wrack and ruin was a repeating theme of Kiwi Foo Camp this year. We had a debate, the moot being "New Zealand is Fucked". You'll be glad to know that the opposing team won, but largely on the grounds that "the rest of the world will be worse off than us". Best line was at the end, when moderator Russell Brown said, "ok, let's head back for a drink" and the final speaker of the opposing team pushed his glass across the table and said, "here, have mine--it's half full." Anyway, a timely and pressing subject and the human stories in the comments are fascinating.
  2. Let my board and me become as one: the Wii balance board/Google Earth mashup -- groovy UI hack that lets you surf the world via Google Earth.
  3. Exporting the Past into the Future (Matt Jones) -- wonderful exploration of location-based services from an eminently human point of view. "Where you are right now has limited value".
  4. Top 10 UK Android Mobile Apps -- the items on the list are utterly banal, which is proof positive that Android has made it into the mainstream.

by Nat Torkington at February 18, 2009 11:27 AM

Year in the Life of a BSD Guru

Women Needed for Research Survey

Whitney Powell, a student at Appalachian State University, is conducting an independent study/research project on the effects of gender socialization on women in open source. If you are a woman who uses or is involved with open source and would like to participate, you can find further details here.

February 18, 2009 11:01 AM


Box2D JS - Javascript 2D physics library


Chances are, you've played a Flash game or two that makes use of a 2D Newtonian physics engine. If you're a Javascript coder, though, there's no reason why you should feel left out of the fun.

Box2D JS is a Javascript port of the popular Box2DFlashAS3 library. This is cool for a couple of reasons. First, it brings a simple 2D physics API to Javascript. Almost as important, it's the exact same API that Flash developers have been using, so there's an existing set of documentation and a lot of sample code that can be easily ported.

Box2D JS

by Jason Striegel at February 18, 2009 11:00 AM

Tumbleweeds's Rants

Fun with Squid and CDNs

One neat upgrade in Debian’s recent 5.0.0 release [1] was Squid 2.7. In this bandwidth-starved corner of the world, a caching proxy is a nice addition to a network, as it should shave at least 10% off your monthly bandwidth usage. However, the recent rise of CDNs has made many objects that should be highly cacheable, un-cacheable.

For example, a YouTube video has a static ID. The same piece of video will always have the same ID; it’ll never be replaced by anything else (except a “sorry, this is no longer available” notice). But it’s served from one of many delivery servers: watch a video once and it may come from one server, watch it again and it may come from another. And that’s not all: the signature parameter is unique (to protect against hot-linking), as are other non-static parameters. Basically, any proxy will probably refuse to cache the video (because of all the parameters), and even if it did, it’d be a waste of space, because the unique signature ensures no one would ever request that cached item again.

I came across a page on the squid wiki that addresses a solution to this. Squid 2.7 introduces the concept of a storeurl_rewrite_program, which gets a chance to rewrite any URL before an item is stored in or fetched from the cache. Thus we could rewrite our example URL into a normalised form.

We’ve normalised the URL, keeping only the two parameters that matter: the video id, and the itag, which specifies the video quality level.

The squid wiki page I mentioned includes a sample perl script to perform this rewrite. They don’t include the itag, and my perl isn’t good enough to fix that without making a dog’s breakfast of it, so I re-wrote it in Python. You can find it at the end of this post. Each line the rewrite program reads contains a concurrency ID, the URL to be rewritten, and some parameters. We output the concurrency ID and the URL to rewrite to.

The concurrency ID is a way to use a single script to process rewrites from different squid threads in parallel. The documentation on this is almost non-existent, but if you specify a non-zero storeurl_rewrite_concurrency, each request and response will be prepended with a numeric ID. The perl script concatenated this directly before the re-written URL, but I separate them with a space. Both seem to work. (Bad documentation sucks.)
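Concretely, once a non-zero concurrency is configured, an exchange between squid and the rewrite helper looks something like this (the hostnames and the SQUIDINTERNAL form of the rewritten URL below are illustrative assumptions, not something squid mandates):

```
# squid -> helper: concurrency ID, the URL, then extra request details
3 http://v15.example.googlevideo.com/videoplayback?id=abc123&itag=34&signature=0AF0 10.0.0.7/- - GET
# helper -> squid: the same concurrency ID, then the normalised URL
3 http://video-srv.youtube.com.SQUIDINTERNAL/videoplayback?id=abc123&itag=34
```

Squid uses the returned URL purely as the internal cache key; the actual request still goes out to the original URL.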

All that’s left is to tell Squid to use this, and to override the caching rules on these URLs.

storeurl_rewrite_program /usr/local/bin/
storeurl_rewrite_children 1
storeurl_rewrite_concurrency 10

#  The keywords for youtube video files are "get_video?", "videodownload?" and "videoplayback?id"
#  The ".(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\?" is only for pictures and other videos
acl store_rewrite_list urlpath_regex \/(get_video\?|videodownload\?|videoplayback\?id) .(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\? \/ads\?
acl store_rewrite_list_web url_regex ^http:\/\/([A-Za-z-]+[0-9]+)*.[A-Za-z]*.[A-Za-z]*
acl store_rewrite_list_path urlpath_regex .(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)$
acl store_rewrite_list_web_CDN url_regex ^http:\/\/[a-z]+[0-9]

# Rewrite youtube URLs
storeurl_access allow store_rewrite_list
# This is not related to youtube videos; it's only for CDN pictures
storeurl_access allow store_rewrite_list_web_CDN
storeurl_access allow store_rewrite_list_web store_rewrite_list_path
storeurl_access deny all

# Default refresh_patterns
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0

# Updates (unrelated to this post, but useful settings to have):
refresh_pattern*.(cab|exe)(\?|$) 518400 100% 518400 reload-into-ims
refresh_pattern*.(cab|exe)(\?|$) 518400 100% 518400 reload-into-ims
refresh_pattern*.(cab|exe)(\?|$) 518400 100% 518400 reload-into-ims
refresh_pattern (Release|Package(.gz)*)$        0       20%     2880
refresh_pattern .deb$         518400   100%    518400 override-expire

# Youtube:
refresh_pattern -i (get_video\?|videodownload\?|videoplayback\?) 161280 50000% 525948 override-expire ignore-reload
# Other long-lived items
refresh_pattern -i .(jp(e?g|e|2)|gif|png|tiff?|bmp|ico|flv)(\?|$) 161280 3000% 525948 override-expire reload-into-ims

refresh_pattern .               0       20%     4320

# All of the above can cause a redirect loop when the server
# doesn’t send a "Cache-control: no-cache" header with a 302 redirect.
# This is a work-around.
minimum_object_size 512 bytes

Done. And it seems to be working relatively well. If only I’d set this up last year when I had pesky house-mates watching youtube all day ;-)

It should of course be noted that doing this instructs your Squid Proxy to break rules. Both override-expire and ignore-reload violate guarantees that the HTTP standards provide the browser and web-server about their communication with each other. They are relatively benign changes, but illegal nonetheless.

And it goes without saying that rewriting the URLs of stored objects could cause some major breakage if different objects (with different URLs) get treated as the same object. The provided regexes seem sane enough for that not to happen, but YMMV.

#!/usr/bin/env python
# vim:et:ts=4:sw=4:

import re
import sys
import urlparse

youtube_getvid_res = [
    # (the get_video/videodownload regexes contained full http:// URLs
    # and were stripped from this copy of the post)
]

youtube_playback_re = re.compile(r"^http:\/\/(.*?)\/videoplayback\?id=(.*?)&(.*?)$")

others = [
    (re.compile(r"^http:\/\/(.*?)\/(ads)\?(?:.*?)$"), "http://%s/%s"),
    # (the replacement template for the next rule was stripped from this copy)
    (re.compile(r"^http:\/\/(?:.*?)\/(?:.*?)\/(.*?)\?(?:.*?)$"), ""),
    (re.compile(r"^http:\/\/(?:(?:[A-Za-z]+[0-9-.]+)*?).(.*?).(.*?)\/(.*?).(.*?)\?(?:.*?)$"), "http://cdn.%s.%s.SQUIDINTERNAL/%s.%s"),
    (re.compile(r"^http:\/\/(?:(?:[A-Za-z]+[0-9-.]+)*?).(.*?).(.*?)\/(.*?).(.{3,5})$"), "http://cdn.%s.%s.SQUIDINTERNAL/%s.%s"),
    (re.compile(r"^http:\/\/(?:(?:[A-Za-z]+[0-9-.]+)*?).(.*?).(.*?)\/(.*?)$"), "http://cdn.%s.%s.SQUIDINTERNAL/%s"),
    (re.compile(r"^http:\/\/(.*?)\/(.*?).(jp(?:e?g|e|2)|gif|png|tiff?|bmp|ico|flv)\?(?:.*?)$"), "http://%s/%s.%s"),
    (re.compile(r"^http:\/\/(.*?)\/(.*?)\;(?:.*?)$"), "http://%s/%s"),
]

def parse_params(url):
    "Convert a URL’s set of GET parameters into a dictionary"
    params = {}
    for param in urlparse.urlsplit(url)[3].split("&"):
        if "=" in param:
            n, p = param.split("=", 1)
            params[n] = p
    return params

while True:
    line = sys.stdin.readline()
    if line == "":
        # EOF: squid has closed our stdin
        break
    try:
        channel, url, other = line.split(" ", 2)
        matched = False

        # (the rewritten-URL templates in the print statements below
        # contained full http:// URLs and were stripped from this copy)
        for regex in youtube_getvid_res:
            if regex.match(url):
                params = parse_params(url)
                if "fmt" in params:
                    print channel, "" % (params["video_id"], params["fmt"])
                else:
                    print channel, "" % params["video_id"]
                matched = True

        if not matched and youtube_playback_re.match(url):
            params = parse_params(url)
            if "itag" in params:
                print channel, "" % (params["id"], params["itag"])
            else:
                print channel, "" % params["id"]
            matched = True

        if not matched:
            for regex, pattern in others:
                m = regex.match(url)
                if m:
                    print channel, pattern % m.groups()
                    matched = True
                    break

        if not matched:
            print channel, url

    except Exception:
        # For debugging only. In production we want this to never die,
        # so pass the line through unmodified.
        print line


  1. Yes, Vhata, Debian released in 2009, I won the bet, you owe me a dinner now. 

by tumbleweed at February 18, 2009 10:29 AM


Linux Tips: bash completion: /dev/fd/62: No such file or directory

This post will show how to deal with an issue I had on a newly installed debian lenny xen virtual machine. I used xen-tools to create the vm with the debootstrap method, and all was fine. I installed the bash-completion package since, like probably most of you, I can't live without bash completion, but I quickly found out that it was broken. Any attempt to perform a filelist completion failed with this error:
vm11:~# tail -f /va<TAB>
-bash: /dev/fd/62: No such file or directory
-bash: /dev/fd/60: No such file or directory

and this basically makes bash completion useless. A quick look showed that the /dev/fd link was missing, and that this was the main cause of the problem. Still, on an older lenny vm I'd had for a couple of months this was not happening (from what I can tell, because it had an older version of the /etc/bash_completion file). There are several ways to fix this, starting with the obvious one of downgrading /etc/bash_completion, but I didn't like that, so I looked for other ways.
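Those /dev/fd/N paths are how bash implements process substitution, which recent versions of bash_completion presumably rely on. You can see the mechanism on any working box (a quick check, nothing specific to bash_completion):

```shell
# process substitution exposes a command's output as a /dev/fd/N path;
# when /dev/fd is missing, constructs like this fail with exactly the
# "No such file or directory" error shown above
bash -c 'echo <(true)'                              # a path like /dev/fd/63
bash -c 'cat <(echo "process substitution works")'
```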

Manually create the /dev/fd link

This is a quick fix: we create the /dev/fd link manually:
ln -s /proc/self/fd /dev/fd
and bash completion will start working immediately.

Still, after the vm is restarted this link will be lost, as /dev is not persistent storage. We could of course add the link creation command to rc.local, but at this point I asked myself why udev was not creating that link automatically, since it was present under /dev/.static/dev/fd.
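If you do go the rc.local route, a guard keeps the command idempotent and harmless on systems where /dev is already populated (a sketch; assumes Debian's /etc/rc.local, placed before its final exit 0):

```shell
# recreate /dev/fd only when it is missing
# (i.e. nothing else is populating /dev)
[ -e /dev/fd ] || ln -s /proc/self/fd /dev/fd
```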

Udev was missing

Then I realized that the udev package was not installed, and this was the reason the static devices were not being created. I manually installed udev using:
aptitude install udev
and this solved my problem. Hopefully you will find this post useful if you have the same problem, and it will save you the trouble of hunting for a solution yourself.

by - Marius - at February 18, 2009 09:57 AM


Mormons funding Hawaii civil union attack website?

Remember how I talked about Hawaii Family Forum and its efforts to stop civil unions in Hawaii? Well, it turns out that they’re getting some help. And if you’ve read my post on California’s Proposition 8 you’ll recognize the funders.

Andrew Cooper (via Rick MacPherson’s post) dug up some unsurprising news: a horrid website trying to foment dissent for the civil union bill is based in Orem, Utah. 98% of religious adherents in the Provo-Orem area are Mormons.

That means the odds are quite good that the good ol’ fancy underwear Mormons are bankrolling efforts to continue discrimination.

Shocked? I’m not. The Mormon church is extremely rich and uses its fortune to try to force its religion down everybody’s throat, whether by funding missionaries to spread their gospel to everybody who will hear or by trying to cram discriminatory laws through our legislatures.

Adding to this, Mr. Cooper has also found similarities between this attack site and one attacking Honolulu’s proposed rail system. That connection is a little more bizarre, I have to admit.

by Brad at February 18, 2009 07:58 AM


Need Inodes? (18 Feb 2009)

Need to manipulate many, many files quickly? ZFS has some advantages. "In ZFS inodes are allocated on demand and so the question came up, how many files can I store onto a piece of storage. I managed to scrape up an old disk of 33GB, created a pool and wanted to see how many 1K files I could store on that storage." creating files on ZFS

February 18, 2009 07:44 AM

The Hive Archive

Links for 2009-02-17

February 18, 2009 06:00 AM

Adnans SysDev


Stop it with the X- Already!


Sometimes, it seems like every time somebody has a great idea for a new HTTP header, media type, or pretty much any other protocol element, they do the same thing. Rather than trying to figure out how to fit into how the rest of the world operates, getting adequate review and socialising their proposal, they just stick a bloody X- on the front and ship it.

The IETF has no-one to blame but itself for this situation, of course. X- was a convention designed for experimentation (hence the ‘x’). However, the problem is in transitioning from experimental to production; as soon as your header (or whatever) escapes the lab and has to interoperate with another system, it’s no longer experimental, it’s on the Internet. Oops.

RFC4288 tried to address this situation for media types;

However, with the simplified registration procedures described above for vendor and personal trees, it should rarely, if ever, be necessary to use unregistered experimental types. Therefore, use of both “x-” and “x.” forms is discouraged.

Likewise, RFC3864 makes the same attempt for HTTP and all other message headers, albeit not so explicitly; instead, by setting up a provisional header repository whose procedure consists of sending an e-mail, we hoped that people wouldn’t feel the need to use X-.

Perhaps in vain, unfortunately. Dan points out how Palm has decided to extend HTML;

A widget is declared within your HTML as an empty div with an x-mojo-element attribute.

<div x-mojo-element="ToggleButton" id="my-toggle"></div>


Let’s Make it Easy

Because of this, I think we (the standards community ‘we’) need to over-communicate how formats and protocols should be extended, and under what conditions; indeed, this is one of the explicit goals we wrote into the HTTPbis charter. Only then can we justifiably be angry with people who get it wrong.

In the meantime;

  1. If you’re trying to introduce a new HTTP/SMTP/etc. header, the minimum bar is sending an e-mail to get it into the Provisional Header Repository, as per RFC3864.
  2. If you’re trying to introduce a new media type, have a look at RFC4288 and consider the vnd tree.
  3. If you’re trying to extend HTML, talk to the XHTML, HTML5 and Microformats folks about that.

In none of these cases (or, for that matter, any other case) should an “X-” show its ugly little four-armed, dashed self.
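For instance, instead of an unregisterable x- experiment, the vendor tree that RFC4288 describes gives you a name you can actually register (the names below are made up for illustration):

```
# discouraged: an unregistered experimental type
Content-Type: application/x-mynotes

# registerable vendor-tree equivalent
Content-Type: application/vnd.example.mynotes+xml
```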

A Few Words on URI-Based Extensibility

As Dan says, the TAG has weighed in on the proper way for formats to enable extensibility, using URIs. This is very relevant to the discussions we’re having about the Link header, since one of the goals is to retrofit URI-based extensibility onto the link relation type identifiers, so that people can ground them in a global namespace.

Atom started this practice, and so far, it’s working reasonably well. There may be a couple of bumps on the road; as DeWitt points out, there are an awful lot of bare-word link relations in use in HTML already. As such, I’m suspecting that we’re going to actually have three kinds of relation type identifiers in reality; absolute URIs for well-identified extension relations, “short” registered identifiers for common well-identified relations, and “short” unregistered relations for ad hoc, locally-scoped and uncoordinated extension relations.

Also, I’d be remiss if I didn’t point out that too much extensibility can be anti-social. That’s why it’s necessary to communicate not only how to extend something, but when not to. Otherwise, everyone will be talking past each other using their own proprietary languages.

by Mark Nottingham at February 18, 2009 04:44 AM

OReilly Radar

Anatomy of "Connect"

I'm here at Webstock in New Zealand working on my talk for tomorrow (Open, Social Web) and one of the things I've been thinking about is all of the different "Connect" applications and products that have recently sprung into existence. I mean, we have Facebook Connect, Google Friend Connect, MySpace (thankfully not "Connect") ID, TypePad Connect, RPX and I'm sure the list goes on. I'm trying to break down all of these products - ignoring the underlying open or proprietary technologies that make them tick - toward a straw man definition of a "Connect" application:

  1. Profile: Everything having to do with identity, account management and profile information ranging from sign in to sign out on the site I'm connecting with.
  2. Relationships: Think social graph. Answers the question of who do I know on the site I've connected with and how I can invite others.
  3. Content: Stuff. All of my posts, photos, bookmarks, video, links, etc that I've created on the site I've connected with.
  4. Activity: Poked, bought, shared, posted, watched, loved, etc. All of the actions that things like the Activity Streams project are starting to take on.

In my mind, the Goals of all of these "Connect" applications are focused on helping people discover new content, people they already know as well as new people with similar interests. They also all help to reduce some of the major pain points when it comes to decentralization of social networks; signing up for a new account, eliminating the manual process of filling out your profile, uploading a photo and going through that madness of "re-friending" your friends time and time again. While all of these features aren't new, how this style of application combines them all certainly seems to be. If 2008 was the year of social application platforms (Facebook Platform and OpenSocial), perhaps 2009 will be all about "Connect" - whatever that means.

(I've put together an example of this using Facebook Connect and Citysearch as it seems to be the most complete example that I can find.)

by David Recordon at February 18, 2009 04:20 AM

Penguim's Place

Vulnerability in sudo

A vulnerability has been discovered in sudo that affects the following systems: Ubuntu 8.04 LTS and Ubuntu 8.10. An error in the handling of group privilege changes allows a local attacker who belongs to a group included in a "Run as" list in the /etc/sudoers file to obtain root privileges. This flaw does not affect the default file installed with [...]

by penguim at February 18, 2009 02:35 AM


How to purchase US iTunes content in Canada


Thanks to my friend Armen for this trick. This tutorial will guide you through the steps required to buy or rent content that’s only available in the US version of the iTunes Store, from Canada. I would be hugely appreciative if you could give this a try if you live in Australia, England, France, Italy or any country other than Canada and let me know if it works by leaving a comment.

  1. The most important part of this trick is to purchase a MasterCard gift card (a temporary/disposable MasterCard). These are available in many stores across Canada, including Macs and Shoppers Drug Mart. They come in $25, $50 and $100 denominations.
  2. Launch iTunes. Select iTunes Store from the left navigation, and if you already have a Canadian account, click on the ‘your email address’ button in the upper right corner.
  3. Click the Sign Out button.
  4. Now back in the iTunes Store, click the Sign In button.
  5. Click the Create New Account button.
  6. On the intro screen, click the Continue button.
  7. You’ll be asked to enter your credit card and address info. Don’t. Look for the line that says If the billing address is not in Canada, click here, and click the click here link.
  8. In the Select a country: drop-down list, select US. Now click the Change Country button.
  9. You’ll be taken back to the iTunes Store, where once again you’ll have to click the Sign In button.
  10. And again, click the Create New Account button.
  11. Notice that the iTunes Store now thinks you’re in the US. Enter the gift card MasterCard info and a US billing address. You can probably just make one up, but you might want to confirm that the ZIP code and city match up. Update: using an address in Oregon is a good idea as they have no state tax. Thanks @staciebee!
  12. Once you’ve filled in all the appropriate info, click the Done button.
  13. Now you’ll be signed in to the US iTunes Store. Navigate to whatever content it is you want to buy/rent and go for it!
  14. You should have no problems downloading the movies/music/TV shows, etc.


by Ross McKillop at February 18, 2009 01:17 AM

Ubuntu Geek

Howto Install Elements for Compiz Fusion

Elements is a plugin for Compiz Fusion 0.7.4 which integrates all the features of the popular Snow, Autumn, Fireflies, and Stars plugins, plus an all new feature, Bubbles. Written from the ground up with only open source software, Elements is designed to be free, fast, and fun. It’s also fully customizable. You want flower petals falling in spring? Draw the petals and use them with the Autumn feature. Want to have toasters flying towards you at warp speed? Take a picture of your toaster and use it with the Stars feature. Feel really adventuresome? Take a look at the code and make it your own. Elements is completely free and released under the GNU General Public License (GPL).
Read the rest of Howto Install Elements for Compiz Fusion (91 words)

© admin for Ubuntu Geek, 2009.

by admin at February 18, 2009 12:48 AM

Sam Ruby

White House FeedBack Loop

David Glasser: Looks like it might be fixed now?

The “it” in this case is the White House feed.  Before, it used the same UUID in each entry.  That would have been valid, but only if the intent was that the blog contained only one entry, and that entry was replaced in place with each new post.  Clearly that was not the intent, so it was invalid.

On Sunday, I deployed a change to the FeedValidator to indicate this error.

Today the feed has been corrected.


Now I doubt that the people who maintain the software that runs the White House blog read this blog.  Perhaps they care about the Feed Validator.  More likely it was just a coincidence.  But in any case, they have gone from something an Asshole might claim is spec compliant to being a Moron, in a way that isn’t likely to cause anywhere near as many problems as before, and that furthermore escaped the notice of the Feed Validator.

Until now.  The UUIDs aren’t sufficiently unique, but more importantly, they aren’t UUIDs.

Apparently, they are in good company, because on page 6 of RFC 5005 in the second example, the feed/entry/@id has the wrong number of characters in the first part, and non-hex digits in the last two parts.
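
Both kinds of mistake here are mechanical to catch. As a rough sketch (my own illustration, not the Feed Validator's actual code), a urn:uuid entry ID must carry five hyphen-separated hex groups of 8-4-4-4-12 characters:

```python
import re
import uuid

# A urn:uuid Atom entry ID must wrap a syntactically valid UUID:
# five hyphen-separated hex groups of 8, 4, 4, 4, and 12 characters.
UUID_URN_RE = re.compile(
    r"^urn:uuid:[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$"
)

def is_valid_urn_uuid(entry_id):
    if not UUID_URN_RE.match(entry_id):
        return False
    # Cross-check by letting the stdlib re-parse the hex digits.
    try:
        uuid.UUID(entry_id[len("urn:uuid:"):])
        return True
    except ValueError:
        return False
```

A wrong number of characters in the first group and non-hex digits in the last groups both fail this check.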

In any case, I’m not certain that messing with the White House in this matter is a good idea.  Should I happen to mysteriously disappear, let’s just say that it has been nice knowing you.

February 18, 2009 12:34 AM

February 17, 2009

Geeking with Greg

Jeff Dean keynote at WSDM 2009

Google Fellow Jeff Dean gave an excellent keynote talk at the recent WSDM 2009 conference that had tidbits on Google I had not heard before. Particularly impressive is Google's attention to detail on performance and their agility in deployment over the last decade.

Jeff gave several examples of how Google has grown from 1999 to 2009. They have x1000 the number of queries now. They have x1000 the processing power (# machines * speed of the machines). They went from query latency normally under 1000ms to normally under 200ms. And, they dropped the update latency by a factor of x10000, going from months to detect a changed web page and update their search results to just minutes.

The last of those is very impressive. Google now detects many web page changes nearly immediately, computes an approximation of the static rank of that page, and rolls out an index update. For many pages, search results now change within minutes of the page changing. There are several hard problems there -- frequency and importance of recrawling, fast approximations to PageRank, and an architecture that allows rapid updates to the index -- that they appear to have solved.

Their performance gains are also impressive, now serving pages in under 200ms. Jeff credited the vast majority of that to their switch to holding indexes completely in memory a few years back. While that now means that a thousand machines need to handle each query rather than just a couple dozen, Jeff said it is worth it to make searchers see search results nearly instantaneously.

The attention to detail at Google is remarkable. Jeff gleefully described the various index compression techniques they created and used over the years. He talked about how they finally settled on a format that grouped four deltas of positions together in order to minimize the number of shift operations needed during decompression. Jeff said they paid attention to where their data was laid out on disk, keeping the data they needed to stream quickly on the faster outer edge of the disk, leaving the inside for cold data or short reads. They wrote their own recovery for errors with non-parity memory. They wrote their own disk scheduler. They repeatedly modified the Linux kernel to meet their needs. They designed their own servers with no cases, then switched to more standard off-the-rack servers, and now are back to custom servers with no cases again.
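
That four-deltas trick is in the spirit of what is now called group varint encoding. The sketch below is my own illustration, not Google's actual format: four position deltas share one selector byte recording each delta's byte length, so the decoder does a fixed shift-and-mask per value instead of a branch per byte:

```python
def encode_group(deltas):
    # Pack four deltas behind one selector byte; two bits per delta
    # record its encoded length (1-4 bytes).
    assert len(deltas) == 4
    selector = 0
    payload = bytearray()
    for i, d in enumerate(deltas):
        nbytes = max(1, (d.bit_length() + 7) // 8)
        assert nbytes <= 4
        selector |= (nbytes - 1) << (i * 2)
        payload += d.to_bytes(nbytes, "little")
    return bytes([selector]) + bytes(payload)

def decode_group(buf):
    # Returns the four deltas and the number of bytes consumed.
    selector, pos, out = buf[0], 1, []
    for i in range(4):
        nbytes = ((selector >> (i * 2)) & 0b11) + 1
        out.append(int.from_bytes(buf[pos:pos + nbytes], "little"))
        pos += nbytes
    return out, pos

# Word positions in a posting list, stored as deltas (first is absolute).
positions = [3, 7, 260, 261, 70000]
deltas = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
deltas += [0] * (-len(deltas) % 4)  # pad to a multiple of four
blob = b"".join(encode_group(deltas[i:i + 4]) for i in range(0, len(deltas), 4))
```

A production decoder would replace the inner loop with a 256-entry table keyed by the selector byte, which is the kind of shift-count saving the talk alluded to.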

Google's agility is impressive. Jeff said they rolled out seven major rearchitecture efforts in ten years. These changes often would involve completely different index formats or totally new storage systems such as GFS and BigTable. In all of these rollouts, Google always could and sometimes did immediately rollback if something went wrong. In some of these rollouts, they went as far as to have a new datacenter running the new code, an old datacenter running the old, and switch traffic between datacenters. Day to day, searchers constantly were experiencing much smaller changes in experiments and testing of new code. Google does all of this quickly and quietly, without searchers noticing anything has changed.

The raw computational power is staggering already -- thousands of machines for a single request -- but what is to come seems nearly unbelievable. Jeff said that Google's machine translation models use a million lookups in a multi-terabyte model just to translate one sentence. Jeff followed by saying that Google's goal is to make all information in all languages accessible regardless of which language you choose to speak. The amount of processing required is difficult to fathom, yet the kind of computational mountain that would make others falter seems to call out to Googlers.

In all, a great talk and a great start to the WSDM 2009 conference. If you want to catch Jeff's talk yourself, and I highly recommend you do, the good people at were filming it. With any luck, the video should be available on their site in a few weeks.

In addition to this post, you might enjoy Michael Bendersky's notes on Jeff Dean's talk. It appears Michael took about as detailed notes as I did.

by Greg Linden at February 17, 2009 09:06 PM

OReilly Radar

Web 2.0 Expo: Launchpad Extended, Developer Discount and Ignite!

web expo outside sign

The Web 2.0 Expo is our annual West Coast gathering of web technologists. As always, there are a lot of ways to participate -- many of which will not cost you a dime, and all of which can be quite valuable to you.

Launchpad: We are giving 5 companies 5 minutes on stage at the Expo Launchpad this year. While the definition of a launch has gotten cloudy in this age of public betas, we're looking for new companies or products that make us take notice. And while venture capital has been the focus in past years, the reality of the market is that companies must gain the attention of customers. So our judging panel and criteria this year focus more on what is essential and transformational in today's market than on ability to get funded. The judges this year are Matt Marshall (VentureBeat), Marshall Kirkpatrick (ReadWriteWeb), myself and our sponsor Microsoft Bizspark's Anand Iyer.

We are extending the Launchpad's submission deadline to 2/19. Put your hat in the ring!

Developer Discount: We're offering a special discount to the full conference for developers. Developers still need to learn from each other and share their stories. More than ever, you'll need to learn how to communicate with marketing. You'll want to learn how to write faster code, be secure and, of course, be greener. You'll want to find out how to get value from data, how to process that data on fewer servers than before, and how to respect your users by giving them control of their data.

To get 20% off admission to the Web 2.0 Expo use websf09dev at checkout. This offer is open until 2/20.

Ignite: On March 31st we are going to hold our annual Ignite Expo at the DNA Lounge. Each speaker gets 20 slides that auto-advance after 15 seconds. We'll have ~16 speakers. The best Ignite talks are ideas, hacks, lessons, or war stories. Submissions are due by 3/9. If you're not familiar with Ignite our community site has videos from previous events.

expo logo

This year's Web 2.0 Expo is strong. Our keynotes will feature Padmasree Warrior (Cisco), Will Wright (Maxis), and Vic Gundotra (Google). Our Developer, Marketing, Design, Mobile, Ops, Enterprise and Security tracks are packed with great speakers. I hope to see you there.

Photo Courtesy Duncan Davidson

Updated with Anand Iyer's name.

by Brady Forrest at February 17, 2009 07:34 PM

Blog o Matty

Bash tips

I read through the bash tips on the hacktux website, which brought to light the fact that you can do basic integer math in your bash scripts. This is easily accomplished using double parentheses, similar to this: four=$((2 + 2)); echo $four. This is good stuff, and I need to replace some old `bc …` [...]

by matty at February 17, 2009 07:07 PM

OReilly Radar

State of the Computer Book Market 2008, Part 1: The Market

As described in Computer Book Sales as a Technology Trend Indicator, and our other posts on the State of the Computer Book Market we have an updated series of posts that show the whole market's final 2008 numbers. Remember this data is from Bookscan's weekly top 3,000 titles sold. Bookscan measures actual cash register sales in bookstores. Simply put, if you buy a book in the United States, there's a high probability it will get recorded in this data. Retailers such as Borders, Barnes & Noble, and Amazon make up the lion's share of these sales.

Book Market Performance

Before we get to the specifics of the computer book market, let's get some context by looking at the whole book market for the Week ending 12/28/2008. Everything that is printed, bound and sold like a book, from Harry Potter and The Tales of Beedle the Bard to Breaking Dawn and Outliers is represented in the table below.

Overall Book Market - EVERYTHING - Week Ending: 2008-12-28

[Table of Sales ('000s), Share, and YoY change for: Adult Non-Fiction, Adult Fiction, Juvenile Non-Fiction, and Juvenile Fiction; the figures did not survive in this copy.]
As you can see, the computer market is only 1% of total unit sales in bookstores and online retailers. The Computer category was the only category down [-8%] year-over-year.

Immediately below is the year-on-year trend for the entire computer book market since 2004, when we first obtained reliable data from Bookscan. Please remember the data is for all publishers and NOT just O'Reilly. The slightly-thicker red line represents the 2008 data.

Computer Book Market  - 2004-2008


As you can see, the clear seasonal pattern we've pointed out before still exists. That is, we have a strong start that declines through the summer, spikes for the Fall 'Back to School' season, and finishes strong. The trend line for each year closely mirrors the year before, with remarkably consistent weekly ups and downs.

What you won't see on this chart is that the computer book market cratered in 2001, shrinking twenty percent a year for three years until it stabilized in 2004 at about half the size it was in 2000. (We only have reliable data going back to 2004.) We are hoping that the decline we experienced in the second half of 2008 will not be as pronounced or as long as the one that began in 2001, because the current economic conditions are not centered squarely on tech. That being said, 2008 was the worst-performing year since we've been collecting the Bookscan data. The chart immediately below shows total units by year for the Computer book category. As you can see, 2008 was the worst year for unit sales in the computer book market.

Computer Book Market - Overall Units by Year

So what was the news in 2008? The year got off to a healthy start, but around July sales trended downward and for the remainder of the year never rose above the equivalent week in 2007. In the first half of 2008, the market was up 67,000 units compared to 2007, but in the second half 417,000 fewer units were sold than in 2007 -- so the overall market decreased by 350,000 units. The majority of this shortfall happened during the fourth quarter of 2008, when nearly 300,000 fewer units were sold.

Another way to look at the market is with our Treemap visualization tool. This tool helps us pick up on trends quickly, even when looking at thousands of books. It works like this:

The size of a square shows the market share and relative size of a category, while the color shows the rate of change in sales. Red is down, and green is up, with the intensity of the color representing the magnitude of the change. The following screenshot of our treemap shows gains and losses by category, comparing the fourth quarter of 2008 with the fourth quarter of 2007.

Treemap Computer Book Market - Quarter 4 2008

So what are all the boxes and colors telling us? First, remember this is comparing the last quarter of 2008 with the last quarter of 2007. This snapshot of the treemap looks like a blood bath -- which again shows that in the fourth quarter of 2008 the computer book market took a significant beating. There were very few bright spots [bright green] during the last quarter of 2008. In the fourth quarter of 2007, Microsoft's Vista was finally getting some success in the market and topics on Web 2.0 were emerging and selling well. Those topics have now cratered; any "silver lining" found in 2008 is thanks to Apple.

For the whole of 2008, Mac OS was the number one growth area for units, followed by Mobile Phone [iPhone], Social Media, and Mac Programming. Apple has enjoyed an amazing ride the past few years and, in my opinion, for good reasons. Another area to watch is the RIA space, with entries from Adobe [Flex/Flash/Air] and Microsoft [Silverlight]. Silverlight has gone from a speck on the Treemap to a fair-sized bright green box. Its appearance and growth is reminiscent of Sharepoint a few years back.

I find it useful to organize the trends into classifications: High Growth Categories [Bright Green], Moderate Growth Categories [Dark Green to Black], Categories to Watch [all colors], and Down Categories [Red to Bright Red]. Most of these descriptions are self-explanatory except perhaps the Categories to Watch. This group contains titles that we've found are not typically susceptible to seasonal swings, as well as areas that are on our editorial radar. If there are categories you want to get on our watch list, please let me know.

The table below highlights and explains some of the data from the chart above, although the data below is for all of 2008. The Share column shows the total market share of that category, and the ROC column shows the Rate of Change. So, for example, you can see that Mac OS books represent 4.07% of the entire computer book market, and were up 28.70%.

[The Share and ROC figures for the categories below did not survive in this copy; the notes are preserved.]

High Growth

- Mac Programming: Small category led by several titles centered on Cocoa and Objective-C for developing iPhone apps.
- (category name missing): Small category that saw 10 new 2008 titles on VMware and VMware Infrastructure.
- Mobile Phone: Good-size category led by six new 2008 titles on the iPhone.
- Computers and Society: Small and volatile category with 44% new titles, 15% less than last year. Titles don't live long in the area.
- Social Web: Decent-size category led by Blogging and Wiki books; WordPress leads with six titles.

Moderate Growth

- Web Site Topics: Good-size category where 36 new titles appeared and 25 dropped out. Led by Web Analytics, Joomla, and Drupal.
- Mac OS: Fairly large category with a monster book in Pogue's Mac OS X Leopard: The Missing Manual, which sold three times as many units as the #2 book.
- (category name missing): Solid category with 43 new titles and only 19 that fell out of the space. O'Reilly has 4 of the top 10 bestsellers. Surprisingly, MS Press had only 1 title in the top ten.
- (category name missing): Solid category dominated by Sharepoint titles. 25 new titles, and 13 fell out of the rankings.

Categories to Watch

- Office Suites: Decent-size category dominated by 74 new Office 2007 [PC] and 2008 [Mac] titles, while 23 Office 2000 and 2003 titles finally fell out of the category.
- Digital Photography: Large category with 5 titles selling more than 20k units; 76 new titles moved into the category and 109 [mostly CS2] titles fell out of the rankings.
- (category name missing): Large category with 9 titles selling more than 10k units; 26 new Excel 2007 titles and 3 fell out.
- Rich Web Interface: Large category dominated by ActionScript (3), Flash (3) and JavaScript (3) in the top ten. But only 3 titles topped 10k units.

Down Categories - Not Hot Topics

- Web Page Creation: Large category with 42 new titles published but 30 that fell out of the rankings. 4 of the top 5 titles sold fewer units in 2008 than in 2007.
- Windows Consumer: Large category with 9 of the top 10 titles selling fewer units in 2008 than in 2007; 30 new titles and 57 titles fell out of the rankings.
- (category name missing): Decent-size category with the middle of the pack failing. Titles selling 2k-3k units dropped to the 600-700 range. 27 new titles, 26 that fell out.
- Microsoft Programming: Decent-size category with the middle of the pack failing. Titles selling 2k-3k units dropped to the 600-700 range. 30 new titles, 32 that fell out.
- iPod + iTunes: Decent-size category with 2 of the top 5 titles accounting for 30k fewer units between them than they sold in 2007. 8 new titles and 8 that fell out of the rankings.

Part two of this series will give a closer look at the technologies within the categories. Part three will be about the publishers, winners and losers. And part four will contain more analysis of programming languages.

by Mike Hendrickson at February 17, 2009 05:57 PM

Off Planet

Starbucks VIA Ready Brew

Just add water. Nearly 20 years in development and touted as "a breakthrough in instant coffee", Starbucks VIA Ready Brew ($20/24 single-serve packets) brings full-bodied, authentic coffee flavor in an...
Visit Uncrate for the full post.

by (author unknown) at February 17, 2009 05:32 PM

Google Mac

What's New for iPhone

By Jason Toff, Google Mac Team

Over the past few weeks, a number of enhancements have been made to Google's offerings for the iPhone.  We wanted to highlight some of those improvements here.

Google Sync Beta
Google Sync allows you to get your Gmail Contacts and Google Calendar events to your phone.  Once you set up Sync on your phone, it will automatically begin synchronizing your address book and calendar in the background, over-the-air, so you can attend to other tasks. Sync uses push technology so any changes or additions to your calendar or contacts are reflected on your device in minutes.

Learn more on the Google Mobile blog or try Sync at

Tasks
Tasks lets you easily create and manage to-do lists in Gmail and on your iPhone.  While on the go, you can view tasks, add tasks, and mark them as completed. These changes are automatically reflected in Gmail. Using your iPhone, you can also add, edit, and delete entire lists.

Learn more on the Gmail blog or get started with tasks at

Google Book Search
Over 1.5 million public domain books in the US (and over half a million outside the US) are now available for perusing on your iPhone.  You can search for a title, author, or subject. Or you can browse the list of "Featured books" and various categories like business, the classics, travel, and more.

Learn more on the Google Book Search blog or start reading at

Edit Docs
Last week, we launched new capabilities to Google Docs for your iPhone that allow you to add new rows, edit existing cells, sort by columns, and filter by terms. Now you don't have to wait until you get to your computer to update a spreadsheet.

Learn more on the Google Docs blog or start editing at

by Jason Toff at February 17, 2009 03:14 PM

Off Planet

Tick Off Your Tasks With Printable Checklists

Written by Simon Mackie.

According to Google, the biggest competitor to Gmail Tasks is good old-fashioned pen and paper. I still use written lists when planning projects and to make notes to myself. Unfortunately, my handwriting is appalling, which renders my lists useless for other people and, sometimes, even myself! Hence, I was really pleased to come across a little web app called Printable Checklist, which offers a dead simple way of making checklists that you can print out (hat tip to Unclutterer, which originally found it on Lifehacker).

Browse to the Printable Checklist site and you’re presented with a title and your first list item, ready to edit:

The editable Printable Checklist


Click on the title to edit it, click on a list item to edit it, add more items using the “Add item” button. That’s all there is to it.

Printable Checklist in action


When you’re finished with your list you can print it out. Ingenious and simple!


by Simon Mackie at February 17, 2009 02:58 PM

Google Blog

Stop bouncing: tips for website success

This is the first post in a series on The Power of Measurement. In this economic climate, these posts are designed to cover ways to make your website as successful as possible. Over the course of the next few weeks, our in-house Analytics guru, Avinash Kaushik, and others will demystify the world of website analytics and offer tips for getting the most out of your metrics. -Ed.

Would you believe me if I said you don't need a Ph.D. to understand your website data? No? Believe it. Free tools like Google Analytics can help simplify website data so that you can better understand what visitors are doing when they arrive on your site.

One of the coolest innovations in understanding your website has been to provide delightful metrics on your web data so that you can make direct changes to your site. In lesson one of our series on The Power of Measurement, we will learn about bounce rate and how understanding it can improve your website.

You may be used to reading about how many “hits” a site or a page has received. But reporting a "hit" meant something back in 1985 when it was essentially a pageview (the number of times your webpage was viewed). Today, you will find that each web page gets many "hits," rendering the metric meaningless. While the number of "hits" a page received used to be the best measure of success, we now have more in-depth and detailed metrics to analyze the performance of our web pages.

Bounce rate is insightful because from the perspective of a website visitor, it measures this phenomenon: "I came; I puked; I left." (OK, technically it also means the number of sessions with just one pageview.) While metrics like visitors show the number of people who came to your site, bounce rate will tell you how many of those people were unimpressed and left your site without taking any action (not even dignifying the site with a single click!).

Bounce rate has these attributes:
1) It is really hard to misunderstand. It measures the number of people who landed on your site and refused to give you even one single click!
2) It is available in most web analytics tools, including our own Google Analytics.
3) It is quick and easy to use. Bounce rate will help you understand where and how to make changes on your website in under an hour.
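
The "technically it also means the number of sessions with just one pageview" definition translates directly into code. Here's a minimal sketch (my own, not Google Analytics internals), assuming each session is represented as a list of the pageviews it contained:

```python
def bounce_rate(sessions):
    # A bounce is a session with exactly one pageview: the visitor
    # landed and left without a single further click.
    if not sessions:
        return 0.0
    bounces = sum(1 for pageviews in sessions if len(pageviews) == 1)
    return bounces / len(sessions)

# Three of these four sessions saw only one page.
sessions = [["/home"], ["/home", "/pricing", "/signup"], ["/blog"], ["/home"]]
rate = bounce_rate(sessions)  # 0.75
```

The same ratio computed per traffic source or per landing page gives you the two reports discussed below.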

Now, let's make this real. If you have a Google Analytics account, you'll see this when you log in:

This means that about 77 percent of website visitors came to the site, "puked," and left. Ouch. Based on that, you may need to light a fire somewhere, as things need fixing. Here are two simple and specific ideas:

Tip #1: Find out where your visitors are coming from and which of these sites sends visitors with the highest bounce rate. To do so, all you have to do is go to "Traffic Sources" (in Google Analytics, or whatever tool you are using), click on "Referring Sites," and boom!

In about fifteen seconds you know which sites are your “best friends forever” (BFFs), and where you need to look a tad deeper. By identifying the sites that are sending you visitors with high bounce rates, you can investigate the reasons why (the campaigns, the context in which your link is placed, the ads) and make changes to ensure that visitors find what they are looking for when they come to your site.

However, it may not just be the campaigns that turned your readers away; it could be the specific page that your visitors landed on. That leads to my Tip #2: Go to “Content” (labeled as such in Google Analytics) and click on "Top Landing Pages" report:

You can see different pages of your website on the left and the corresponding bounce rates on the right. Remember, you don't decide the homepage of your website. When people search, the engine finds the most relevant page on your site and that's the homepage. If you have 50,000 pages on your website, you have 50,000 homepages. The report above is showing the top ten pages of your website and which ones might be letting you down by not engaging your visitors enough to get even one click!

In under an hour you can discover which sources are your BFFs and which pages on your site need some sprucing up. This will ensure lower bounce rates, higher engagement with your site, and perhaps even higher revenue. To learn about other ways in which you can use bounce rate effectively, check out this article on my web analytics blog, Occam's Razor.

Good luck!

by A Googler at February 17, 2009 02:09 PM

OReilly Radar

ETech Preview: Creating Biological Legos

If you've gotten tired of hacking firewalls or cloud computing, maybe it's time to try your hand at DNA. That's what Reshma Shetty is doing with her doctorate in Biological Engineering from MIT. Apart from her crowning achievement of getting bacteria to smell like mint and bananas, she's also active in the developing field of synthetic biology and has recently helped found a company called Ginkgo BioWorks, which is developing enabling technologies to allow for rapid prototyping of biological systems. She will be giving a talk entitled Real Hackers Program DNA at O'Reilly's Emerging Technology Conference, March 9-12, in San Jose, California. And she's joining us here today. Thank you for taking the time.

RESHMA SHETTY: No problem. Happy to be here.

JAMES TURNER: So first of all, how do you make bacteria smell nice, and why? I get an image of a commercial, "Mary may have necrotizing fasciitis, but at least her hospital room smells minty fresh."

RS: Well, the original inspiration for the project was the fact that for anybody who works in a lab, who works with E. coli, when you grow cultures of the stuff, it just smells really bad. It smells really stinky, basically. And so our thought was, "Hey, why don't we reengineer the smell of E. coli? It'll make the lab smell minty fresh, and it's also a fun project that gets people, who maybe aren't normally excited about biology, interested in it because it's a very tangible thing. I can smell the change I made to this bacteria."

JT: So what was the actual process involved?

RS: So the process was, you basically take a gene, we took a gene from the petunia plant, which normally provides an odor to the flower, and you place that gene into the E. coli cell. And by supplying the cell with an appropriate precursor, you make this minty smell as a result. So it's fairly straightforward.

JT: Your degree, biological engineering, is a new one to me. How is it different from biochemistry or microbiology or genomics or any of the other traditional biotech degrees?

RS: Well, biology and biochemistry, and so on, are concerned with studying the natural world. So I'm going to go out and figure out how the natural world works. Biological engineering, instead, is really all about saying, "Hey, we have this natural world around us. Biology is, in some sense, a new technology through which we can build new engineered biological systems." Right? So the idea is, what's the difference between physics and electrical engineering? Electrical engineers want to go build. So in biological engineering, we're interested in going and building stuff, too. But using biology, rather than physics, as the underlying science of it.

JT: Explain a little bit about the field of synthetic biology.

RS: So synthetic biology is a new field that's developed over the past few years among a group of engineers and scientists all over the world who are saying, “Huh, you know, right now it's really actually quite hard to engineer biological systems. Even just to put pieces of DNA together can be a fairly laborious and manual process that's pretty error-prone. So how do we make that process easier? How do we make it so that an undergraduate or a team of undergraduates can go engineer E. coli to smell like wintergreen and banana in just a summer?” Typically, people usually assume that those types of projects are just too hard to do, because the tools we have essentially suck. So synthetic biology is focused on the effort of making biological engineering easier.

JT: What areas do you see synthetic biology having the largest short-term impact on?

RS: Well, I think you're already seeing some of the impacts in, for example, the biofuel space. So there are a lot of folks interested in saying, "Hey, instead of pulling oil out of the ground, why don't we just make it from a vat of engineered microbes?" And the project that's the most intriguing to folks right now probably is a company called Amyris Biotechnologies, where they have a pathway for making an antimalarial drug. This is a drug that you can naturally find and extract from the wormwood plant, but these plants are pretty rare. And it's really expensive to manufacture this drug from the plant. And so, in order to develop more cures, or essentially develop more of this drug and get the cost down cheap enough so that it's actually an accessible cure for malaria for use in third world developing countries, they engineered microbes to produce the antimalarial drug. And so this is the poster child application of synthetic biology; by making stuff cheaper, you can essentially better people's lives.

JT: A concern that some people raise about the ease of which one can order designer biology these days is that it's becoming more likely, either by accident or design, for something particularly nasty to enter the environment. What's your take on that?

RS: Well, for us, what we're really interested in doing at Gingko is making biological engineering easier. And obviously, one of the aspects of what that means is you're essentially democratizing access to the technology. You're making it so that more and more people can come in and engineer biological systems. Now just like with any technology, by making it easier and making it more accessible, you're both promoting huge advantages, and there are going to be areas for concern.

How do we know that the next time around, when we have an outbreak of Avian flu or whatnot, the traditional "academic" labs and research institutes around the world are going to be prepared to respond? Maybe we can develop a wider network of people who can work towards engineering biological systems for good. You're creating a larger community of people that you can tap into to come up with useful things for society. So from our perspective, yes, we are making biology easier and we're democratizing access to it, but we're also working to make that community of folks who are doing this work as constructive as possible, and trying to create a culture essentially where people are trying to use these technologies for good rather than for harm.

JT: I guess my concern is that if you look at the history of computers and software engineering, the easier it gets to design things, and especially when you look at things like computer viruses, it's gotten to the point now where essentially, there are the equivalent of these “send us a sequence and we'll give you DNA [companies].” There's “send us what you want your computer virus to do and we'll send you back a computer virus.” I'm just a little concerned that the track record of humanity, when given easy access to new technologies, has not been great.

RS: Well, what's the alternative to what you're suggesting? Should we all get rid of our computers so that we don't have the potential for computer viruses? You have to understand that, yes, there were some costs that came about with the computer revolution. But there were also huge benefits. You're giving people access to information in a way that they never had before. So, in some ways, you can think about it that computers save people's lives. If I have a rare disease and my doctor doesn't happen to know how to diagnose it, I can go Google online and look for my symptoms, and potentially find the right doctor to go to, to help cure myself, right?

So, the problem with every technology is that you have to take the bad with the good. So what we can do, basically, is try to bias the technology, and the folks working around it, towards good as much as possible. And that's what I and others are actively working to do. So with your question -- you're ignoring all the good that has come out of things like making software programming easier and more open.

JT: The point's well taken. One last question on the subject and then we'll move on. My wife has told me -- she took organic chemistry in college, and was told that basically once you have a degree like that, expect that the government's going to keep an eye on you later on in life, if you're ordering things, for example. Has there been any thought or talk about, for example, Homeland Security keeping an eye on what's going on in this field?

RS: Well, I would say that the relationships have been actually much more positive than that. I think the idea has been for researchers in the field, and for folks from government, and folks from industry, to get together and figure out, "Hey, there's a lot of good that can come out of this. But there is also some potential for accidents and harm. How do we work together to create an environment where the most constructive things happen?" So I would say that there has certainly been discussions with folks from government. But it's not so much been a “how do we tamp down on this or how do we regulate this”, but “how do we work together to minimize the risk of something bad happening.”

JT: So changing topics, are kids who are entering secondary schools today prepared for a career in biotech? And what would you like to see change in the way biology is taught?

RS: Well, there's a lot that can be said about the US education system, especially when it comes to science. But I would say that the coolest thing about synthetic biology is that it's a very creative process, right? People get to go in and think about, “Hmm, if I wanted to design a biological system, what could I go build? Maybe I want to engineer E. coli that can take a bacterial photograph on a plate. Maybe I want to engineer E. coli to smell like wintergreen and bananas. Maybe I want to engineer a system that can detect arsenic contamination in well water so that folks in Bangladesh can test whether their wells are contaminated.”

There's a huge potential for creativity. And so one of the things that I love about synthetic biology and biological engineering is that there's a huge capacity to inspire young people to be creative and to get into science. And I think we're seeing a lot of that with young folks who are interested in synthetic biology and trying to figure out “how do I get into this?”

JT: Do you think that the teachers at that level are up to the challenge of assisting with this stuff? Or are the kids going to have to be Heinlein-esque, and go off on their own to do it?

RS: As with anything, I think there's going to be a spread. There are teachers who are actively looking at how to integrate these types of educational materials into their curriculums. They'd love to be able to integrate these types of ideas. The way that the community is trying to foster that is basically by making a lot of the materials and the research and the work that goes on as open as possible.

So, for example, I was a founder of It's a wiki, basically, where biological engineers and scientists can post information about their work. Folks in the synthetic biology community have really taken to that, and basically posted their ideas and their work and their protocols, and by making this information available, you make it so that teachers and educators from all around the world can basically reuse that material in their own teaching. I think, for enterprising teachers who want to make use of or who want to incorporate synthetic biology into the curriculums, there are avenues to that. We still need better materials, don't get me wrong. But I think we're trying to do all we can to make it easier for educators to teach about the field.

JT: Your company, Gingko BioWorks, and I'm quoting from your website here, is focused on improving biology as a substrate for engineering. When you take the market-speak away, what does that really mean in terms of products and services? And who do you see your major client base as?

RS: So what Gingko's trying to do is make biology easier to engineer. All of the founders of Gingko are actually engineers from other fields. So I was a computer scientist. We have a chemical engineer, a mechanical engineer, an electrical engineer and another computer scientist among our founders. So the way we think about biology and engineering biology is, we think about it in terms of the design cycle. I want to be able to design a biological system. Then I want to be able to build it. And then I want to be able to test and see whether it worked. And I want to go around that loop as fast as possible.

So what Gingko's trying to do is initially focus on the construction step. To say, "Hmm, if I want to build a biological system, I need a set of parts. Essentially, I need my Legos which I can mix and match in order to build my engineered biological system. So I need my part set and I need a way to assemble those parts as quickly as possible into different biological systems so I can see which one works." We think of it as essentially a platform for rapid prototyping of biological systems. So that's what Gingko is doing right now: developing the parts set and developing the technology for rapidly assembling parts into systems.

JT: So if I, or your typical Make magazine reader, said, "Gee, I'd like to go try this stuff out," what kind of a setup do you need these days? Is it something that somebody with a few hundred dollars and the inspiration and a basic background could go set up? Or are you still talking about a lab full of glassware to do this?

RS: Well, it depends on exactly what the person wants to do. I think some basic experiments can be done pretty cheaply by an enterprising person using eBay and whatnot. But the thing I should point out is that in terms of do-it-yourself biology, or amateur biological engineering, there are regulations in certain places in this country on doing genetic engineering, such as taking DNA from one organism and putting it into another. So you would need, essentially, a lab facility to do some of the work, according to federal regulations. The situation's not entirely clear, but I would say as a word of caution, there are some regulations in place that you should think about following if you're interested in this type of thing as an amateur.

JT: So I guess what we need is the equivalent of a place where you can go when you're repairing your car, that has the lift and everything.

RS: Exactly. Yeah. So there are lots of folks who are interested in developing essentially the equivalent of hacker spaces or community labs, where people can come together and think about how to have the right tools and equipment for engineering biological systems. So there's a group in Cambridge here that's already working on that problem.

JT: So you can go and say, "Charlie, could I borrow a cup of restriction enzymes?"

RS: Exactly.

JT: So can you give us an idea of what we can expect to hear at your ETech talk?

RS: Well, at ETech what we're really looking forward to doing is chatting with folks about the technology and possibilities, as well as giving people an idea of what's possible. So we're going to do a little demo with some folks, where they get to probably engineer some bacteria to turn red, is what we're going to try to do. So give people some idea of what's involved in biological engineering.

JT: It sounds like it's going to be a lot of fun. So it's going to be a very hands-on type of thing it sounds like?

RS: Yeah, yeah. Just listening to people talk can be a bit boring, so we want to give people a chance to play a little bit.

JT: All right. Well, I've been talking to Reshma Shetty, who is one of the founders of Gingko BioWorks. She'll be speaking at O'Reilly's Emerging Technologies Conference in March on "Real Hackers Program DNA." Thank you so much for talking to us.

RS: Thank you. It was a pleasure.

by James Turner at February 17, 2009 01:20 PM


Disruptive Backup Platform

In many data center environments where commercial backup software like Legato Networker or Veritas NetBackup is deployed, it is mostly used to back up and restore files. For a minority of backups, specialized software is used for better integration, like RMAN for Oracle databases. In recent years nearline storage has become an interesting alternative to tape - one of the outcomes is that all commercial backup software supports it one way or another.

But the real question is - do you actually need the commercial software, or would it be more flexible and more cost effective to build your backup solution on open source software and commodity hardware? In some cases it's obvious to do so - for example a NAS server/cluster. It makes a lot of sense to set up another server, with exactly the same hardware and storage, and replicate all your data to it. If you lose the data on your main server, you can restore it from the spare one, or make the spare the live server, which makes your "restore" almost instantaneous compared to a full copy of the data. Later on, once you have fixed your main server, it can become your backup one. Not only does this restore service much more quickly when you lose data, it will almost certainly be cheaper than a legacy solution based on commercial software, tape libraries, etc.

The above example is a rather special case - what about a more general approach? What about OS backups and data backups which do not require special co-operation with an application to perform a backup? In many environments that covers well over 90% of all backups. I would argue that a combination of some really clever open source software can provide a valuable and even better backup solution for that 90% of servers than the legacy approach. In fact some companies are already doing exactly that - one of them is Joyent (see slide 37), and there are others.

Recently I was asked to architect and come up with such a solution for my current employer. It was something we had had in mind for some time but never got around to. Until recently...

The main reason was cost savings - we figured out that we should be able to save a lot of money if we built a backup platform for the 90% of cases rather than extending the current legacy platform.

Before I started to implement a prototype I needed to understand the requirements and constraints; here are the most important ones, in no particular order:
  • it has to be significantly more cost effective than legacy platform
  • it has to work on many different Unix platforms and ideally Windows
  • it has to provide a remote copy in a remote location
  • some data needs to end-up on tape anyway
  • only basic file backup and restore functionality - no bare metal restore
  • it has to be able to scale to thousands of clients
  • due to other projects, priorities and business requirements I need to implement a working prototype very quickly (a couple of weeks) and I can't commit my resources 100% to it
  • it has to be easy to use for sysadmins
After thinking about it I came up with some design goals:
  • use only free and open source software which is well known
  • provide a backup tool which will hide all the complexities
  • the solution has to be hardware agnostic and has to be able to reliably utilize commodity hardware
  • the solution has to scale horizontally
  • some concepts should be implemented as close to commercial software as possible to avoid "being different" for no particular reason
Based on the above requirements and design goals I came up with the following implementation decisions. Notice that I omitted a lot of implementation and design details here - it's just an overview.

The Backup Algorithm

The idea is to automatically create a dedicated filesystem for each client and, each time data has been copied (backed up) to that filesystem, create a snapshot of it. Each snapshot in effect represents a specific backup. Notice that the next time you run a backup for the same client it will be an incremental copy - in fact, once you have done the first full backup, all future backups will always be incremental. This should put less load on your network and your servers than most commercial backup software, where for all practical purposes you need to do a full backup on a regular basis (TSM being an exception here, though it has its own set of problems as a result).
To expire old backups, snapshots are removed when they are older than the global retention policy or a client-specific one.

Backup:
  1. create a dedicated filesystem (if not present) for a client
  2. rsync data from the client
  3. create a snapshot of the client filesystem after rsync finishes (successfully or not)
Retention policy:
  1. check for global retention policy
  2. check for a local (client specific) retention policy
  3. delete all snapshots older than the local or global retention policy
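The backup and retention passes above can be sketched in bash (the language the tool itself was written in). This is not the actual tool: the pool name `backup`, the snapshot naming scheme and the echo-only dry run are my assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch of one backup run plus retention, following the steps above.
# Pool name, snapshot naming and the echo-only dry run are assumptions.
set -u

POOL=backup
RETENTION_DAYS=30   # global retention policy; a client file may override it

plan_backup() {     # print the commands one backup run would issue
    local client=$1 stamp=$2
    # 1. create a dedicated filesystem for the client (-p makes this a
    #    no-op if it already exists)
    echo "zfs create -p -o compression=on $POOL/$client"
    # 2. rsync data from the client (incremental after the first run)
    echo "rsync -aH --delete root@$client:/ /$POOL/$client/"
    # 3. snapshot the filesystem whether rsync succeeded or not
    echo "zfs snapshot $POOL/$client@$stamp"
}

plan_expire() {     # retention: destroy snapshots older than the policy
    local client=$1 days=$2
    echo "destroy snapshots of $POOL/$client older than $days days"
}

plan_backup web01 20090220-0300
plan_expire web01 "$RETENTION_DAYS"
```

A real run would execute the zfs and rsync commands instead of printing them, and the expiry step would walk `zfs list -t snapshot` output comparing creation dates against the cutoff.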


Because we are building an in-house solution it has to be easy for sysadmins to maintain. That means I should use well-known, proven technologies that sysadmins know how to use and are familiar with.

I chose rsync for file synchronization. It has been available for most Unix (and Windows) platforms for years and most sysadmins are familiar with it. It is also being actively developed. The important thing here is that rsync is used only for file transfer, so if another tool proves more convenient in the future it will be very easy to switch to it.

When it comes to the OS and filesystem the choice is almost obvious: OpenSolaris + ZFS. There are many very good reasons why, and I will try to explain some of them. When you look at the above requirements again you will see that we need an easy way to quickly create lots of filesystems on demand from a common storage pool, so you don't have to worry about pre-provisioning storage for each client - all you care about is whether you have enough storage for the current set of clients and retention policy. Then you need a snapshotting feature which has minimal impact on performance, scales to thousands of snapshots and, again, doesn't require pre-provisioning any dedicated space for snapshots, since you don't know in advance how much disk space you will need - it has to be very flexible and shouldn't impose any unnecessary restrictions. Ideally the filesystem should also support transparent compression so you can save some disk space; transparent deduplication would be perfect too. Additionally the filesystem should be scalable in terms of block sizes and number of inodes - you don't want to tweak each filesystem for each client separately; it should just work, dynamically adjusting to your data. Another important feature is some kind of data and metadata checksumming. This is important because the idea is to utilize commodity hardware, which is generally less reliable. The only filesystem which provides all of the above features (except for dedup) is ZFS.


Ideally the hardware of choice should have some desired features: as low a $/GB ratio as possible (for the whole solution: server + storage), as compact as possible (TB per rack unit as high as possible), able to sustain at least several hundred MB/s of write throughput, and providing at least a couple of GbE links. We settled on Sun x4500 servers, which deliver on all of these requirements. If you haven't checked them out yet, here is the basic spec: up to 48x 1TB SATA disk drives, 2x quad-core CPUs, up to 64GB of RAM, 4x on-board GbE - and all of this in 4U.

One can configure the disks in many different ways - I chose a configuration with 2x disks for the operating system, 2x global hot spares, and 44x disks arranged in 4x RAID-6 (RAID-Z2) groups making one large pool. That configuration provides really good reliability and performance while maximizing available disk space.
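To make the layout concrete, the pool creation can be sketched as below. The device names are illustrative only - a real x4500 has its own controller/target numbering - and the script just builds and prints the command.

```shell
#!/usr/bin/env bash
# Sketch of the pool layout described above: 44 data disks in 4x RAID-Z2
# groups plus 2 global hot spares (the 2 OS disks sit outside the pool).
# Device names are invented; a real x4500 has its own c#t#d# layout.
set -u

disks=()
for i in $(seq 0 45); do disks+=("c0t${i}d0"); done   # 44 data + 2 spares

cmd="zpool create backup"
for g in 0 1 2 3; do
    cmd+=" raidz2 ${disks[*]:$((g * 11)):11}"         # 4 groups of 11 disks
done
cmd+=" spare ${disks[*]:44:2}"                        # 2 global hot spares

echo "$cmd"
```

Each RAID-Z2 group survives two disk failures, so with four groups plus hot spares the pool tolerates quite a few failures before data is at risk, while still exposing 36 disks' worth of usable capacity.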

The Architecture

A server with lots of storage should be deployed with a relatively large network pipe, either by aggregating a couple of GbE links or by using a 10GbE card. A second, identical server is deployed in a remote location, with asynchronous replication set up between the two. The moment there is a need for more storage, you deploy another pair of servers in the same manner as the first. The new pair is entirely independent of the previous one and can be built on different hardware (whatever is best at the time of purchase); additional clients are then added to the new servers. This provides horizontal scaling for all components (network, storage, CPU, ...) and the most cost-effective solution moving forward.

Additionally, a legacy backup client can be installed on one of the servers in each pair to provide tape backups for selected (or all) data. The advantage is that only one client license is required per pair instead of one for each of hundreds of clients. This alone can lower licensing and support costs considerably.

The backup tool

The most important part of the backup solution will be the tool to manage backups. Sysadmins shouldn't play directly with underlying technologies like ZFS, rsync or other utilities - they should be given a tool which hides all the complexities and allows them to perform 99% of backup-related operations. This not only makes the platform easier to use, it minimizes the risk of people making mistakes. Sure, there will be some bugs in the tool at the beginning, but every time a bug is fixed it is fixed for the benefit of all backups, and the issue shouldn't happen again. Here are some design goals and constraints for the tool:
  • has to be written in a language most sysadmins are familiar with
  • scripting language is preferred
  • has to be easy to use
  • has to hide most implementation details
  • has to protect from common failures
  • all common backup operations should be implemented (backup, restore, retention policy, archiving, deleting backups, listing backups, etc.)
  • it is not about creating a commercial product(*)
(*) the tool is only for internal use, so it doesn't have to be modular and it doesn't need to implement every feature as long as the most common operations are covered - one of the most important goals here is to keep it simple so it is easy for sysadmins to understand and maintain

Given the above requirements it has been written in bash, with only a couple of external commands used, like date and grep - all of them common tools that sysadmins are familiar with, and all of them standard tools delivered with OpenSolaris. The entire tool is a single file with one required config file and another that is optional.
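As a rough illustration, the required config file might carry little more than a handful of shell variables sourced by the tool. The variable names below are invented for the example - the real file's contents aren't published.

```shell
# backup.conf - hypothetical example of the tool's required config file;
# these variable names are invented for illustration, not the real ones.
POOL=backup                       # ZFS pool holding all client filesystems
GLOBAL_RETENTION_DAYS=30          # default snapshot retention policy
RSYNC_OPTS="-aH --delete --numeric-ids"
LOG_DIR=/var/log/backup
```

Keeping the config as plain sourceable shell fits the "single file, easy for sysadmins" goal: no parser to maintain, and overrides per client can live in an equally simple optional file.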

All basic operations have already been implemented except for archiving (work in progress) and restore. I left the restore functionality as one of the last features to implement because it is relatively easy to do, and for the time being the tricky part is to prove that the backup concept actually works and scales to lots of different clients. In the meantime, if a file or set of files needs to be restored, all team members know how to do it - after all, it is just a matter of copying file(s) from a read-only filesystem (snapshot) on one server to another, be it with rsync, NFS, tar+scp, ... At some point (sooner rather than later) the basic restore functionality will need to be implemented, to minimize the possibility of a sysadmin making a mistake while restoring files.
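Because every snapshot is just a read-only filesystem, a manual restore is a plain copy out of the snapshot directory. A sketch, with hypothetical pool, client and snapshot names, printing the command rather than running it:

```shell
#!/usr/bin/env bash
# Sketch of a manual restore: each snapshot is visible as a read-only
# directory under .zfs/snapshot, so restoring is just a copy back to the
# client. Names are hypothetical; echo stands in for execution.
set -u

restore_cmd() {
    local client=$1 snap=$2 path=$3
    echo "rsync -aH /backup/$client/.zfs/snapshot/$snap$path root@$client:$path"
}

restore_cmd web01 20090220-0300 /etc/passwd
```

Note that the `.zfs` directory may be hidden from listings by default (`zfs set snapdir=visible` exposes it), but it can always be entered by name.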

Another feature to be tested soon is replication of all or selected clients to the remote server. There are several ways to implement it, of which rsync and zfs send|recv seem the best choices. I prefer the zfs send|recv approach as it should be much faster than rsync in this environment.
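An incremental zfs send|recv replication run might look roughly like the sketch below. The hostnames and snapshot names are made up, and echo stands in for actually executing the pipeline:

```shell
#!/usr/bin/env bash
# Sketch of incremental replication to the remote pair member using
# zfs send|recv over ssh. Only the delta between the last replicated
# snapshot and the newest one crosses the wire. Names are hypothetical.
set -u

replicate() {
    local fs=$1 prev=$2 curr=$3 remote=$4
    echo "zfs send -i $fs@$prev $fs@$curr | ssh $remote zfs recv -F $fs"
}

replicate backup/web01 20090219-0300 20090220-0300 backup2.example.com
```

This is why zfs send|recv should beat rsync here: the sending side already knows exactly which blocks changed between the two snapshots, so no per-file scan of millions of files is needed on either end.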

The End Result

Given the time and resource constraints it is more of a proof of concept or prototype which has already become a production tool, but the point is that after over a month in production it seems to work pretty well so far, with well over 100 clients in regular daily backup and more clients being added every day. We are keeping a close eye on it and will add more features if needed in the future. It is still a work in progress (and probably always will be) but it is good enough for us as a replacement for most backups. I will post some examples of how to use the tool in another blog entry soon.

It is an in-house solution which will have to be supported internally - but we believe it is worth it and that it saves a lot of money. It took only a couple of weeks to come up with a working solution, and hopefully there won't be much to fix or implement after some time, so we are not expecting the cost of maintaining the tool to be high. It is also easier and cheaper to train people to use it compared to commercial software. That doesn't mean such an approach is best for every environment - it is not. But it is a compelling alternative for many environments. Only time will tell how it works in the long term - so far so good.

Open Source and commodity hardware being disruptive again.

by milek ( at February 17, 2009 12:43 PM

The Daily ACK

New for the iPhone, AWS Calculate

Following on from my previous iPhone application, which allows you to monitor the status of various cloud computing backend services, and continuing with the cloud-computing theme, I'd like to announce the release of my next iPhone application onto the App Store...

AWS Calc for the iPhone 3G and iPod touch.

Cloud Computing makes it easy to build applications that run reliably, even under heavy loads. But the larger the load, the higher your cost.

The AWS Calc application allows you to estimate your monthly costs based on your current Amazon Web Services (AWS) usage levels, and then lets you estimate how much a sudden usage spike could cost.

by Al. ( at February 17, 2009 12:40 PM

Year in the Life of a BSD Guru

Security Assessment of TCP

The Centre for the Protection of National Infrastructure in the UK has published a document presenting the results of a security assessment of the IETF specifications of the Transmission Control Protocol (TCP).

February 17, 2009 11:48 AM


Invisible watermark


Watermarking images can sometimes be a decent way to allow posted content to make the rounds online, while ensuring that the source of the content is correctly attributed. One thing that's always bugged me about watermarking, however, is that the original source site becomes all peppered with ratty attributions or logos that detract from the quality of the posted image. In this scenario, the watermarked image makes the content owner's site look shoddy, and this is a bad thing.

AJ sent in this nifty idea for making invisible watermarked images, where the watermark only appears when the image is downloaded or copied:

With about 10 lines of HTML and CSS, you can have an image on your site, watermark free. Then, when it's pulled off, a watermark suddenly appears like magic! A precisely positioned DIV with an image background that cancels out just the watermark is placed over the watermarked image, and when they overlap, you (mostly) see the un-watermarked original image. Tada! It's not cross-browser tested, so stay aware.

I tested this with a slightly easier alternative. Just cut out a whole rectangle of the source image where the watermark will go, then place a relatively positioned div with that background over the watermarked output. It's completely imperceptible until the image is copied.

Try dragging the image above to your desktop and you'll see what I mean.

The only downside is that it's a bit tedious, but there's no reason this feature couldn't be automated in the image upload facilities of most blogging software. Also, this isn't going to keep someone from screen-grabbing the whole thing, or carefully reassembling the two images manually. It does, however, make it easy for honest people to attribute the source, all while improving the look of the creator's site. Anyone going to the trouble of subverting this would probably be cropping off your watermarks anyway.

AJ's Watermark Overlay Trick

by Jason Striegel at February 17, 2009 11:00 AM

OReilly Radar

Four short links: 17 Feb 2009

Four Tuesday quickies:

  1. The Technology Behind Coraline -- 3D stop-motion movie used a 3D printer to make the dolls and things like drops of water.
  2. Some OSCON Proposal Tips (Alex Russell) -- good advice for anyone submitting a talk to a technical conference.
  3. Oscar Predictions You Can Bet On -- Nate Silver of FiveThirtyEight turns his attention to the Oscars.
  4. Web Hooks and the Programmable Web of Tomorrow -- an epic presentation of different ways to offer and use callbacks, URLs on your site that a remote service can hit when something happens on their service. (via Stinky)

by Nat Torkington at February 17, 2009 10:59 AM


tsatest and incrementals

Today I learned how to tell TSATEST to do an incremental backup. I also learned that the /path parameter requires the DOS namespace name. Example:

tsatest /V=SHARE: /path=FACILI~1 /U=.username.for.backup /c=2

That'll do an incremental (files with the Archive bit set) backup of that specific directory, on that specific volume.

by riedesg ( at February 17, 2009 10:45 AM

Year in the Life of a BSD Guru

Desktop NetBSD

Andrew Doran and Jared D. McNeill have started a Desktop NetBSD project to "make it possible to install a useful desktop system in under 15 minutes, responding to only a few prompts in the process".

February 17, 2009 09:30 AM


SELinux and Smack modules for Linux containers (17 Feb 2009)

A common response when someone first hears about containers is "How do I create a secure container?" This article answers that question by showing you how to use Linux Security Modules (LSM) to improve the security of containers. In particular, it shows you how to specify a security goal and meet it with both the Smack and SELinux security modules.

February 17, 2009 07:44 AM

The Hive Archive

Links for 2009-02-16 []

  • Restricted Stock Units (RSU) Sales and Tax Reporting - The Finance Buff
  • Pydev
    Want a better Pydev? Why not give a small donation? (PayPal) What is Pydev? Pydev is a plugin that enables users to use Eclipse for Python and Jython development, making Eclipse a first-class Python IDE. It comes with many goodies such as code completion, syntax highlighting, syntax analysis, refactoring, debugging and many others.

February 17, 2009 06:00 AM

OReilly Radar

Google's PowerMeter. It's Cool, but don't Bogart My Meter Data

Last week I read this piece in the New York Times about Google's PowerMeter, their entry into the smart meter game. The story was picked up in quite a few places, but neither the NYT piece nor the related articles from other outlets expanded much on Google's underlying press release. Google's FAQ isn't very satisfying either; it has no depth, so I didn't really know what to make of it. When I finished reading it I was left with an inchoate, unsettled feeling, and then I forgot about it. But on Friday evening I had a random conversation about it with a colleague who works in the meter data management (MDM) space. By the time we were through talking about what Google might be doing I had arrived at a position of love / hate. I'll explain the love first.

In terms of the attention this brings to energy consumption at the household level, I really love what Google is doing with this initiative. As they put it:

"But smart meters need to be coupled with a strategy to provide customers with easy access to near real-time data on their energy usage. We're working on a prototype product that would give people this information in an iGoogle gadget."

I agree completely. It's not exactly the same thing, but I've been amazed by how much my behavior behind the wheel changed once I started leaving average mpg permanently visible on my car's dashboard display. In short order I went from speed racer wannabe to one of those guys that gets harassed by co-workers for driving too slow. "Hey, can you hypermile on the way back from lunch? I'm starving."

While I am not sure that a gadget on the web will have the same right-there-in-front-of-my-eyes impact that my car's LCD display has, I'm convinced that Google has hit on something important. After all, today most of us have no idea how many kilowatts we use, what we use them for, or how much we're paying per kilowatt. We use power in our homes the way I used to drive my car.

Unfortunately, Google's FAQ doesn't really answer any questions about how the service works. But from statements like "Google is counting on others to build devices to feed data into PowerMeter technology" we can deduce that Google is proposing to correlate the total power reported by your smart meter with the data collected from individual loads inside the home. This is really cool: not only does it put the information in front of you (in an easily accessible gadget), it proposes to tell you what it is in your house that is using that power, and when.

Google can do this because many national and state governments have begun to mandate smart meter programs. Most of us will probably have one on the side of our house pretty soon (especially if the stimulus bill speeds things up). Smart meters improve on their predecessors by automating meter reading, reporting consumption in intervals (typically 15 minutes), and sending "last gasp" failure notifications in the event of power outages.

But, just like their dumb ancestors, they will be owned by the utility. This means that the data generated will ultimately be under the control of the utility and hosted in their systems. The meter will talk to a utility data collector, and from there its data will enter the utility's MDM system. The MDM will do a bunch of stuff with the data. However, from your point of view as the consumer, it will primarily send it to the billing system, which will now be able to account for time-of-day pricing. Also, it will send those last gasp signals to the outage management system so that outage reporting will be automatic. This will make analysis and response faster and more accurate. Google appears to be leveraging their position and market power to make deals with the utilities to access that data on our behalf.

The biggest reason for smart meter initiatives is demand management. The utilities have to carry expensive excess capacity so that they can meet peak loads. If they can use interval metering coupled with better pricing and feedback systems, they may be able to change our usage patterns and smooth that load which will reduce the necessary peak capacity overhang. Also, as alternative energy sources with less predictable availability like wind power come on line the utilities will need more "load shaping" options. Ultimately they might be able to reach directly out to your smart appliances and turn them off remotely if they need to.

The laws that are mandating smart metering are focused on this demand side management. Practically speaking, most utilities will close the consumer feedback loop by offering a simple portal on the utility's web site that will let you monitor your usage in the context of your bill. However, this isn't the part of the system the utilities are excited about. The hardware and the meters are the sexy part. The contracts to build the consumer portals are probably going to go to low-cost bidders who will build them to exactly the minimum requirements. In some cases they may include provisions for customers to download historical data into a spreadsheet if they want to. A few enterprising customers will probably take advantage of this feature, but this is the hard way to do the kinds of correlations Google has in mind.

What should be apparent by now is that the government is mandating a good idea, but mandating it from a utility-centric rather than customer-centric point of view. There is naturally some overlap between utility and customer interests, but they are not identical. The utility is concerned about managing capital costs. They look at the interval data and the customer portal as a way to influence your time-of-use behaviors. They really don't care how much power you use; they just don't want your demand to be lumpy. On the other hand, we just want our bills to be low.

So, Google's initiative offers to take your data from the utility, combine it with data coming from devices in your home, and visualize it much more you-centrically. Their offering will do a better job than the utility's portal at illuminating structural efficiency problems in the home, as well as usage pattern problems once utilities start implementing variable pricing. In short, while the utility is attempting to influence your "when I use it" decision making, Google is offering to help you make better "what I plug in" decisions along with the stuff the utility cares about.

So, what's not to like?

Google needs two distinct sources of data to make this initiative work. They need access to your data via the utility that owns your smart meter. Plus they need data from equipment manufacturers that are going to make your appliances smart or provide your home automation gadgets. It doesn't bother me at all that they get this data, as long as the utility makes it available for anyone else that might be able to innovate with it too, including me. You never know, I might want to use it for a home made gadget that sets up an electric shock on my thermostat any time my last eight averaged readings are above some arbitrary threshold, you know, just to make me think twice before turning it up.

The little bit of info that Google provides on this initiative is at their .org domain, but there is virtually no information about how to participate in data standards making, API specification, device development, or that kind of thing. If you want to participate, you pick whether you are a utility, device manufacturer, or government, fill out a form, and wait for Google to get back to you. Imagine, the government fills out a form to participate in Google's initiative. Google has out-governmented the government.

As I described already, governments are insisting on demand side management, but there don't appear to be any requirements to provide generic API's for meter readings or meter events. It's enterprise thinking rather than web platform thinking, and we run the risk of your data being treated like utility "content." "In other news today HBO struck an exclusive deal with XYZ Electric for all of their meter content, meanwhile Cinemax surprised industry watchers by locking up ABC Electric. As was reported last night, all of the remaining utilities signed with Google last week."

I'm guessing that Google is probably following the same pattern that they are using in the transit space and making (exclusive?) deals with the utilities to consume your data. You'll have to log into the utility portal to approve their access (or check a box on your bill). But Google, or other big players that can afford to buy in, will probably be the only choice(s) you have. There is no evidence that they are trying to create an eco-system or generalized approach that would let you, the owner of the data, share it with other value added service providers. If the utilities implement this under government mandate it will suck. If they install smart meters with stimulus package money and still don't provide eco-system API's it will be worse than suck.

Any thoughts on how this plays out on the smart appliance / home automation side? Are there healthy open standards developing or is there danger of large scale exclusivity on that side of the equation too?

Google will be more innovative with this data than the electric utilities, I have no doubt about that. But I can easily imagine other companies doing interesting, innovative things with my meter data as well, especially as Google achieves utility scale themselves. If my electric utility is going to create a mechanism to share my data with companies like Google, I want them to make a generalized set of API's that will let me share it with anyone.

A quick note to policy makers in states who haven't yet finalized their programs. When you think about what to mandate, consider a more consumer-centric model (if it's easier, think of it as a voter-centric model). You should be shooting for a highly innovative and generative space where contributions and innovations can come from large and small firms alike, and where no one should be structurally locked out from participation. Don't lock us into a techno-oligarchy where two or three giant firms own our data and the possibility of innovation. If you insist on widely implemented consumer controlled API's and a less enterprise-centric model, you will not only encourage broader innovation at the consumer end, but you can use it to enhance competition on the generation side too.

Well, Google isn't really saying what they are doing, so maybe I got it wrong. Maybe they are about to get all "spectrum should be free" and roll out all kinds of draft API specifications for comment. If you think I got it wrong, don't hesitate to let me know in the comments.

Update (2/17): Asa pointed out in the comments that Google does provide more about their intent in their comments to the California Public Utilities Commission. I missed that link before and it gives some useful hints.

Most interesting is the repeated reference to Home Area Networks (HAN). In the original post I assumed Google was taking current smart meters as a given and obtaining data from the utility MDM after it went through their data collectors. That looks like it was incorrect. Instead, Google probably wants your meter to talk to your HAN via wireless(?) and then on to them from there.

If Google can use their market position to make that data accessible off the HAN rather than from the utility MDM, I think that's a good thing. Mostly because it makes possible the direct consumption and analysis of the data on my side of my home network's NAT / firewall. I didn't really touch on privacy considerations in the original post, but given that PowerMeter appears trivial from a computational point of view, I'd much rather run it locally than share my every light switch click with Google. If I want to know how I'm doing relative to peers I can share that data then, in appropriately summarized form.

The other point in the CPUC comments is this statement: "PowerMeter... we plan to release the technical specifications (application programming interfaces or API) so anyone can build applications from it."

This is great, but I would love to see the API's sooner rather than later. If I'm reading the situation correctly, they aren't really PowerMeter API's at all; they are proposed API's and data specifications for smart meters and smart devices, the API's that Google (and others) will be consuming, not the ones they are offering. If a whole ecosystem is going to be enabled through those API's, then the ecosystem should have a hand in developing them.

In summary, if Google manages to create a level playing field for the development of an ecosystem based on this data, I'll applaud them. Some people will use their service and, like they do with other Google services, trade privacy for targeted ads. Others will choose other approaches to using the data that provide those functions without exporting as much (or any) data.

by Jim Stogdill at February 17, 2009 03:50 AM


I Heard You Like Linux So We Put Linux In The Linux, So You Can Type While You Type

I was pushing the limit on my new system and couldn’t resist installing several OSes in seamless mode on several virtual machines to see how it goes. Imagine a system where you can move files around and copy/paste from one application to another application of a different OS. This is what geek dreams are made of…

Click Image for larger View

If you don’t appreciate the reference to the internet meme, I understand; the rest can enjoy. :)

by Pavs at February 17, 2009 02:37 AM

February 16, 2009

LOPSA blogs

LISA 2000 BoF Notes: Documentation for Growing Sysadmin Teams

(The things you find when you're spring cleaning, lemme tell ya... yeesh! In addition to posting this here for posterity, I'm also now cross-posting to my personal blog archive at -- some of this is rather dated now, but some of it is still as pertinent as it was over eight years ago.)

These are notes from my informal talk and resulting discussion from the Birds of a Feather (BoF) session from the USENIX LISA Conference in 2000. Special thanks to Jessica Cole for typing notes during the BoF, and to Mike C for inspiration!

0) Overview of the Presentation

  1. Background/Problem
  2. Goals
  3. Baby Steps
  4. Brain Pages
  5. Encourage to Write
  6. Encourage to Use
  7. Next Steps
  8. More Ideas?

1) Background/Problem

Each SA is the only or best expert for a set of services

  • Hide and Seek
    • fixes are delayed while the rest of the team looks for the expert following an alarm or complaint
  • Documentation by one, for one
    • email
    • file fragments
    • post-it notes
    • palms
    • comments in programs
  • Once a set is accumulated, it persists
  • The proverbial bus, or headhunter, or vacation

As the group grows, working together should make it more effective...

  • Is the team greater or less than the sum of its parts?
  • Does the team load-balance and fail-over for high-availability?
  • Or does it crash?

2) Goals


  • Avoid stepping on toes
  • Prevent falling through cracks
  • Foster skill building
    • mentoring
    • peer training
    • justify conferences and classes - all internal training opportunities have
      already been fully utilized!

3) Baby steps

Creating and Fostering the Documentation Process

  • The documentation process goes hand-in-hand with the operations process
  • Using the documentation process *will* uncover problems with the operations
  • Documentation is one step before automation
  • Get both team and management buy-in
  • See proceedings from related and helpful LISA 2000 Tutorials
    • S16
    • M11
    • M14
    • T11
    • T14
  • Consider designating a documentation project lead (aka Doc Nag, or Documentation
    Specialist, or Documentation Evangelist)

A note to our international readers: it was brought to my attention that non-native English speakers don't know what I meant by the word 'Nag', so here's a definition from Merriam-Webster's Collegiate Dictionary:

A nag...

  one who nags habitually 

To nag...

  Inflected Form(s): nagged; nag·ging
  Etymology: probably of Scandinavian origin; akin to Old Norse gnaga to gnaw; akin to Old English gnagan to gnaw
  Intransitive Senses:
    to find fault incessantly, COMPLAIN
    to be a persistent source of annoyance or distraction
  Transitive Senses:
    to irritate by constant scolding or urging, BADGER, WORRY
  - nag·ger noun
  - nagging adjective
  - nag·ging·ly /'na-gi[ng]-lE/ adverb

Project Leader Tasks

  • Choose a format
    • HTML with example templates
    • Manually updated index file
    • Staff-only advertising / limit access
    • Full text search
    • Document rollout via one peer review
    • Brain page theme - humor, commonly played card game in our team's spare time

4) Our First Attempt: Brain Pages

  • Page Title
    • Alludes to scope and target audience... for instance, All About pages may include:
      • services overview
      • service level statement
      • alarms
      • policy
      • reasoning behind service decisions
    • Some example titles:
      • All About foo
      • Debugging foo
      • Adding/Editing/Upgrading foo
  • Common Fields
    • author
    • maintainer
    • last edit date
    • table of contents
    • links to user docs and brain docs
    • procedure (explain everything, even down to permission details)
    • who can execute
    • when should execute
    • known problems
    • known exceptions
    • gaps in service or documentation (being partially done is no excuse not to document at all!)
    • revision history
    • contact email address for requests for changes/additions/deletions to document
    • "was this document helpful to you?" ask for feedback on the pages
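
A small sketch of how the common fields above might be turned into a reusable skeleton, so every brain page starts out with the same structure. The file name, title, and exact field set here are illustrative, not part of the original BoF notes:

```shell
# Hypothetical skeleton generator for a new brain page, pre-filling the
# common fields so authors only have to fill in content, not structure.
title="All About foo"
cat > all-about-foo.html <<EOF
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
  <li>Author: </li>
  <li>Maintainer: </li>
  <li>Last edit date: $(date +%Y-%m-%d)</li>
  <li>Links to user docs and brain docs: </li>
</ul>
<h2>Procedure</h2>
<!-- explain everything, even down to permission details -->
<h2>Known problems and exceptions</h2>
<h2>Revision history</h2>
<p>Was this document helpful to you? Mail the maintainer.</p>
</body>
</html>
EOF
```

Wrapping this in a tiny script means the Doc Nag can hand authors a one-command starting point instead of a blank editor buffer.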

5) Encourage to Write

How to get them written?

  • Doc Nag as ghost writer
  • Doc Nag as editor
    • offer to help collect data from post-its, email, code notes, etc to help take it off their hands
    • program called "script" to capture all procedures in a logfile for documentation
  • Doc Nag as shepherd through peer review (example: take the reviewing peer out for coffee so they have time to read over the doc without interruptions)
  • Try the procedure! Make sure it works!
  • Hand off pieces of the documentation process to Jr SAs
    • watch and write and/or collect and edit material from Sr SAs
    • signoff by Sr SAs is required!
    • images/photos/diagrams
    • call it in-house training!
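
The "script" program mentioned above records a whole terminal session, commands and output together, into a logfile that can later be edited into a procedure document. Interactively you would run "script session.log", perform the procedure, and type "exit"; the -c form below (supported by the util-linux version of script) just makes the example non-interactive:

```shell
# Capture a short session to session.log, then mine the raw log for the
# write-up. The "step 1" command is a stand-in for a real procedure.
script -q -c 'echo "step 1: check free space"; df -h /' session.log
grep "step 1" session.log > steps.txt
```

The raw log is ugly, but it is a faithful record of what was actually typed, which is exactly what a ghost-writing Doc Nag needs.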

6) Encourage to Use

  • Tie into alarms
    • trouble tickets link to documentation on how to fix the alarmed problem
  • Peer review increases awareness of the documentation project
    • positive (note that it can help reduce their calls/emails for problems)
    • negative (one person hasn't documented when everybody else has)
  • Carrots from management
    • Ask for flex time to go away for documentation, or after finishing some docs
    • Ask to work from home or outside under a tree with a laptop for a few hours a week to write
    • Ask for either or both of those for your Jr SAs in training that are transcribing your illegible notes into documented procedures
    • When one service is rolled out and documentation finished, ask for a new and interesting project that you really want to do, now that you've handed maintenance of the service off to the team!
  • Quantify vs. anecdotes (measure hits on website pages, measure tickets solved by docs)
    • financial or small awards as motivation; annual review, salary negotiations, flex-time
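
One way to turn "measure hits on website pages" from anecdote into a number is to count requests per page in the web server's access log. The log lines below are made-up samples standing in for a real access_log, and the /brain/ paths are hypothetical:

```shell
# Create a small sample access log (stand-in for the real thing).
cat > access_log.sample <<'EOF'
10.0.0.5 - - [16/Feb/2009:09:12:01] "GET /brain/all-about-foo.html HTTP/1.1" 200 4182
10.0.0.9 - - [16/Feb/2009:09:14:44] "GET /brain/debugging-foo.html HTTP/1.1" 200 2710
10.0.0.5 - - [16/Feb/2009:10:02:13] "GET /brain/all-about-foo.html HTTP/1.1" 200 4182
EOF
# Field 6 is the request path; count and rank pages by hits.
awk '{print $6}' access_log.sample | sort | uniq -c | sort -rn > hits.txt
```

The resulting hits.txt ranks brain pages by popularity, which is the kind of concrete figure that plays well in an annual review or budget discussion.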

7) Next Steps

  • More automation
  • Better indexing/searching
  • Revision control and history
  • Contribute/share knowledge via the SAGE and LOPSA mailing lists

8) Additional ideas?

The items below came up in our 20-minute open discussion... and the list will continue to grow as more people contribute, at least until I get around to better categorizing it!

  • When starting a doc project, keep the framework as simple as possible. Starting with a huge Oracle backend database and customizable submission tool is biting off too much at once!
  • Use professional writing style...
  • Don't be cute at the expense of being informative.
  • Use the SEE format: Statement, Explanation, Example, just as Michael said in his Tutorial session!
  • One tangible metric for judging the effectiveness, and for quantifying the benefits, of a documentation system is the rollout time for new sysadmins!
  • Check out the Babble paper in the LISA 2000 Proceedings
  • POD - Plain Old Documentation
  • Test the usability of each document by having a Jr SA do it while the creator watches silently - make sure both understand expectations for this nerve-wracking exercise!
  • The program called "script", which captures all procedures in a logfile for documentation, is really useful
  • Give power to the people executing the script to change and/or recommend changes to the document (otherwise you will grow an undocumented-modification-of-docs culture among your Jr SAs)
  • When the procedure does not work as documented, page and/or call the maintainer sysadmin, even if it's 3:00 AM! That won't have to happen many times before the document gets fixed!
  • HTML cleanup tools exist for Frontpage and Word output - and Dreamweaver
  • w3m has an HTML to ASCII text plus color converter
  • Set up a mail alias for sending requests for changes or additions to the documentation, or anything else that would be helpful (and/or cc the docs project coordinator)
  • If you don't have a better system within reach, set up an alias and a hypermail archive for procedures and fixes documentation
  • Set a valid-until date on each document; renew it by peer review by the maintainer and executors at some regular interval
  • Also accept documentation requests from other groups/team members
  • Consider a web-based interface for submitting to documentation
  • FAQ-o-Matic
  • WIKI or SWIKI
  • PHP annotated manual pages
  • Don't force authors to deal with a specialized authoring interface or tools if they don't want to
  • XML tools convert easily to other flavors
  • Tie it more closely to the ticket system
  • SDF (built on POD) - Simple Document Format, builds a Table of Contents automatically
  • Flat text files are a valid good start, and maybe the ultimately-portable format
  • Avoid editor wars
  • Realize there is a distinction between getting data and storing data in a documentation project
  • Requiring that cfengine or a script be able to recreate a service - as a condition of the service being "production" - has reduced the volume of documentation at one company
  • ZOPE may be interesting and/or helpful
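
A sketch of the valid-until idea from the list above: stamp each document with an expiry header and let a cron job nag about anything past its date. The "Valid-Until:" header name and docs/ layout are made up for illustration, and the date comparison relies on GNU date's -d option:

```shell
# Two sample docs, one already expired and one still current.
mkdir -p docs
printf 'Valid-Until: 2001-01-01\nHow to restart foo...\n' > docs/restart-foo.txt
printf 'Valid-Until: 2030-01-01\nHow to back up bar...\n' > docs/backup-bar.txt
now=$(date +%s)
: > stale.txt
for f in docs/*.txt; do
    until=$(sed -n 's/^Valid-Until: //p' "$f")
    # GNU date converts the header date to epoch seconds for comparison.
    if [ "$(date -d "$until" +%s)" -lt "$now" ]; then
        echo "STALE: $f (expired $until)" >> stale.txt
    fi
done
```

Mailing stale.txt to the docs alias once a month gives the Doc Nag an automatic, impersonal way to trigger the renew-by-peer-review cycle.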

Last Update: 18:30 Dec 11, 2000

by adeleshakal at February 16, 2009 10:17 PM

Blog o Matty

Tracing block I/O operations on Linux hosts with blktrace

I’ve been spending a bunch of time with Linux lately, and have found some really nifty tools to help me better manage the systems I support. One of these tools is blktrace, which allows you to view block I/O operations to the devices connected to your system. To see just how useful this utility [...]
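
blktrace itself needs root and a kernel with block-layer tracing enabled (the basic invocation is roughly "blktrace -d /dev/sda -o trace", followed by "blkparse trace" to render the events). As a rough unprivileged stand-in, and not the tool matty is describing, the kernel's per-device I/O counters in /proc/diskstats give a quick feel for which devices are doing I/O:

```shell
# /proc/diskstats fields: $3 device name, $4 reads completed,
# $8 writes completed. Summarize one line per device.
awk '{printf "%-12s reads=%s writes=%s\n", $3, $4, $8}' /proc/diskstats > diskstats.txt
```

Sampling this file twice and diffing the counts is a crude but serviceable way to see per-device activity when blktrace isn't available.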

by matty at February 16, 2009 09:37 PM

Google Blog

From the height of this place

I originally wrote this email for internal consumption; Presidents' Day here in the US and President Obama's recent inaugural address got me thinking about the future of the Internet, Google, and the challenges that lie ahead. The note borrows from a host of US presidential inaugural addresses to illustrate some of its points (thanks to former President Clinton for the title). Quite a few Googlers suggested I share it externally, so here it is, with just a few minor edits. - Jonathan Rosenberg

Dear Googlers -

Today is Presidents' Day here in the United States, when we honor the birthdays of two of our country's greatest leaders, George Washington and Abraham Lincoln. A few weeks ago many of us were lucky to witness, either in person or via TV or the web, a masterful inauguration speech by the newest President, Barack Obama. The speech was rife with poignant points and subtle historical allusions: "We the people" came directly from the US Constitution, while "all are equal, all are free, and all deserve a chance to pursue their full measure of happiness" echoes both the Declaration of Independence and Abraham Lincoln's Gettysburg Address. (Many of these nuances were only revealed to me upon reading the transcript.)

As expected, President Obama aptly captured the wary mood of the nation. After all, we are in the midst of what is likely the worst economic situation of our lifetimes. In the US alone, 2.6M people lost their jobs in 2008, followed by nearly 600,000 more last month, and on the Monday following the inauguration companies around the world, including Caterpillar, Pfizer, ING, and Phillips announced job cuts totaling over 75,000. Add to that our dependence on fossil fuels, the resulting (and accelerating) climate change, and national security concerns, and you can feel the gravity of this pivotal moment. Eric Schmidt has called these times 'uncharted waters': none of us has been here before.

President Obama asserted that we will face the moment with what he called new instruments and old values, values that have been "the quiet force of progress throughout history" and which must, once again, define our character. While this reference to the national character of the US was no doubt inspiring for Americans, the mention of "new instruments" was far more relevant to Google. In a way, I felt like he was talking about the Internet, which is the most powerful and comprehensive information system ever invented.

Consider its predecessors. The famed Library at Alexandria (that's Egypt, not Virginia - some of you have GOT to get out more ;) ) was built circa 323 BC for an educated public, which actually meant very few people since the skills of literacy were deliberately withheld from the majority of the population. For several centuries monks were the keepers of the written word, painstakingly transcribing and indexing books as a means of interpreting the word of God. They were prized as much for their ability to write small, which saved on expensive paper, as for their piety.

The first universities came about in the 4th century AD, the first formal encyclopedias didn't appear until the 16th century, the first truly public libraries appeared in the 19th century and proliferated in the 20th. Then suddenly comes the Internet, where, from the most remote villages on the planet, you can reach as much information as is held in thousands of libraries. Access to information has completed its journey from privileged to ubiquitous. At Google we are all so immersed in daily introspective exercises like product reviews, our GPS [Google Product Strategy] meetings, and budget exercises that it's easy to forget this.

We shouldn't. In fact, since the challenges the world faces are, to a large degree, information problems, I believe the Internet is one of the "new instruments" that the President and the world can count on. And how do a great many people use the Internet? What is the first place many of them go when they conduct research, seek answers, do their work and communicate with their friends and family? Google. Ours is much more than a passing role in this next phase of history; rather, we have the responsibility and duty to make the Internet as great as it can possibly be. Fortunately, that is pretty much what we all set out to do every day anyway, but now there's just a little extra pressure. Not your average 9-to-5 job.

At Google we are all technology optimists. We intrinsically believe that the wave upon which we surf, the secular shift of information, communications, and commerce to the Internet, is still in its early stages, and that its result will be a preponderance of good.

As we look toward the pivotal year ahead, here are a few observations on the future of the Internet for all of us to assess, consider, and carry as we do our work. (I have occasionally borrowed the inaugural words of previous presidents, sometimes cited, as with Bill Clinton's phrase which I appropriated for my title, and sometimes not.) To paraphrase President Obama, these things will not happen easily or in a short span of time, but know this my colleagues: they will happen.

All the world's information will be accessible from the palm of every person
Today, over 1.4 billion people, nearly a quarter of the world's population, use the Internet, with more than 200 million new people coming online every year. This is the fastest growing communications medium in history. How fast? When the Internet was first made available to the public, in 1983, there were 400 servers. Twenty-five years later: well over 600 million.

In many parts of the world people access the Internet via their mobile phones, and the numbers there are even more impressive. More than three billion people have mobile phones, with 1.2 billion new phones expected to be sold this year. More Internet-enabled phones will be sold and activated in 2009 than personal computers. China is a prime example of where these trends are coming together. It has more Internet users than any other country, at nearly 300 million, and more than 600 million mobile users — 600 million! Twenty-five years ago, Apple launched the Mac as "the computer for the rest of us." Today, the computer for the rest of us is a phone.

This means that every fellow citizen of the world will have in his or her pocket the ability to access the world's information. As this happens, search will remain the killer application. For most people, it is the reason they access the Internet: to find answers and solve real problems.

Our ongoing challenge is to create the perfect search engine, and it's a really hard problem. To do a perfect job, you need to understand all the world's information, and the meaning of every query. With all that understanding, you then have to produce the perfect answer instantly. Today, many queries remain very difficult to answer properly. Too often, we force users to correct our mistakes, making them refine their searches, trying new queries until they get what they need. Meanwhile, our understanding of the interplay between high-quality content, search algorithms, and personal information is just beginning.

Why should a user have to ask us a question to get the information she needs? With her permission, why don't we surf the web on her behalf, and present interesting and relevant information to her as we come across it? This is a very hard thing to do well, as anyone who has been presented with a where-the-heck-did-that-come-from recommendation on Amazon or Netflix can attest, but its potential is huge.

    While we're working on improving the quality of search, the web is exploding. Our infrastructure has to keep up with this growth just to maintain our current level of quality, but to actually make search smarter, our index and infrastructure need to grow at a pace FASTER than the web. Only then will we be able to reject the idea that we have to choose between latency, comprehensiveness, and relevancy; we will have the ability to preserve all our ideals.

    Solving search is a long-term quest for perfection, but the transition of information from scarce and expensive to ubiquitous and free will conclude far sooner. We will then bear witness to a true democratization of information, a time when almost everyone who wants to be online will be online, able to access virtually every bit of the world's information. This is great for our business, even greater for all the users. In fact, it's difficult to overestimate how important that moment will be. As Harry Truman said, "Democracy of information alone can supply the vitalizing force to stir the peoples of the world into triumphant action." (OK, I added the "of information" part!)

    Everyone can publish, and everyone will
    One thing that we have learned in our industry is that people have a lot to say. They are using the Internet to publish things at an astonishing pace. 120K blogs are created daily — most of them with an audience of one. Over half of them are created by people under the age of nineteen. In the US, nearly 40 percent of Internet users upload videos, and globally over fifteen hours of video are uploaded to YouTube every minute. The web is very social too: about one of every six minutes that people spend online is spent in a social network of some type.

    Publishing used to be constrained by physical limitations. You had to have a printing press and a distribution network, or a transmitter, to publish to any sort of critical mass, so broadcasting was the norm. No more. Today, most publishing is done by users for users, one-to-one or one-to-many (think of Twitter, Facebook, Wikipedia, and YouTube). Free speech is no longer just a right granted by law, but one imbued by technology.

    The era of information being more powerful when hoarded has also passed. As our economist Hal Varian has noted, in the early days of the Web every document had at the bottom, "Copyright 1997. Do not redistribute." Now those same documents have at the bottom, "Copyright 2009. Click here to send to your friends." Sharing information, not guarding it, has become the gold standard on the web, so not only can anyone publish, but virtually everyone does. This is both good and bad news. No one disputes the value of free speech, but the vast majority of stuff we find on the web is useless. The clamor of junk threatens to drown out the voices of quality.

    Meanwhile, those voices are struggling. The most obvious example is newspapers, which have historically been the backbone of quality original reporting, a post they have mostly maintained throughout the Internet explosion. But news isn't what it used to be: by the time a paper arrives in the morning it's already stale. As written communication has evolved from long letter to short text message, news has largely shifted from thoughtful to spontaneous. The old-fashioned static news article is now just a starting point, inciting back-and-forth debate that often results in a more balanced and detailed assessment. And the old-fashioned business model of bundled news, where the classifieds basically subsidized a lot of the high-quality reporting on the front page, has been thoroughly disrupted.

    This is a problem, but since online journalism is still in its relative infancy it's one that can be solved (we're technology optimists, remember?). The experience of consuming news on the web today fails to take full advantage of the power of technology. It doesn't understand what users want in order to give them what they need. When I go to a site like the New York Times or the San Jose Mercury, it should know what I am interested in and what has changed since my last visit. If I read the story on the US stimulus package only six hours ago, then just show me the updates the reporter has filed since then (and the most interesting responses from readers, bloggers, or other sources). If Thomas Friedman has filed a column since I last checked, tell me that on the front page. Beyond that, present to me a front page rich with interesting content selected by smart editors, customized based on my reading habits (tracked with my permission). Browsing a newspaper is rewarding and serendipitous, and doing it online should be even better. This will not by itself solve the newspapers' business problems, but our heritage suggests that creating a superior user experience is the best place to start.

    Of course, the greatest user experience is pretty useless if there's nothing good to read, a truism that applies not just to newspapers but to the web in general. Just like a newspaper needs great reporters, the web needs experts. When it comes to information, not all of it is created equal and the web's future depends on attracting the best of it. There are millions of people in the world who are truly experts in their fields — scientists, scholars, artists, engineers, architects — but a great majority of them are too busy being experts in their fields to become experts in ours. They have a lot to say but no time to say it.

    Systems that facilitate high-quality content creation and editing are crucial for the Internet's continued growth, because without them we will all sink in a cesspool of drivel. We need to make it easier for the experts, journalists, and editors that we actually trust to publish their work under an authorship model that is authenticated and extensible, and then to monetize in a meaningful way. We need to make it easier for a user who sees one piece by an expert he likes to search through that expert's entire body of work. Then our users will be able to benefit from the best of both worlds: thoughtful and spontaneous, long form and short, of the ages and in the moment.

    We won't (and shouldn't) try to stop the faceless scribes of drivel, but we can move them to the back row of the arena. As Harry Truman said in 1949, "We are aided by all who want relief from the lies of propaganda — who desire truth and sincerity."

    When data is abundant, intelligence will win
    Putting the power to publish and consume content into the hands of more people in more places enables everyone to start conversations with facts. With facts, negotiations become less about who yells louder and more about who has the stronger data. They can also be an equalizer that enables better decisions and more civil discourse. Or, as Thomas Jefferson put it at the start of his first term, "Error of opinion may be tolerated where reason is left free to combat it."

    The Internet allows for deeper and more informed participation and representation than has ever been possible. We see this happening frequently, particularly with our Geo products. The Surui tribe in the Amazon rain forest uses Google Earth to mark the boundaries of their land and work with authorities to stop illegal logging. Sokwanele, a civic action group in Zimbabwe, used the Google Maps API on their website to document reported cases of political violence and intimidation after the controversial Presidential election in March 2008. Armed with this map, the group can better convey and defend their argument that elections in Zimbabwe are neither free nor fair. The stakes couldn't be higher for these people. We can give them a fighting chance.

    Everyone should be able to defend arguments with data. To let them do so, we need tools like the Sitemaps protocol, which opens up large volumes of data previously trapped behind government firewalls. Most government websites can't be crawled, but with Sitemaps, thousands of pages have been unlocked. In the US, several states have opened up their public records through Sitemaps, and the Department of Energy's Office of Science & Technology Information made 2.3 million research findings available in just twelve hours.
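    For readers who haven't seen one, a Sitemap is simply an XML file listing the URLs a site wants crawled, which is what lets otherwise-uncrawlable pages be discovered. A minimal example (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="">
    <loc></loc>
```

A site publishes this file and points crawlers at it (for instance via robots.txt), and every `<loc>` entry becomes discoverable without any inbound links.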

    Information transparency helps people decide who is right and who is wrong and to determine who is telling the truth. When then-Senator Clinton incorrectly stated during the 2008 Presidential campaign that she had come under sniper fire during her 1996 trip to Bosnia, the Internet set her straight. This is why President Obama's promise to "do our business in the light of day" is important, because transparency empowers the populace and demands accountability as its immediate offspring.

    But as powerful as it can be in politics, data has the potential to be even more transformational in business. Oil fueled the Industrial Revolution, but data will fuel the next generation of growth. One of the largely unheralded by-products of the Internet era is how it has made the power of the most sophisticated analytical tools available to the smallest of businesses. Traditionally, business software packages have treated data reporting as a second-class citizen. Here is my cool new feature, they say. Oh, you want to know how many people use it? You want the flexibility to organize and assess this data in ways that work best for you? Well, let us tell you about the analytics module! It's only tens of thousands of dollars more (not counting the 18% annual maintenance fee in perpetuity ... sucker!!)

    Fortunately that's not Google, nor can it ever be. All of our products should reflect our bias toward giving our customers, users, and partners as much data as possible - and letting them do with it what they wish. Then they can run their business like we do, by making decisions based on facts, not opinions. Here at Google the words of every colleague, from associates to vice presidents, carry the same weight so long as they are backed by data. (If you don't think we live up to this standard then please feel free to correct me ... but you better have the facts to prove it!!!)

    Hal Varian likes to say that the sexy job in the next ten years will be statisticians. After all, who would have guessed that computer engineers would be the cool job of the 90s? When every business has free and ubiquitous data, the ability to understand it and extract value from it becomes the complementary scarce factor. It leads to intelligence, and the intelligent business is the successful business, regardless of its size. Data is the sword of the 21st century; those who wield it well, the Samurai.

    In 1913, Woodrow Wilson stated, "... and yet, it will be no cool process of mere science ... with which we face this new age of right and opportunity." Perhaps, but from our perspective the cool process of mere science, fueled by ubiquitous data and intelligence, will be quite sufficient to power new generations to success.

    The vast majority of computing will occur in the cloud
    Within the next decade, people will use their computers completely differently than they do today. All of their files, correspondence, contacts, pictures, and videos will be stored or backed up in the network cloud, and they will access them from wherever they happen to be on whatever device they happen to hold. Access to data, applications, and content will be seamless and device-agnostic. Convergence isn't something that occurs at the device level, which was the vision we all had in the 90s as we struggled to invent that perfect gadget that did it all (witness my own unfortunate progeny, the Apple Newton, which ended tragically). Rather, devices will proliferate in many directions, but all of them will converge on the cloud. That's where our stuff, not to mention civilization's knowledge, will live.

    This doesn't mean that the access device simply becomes a juiced up version of an old 3270 terminal. To the contrary, smart programmers will figure out ways to use all that power in your hands to create great applications, and to let you run them whether or not you are connected. But it shouldn't take three minutes for the device to boot, and losing it shouldn't be a catastrophe. You'll just get a new one and it will sync instantly; all your contacts, pictures, music, files, and other stuff will automagically just be there, ready for you to log in and say "be mine".

    Still, these examples simplify and understate the true impact of what is going on with the transition to cloud computing. As Hal has noted, we are in a period of "combinatorial innovation", when there is a great availability of different component parts that innovators can combine or recombine to create new inventions. In the 1800s, it was interchangeable parts. In the 1920s, it was electronics. In the 1970s, it was integrated circuits. Today, the components of innovation are found in cloud computing, with abundant APIs, open source software, and low-cost, pay-as-you-go application services like our own App Engine and Amazon's EC2. The components are abundant and available to anyone who can get online.

    The power of innovation and the cloud are driving two trends. First, because the tools of innovation are so easy and inexpensive to access, and consumers are so numerous and easy to reach, the consumer market now gets the greatest innovations first. It's easy to forget that just twenty years ago the best technology was found in the workplace: computers, software, phone systems, etc. Thirty years ago all you software geniuses working on Search, Ads, and Apps would have been programmers at IBM; forty years ago, at NASA. Now, the best technology starts with consumers, where a Darwinian market drives innovation that far surpasses traditional enterprise tools, and migrates to the workplace only after thriving with consumers. Think of Google Video for Business, which started out as YouTube and then evolved to the enterprise. How many businesses out there have even conceived of how useful this can be to them? Not many, perhaps because only a year ago the costs of having such an internal service were prohibitive. No longer.

    Second, it used to be that every growing business would at some point have to make a big investment in computers and software for accounting systems, customer management systems, email servers, maybe even phone or video conferencing systems. Today, all of those services are available via the network cloud, and you pay for it only as you use it. So small businesses can scale up without making those huge capital investments, which is especially important in a recession. Access to sophisticated computer systems, and all the value they can deliver, was previously the realm of larger companies. Cloud computing levels that playing field so that the small business has access to the same systems that large businesses do. Given that small businesses generate most of the jobs in the economy, this is no small trend.

    We still have a long way to go in making web-based applications robust enough for businesses. Things like latency, data reliability, and security all have to be equal to or better than the currently available alternatives. The user experience needs to be fast, easy, and rich — "like reading a magazine," Larry has said. This is why we are building Chrome, Gears, V8 and more. Users now expect these apps to work perfectly for them all the time, and we need to meet that expectation.

    The real potential of cloud computing lies not in taking stuff that used to live on PCs and putting it online, but in doing things online that were previously simply impossible. Combining open standards with cloud computing will enable businesses to conduct commerce in brand new ways. For example, there is a great opportunity to take advantage of (to quote Hal again) "computer-mediated transactions". Computers now mediate virtually every commercial transaction, recording it, collecting data, and monitoring it, which means that we can now write and enforce contracts that were previously impossible. When you rent a car, you could be offered a thirty percent discount for agreeing not to exceed the speed limit, a deal that they could actually enforce with GPS reporting! Would you take it?

    Another example is machine translation. As more people do more things online, computer systems will have the opportunity to learn from the collective behavior of billions of humans. Translation will get a tiny bit smarter with each iteration. There are over 400,000 books in the modern version of the aforementioned Library of Alexandria, nearly half of them in Arabic. The culture and history that's in those books is not available to you unless you read Arabic, which of course most people don't. But soon, with the power of the cloud, they'll be able to read them anyway. This is why translation is important, because it gives us the ability, to quote John Adams from 1797, to "encourage schools, colleges, universities, academies, and every institution [to propagate] knowledge, virtue, and religion among all classes of the people."

    Lit by lightning
    Virtually every American president since George Washington has used his Inaugural Address to speak not just of the coming four years, but also of his vision for future generations. Similarly, we manage Google with a long-term focus. We live and run our business in these uncertain times, but our eyes are always on the future, on the better tomorrow that the Internet and all of its promise shall help bring to fruition. I hope that the four predictions that I have presented here will elicit your curiosity and illuminate the significance of the changes that lie ahead. They may inevitably come to pass, but their impact on us, our neighbors, our countries, and our world is not inevitably good. Hence, our challenge.

    We are standing at a unique moment in history which will help define not just the Internet for the next few years, but the Internet that individuals and societies around the world will traverse for decades. As Googlers our responsibility is nothing less than to help support the future of information, the global transition in how it is created, shared, consumed, and used to solve big problems. Our challenge is to steer incessantly toward greatness, to never think small when we can think big, to strive on with the work Larry and Sergey began over ten years ago, and from this task we will not be moved. In a world that feels like it is lit by lightning, speed wins, and we have a responsibility to our users to not retreat, to not be content to stand still, to not be complacent or near-sighted. The Internet has had a profound and remarkable impact in the past decade. Now, from the height of this place, let's appreciate its implications and pursue its promise.

    It seems only fitting to conclude this Presidents' Day treatise, which began by quoting our 44th president, with a statement from our first. And so, having thus imparted to you my sentiments as they have been awakened by the occasion which brings us together, I shall take my present leave.

    by A Googler at February 16, 2009 07:57 PM

    OReilly Radar

    Radar Interview with Clay Shirky

    Clay Shirky is one of the most incisive thinkers on technology and its effects on business and society. I had the pleasure to sit down with him after his keynote at the FASTForward '09 conference last week in Las Vegas.
    In this interview Clay talks about

    • The effects of low cost coordination and group action.

    • Where to find the next layer of value when many professions are being disrupted by the Internet

    • The necessary role of low cost experimentation in finding new business models

    A big thanks to the FASTForward Blog team for hosting me there.

    by Joshua-Michéle Ross at February 16, 2009 05:41 PM


    Perl’s new wave

    We all remember the “Perl is dead” hype from not so long ago. In short: Perl 6 wasn’t there yet and perl ironically wasn’t a copy of the language of the day (python, ruby, c#, …). I was positively surprised by the response of the perl community. It wasn’t the typical “our programs run fast” (to [...]

    by claudio at February 16, 2009 02:29 PM


    How Not To Make A Commercial Linux Distribution


    I have nothing against commercial Linux distributions. As a matter of fact, my first Linux experience was a commercial version of SUSE 7 almost nine years ago. I remember it had 6 CDs in a very professionally made CD pack, and SUSE did a very good job at making the installation process as user friendly as possible at that time. (Before SUSE decided to go evil.) It's safe to say that I thought the experience was good enough to justify paying for a Linux distribution.

    Enter iMagic OS

    It's not every day that you see an announcement of a new commercial Linux distribution. We obviously see a lot of Linux distributions popping up every few days, most of which are essentially just forks of some popular distribution out there. So what's wrong with iMagic OS that it's worth talking about?

    -> The Name: Lindows anyone? One of the lessons we have learned from the Lindows saga is that you don't give an OS a name similar to a popular OS that most Linux users hate. I think very few will disagree with me that most die-hard Linux users are not fond of Windows or even Mac OS. To make matters worse, they have versions named iMagic OS X (I kid you not!) and iMagic OS Pro.

    iMagic OS

    -> Ubuntu Clone: If repackaging an Ubuntu release with a custom skin and pre-installed applications is all it takes to make a commercial Linux distribution, I want to do it too. So why not Linux Mint? By any stretch of the imagination, Linux Mint is not only a much superior-looking Ubuntu fork but, out of the box, it has enough custom options that it wouldn't be overkill to declare it better than Ubuntu itself. iMagic OS is nothing more than a skin-job of Ubuntu, and from the looks of the screenshot (I am not going to pay for it to find out), it isn't a very good job either.

    -> Three versions, three price categories: It seems like the “OS X” and the “Myth” versions are based on Ubuntu (and seem to be discontinued, yet still available for order). The “Pro” version seems to be based on Kubuntu and is the “flagship” product. Prices range from $49.99 to $79.99. One of these versions comes with CrossOver Pro (the big brother of Wine) to run MS products.

    iMagic OS Pro    $79.99

    iMagic OS X       $69.99

    iMagic OS Myth  $49.99

    If Bill Gates knew about this, he would be proud, even though it falls short of the six editions of Windows 7.

    -> License agreement: It has a license agreement that will make you blush. Not once but twice, it mentions explicitly that you can install iMagic OS on no more than 3 computers. You have to agree to these terms (here & here) in order to purchase this OS. According to its Wikipedia entry, “It features a registration system that when violated, prevents installation of the OS, as well as new software and withdraws updates and support.”

    Don’t get me wrong. I have nothing against commercial Linux distributions, even though some of you might. But if you are going to make one, please, by all means, do not make it look like, or name it after, every proprietary OS cliché out there, against everything that Linux and free software stand for.

    If you liked this article, please share it on StumbleUpon or Digg. I’d appreciate it. :)

    by Pavs at February 16, 2009 01:53 PM


    Street With A View - Google Maps art hack


    Sampsonia Way is a nine-block, one-way alleyway in Pittsburgh, PA. It also happens to be the most exciting street in the world when viewed through Google Street View, thanks to the efforts of a neighborhood, two artists, and a conspiring Street View team.

    On May 3rd 2008, artists Robin Hewlett and Ben Kinsley invited the Google Inc. Street View team and residents of Pittsburgh's Northside to collaborate on a series of tableaux along Sampsonia Way. Neighbors, and other participants from around the city, staged scenes ranging from a parade and a marathon, to a garage band practice, a seventeenth century sword fight, a heroic rescue and much more...

    If you knew when the Street View car was coming through your neighborhood, what would you do to welcome it?

    Street With A View
    Sampsonia Way In Street View

    by Jason Striegel at February 16, 2009 11:00 AM


    Weekly Tweet Digest for 2009-02-15

    • EVE has sound?!? #
    • Weird. Dinosaur Comics colours are different in Minefield (paler) than in Vienna (more saturated). #
    • Do I really need to keep 8000 cronjob emails? #
    • Slashdot’s editors need work. Title for should have been “Slashdot slashdots Slashdot”. #
    • Argh. The plural of lens is lenses, not lens’. #
    • I think I just killed #
    • Yay, LDAP is down! Enforced coffee break! #
    • I think I’m just going to go get lunch. #
    • Yay LDAP’s back! #
    • Woo, new Metric album April 14! #
    • Hooray! Computers came back up just in time for coffee break! #
    • Re: OSCON being annual Woodstock of the open source crowd. Correct, it’s overcommercialized, overbloated, overcrowded, and used to be good. #
    • That said, there’s still a good reason to go to OSCON: hallway track. And meeting with old friends. Okay, two reasons. And Damian Conway. 3. #
    • Happy Darwin Day! #
    • What is this thing that I’m not supposed to click? #
    • Re: Why doesn’t Selig just strike home runs from years players did steroids? #
    • Another day, another server down. #
    • I hate it when bands mix bass drums to sound like you’ve blown your speakers. #
    • I’m looking at you, Wintersleep! “Miasmal Smoke & The Yellowbellied Freaks” is otherwise excellent! Why ruin it with distorted bass drums? #
    • Oh alright, I won’t rate you two stars. Three it is. But that’s it! #
    • And Shrike loses Titan #4. #
    • “Trust me. A moose costume is the best thing for moose hunting. They’ll never think you’re a hunter!” #lasttweet #
    • RT @astronomyblog Watching the count up to 1234567890 in UNIX time Nearly there. #
    • Happy 1234565432! #
    • Happy 1234567890! #
    • Haha oh god we’re all a bunch of nerds. #1234567890 #
    • Holy shit Word on OS X sucks. #
    • Don’t forget, Whedon fans, Dollhouse premiers tonight! #
    • Fuck you Amazon sellers who don’t ship to Hawaii. Fuck you. #

    by Brad at February 16, 2009 09:59 AM

    OReilly Radar

    New Zealand Goes Black

    The previous government in New Zealand enacted an amendment to the Copyright Act that required ISPs to have a policy to disconnect users after repeated accusations of infringement, over the objections of technologists. While it's possible to have a policy that requires proof rather than accusation, APRA (the RIAA of New Zealand) strongly opposes any such attempts at reasonable interpretation of Section 92. The minor parties in the coalition government oppose the "three accusations and you're offline" section and want it repealed. This is the last week before that law is due to come into effect and the Creative Freedom Foundation, a group formed to represent artists and citizens who oppose the section, has a week of protest planned to convince the ruling National Party to repeal S92.

    The first day's action was blacking out Twitter and Facebook avatars. I did it, as did Channel 3 Business News, a Creative Director at Saatchi and Saatchi, oh and Stephen Fry. Kudos to Juha Saarinen who first put out the call. This is building up to a full Internet blackout day on February 23rd. I'm delighted to say that the idea was formed at Kiwi Foo Camp, and the folks who were at Kiwi Foo have been running wild with it--building banners, releasing templates, spreading the word.

    by Nat Torkington at February 16, 2009 08:59 AM

    Four short links: 16 Feb 2009

    A lot of Python and databases today, with some hardware and Twitter pranking/security worries to taste:

    1. Free Telephony Project, Open Telephony Hardware -- professionally-designed mass-manufactured hardware for telephony projects. E.g., IP04 runs Asterisk and has four phone jacks and removable Flash storage. Software, schematics, and PCB files released under GPL v2 or later.
    2. Don't Click Prank Explained -- inside the Javascript prank going around Twitter. Transparent overlays would appear to be dangerous.
    3. Tokyo Cabinet: A Modern Implementation of DBM -- ok, so there's definitely something going on with these alternative databases. Here's the 1979 BTree library reinvented for the modern age, then extended with Tokyo Tyrant, a database server for Tokyo Cabinet that offers HTTP REST, memcached, and simple binary protocols (PyTyrant is a pure-Python client for it). Cabinet is staggeringly fast, as this article makes clear. And if that wasn't enough wow for one day, Tokyo Dystopia is the full-text search engine. The Tyrant tutorial shows you how to get the server up and running. And what would technology be without a Slideshare presentation? (via Stinky)
    4. Whoosh -- a pure Python fulltext search library.

    by Nat Torkington at February 16, 2009 08:42 AM


    Perform uniform mounting with generic NFS (16 Feb 2009)

    This article discusses the architecture and the mechanism behind a generic NFS mounter, a utility that helps NFS clients by providing easier, one-point access to the files on the NFS server and by offering a more consolidated view of the NFS space. See how to automatically consolidate many different NFS versions into a uniform mount.

    February 16, 2009 07:45 AM

    Five network tricks for Linux on S/390 systems (16 Feb 2009)

    Linux brings the power of Open Source Unix tools to the S/390 mainframe. All the current versions of standard Unix services can run in a Linux partition, gaining the advantages of mainframe hardware. This article shares five troubleshooting tips to counter the various problems that can arise when you bring up a Linux system on a System z series machine.

    February 16, 2009 07:45 AM

    Air and KDE 4.3. (16 Feb 2009)

    Pretty pictures from KDE 4.3: "KDE will very shortly become the desktop you need and not the desktop we think you need. And I find that very very exciting."

    February 16, 2009 07:45 AM

    All about Linux

    Debian GNU/Linux 5.0 Lenny Released

    Debian GNU/Linux version 5.0, code named Lenny, has been released. This is a long-awaited release and comes with a whole lot of features, stability and security being the foremost. Read on to know more ...

    February 16, 2009 06:31 AM

    The Hive Archive

    Links for 2009-02-15 []

    • Motivation: Jerry Seinfeld's Productivity Secret
      He said for each day that I do my task of writing, I get to put a big red X over that day. "After a few days you'll have a chain. Just keep at it and the chain will grow longer every day. You'll like seeing that chain, especially when you get a few weeks under your belt. Your only job next is to not break the chain." "Don't break the chain," he said again for emphasis.

    February 16, 2009 06:00 AM

    Adnans SysDev


    Get Smart shoe phone


    Of all Maxwell Smart's ridiculous gadgets, the shoe phone has always been my favorite. Paul Gardner-Stephen decided to make this fantastic piece of spy tech a reality, with an Instructable that shows you how to make your own with a pair of wooden-heeled shoes, a Bluetooth headset, and a Motorola V620.

    This shoe phone works by having a Bluetooth headset in one shoe and a mobile phone in the other. The reason for this is that seeing a mobile phone when the shoe is opened kind of ruins the magic of it being a shoe phone, rather than just a piece of consumer electronics wedged into a shoe.

    A Get Smart Style Shoe Phone

    by Jason Striegel at February 16, 2009 04:00 AM

    February 15, 2009


    Debian Lenny 5.0 released!

    Debian Lenny 5.0 is out!

    Yesterday, 14 February 2009, the Debian Project announced the official release of Debian GNU/Linux version 5.0 (codenamed “Lenny”). This comes almost two years (22 months) after the previous stable release, “Etch”, which was launched on 8 April 2007.

    “This release includes numerous updated software packages, such as the K Desktop Environment 3.5.10 (KDE), an updated version of the GNOME desktop environment 2.22.2, the Xfce 4.4.2 desktop environment, LXDE, the GNUstep desktop 7.3, X.Org 7.3, 2.4.1, GIMP 2.4.7, Iceweasel 3.0.6 (an unbranded version of Mozilla Firefox), Icedove (an unbranded version of Mozilla Thunderbird), PostgreSQL 8.3.6, MySQL 5.0.51a, GNU Compiler Collection 4.3.2, Linux kernel version 2.6.26, Apache 2.2.9, Samba 3.2.5, Python 2.5.2 and 2.4.6, Perl 5.10.0, PHP 5.2.6, Asterisk, Emacs 22, Inkscape 0.46, Nagios 3.06, Xen Hypervisor 3.2.1 (dom0 as well as domU support), OpenJDK 6b11, and more than 23,000 other ready-to-use software packages (built from over 12,000 source packages).”

    Release Announcement:

    “Upgrades to Debian GNU/Linux 5.0 from the previous release, Debian GNU/Linux 4.0 (codenamed “Etch”) are automatically handled by the aptitude package management tool for most configurations, and to a certain degree also by the apt-get package management tool. As always, Debian GNU/Linux systems can be upgraded painlessly, in place, without any forced downtime, but it is strongly recommended to read the release notes for possible issues, and for detailed instructions on installing and upgrading. The release notes will be further improved and translated to additional languages in the weeks after the release.”
    Also my little post “HowTo upgrade from Debian Etch to Lenny” might be a useful read before upgrading.

    by - Marius - at February 15, 2009 11:25 PM

    Trouble with tribbles

    Simple Web Services

    One of the things I like about XML-RPC is that it really is astonishingly easy to implement. I like to be able to understand the code I write - and that includes the things I reuse from elsewhere. And with XML-RPC, it's even better - it's normally possible to parse the XML that comes back by eye.
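    To illustrate that legibility, here is the XML that Python's standard xmlrpc.client produces for a response carrying a single integer; you really can read it by eye:

```python
import xmlrpc.client

# Marshal a one-value method response the way an XML-RPC server would.
payload = xmlrpc.client.dumps((42,), methodresponse=True)
print(payload)
# Prints plain, human-readable XML (shown here with whitespace collapsed):
# <methodResponse><params><param><value><int>42</int></value></param></params></methodResponse>
```
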

    Contrast that with the monstrosity that SOAP has evolved into - a huge morass of complexity, massive code size, bloat everywhere, and it's really turned into a right mess. There's been nothing simple about SOAP for a long time now.

    The snag with XML-RPC is that there are a couple of rather crippling limitations. In particular, you're stuck with 32-bit integers. I've had to enable vendor extensions in order to use 64-bit longs, and while that's fine for my client code, it makes the JKstat server far less useful for clients using other frameworks and languages.

    So I'm stuck in a hole. I'm going to stick with XML-RPC for now, as it will be good enough to test with and is really easy to develop against. (And the way I've implemented the remote access makes it trivial to put in new or replacement access services.)

    What's the way forward? I'm not at all keen on building a full web-services stack - that defeats the whole object of the exercise. There has been some discussion recently of exposing the kstat hierarchy via SNMP, but that's clunky and depends on SNMP (great if you use SNMP already, less great if you don't). I've been looking at things like JSON-RPC and RESTful web services as alternatives. The simplest approach may just be to encode everything as Strings and convert back.
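
Here's roughly what the wire format looks like, and where the integer limit bites; the <i8> tag is a vendor extension (Apache XML-RPC uses it), not part of the XML-RPC spec:

```xml
<!-- An XML-RPC response really is readable by eye; note the 32-bit <i4>
     type, which caps values at 2147483647: -->
<methodResponse>
  <params>
    <param><value><i4>2147483647</i4></value></param>
  </params>
</methodResponse>

<!-- 64-bit values need a non-standard extension tag such as <i8>, which is
     exactly what ties clients to a particular framework: -->
<param><value><i8>9223372036854775807</i8></value></param>
```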

    by Peter Tribble ( at February 15, 2009 09:53 PM

    Sam Ruby

    White House Feed Now Declared Invalid

    When it first appeared the White House feed had a few entries, all with the same id.  Now it has 87 such entries.  As first suggested by James Holderness, the feedvalidator now marks this feed as invalid.  It will do so for all feeds that contain ten or more entries all with the same id.  If this ends up producing too many false positives, I’ll tweak the algorithm.
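
The check amounts to counting duplicate ids. A rough shell approximation of the idea (not the validator's actual code), run against a toy three-entry feed:

```shell
#!/bin/sh
# Build a toy feed where one id appears twice, then count duplicate <id>s.
# The validator's new rule fires when one id reaches ten or more entries.
cat > /tmp/feed.atom <<'EOF'
<feed><entry><id>tag:example,2009:1</id></entry>
<entry><id>tag:example,2009:1</id></entry>
<entry><id>tag:example,2009:2</id></entry></feed>
EOF
# Extract ids, count occurrences, most-repeated first:
grep -o '<id>[^<]*</id>' /tmp/feed.atom | sort | uniq -c | sort -rn
```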

    Also noted in the process: the feed itself contains a fair amount of debris.  A style attribute?  A meta tag?  o:p is common in content carelessly copy/pasted from Microsoft Word.

    script elements and onclick attributes generally aren’t syndication friendly.

    Using the correct mime type and adding in a self link wouldn’t be a bad idea either.

    February 15, 2009 08:21 PM

    mikas blog

    Debian GNU/Linux 5.0 codename Lenny - News for sysadmins

    Alright, Debian GNU/Linux 5.0, AKA Lenny, has been released. Time for a Debian unstable unfreeze party! 8-)

    What does the new stable release bring for system administrators? I’ll give an overview of what's new when upgrading from Debian GNU/Linux 4.0, codename Etch (released on 8th April 2007), to the current version, Debian GNU/Linux 5.0, codename Lenny (released on 14th February 2009). I try to avoid duplicating information, so make sure to read the release announcement and the official release notes for Lenny beforehand.

    Noteworthy Changes

    • initrd-tools got replaced by initramfs-tools
    • netkit-inetd got replaced by openbsd-inetd
    • the default syslog daemon sysklogd got replaced by rsyslog
    • new defaults when creating ext2/ext3 file systems: the dir_index and resize_inode features are enabled by default, with blocksize = 4096, inode_size = 256 and inode_ratio = 16384 (see /etc/mke2fs.conf)
    • improved IPv6 support
    • init.d-scripts for dependency-based init systems
    • Debian-Volatile (hosting packages providing data that needs to be regularly updated over time, such as timezone definitions, anti-virus signature files,…) is an official service
    • EVMS (Enterprise Volume Management System) was removed
    • compatibility with the FHS v2.3
    • software developed for version 3.2 of the LSB
    • official Debian Lenny live systems for the amd64 and i386 architectures
    • several new d-i features
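
For reference, the new ext2/ext3 defaults mentioned above live in /etc/mke2fs.conf; a fragment along these lines (paraphrased, not a verbatim copy of Lenny's file) is what sets them:

```
[defaults]
    base_features = sparse_super,filetype,resize_inode,dir_index
    blocksize = 4096
    inode_size = 256
    inode_ratio = 16384
```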


    Virtualisation related new tools:

    • ganeti: Cluster-based virtualization management software
    • libvirt-bin: Libvirt is a C toolkit to interact with the virtualization capabilities of recent versions of Linux (and other OSes). The library aims at providing a long term stable C API for different virtualization mechanisms.
    • virtinst: Programs to create and clone virtual machines
    • virt-manager: desktop application for managing virtual machines
    • xen-shell: Console based Xen administration utility
    • xenstore-utils: Xenstore utilities for Xen
    • xenwatch: Virtualization utilities, mostly for Xen

    Desktop-oriented packages like virtualbox and qemu are, of course, available as well.

    Noteworthy Updates

    This is a (selective) list of some noteworthy updates:

    New packages

    Lenny ships over 7000 new packages. Lists of new/removed/replaced packages are available online. I’ll name 238 sysadmin-related packages that might be worth a look. (Note: I don’t list addon stuff like optional server-modules, docs-only and kernel-source related packages. I plan to present some of the following packages in more detail in separate blog entries.)

    • ack-grep: A grep-like program specifically for large source trees
    • acpitail: Show ACPI information in a tail-like style
    • adns-tools: Asynchronous-capable DNS client library and utilities
    • aggregate: ipv4 cidr prefix aggregator
    • aosd-cat: an on screen display tool which uses libaosd
    • apt-cacher-ng: Caching proxy for distribution of software packages
    • apt-cross: retrieve, build and install libraries for cross-compiling
    • aptfs: FUSE filesystem for APT source repositories
    • apt-p2p: apt helper for peer-to-peer downloads of Debian packages
    • apt-transport-https: APT https transport, use ‘deb https://foo distro main’ lines in the sources.list
    • arp-scan: arp scanning and fingerprinting tool
    • array-info: command line tool reporting RAID status for several RAID types
    • balance: Load balancing solution and generic tcp proxy
    • bash-completion: programmable completion for the bash shell
    • blktrace: utilities for block layer IO tracing
    • daemonlogger: simple network packet logger and soft tap daemon
    • daemontools: a collection of tools for managing UNIX services
    • dbndns: Debian fork of djbdns, a collection of Domain Name System tools
    • dcfldd: enhanced version of dd for forensics and security
    • dctrl2xml: Debian control data to XML converter
    • debomatic: automatic build machine for Debian source packages
    • desproxy: tunnel TCP traffic through a HTTP proxy
    • detox: utility to replace problematic characters in filenames
    • di-netboot-assistant: Debian-Installer netboot assistant
    • dish: the diligence/distributed shell for parallel sysadmin
    • djbdns: a collection of Domain Name System tools
    • dns2tcp: TCP over DNS tunnel client and server
    • dnscache-run: djbdns dnscache service
    • dnshistory: Translating and storing of IP addresses from log files
    • dnsproxy: proxy for DNS queries
    • dsyslog: advanced modular syslog daemon
    • etckeeper: store /etc in git, mercurial, or bzr
    • ext3grep: Tool to help recover deleted files on ext3 filesystems
    • fair: high availability load balancer for TCP connections
    • fatresize: FAT16/FAT32 filesystem resizer
    • flog: dump STDIN to file and reopen on SIGHUP
    • freeradius-utils: FreeRadius client utilities
    • ganeti: Cluster-based virtualization management software
    • gfs2-tools: Red Hat cluster suite - global file system 2 tools
    • gitosis: git repository hosting application
    • gptsync: GPT and MBR partition tables synchronisation tool
    • grokevt: scripts for reading Microsoft Windows event log files
    • grub2: GRand Unified Bootloader, version 2
    • gt5: shell program to display visual disk usage with navigation
    • haproxy: fast and reliable load balancing reverse proxy
    • havp: HTTP Anti Virus Proxy
    • heirloom-mailx: feature-rich BSD mail(1)
    • hfsprogs: mkfs and fsck for HFS and HFS+ file systems
    • hinfo: Check address ownership and DNSBL listings for spam reporting
    • hlbr: IPS that runs over layer 2 (no TCP/IP stack required)
    • hobbit: monitoring system for systems, networks and applications - server
    • hotwire: Extensible graphical command execution shell
    • hunchentoot: the Common Lisp web server formerly known as TBNL
    • ifupdown-extra: Network scripts for ifupdown
    • ike: Shrew Soft VPN client - Daemon and libraries
    • incron: cron-like daemon which handles filesystem events
    • inoticoming: trigger actions when files hit an incoming directory
    • iodine: tool for tunneling IPv4 data through a DNS server
    • iotop: simple top-like I/O monitor
    • ipplan: web-based IP address manager and tracker
    • ips: Intelligent process status
    • iscsitarget: iSCSI Enterprise Target userland tools
    • isns: Internet Storage Naming Service
    • itop: simple top-like interrupt load monitor
    • iwatch: realtime filesystem monitoring program using inotify
    • jetring: gpg keyring maintenance using changesets
    • john: active password cracking tool
    • kanif: cluster management and administration swiss army knife
    • keepassx: Cross Platform Password Manager
    • keysafe: A safe to put your passwords in
    • killer: Background job killer
    • kpartx: create device mappings for partitions
    • kvm: Full virtualization on x86 hardware
    • latencytop: A tool for developers to visualize system latencies
    • lbcd: Return system load via UDP for remote load balancers
    • ldb-tools: LDAP-like embedded database - tools
    • ldnsutils: ldns library for DNS programming
    • lfhex: large file hex editor
    • live-helper: Debian Live build scripts
    • live-magic: GUI frontend to create Debian LiveCDs, netboot images, etc.
    • logapp: supervise execution of applications producing heavy output
    • lsat: Security auditor tool
    • lustre-utils: Userspace utilities for the Lustre filesystem
    • lwat: LDAP Web-based Administration Tool
    • maatkit: Command-line utilities for MySQL
    • mantis: web-based bug tracking system
    • memdump: memory dumper
    • memlockd: daemon to lock files into RAM
    • metainit: Generates init scripts
    • mirmon: monitor the state of mirrors
    • mkelfimage: utility to create ELF boot images from Linux kernel images
    • mongrel: A small fast HTTP library and server for Ruby
    • monkey: fast, efficient, small and easy to configure web server
    • monkeytail: tail variant designed for web developers monitoring logfiles
    • mpy-svn-stats: Simple and easy to use svn statistics generator
    • mr: a Multiple Repository management tool
    • msr-tools: Utilities for modifying MSRs from userspace
    • mtd-utils: Memory Technology Device Utilities
    • munge: authentication service to create and validate credentials
    • mxallowd: Anti-Spam-Daemon using nolisting/iptables
    • mylvmbackup: quickly creating backups of MySQL server’s data files
    • myrescue: rescue data from damaged harddisks
    • mysql-proxy: high availability, load balancing and query modification for mysql
    • mysqltuner: high-performance MySQL tuning script
    • nagvis: Visualization addon for Nagios
    • ncdu: ncurses disk usage viewer
    • netrw: netcat like tool with nice features to transport files over network
    • netsend: a speedy filetransfer and network diagnostic program
    • network-config: Simple network configuration tool
    • nfdump: netflow capture daemon
    • ngetty: getty replacement - one single daemon for all consoles
    • nilfs2-tools: Continuous Snapshotting Log-structured Filesystem
    • ninja: Privilege escalation detection system for GNU/Linux
    • noip2: client for dynamic DNS service
    • nsd3: authoritative domain name server (3.x series)
    • ntfs-3g: read-write NTFS driver for FUSE
    • nulog: Graphical firewall log analysis interface
    • nuttcp: network performance measurement tool
    • ocsinventory-server: Hardware and software inventory tool (Communication Server)
    • odt2txt: simple converter from OpenDocument Text to plain text
    • olsrd: optimized link-state routing daemon (unik-olsrd)
    • onesixtyone: fast and simple SNMP scanner
    • openais: Standards-based cluster framework (daemon and modules)
    • opencryptoki: PKCS#11 implementation for Linux (daemon)
    • openvas-client: Remote network security auditor, the client
    • ophcrack: Microsoft Windows password cracker using rainbow tables
    • op: sudo like controlled privilege escalation
    • otpw-bin: OTPW programs for generating OTPW lists
    • packeth: Ethernet packet generator
    • paperkey: extract just the secret information out of OpenPGP secret keys
    • paris-traceroute: New version of well known tool traceroute
    • password-gorilla: a cross-platform password manager
    • pathfinderd: Daemon for X.509 Path Discovery and Validation
    • pathfinder-utils: Utilities to use with the Pathfinder Daemon
    • pcaputils: specialized libpcap utilities
    • pcp: System level performance monitoring and performance management
    • perlconsole: small program that lets you evaluate Perl code interactively
    • pgloader: loads flat data files into PostgreSQL
    • pgpool2: connection pool server and replication proxy for PostgreSQL
    • pgsnap: PostgreSQL report tool
    • pmailq: postfix mail queue manager
    • pnputils: Plug and Play BIOS utilities
    • policykit: framework for managing administrative policies and privileges
    • postfwd: Postfix policyd to combine complex restrictions in a ruleset
    • postpone: schedules commands to be executed later
    • powertop: Linux tool to find out what is using power on a laptop
    • prayer: standalone IMAP-based webmail server
    • prelude-correlator: Hybrid Intrusion Detection System [ Correlator ]
    • privbind: Allow unprivileged apps to bind to a privileged port
    • pssh: Parallel versions of SSH-based tools
    • ptop: PostgreSQL performance monitoring tool akin to top
    • pyftpd: ftp daemon with advanced features
    • rancid-core: rancid — Really Awesome New Cisco confIg Differ
    • rancid-util: Utilities for rancid
    • rdnssd: IPv6 recursive DNS server discovery daemon
    • rdup: utility to create a file list suitable for making backups
    • reglookup: utility to read and query Windows NT/2000/XP registry
    • rgmanager: Red Hat cluster suite - clustered resource group manager
    • rinse: RPM installation environment
    • rofs: Read-Only Filesystem for FUSE
    • rsyslog: enhanced multi-threaded syslogd
    • safe-rm: wrapper around the rm command to prevent accidental deletions
    • samba-tools: tools provided by the Samba suite
    • samdump2: Dump Windows 2k/NT/XP password hashes
    • scalpel: A Frugal, High Performance File Carver
    • scamper: advanced traceroute and network measurement utility
    • scanmem: Locate and modify a variable in a running process
    • schedtool: Queries/alters process’ scheduling policy and CPU affinity
    • screenie: a small and lightweight GNU screen(1) wrapper
    • scrounge-ntfs: Data recovery program for NTFS filesystems
    • ser: Sip Express Router, very fast and configurable SIP proxy
    • serverstats: a simple tool for creating graphs using rrdtool
    • shutdown-at-night: System to shut down clients at night, and wake them in the morning
    • sipcrack: SIP login dumper/cracker
    • sks: Synchronizing OpenPGP Key Server
    • slack: configuration management program for lazy admins
    • sma: Sendmail log analyser
    • smbind: PHP-based tool for managing DNS zones for BIND
    • smbnetfs: User-space filesystem for SMB/NMB (Windows) network servers and shares
    • softflowd: Flow-based network traffic analyser
    • speedometer: measure and display the rate of data across a network connection
    • spf-milter-python: RFC 4408 compliant Python SPF Milter for Sendmail and Postfix
    • spf-tools-perl: SPF tools (spfquery, spfd) based on the Mail::SPF Perl module
    • spf-tools-python: sender policy framework (SPF) tools for Python
    • sqlgrey: Postfix Greylisting Policy Server
    • ssdeep: Recursive piecewise hashing tool
    • sshfp: DNS SSHFP records generator
    • sshm: A command-line tool to manage your ssh servers
    • sshproxy: ssh gateway to apply ACLs on ssh connections
    • sslscan: Fast SSL scanner
    • strace64: A system call tracer for 64bit binaries
    • sucrack: multithreaded su bruteforcer
    • supercat: program that colorizes text for terminals and HTML
    • superiotool: Super I/O detection tool
    • system-config-lvm: A utility for graphically configuring Logical Volumes
    • system-config-printer: graphical interface to configure the printing system
    • tack: terminfo action checker
    • taktuk: efficient, large scale, parallel remote execution of commands
    • tcpwatch-httpproxy: TCP monitoring and logging tool with support for HTTP 1.1
    • terminator: Multiple GNOME terminals in one window
    • timelimit: Simple utility to limit a process’s absolute execution time
    • tipcutils: TIPC utilities
    • tor: anonymizing overlay network for TCP
    • tpm-tools: Management tools for the TPM hardware (tools)
    • tracker-utils: metadata database, indexer and search tool - commandline tools
    • tumgreyspf: external policy checker for the postfix mail server
    • ucspi-tcp: command-line tools for building TCP client-server applications
    • unbound: validating, recursive, caching DNS resolver
    • unhide: Forensic tool to find hidden processes and ports
    • uniutils: Tools for finding out what is in a Unicode file
    • unsort: reorders lines in a file in semirandom ways
    • uphpmvault: upload recovery images to HP MediaVault2 via Ethernet
    • usermode: Graphical tools for certain user account management tasks
    • utf8-migration-tool: Debian UTF-8 migration wizard
    • uuid-runtime: universally unique id library
    • vblade-persist: create/manage supervised AoE exports
    • vde2: Virtual Distributed Ethernet
    • vdmfec: recover lost blocks using Forward Error Correction
    • virtinst: Programs to create and clone virtual machines
    • virt-manager: desktop application for managing virtual machines
    • virtualbox-ose: x86 virtualization solution - binaries
    • virt-viewer: Displaying the graphical console of a virtual machine
    • watchupstream: Look for newer upstream releases
    • whirlpool: Implementation of the whirlpool hash algorithm
    • win32-loader: Debian-Installer loader for win32
    • xavante: Lua HTTP 1.1 Web server
    • xdelta3: A diff utility which works with binary files
    • xen-shell: Console based Xen administration utility
    • xenstore-utils: Xenstore utilities for Xen
    • xenwatch: Virtualization utilities, mostly for Xen
    • xfingerd: BSD-like finger daemon with qmail support
    • xl2tpd: a layer 2 tunneling protocol implementation
    • xrdp: Remote Desktop Protocol (RDP) server
    • yersinia: Network vulnerabilities check software
    • zerofree: zero free blocks from ext2/3 file-systems
    • zipcmp: compare contents of zip archives
    • zipmerge: merge zip archives
    • ziproxy: compressing HTTP proxy server

    Further Resources

    by mika at February 15, 2009 04:26 PM


    Adnans SysDev

    KVM and Windows XP - Howtoforge

    Installing Windows XP As A KVM Guest On Ubuntu 8.10 Desktop | HowtoForge - Linux Howtos and Tutorials

    There's a bug in virt-install and virt-manager on Ubuntu 8.10 that does not let you run Windows XP as a guest under KVM. During the Windows installation, the guest needs to be rebooted, and then you get the following error, and Windows XP refuses to boot: "A disk read error occurred. Press Ctrl+Alt+Del to restart". This guide shows how you can solve the problem and install Windows XP as a KVM guest on Ubuntu 8.10.


    by Adnan ( at February 15, 2009 01:03 PM


    Back from Bro Workshop

    Last week I attended the Bro Hands-On Workshop 2009. Bro is an open source network intrusion detection and traffic characterization program with a lineage stretching to the mid-1990s. I finally met Vern Paxson in person, which was great. I've known who Vern was for about 10 years but never met him or heard him speak.

    I first covered Bro in The Tao of Network Security Monitoring in 2004 with help from Chris Manders. About two years ago I posted Bro Basics and Bro Basics Follow-Up here. I haven't used Bro in production but after learning more about it in the workshop I would be comfortable using some of Bro's default features.

    I'm not going to say anything right now about using Bro. I did integrate Bro analysis into most of the cases in my all-new TCP/IP Weapons School 2.0 class at Black Hat this year. If TechTarget clears me for writing again in 2009 I will probably write some Bro articles for Traffic Talk.

    Richard Bejtlich is teaching new classes in Europe in 2009. Register by 1 Mar for the best rates.

    by Richard Bejtlich ( at February 15, 2009 12:55 PM

    SysAdmin's Diary

    Debian GNU/Linux 5.0 Released

    Yes, Debian GNU/Linux 5.0 aka Debian Lenny has been officially released! For details, kindly read the official announcement. To MyDebian gang, shall we have a Makan-Makan session? February 14th, 2009 The Debian Project is pleased to announce the official release of Debian GNU/Linux version 5.0 (codenamed “Lenny”) after 22 months of constant development. Debian GNU/Linux is [...]

    by irwan at February 15, 2009 12:52 PM


    a small tip for more efficient command line usage on debian

    Debian is one of the few distros where, by default, you can’t search the bash history backward or forward for past commands.
    To change that behaviour you need to uncomment two lines in /etc/inputrc.


    Change this:

    # alternate mappings for "page up" and "page down" to search the history
    # "\e[5~": history-search-backward
    # "\e[6~": history-search-forward

    to this:

    # alternate mappings for "page up" and "page down" to search the history
    "\e[5~": history-search-backward
    "\e[6~": history-search-forward

    Example usage:
    To search through your old commands that start with “ssh” (e.g. ssh -p 551, ssh, ssh -L1111:), just type ssh and hit PgUp; the previous ssh commands appear on the command line:
    $ ssh[PgUp] transforms to $ ssh -p 551; hit PgUp again and it transforms to
    $ ssh; hit PgUp once more and it becomes $ ssh -L1111:
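
    If you'd rather leave the system-wide file alone, the same bindings work from a per-user ~/.inputrc; the $include line keeps Debian's system-wide settings active:

```
# ~/.inputrc: per-user equivalent of editing /etc/inputrc
$include /etc/inputrc
"\e[5~": history-search-backward
"\e[6~": history-search-forward
```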

    by site admin at February 15, 2009 09:06 AM

    Linux Poison

    How to configure Linux as Internet Gateway for small office

    This tutorial shows how to set up network-address-translation (NAT) on a Linux system with iptables rules so that the system can act as a gateway and provide internet access to multiple hosts on a local network using a single public IP address. This is achieved by rewriting the source and/or destination addresses of IP packets as they pass through the NAT system. [Note] The location of the
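
    The snippet is truncated, but the core of such a NAT gateway is only a few commands. A minimal sketch, assuming eth0 is the internet-facing interface; it prints the commands dry-run style, since actually applying them requires root:

```shell
#!/bin/sh
# Dry-run sketch of a Linux NAT gateway (assumption: eth0 = WAN side).
# Swap the echo in run() for real execution once you're happy with it.
set -e
WAN=eth0
run() { echo "+ $*"; }

run sysctl -w net.ipv4.ip_forward=1                          # allow packet forwarding
run iptables -t nat -A POSTROUTING -o "$WAN" -j MASQUERADE   # rewrite source addresses
run iptables -A FORWARD -i "$WAN" -m state --state ESTABLISHED,RELATED -j ACCEPT
run iptables -A FORWARD -o "$WAN" -j ACCEPT                  # LAN -> internet traffic
```

    With MASQUERADE the public address is taken from the outgoing interface, which is what makes a single (even dynamic) IP sufficient for the whole LAN.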

    by Nikesh Jauhari ( at February 15, 2009 07:43 AM

    OReilly Radar

    Change Happens

    Last night, I watched a 1951 British movie, The Man in the White Suit. The plot hinged on everyone's realization that a new fabric invented by a young chemist (played by Alec Guinness) would put the British textile industry out of business. The fabric never wears out and resists dirt. Both labor and mill owners unite to suppress the discovery.

    We know now, of course, that the great British woolen, cotton and silk mills did go the way of the buggy whip, prey not to new synthetic fabrics but to low cost overseas competition. At the time, it was unthinkable that the British mills would become all but extinct. When my great grandparents worked at Lister's Mill in Bradford, it employed more than 10,000 people. My mother, who grew up "back of the mill," recalls how the streets were so packed with people at closing time that there was no room for vehicles. By the time I remember visiting with my grandparents on Silk Street in the late 1960s, the mill was still active, but a shadow of its former self. Thirty years later, this monument of a once great industry was turned into shops and luxury apartments.

    I think too of how my grandmother, with the prejudices of her time, was alarmed at how the "pakis" were taking over Bradford. How pleasing it was, then, to hear from my friend Imran Ali recently about the evolution of Bradford, a rebirth in which his family from Pakistan made the city their home:

    My Grandfather came to the UK in the 50s, settling in Bradford before bringing his brothers and sons here to work and study...he was in the Indian/British Army during WW2, before finding work in various textile mills across the Bradford area.

    Coming to Bradford moved us up from that background to our family's first university graduates and now professional careers

    So it's a place I'm so fond of I can't bring myself to leave. As much as I love coming to the Bay Area, Mt. San Bruno, Sonoma county, Burlingame and my other favorite spots aren't the same as driving over Ilkley Moor on a snowy winter's day and seeing the Dales unfurl before me.

    You might be interested to know a few interesting facts about the city since you may have last visited...

    • The University of Bradford was one of the first two to teach computer science in the UK (Manchester being the other) - though it's disputed who was first!

    • The university's school of computing gave the early UK web industry a great talent pool, including some of the founding team for Freeserve, the UK's largest ISP during the first boom, and also its biggest exit.

    • Based on that we started a non-profit collective of new media companies called bmedi@ in 2001...

    • The National Media Museum is located in Bradford and they just added their photo collections to Flickr.

    • Grant Morrison wrote a graphic novel, Vimanarama, set in Bradford.

    • The city's currently in the midst of a depression that goes back to the 2001 riots, but civic leaders have tabled an ambitious $3.2bn regeneration plan for the city's built environment.

    There's actually a lot of interesting tech stuff going on regionally - a bunch of us have kickstarted grassrootsy-stuff like BarCamps, geek dinners and are starting to help a local university model itself on the Media Labs and ITPs of the world.

    Meanwhile, Lister Park, a lovely park that I remember visiting with my grandmother, is now called The Mughal Gardens. Imran adds: "There's a Pakistani cafe near there that serves kebabs made with a sauce that's extracted from the Earth's molten core - so spicy, you can briefly see through time itself ;)"

    I won't say that this entry has that much spice, but I hope you can take a moment with me to see through time to allow wonder and delight to replace fear of change.

    We're in the midst of enormous upheaval right now, between the Scylla and Charybdis of economic meltdown and climate change, with the promise of the Singularity visible in the distance like Apollo or Athena might have appeared to Odysseus' frightened sailors.

    This is not new. History is full of optimism and despair, discovery and upheaval, with distant hope inspiring us to the great efforts that alone can save us. And despite all our attempts to prognosticate, it has a way of surprising us. The makers of The Man in the White Suit were fascinated and frightened by the possibilities of industrial chemistry: it had all the magic that today we associate with great advances in computing or synthetic biology. And inventions of new materials did in fact change the world, though not in ways that the film's creators lampooned.

    Coming to terms with change is a basic life skill. If you don't have it, it's time to put it on your self-improvement to-do list. I'm reminded of something I wrote nearly 30 years ago in my first book, just out of college, a study of the work of science-fiction writer Frank Herbert:

    One of [Herbert's] central ideas is that human consciousness exists on--and by virtue of--a dangerous edge of crisis, and that the most essential human strength is the ability to dance on that edge. The more man confronts the dangers of the unknown, the more conscious he becomes. All of Herbert's books portray and test the human ability to consciously adapt....

    It is a general principle of ecology that an ecosystem is stable not because it is secure and protected, but because it contains such diversity that some of its many types of organisms are bound to survive despite drastic changes in the environment or other adverse conditions. Herbert adds, however, that the effort of civilization to create and maintain security for its individual members, "necessarily creates the conditions of crisis because it fails to deal with change."

    In short, get with the program! The future isn't going to be like the past. What's more, it isn't going to be like any future we imagine. How wonderful that is, if only we are prepared to accept it.

    by Tim O'Reilly at February 15, 2009 05:31 AM


    Nabaztag rabbit and Pandorabots AI mashup


    Johnny Baillargeaux sent in a fun mashup that allows the Nabaztag programmable wireless rabbit to communicate with a Pandorabots AI bot service. With his software, you can write an AI script using AIML, publish it on Pandorabots, and then the output of the bot will be sent through the rabbit. Neat stuff.

    Nabaztag/Ubiquity/Pandorabots integration
    Nabaztag wifi rabbit

    by Jason Striegel at February 15, 2009 05:00 AM

    February 14, 2009

    mikas blog

    Unix time: 1234567890

    I hope you know the comics of xkcd and abstrusegoose about Unix time. Unix time?

    Unix time, or POSIX time, is a system for describing points in time, defined as the number of seconds elapsed since midnight Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds. It is widely used not only on Unix-like operating systems but also in many other computing systems.

    These are my solutions for converting the Unix time ‘1234567890’ to a human-readable format:

    GNU date:

    % date -d @1234567890
    Sat Feb 14 00:31:30 CET 2009

    BSD date:

    % date -r 1234567890
    Sat Feb 14 00:31:30 CET 2009


    % zsh -c 'zmodload zsh/datetime ; strftime "%c" 1234567890'
    Sat 14 Feb 2009 12:31:30 AM CET


    % python -c 'import time; print time.ctime(1234567890)'
    Sat Feb 14 00:31:30 2009


    % ruby -e 'puts'
    Sat Feb 14 00:31:30 +0100 2009


    % perl -e 'print scalar localtime(1234567890),"\n";'
    Sat Feb 14 00:31:30 2009


    % echo 'select FROM_UNIXTIME(1234567890);' | mysql -h localhost
    2009-02-14 00:31:30


    % echo "SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 1234567890 * INTERVAL '1 second';" | psql test
     2009-02-14 00:31:30+01
    (1 row)


    % echo '
    #include <stdio.h>
    #include <time.h>
    int main() {
       time_t sec = 1234567890;
       printf("%s", ctime(&sec));
       return 0;
    }' | gcc -x c - && ./a.out
    Sat Feb 14 00:31:30 2009


    % cat
    import java.util.Date;
    class UnixTime {
            public static void main(String[] args) {
                    System.out.println(new Date(1234567890L*1000L));
            }
    }
    % javac && java UnixTime
    Sat Feb 14 00:31:30 CET 2009


    % echo 'new Date(1234567890*1000);' | smjs -i
    js> Sat Feb 14 2009 00:31:30 GMT+0100 (CET)


    % php --run 'print date("r", "1234567890");'
    Sat, 14 Feb 2009 00:31:30 +0100

    by mika at February 14, 2009 11:35 PM

    Linux Poison

    How to Create and Configure robots.txt for Apache web server

    "Robots.txt" is a regular text file that through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a
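
    Picking up the /images example from the snippet, a minimal robots.txt (placed at the site root, e.g. Apache's DocumentRoot) looks like this:

```
# Keep all well-behaved crawlers out of /images:
User-agent: *
Disallow: /images/

# To block the entire site instead, the Disallow line would be just "/".
```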

    by Nikesh Jauhari ( at February 14, 2009 07:43 PM

    Year in the Life of a BSD Guru

    Interview of me on SCALE Blog

    The SCALE blog is posting interviews of some of the speakers at next week's Women in Open Source mini-conference. My interview is here where I answer questions about BSD Certification, how I got into open source, and what it's like being a woman in technology.

    February 14, 2009 03:48 PM

    mikas blog

    “To be filled by O.E.M.”

      Board Info: #2
        Manufacturer: "[snip]"
        Product: "To be filled by O.E.M."
        Version: "To be filled by O.E.M."
        Serial: "To be filled by O.E.M."
        Asset Tag: "To Be Filled By O.E.M."
        Type: 0x0a (Motherboard)
        Features: 0x09
          Hosting Board
        Location: "To Be Filled By O.E.M."
        Chassis: #3

    Note: I snipped the manufacturer. Feel free to guess who is responsible for the “To be filled by O.E.M.” entries though.

    by mika at February 14, 2009 02:41 PM

    Sam Ruby

    RDFa in HTML5

    Manu Sporny: Another Q/A page has been started concerning questions surrounding RDFa, including common red-herrings and repeated discussions (with answers) that can be avoided when discussing what RDFa can and cannot do (more)

    Related links:

    Other highlights:

    • Ian Hickson: A very common architectural mistake that software engineers make is looking at five problems, seeing their commonality, and attempting to solve all five at once. The result is almost always a solution that is sub-par for all five problems.

    February 14, 2009 12:36 PM

    Linux Poison

    Keep Your Processes Running Despite A Dropped Connection

    I guess you all know this: you are connected to your server with SSH and in the middle of compiling some software (e.g. a new kernel) or doing some other task which takes lots of time, and suddenly your connection drops for some reason, and you lose your labor. This can be very annoying, but fortunately there is a small utility called screen which lets you reattach to a previous session so that

    by Nikesh Jauhari ( at February 14, 2009 09:43 AM

    Adnans SysDev

    developerworks article on using screen

    Speaking UNIX: Stayin' alive with Screen

    The command line is a powerful tool, but it has a fatal weakness: If the shell perishes, so does your work. To keep your shell and your work alive—even across multiple sessions and dropped connections—use GNU Screen, a windowing system for your console.
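
    The workflow the article describes can be sketched in a handful of commands (the session name "build" here is just an example):

```
screen -S build        # start a named session and run your long job inside it
                       # detach at any time with Ctrl-a d
screen -ls             # after reconnecting over SSH, list surviving sessions
screen -r build        # reattach to the detached session, job still running
```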

    by Adnan ( at February 14, 2009 09:39 AM

    Jonathan Schwartz

    JavaFX Hits 100,000,000 Milestone!

    I have some extraordinary news to share.

    As of late this evening, Sun will have shipped its 100,000,000th JavaFX runtime. Congratulations, folks! From a standing start in early December last year, JavaFX's download rate makes it the fastest growing RIA platform on the market - demonstrating the fastest adoption of any product Sun has ever shipped.

    The 100,000,000 milestone was reached just in time for us to announce the second phase of our JavaFX strategy, the release of JavaFX Mobile at next week's Mobile World Congress. JavaFX Mobile is a runtime identical to JavaFX Desktop, but preconfigured for gizmos with very small memory footprints (like mobile phones). With our newest partners, from Sony Ericsson to LG Electronics (and more adding every day), this should add a massive breadth of mobile runtimes to the converged JavaFX count - and create even more opportunity for Java developers.

    Why such a fast uptake? The Java platform continues to provide the world's most complete open source platform for a rich internet - supported by the world's largest developer community. JavaFX allows Sun to reach beyond our traditional base to creative professionals and non-coders working with audio, video and high performance graphics. And most importantly - JavaFX allows content owners to bypass potentially hostile browsers, to install applications directly on user desktops and phones. You'll see that phenomenon heat up in 2009, accelerated by the emergence of "AppStores" on every device connected to the internet.

    What's our view of the overall marketplace? Here are a few thoughts.

    First, freely distributed, open source software will continue to create enormous revenue opportunities for those that understand the underlying business model - as an example, the Java business for Sun, last quarter, delivered more than $67m in billings, up nearly 50% year over year. On an annualized basis, that means the Java client business (as distinct from the Java server business) is now a multi-hundred million dollar business, opening doors for Sun, and the Java community, across the planet. All built on freely available runtimes and source code. Free as in beer, free as in speech, and free as in market.

    Second, devices are becoming functionally equivalent - what you can do with Flash is comparable to Silverlight, and again comparable to JavaFX. We each have our specialty, but over the long haul, my view is adoption rates and business models will be a greater driver of success than the technologies themselves. Why? Because if you're Amazon building the extraordinary Kindle 2, it matters that Sun won't put its business model between you and your customers - you want the technology you select to enable your business, not your supplier's, while enabling access to the world's largest developer community. (That said, must you use JavaFX or Flash or Silverlight to be a part of the rich internet future? Well, no - Apple used Objective-C for the iPhone, after all, completely discrediting the purist notion that if the app isn't written with a web scripting language, it isn't fashion forward).

    Finally, the consumer electronics market is going to be infinitely more vibrant and competitive than the relatively stagnant personal computer market. Having just seen a host of new Java devices, from automobile dashboards and BluRay DVD players, to set top boxes, picture frames, VOIP phones and new consumer electronics... the economy might be cooling down, but the RIA market is definitely heating up.

    The Java platform is only growing in importance and value, across billions of devices. At Sun, we're planning on maintaining Java's ubiquity as the number one runtime environment, backed by the world's most price performant datacenter infrastructure, all powered by Sun's cloud. After all, the network is the computer.

    So again, congratulations to the team - and the Java community! Now, on to the next 100,000,000! (For those interested, download JavaFX SDK here.)

    by Jonathan Schwartz at February 14, 2009 07:25 AM


    Our awesome dresser

    We’ve had a dresser for a while. Big, solid, wood. With kidlet coming Mrs. CanSpice decided it was time to spice it up. She asked Randy, a good friend of ours, if he would like to paint it. Randy’s always done excellent work (check out the first four pictures in my Shakespeare In The Park 2008 photostream) so we knew our dresser would turn out awesome.

    We were definitely not disappointed. Witness our new awesome dresser:

    We can’t thank Randy enough!

    by Brad at February 14, 2009 05:04 AM

    OReilly Radar

    The Falling Cost and Accelerated Speed of Group Action

    StimulusWatch is a great example of how easy it is today for people to, as Clay Shirky says, “organize without organizations.” The site began after Jerry Brito attended a mayors’ conference and posted this request:

    "Let’s help President-Elect Obama do what he is promising. Let’s help him “prioritize” the projects so that we “get the most bang for the buck” and identify those that are old-school “pork coming out of Congress”. We can do this through good clean fun crowdsourcing. Who can help me take the database on the Conference of Mayors site and turn each project into a wiki page or other mechanism where local citizens can comment on whether the project is actually needed or whether it’s a boondoggle? How can we create an app that will let citizens separate the wheat from the pork and then sort for Congress and the new administration the projects in descending order of relevancy?"

    Several developers read the post and got to work. StimulusWatch went live on February 2nd with all the features Brito had requested. Last Friday alone there were 20,000 unique hits to the site. Total time to complete: seven weeks, including holidays. Total cost: about $40 in monthly hosting fees.

    I caught up with two of the developers behind the effort, Peter Snyder (via phone) and Kevin Dwyer (via email). The story they told me exemplifies how the web enables some remarkably fast group action. Here is how Kevin tells it - and pay attention to how many references there are to some form of open source, web service, or plug-and-play functionality that the team used to get this done.

    “After reading Jerry's original blog post about the US Conference of Mayors report, I quickly wrote some python code to grab (screen scrape) all of the projects from their web site and put them into a sqlite database. The lxml module was awesome for this. Brian Mount took it and remastered the database into a MySQL database. Peter Snyder then popped up and offered to build the web site using a PHP based system called CodeIgniter. It lives up to its name (and Pete is awesome) because he had a fairly complex site up in no time. Now that we had a great base for the site, Jerry wrote copy and worked up some CSS/HTML which gives the site a great look and feel. Jerry also helped us integrate disqus and tumblr, which definitely helped reduce the number of wheels we had to reinvent. I experimented with several wiki backends and settled on MediaWiki. Using a perl module, I created wiki stubs for each of the projects to give users a bit of a framework for recording any facts they researched about each project, as well as listing points in favor and against. The whole thing now runs on an Amazon EC2 image.”

    Peter also pointed out that in the short time since launch, users themselves have helped cleanse errors in the data that was pulled from the mayors’ database and have already begun filling out details on these local projects, including showering great disdain on the “doorbells” project.

    None of these people knew each other previously. They were brought together by a blog post into a common effort. They used open source tools for rapid development. They plugged in off-the-shelf online social technologies (Disqus, Tumblr and MediaWiki) to create a forum to discuss these local projects. They achieved this in seven weeks. In fact, according to Peter, “the real effort here was more like two weeks”.

    It will be interesting to see how StimulusWatch performs as a place to allow transparency and citizen involvement in civic projects. As we the public wait for the government’s own site to launch, perhaps we should just be asking them to give us the data. We can do the rest.

    by Joshua-Michéle Ross at February 14, 2009 01:42 AM


    Weekly Science Video: Conan O’Brien on boron

    News came out at the beginning of February that researchers had found a new form of super-hard boron when the element is compressed by extremely high pressures. This is the fourth known stable form of boron.

    If you read down to the bottom of that article you’ll find that the author originally got it wrong and reported that there were three stable forms of boron.

    Conan O’Brien called him on it.

    by Brad at February 14, 2009 01:18 AM

    February 13, 2009



    $ date -r 1234567890
    Fri Feb 13 13:31:30 HST 2009
    $ perl -e 'print scalar localtime(1234567890),"\n";'
    Fri Feb 13 13:31:30 2009

    by Brad at February 13, 2009 11:31 PM

    Glenn Brunette

    Solaris Security Chat in SecondLife

    Virtual Glenn is a pretty strange concept, but for those who can move past it, check this out! This is a picture of my SecondLife avatar in front of the Solaris Campus stage. On February 24th, 2009 at 9 AM PT / 12 PM ET, I will be participating in an expert chat that will be loosely based around my blog article titled Top 5 Solaris 10 Security Features You Should Be Using. I will be talking a bit about each of the five items as well as answering questions. In total, the event will last about an hour and should be a lot of fun (assuming I can overcome being a SecondLife n00b!)

    This will be my first presentation inside of a virtual world, and I would encourage anyone who is interested to get a login, a copy of the client, and join me on the 24th to have a little fun a world away. For more information, check out the Sun Virtual Worlds posting for the event! Hope to see you there!

    by gbrunett at February 13, 2009 09:50 PM

    Blog o Matty

    pbcopy / pbpaste in OS X

    I came across a nifty utility in OS X that allows you to copy / paste data to/from the clipboard without having to select text and press command+c/command+v:

    Michael-MacBook-Pro:~ (michael)> echo foo | pbcopy
    Michael-MacBook-Pro:~ (michael)> pbpaste
    foo

    That's kind of neat. What about connecting to a remote machine, executing some command, and then having that output in the clipboard of your [...]
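
    One way to answer the question the excerpt leaves open, as a sketch (assuming an OS X machine locally and a hypothetical host named remotehost):

```
ssh remotehost 'uptime' | pbcopy
```

    Since pbcopy runs on the local side of the pipe, pbpaste on the local machine then yields the remote command's output.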

    by mike at February 13, 2009 05:34 PM


    JUnit 3 & JUnit 4: Oil & water

    Version 4 of the popular JUnit test framework has been available for quite some time (in fact, it is up to version 4.5). However, many projects have a wealth of JUnit 3-style tests, and developers may choose to continue using that style. If, on the other hand, you decide to dip your toes in JUnit 4 waters, make it a complete immersion. Don't try to mix and match. As you may be aware, a couple of

    by Sri Sankaran ( at February 13, 2009 04:28 PM

    Blog o Matty

    Easily encoding documents prior to publishing them to the web

    While reviewing the command lines on commandlinefu, I came across this nifty little gem:

    $ perl -MHTML::Entities -ne 'print encode_entities($_)' FILETOENCODE

    This command line snippet takes a file as an argument and escapes all of the characters that can't be directly published on the web (e.g., it will convert a right angle bracket to &gt;). This is [...]
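
    HTML::Entities ships with the HTML-Parser distribution rather than core Perl, so on machines without it a rough sed equivalent covering just the three critical characters looks like this (order matters: & must be escaped first, or the later replacements get double-escaped):

```shell
# escape &, < and > for safe embedding in HTML
echo '<b>bold & brash</b>' | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g'
```

    This prints &lt;b&gt;bold &amp; brash&lt;/b&gt;.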

    by matty at February 13, 2009 04:17 PM

    Year in the Life of a BSD Guru

    FreeBSD Foundation: Call for Proposals

    The FreeBSD Foundation has $30,000 USD allocated for the funding of development work relating to any of the major subsystems or infrastructure within the FreeBSD operating system. You do not have to be a FreeBSD committer to submit a proposal. The PDF containing the call for proposals and instructions for submitting a proposal is

    February 13, 2009 03:45 PM

    Thinking faster

    Free Agents in the workforce

    Do you remember back in the dark ages, oh, probably five or six years ago when all was sunny and bright?  When the promise of increasing bandwidth and fascinating dot-coms would create value by "stickiness"?  When technology and brainpower would free us from the drudgery of a 9-5 job.  From the looks of the most recent economic numbers, they got the last part right.  A lot of people have been freed up from their 9-5 drudgery.

    However, what I'm most interested in examining today is the concept of the "Free Agent".  You may recall a book on the topic - Free Agent Nation - which argued that with so much well educated talent and the ability to work wherever, whenever an individual wanted, it was likely that we'd become a nation of free agents, selling our talents in short time spans to the highest bidder, then moving on.  That book was published in 2002, as the Dot Com era was beginning to fall apart.  I think that after the dot com bust and 9/11, people sought the comfort of more permanent, "stable" employment in larger organizations, so author Daniel Pink's ideas did not find root in that timeframe.  However, there's a good argument to be made that the concept of the free agent may be more important in the near term future.

    Right now, many firms are struggling to determine where their core values and capabilities are, and what's extraneous to those core capabilities.  Every firm has strengths and weaknesses, but most are trying to get back to base principles, and then build from there.  What many of these firms want is flexibility in their workforce, the ability to quickly scale up and slim down.  Additionally, many would rather have a few very deeply skilled people than a big team that may not have as much skill.  I suspect that these factors, along with the increase in broadband, Software as a Service and mobility, will lead to an increase in free agents.  Right now there are a number of very strong people available who have been burned by "safe" organizations - Bear Stearns, anyone? - and may now trust themselves more than corporations.  There are still needs for many of the talents and services these folks can provide, but newly chastened firms won't be hiring full time again soon.  The needs of the businesses may coincide with the desires of the newly departed.

    Using more "free agents" will provide greater flexibility, new thinking and perhaps a deeper skill set than retaining an internal team.  It will cost less, as firms don't have to carry retirement or health care on the free agents.  On the flip side, the free agents can build credible small businesses and control their own destinies. 

    Pink was probably right, just out of phase with history when he wrote Free Agent Nation.  The discontinuity of the downturn may lead to a new contract between businesses and talented individuals.

    by Jeffrey Phillips at February 13, 2009 02:57 PM

    Administered by Joe. Content copyright by their respective authors.