Planet Sysadmin               

          blogs for sysadmins, chosen by sysadmins...
(Click here for multi-language)

August 21, 2014

Rands in Repose

Finding the Perfect Underline

From Marcin Winchary on Medium:

The perfect underline should be visible, but unobtrusive — allowing people to realize what’s clickable, but without drawing too much attention to itself. It should be positioned at just the right distance from the text, sitting comfortably behind it for when descenders want to occupy the same space…


I believe Medium’s fastidiousness regarding typography is one of the defining characteristics of their brand.


by rands at August 21, 2014 03:28 PM

Everything Sysadmin

Maybe the problem was...

they wouldn't let me wear ear plugs.

August 21, 2014 11:28 AM

Server Density

Cloud location matters – latency, privacy, redundancy

This article was originally published on GigaOm.

Now that we’re seeing intense competition in the cloud infrastructure market, each of the vendors is looking for as many ways to differentiate itself as possible. Big wallets are required to build the infrastructure and picking the right locations to deploy that capital is becoming an important choice. Cloud vendors can be innovative on a product or technical level, but location is just as important — which geographies does your cloud vendor have data centers in and why does that matter?

Why is location important?

There are a number of reasons why a diverse range of locations is important:

  • Redundancy: Compared to the chances of a server failure, whole data center outages are rare — but they can happen. In the case of power outages, software bugs or extreme weather, it’s important to be able to distribute your workloads across multiple, independent facilities. This is not just to get redundancy across data centers but also across geographies so you can avoid local issues like bad weather or electrical faults. You need data centers close enough to minimize latency but far enough to be separated by geography.
  • Data protection: Different types of data have different locality requirements e.g. requiring personal data to remain within the EU.
  • User latency: response times for the end user are very important in certain applications, so having data centers close to your users is important, and the ability to send traffic to different regions helps simplify this. CDNs can be used for some content but connectivity is often required to the source too.

Deploying data centers around the world is not cheap, and this is the area where the big cloud providers have an advantage. It is not just a case of equipping and staffing data centers — much of the innovation is coming from how efficient those facilities are. Whether that means using the local geography to make data centers green, or building your own power systems, this all contributes to driving down prices, which can only truly be done at scale.

How do the top providers perform?

The different providers all have the concept of regions or data centers within a specific geography. Usually, these are split into multiple regions so you can get redundancy within the region, but this is not sufficient for true redundancy because the whole region could fail, or there could be a local event like a storm. Therefore, counting true geographies is important:

Cloud provider locations

Azure is in the lead with 12 regions followed by Softlayer (10), Amazon (8) and Rackspace (6). Google loses out, with only 3 regions.

Where is the investment going?

It’s somewhat surprising that Amazon has gone for so long with only a single region in Europe — although this may be about to change with evidence of a new region based in Germany. If you want redundancy then you really need at least 2 data centers nearby, otherwise latency will pose a problem. For example, replicating a production database between data centers will experience higher latency if you have to send data across the ocean (from the U.S. to Ireland, say). It’s much better to replicate between Ireland and Germany!


Softlayer is also pushing into other regions with the $1.2 billion investment it announced for new data centers in 2014. Recently it launched Hong Kong and London data centers, with more planned in North America (2), Europe (2), Brazil, UAE, India, China, Japan and Australia (2).

Softlayer network map

The major disappointment is Google. It’s spending a lot of money on infrastructure and actually have many more data centers worldwide than are part of Google Cloud – in USA (6), Europe (3) and Asia (2) – which would place it second behind Microsoft. Of course, Google is a fairly new entrant into the cloud market and most of its demand is going to be from products like search and Gmail, where consumer requirements will dominate. Given the speed at which it’s launching new features, I expect this to change soon if it’s really serious about competing with the others.

Google data center locations

What about China?

I have specifically excluded China from the figures above but it is still an interesting case. The problem is that while connectivity inside China is very good (in some regions), crossing the border can add significant latency and packet loss. Microsoft and Amazon both have regions within China, but they require a separate account and you usually have to be based in China to apply. Softlayer has announced a data center in Shanghai, so it will be interesting to see whether it can connect their global private network with good throughput. As for Google, it publicly left China 4 years ago so it may never launch a region there.

It’s clear that location is going to be a competitive advantage, one where Microsoft currently holds first place but will lose it to Softlayer soon. Given the amount of money being invested, it will be interesting to see where cloud availability expands to next.

The post Cloud location matters – latency, privacy, redundancy appeared first on Server Density Blog.

by David Mytton at August 21, 2014 10:53 AM

Chris Siebenmann

How data flows around on the client during an Amanda backup

An Amanda backup session involves a lot of processes running on the Amanda client. If you're having a problem with slow backups it can be somewhere between useful and important to understand how everything is connected to everything else and where things go.

A disclaimer: my current information is probably incomplete and it only covers one case. Hopefully it will at least give people an idea of where to start looking and how data is likely to flow around their Amanda setup.

Let's start with an ASCII art process tree of a tar-based backup with indexing (taken from an OmniOS machine with Amanda 3.3.5):

   sendbackup (1)
      sendbackup (2)
         sh -c '....'
            tar -tf -
            sed -e s/^\.//
      tar --create --file - /some/file/system ....

(In real life the two sendbackup processes are not numbered in pstree or whatever; I'm doing it here to be clear which one I'm talking about.)

Call the 'sh -c' and everything under it the 'indexing processes' and the 'tar --create' the 'backup process'. The backup process is actually making the backup; the indexing processes are reading a copy of the backup stream in order to generate the file index that amrecover will use.

Working outwards from the backup process:

  • the backup process is reading from the disk and writing to its standard output. Its standard output goes to sendbackup (2).

  • sendbackup (2) reads its standard input, which is from the backup process, and basically acts like tee; it feeds one copy of the data to amandad and another into the indexing processes.

  • the indexing processes read standard input from sendbackup (2) and wind up writing their overall standard output to amandad (the tar and sed are in a shell pipeline).

  • sendbackup (1) sits there reading the standard error of all processes under it, which I believes it forwards to amandad if it sees any output. This process will normally be not doing anything.

  • amandad reads all of these incoming streams of data (the actual backup data, the index data, and the standard error output) and forwards them over the network to your Amanda server. If you use a system call tracer on this amandad, you'll also see it reading from and writing to a pipe pair. I believe this pipe pair is being used for signaling purposes, similar to waiting for both file activity and other notifications.

    (Specifically this is being done by GLib's g_wakeup_signal() and g_wakeup_acknowledge() functions, based on tracing library call activity. I believe they're called indirectly by other GLib functions that Amanda uses.)

Under normal circumstances, everything except sendbackup (1) will be churning away making various system calls, primarily to read and write from various file descriptors. The overall backup performance will be bounded by the slowest component in the whole pipeline of data shuffling (although the indexing processes shouldn't normally be any sort of bottleneck).

Depending on the exact Amanda auth setting you're using, streams from several simultaneous backups may flow from your amandad process to the server either as multiple TCP streams or as one multiplexed TCP stream. See amanda-auth(7) for the full details. In all cases I believe that all of this communication runs through a single amandad process, making it a potential bottleneck.

(At this point I don't know whether amandad is an actual bottleneck for us or something else weird is going on.)

(I knew some of the Amanda internal flow at the time that I wrote ReasoningBackwards, but I don't think I wrote it down anywhere apart from implicitly in that entry and it was in less detail than I do now.)

by cks at August 21, 2014 04:43 AM

August 20, 2014

Rands in Repose

The End of Printed Newspaper

Clay Shirky on Medium:

Contrary to the contrived ignorance of media reporters, the future of the daily newspaper is one of the few certainties in the current landscape: Most of them are going away, in this decade. (If you work at a paper and you don’t know what’s happened to your own circulation or revenue in the last few years, now might be a good time to ask.) We’re late enough in the process that we can even predict the likely circumstance of its demise.



by rands at August 20, 2014 02:43 PM

Yellow Bricks

Vendors to check out at VMworld…

While I was making a list of companies to visit on the Solutions Exchange I figured I would share them. Cormac did so a week ago and it is an excellent list, make sure to read that one! My list is also mainly around storage, but includes some different types of solutions.

  • SimpliVity, one of the big players in the hyper-converged market… I always try to stop by, because what they deliver is a nice all-round solution. With CRN reporting that SimpliVity will start shipping on  Cisco gear soon I am guessing they will get a lot of traction!
  • Nimble, probably one of the most successful storage startups of the last 5 years… Worth looking into when you are considering investing in to new storage
  • Ravello, running nested workloads on top of AWS… who would have thought that a year or two ago?
  • Atlantis, rumor has it a new version of USX is coming out at some point in the near future and I am assuming they will demo it at VMworld…
  • Micron, seen tweets of them about an “all-flash VSAN solution” which got my attention instantly. For sure will be checking that one!
  • PernixData, well I always wanted to shake Frank Denneman’s hand ;-). Plus they just had a new major round of funding and announced great sales/growth numbers!
  • SolidFire, one of the most interesting scale-out storage solutions out there that offers end-to-end QoS!
  • Scality, object storage solution which seems to be getting good traction, want to see what it is about!
  • CloudPhysics, they always demo something exciting at VMworld. First year it was the HA simulator that blew people away… Last year they showcased a great cost comparison solution for vCHS / AWS and Azure. what will it be this year?
  • DataGravity, I have not been briefed by them… they were introduced to the world yesterday and will showcase their technology at VMworld. Recommend reading Steve Foskett’s update and visiting their booth!
  • ThousandEyes, a couple of months back someone dropped this name. I have not had much time yet to look at what they do extensively so maybe VMworld is the right time. It sounds interesting doing end-to-end monitoring of traffic from the end-user to the datacenter with all layers in between!
  • Platform9, recently came out of stealth… wrote an article on it, will want to see their demo on KVM / vSphere / Docker management.

That is it for this round, sure there are many many interesting companies on the show floor but these are just some that stood out to me for various reasons. If I have missed the name of your company please don’t feel offended, with a long list of hundreds of exhibitors I had to pick a couple.

"Vendors to check out at VMworld…" originally appeared on Follow me on twitter - @DuncanYB.

Pre-order my upcoming book Essential Virtual SAN via Pearson today!

by Duncan Epping at August 20, 2014 09:33 AM

Chris Siebenmann

Explicit error checking and the broad exception catching problem

As I was writing yesterday's entry on a subtle over-broad try in Python, it occurred to me that one advantage of a language with explicit error checking, such as Go, is that a broad exception catching problem mostly can't happen, especially accidentally. Because you check errors explicitly after every operation, it's very hard to aggregate error checks together in the way that a Python try block can fall into.

As an example, here's more or less idiomatic Go code for the same basic operation:

for _, u := range userlist {
   fi, err := os.Stat(u.hdir)
   if err != nil || !(fi.IsDir() && fi.Mode().Perm() == 0) {

(Note that I haven't actually tried to run this code so it may have a Go error. It does compile, which in a statically typed language is at least a decent sign.)

This does the stat() of the home directory and then prints the user name if either there was an error or the homedir is not a mode 000 directory, corresponding to what happened in the two branches of the Python try block. When we check for an error, we're explicitly checking the result of the os.Stat() call and it alone.

Wait, I just pulled a fast one. Unlike the Python version, this code's printing of the username is not checking for errors. Sure, the fmt.Println() is not accidentally being caught up in the error check intended for the os.Stat(), but we've exchanged this for not checking the error at all, anywhere.

(And this is sufficiently idiomatic Go that the usual tools like go vet and golint won't complain about it at all. People ignore the possibility of errors from fmt.Print* functions all the time; presumably complaining about them would create too much noise for a useful checker.)

This silent ignoring of errors is not intrinsic to explicit error checking in general. What enables it here is that Go, like C, allows you to quietly ignore all return values from a function if you want instead of forcing you to explicitly assign them to dummy variables. The real return values of fmt.Println() are:

n, err := fmt.Println(

But in my original Go code there is nothing poking us in the nose about the existence of the err return value. Unless we think about it and remember that fmt.Println() can fail, it's easy to overlook that we're completely ignoring an error here.

(We can't do the same with os.Stat() because the purpose of calling it is one of the return values, which means that we have to at least explicitly ignore the err return instead of just not remembering that it's there.)

(This is related to how exceptions force you to deal with errors, of course.)

PS: I think that Go made the right pragmatic call when it allowed totally ignoring return values here. It's not completely perfect but it's better than the real alternatives, especially since there are plenty of situations where there's nothing you can do about an error anyways.

Sidebar: how you can aggregate errors in an explicit check language

Languages with explicit error checks still allow you to aggregate errors together if you want to, but now you have to do it explicitly. The most common pattern is to have a function that returns an error indicator and performs multiple different operations, each of which can fail. Eg:

func oneuser(u user) error {
   var err error
   fi, err := os.Stat(u.hdir)
   if err != nil {
      return err
   if !(fi.IsDir() && fi.Mode().Perm() == 0) {
      _, err = fmt.Println(
   return err

If we then write code that assumes that a non-nil result from oneuser() means that the os.Stat() has failed, we've done exactly the same error aggregation that we did in Python (and with more or less the same potential consequences).

by cks at August 20, 2014 05:57 AM

August 19, 2014

Ubuntu Geek

How to configure SNMPv3 on ubuntu 14.04 server

Sponsored Link
Simple Network Management Protocol (SNMP) is an "Internet-standard protocol for managing devices on IP networks". Devices that typically support SNMP include routers, switches, servers, workstations, printers, modem racks and more.It is used mostly in network management systems to monitor network-attached devices for conditions that warrant administrative attention. SNMP is a component of the Internet Protocol Suite as defined by the Internet Engineering Task Force (IETF). It consists of a set of standards for network management, including an application layer protocol, a database schema, and a set of data objects.[2]
Read the rest of How to configure SNMPv3 on ubuntu 14.04 server (481 words)

© ruchi for Ubuntu Geek, 2014. | Permalink | No comment | Add to
Post tags: , ,

Related posts

by ruchi at August 19, 2014 11:22 PM

Rands in Repose

A Path to Discovery

Saturday morning is for discovery. Multiple browser windows point me in multiple directions and I wander until I discover a thing to consider. The key to Saturday morning is not direction, the key is association. I am free associating myself across the internet looking for… something. used to be key to this experience. This now shutdown service was my soundtrack for free association Saturday mornings. allowed you to jump into a virtual room where a small handful of DJs were playing music they selected. These DJs were simply users who chose to share music rather than hang back and listen.

Rooms were given titles that would give you an idea of what you might hear, but enforcement was sketchy. Most of the pressure to stay on theme was social. Playing Britney Spears in the dubstep rooms was hilariously frowned upon. My Saturday morning move was to find a room with a moderate sized population and an interesting title. Electro-house themed rooms were a favorite with their combination of energy and lack of distracting lyrics.

Whether inspired by or another browser window, the Saturday morning moment I’m looking for is a moment of discovery. It’s a unique state of mysterious familiarity. I am presented with a song, image, or text that I instantly know is important, but have never heard, seen or read.

The essential varied systems that scale the internet are incentivized to capture your eyeballs because your attention means monetization, which when achieved allows these systems to fund their growth. They want your attention and they’ll do whatever is necessary and legal to capture it because they must do so to survive. This is their core motivation is to provide you precisely what you are searching for because doing so bring you back.

While these services are important, this is not what I’m looking for on Saturday morning. I am looking for mysterious familiarity, I am willing to head out in a seemingly random direction with belief there is value out there, but no actual evidence. I am choosing the risky path of discovery. shut down at the end of last year. They couldn’t figure out how to monetize, but in late 2011 had discovery dialed in. Sometime around Christmas 2011, I was huddled in a dubstep room sweating through my second cup of coffee and someone played this. This is great. I immediately jumped into iTunes to find the artist – nothing. Wait, wait? A Google search revealed the artist known as OVERWERK had no website and was offering his music on a pay what you can model. Huh? This amazingly talented artist was a virtual unknown, he was giving his music away, and I’d somehow found him. I’d discovered something new.

The daily tools and services we’ve surrounded ourselves with are incentivized to satisfy our urgent need for instant gratification – to make the precious moments we send on them as useful as quickly as possible. I’m on the lookout for something different. I need more tools and services that encourage serendipity as their primary function because I know how to search for what I need, but what is to discover what I do not know.

by rands at August 19, 2014 02:34 PM

Debian Admin

Install Docker on Debian Jessie 8.0 (64-bit)

Sponsored Link

Docker is an open platform for developing, shipping, and running applications. Docker is designed to deliver your applications faster. With Docker you can separate your applications from your infrastructure AND treat your infrastructure like a managed application. Docker helps you ship code faster, test faster, deploy faster, and shorten the cycle between writing code and running code.

Docker does this by combining a lightweight container virtualization platform with workflows and tooling that help you manage and deploy your applications.

At its core, Docker provides a way to run almost any application securely isolated in a container. The isolation and security allow you to run many containers simultaneously on your host. The lightweight nature of containers, which run without the extra load of a hypervisor, means you can get more out of your hardware.

Surrounding the container virtualization are tooling and a platform which can help you in several ways:

getting your applications (and supporting components) into Docker containers
distributing and shipping those containers to your teams for further development and testing
deploying those applications to your production environment, whether it be in a local data center or the Cloud.

Install Docker on Debian essie 8.0 (64-bit)

Debian 8 comes with a 3.14.0 Linux kernel, and a package which installs all its prerequisites from Debian's repository.

Note: Debian contains a much older KDE3/GNOME2 package called docker, so the package and the executable are called


To install the latest Debian package (may not be the latest Docker release):

$ sudo apt-get update
$ sudo apt-get install

To verify that everything has worked as expected:

$ sudo docker run -i -t ubuntu /bin/bash

Which should download the ubuntu image, and then start bash in a container.

Giving non-root access

The docker daemon always runs as the root user and the docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root, and so, by default, you can access it with sudo.

If you (or your Docker installer) create a Unix group called docker and add users to it, then the docker daemon will make the ownership of the Unix socket read/writable by the docker group when the daemon starts. The docker daemon must always run as the root user, but if you run the docker client as a user in the docker group then you don't need to add sudo to all the client commands. From Docker 0.9.0 you can use the -G flag to specify an alternative group.

Docker Example:

# Add the docker group if it doesn't already exist.

$ sudo groupadd docker

# Add the connected user "${USER}" to the docker group.
# Change the user name to match your preferred user.
# You may have to logout and log back in again for
# this to take effect.

$ sudo gpasswd -a ${USER} docker

# Restart the Docker daemon.

$ sudo service docker restart

For more details information check docker user guide

Sponsored Link

by ruchi at August 19, 2014 08:03 AM

Chris Siebenmann

An example of a subtle over-broad try in Python

Today I wrote some code to winnow a list of users to 'real' users with live home directories that looks roughly like the following:

for uname, hdir in userlist:
      st = os.stat(hdir)
      if not stat.S_ISDIR(st.st_mode) or \
         stat.S_IMODE(st.st_mode) == 0:
      # looks good:
      print uname
   except EnvironmentError:
      # accept missing homedir; might be a
      # temporarily missing NFS mount, we
      # can't tell.
      print uname

This code has a relatively subtle flaw because I've accidentally written an over-broad exception catcher here.

As suggested by the comment, when I wrote this code I intended the try block to catch the case where the os.stat failed. The flaw here is that print itself does IO (of course) and so can raise an IO exception. Since I have the print inside my try block, a print-raised IO exception will get caught by it too. You might think that this is harmless because the except will re-do the print and thus presumably immediately have the exception raised again. This contains two assumptions: that the exception will be raised again and that if it isn't, the output is in a good state (as opposed to, say, having written only partial output before an error happened). Neither are entirely sure things and anyways, we shouldn't be relying on this sort of thing when it's really easy to fix. Since both branches of the exception end up at the same print, all we have to do is move it outside the try: block entirely (the except case then becomes just 'pass').

(My view is that print failing is unusual enough that I'm willing to have the program die with a stack backtrace, partly because this is an internal tool. If that's not okay you'd need to put the print in its own try block and then do something if it failed, or have an overall try block around the entire operation to catch otherwise unexpected EnvironmentError exceptions.)

The root cause here is that I wasn't thinking of print as something that does IO that can throw exceptions. Basic printing is sufficiently magical that it feels different and more ordinary, so it's easy to forget that this is a possibility. It's especially easy to overlook because it's extremely uncommon for print to fail in most situations (although there are exceptions, especially in Python 3). You can also attribute this to a failure to minimize what's done inside try blocks to only things that absolutely have to be there, as opposed to things that are just kind of convenient for the flow of code.

As a side note, one of the things that led to this particular case is that I changed my mind about what should happen when the os.stat() failed because I realized that failure might have legitimate causes instead of being a sign of significant problems with an account that should cause it to be skipped. When I changed my mind I just did a quick change to what the except block did instead of totally revising the overall code, partly because this is a small quick program instead of a big system.

by cks at August 19, 2014 02:35 AM

Rands in Repose

My Favorite Kevin Cornell

After 200 issues of A List Apart, Kevin Cornell is retiring as staff illustrator. I find Cornell’s work to be gorgeous:


I was fortunate enough to have him illustrate the very first (out of print) Rands t-shirt. Can’t wait to see what he tackles next.


by rands at August 19, 2014 02:35 AM

August 18, 2014

Rands in Repose

Designer’s Guide to DPI

My hunch is a lot of folks are going to have a bunch of questions that will easily be handled by this handy guide by Sebastien Gabriel:

This guide is designed as a “get started” or introductory read for the starting to intermediate designer who wants to learn or get more knowledge about cross-DPI and cross-platform design from the very beginning.


by rands at August 18, 2014 08:57 PM

Eric's Blog

Recruiting Around a Series-A

If you’ve ever heard the saying, “the second you think it’s time for HR, it’s already too late,” then you’ll know what I’m talking about in this post. SimpleReach, having recently raised a round of funding, is now heavily in to the recruiting process on all fronts. But there is a dichotomy that comes with getting to this point…once you are at the point where can hire, you should have been recruiting for a long time.

Early on, pre-series A, you should be spending your time on ensuring traction and product-market fit. And most of the time, you are going to be doing that on a limited budget (why else would you be going out and raising a series-A if not for more budget). So on both the technology side and the business fronts, until that money comes in, you are in the grind. Thats means writing code, adding features, selling, marketing, and working to be more attractive for the fundraising process. And anyone who’s been though that can tell you, just as with anything else in the startup, it is all consuming.

Then, assuming you are successful in the raise and the money hits, you finally have the capacity to hire. You can do all the things you’ve been wanting to do for a while; more outreach, crank out more features, hire a sales team to grow revenue. The only problem is, you haven’t spent time culling leads on prospective hires. In other words, all the people you could be hiring, you haven’t started talking to or building strong enough relationships with to hand them a job offer and have them sign on board quickly.

So over the past month or so since the money hit, I have spent a lot of time in the recruiting process. That involves going through recruiters, doing the phone screens, actual interviews, and going to events to drum up more interest. And even though historically, I’ve always spent at least some time on getting the word out at events (see slides and videos from my talks), it just hasn’t seemed to be enough. All this obviously takes up a lot of time and time is one of the few precious commodities that exist in a startup. And how to recruit well is a skill that should probably be learned earlier.

But learning how is not an easy task. It is about walking a fine line between telling the world how great you are and being humble enough to let people know that you need their help to become even better (and help them become better too). So the lesson learned is, I believe, is at least two-fold, 1) at any given time, founders should spend at least 10-20% of their time recruiting and nurturing potential relationships, and 2) reinforce the idea that recruiting is everyone’s job. The value of the former is difficult to justify to yourself when you feel like you should be cranking out features or closing deals. But you will thank yourself later if you think more strategically and dedicate some of your time and mental energy to building those relationships for the future. The latter is getting all of your team on board to help you build those relationships and find those people for the future. There are a few good posts on how to recruit for startups. But ultimately, networking is an individual skill and practice and experience is the only real way to develop it.

by eric at August 18, 2014 02:37 PM


Removing packages and configurations with apt-get

Yesterday while re-purposing a server I was removing packages with apt-get and stumbled upon an interesting problem. After I removed the package and all of it's configurations, the subsequent installation did not re-deploy the configuration files.

After a bit of digging I found out that there are two methods for removing packages with apt-get. One of those method should be used if you want to remove binaries, and the other should be used if you want to remove both binaries and configuration files.

What I did

Since the method I originally used caused at least 10 minutes of head scratching, I thought it would be useful to share what I did and how to resolve it.

On my system the package I wanted to remove was supervisor, which is pretty awesome btw. To remove the package I simply used apt-get remove, just like I've done many times before.

# apt-get remove supervisor
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  supervisor
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 1,521 kB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 14158 files and directories currently installed.)
Removing supervisor ...
Stopping supervisor: supervisord.
Processing triggers for ureadahead ...

So far so good: according to apt, the package was removed without any issues. However, after looking around a bit I noticed that the /etc/supervisor directory still existed, as did the supervisord.conf file within it.

# ls -la /etc/supervisor
total 12
drwxr-xr-x  2 root root 4096 Aug 17 19:44 .
drwxr-xr-x 68 root root 4096 Aug 17 19:43 ..
-rw-r--r--  1 root root 1178 Jul 30  2013 supervisord.conf

Considering I was planning on re-installing supervisor, and I didn't want to cause any weird configuration issues as I moved from one server role to another, I did what any other reasonable sysadmin would do. I removed the directory...

# rm -Rf /etc/supervisor

I knew the supervisor package was removed, and I assumed that the package didn't remove the config files to avoid losing custom configurations. In my case I wanted to start over from scratch, so deleting the directory sounded like a reasonable thing.

# apt-get install supervisor
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  supervisor
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/314 kB of archives.
After this operation, 1,521 kB of additional disk space will be used.
Selecting previously unselected package supervisor.
(Reading database ... 13838 files and directories currently installed.)
Unpacking supervisor (from .../supervisor_3.0b2-1_all.deb) ...
Processing triggers for ureadahead ...
Setting up supervisor (3.0b2-1) ...
Starting supervisor: Error: could not find config file /etc/supervisor/supervisord.conf
For help, use /usr/bin/supervisord -h
invoke-rc.d: initscript supervisor, action "start" failed.
dpkg: error processing supervisor (--configure):
 subprocess installed post-installation script returned error exit status 2
Errors were encountered while processing:
 supervisor
E: Sub-process /usr/bin/dpkg returned an error code (1)

However, it seems supervisor could not start after re-installing.

# ls -la /etc/supervisor
ls: cannot access /etc/supervisor: No such file or directory

There is a good reason why supervisor wouldn't start: the /etc/supervisor/supervisord.conf file was missing. Shouldn't the package installation deploy the supervisord.conf file? Well, technically no. Not with the way I removed the supervisor package.

Why it didn't work

How remove works

If we look at apt-get's man page a little closer we can see why the configuration files are still there.

  remove is identical to install except that packages are removed
  instead of installed. Note that removing a package leaves its
  configuration files on the system.

As the manpage clearly says, remove will remove the package but leaves configuration files in place. This explains why the /etc/supervisor directory was lingering after removing the package; but it doesn't explain why a subsequent installation doesn't re-deploy the configuration files.

Package States

If we use dpkg to look at the supervisor package, we will start to see the issue.

# dpkg --list supervisor
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                             Version               Architecture          Description
rc  supervisor                       3.0b2-1               all                   A system for controlling process state

With the dpkg package manager a package can have more states than just being installed or not-installed. In fact there are several package states with dpkg.

  • not-installed - The package is not installed on this system
  • config-files - Only the configuration files are deployed to this system
  • half-installed - The installation of the package has been started, but not completed
  • unpacked - The package is unpacked, but not configured
  • half-configured - The package is unpacked and configuration has started but not completed
  • triggers-awaited - The package awaits trigger processing by another package
  • triggers-pending - The package has been triggered
  • installed - The package is unpacked and configured OK

If you look at the first column of the dpkg --list output, it shows rc. The r in this column means the package has been removed, which as we saw above means the configuration files are left on the system. The c in this column shows that the package is in the config-files state; meaning only the configuration files are deployed on this system.

When running apt-get install, the apt package manager will look up the current state of the package; when it sees that the package is already in the config-files state, it simply skips the configuration file portion of the package installation. Since I manually removed the configuration files outside of the apt or dpkg process, the configuration files are gone and will not be deployed with a simple apt-get install.
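If you'd rather check a package's state directly than decode the --list columns, dpkg-query can print it. This is a sketch for a Debian/Ubuntu system; "supervisor" is just an example package name:

```shell
# Print the recorded dpkg state of a package ("supervisor" is an example).
#   an installed package reports:          install ok installed
#   removed but not purged reports:        deinstall ok config-files
dpkg-query -W -f='${Status}\n' supervisor 2>/dev/null || echo "not-installed"
```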

How to resolve it and remove configurations properly

Purging the package from my system

At this point, I found myself with a broken installation of supervisor. Luckily, we can fix the issue by using the purge option of apt-get.

# apt-get purge supervisor
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  supervisor*
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 1,521 kB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 14158 files and directories currently installed.)
Removing supervisor ...
Stopping supervisor: supervisord.
Purging configuration files for supervisor ...
dpkg: warning: while removing supervisor, directory '/var/log/supervisor' not empty so not removed
Processing triggers for ureadahead ...

Purge vs Remove

The purge option of apt-get is similar to the remove option, with one difference: purge removes both the package and its configuration files. After running apt-get purge we can see that the package was fully removed by running dpkg --list again.

# dpkg --list supervisor
dpkg-query: no packages found matching supervisor

Re-installation without error

Now that the package has been fully purged and its state is not-installed, we can re-install it without errors.

# apt-get install supervisor
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  supervisor
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/314 kB of archives.
After this operation, 1,521 kB of additional disk space will be used.
Selecting previously unselected package supervisor.
(Reading database ... 13833 files and directories currently installed.)
Unpacking supervisor (from .../supervisor_3.0b2-1_all.deb) ...
Processing triggers for ureadahead ...
Setting up supervisor (3.0b2-1) ...
Starting supervisor: supervisord.
Processing triggers for ureadahead ...

As you can see from the output above, the supervisor package has been installed and started. If we check the /etc/supervisor directory again we can also see the necessary configuration files.

# ls -la /etc/supervisor/
total 16
drwxr-xr-x  3 root root 4096 Aug 17 19:46 .
drwxr-xr-x 68 root root 4096 Aug 17 19:46 ..
drwxr-xr-x  2 root root 4096 Jul 30  2013 conf.d
-rw-r--r--  1 root root 1178 Jul 30  2013 supervisord.conf

You should probably just use purge in most cases

After running into this issue I realized that most of the time I ran apt-get remove, I really wanted the functionality of apt-get purge. While it is nice to keep configurations handy in case we need them after re-installation, using remove all the time also leaves stray config files cluttering your system, free to cause configuration issues when packages are removed and then re-installed.

In the future I will most likely default to apt-get purge.

Originally Posted on Go To Article

by Benjamin Cane at August 18, 2014 11:45 AM

Aaron Johnson

Links: 8-17-2014

by ajohnson at August 18, 2014 06:30 AM

Chris Siebenmann

The potential issue with Go's strings

As I mentioned back in Things I like about Go, one of the Go things that I really like is its strings (and slices in general). From the perspective of a Python programmer, what makes them great is that creating strings is cheap because they often don't require a copy. In Python, any time you touch a string you're copying some or all of it and this can easily have a real performance impact. Writing performant Python code requires considering this carefully. In Go, pretty much any string operation that just takes a subset of the string (eg trimming whitespace from the front and the end) is copy-free, so you can throw around string operations much more freely. This can make a straightforward algorithm both the right solution to your problem and pretty efficient.

(Not all interesting string operations are copy-free, of course. For example, converting a string to all upper case requires a copy, although Go's implementation is clever enough to avoid this if the string doesn't change, eg because it's already all in upper case.)

But this goodness necessarily comes with a potential badness, which is that those free substrings keep the entire original string alive in memory. What makes Go strings (and slices) so cheap is that they are just references to some chunk of underlying storage (the real data for the string or the underlying array for a slice); making a new string is just creating a new reference. But Go doesn't (currently) do partial garbage collection of string data or arrays, so if even one tiny bit of it is referred to somewhere the entire object must be retained. In other words, a string that's a single character is (currently) enough to keep a big string from being garbage collected.

This is not an issue that many people will run into, of course. To hit it you need to either be dealing with very big original strings or care a lot about memory usage (or both) and on top of that you have to create persistent small substrings of the non-persistent original strings (well, what you want to be non-persistent). Many usage patterns won't hit this; your original strings are not large, your subsets cover most of the original string anyways (for example if you break it up into words), or even the substrings don't live very long. In short, if you're an ordinary Go programmer you can ignore this. The people who care are handling big strings and keeping small chunks of them for a long time.

(This is the kind of thing that I notice because I once spent a lot of effort to make a Python program use as little memory as possible even though it was parsing and storing chunks out of a big configuration file. This made me extra-conscious about things like string lifetimes, single-copy interned strings, and so on. Then I wrote a parser in Go, which made me consider all of these issues all over again and caused me to realize that the big string representing my entire input file was going to be kept in memory due to the bits of it that my parser was clipping out and keeping.)

By the way, I think that this is the right tradeoff for Go to make. Most people using strings will never run into this, while it's very useful that substrings are cheap. And this sort of cheap substrings also makes less work for the garbage collector; instead of a churn of variable length strings when code is using a lot of substrings (as happens in Python), you just have a churn of fixed-size string references.

Of course there's the obvious fix if your code starts running into this: create a function that 'minimizes' a string by turning it into a []byte and then back. This creates a minimized string at the cost of an extra copy over the theoretical ideal implementation and can be trivially done in Go today.
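A minimal sketch of that fix (the function name and example strings are illustrative, not from any standard library):

```go
package main

import "fmt"

// minimize copies s into fresh backing storage via a []byte round trip,
// so holding the result no longer pins the original string in memory.
func minimize(s string) string {
	return string([]byte(s))
}

func main() {
	big := "imagine a multi-megabyte input file here"
	// Keeping big[0:7] directly would keep all of big alive;
	// minimize gives us an independent 7-byte string instead.
	small := minimize(big[0:7])
	fmt.Println(small) // imagine
}
```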

Sidebar: How strings.ToUpper() et al avoid unnecessary copies

All of the active transformation functions like ToUpper() and ToTitle() are implemented using strings.Map() and functions from the unicode package. Map() is smart enough to not start making a new string until the mapping function returns a different rune than the existing one. As a result, any similar direct use of Map() that your code has will get this behavior for free.

by cks at August 18, 2014 04:46 AM

August 17, 2014

Ubuntu Geek

How to Join Ubuntu 14.04 to Active directory using Realmd

This tutorial will explain how to join Ubuntu 14.04 to Active Directory using Realmd. Active Directory (AD) is a directory service that Microsoft developed for Windows domain networks and is included in most Windows Server operating systems as a set of processes and services. An AD domain controller authenticates and authorizes all users and computers in a Windows domain type network—assigning and enforcing security policies for all computers and installing or updating software.
Read the rest of How to Join Ubuntu 14.04 to Active directory using Realmd (197 words)

© ruchi for Ubuntu Geek, 2014.

by ruchi at August 17, 2014 11:02 PM

Chris Siebenmann

The challenges of diagnosing slow backups

This is not the techblog entry I thought I was going to write. That entry would have been a quietly triumphal one about finding and squelching an interesting and obscure issue that was causing our backups to be unnaturally slow. Unfortunately, while the particular issue is squelched our backups of at least our new fileservers are still (probably) unnaturally slow. Certainly they seem to be slower than we want. So this entry is about the challenge of trying to figure out why your backups are slow (and what you can do about it).

The first problem is that unless the cause is quite simple, you're going to wind up needing to make detailed observations of your systems while the backups are running. In fact you'll probably have to do this repeatedly. By itself this is not a problem as such. What makes it into one is that most backups are run out of hours, often well out of hours. If you need to watch the backups and the backups start happening at 11pm, you're going to be staying up. This has various knock-on consequences, including that human beings are generally not at their sharpest at 11pm.

(General purpose low level metrics collection can give you some immediate data but there are plenty of interesting backup slowdowns that cannot be caught with them, including our recent one. And backups run during the day (whether test runs or real ones) are generally going to behave somewhat differently from nighttime backups, at least if your overall load and activity are higher in the day.)

Beyond that issue, a running backup system is generally a quite complex beast with many moving parts. There are probably multiple individual backups in progress in multiple hosts, data streaming back to backup servers, and any number of processes in communication with each other about all of this. As we've seen, the actual effects of a problem in one piece can manifest far away from that piece. In addition pieces may be interfering with each other; for example, perhaps running enough backups at once on a single machine causes them to contend for a resource (even an inobvious one, since it's pretty easy to spot saturated disks, networks, CPU, et al).

Complex systems create complex failure modes, which means that there are a lot of potential inobvious things that might be going on. That's a lot of things to winnow through for potential issues that pass the smell test, don't contradict any evidence of system behavior that you already have, and ideally that can be tested in some simple way.

(And the really pernicious problems don't have obvious causes, because if they did they would be easy to at least identify.)

What writing this tells me is that this is not unique to backup systems and that I should probably sit down to diagram out the overall backup system and its resources, then apply Brendan Gregg's USE Method to all of the pieces involved in backups in a systematic way. That would at least give me a good collection of data that I could use to rule things out.

(It's nice to form hypotheses and then test them and if you get lucky you can speed everything up nicely. But there are endless possible hypotheses and thus endless possible tests, so at some point you need to do the effort to create mass winnowings.)

by cks at August 17, 2014 04:50 AM

August 16, 2014

Chris Siebenmann

Caches should be safe by default

I've been looking at disk read caching systems recently. Setting aside my other issues, I've noticed something about basically all of them that makes me twitch as a sysadmin. I will phrase it this way:

Caches should be safe by default.

By 'safe' I mean that if your cache device dies, you should not lose data or lose your filesystem. Everything should be able to continue on, possibly after some manual adjustment. The broad purpose of most caches is to speed up reads; write accelerators are a different thing and should be labeled as such. When your new cache system is doing this for you, it should not be putting you at a greater risk of data loss because of some choice it's quietly made; if writes touch the cache at all, it should default to write-through mode. To do otherwise is just as dangerous as those databases that achieve great speed through quietly not really committing their data to disk.

There is a corollary that follows from this:

Caches should clearly document when and how they aren't safe.

After I've read a cache's documentation, I should not be in either doubt or ignorance about what will or can happen if the cache device dies on me. If I am, the documentation has failed. Especially it had better document the default configuration (or the default examples or both), because the default configuration is what a lot of people will wind up using. As a corollary to the corollary, the cache documentation should probably explain what I get for giving up safety. Faster than normal writes? It's just required by the cache's architecture? Avoiding a write slowdown that the caching layer would otherwise introduce? I'd like to know.

(If documenting this stuff makes the authors of the cache embarrassed, perhaps they should fix things.)

As a side note, I think it's fine to offer a cache that's also a write accelerator. But I think that this should be clearly documented, the risks clearly spelled out, and it should not be the default configuration. Certainly it should not be the silent default.

by cks at August 16, 2014 02:41 AM

August 15, 2014

Everything Sysadmin

Simple bucket-ized stats in awk

Someone recently asked how to take a bunch of numbers from STDIN and then break them down into distribution buckets. This is simple enough that it should be do-able in awk.

Here's a simple script that will generate 100 random numbers. Bucketize them to the nearest multiple of 10, print based on # of items in bucket:

while true ; do echo $[ 1 + $[ RANDOM % 100 ]] ; done | head -100 | awk '{ bucket = int(($1 + 5) / 10) * 10 ; arr[bucket]++} END { for (i in arr) {print i, arr[i] }}' | sort -k2n,2 -k1n,1

Many people don't know that in bash, a single-quoted string can span multiple lines. This makes it very easy to put a little bit of awk right in the middle of your code, eliminating the need for a second file that contains the awk code itself. Since you can put newlines anywhere, you can make it very readable:


while true ; do
  echo $[ 1 + $[ RANDOM % 100 ]]
done | head -100 | \
  awk '
    {
      bucket = int(($1 + 5) / 10) * 10 ;
      arr[bucket]++
    }
    END {
      for (i in arr) {
        print i, arr[i]
      }
    }
  ' | sort -k2n,2 -k1n,1

If you want to sort by the buckets, change the sort to sort -k1n,1 -k2n,2

If you want to be a little more fancy, separate out the bucket function into a separate function. What? awk can do functions? Sure it can. You can also import values from the environment using the -v flag.


# Bucketize stdin to nearest multiple of argv[1], or 10 if no args given.
# "nearest" means 0..4.999 -> 0, 5..14.999 -> 10, etc.

# Usage:
# while true ; do echo $[ 1 + $[ RANDOM % 100 ]]; done | head -99 | 8

awk -v multiple="${1:-10}" '

function bucketize(a) {
  # Round to the nearest multiple of "multiple"
  #  (nearest... i.e. may round up or down)
  return int((a + (multiple/2)) / multiple) * multiple;
}

# All lines get bucketized.
{ arr[bucketize($1)]++ }

# When done, output the array.
END {
  for (i in arr) {
    print i, arr[i]
  }
}
' | sort -k2n,2 -k1n,1

I generally use Python for scripting but for something this short, awk makes sense. Sadly using awk has become a lost art.

August 15, 2014 02:28 PM

Google Blog

Through the Google lens: search trends August 8-14

Demonstrations in Missouri and the death of Robin Williams had people searching for a greater understanding this week.

Losing a Hollywood legend
First up, the news of Robin Williams’ death sparked tens of millions of searches about the beloved actor’s life and career. Legions of fans searched for every one of their favorite films from Williams’ decades-long career; top topics include Hook, Jumanji and Good Morning Vietnam. Many were looking up his most memorable quotes and roles, including the “O captain, my captain” monologue in Dead Poets Society, Genie’s first scene in Aladdin, and a standup bit about golf. Others searched for tributes by Williams’ fellow actors and comedians, like Jimmy Fallon and Conan O’Brien. And just yesterday, news that the actor had been diagnosed with Parkinson’s disease led people to the web once again.

Two days after Williams’ death, Lauren Bacall passed away at the age of 89, inspiring people to search for more information on the actress, in particular her marriage to Humphrey Bogart back in Hollywood’s golden age.
Unrest in Missouri
Protests ignited in the St. Louis suburb of Ferguson, Missouri this weekend after an unarmed teenager named Mike Brown was shot and killed by police on Saturday. People turned to search to learn more about the conflict, and searches for terms like [ferguson riot] and [ferguson shooting] rose by more than 1,000%.
Math and science phenomena
Maryam Mirzakhani, a professor of mathematics at Stanford, was awarded the 2014 Fields Medal this week for her work on understanding the symmetry of curved surfaces such as spheres. She is the first woman and first Iranian to win the prize, considered the Nobel Prize of mathematics.

Turning from one sphere to a celestial one, two astronomical events led searchers to the web to learn more. The Perseid meteor shower had its annual peak this week—and got a doodle for the occasion—and the brightest super moon of the year had everyone a little lun-y.

Ice ice bucket
This week saw a rise in searches for [als] thanks to the ALS Ice Bucket Challenge, a viral campaign to raise money to fight what’s better known as Lou Gehrig’s disease. From Martha Stewart to Justin Timberlake to your college roommate, odds are you know someone who’s dumped a bucket of icy water on themselves for the cause. The ALS Association has received millions of dollars in donations as a result, though we don’t have any numbers on how many brave folks took the plunge.

Tip of the week
Still basking in the glow of that super moon? Learn more about our familiar friend in the sky by asking your Google Search app on iPhone or Android, “How far away is the moon?” and get an answer spoken back to you. You can then ask, “How big is it?” Google will understand what “it” you’re talking about and give you the 411.

by Emily Wood ( at August 15, 2014 02:03 PM

Aaron Johnson

Links: 8-14-2014

  • How to Be Polite — The Message — Medium
    Quote: "What I found most appealing was the way that the practice of etiquette let you draw a protective circle around yourself and your emotions. By following the strictures in the book, you could drag yourself through a terrible situation and when it was all over, you could throw your white gloves in the dirty laundry hamper and move on with your life. I figured there was a big world out there and etiquette was going to come in handy along the way." Lots of great stuff in that article… also loved the "wow, that sounds hard." example.
    (categories: life conversation manners etiquette )

by ajohnson at August 15, 2014 06:30 AM

Chris Siebenmann

A consequence of NFS locking and unlocking not necessarily being fast

A while back I wrote Cross-system NFS locking and unlocking is not necessarily fast, about one drawback of using NFS locks to communicate between processes on different machines. This drawback is that it may take some time for process B on machine 2 to find out that process A on machine 1 has unlocked the shared coordination file. It turns out that this goes somewhat further than I realized at the time. Back then I looked at cross-system lock activity, but it turns out that you can see long NFS lock release delays even when the two processes are on the same machine.

If you have process A and process B on the same machine, both contending for access to the same file via file locking, you can easily see significant delays between active process A releasing the lock and waiting process B being notified that it now has the lock. I don't know enough about the NLM protocol to know if the client or server kernels can do anything to make the process go faster, but there are some client/server combinations where this delay does happen.

(If the client's kernel is responsible for periodically retrying pending locking operations until they succeed, it certainly could be smart enough to notice that another process on the machine just released a lock on the file and so now might be a good time for another retry.)

This lock acquisition delay can have a pernicious spiraling effect on an overall system. Suppose, not entirely hypothetically, that what a bunch of processes on the same machine are locking is a shared log file. Normally a process spends very little time doing logging and most of their time doing other work. When they go to lock the log file to write a message, there's no contention, they get the lock, they basically immediately release the lock, and everyone goes on fine. But then you hit a lock collision, where processes A and B both want to write. A wins, writes its log message, and unlocks immediately. But the NFS unlock delay means that process B is then going to sit there for ten, twenty, or thirty seconds before it can do its quick write and release the lock in turn. Suppose during this time another process, C, also shows up to write a log message. Now C may be waiting too, and it too will have a big delay to acquire the lock (if locks are 'fair', eg FIFO, then it will have to wait both for B to get the lock and for the unlock delay after B is done). Pretty soon you have more and more processes piling up waiting to write to the log and things grind to a much slower pace.
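To illustrate the locking pattern involved (not the NFS delay itself), here is the basic shared-log scheme sketched with flock(1) on Linux; the file paths are made up for the example:

```shell
#!/bin/sh
# Sketch of the shared-log pattern: take an exclusive lock on a lock
# file, append one message, and release the lock by closing fd 9 when
# the subshell exits. Paths are examples only.
LOCK=/tmp/shared.log.lock
LOG=/tmp/shared.log

(
    # Wait up to 30 seconds for the lock; give up if we can't get it.
    flock -w 30 9 || exit 1
    echo "process $$: log message" >> "$LOG"
) 9>"$LOCK"
```

Over local files this lock handoff is fast; the point of the post is that when $LOG and $LOCK live on NFS, the gap between one process's release and the next process's acquisition can stretch to tens of seconds.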

I don't think that there's a really good solution to this for NFS, especially since an increasing number of Unixes are making all file locks be NFS aware (as we've found out the hard way before). It's hard to blame the Unixes that are doing this, and anyways the morally correct solution would be to make NLM unlocks wake up waiting people faster.

PS: this doesn't happen all of the time. Things are apparently somewhat variable based on the exact server and client versions involved and perhaps timing issues and so on. NFS makes life fun.

Sidebar: why the NFS server is involved here

Unless the client kernel wants to quietly transfer ownership of the lock being unlocked to another process instead of actually releasing it, the NFS server is the final authority on who has a NFS lock and it must be consulted about things. For all that any particular client machine knows, there is another process on another machine that very occasionally wakes up, grabs a lock on the log file, and does stuff.

Quietly transferring lock ownership is sleazy because it bypasses any other process trying to get the lock on another machine. One machine with a rotating set of processes could unfairly monopolize the lock if it did that.

by cks at August 15, 2014 05:48 AM

August 14, 2014

Ubuntu Geek

Install Munin (Monitoring Tool) on Ubuntu 14.04 server

Munin, the monitoring tool, surveys all your computers and remembers what it saw. It presents all the information in graphs through a web interface. Its emphasis is on plug and play capabilities. After completing an installation, a high number of monitoring plugins will be running with no more effort.
Read the rest of Install Munin (Monitoring Tool) on Ubuntu 14.04 server (684 words)

© ruchi for Ubuntu Geek, 2014.

by ruchi at August 14, 2014 11:25 PM

Administered by Joe. Content copyright by their respective authors.