Planet Sysadmin               

          blogs for sysadmins, chosen by sysadmins...

September 22, 2014

Aaron Johnson

Links: 9-21-2014

  • Fixed mindset vs Growth mindset | Derek Sivers
    Quote: "People in a fixed mindset believe you either are or aren’t good at something, based on your inherent nature, because it’s just who you are. People in a growth mindset believe anyone can be good at anything, because your abilities are entirely due to your actions." Need to somehow get this through to the little dudes in my life.
    (categories: life motivation thinking mindset change failure )

by ajohnson at September 22, 2014 06:30 AM

Chris Siebenmann

Another side of my view of Python 3

I have been very down on Python 3 in the past. I remain sort of down on it, especially given the substantially out-of-date Python 3 versions shipped on the platforms I use and want to use, but there's another side of this that I should admit to: I kind of want to be using Python 3.

What this comes down to at its heart is that for all the nasty things I say about it, Python 3 is where the new and cool stuff is happening in Python. Python 3 is where all of the action is and I like that in general. Python 2 is dead, even if it's going to linger on for a good long while, and I can see the writing on the wall here.

(One part of that death is that increasingly, interesting new modules are only going to be Python 3 or are going to be Python 3 first and only Python 2 later and half-heartedly.)

And Python 3 is genuinely interesting. It has a bunch of new idioms to get used to, various challenges to overcome, all sorts of things to learn, and so on. All of these are things that generally excite me as a programmer and make it interesting to code stuff (learning is fun, provided I have a motivation).

Life would be a lot easier if I didn't feel this way. If I felt that Python 3 had totally failed as a language iteration, if I thought it had taken a terrible wrong turn that made it a bad idea, it would be easy to walk away from it entirely and ignore it. But it hasn't. While I dislike some of its choices and some of them are going to cause me pain, I do expect that the Python 3 changes are generally good ones (and so I want to explore them). Instead of walking away, I sort of yearn to program in Python 3.

So why haven't I? Certainly one reason is that I just haven't been writing new Python code lately (and beyond that I have real concerns about subjecting my co-workers to Python 3 for production code). But there's a multi-faceted reason beyond that, one that's going to take another entry to own up to.

(One aspect of the no new code issue is that another language has been competing for my affections and doing pretty well so far. That too is a complex issue.)

by cks at September 22, 2014 04:21 AM

September 21, 2014


SysAdmin1138

The alerting problem

4100 emails.

That's the approximate number of alert emails that got auto-deleted while I was away on vacation. That number will rise further before I officially come back from vacation, but it's still a big number. The sad part is, 98% of those emails are for:

  • Problems I don't care about.
  • Unsnoozable known issues.
  • Repeated alarms for the first two points (puppet, I'm looking at you)

We've made great efforts to cut down on our monitoring-fatigue problem, but we're not there yet. In part this is because the old, verbose monitoring system is still up and running; in part it's due to limitations in the alerting systems we have access to; and in part it's due to organizational habits that over-notify for alarms under the theory of, "if we tell everyone, someone will notice."

A couple weeks ago, PagerDuty had a nice blog-post about tackling alert fatigue, and had a lot of good points to consider. I want to spend some time on point 6:

Make sure the right people are getting alerts.

How many of you have a mailing list you dump random auto-generated crap like cron errors and backup failure notices to?

This pattern is very common in sysadmin teams, especially teams that began as one or a very few people. It just doesn't scale. Also, you learn to just ignore a bunch of things like backup "failures" for always-open files. You don't build an effective alerting system with the assumption that alerts can be ignored; if you find yourself telling new hires, "oh ignore those, they don't mean anything," you have a problem.

The failure mode of tell-everyone is that everyone can assume someone else saw it first and is working on it. And no one works on it.

I've seen exactly this failure mode many times. I've even perpetrated it, since I know certain coworkers are always on top of certain kinds of alerts so I can safely ignore actually-critical alerts. It breaks down if those people have a baby and are out of the office for four weeks. Or were on the Interstate for three hours and not checking mail at that moment.

When this happens and big stuff gets dropped, technical management gets kind of cranky. Which leads to hypervigilance and...

The failure mode of tell-everyone is that everyone will pile into the problem at the same time and make things worse.

I've seen this one too. A major-critical alarm is sent to a big distribution list, and six admins immediately VPN in and start doing low-impact diagnostics. Diagnostics that aren't low impact if six people are doing them at the same time. Diagnostics that aren't meant to be run in parallel and can return non-deterministic results if run that way, which tells six admins six different stories about what's actually wrong and sends them in six different directions to solve not-actually-a-problem issues.

This is the Thundering Herd problem as it applies to sysadmins.

The usual fix for this is to build in a culture of, "I've got this," emails and to look for those messages before working on a problem.

The usual fix for this fails if admins do a little "verify the problem is actually a problem" work before sending the email and stomp on each other's toes in the process.

The usual fix for that is to build a culture of, "I'm looking into it," emails.

Which breaks down if a sysadmin is reasonably sure they're the only one who saw the alert and works on it anyway. Oops.


Really, these are all examples of telling the right people about the problem, but you need to go into more detail than "the right people". You need "the right person". You need an on-call schedule that will notify one or two of the Right People about problems. Build that with the expectation that whoever is in the hot seat will answer ALL alerts, and build a rotation so no one is in the hot seat long enough to start ignoring alarms, and you have a far more reliable alerting system.

PagerDuty sells such a scheduling system. But what if you can't afford X-dollars a seat for something like that? You have some options. Here is one:

An on-call distribution-list and scheduler tasks
This recipe will provide an on-call rotation using nothing but free tools. It won't work with all environments. Scripting or API access to the email system is required.

Ingredients:

    • 1 on-call distribution list.
    • A list of names of people who can go into the DL.
    • A task scheduler such as cron or Windows Task Scheduler.
    • A database of who is supposed to be on-call when (can substitute a flat file if needed)
    • A scripting language that can talk to both email system management and database.

Instructions:

Build a script that can query the database (or flat file) to determine who is supposed to be on-call right now, and can update the distribution list with that name. PowerShell can do all of this for full MS-stack environments; for non-MS environments more creativity may be needed (a rough sketch follows these steps).

Populate the database (or flat-file) with the times and names of who is to be on-call.

Schedule execution of the script using a task scheduler.

Configure your alert-emailing system to send mail to the on-call distribution list.
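
Purely as an illustration, here is a minimal sketch of such a script in Go (any scripting language with access to your mail system would do the same job). It assumes a flat file named oncall.csv of "RFC3339 shift start,name" rows sorted by start time, and updateDistributionList is a placeholder for whatever management API or command your email system actually provides.

package main

import (
    "encoding/csv"
    "fmt"
    "log"
    "os"
    "time"
)

// updateDistributionList is a stand-in for the real work: calling your
// mail system's API or management command to set the DL membership.
func updateDistributionList(name string) error {
    fmt.Printf("would set the on-call DL membership to %q\n", name)
    return nil
}

func main() {
    // Assumed schedule file: one "shift start,name" row per shift,
    // e.g. "2014-09-22T08:00:00Z,alice", sorted by start time.
    f, err := os.Open("oncall.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    rows, err := csv.NewReader(f).ReadAll()
    if err != nil {
        log.Fatal(err)
    }

    now := time.Now().UTC()
    current := ""
    for _, row := range rows {
        if len(row) != 2 {
            log.Fatalf("malformed schedule row: %v", row)
        }
        start, err := time.Parse(time.RFC3339, row[0])
        if err != nil {
            log.Fatal(err)
        }
        // The last shift whose start time has already passed is the active one.
        if !start.After(now) {
            current = row[1]
        }
    }
    if current == "" {
        log.Fatal("no shift covers the current time")
    }
    if err := updateDistributionList(current); err != nil {
        log.Fatal(err)
    }
}

Run something like this from cron or Windows Task Scheduler every few minutes and the distribution list tracks whoever's shift most recently started.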

Nice and free! You don't get a GUI to manage the schedule and handling on-call shift swaps will be fully manual, but you are at least now sending alerts to people who know they need to respond to them. You can even build the list so that it always includes certain people who want to know whenever something happens, such as managers. The thundering-herd and circle-of-not-me problems are abated.

This system doesn't handle escalations at all; that's going to cost you either money or internal development time. You do kind of get what you pay for, after all.

How long should on-call shifts be?

That depends on your alert-frequency, how long it takes to remediate an alert, and the response time required.

Alert Frequency and Remediation:

  • Faster than once per 30 minutes:
    • They're a professional fire-fighter now. This is their full-time job, schedule them accordingly.
  • One every 30 minutes to an hour:
    • If remediation takes longer than 1 minute on average, the watch-stander can't do much of anything else but wait for alerts to show up. 8-12 hours is probably the longest shift over which you can expect reasonable performance.
    • If remediation takes less than a minute, 16 hours is the most you can expect because this frequency ensures no sleep will be had by the watch-stander.
  • One every 1-2 hours:
    • If remediation takes longer than 10 minutes on average, the watch-stander probably can't sleep on their shift. 16 hours is probably the maximum shift length.
    • If remediation takes less than 10 minutes, sleep is more possible. However, if your watch-standers are the kind of people who don't fall asleep fast, you can't rely on that. 1 day for people who sleep at the drop of a hat, 16 hours for the rest of us.
  • One every 2-4 hours:
    • Sleep will be significantly disrupted by the watch. 2-4 days for people who sleep at the drop of a hat. 1 day for the rest of us.
  • One every 4-6 hours:
    • If remediation takes longer than an hour, 1 week for people who sleep at the drop of a hat. 2-4 days for the rest of us.
  • Slower than one every 6 hours:
    • 1 week

Response Time:

This is a fuzzy one, since it's about work/life balance. If all alerts need to be responded to within 5 minutes of their arrival, the watch-stander needs to be able to respond in 5 minutes. This means no driving or doing anything else that requires not paying attention to the phone, such as kids' performances or after-work meetups. For a watch-stander who drives to work, their on-call shift can't overlap their commute.

For a 30-minute response, things are easier. Short drives are fine, and longer ones work so long as the watch-stander pulls over to check each alert as it arrives. Kids' performances are still problematic, and so are longer commutes.

And then there is the curve-ball known as "define 'response'". If response means acking the alert, that's one thing, and much less disruptive to off-hours life. If response is defined as "starts working on the problem", that's much more disruptive, since the watch-stander has to have a laptop and bandwidth at all times.

The answers here will determine what a reasonable on-call shift looks like. A week of 5-minute time-to-work is going to leave the watch-stander house-bound for that entire week, and that sucks a lot; there had better be on-call pay associated with a schedule like that or you're going to get turnover as sysadmins go work for someone less annoying.


It's about more than just making sure the right people are getting alerts; it's about building a system of notifying the Right People in such a way that the alerts will actually get responded to and handled.

This will build a better alerting system overall.

by SysAdmin1138 at September 21, 2014 11:00 PM

Aaron Johnson

WHAT I DID THIS WEEKEND: 09/21/2014

  • Did a long drive out to Offa’s Dyke (nice little playground and trail for the kids by the visitors center but otherwise nothing to write home about) and Croft Castle (more of a manor but had a beautiful garden, a nice playground and some cool WWI outfits to try on) on Saturday.
  • Took the boys geocaching around our house on Sunday, found our first five caches on a great 3.5 mile hike / walk we did right from the house. Have a profile now on geocaching.com. Looking forward to finding some caches on our trip to Iceland in a couple weeks. Bought a couple tracker bugs from Amazon that we’ll leave in Iceland.
  • Long run for the week (training for half marathon distance) was 8.3 miles this afternoon in beautiful weather. Was really interesting to see my heart rate (which seems to be relatively high on the other runs I’ve done based on the 220 – your age thing I’ve read in various places) for the first 45 minutes staying WELL below what it usually is, first time I’ve felt like the training (which is easy run on Monday, tempo run on Wednesday, interval / hill run on Friday) has paid off. I’m still super slow, but that might have been the longest run I’ve ever done in my life. Doesn’t look like tapiriik is syncing stuff from Garmin back to Runkeeper. :(

by ajohnson at September 21, 2014 09:20 PM


Chris Siebenmann

One reason why Go can have methods on nil pointers

I was recently reading an article on why Go can call methods on nil pointers (via) and wound up feeling that it was incomplete. It's hard to talk about 'the' singular reason that Go can do this, because a lot of design decisions went into the mix, but I think that one underappreciated reason this happens is because Go doesn't have inheritance.

In a typical language with inheritance, you can both override methods on child classes and pass a pointer to a child class instance to a function that expects a pointer to the parent class instance (and the function can then call methods on that pointer). This combination implies that the actual machine code in the function cannot simply make a static call to the appropriate parent class method function; instead it must somehow go through some sort of dynamic dispatch process so that it calls the child's method function instead of the parent's when passed what is actually a pointer to a child instance.

With non-nil pointers to objects you have a natural place to put a vtable for this dynamic dispatch (or rather a pointer to one), because the object has actual storage associated with it. But a nil pointer has no storage associated with it, so you can't naturally do this. That leaves the question: given a nil pointer, how do you find the correct vtable? After all, it might be a nil pointer of the child class, which should call child class methods.

Because Go has no inheritance this problem does not come up. If your function takes a pointer to a concrete type and you call t.Method(), the compiler statically knows which exact function you're calling; it doesn't need to do any sort of dynamic lookup. Thus it can easily make this call even when given a nil. In effect the compiler gets to rewrite a call to t.Method() to something like ttype_Method(t).
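
A small illustration of this (mine, not from the entry): the following compiles and runs happily even though the receiver is nil, because the call to Len is resolved statically.

package main

import "fmt"

// List is a toy singly-linked list.
type List struct {
    val  int
    next *List
}

// Len behaves like a plain function List_Len(l); no dynamic dispatch is
// involved, so a nil receiver is fine as long as the method checks for it.
func (l *List) Len() int {
    if l == nil {
        return 0
    }
    return 1 + l.next.Len()
}

func main() {
    var l *List          // a nil pointer to a concrete type
    fmt.Println(l.Len()) // prints 0
}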

But wait, you may say. What about interfaces? Those have exactly the dynamic dispatch problem I was just talking about. The answer is that Go actually represents interface values that are pointers as two pointers: one is the actual value, and the other points to (among other things) a vtable for the interface (which is populated based on the concrete type). Because Go statically knows that it is dealing with an interface instead of a concrete type, the compiler builds code that calls indirectly through this vtable.

(As I found out, this can lead to a situation where what you think is a nil pointer is not actually a nil pointer as Go sees it because it has been wrapped up as an interface value.)
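
A quick demonstration of that wrapping effect (again my example, not the entry's):

package main

import "fmt"

type myError struct{}

func (e *myError) Error() string { return "boom" }

// mayFail returns a nil *myError, but as an error interface value it
// carries the pair (type *myError, value nil), which is not a nil interface.
func mayFail() error {
    var e *myError
    return e
}

func main() {
    err := mayFail()
    fmt.Println(err == nil) // prints false, perhaps surprisingly
}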

Of course you could do this two-pointer trick with concrete types too if you wanted to, but it would have the unfortunate effect of adding an extra word to the size of all pointers. Most languages are not interested in paying that cost just to enable nil pointers to have methods.

(Go doesn't have inheritance for other reasons; it's probably just a happy coincidence that it enables nil pointers to have methods.)

PS: it follows that if you want to add inheritance to Go for some reason, you need to figure out how to solve this nil pointer with methods problem (likely in a way that doesn't double the size of all pointers). Call this an illustration of how language features can be surprisingly intertwined with each other.

by cks at September 21, 2014 05:38 AM

September 20, 2014

Everything Sysadmin

How has the internet affected what books get published?

Someone recently asked me how the rise of the Internet has affected what books get published, specifically related to books about operating systems and other open source projects.

This is based on what I've been told by various publishers and is "conventional wisdom". Of course, an actual publisher may disagree or explain it differently, or have counterexamples.

This is the email I sent in reply:

One way that the internet has changed the book industry that is not well known outside of publishing circles is that it has led to the death of the reference book.

It used to be that for every language or system, someone would make easy money printing a book that listed all the system calls, library calls, or configuration fields in the system. Each would be listed along with a definition, and sometimes a longer explanation or example. These reference books (and the easy profits from them) have disappeared because that's all on the web for free. For example, the entire standard Python library is documented on python.org and is updated constantly. Why publish it in a book?

Another tier of books that have disappeared are the "getting started" books for open source projects. In an effort to better market themselves, open source projects have excellent tutorials online for free. Often they are interactive. (Check out the interactive Docker Tutorial and the Go Playground to see what I mean).

Part of what has caused this is the commercialization of open source projects. If you are selling support, your easiest sales are to people that already use the product and now need support. Therefore anything you do to expand the number of people using the product expands the number of potential customers for the support services. Therefore, a greedy company gives away any documentation related to how to get started as it increases their base of potential customers. This is quite radical if you think about it... when you sell commercial software you hide this information so you can sell "professional services" that help people get started. For them, the potential market is all the people that don't use the product already. (Footnote Simon Philips's keynote at LCA 2009 )

As a result, the books that can be printed and sold profitably now need to have some kind of "culture" component (i.e. what we used to call "soft skills"), or be general education that brings people up to speed so they can understand what's already out there, or the "cookbook" style books that list situation after situation. Actually, I love the cookbook-style books because it is more likely that I'll serendipitously come across something new by reading all the examples in a row. If I don't understand an example, at least I know where to turn if I ever find myself in that situation. If I do decipher the example enough to understand why it works, I've learned a lot about the system.

What should an OS vendor do? I think they should be the ones producing cookbooks and also facilitating user-generated cookbooks. The official cookbook is a good way to prescribe marketing-endorsed methods. However, there is a much bigger need for user-generated cookbooks full of the niche situations that only users come across. Vendors should facilitate this kind of thing, wiki-style. They may fear that it is a tacit endorsement of methods they don't approve of, but they need to recognize that a user getting their needs fulfilled is a higher priority than perfection.

-Tom

P.S. I just realized that my employer's website, stackoverflow.com, is just that kind of fulfillment. One of our sub-sites AskUbuntu.com provides exactly that kind of "user-generated cookbook" with the ability to gently push people in the right direction. (Plug: other vendors could choose to partner with us rather than create a site from scratch.)

September 20, 2014 07:28 PM

Chris Siebenmann

My view on using VLANs for security

I've recently read some criticism of the security value of VLANs. Since we use VLANs heavily I've been thinking a bit about this issue and today I feel like writing up my opinions. The short version is that I don't think using VLANs is anywhere close to being an automatic security failure. It's much more nuanced (and secure) than that.

My overall opinion is that the security of your VLANs rests on the security of the switches (and hosts) that the VLANs are carried on, barring switch bugs that allow you to hop between VLANs in various ways or to force traffic to leak from one to another. The immediate corollary is that the most secure VLANs are the ones that are on as few switches as possible. Unfortunately this cuts against both flexibility and uniformity; it's certainly easier if you have all of your main switches carry all of your VLANs by default, since that makes their configurations more similar and means it's much less work to surface a given VLAN at a given point.

(This also depends on your core network topology. A chain or a ring can force you to reconfigure multiple intermediate switches if VLAN A now needs to be visible at a new point B, whereas a star topology pretty much ensures that only a few directly involved switches need to be touched.)

Because they're configured (partly) through software instead of purely by physical changes, a VLAN based setup is more vulnerable to surreptitious evil changes. All an attacker has to do is gain administrative switch access and they can often make a VLAN available to something or somewhere it shouldn't be. As a corollary, it's harder to audit a VLAN-based network than one that is purely physical in that you need to check the VLAN port configurations in addition to the physical wiring.

(Since basically all modern switches are VLAN-capable even if you don't use the features, I don't think that avoiding VLANs means that an attacker who wants to get a new network on to a machine needs the machine to have a free network port. They can almost certainly arrange a way to smuggle the network to the machine as a tagged link on an existing port.)

So in summary I think that VLANs are somewhat less secure than separate physical networks but not all that much less secure, since your switches should be fairly secure in general (both physically and for configuration changes). But if you need ultimate security you do want or need to build out physically separate networks. However my suspicions are that most people don't have security needs that are this high and so are fine with using just VLANs for security isolation.

(Of course there are political situations where having many networks on one switch may force you to give all sorts of people access to that switch so that they can reconfigure 'their' network. If you're in this situation I think that you have several problems, but VLANs do seem like a bad idea because they lead to that shared switch awkwardness.)

Locally we don't have really ultra-high security needs and so our VLAN setup is good enough for us. Our per-group VLANs are more for traffic isolation than for extremely high security, although of course they and the firewalls between the VLANs do help increase the level of security.

Sidebar: virtual machines, hosts, VLANs, and security

One relatively common pattern that I've read about for virtual machine hosting is to have all of the VLANs delivered to your host machines and then to have some sort of internal setup that routes appropriate networks to all of the various virtual machines on a particular host. At one level you can say that this is obviously a point of increased vulnerability with VLANs; the host machine is basically operating as a network switch in addition to its other roles so it's an extra point of vulnerability (perhaps an especially accessible one if it can have the networking reconfigured automatically).

My view is that to say this is to misread the actual security vulnerability here. The real vulnerability is not having VLANs; it is hosting virtual machines on multiple different networks (presumably of different security levels) on the same host machine. With or without VLANs, all of those networks have to get to that host machine and thus it has access to all of them and thus can be used to commit evil with or to any of them. To really increase security here you need to deliver fewer networks to each host machine (which of course has the side effect of making them less uniform and constraining which host machines a given virtual machine can run on).

(The ultimate version is that each host machine is only on a single network for virtual machines, which means you need at least as many host machines as you have networks you want to deploy VMs on. This may not be too popular with the people who set your budgets.)

by cks at September 20, 2014 03:21 AM

September 19, 2014

Google Blog

Through the Google lens: search trends Sept 12-18

-Welcome to this week’s search trends. May I take your order?
-Can I have a referendum on independence, a totally inappropriate flight passenger with a Hollywood baby on the side?
-Coming right up!

Flag and country
“They may take away our lives, but they'll never take our freedom!” That was Sir William Wallace's battle cry for Scottish independence in the film Braveheart. While this week’s events in Scotland weren’t quite as cinematic, the results could have been revolutionary. On Thursday the world watched and searched as an unprecedented number of Scots went to the polls to answer the question, "Should Scotland be independent from the United Kingdom?" Turns out the majority of people don’t think it should, and voted to stay a member of the U.K. Party leaders have now promised significant constitutional changes for the entire kingdom. What would Wallace have made of that?

The comeback kings
Everybody loves a comeback and search had its fair share this week. First up, nostalgia for the 90’s brought Surge soda back from the dead. Thanks to a Facebook campaign called "The SURGE Movement," Coca-Cola will now sell its "fully-loaded citrus” soft drink for a limited time on Amazon. And the Chicago Bears denied the 49ers a win in their brand-spanking-new stadium when they rallied to overturn a 13-point deficit in the last quarter to beat San Francisco 28-20.



Airing dirty laundry
Hard plastic-y seats, broken recliner adjusters, zero leg room—flying economy isn’t always the most pleasant experience. And depending on who you’re sitting next to, your easy two-hour flight could turn into a nightmare before you even take off. But the passengers of the world aren’t having it, not anymore. This week, “passenger shaming” went viral on social media as traumatized travelers shared photos of the most absurdly obnoxious, unconscientious things some passengers do on flights—we’re talking bare feet, bare skin... well, you should just see for yourself.

But at least those offending fliers were shielded in anonymity. Singer Robin Thicke wasn’t afforded the same luxury, revealing in a court deposition this week that he had little to do with the creation of last year’s song of the summer “Blurred Lines.” As part of his defense against a copyright infringement lawsuit, Thicke admitted that he was under the influence of drugs and alcohol for most of 2013—bringing a whole new meaning to the song’s title.

And the winner is ...
The hipster revolution has finally taken over the United States! Need proof? Searchers don’t. When New Yorker Kira Kazantsev won the title of Miss America, the Internet discovered that the U.S.A.’s new leading lady is a former food blogger. She’s even reported on her state’s crown foodie jewel, the cronut. Miss America wasn’t the only one who got to bask in the limelight; boxing world champion Floyd “Money” Mayweather Jr. won his rematch with contender Marcos Maidana by a unanimous decision. The victory brings his undefeated tally to 47… somehow the title world champion is starting to sound like an understatement.

Love on the set!
For Orange is the New Black screenwriter Lauren Morelli, life imitated art a bit more than she probably expected. While writing the hit program, Morelli decided to divorce her husband and start a relationship with Samira Wiley, an actress from the show. Meanwhile, searchers learned that Mindy Kaling considers former The Office castmate and on-screen boyfriend B.J. Novak “the love that got away.” But while not all on-set relationships last, some couples not only make it work but also take their relationship to the next level. That’s the route taken by Ryan Gosling and Eva Mendes, who met while making the movie The Place Beyond the Pines. The power couple welcomed baby girl Gosling earlier this week.

Tip of the week
The NFL season’s just getting started so it’s time to hunker down and plan your football viewing schedule. Just say, “OK Google, show me the NFL schedule” to coordinate your life for the next four months. We’ll see you back in the spring.

by Emily Wood (noreply@blogger.com) at September 19, 2014 03:36 PM

Chris Siebenmann

What I mean by passive versus active init systems

I have in the past talked about passive versus active init systems without quite defining what I meant by that, except sort of through context. Since this is a significant division between init systems that dictates a lot of other things, I've decided to fix that today.

Put simply, an active init system is one that actively tracks the status of services as part of its intrinsic features; a passive init system is one that does not. The minimum behavior of an active init system is that it knows what services have been activated and not later deactivated. Better active init systems know whether services are theoretically still active or if they've failed on their own.

(Systemd, upstart, and Solaris's SMF are all active init systems. In general any 'event-based' init system that starts services in response to events will need to be active, because it needs to know which services have already been started and which ones haven't and thus are candidates for starting now. System V init's /etc/init.d scripts are a passive init system, although /etc/inittab is an active one. Most modern daemon supervision systems are active systems.)

One direct consequence is that an active init system essentially has to do all service starting and stopping itself, because this is what lets it maintain an accurate record of which services are active. You may run commands to do this, but they have to talk to the init system itself. By contrast, in a passive init system the commands you run to start and stop services can be and often are just shell scripts; this is the archetype of System V init.d scripts. You can even legitimately start and stop services entirely outside the scripts, although things may get a bit confused.

(In the *BSDs things can be even simpler in that you don't have scripts and you may just run the daemons. I know that OpenBSD tends to work this way but I'm not sure if FreeBSD restarts stuff quite that directly.)

An active init system is also usually more communicative with the outside world. Since it knows the state of services it's common for the init system to have a way to report this status to people who ask, and of course it has to have some way of being told either to start and stop services or at least that particular services have started and stopped. Passive init systems are much less talkative; System V init basically has 'change runlevel' and 'reread /etc/inittab' and that's about it as far its communication goes (and it doesn't even directly tell you what the runlevel is; that's written to a file that you read).

Once you start down the road to an active init system, in practice you wind up wanting some way to track daemon processes so you can know if a service has died. Without this an active init system is basically flying blind in that it knows what theoretically started okay but it doesn't necessarily know what's still running. This can be done by requiring cooperative processes that don't do things like detach themselves from their parents or it can be done with various system specific Unix extensions to track groups of processes even if they try to wander off on their own.
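
As a toy sketch of the cooperative case (mine, not from the entry): if the init system starts the service itself and the service does not detach, the service's death is directly observable, which is exactly the state information an active system wants to keep.

package main

import (
    "log"
    "os/exec"
)

func main() {
    // "sleep 5" stands in for a non-detaching daemon process.
    cmd := exec.Command("sleep", "5")
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    log.Println("service started; recorded state: active")

    // Wait blocks until the child exits. A real init system would do this
    // in the background and update its service-state table when it fires.
    err := cmd.Wait()
    log.Printf("service exited (err=%v); recorded state: stopped/failed", err)
}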

As we can see from this, active init systems are more complicated than passive ones. Generally the more useful features they offer and the more general they are the more complicated they will be. A passive init system can be done with shell scripts; an attractive active one requires some reasonably sophisticated C programming.

PS: An active init system that notices when services die can offer a feature where it will restart them for you. In practice most active init systems aren't set up to do this for most services for various reasons (that may or may not be good ones).

(This entry was partly sparked by reading parts of this mail thread that showed up in my Referer logs because it linked to some of my other entries.)

by cks at September 19, 2014 04:58 AM

September 18, 2014

Ubuntu Geek

Vesta – Simple & Clever Hosting Control Panel

We made an extremely simple and clear interface. Instead of adding more elements to work with, we prefer to remove as much as possible. The main goal was to improve the ergonomics of the control panel by reducing unnecessary movements and operations. It is all about using less, because less is more. We hope you will love it as we do.

(...)
Read the rest of Vesta – Simple & Clever Hosting Control Panel (302 words)



by ruchi at September 18, 2014 11:07 PM

systemBash

CentOS pip Python install error

While attempting to install Thumbor on a CentOS server I recently had the following error message:

------------------------------------------------------------
/usr/bin/pip run on Thu Sep 18 21:07:45 2014
Getting page https://pypi.python.org/simple/pycrypto/
URLs to search for versions for pycrypto in /usr/lib64/python2.6/site-packages:
* https://pypi.python.org/simple/pycrypto/
Analyzing links from page https://pypi.python.org/simple/pycrypto/
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.0.1.tar.gz#md5=4d5674f3898a573691ffb335e8d749cd (from https://pypi.python.org/simple/pycrypto/), version: 2.0.1
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.1.0.tar.gz#md5=1d3eb04f06e6f09a080bc37fb019f9bf (from https://pypi.python.org/simple/pycrypto/), version: 2.1.0
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.2.tar.gz#md5=4f0ed728b14b98f09120cb2ec461ec98 (from https://pypi.python.org/simple/pycrypto/), version: 2.2
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.3.tar.gz#md5=2b811cfbfc342d83ee614097effb8101 (from https://pypi.python.org/simple/pycrypto/), version: 2.3
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.4.1.tar.gz#md5=c2a1404a848797fb0806f3e11c29ef15 (from https://pypi.python.org/simple/pycrypto/), version: 2.4.1
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.4.tar.gz#md5=274fa44c30a320d56460a93fdd95e702 (from https://pypi.python.org/simple/pycrypto/), version: 2.4
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.5.tar.gz#md5=783e45d4a1a309e03ab378b00f97b291 (from https://pypi.python.org/simple/pycrypto/), version: 2.5
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.1.tar.gz#md5=55a61a054aa66812daf5161a0d5d7eda (from https://pypi.python.org/simple/pycrypto/), version: 2.6.1
  Found link https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.tar.gz#md5=88dad0a270d1fe83a39e0467a66a22bb (from https://pypi.python.org/simple/pycrypto/), version: 2.6
Using version 2.6.1 (newest of versions: 2.6.1, 2.6, 2.5, 2.4.1, 2.4, 2.3, 2.2, 2.1.0, 2.0.1, 2.0.1)
Downloading/unpacking pycrypto from https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.1.tar.gz#md5=55a61a054aa66812daf5161a0d5d7eda

  Running setup.py egg_info for package pycrypto

    running egg_info
    writing pip-egg-info/pycrypto.egg-info/PKG-INFO
    writing top-level names to pip-egg-info/pycrypto.egg-info/top_level.txt
    writing dependency_links to pip-egg-info/pycrypto.egg-info/dependency_links.txt
    warning: manifest_maker: standard file '-c' not found
    reading manifest file 'pip-egg-info/pycrypto.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'pip-egg-info/pycrypto.egg-info/SOURCES.txt'
  Source in /tmp/pip-build-root/pycrypto has version 2.6.1, which satisfies requirement pycrypto from https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.1.tar.gz#md5=55a61a054aa66812daf5161a0d5d7eda
Installing collected packages: pycrypto

  Found existing installation: pycrypto 2.0.1

    Uninstalling pycrypto:

      Removing file or directory /usr/lib64/python2.6/site-packages/pycrypto-2.0.1-py2.6.egg-info
      Successfully uninstalled pycrypto

  Running setup.py install for pycrypto

    Running command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build-root/pycrypto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-C4u4v3-record/install-record.txt --single-version-externally-managed
    running install
    running build
    running build_py
    running build_ext
    running build_configure
    checking for gcc... gcc

    checking whether the C compiler works... yes

    checking for C compiler default output file name... a.out

    checking for suffix of executables...

    checking whether we are cross compiling... configure: error: in `/tmp/pip-build-root/pycrypto':

    configure: error: cannot run C compiled programs.

    If you meant to cross compile, use `--host'.

    See `config.log' for more details

    Traceback (most recent call last):

      File "<string>", line 1, in <module>

      File "/tmp/pip-build-root/pycrypto/setup.py", line 456, in <module>

        core.setup(**kw)

      File "/usr/lib64/python2.6/distutils/core.py", line 152, in setup

        dist.run_commands()

      File "/usr/lib64/python2.6/distutils/dist.py", line 975, in run_commands

        self.run_command(cmd)

      File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

        cmd_obj.run()

      File "/usr/lib/python2.6/site-packages/setuptools/command/install.py", line 53, in run

        return _install.run(self)

      File "/usr/lib64/python2.6/distutils/command/install.py", line 577, in run

        self.run_command('build')

      File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command

        self.distribution.run_command(command)

      File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

        cmd_obj.run()

      File "/usr/lib64/python2.6/distutils/command/build.py", line 134, in run

        self.run_command(cmd_name)

      File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command

        self.distribution.run_command(command)

      File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

        cmd_obj.run()

      File "/tmp/pip-build-root/pycrypto/setup.py", line 251, in run

        self.run_command(cmd_name)

      File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command

        self.distribution.run_command(command)

      File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

        cmd_obj.run()

      File "/tmp/pip-build-root/pycrypto/setup.py", line 278, in run

        raise RuntimeError("autoconf error")

    RuntimeError: autoconf error

    Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build-root/pycrypto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-C4u4v3-record/install-record.txt --single-version-externally-managed:

    running install

running build

running build_py

running build_ext

running build_configure

checking for gcc... gcc

checking whether the C compiler works... yes

checking for C compiler default output file name... a.out

checking for suffix of executables...

checking whether we are cross compiling... configure: error: in `/tmp/pip-build-root/pycrypto':

configure: error: cannot run C compiled programs.

If you meant to cross compile, use `--host'.

See `config.log' for more details

Traceback (most recent call last):

  File "<string>", line 1, in <module>

  File "/tmp/pip-build-root/pycrypto/setup.py", line 456, in <module>

    core.setup(**kw)

  File "/usr/lib64/python2.6/distutils/core.py", line 152, in setup

    dist.run_commands()

  File "/usr/lib64/python2.6/distutils/dist.py", line 975, in run_commands

    self.run_command(cmd)

  File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

    cmd_obj.run()

  File "/usr/lib/python2.6/site-packages/setuptools/command/install.py", line 53, in run

    return _install.run(self)

  File "/usr/lib64/python2.6/distutils/command/install.py", line 577, in run

    self.run_command('build')

  File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command

    self.distribution.run_command(command)

  File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

    cmd_obj.run()

  File "/usr/lib64/python2.6/distutils/command/build.py", line 134, in run

    self.run_command(cmd_name)

  File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command

    self.distribution.run_command(command)

  File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

    cmd_obj.run()

  File "/tmp/pip-build-root/pycrypto/setup.py", line 251, in run

    self.run_command(cmd_name)

  File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command

    self.distribution.run_command(command)

  File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command

    cmd_obj.run()

  File "/tmp/pip-build-root/pycrypto/setup.py", line 278, in run

    raise RuntimeError("autoconf error")

RuntimeError: autoconf error

----------------------------------------

  Rolling back uninstall of pycrypto

  Replacing /usr/lib64/python2.6/site-packages/pycrypto-2.0.1-py2.6.egg-info
Command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build-root/pycrypto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-C4u4v3-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip-build-root/pycrypto

Exception information:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pip/basecommand.py", line 139, in main
    status = self.run(options, args)
  File "/usr/lib/python2.6/site-packages/pip/commands/install.py", line 271, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "/usr/lib/python2.6/site-packages/pip/req.py", line 1185, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pip/req.py", line 592, in install
    cwd=self.source_dir, filter_stdout=self._filter_install, show_stdout=False)
  File "/usr/lib/python2.6/site-packages/pip/util.py", line 662, in call_subprocess
    % (command_desc, proc.returncode, cwd))
InstallationError: Command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build-root/pycrypto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-C4u4v3-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip-build-root/pycrypto

It essentially boils down to:

checking whether we are cross compiling... configure: error: in `/tmp/pip-build-root/pycrypto':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

Weird; I have gcc and all the usual build tools installed.

It took a fair amount of troubleshooting, but I finally figured out it was because pip was attempting to build this in /tmp, which I have mounted noexec for security purposes. That option disallows executing programs from that directory.

Running

mount -oremount,exec /tmp

Allowed it to run without issue.

by Dave at September 18, 2014 07:17 PM

Server Density

Automated Google Cloud and Google Compute Engine monitoring

Today we’re releasing the Server Density integration into Google Compute Engine as an official Google Cloud Platform Technology Partner. Server Density works across all environments and platforms and is now fully integrated into Google’s cloud infrastructure products, including Compute Engine and Persistent Disks, to offer alerting, historical metrics and devops dashboards to Google customers.

Google Cloud graphs

Server Density customers can connect their Google Cloud accounts to automatically monitor and manage instances across Google data centers alongside existing environments and other cloud providers. Many customers will run systems across multiple providers in a hybrid setup, so Server Density is uniquely placed to help with that because even though we have specialist integration into Google, it works well anywhere – cloud, hybrid and on-prem.

$500 credit for Google/Server Density customers

Server Density normally starts at $10/m to monitor Linux, Windows, FreeBSD and Mac servers but Google Cloud customers can monitor up to 5 servers for free for life (worth over $500/year). Google is also offering Server Density customers $500 in credits to trial Google Cloud Platform. To find out more and sign up, head over to our website for details.

The post Automated Google Cloud and Google Compute Engine monitoring appeared first on Server Density Blog.

by David Mytton at September 18, 2014 01:00 PM

Chris Siebenmann

Ubuntu's packaging failure with mcelog in 14.04

For vague historical reasons we've had the mcelog package in our standard package set. When we went to build our new 14.04 install setup, this blew up on us; on installation, some of our machines would report more or less the following:

Setting up mcelog (100-1fakesync1) ...
Starting Machine Check Exceptions decoder: CPU is unsupported
invoke-rc.d: initscript mcelog, action "start" failed.
dpkg: error processing package mcelog (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 mcelog
E: Sub-process /usr/bin/dpkg returned an error code (1)

Here we see a case where a collection of noble intentions have had terrible results.

The first noble intention is a desire to warn people that mcelog doesn't work on all systems. Rather than silently run uselessly or silently exit successfully, mcelog instead reports an error and exits with a failure status.

The second noble intention is the standard Debian noble intention (inherited by Ubuntu) of automatically starting most daemons on installation. You can argue that this is a bad idea for things like database servers, but for basic system monitoring tools like mcelog and SMART monitoring I think most people actually want this; certainly I'd be a bit put out if installing smartd didn't actually enable it for me.

(A small noble intention is that the init script passes mcelog's failure status up, exiting with a failure itself.)

The third noble intention is that it is standard Debian behavior for an init script that fails when it is started in the package's postinstall script to cause the postinstall script itself to exit out with errors (it's in a standard dh_installinit stanza). When the package postinstall script errors out, dpkg itself flags this as a problem (as well it should) and boom, your entire package install step is reporting an error and your auto-install scripts fall down. Or at least ours do.

The really bad thing about this is that server images can change hardware. You can transplant disks from one machine to another for various reasons; you can upgrade the hardware of a machine but preserve the system disks; you can move virtual images around; you can (as we do) have standard machine building procedures that want to install a constant set of packages without having to worry about the exact hardware you're installing on. This mcelog package behavior damages this hardware portability in that you can't safely install mcelog in anything that may change hardware. Even if the initial install succeeds or is forced, any future update to mcelog will likely cause you problems on some of your machines (since a package update will likely fail just like a package install).

(This is a packaging failure, not an mcelog failure; given that mcelog can not work on some machines it's installed on, the init script failure should not cause a fatal postinstall script failure. Of course the people who packaged mcelog may well not have known that it had this failure mode on some machines.)

I'm sort of gratified to report that Debian has a bug for this, although the progress of the bug does not fill me with great optimism, and of course it's probably not important enough to ever make it into Ubuntu 14.04 (although there's also an Ubuntu bug).

PS: since mcelog has never done anything particularly useful for us, we have not been particularly upset over dropping it from our list of standard packages. Running into the issue was a bit irritating, though; mcelog seems to be historically good at irritation.

PPS: the actual problem mcelog has is even more stupid than 'I don't support this CPU'; in our case it turns out to be 'I need a special kernel module loaded for this machine but I won't do it for you'. It also syslogs (but does not usefully print) a message that says:

mcelog: AMD Processor family 16: Please load edac_mce_amd module.#012: Success

See eg this Fedora bug and this Debian bug. Note that the message really means 'family 16 and above', not 'family 16 only'.

by cks at September 18, 2014 05:58 AM

September 17, 2014

Rands in Repose

The Song of the Introvert

You are a threat.

It’s a strong word. I don’t mean that you intend pain, injury, or damage. But I’m an introvert and you – as a new unknown human – are a threat to me. I don’t know what you want and you most definitely want something and until I figure that out, you’re a threat. See…

I have control issues.

I am mostly calm when I am alone in my Cave. My stuff is where I expect it to be, the furniture is how I like it and the walls are blood red – they surround me completely. There are rarely surprises in my Cave and that is how I like it, thank you very much. My Cave is where I avoid the chaos and…

You are chaos.

You are disorder and confusion. I haven’t figured out an eye contact protocol with you yet, and I don’t know what you want so I don’t understand what motivates you so you are unpredictable. You are an unknown, which means you are full of surprises and surprises aren’t the spice of life, they are new data that don’t yet fit in my system and…

I am addled with systems.

My love of calm predictability has come at a cost. I write everything down in a black notebook – no lines. There are boxes next to items that must be tracked, there are stars for ideas that must be remembered. A yellow highlighter and a .5mm Zebra Sarasa gel pen accompany me everywhere because the presence of this notebook is part of my well-defined system of never missing anything. See, paradoxically, while I would likely prefer to be hiding in my Cave, I also love signal and…

You are high signal.

I am fascinated by how you punctuate your sentences with your hands. You pause for as long as it takes to make sure you are going to say something of value. Sometimes these pauses are maddeningly long. You are fiercely optimistic and state outlandish, impossible things. You are fearless in giving feedback to strangers. You are less fearless, but you can deliver the same feedback with a momentary glance. It’s fascinating how all of you have built all of your systems to get through your day. I am fascinated because…

I am insatiably (quietly) curious.

My curiosity is a defense mechanism. I am desperately trying to get back to my Cave where the surprises are scheduled. I have learned the faster I can learn about you, the faster I will figure out what you want, and that will tell me what motivates you, and when I know what motivates you, I will better understand how to communicate with you. I am not trying to manipulate you, I am not trying to pander to you, I am trying to understand you because…

I am an introvert.

by rands at September 17, 2014 01:36 PM

Tech Teapot

Back to Basics

After a while things stop being new. Things that really used to excite you, stop exciting you. Things that you were passionate about, you stop being passionate about. That’s just how things work.

I wrote my very first computer program 26 years ago this month. It was in college, using a Perkin-Elmer minicomputer running Berkeley BSD 4.2, on a VT220 terminal (with a really good keyboard). The program was written in Pascal. Pascal was the educational programming language of the time. It was an amazing time. Every time I went near the terminal, I approached with a sense of wonder.

But, over time, the sense of wonder starts to wane. Once somebody starts paying you to do something, the sense of wonder starts to wane real fast. You don’t control it any more. You are likely to be producing something that somebody else wants you to produce, in the way they want you to produce it.

I have been pondering my career recently. Such as it is. You do start pondering your career when you hit the wrong end of your forties. How can I get back that sense of wonder again?

I’ve always had a hankering after learning Lisp. I read about it even before I went to college twenty six years ago, and it has always fascinated me. Pretty well any programming concept you can think of, Lisp usually got there first.

One of my recent discoveries has been a series of books: The Little Schemer, The Seasoned Schemer and The Reasoned Schemer teaching Scheme in quite a unique, accessible and fun style.

Scheme is a modern dialect of Lisp. There are lots of others including Clojure.

I think that learning a language from scratch, just for the fun of it, may just be the tonic for a mild dose of mid-career blues. Hopefully, that sense of wonder may return. I sure hope so.

I’ll let you know :)

The post Back to Basics appeared first on Openxtra Tech Teapot.

by Jack Hughes at September 17, 2014 12:57 PM

Server Density

Photography for tech startups

After having released the new homepage for our server monitoring tool, I thought it’d be a nice idea to share the creative decisions we made, with a focus on photography. It’s not a particularly new idea for a tech company to invest in design, but it is unusual for photography – especially considering we only had a small budget. With that in mind (and a homepage in the balance), we needed to get it right the first time around.

Why photography?

As well as being a great way to digest content (a million words and all that jazz), photography is the number one way to sell a product online. If you take a look at Amazon, Apple or even Google Glass, the way they convince you to hand over your credit card details is with pictures. Text reinforces the boring bits that you need to know, but images sell the products.

Now if you transfer this to the startup scene, although companies are starting to sell their software in the setting of how it will be consumed, they tend to be the big ones with endless marketing budgets. Couple the cost with the fact that a software company has nothing tangible to photograph and you start to realise why few startup marketers ‘expose’ themselves to the alien world of photography and art direction.

Despite the obvious difficulties we faced, we wanted to use photos because they’re a great way to link our product to real life, and the many ways in which our customers use our product. Instead of using screenshots to reinforce that message, setting the screenshots within a context that the visitor might be familiar with is a much better way of communicating the benefits of using Server Density. A whole lot more information gets transferred, not to mention they’d look nice… Hopefully!

As it was my first ever time being ‘creative director’ of a photo shoot, it was a nervous time. There was a lot of money riding on the photos and, let’s be honest, the success or failure of our redesign. Here are a few of the major stumbling blocks I came across, along with how I dealt with them.

The cost

The major downside of focussing designs on beautiful photos is that you first have to take those photos. Unless one of your team is also a still life photographer with 5+ years of experience, you’ll be paying a pretty penny for it. Or you’ll be getting some pretty amateur looking photographs. So first you have to find a great photographer, for a reasonable price.

Enter awesome photographer

This part was rather difficult in my head, but quite simple in practice. As we’re based in London, we needed to find someone local to us. I therefore trawled online for a few hours to find a selection of good photographers. After having a better look through portfolios we ended up with our favourite, whom I contacted. The key things that came out of our multiple phone conversations were:

  • We needed a relaxed photographer who could roll with the punches.
  • We needed them to bring both indoor and outdoor equipment.

Plan meticulously

So the idea here is to plan every detail before the photographer arrives. All they should then have to do is spend an hour sorting out the lighting and then a few clicks on the camera. In our case, and in a lot of others in the startup world, the photographs need text to be overlaid onto the images. With that in mind, I gave our photographer a visual reference of how the headings / navigation / call to action would be formatted so he understood the distribution of whitespace that was needed.

Startup photo - whitespace

That was an important step, because these images need to fit around the content, not the other way round.

Storyboard the photographs

With that complete, it became about storyboarding. This provided the photographer and our team an action plan for the afternoon and a guide for us to use as a reference throughout the day. We stuck to it meticulously:

startup storyboards

Scout the locations

We made sure that we visited each of the locations prior to the day, and that they were suitable for the photograph that we wanted. The two processes weren’t mutually exclusive, and the ideas from the storyboards derived from what we felt was possible. We needed to plan and capture 3 photos that fit our timeframe, which meant they all needed to be located close to our office in Chiswick, London.

Here we are in the park the day before the shoot, picking the best location:

location-scouting-startup

The photographs

We wanted to capture three distinct use cases. Monitoring dashboards displayed on a screen in an office; diagnosing / fixing infrastructure problems at a desk; and getting a monitoring alert on the go.

Looking at these uses of Server Density as passive, active and reactive helped with creating a suitable environment for each to be shot in.

Server Monitoring Dashboard

This photo was about showing off how you can use Server Density in your office environment. For this one we wanted to create a hipster feel, with an interior brick wall that gave the impression of creative as opposed to clinical. We found a white brick wall that (until a few weeks previously, when the roof had been knocked off) had been an interior wall. Here is how it looked to any passers-by:

Behind the scenes dashboard

Server Monitoring Graphs

The graphing photograph was about showing Server Density off in a more clinical environment of getting the problem fixed. We did this in our office, where we had all the appropriate equipment. We found a nice wall, moved the office around for a few hours and generally caused chaos in an otherwise rather peaceful working environment.

Oh and I almost forgot about the pièce de résistance – The Deathstar! The end result of this was also a hat-tip to our friends at New Relic, who used a similar homepage image a while ago. Of course, we added our own local touches like coffee from our local shop and our custom dot notebooks.

Server monitoring graphs

Server Monitoring Alerts

Finally it was important to show how you might receive an alert in your day-to-day life as a devops practitioner or sysadmin. There’s no better change of scenery than a park, and we’re lucky enough to have an office a minute’s walk from greenery.

We also wanted to see the underground in the distance as a reference to London, and to emphasise that you’re away from the hustle and bustle, but still able to stay in control, and on top of your infrastructure.

Monitoring Alerts behind the scenes

A little bit of luck

I consider us pretty fortunate (or maybe it was good judgement) to have ended up with the photographer we did. He was incredibly knowledgeable, shared our vision and was more than happy to problem solve along the way. I think this was the real make-or-break factor of this project.

We were also lucky in the sense that we didn’t get rain on the day, which was important considering two of our photographs were taken outside. We did however capitalise on the good weather by timing our shoot correctly. For example, we spent the early afternoon getting the park shot done in the clear daylight; we then completed the office shot, which was the easiest; and then had enough time to finish the dashboard shot as the sun was going down. As you might be able to see from the previous image, the shadows on the walls would have been a problem any earlier in the day.

The editing

Now I don’t want to destroy the smoke and mirrors that we’ve so eloquently created, but our photos did have to be edited. Particularly the dashboard shot which:

  1. Wasn’t shot inside
  2. Wasn’t mounted on the wall
  3. Didn’t have a lightbulb hanging

Also naturally all of the screenshots were added in at the end of the process so that we can update them going forward as our interface gets tweaked.

The big unveil

In the interest of keeping the page load of this blog post less obscene than it already is, you can take a look at the final images on our updated homepage.

The legal side

Interestingly we hit a little bit of a stumbling block at the final stage when signing off the pictures. Contrary to what I thought, it isn’t standard for you to own the full copyright of the images once they have been completed. The photographer usually licenses them out to you for an arranged timeframe and predefined use.

Naively, this is something that should have been sorted out with our photographer prior to commissioning the photographs; terms of use and length of license should have been talked through more formally than they were. However, thankfully our photographer was one of those good old-fashioned English gents for whom a handshake was as good as a signature, so all was fine in the end and we completed the contract after having completed the photos.

After our 10 year license runs out, we’ll have to talk to him about renewing on separate terms / perhaps redoing the pictures. Goodness knows how big the iPhone will be in 2024!

The post Photography for tech startups appeared first on Server Density Blog.

by Rufus at September 17, 2014 11:27 AM

Daniel E. Markle

2014 Lake Hope Rebel Rally

After a hiatus last year due to work, I made it once more to the Lake Hope Rebel Rally this year. Having waited to reserve until after the cabins had filled, I decided to double up on adventure and make this my first motorcycle camping trip.


Continue reading "2014 Lake Hope Rebel Rally"

by dmarkle@ashtech.net (Daniel E. Markle) at September 17, 2014 10:51 AM

Raymii.org

Boot to Vim, Vim as Pid 1

This is a response to a great article from Pascal Bourguignon, namely how to run Emacs as PID 1. As we all know, nobody uses emacs. No, all joking aside, I found it to be a good article and wanted to see how I could do the same with Vim. Not in User Mode Linux, but by creating an actual ISO. Boot to Vim, as you might want to call it. This is actually fairly simple: compile Vim statically, set it as init= at boot and you're done. We are going to use a small 9 MB distro named Tiny Core (Core edition) and customize it to boot right into our static build of Vim.
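
To give a flavour of the remaster step, here is a minimal sketch (the paths are placeholders and the exact flags are my assumption, not necessarily what the full article uses): unpack the Tiny Core initramfs, drop a statically linked vim binary in, repack it, and point the bootloader's init= at it.

# Unpack the Tiny Core initramfs, add the static vim binary, and repack it.
mkdir /tmp/core && cd /tmp/core
zcat /path/to/core.gz | cpio -i -H newc -d
cp /path/to/static/vim ./vim
find . | cpio -o -H newc | gzip -2 > /tmp/core-vim.gz

# In the ISO's isolinux.cfg, boot straight into Vim instead of a normal init:
#   APPEND initrd=/boot/core-vim.gz init=/vim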

September 17, 2014 07:29 AM

Yellow Bricks

Queue Depth info in the VSAN HCL!


I just noticed there has been an update to the VSAN HCL. When I now do a search for a disk controller (vmwa.re/vsanhcl) it immediately shows the queue depth of the controller. This will make life a lot easier, especially for those who prefer to build their own Virtual SAN node instead of using a Ready Node configuration. Although it is just a minor detail, it is useful to know and will definitely simplify configuring your component-built Virtual SAN nodes.

"Queue Depth info in the VSAN HCL!" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.


Pre-order my upcoming book Essential Virtual SAN via Pearson today!

by Duncan Epping at September 17, 2014 07:10 AM

Chris Siebenmann

In praise of Solaris's pfiles command

I'm sure that at one point I was introduced to pfiles through a description that called it the Solaris version of lsof for a single process. This is true as far as it goes and I'm certain that I used pfiles as nothing more than this for a long time, but it understates what pfiles can do for you. This is because pfiles will give you a fair amount more information than lsof will, and much of that information is useful stuff to know.

Like lsof, pfiles will generally report what a file descriptor maps to (file, device, network connection, and Solaris IPC 'doors', often with information about what process is on the other end of the door). Unlike on some systems, the pfiles information is good enough to let you track down who is on the other end of Unix domain sockets and pipes. Socket endpoints are usually reported directly; pipe information generally takes cross-correlating with other processes to see who else has an S_IFIFO with the specific ino open.

(You would think that getting information on the destination of Unix domain sockets would be basic information, but on some systems it can take terrible hacks.)

Pfiles will also report some state information for sockets, like the socket flags and the send and receive buffers. Personally I don't find this deeply useful and I wish that pfiles also showed things like the TCP window and ACK state. Fortunately you can get this protocol information with 'netstat -f inet -P tcp' or 'netstat -v -f inet -P tcp' (if you want lots of details).

Going beyond this lsof-like information, pfiles will also report various fcntl() and open() flags for the file descriptor. This will give you basic information like the FD's read/write status, but it goes beyond this; for example, you can immediately see whether or not a process has its sockets open in non-blocking mode (which can be important). This is often stuff that is not reported by other tools and having it handy can save you from needing deep dives with DTrace, a debugger, or the program source code.
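
For example (a hypothetical session; the process name is made up for illustration), the two commands complement each other like this:

# Per-descriptor view: FD type, open()/fcntl() flags such as O_NONBLOCK, endpoints.
pfiles $(pgrep amandad)

# Protocol-level view (state, windows, retransmits) for the same sockets:
netstat -v -f inet -P tcp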

(I'm sensitive to several of these issues because my recent Amanda troubleshooting left me needing to chart out the flow of pipes and to know whether some sockets were nonblocking or not. I could also have done with information on TCP window sizes at the time, but I didn't find the netstat stuff until just now. That's how it goes sometimes.)

by cks at September 17, 2014 06:04 AM

Raymii.org

Statically (cross) compiled vim for x86, x86-64 and mipsel

Sometimes I need to manage a few systems with either low resources or a very restricted set of packages. On those systems no compilers or development libraries are available, however it is allowed to bring binaries. A few of those systems are 32 bit x86 systems; some are MIPS systems, which is even worse. They serve a secure purpose and I cannot go into much detail about them, except that they require a high level of security and they process certificates. I really like vim as my editor; the only editor available by default on those systems is ed. I have an ed cheatsheet for this purpose. The solution for this problem is to create a statically (cross) compiled version of vim. This article shows you how to create a statically compiled vim that runs everywhere.
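
As a rough sketch of the idea (the toolchain prefix and feature flags below are assumptions rather than the article's exact recipe):

# Cross compile a static vim for mipsel; substitute your real toolchain prefix.
cd vim/src
CC=mipsel-linux-gnu-gcc LDFLAGS="-static" \
    ./configure --host=mipsel-linux-gnu --with-features=normal \
    --disable-gui --without-x
# A cross configure cannot run its test programs, so expect to pre-seed a handful
# of vim_cv_* cache variables (terminfo, getcwd, memmove and so on) when prompted.
make
file vim    # should report a statically linked MIPS executable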

September 17, 2014 12:00 AM

September 16, 2014

Ubuntu Geek

Rexloader – A cross-platform download manager

This project is an advanced cross-platform (Qt/C++) download manager over http(s) with configurable multithreaded downloading, proxy support, logging and hash calculating (md5, sha1). We also plan to implement support for ftp and p2p (torrent, dc++ etc).
(...)
Read the rest of Rexloader – A cross-platform download manager (23 words)



by ruchi at September 16, 2014 11:31 PM

Standalone Sysadmin

Nagios Config Howto Followup

One of the most widely read stories I've ever posted on this blog is my Nagios Configuration HowTo, where I explained how I set up my Nagios config at a former employer. I still think that it's a good layout to use if you're manually building Nagios configs. In my current position, we have a small manual setup and a humongous automated monitoring setup. We're moving toward a completely automated monitoring config using Puppet and Nagios, but until everything is puppetized, some of it needs to be hand-crafted, bespoke monitoring.

For people who don't have a single 'source of truth' in their infrastructure that they can draw monitoring config from, hand-crafting is still the way to go, and if you're going to do it, you might as well not drive yourself insane. For that, you need to take advantage of the layers of abstraction in Nagios and the built-in object inheritance that it offers.

Every once in a while, new content gets posted that refers back to my Config HowTo, and I get a bump in visits, which is cool. Occasionally, I'll get someone who is interested and asks questions, which is what happened in this thread on Reddit. /u/sfrazer pointed to my config as something that he references when making Nagios configs (Thanks!), and the original submitter replied:

I've read that write up a couple of times. My configuration of Nagios doesn't have an objects, this is what it looks like


And to understand what you are saying, just by putting them in the file structure you have in your HowTo that will create an inheritance?

I wanted to help him understand how Nagios inheritance works, so I wrote a relatively long response, and I thought that it might also help other people who still need to do this kind of thing:


 

No, the directories are just to help remember what is what, and so you don't have a single directory with hundreds of files.

What creates the inheritance is this:

You start out with a host template:

msimmons@nagios:/usr/local/nagios/etc/objects$ cat generic-host.cfg
define host {
    name generic-host
    notifications_enabled   1
    event_handler_enabled   1
    flap_detection_enabled  1
    failure_prediction_enabled  1
    process_perf_data   1
    retain_status_information   1
    retain_nonstatus_information    1
    max_check_attempts 3
    notification_period 24x7
    contact_groups systems
    check_command check-host-alive
    register 0
}
# EOF

So, what you can see there is that I have a host named "generic-host" with a bunch of settings, and "register 0". The reason I have this is that I don't want to have to set all of those settings for every other host I make. That's WAY too much redundancy. Those settings will almost never change (and if we do have a specific host that needs to have the setting changed, we can do it on that host).

Once we have generic-host, let's make a 'generic-linux' host template that the Linux machines can use:

msimmons@monitoring:/usr/local/nagios/etc/objects/linux$ cat generic-linux.cfg 
define host { 
    name     linux-server
    use generic-host
    check_period    24x7
    check_interval  5
    retry_interval  1
    max_check_attempts  5
    check_command   check-host-alive
    notification_interval 1440
    contact_groups  systems
    hostgroups linux-servers
    register 0
}

define hostgroup {
    hostgroup_name linux-servers
    alias Linux Servers
}
# EOF

Alright, so you see we have two things there. A host, named 'linux-server', and you can see that it inherits from 'generic-host'. I then set some of the settings specific to the monitoring host that I'm using (for instance, you probably don't want notification_interval 1440, because that's WAY too long for most people - a whole day would go between Nagios notifications!). The point is that I set a bunch of default host settings in 'generic-host', then did more specific things in 'linux-server' which inherited the settings from 'generic-host'. And we made it 'register 0', which means it's not a "real" host, it's a template. Also, and this is important, you'll see that we set 'hostgroups linux-servers'. This means that any host we make that inherits from 'linux-server' will automatically be added to the 'linux-servers' hostgroup.

Right below that, we create the linux-servers hostgroup. We aren't listing any machines. We're creating an empty hostgroup because, remember, every host that uses the linux-server template will automatically become a member of this group.

Alright, you'll notice that we don't have any "real" hosts yet. We're not going to add any yet, either. Let's do some services first.

msimmons@monitoring:/usr/local/nagios/etc/objects$ cat check-ssh.cfg
define command{
   command_name   check_ssh
   command_line   $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
   }
# EOF

This is a short file which creates a command called "check_ssh". This isn't specific to Linux or anything else; it could be used by anything that needs to verify that SSH is running. Now, let's build a service that uses it:

msimmons@monitoring:/usr/local/nagios/etc/objects/services$ cat generic-service.cfg 
define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            2
        contact_groups                  systems
        notification_options            w,u,c,r
        notification_interval           1440
        notification_period             24x7
        register                        0
}
# EOF

This is just a generic service template with sane settings for my environment. Again, you'll want to use something good for yours. Now, something that will inherit from generic-service:

msimmons@monitoring:/usr/local/nagios/etc/objects/linux$ cat linux-ssh.cfg
define service { 
    use generic-service
    service_description Linux SSH Enabled
    hostgroup_name linux-servers
    check_command check_ssh 
}
# EOF

Now we have a service "Linux SSH Enabled". This uses check_ssh, and (importantly), 'hostgroup_name linux-servers' means "Every machine that is a member of the hostgroup 'linux-servers' automatically gets this service check".

Let's do the same thing with ping:

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

define service {
    use generic-service
    service_description Linux Ping
    hostgroup_name linux-servers
    check_command check_ping!3000.0,80%!5000.0,100%
}

Sweet. (If you're wondering about the exclamation marks on the check_ping line in the Linux Ping service, we're sending those arguments to the command, which you can see set the warning and critical thresholds).

Now, lets add our first host:

msimmons@monitoring:/usr/local/nagios/etc/objects/linux$ cat mylinuxserver.mycompany.com.cfg 
define host{
       use linux-server
       host_name myLinuxServer.mycompany.com
       address my.ip.address.here
}

That's it! I set the host name, I set the IP address, and I say "use linux-server" so that it automatically gets all of the "linux-server" settings, including belonging to the linux host group, which makes sure that it automatically gets assigned all of the Linux service checks. Ta-Da!
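
As a side note (beyond the inheritance itself), whenever you change configs laid out like this it's worth letting Nagios verify the whole object tree before reloading, since it will catch a template, hostgroup or command you referenced but never defined. Something along these lines, using the same /usr/local/nagios prefix as above (adjust the reload command to your init system):

# Sanity-check the object configuration; only reload if it verifies cleanly.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg \
    && /etc/init.d/nagios reload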

Hopefully this can help people see the value in arranging configs like this. If you have any questions, please let me know via comments. I'll be happy to explain! Thanks!

 

 

by Matt Simmons at September 16, 2014 10:22 PM

TaoSecurity

We Need More Than Penetration Testing

Last week I read an article titled  People too trusting when it comes to their cybersecurity, experts say by Roy Wenzl of The Wichita Eagle. The following caught my eye and prompted this post:

[Connor] Brewer is a 19-year-old sophomore at Butler Community College, a self-described loner and tech geek...

Today he’s what technologists call a white-hat hacker, hacking legally for companies that pay to find their own security holes. 

When Bill Young, Butler’s chief information security officer, went looking for a white-hat hacker, he hired Brewer, though Brewer has yet to complete his associate’s degree at Butler...

Butler’s security system comes under attack several times a week, Young said...

Brewer and others like him are hired by companies to deliberately attack a company’s security network. These companies pay bounties if the white hackers find security holes. “Pen testing,” they call it, for “penetration testing.”

Young has repeatedly assigned Brewer to hack into Butler’s computer system. “He finds security problems,” Young said. “And I patch them.”

On the face of it, this sounds like a win-win story. A young white hat hacker does something he enjoys, and his community college benefits from his expertise to defend itself.

My concern with this article is the final sentence:

Young has repeatedly assigned Brewer to hack into Butler’s computer system. “He finds security problems,” Young said. “And I patch them.”

This article does not mention whether Butler's CISO spends any time looking for intruders who have already compromised his organization. Finding security problems and patching them is only one step in the security process.

I still believe that the two best words ever uttered by Bruce Schneier were "monitor first," and I worry that organizations like those in this article are patching holes while intruders maneuver around them within the compromised network.

by Richard Bejtlich (noreply@blogger.com) at September 16, 2014 01:47 PM

A Brief History of Network Security Monitoring

Last week I was pleased to deliver the keynote at the first Security Onion Conference in Augusta, GA, organized and hosted by Doug Burks. This was probably my favorite security event of the year, attended by many fans of Security Onion and the network security monitoring (NSM) community.

Doug asked me to present the history of NSM. To convey some of the milestones in the development of this operational methodology, I developed these slides (pdf). They are all images, screen captures, and the like, but I promised to post them. For example, the image at left is the first slide from a Webinar that Bamm Visscher and I delivered on 4 December 2002, where we presented the formal definition of NSM the first time. We defined network security monitoring as

the collection, analysis, and escalation of indications and warnings to detect and respond to intrusions.

You may recognize similarities with the intelligence cycle and John Boyd's Observe - Orient - Decide Act (OODA) loop. That is not an accident.

During the presentation I noted a few key years and events:

  • 1986: The Cliff Stoll intrusions scare the government, military, and universities supporting gov and mil research.
  • 1988: Lawrence Livermore National Lab funds three security projects at UC Davis by supporting Prof Karl Levitt's computer science lab. They include AV software, a "security profile inspector," and the "network security monitor."
  • 1988-1990: Todd Heberlein and colleagues code and write about the NSM platform.
  • 1991: While instrumenting a DISA location suffering from excessive bandwidth usage, NSM discovers 80% of the clogged link is caused by intruder activity.
  • 1992: Former FBI Director, then assistant AG, Robert Mueller writes a letter to NIST warning that NSM might not be legal.
  • 1 October 1992: AFCERT founded.
  • 10 September 1993: AFIWC founded.
  • End of 1995: 26 Air Force sites instrumented by NSM.
  • End of 1996: 55 Air Force sites instrumented by NSM.
  • End of 1997: Over 100 Air Force sites instrumented by NSM.
  • 1999: Melissa worm prompts AFCERT to develop dedicated anti-malware team. This signaled a shift from detection of human adversaries interacting with victims to detection of mindless code interacting with victims.
  • 2001: Bamm Visscher deploys SPREG, the predecessor to Sguil, at our MSSP at Ball Aerospace.
  • 13 July 2001: Using SPREG, one of our analysts detects Code Red, 6 days prior to the public outbreak. I send a note to a mailing list on 15 July.
  • February 2003: Bamm Visscher recodes and releases Sguil as an open source NSM console.

As I noted in my presentation, the purpose of the talk was to share the fact that NSM has a long history, some of which happened when many practitioners (including myself) were still in school.

This is not a complete history, either. For more information, please see my 2007 post Network Security Monitoring History and the foreword, written by Todd Heberlein, of my newest book The Practice of Network Security Monitoring.

Finally, I wanted to emphasize that NSM is not just full packet capture or logging full content data. NSM is a process, although my latest book defines seven types of NSM data. One of those data types is full content. You can read about all of them in the first chapter of my book at the publisher's Web site.

by Richard Bejtlich (noreply@blogger.com) at September 16, 2014 01:11 PM

Yellow Bricks

EVO:RAIL engineering interview with Dave Shanley (Lead Dev)


A couple of weeks ago we launched EVO:RAIL, a new VMware solution. I have been part of this since the very beginning; the prototype project started with just Dave and myself as the prototype team, with Mornay van der Walt as the executive sponsor (an interview with Mornay will follow shortly, as this project involves many different disciplines). After Dave developed the initial UI mock-ups and we worked on the conceptual architecture, Dave started developing what then became known internally as MARVIN. If my memory serves me correctly, it was our director at Integration Engineering (Adam Z.) who came up with the name and acronym (Modular Automated Rackable Virtual Infrastructure Node). All was done under the umbrella of Integration Engineering, in stealth mode, with a very small team. I guess something not a lot of people know is that William Lam, for instance, was very instrumental when it came to figuring out in which order to configure what (a lot of dependencies, as you can imagine) and which API calls to use for what. After a couple of months things really started to shape up, the prototype was demoed to C level, and before we realized it a new team was formed and gears shifted.

Personally, whenever I talk to start-ups I like to know where they came from, what they’ve done in the past and how things came about, as that gives me a better understanding of why the product is what it is. The same applies to EVO:RAIL, so there is no better place to start than with the lead developer and founding team member, Dave Shanley.

Good morning Dave, as not all of my readers will know who you are and what you did before joining the EVO:RAIL team can you please introduce yourself.
I’m the lead engineer, designer and software architect of the EVO:RAIL platform. I joined VMware about two and a half years ago. I started out in Integration Engineering, I got to see and experience a lot of the frustration that is often seen when trying to install, configure and integrate our technology. I’ve pretty much worked in web application engineering my entire career that has given me a really broad experience across consumer and enterprise technology. Before VMware I was the CTO of a really cool VC funded start-up in the UK as well as being the lead engineer over at McCann Erickson’s EMEA HQ.

One of the key messages in all EVO:RAIL sessions and discussions at VMworld was simplicity, what does simplicity mean to you?
Well, in my personal opinion, simplicity means an experience that feels natural and allows you to engage in it without having to think about it. It’s an intrinsic experience. ‘Don’t make me think’ is my professional mantra – I have always disliked systems that require training up front in order to be understood. Virtualization is extremely complex and our products are wildly powerful – but just because something is powerful and complex, it doesn’t mean it should leave you scratching your head.

Would you say not being over exposed to VMware products has helped shaping EVO:RAIL?
Well I would say that I spent a long time watching and observing and listening. I watched how we deployed our technology and consumed it (I built the online beta lab systems as well as designed the VMworld lab systems a couple of years ago). I saw and experienced a lot of the friction points that leave people dropping down to CLI commands and watching the wire. It was then that I started thinking about ‘How would I change this if I had the chance to re-architect these processes?’ After that I literally threw away the rulebook, ignored everything we had done in the past and started from scratch to prototype something that would allow me to configure and manage this technology without any pain and without having to think about it.

If I look at the EVO:RAIL engine I cannot compare it to any VMware product out there; it seems to deviate from some of the VMware standards, like for instance the use of Flash. How come?
In order to invoke change, you have to upset the status quo. I deliberately decided to ignore every single VMware standard in order to build something that was truly a revolutionary experience compared with what we have created before. You can’t do that when you start inside a box. I was lucky enough to be given free rein to break eggs and rip up the rule book. I couldn’t have done it without Mornay Van Der Walt, VP of Emerging Solutions and my boss. He protected me from every internal process that tried to stop me and my team in our tracks.

Okay, lets get a bit more in the weeds. The User Experience is truly unique for an enterprise solution in my opinion, can you go over some of the unique features in the configuration UI?
One of the very first choices I ever made was to implement automatic saving and validation of everything a user does during configuration. This means there are no save buttons anywhere, they just don’t exist – and the first time you start typing you will instantly know your work is being saved and validated. You don’t have to think about it. Another nice feature is that the configuration UI is not a wizard, it’s more like a preferences system – so you can move in and out of each different category without feeling like you’re lost or you can’t remember what you selected. You can configure networks by creating IP pools (this was actually your idea, Duncan) so you don’t have to configure individual IP addresses. We also make DNS config as simple as it can be, no zone files or reverse lookups. NTP, Syslog and Timezone can all be set from a single place, and it even installs, configures and licenses vCenter Log Insight as well.

The Management interface also allows you to deploy virtual machines. Is the EVO:RAIL UI intended to be a Web Client replacement. Is the Web Client even supported?
No, the EVO:RAIL management UI is not intended as a replacement. The Web Client is a very rich and powerful tool that unlocks full access to the magnitude of power that vSphere provides. The Web Client has its challenges but it’s still our primary management platform. EVO:RAIL’s management platform was designed to provide a much more lightweight view to manage your VM’s and get a simple overview of the health of your appliance/cluster as well as enjoy an extremely simple and hands off licensing and patch/upgrade experience. It’s designed for people who don’t necessarily want or need to use the Web Client for simple VM management and creation. Some administrators will have no need for one, or the other (or both). The option is there however if you want either the heavyweight or lightweight experience. You can jump straight over to the Web Client from the EVO:RAIL UI when ever you want or need more power and control.

I know there was an early access program for EVO:RAIL, how did customers respond to the simplified interface?
It was a really interesting mixed bag but with a few universal consistencies. Every customer loved the simplicity, cleanliness, responsiveness and the experience of the UI. Some wanted some more control than the first iteration of our product facilitates, some customers loved one part of the UI but had little need for another. The general consensus was extremely positive and felt like we’d got it right and we’re heading in the right direction. As far as I am aware, there has been little to no negative feedback regarding the UI.

The UI also seems to make a lot of choices for you, for instance there are VM sizing templates but also VM security templates. How did you go about sizing?
When I want to build a VM, I normally think in three sizes: Small, Medium and Large (well, four if I need a micro/appliance type VM, but those are not common workflows). The sizes however are relative to the guest OS that I am creating. So a small Linux VM needs very different resources than a small Windows Server 2012 VM. So the templates are dependent on that guest OS – this means you get a specification that we feel is best practice for that particular operating system. The security policies are actually advanced virtual machine settings taken from risk profiles. The risk profiles are a part of the vSphere Security Hardening Guide. William Lam came up with the security profiles feature based on his years of experience and he also wrote the VM size specifications per guest OS, which were then tweaked by you (Duncan Epping). So I’m more than confident with our templates and profile specifications. They have been defined by the best.

I personally particularly liked the simplicity of the monitoring interface. Can we expect more integration with the hardware platform soon?
We’re definitely looking into that, the goal is to build a platform that allows our partners to expose some of the magic of their own hardware and give customers a more tailored experience. We’re working on it, but that’s about all I can say at the moment.

If you can talk about it, what is the focus for EVO:RAIL from an “experience” point of view for the upcoming releases?
Well, I’m going to keep that under wraps for now, we’re getting a metric ton of great feedback that needs a lot of careful discussion. As I mentioned before, my goal is to not pack in endless features, it’s about simplifying the experience of the entire technology stack and bringing enjoyment and confidence back to enterprise software. I can tell you however that this is just the start – We built our first iteration of the product with a tiny team of just 6 engineers (including myself). We did it all ourselves (including all the design) in just 8 months. Imagine what we’re going to be able to create as we grow. We proved we can do it and that we can do it to a world class level – now it’s time to bring out the big guns and really start getting creative.

"EVO:RAIL engineering interview with Dave Shanley (Lead Dev)" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.


Pre-order my upcoming book Essential Virtual SAN via Pearson today!

by Duncan Epping at September 16, 2014 11:56 AM

Chris Siebenmann

My collection of spam and the spread of SMTP TLS

One of the things that my sinkhole SMTP server does that's new on my workstation is that it supports TLS, unlike my old real mail server there (which dates from a very, very long time ago). This has given me the chance to see how much of my incoming spam is delivered with TLS, which in turn has sparked some thoughts about the spread of SMTP TLS.

The starting point is that a surprising amount of my incoming spam is actually delivered with TLS; right now about 30% of the successful deliveries have used TLS. This is somewhat more striking than it sounds for two reasons; first, the Go TLS code I'm relying on for TLS is incomplete (and thus not all TLS-capable sending MTAs can actually do TLS with it), and second a certain amount of the TLS connection attempts fail because the sending MTA is offering an invalid client certificate.

(I also see a fair number of rejected delivery attempts in my SMTP command log that did negotiate TLS, but the stats there are somewhat tangled and I'm not going to try to summarize them.)

While there are some persistent spammers, most of the incoming email is your typical advance fee fraud and phish spam that's sent through various sorts of compromised places. Much of the TLS email I get is this boring sort of spam, somewhat to my surprise. My prejudice is that a fair amount of this spam comes from old and neglected machines, which are exactly the machines that I would expect are least likely to do TLS.

(Some amount of such spam comes from compromised accounts at places like universities, which can and do happen to even modern and well run MTAs. I'm not surprised when they use TLS.)

What this says to me is that support for initiating TLS is fairly widespread in MTAs, even relatively old MTAs, and fairly well used. This is good news (it's now clear that pervasive encryption of traffic on the Internet is a good thing, even casual opportunistic encryption). I suspect that it's happened because common MTAs have enabled client TLS by default and the reason they've been able to do that is that it basically takes no configuration and almost always works.

(It's clear that at least some client MTAs take note when STARTTLS fails and don't try it again even if the server MTA offers it to them, because I see exactly this pattern in my SMTP logs from some clients.)
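
(As a side note, if you want to check by hand whether a particular server MTA offers STARTTLS and what certificate it presents, something like the following works; the hostname is a placeholder.)

# Speak SMTP, ask for STARTTLS, and show the negotiated TLS details.
openssl s_client -starttls smtp -connect mail.example.org:25 -crlf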

PS: you might wonder if persistent spammers use TLS when delivering their spam. I haven't done a systematic measurement for various reasons but on anecdotal spot checks it appears that my collection of them basically doesn't use TLS. This is probably unsurprising since TLS does take some extra work and CPU. I suspect that spammers may start switching if TLS becomes something that spam filtering systems use as a trust signal, just as some of them have started advertising DKIM signatures.

by cks at September 16, 2014 03:26 AM

September 15, 2014

Everything Sysadmin

Brewster Rockit explains network latency.

Sunday's "Brewster Rockit" comic strip explained bandwidth vs. latency better than I've ever seen is some text books:

When I interview anyone for a technical position I always ask them to explain the difference between bandwidth and latency. It is an important concept, especially in today's networked world. Years ago most candidates didn't know the difference. Lately most candidates I interview know the difference, but have a difficult time putting it into words. Fewer can explain it in a mathematical or scientific way.

Latency is how long information takes to get from one place to another. Bandwidth is how much data per second is sent. Suppose you are querying a database that is very far away. The time it takes the query to go to the server and for the reply to come back could be very long (say, 1 second). If you do 1,000 database queries and each time wait for the query to complete before moving on to the next one, the task will take 1,000 seconds. If you increase the bandwidth between the two points the speed improvement will be nil or next to nil because the time it takes for the information to travel outweighs the transmission time. However if you send all your queries one after the next, without waiting for the replies, then wait and catch all the replies as they arrive, the entire list of 1,000 queries might happen in a matter of seconds. However, if you can't do the second query until you have information from the first query (say, the first query looks up a person's ID number and the second query uses the ID number as part of the query) you have to find some other optimization or you will simply have to wait 1,000 seconds for the entire process to run.
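
A rough sketch of the two approaches (run_query is a stand-in for whatever client command you actually use, so treat this as an illustration rather than a benchmark):

# Serialized: each query waits out a full ~1 second round trip before the next
# one starts, so 1,000 queries take roughly 1,000 seconds regardless of bandwidth.
for i in $(seq 1 1000); do
    run_query "$i"
done

# Pipelined: send everything, then collect replies as they arrive; total time is
# roughly one round trip plus transfer time, so bandwidth now matters.
for i in $(seq 1 1000); do
    run_query "$i" &
done
wait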

One situation I give the candidate is that they are dealing with a developer or manager that can't understand why doubling the bandwidth between the NYC office and Australia won't improve performance of a particular technical issue. Whether the candidate is a software developer, operations engineer/sysadmin, or desktop support engineer, there is a good chance they will find themselves in this situation. I roleplay as the person that needs the explanation and ask dumb questions. The candidate should be able to give a reasonable explanation in a couple minutes.

Maybe next time I should just ask them if they read Brewster Rockit.

September 15, 2014 02:28 PM

Safari Books Online update

Previously Safari Books Online (the O'Reilly thing... not the Apple thing) had a rough draft of The Practice of Cloud System Administration. Now it has the final version:

http://my.safaribooksonline.com/9780133478549

Enjoy!

September 15, 2014 02:28 PM


Administered by Joe. Content copyright by their respective authors.