While testing the new Sun Fire T2000 and Solaris 10 zones, a user deliberately tried to break a zone to find out whether doing so would bring down the whole machine. Essentially, the user followed the "Solaris Containers Consolidating Servers and Applications" guide and ...

  • Created pset0 with a minimum and maximum of 1 CPU
  • Created pool0 and associated pset0 with it
  • Created zone0 and assigned pool0 to it
  • Installed and booted zone0
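  • Ran the following C code in zone0:
    #include <unistd.h>   /* fork() */
    /* Fork bomb: every process, parent and child alike, loops forever
       calling fork(), so the number of processes grows exponentially. */
    int main(void) {
        while (1) {
            fork();   /* return value deliberately ignored */
        }
    }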

Now, this is a very harsh test that will break (read: grind to a halt) pretty much any machine you throw it at. In this case it broke the zone almost instantly, but the user still had control of the global zone for a couple of minutes, during which he managed to capture the load average:

10:03pm up 9:56, 1 user, load average: 29055.02, 21988.81, 10829.24

How's that for impressive? The machine was crawling, but technically still responsive, with a 1-minute load average of nearly 30,000. Eventually the global zone gave up the ghost too.

If you're interested, the only reason the zone brought down the whole box, when it shouldn't have, was that the user didn't enable the Fair Share Scheduler (FSS) for the resource pool he created (he skipped one of the sections in the document). It would probably also have been a good idea to limit the number of LWPs the zone could create, for example with the zone.max-lwps resource control.

If you're not a *nix user, and a Solaris 10 user in particular, this post probably won't mean much to you. Sorry.