IT'S ALIVE! IT'S ALIVE! Google's secretive Omega tech just like LIVING thing

One of Google's most advanced data center systems behaves more like a living thing than a tightly controlled provisioning system. This has huge implications for how large clusters of IT resources are going to be managed in the future.
...
"Strict enforcement of [cluster-wide] behaviors can be achieved with centralized control, but it is also possible to rely on emergent behaviors to approximate the desired behavior," Google wrote in an academic paper [PDF] that evaluated the performance of Omega against other systems.

By handing off job scheduling and management to Omega and Borg, Google has figured out a way to get the best performance out of its data centers, but this comes at the cost of increased randomness at scale.

"What if the number of workers could be chosen automatically if additional resources were available, so that jobs could complete sooner?" Google wrote in the paper. "Our specialized [Omega] MapReduce scheduler does just this by opportunistically using idle cluster resources to speed up MapReduce jobs. It observes the overall resource utilization in the cluster, predicts the benefits of scaling up current and pending MapReduce jobs, and apportions some fraction of the unused resources across those jobs according to some policy."
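The observe, predict, and apportion loop that Google describes can be illustrated with a small sketch. This is not Omega's code: the function name `apportion_idle`, the per-job "predicted benefit" scores, the `fraction` parameter, and the proportional-share policy are all assumptions made up for illustration, since the paper only says the resources are split "according to some policy."

```python
def apportion_idle(idle_workers, predicted_benefit, fraction=0.5):
    """Split a fraction of idle worker slots across jobs.

    idle_workers: number of unused worker slots observed in the cluster.
    predicted_benefit: dict of job name -> non-negative score estimating how
        much the job would gain from extra workers (a hypothetical input).
    fraction: share of the idle slots the scheduler is willing to hand out.
    Returns a dict of job name -> extra workers granted.
    """
    budget = int(idle_workers * fraction)
    total = sum(predicted_benefit.values())
    if budget <= 0 or total <= 0:
        return {job: 0 for job in predicted_benefit}

    # Ideal (fractional) share for each job, proportional to its score.
    shares = {job: budget * score / total
              for job, score in predicted_benefit.items()}

    # Grant the whole-number part first.
    grants = {job: int(share) for job, share in shares.items()}

    # Hand out any leftover slots to the jobs with the largest remainders.
    leftover = budget - sum(grants.values())
    by_remainder = sorted(shares, key=lambda j: shares[j] - grants[j],
                          reverse=True)
    for job in by_remainder[:leftover]:
        grants[job] += 1
    return grants


if __name__ == "__main__":
    # Two running MapReduce jobs; "sort" is predicted to benefit more.
    print(apportion_idle(100, {"sort": 4.0, "wordcount": 1.0}))
```

Because the grants depend on whatever happens to be idle at that moment, the same job can receive a different number of workers on each run, which is one source of the randomness described in this article.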

This sort of fuzzy chaos represents the new normal for massive infrastructure systems. Just as it did with other scale-out technologies – such as Hadoop, NoSQL databases, and large machine-learning applications – Google is running into these problems first and having to deal with them.
...
Though Omega is obscured from end users of Google's myriad services, the company does have plans to use some of its capabilities to deliver new types of cloud services, Magnusson confirmed. The company could use the system as the foundation of spot markets for virtual machines in its Compute Engine cloud, he said.

"Spot markets for VMs is a flavor of trying to adopt that," he said. "To adopt that moving forward [we might] use SLA bin packing. If you have some compute jobs that you don't really care exactly what is done – don't care about losing one percent of the results – that's a fundamentally different compute job. This translates into very different operational requirements and stacks."
...
Already, researchers at the University of California at Berkeley have taken tips from Google to create their own variant called Apache Mesos, which is an open-source Google Borg clone running at large web properties such as Twitter and Airbnb.

However, Mesos is also exhibiting strange behaviors.

"Depending on a combination of things like weights and priorities there's a potential reallocation of resources across and around these jobs that has a compounding effect that can exaggerate these non-determinisms," said Benjamin Hindman, VP of Apache Mesos.

http://www.theregister.co.uk/2013/11/04/google_living_omega_cloud/