2011/12/17

Smart Internet+Smart-Grid: making money and reducing carbon footprint

Two technology/commercial trends are coming our way in Australia:
  • the Internet everywhere (3G wireless or NBN) means "smart controllers" will be cheap, simple and everywhere. They will be able to trivially hook into 3-7 day local forecasts, especially useful for air-conditioning units.
  • "Smart Power" metering will start to charge power at different prices during the day, rather than the disconnected traditional pricing of "a single price whenever you use it".
    You can see real-time wholesale electricity prices on-line and a Pretty graph in 30-min periods.
    Yesterday (12/12/11), the 30-min price varied from $52/MWhr @ 4PM to $16/MWhr @ 2AM. In 5-min periods, the price ranged from $95 to $16.
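A smart controller could exploit that price spread with very little logic. The sketch below is a minimal illustration in Python, not a product design. Only the $52 and $16 extremes come from the figures above; the other prices, the 2 kW load and the function names are made up for the example. It picks the cheapest 30-minute periods in which to run a deferrable load and compares the wholesale cost with running entirely at the peak price.

```python
# Sketch only: choose the cheapest 30-minute periods for a deferrable load.
# Prices are wholesale $/MWh; every value except the 52 and 16 extremes
# is invented for illustration.

def cheapest_periods(prices, periods_needed):
    """Return the indices of the cheapest periods, in time order."""
    ranked = sorted(range(len(prices)), key=lambda i: prices[i])
    return sorted(ranked[:periods_needed])

def energy_cost(prices, chosen, load_kw, period_hours=0.5):
    """Wholesale cost in dollars of running load_kw during the chosen periods."""
    mwh_per_period = load_kw / 1000.0 * period_hours
    return sum(prices[i] * mwh_per_period for i in chosen)

# Hypothetical prices for eight consecutive 30-minute periods.
prices = [16, 18, 22, 30, 41, 52, 47, 25]
load_kw = 2.0          # assumed air-conditioner draw
periods_needed = 4     # two hours of running time

chosen = cheapest_periods(prices, periods_needed)
smart_cost = energy_cost(prices, chosen, load_kw)

# Worst case for comparison: all four periods billed at the peak price.
peak_index = prices.index(max(prices))
peak_cost = energy_cost(prices, [peak_index] * periods_needed, load_kw)

print("run in periods", chosen)
print("smart cost $%.3f vs all-at-peak $%.3f" % (smart_cost, peak_cost))
```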
There's a bit of background you may or may not know about Power Generators: they over-build capacity to meet any and all demands placed on them. Not only are there no incentives for Power Generators or their customers to reduce either aggregate or instantaneous demand, there are significant economic disincentives: reduced demand means lowered income and profit. This is a perverse economic outcome costing us a lot of money and burning carbon unnecessarily.

From "An EnergySmart Plan. Positioning Queensland for a Diversified Energy Future 2010 - 2050" [original dead link] (Nov 2010 report for Queensland Government):
Ergon and ENERGEX will each spend $6 billion (that is $12B combined) in capital expenditure over the next five years to cope with extraordinary consumption during a fraction of the year, rather than the average consumption over the course of the year.

To put this into perspective, ENERGEX has over $900M in assets that are only used for approximately 3.5 days per year. (Mark Paterson, ENERGEX, The SPRA Standard).
That's not good business, but how can a solution be converted into useful products that make a profit?

2011/12/03

Microsoft Troubles XIII: "Business Insider" articles

A couple of articles discussing Microsoft's future, directly and indirectly.
Others are starting to forecast the potential for a collapse and "Microsoft 2012 == IBM 1990".

Steve Jobs Was Right: Google IS Turning Into Microsoft
(reminding us that Microsoft has tried to get into TV, cable, music and a bunch of other things)
(Larry Page asked Steve Jobs for advice.)
Jobs told him to focus on fewer things and do them really well.

2011/11/30

Apple needs to invent 'The Brick': screenless high-performance CPU, Graphics and Storage

Two new things appeared in my world recently:
Wilcox wonders what will happen to "Power Users" if Apple move on from Desktops, as they've moved on from the rack-mount X-serve line. Customers needing servers now have the choice of a Mac Mini Server or a "Mac Pro" (tower).

Can Apple afford to cut-adrift and ignore the needs of good folk who rely on their product for their business/livelihood, some of whom may have used Macs for 20+ years?? Would seem a Bad Idea to alienate such core and influential users.

Clearly Apple look to the future, and like the floppy drive they expunged long ago in favour of Optical drives (now also obsolete), Desktops as we know them are disappearing from mainstream appeal and usefulness.

I think there are two markets that Apple needs to consider:
  • One they haven't won yet: Corporate Desktops, and
  • One that's been part of their core business for decades: High-end Graphics/Media
Thunderbolt on laptops means big, even dual, monitors are simple for Corporate Desktops, addressing a large part of the demand/needs. While Apple retain the Mac Mini line, they have a viable PC Desktop replacement for those organisations that like the "modular PC" model, especially those that don't want laptops walking out the door.

The simplicity and elegance of Just One Plug of the iMac makes it unbeatable in certain niche applications, such as public use PC's in Libraries or battery workstations in call centres.

Can Apple produce a "power" laptop with the processing, graphics and storage size/performance that meets the needs of High-end Media folk?
A: No, never. Because the fastest, most-powerful CPU's, GPU's, most RAM and largest/fastest storage only ever come with high-power and big footprint: you need a big box with a big power supply: The definition of a Desktop or Workstation.

One solution would be to licence OS/X to "tier 1" PC vendors like Dell or HP  for use on certified systems. But that's not going to happen, Apple is a hardware/manufacturing company - they will never go there.

Hence "the Brick" that is mainly accessed via "Remote Desktop".
My suggestions are a modular design, not dissimilar to the NGEN's expandable 'slices':
  • CPU's and RAM in a housing with capacity to gang together for scale-up.
  • GPU's in a PCI-slot chassis, with Thunderbolt available for physical displays.
  • Local storage via e-SATA, SAS or Thunderbolt.
  • remote bulk storage over the network
  • External power-supply, or part of a base-unit (CPU, RAM, PCI-slot, network, Thunderbolt).
The point of "the Brick" is ComputePower-on-Demand and Universal-Workspace-View, not unlike SUN's 1993 "Starfire video" prototype.
It can live in a (locked) cupboard, or many can be hosted on a server cluster as one of many Virtual Machines. For even a modest operation, high-power servers running VMware make operational and economic sense. VM's mean another licensing deal. Perhaps VMware, part of EMC, might have the clout to do a deal like this with Apple. Or not.

Jim Gray authored a paper in 2004, "TerraServer Bricks", as an alternative architecture. The concept is not new or original, and it covers more than the usual low-power appliances.



An aside on "Jump Desktop", it uses well established (and secure) remote desktop protocols (RDP, VNC). But for Unix/Linux users interested in security and control, this is important:
Jump also supports SSH tunneling for RDP and VNC connections which also adds a layer of encryption but this must be configured manually.


2011/11/10

Surprises reading up on RAID and Disk Storage

Researching the history of Disk Storage and RAID since Patterson et al's 1988 paper has given me some surprises. Strictly personal viewpoint, YMMV.
  1. Aerodynamic drag of (disk) platters is ∝ ω³ r⁵  (RPM^3 * radius^5)
    • If you double the RPM of a drive, spindle drive power consumption rises eightfold (2³). All that power is put into moving the air, which in a closed system, heats it.
      Ergo, 15K drives run hot!
    • If you halve the size of a platter, spindle drive power consumption falls by a factor of 32 (2⁵). This is why 2½ inch drives use under 5W (and can be powered by USB bus).
    There's a 2003 paper by Gurumurthi on using this effect to dynamically vary drive RPM and save power. The same author in 2008 suggests disks would benefit from 2 or 4 sets of heads/actuators, either to improve streaming rate or seek time, or to reduce RPM while maintaining seek times.

    The Dynamic RPM paper seems to be the genesis of the current lines of "Green" drives. Western Digital quote RPM as "IntelliPower", but class these as 7,200RPM drives. Access time just got harder to predict.

    This is also the reason that 2.5" and 3.5" 15K drives use the same size platters.

  2. In Patterson's 1988 RAID paper, they compare 3 different drives and coin the term "SLED" (Single Large Expensive Disk) to describe the IBM mainframe drives of the time.
    Name                | Capacity | Drive Size                   | Mb per Rack Unit | Platter Size | Power | Specific Power
    IBM 3380            | 7.5Gb    | whole cabinet                | 180Mb/RU         | 14"          | 6.6kW | 0.9 W/Mb
    Fujitsu Super Eagle | 600Mb    | 6RU, 610mm deep              | 60Mb/RU          | 10.5"        | 600W  | 1.0 W/Mb
    Conner CP-3100      | 100Mb    | 4in x 1.63in, 150-250mm deep | 350Mb/RU         | 3.5"         | 6-10W | 0.1 W/Mb

    And two smaller surprises: all these drives had an MTBF of 30,000-50,000 hours, and the two non-SLED drives were both SCSI, capable of 7 devices per bus.
    8 or 9 3.5" drives could be fitted vertically in 3RU, or horizontally, 4 per RU.
    Because of the SCSI bus 7-device limit, and the need for 'check disks' in RAID, a natural organisation would be 7-active+1-spare in 2RU.
  3. 2½ inch drives aren't all the same thickness! Standard is ~ 70mmx100mm x 7-15mm
    • 9.5mm thick drives are currently 'standard' for laptops (2 platters)
    • 7mm drives, single platter, are used by many netbooks.
    • 7mm can also be form-factor of SSD's
    • "Enterprise" drives can be 12.5mm (½ inch) or 15mm (more common)
    • Upshot is, drives ain't drives. You probably can't put a high-spec Enterprise drive into your laptop.
  4. IBM invented the disk drive (RAMAC) in 1956. 50 platters of 100KB (= 5Mb). Platters loaded singly to read/write station.
    IBM introduced its last SLED line, the 3390, in 1989. The last version, "Model 9" 34Gb, was introduced in 1993. Last production date not listed by IBM.
    IBM introduced the 9341/9345 Disk Array, a 3390 "compatible", in 1991.
    When Chen, Patterson et al published their follow-up RAID paper in 1994, they'd already spawned a whole industry and caused the demise of the SLED.
    IBM sold its Disk Storage division to Hitachi in 2003 after creating the field and leading it for 4 decades.
  5. RAID-6 was initially named "RAID P+Q" in the 1994 Chen, Patterson et al paper.
    The two parity blocks must be calculated differently to support any two drive failures, they aren't simply two copies of "XOR".
    Coming up with alternate parity schemes, the 'Q', is tricky - they can be computationally intensive.
    Meaning RAID-6 is not only the slowest type of RAID because of extra disk accesses (notionally, 3 physical writes per logical-block update), but it also consumes the most CPU resource.
  6. IBM didn't invent the Compact-Flash format "microdrive", but did lead its development and adoption. The most curious use was the 4Gb microdrive in the Apple iPod mini.
    In 2000, the largest Compact Flash was the 1Gb microdrive.
    By 2006, Hitachi, after acquiring the business from IBM in 2003, had increased the capacity to 8Gb, its last evolution.
    According to wikipedia, by 2009, development of 1.3", 1" and 0.85" drives was abandoned by all manufacturers.
  7. Leventhal in 2009 pointed out that if capacities kept doubling every 2 years, then by 2020 (5 doublings or *32) RAID would need to adopt triple-parity (and suggests "RAID-7").
    What I found disturbing is that the 1993 RAID-5 and 2009 RAID-6 calculations for the probability of a successful RAID rebuild after a single drive failure both come to 99.2%.

    I find an almost 1% chance of a RAID rebuild failing rather disturbing.
    No wonder Google invented its own way of providing Data Protection!
  8. The UER (Unrecoverable Error Rate) quoted for SSD's is "1 sector in 10^15".
    We know that flash memory is organised as blocks, typically 64kB, so how can they only lose a single sector? Or do they really mean "lose 128 sectors every 10^17 reads"?
  9. Disk specs now have "load/unload cycles" quoted (60,000-600,000).
    Disk platters these days have a plastic unload ramp at the edge of the disk, and the drive will retract the heads there after a period of inactivity.
    Linux servers with domestic SATA drives apparently have a reputation for exceeding load/unload cycles. Cycles are reported by S.M.A.R.T., if you're concerned.
  10. Rebuild times of current RAID sets are 5 hours to over 24 hours.
    Part of this is due to the large size of "groups", ~50. In 1988, Patterson et al expected 10-20 drives per group.
    As well, the time-to-scan a single drive has risen from ~100 seconds to ~6,000 seconds.
  11. One of the related problems with disks is archiving data. Drives have a 3-5 year service life.
    A vendor claims to have a writeable DVD-variant with a "1,000 year life".
    They use a carbon-layer (also called "synthetic stone") instead of a dye layer.
    There is also speculation that flash-memory used as 'write-once' might be a good archival medium. Keep those flash drives!
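The drag relationship in item 1 above is easy to check numerically. The sketch below assumes nothing beyond the proportionality quoted there (power ∝ RPM³ × radius⁵). The function names are mine, only ratios are meaningful, and real drives differ in platter count and other losses, so treat the results as rough guides.

```python
# Sketch: relative spindle power from the scaling law P ∝ rpm^3 * radius^5.
# The constant of proportionality is unknown, so only ratios are computed.

def relative_drag_power(rpm_ratio, radius_ratio):
    """Power ratio when RPM and platter radius change by the given factors."""
    return rpm_ratio ** 3 * radius_ratio ** 5

def radius_ratio_for_equal_power(rpm_ratio):
    """Platter radius factor that cancels out a given RPM increase."""
    # Solve rpm_ratio^3 * r^5 = 1 for r.
    return rpm_ratio ** (-3.0 / 5.0)

double_rpm = relative_drag_power(2.0, 1.0)        # doubling RPM
half_radius = relative_drag_power(1.0, 0.5)       # halving platter size
fast_vs_desktop = relative_drag_power(15000 / 7200, 1.0)
shrink = radius_ratio_for_equal_power(15000 / 7200)

print("double the RPM:      x%.0f power" % double_rpm)
print("halve the platter:   x%.5f power" % half_radius)
print("15K vs 7,200 RPM:    x%.1f power" % fast_vs_desktop)
print("platter shrink to compensate: x%.2f radius" % shrink)
```

On these assumptions a 15K drive needs platters roughly two-thirds the radius of a 7,200RPM drive to stay within the same drag power, which fits the observation that 15K drives use small platters whatever the case size.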
Update 11-Nov-2011:

Something new I learnt last night:
 The 1.8" disk format is very much alive and well.
 they're used in mobile appliances.
 I wonder if we'll see them "move up" into laptops, desktops or servers?
 Already I've seen a 2.5" SSD which is a 1.8" module in a carrier...

Another factoid:
 For the last 2 years, HP has only shipped servers with 2.5" internal drives.

Apple led the desktop world twice in this fashion:
  Macs skipped 5.25" floppies, only ever 3.5".
  Mac removed floppy drives well before PC's.

Does the 'Air' w/o optical drive count too?
The Mac Classic used SCSI devices, which seemed like a very good idea at the time. But not great for consumer-level devices and they've gone to SATA now.
Apple did invent Firewire (IEEE 1394 a.k.a. "iLink"), which took off in the video market, and I believe still support it on most devices. 


Articles


"Triple-Parity RAID and Beyond", ACM Queue
 Adam Leventhal (SUN), December 17, 2009

"Calculating Mean Time To Data Loss (and probability of silent data corruption)"
Jeff Whitehead, Zetta, June 10, 2009

"A Better RAID Strategy for High Capacity Drives in Mainframe Storage"  [PDF],
ORACLE Corporation, Sept 2010.

"Comparison Test: Storage Vendor Drive Rebuild Times and Application Performance Implications"
Dennis Martin,  Feb 18, 2009

"Considerations for RAID-6 Availability and Format/Rebuild Performance on the DS5000" [PDF]
IBM, March 2010.

"Your Useable Capacity May Vary ..."
Chuck Hollis, EMC Corp, August 28, 2008.


"Five ways to control RAID rebuild times" [requires login. Only intro read]
George Crump. July, 2011 ???
 In a recent test we conducted, a RAID 5 array with five 500 GB SATA drives took approximately 24 hours to rebuild. 
 With nine 500 GB drives and almost the exact same data set, it took fewer than eight hours.
"DRPM: Dynamic Speed Control for Power Management in Server Class Disks",  Gurumurthi, Sivasubramaniam,  Kandemir, Franke, 2003, International Symposium on Computer Architecture (ISCA).

"Intra-Disk Parallelism: An Idea Whose Time Has Come", Sankar, Gurumurthi, Mircea R. Stan, ISCA, 2008.

2011/11/06

The importance of Design Rules

This started with an aside in "Crypto", Steven Levy (2000), about Rivest's first attempt at creating an RSA Crypto chip, which failed because, whilst the design worked perfectly on the simulator, it didn't work when fabricated.
[p134] Adleman blames the failure on their overreliance on Carver Mead's publications...
Carver Mead and Lynn Conway at CalTech revolutionised VLSI design and production around 1980, publishing "Introduction to VLSI Systems" and providing access to fabrication lines for students and academics. This has been widely written about:
e.g. in "The Power of Modularity", a short piece on the birth of the microchip from Longview Institute, and a 2007 Computerworld piece on the importance of Mead and Conway's work.

David A. Patterson wrote of a further, related, effect in Scientific American, September 1995, p63, "Microprocessors in 2020"

Every 18 months microprocessors double in speed. Within 25 years, one computer will be as powerful as all those in Silicon Valley today

Most recently, microprocessors have become more powerful, thanks to a change in the design approach.
Following the lead of researchers at universities and laboratories across the U.S., commercial chip designers now take a quantitative approach to computer architecture.
Careful experiments precede hardware development, and engineers use sensible metrics to judge their success.
Computer companies acted in concert to adopt this design strategy during the 1980s, and as a result, the rate of improvement in microprocessor technology has risen from 35 percent a year only a decade ago to its current high of approximately 55 percent a year, or almost 4 percent each month.
Processors are now three times faster than had been predicted in the early 1980s;
it is as if our wish was granted, and we now have machines from the year 2000.
Copyright 1995 Scientific American, Inc.
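The arithmetic behind the figures in that quote can be checked directly. This is a small sketch, not Patterson's own calculation: it assumes simple annual compounding and uses only the 35% and 55% rates from the quote. The function names are mine.

```python
import math

def monthly_rate(annual_rate):
    """Equivalent compound monthly growth for a given annual growth rate."""
    return (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0

def speed_advantage(fast_rate, slow_rate, years):
    """How many times faster the fast-growth track is after `years`."""
    return ((1.0 + fast_rate) / (1.0 + slow_rate)) ** years

def years_to_advantage(fast_rate, slow_rate, factor):
    """Years of compounding needed for the fast track to lead by `factor`."""
    return math.log(factor) / math.log((1.0 + fast_rate) / (1.0 + slow_rate))

per_month = monthly_rate(0.55)
lead_after_decade = speed_advantage(0.55, 0.35, 10)
years_to_triple = years_to_advantage(0.55, 0.35, 3.0)

print("55%% a year is %.2f%% a month" % (100 * per_month))
print("lead after 10 years: %.2fx" % lead_after_decade)
print("years to a 3x lead: %.2f" % years_to_triple)
```

Under these assumptions 55% a year works out to a little under 4% a month, and the faster track is three times ahead after roughly eight years.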
The important points are:
  • These acts, capturing expert knowledge in formal Design Rules, were intentional and deliberate.
  • These rules weren't an arbitrary collection just thrown together; they were a three-part approach: 1) the dimensionless scalable design rules, 2) the partitioning of tasks and 3) system integration and testing activities.
  • The impact, through a compounding rate effect, has been immense e.g. through Moore's Law doubling time, bringing CPU improvements forward 20 years.
  • The Design Rules have become embedded in software design and simulation tools, allowing new silicon devices to be designed much faster, with more complexity and with orders of magnitude fewer errors and faults.
  • It's a very successful model that's been replicated in other areas of I.T.
So I'm wondering why vendors don't push this model in other areas?
Does it not work, not scale, or is it not considered 'useful' or 'necessary'?

There are some tools that contain embedded expert knowledge, e.g. for server storage configuration. But they are tightly tied to particular vendors and product families.

Update 13-Nov-2011: What makes/defines a Design Rule (DR)?

Design Rules fall in the middle ground between "Rules-of-Thumb" used in the Art/Craft of Practice and the authoritative, abstract models/equations of Science.

They define the middle ground of Engineering:
 more formal than R-o-T's but more general and directly applicable than the theories, models and equations of pure Science, suitable for creating and costing Engineering designs.

This "The Design Rule for I.T./Computing" approach is modelled after the VLSI technique used for many decades, but is not a slavish derivation of it.

Every well understood field of Engineering has one definitive/authoritative "XXX Engineering Handbook" publication that covers all the sub-fields/specialities, recites all the formal Knowledge, Equations, Models, Relationships and Techniques, provides Case Studies, Tutorials, necessary Tables/Charts and worked examples. Plus basic material of ancillary, related or supporting fields.

The object of these "Engineering Handbooks" is that any capable, competent, certified Engineer in a field can rely on its material to solve problems, projects or designs that come their way. They have a reference they can rely upon for their field.

Quantifying specific costs and materials/constraints comes from vendor/product specifications and contracts or price lists. These numbers are used for the detailed calculations and pricing using the techniques/models/equations given in The Engineering Handbook.

A collection  of "Design Rules for I.T. and Computing" may serve the same need.

What are the requirements of a DR?:
  • Explicitly list aspects covered and not covered by the DR:
     eg. Persistent Data Storage vs Permanent Archival Storage
  • Constraints and Limits of the DR:
    What are the largest, smallest or most complex systems it applies to?
  • Complete: all Engineering factors named and quantified.
  • Inputs and Outputs: Power, Heat, Air/Water, ...
  • Scalable: How to scale the DR up and down.
  • Accounting costs: Whole of Life, CapEx and Opex models.
  • Environmental Requirements: 
  • Availability and Serviceability:
  • Contamination/Pollution: Production, Supply and Operation.
  • Waste generation and disposal.
  • Consumables, Maintenance, Operation and Administration
  • Training, Staffing, User education.
  • Deployment, Installation/Cutover, Removal/Replacement.
  • Compatibility with systems, components and people.
  • Optimisable in multiple dimensions.  Covers all the aspects traded off in Engineering decisions:
    • Cost: per unit, 'specific metric' (eg $$/Gb),
    • Speed/Performance:  how it's defined, measured, reported and compared.
    • 'Space' (Speed and 'Space' in the sense of Algorithm trade-off)
    • Size, Weight, and other Physical characteristics
    • 'Quality' (of design and execution, not the simplistic "fault/error rate")
    • Product compliance to specification, repeatability of 'performance'. (manufacturing defects, variance, problems, ...)
    • Usability
    • Safety/Security
    • Reliability/Recovery
  • other factors will be needed to achieve a model/rule that is:
     {Correct, Consistent, Complete, Canonical (ie min size)}

2011/09/16

QUPSRSTCO: Software Design has more dimensions than 'Functionality'

Summary: There are multiple Essential Dimensions of Software Design besides "Functionality".

There are three Essential External Dimensions, {Function, Time, Money} and multiple Internal Dimensions.
I'm not sure where/how "Real-Time" is covered, it isn't just "Performance": the necessary concurrency (not just "parallelism") and asynchronous events/processing require 10-100 times the cognitive capacity to deal with, and problems scale-up extraordinarily (faster than exponential) due to this added complexity. This is why Operating Systems and embedded critical systems (health/medicine, aerospace control, nuclear, Telecomms, Routers/Switches, Storage Devices, ...) are so difficult and expensive.

Not understanding and enumerating these multiple Dimensions, whilst seemingly teaching Functionality only, is perhaps currently the single biggest failure of the discipline of Software Engineering.

The Necessary or Essential Dimensions of the further Software phases of Software Construction, Software Deployment and Software Maintenance besides the meta-processes of Software Project Management and I.T. Operations are beyond the scope of this piece.

This non-exhaustive taxonomy implies that there are additional Essential Dimensions, such as Maintainability and Manageability, elsewhere in the Computing/I.T. milieu.

My apologies in advance that this piece is in itself a first pass and not yet definitive.


+++ Need to deal with "Documentation" vs "Literate Programming" vs "Slices & tools"
+++ Dev - Ops. Infrastructure is part of the deliverable. Scripts on PRD/DEV/TST must be same. Software Config Mgt and Migration/Fail-back/Fail-over are different and essential/necessary



Software Design:
I'm using Software Design in an unconventional sense:
  everything that precedes and defines Coding and Construction.

While noting that Software Design and Construction are closely intertwined and inter-dependent and that all Software Projects are iterative, especially after notional Deployment and during Software Maintenance.

The acts of coding and testing uncover/reveal failings, errors, assumptions, blind-spots and omissions in the Design and its underlying models and concepts.

Where do the various Testing activities belong?
Wherever your Process or Project Methodology define them to be.
Many Software Design problems are revealed when first attempting to construct tests and later in performing them. Thus creating feedback, corrections and additional requirements/constraints.


What's an "Essential Dimension"?
In Formal Logic and Maths, there's the notion of "necessary and sufficient conditions" for a relationship or dependency to hold.
It is in this sense that I'm defining an "Essential Dimension" of elements or phases in the Software  process, that they individually be Necessary and together be Sufficient for a complete solution/result.
A Dimension is Essential if its removal, omission or non-performance results in Defective, Incomplete, Ineffective, Non-Performing or Non-Compliant Software and Systems.
Or more positively, a Dimension is Essential if it must be performed to achieve the desired/specified process and product outputs and outcomes.
A marker of an Essential is

Defective, or colloquially "Buggy", Software, has many aspects, not just "Erroneous, Invalid or Inconsistent Results".

The term is meant to be parsed against each of the Essential Design Dimensions for specific meanings, such as "Hacked or Compromised" (Security), "Failure to Proceed or Complete" (i.e. crash or infinite loop: Quality), "Too Slow" (Performance),  "Corrupt or Lose Data" (Quality), "Unmaintainable" (Quality) and "Maxed out" (Scalability).


Initial candidate Essential Dimensions.
From my experience and observations of the full Software cycle and I.T. Operations, a first cut, not in order of importance:
  • F - Functionality
  • Q - Quality
  • U - Usability
  • P - Performance
  • S - Security/Safety
  • R - Reliability/Recovery
  • S - Scalability
  • T - Testability
  • C - Concurrency/Asynchronousity
  • O - Operability/Manageability

Relative Importance of the Design Dimensions
Which Dimension is most important?
All and None: it depends on the specific project or task and its goals, constraints and requirements.

An essential outcome of the Specification phase of Software Design is to precisely define:
  • The criteria for each Essential Design Dimension for the Product, Project, all Tasks and every Component.
  • The relative importance of the Dimensions.
  • How to assess final compliance to these criteria in both Business and Technical realms.
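One way to make this outcome concrete is to treat the specification as data and check it for gaps. The sketch below is illustrative only: the `Dimension` enum mirrors the candidate list above, while `Criterion`, `missing_dimensions` and the sample entries are hypothetical names and values of mine, not part of any established method.

```python
from dataclasses import dataclass
from enum import Enum

class Dimension(Enum):
    """The candidate Essential Dimensions listed above."""
    FUNCTIONALITY = "Functionality"
    QUALITY = "Quality"
    USABILITY = "Usability"
    PERFORMANCE = "Performance"
    SECURITY_SAFETY = "Security/Safety"
    RELIABILITY_RECOVERY = "Reliability/Recovery"
    SCALABILITY = "Scalability"
    TESTABILITY = "Testability"
    CONCURRENCY = "Concurrency/Asynchronousity"
    OPERABILITY = "Operability/Manageability"

@dataclass
class Criterion:
    dimension: Dimension
    requirement: str       # the criterion itself
    importance: int        # relative weight, higher means more important
    compliance_test: str   # how final compliance will be assessed

def missing_dimensions(spec):
    """Return the Dimensions that have no criterion in the specification."""
    covered = {criterion.dimension for criterion in spec}
    return [d for d in Dimension if d not in covered]

# Hypothetical, deliberately incomplete specification.
spec = [
    Criterion(Dimension.FUNCTIONALITY, "meets the written functional spec", 5, "acceptance tests"),
    Criterion(Dimension.PERFORMANCE, "responds within the agreed time", 3, "load test"),
]

gaps = missing_dimensions(spec)
print("Dimensions with no criteria yet:", [d.value for d in gaps])
```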
The one universally applicable Design Dimension is Quality.

Which of its many aspects are critical for any project, sub-system, task, phase or component, and how they will be monitored, controlled and confirmed, must be defined by your meta-processes or derived through the execution of your Methodology.

Minimally, any Professionally produced Software component or product must be shown to conform both to the Zeroth Law requirements (keep running, terminate, Do no Damage  and produce results) and its written Functional Requirements/Specifications.


Quality


Zeroth Law requirements (keep running, terminate, Do no Damage and produce results)

From "The quality of software", Hoare, Software-Practice and Experience Vol 2, 1972 p103-5 

Hoare's Software Quality Criteria:
(1) Clear definition of purpose
(2) Simplicity of use
(3) Ruggedness
(4) Early availability
(5) Reliability
(6) Extensibility and improvability in light of experience
(7) Adaptability and easy extension to different configurations
(8) Suitability to each individual configuration of the range
(9) Brevity
(10) Efficiency (speed)
(11) Operating ease
(12) Adaptability to wide range of applications
(13) Coherence and consistency with other programs
(14) Minimum cost to develop
(15) Conformity to national and international standards
(16) Early and valid sales documentation
(17) Clear accurate and precise user’s documents

Security/Safety

Performance

Usability

Reliability/Recovery

Scalability

Testability
  • Functional Testing or Specification Compliance Testing?
  • Load Testing
  • Regression Testing, post-Release esp.
  • Acceptance Testing. Commercial Compliance?
  • Others?
Concurrency/Asynchronousity

Operability/Manageability

2011/09/14

A new inflection point? Definitive Commodity Server Organisation/Design Rules

Summary:

For the delivery of general purpose and wide-scale Compute/Internet Services there now seems to be a definitive hardware organisation for servers, typified by the E-bay "pod" contract.

For decades there have been well documented "Design Rules" for producing Silicon devices using specific technologies/fabrication techniques. This is an attempt to capture some rules for current server farms. [Update 06-Nov-11: "Design Rules" are important: Patterson in a Sept. 1995 Scientific American article notes that the adoption of a quantitative design approach in the 1980's led to an improvement in microprocessor speedup from 35%pa to 55%pa. After a decade, processors were 3 times faster than forecast.]

Commodity Servers have exactly three possible CPU configurations, based on "scale-up" factors:
  • single CPU, with no coupling/coherency between App instances. e.g. pure static web-server.
  • dual CPU, with moderate coupling/coherency. e.g. web-servers with dynamic content from local databases. [LAMP-style].
  • multi-CPU, with high coupling/coherency. e.g. "Enterprise" databases with complex queries.
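The three cases above amount to a lookup from coupling level to CPU configuration. A minimal sketch, assuming a hypothetical three-level `Coupling` classification and made-up service names; the mapping itself is taken from the list above.

```python
from enum import Enum

class Coupling(Enum):
    """Degree of coupling/coherency between App instances (assumed scale)."""
    NONE = 0
    MODERATE = 1
    HIGH = 2

# Mapping taken from the three configurations listed above.
CPU_CONFIG = {
    Coupling.NONE: "single CPU",
    Coupling.MODERATE: "dual CPU",
    Coupling.HIGH: "multi-CPU",
}

def cpu_config_for(coupling):
    """Pick the commodity server CPU configuration for a coupling level."""
    return CPU_CONFIG[coupling]

# Hypothetical service components and their coupling levels.
services = {
    "static-web": Coupling.NONE,
    "lamp-app": Coupling.MODERATE,
    "enterprise-db": Coupling.HIGH,
}

plan = {name: cpu_config_for(level) for name, level in services.items()}
for name, config in plan.items():
    print(name, "->", config)
```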
If you're not running your Applications and Databases in Virtual Machines, why not?
[Update 06-Nov-11: Because Oracle insists some feature sets must run on raw hardware. Sometimes vendors won't support your (preferred) VM solution.]

VM products are close to free and offer incontestable Admin and Management advantages, like 'teleportation' or live-migration of running instances and local storage.

There is a special non-VM case: cloned physical servers. This is how I'd run a mid-sized or large web-farm.
This requires careful design, a substantial toolset, competent Admins and a resilient Network design. Layer 4-7 switches are mandatory in this environment.

There are 3 system components of interest:
  • The base Platform: CPU, RAM, motherboard, interfaces, etc
  • Local high-speed persistent storage. i.e. SSD's in a RAID configuration.
  • Large-scale common storage. Network attached storage with filesystem, not block-level, access.
Note that complex, expensive SAN's and their associated disk-arrays are no longer economic. Any speed advantage is dissolved by locally attached SSD's, leaving only complexity, resilience/recovery issues and price.
Consequently, "Fibre Channel over Ethernet", with its inherent contradictions and problems, is unnecessary.

Designing individual service configurations  can be broken down into steps:
  • select the appropriate CPU config per service component
  • specify the size/performance of local SSD per CPU-type.
  • architect the supporting network(s)
  • specify common network storage elements and rate of storage consumption/growth.
Capacity Planning and Performance Analysis are mandatory in this world.

As a professional, you're looking to provide "bang-for-buck" for someone else who's writing the cheques. Over-dimensioning is as much a 'sin' as running out of capacity. Nobody ever got fired for spending just enough, hence maximising profits.

Getting it right as often as possible is the central professional engineering problem.
Followed by, limiting the impact of Faults, Failures and Errors - including under-capacity.

The quintessential advantage to professionals in developing standard, reproducible designs is the flexibility to respond to unanticipated load/demands and the speed with which new equipment can be brought on-line, and the converse, retired and removed.

Security architectures and choice of O/S + Cloud management software are outside the scope of this piece.

There are many multi-processing architectures, each best suited to particular workloads.
They are outside the scope of this piece, but locally attached GPU's are about to become standard options.
Most servers will acquire what were known as vector processors and applications using this capacity will start to become common. This trend may need its own Design Rule(s).

Different, though potentially similar design rules apply for small to mid-size Beowulf clusters, depending on their workload and cost constraints.
Large-scale or high-performance compute clusters or storage farms, such as the IBM 120 Petabyte system, need careful design by experienced specialists. With any technology, "pushing the envelope" requires special attention by the best people you have,  to even have a chance of success.

Not surprisingly, this organisation looks a lot like the current fad, "Cloud Computing" and the last fad, "Services Oriented Architecture".



Google and Amazon dominated their industry segments partly because they figured out the technical side of their business early on. They understood how to design and deploy datacentres suitable for their workload, how to manage Performance and balance Capacity and Cost.

Their "workloads", and hence server designs, are very different:
  • Google serves pure web-pages, with almost no coupling/communication between servers.
  • Amazon has front-end web-servers backed by complex database systems.
Dell is now selling a range of "Cloud Servers" purportedly based on the systems they supply to large Internet companies.





An App too far? Can Windows-8 gain enough traction?

Summary:
"last to market" worked as a strategy in the past for Microsoft.
But "everything is a PC" is probably false and they'll be sidelined in the new Mobile Devices world.

2011/06/12

Why Apple won't add peer-peer to iCloud

The sister post to this speculates that Apple could add peer-peer protocols/functionality to its iCloud service, and the benefits that would flow.

I'm firmly of the opinion that Apple won't go there, not soon, not in-a-while, not ever.

They have too firmly entrenched attitudes about constructing and maintaining "full-control" over the euphemistically named, "User Experience".

Apple don't do "collaboration", sharing their technology or allowing the mere User to tinker with its Gorgeous Stuff. It'd no longer be "their Design" and that's anathema to Apple.

Apple are into "control", which in itself is not a bad thing, but severely limits their software and system design decisions and implementations.

This isn't some simple "we know best" thing, but much deeper, intimately tied to their focus on High Concept Design and a finely crafted "User Experience". Which also means controlled experience.

Apple could make huge inroads into the PC market by licensing OS/X - something it could've done anytime in the last 10 years. Now that "classic" computers are under 25% of its business, Apple could let go of its stranglehold on its computer hardware and light the fires of innovation: "let a thousand roses bloom".  But they cannot and won't.

This translates to iCloud in two ways:
  • they haven't thought of the idea themselves, and
  • they probably couldn't model the response times of torrent-like service and would baulk at any service which is in the least unpredictable, perhaps sometimes not-quite-perfect.
Apple needs to control its "User Experience", which means they can't let other players on-board and can't adopt "radical" or "unproven" solutions (i.e. "not invented here").

So they will build and run some very large datacenters to run iCloud.

The trouble with this approach is they are leaving the field of Innovation open to their competitors.
We know Microsoft won't embrace it, but Google and Android will and do.

Even Great Design can be copied and tweaked, sometimes even improved. That the British lost their home-grown motorcycle and motor car industries, known for radical and innovative design, to the Japanese and their "continuous improvement/refinement cycle" demonstrates this thesis.

In 10 years, will the "iPhone 15" be a patch on Android and the gazillion services/applications it runs?
I suspect not. The most amazing and usable devices are unlikely to come from the standalone corporations.

It could be Google and Android, it could be something completely new.

It just won't be Apple at the front, again.
Think how they blew their advantage of the Apple II. They've got form and the same fixed, rigid mindset is still rampant. That's good for Bold New Steps, poor for continuous stepwise refinement.

Apple iCloud and peer-peer (Torrents)

Will Apple add a torrent-like ability to its iCloud offering?

iCloud is a remote filesystem with a lot of metadata and does 4 things:
  • provides "second copy of my precious data" (for files I've generated)
  • allows synchronisation of those files across the multiple devices/platforms a user connects. This is the aspect Apple 'sells': email, contact and calendar sync and restore/recover.
  • mediates the enforcement of copyright and content distribution
  • does Internet-Scale data de-duplication.
    • By data volume, the Internet is a 'Viewing Platform'.
    • 1 upload == zillions downloads [write once, download ~infinite]

Apple could create an Internet-Scale "Content Delivery Network" with iCloud if it ran a peer-peer network, something like the hugely successful bit-torrent protocol/service.

Because you've got authorised content and validated entities/logins in a vendor controlled environment, there isn't a direct copyright or leakage problem, just the ever-present and non-removable "analogue hole".
There is scope for scanning never-before-seen files to see if they are recodings, subsets or 'analogue rerecordings' of known files.
What action then? Automatically remove the file, "Bill the User" or send a Summons?

'Backups' of already-known files take only the time needed to transfer and compare the checksum/identifier. That's an incredible compression ratio/speed-up. Those checksums/identifiers are also the natural keys for both the 'torrent' and the backing store.
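
A minimal sketch of that checksum-keyed backup, assuming SHA-256 as the identifier and an in-memory dictionary as the backing store. The class and method names are mine, purely for illustration; nothing here reflects how iCloud is actually built.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one copy per distinct file content."""

    def __init__(self):
        self.blobs = {}          # checksum -> file bytes (the backing store)
        self.bytes_uploaded = 0  # bytes that actually had to be transferred

    @staticmethod
    def checksum(data: bytes) -> str:
        # Assumption: SHA-256 stands in for whatever identifier a real service uses.
        return hashlib.sha256(data).hexdigest()

    def backup(self, data: bytes) -> str:
        """Store data only if its checksum is unknown; return the key."""
        key = self.checksum(data)
        if key not in self.blobs:
            self.blobs[key] = data
            self.bytes_uploaded += len(data)
        return key


store = DedupStore()
song = b"x" * 1_000_000       # stand-in for a file many users own
key_a = store.backup(song)    # first user: full upload
key_b = store.backup(song)    # second user: checksum match, nothing uploaded
print(key_a == key_b, store.bytes_uploaded)
```

The second 'backup' costs one checksum comparison instead of a megabyte of upload, and the same key could serve as the lookup key for a torrent-style swarm.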

Storing the per-machine file/directory structure is another layer and doesn't yield to the same de-duplication/compression techniques.
If I were implementing the local filesystem, I'd do two things:
  • calculate checksums on-the-fly and store them in the metadata.
  • make sure part of the metadata was a whole-file checksum or UUID-type identifier.
Possibly also calculate and store large-chunk (8-64MB) checksums.
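
Something like the following would do both in one pass over the data. It is a sketch under my own assumptions: SHA-256 for both checksums, a fixed chunk size, and a hypothetical helper name `file_checksums`.

```python
import hashlib
import io

CHUNK_SIZE = 8 * 1024 * 1024  # roughly 8 MB; an assumed default


def file_checksums(stream, chunk_size=CHUNK_SIZE):
    """Read a binary stream once; return (whole_file_hex, [chunk_hex, ...])."""
    whole = hashlib.sha256()
    chunks = []
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        whole.update(block)                               # running whole-file checksum
        chunks.append(hashlib.sha256(block).hexdigest())  # per-chunk checksum
    return whole.hexdigest(), chunks


# Tiny demonstration with an in-memory "file" and a deliberately small chunk size.
data = b"abc" * 10
whole, chunks = file_checksums(io.BytesIO(data), chunk_size=8)
print(whole, len(chunks))
```

The per-chunk checksums are what would let a peer fetch or verify part of a large file without needing the whole thing.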

This enables two services usually only seen at Enterprise scale:
  • Document-Management-System like controls, searches, functionality.
  • user collaboration: tagging, comments, highlighting, edits/recuts + mashups, annotation, linking, etc.

As bit-torrent shows, using distributed {storage, net-bandwidth, CPU} scales, amplifies 'Server Effectiveness' and gives apparent 100% uptime/availability to services.
It's really cheap and easy for the content provider, though causes more traffic on the ISP network.

BTW, this isn't limited to Desktops and mobile devices.
It scales to IP-TV and on-demand content services.
Your Apple-TV box effectively has infinite & perfect storage...
Internode has announced 'fetchTv' - so these services are on the radar for ISPs.

It also has significant consequences for Ozzie customers who pay per GB.
You really don't want your iPad on a 1GB wireless 3G plan acting as a torrent server. A nasty $2000 bill surprise!

There are serious performance issues with local congestion (POP, ISP, backhaul, main-site), inter-network links and dealing with ADSL bandwidth characteristics.

The NBN is going to be a Layer 2 network (2-level VLANs or 802.1ad "Q in Q").
The presumption is ISPs will offer PPPoE to begin with, as for the current ADSL services.

PPPoE is not well suited to distributed data/torrents:
  •  your source device puts packets onto your LAN,
  •  your firewall/router fields those packets and pushes them to your ADSL modem which encapsulates packets into PPPoE, then puts these new packets onto the wire
  • the bytes go down your line to the DSLAM
  • are routed down the 'backhaul' link to the ISP's nearest site
  • into the 'Access Concentrator' to become a public IP addr
  • then routed towards the destination public IP address, which could be on the same Concentrator, the same POP, the same ISP, a shared 'Interconnect', or an upstream provider
  • into the destination Access Concentrator
  • down the backhaul, DSLAM, ADSL modem, firewall/router and eventually appear on the destination LAN to the receiving device.

Which gives you so many single points of failure/congestion/saturation that it isn't funny...

If the other person is across the hall or across the road, this incurs a massive and needless overhead, not to mention delays and multiple local resource contention.
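
To make the overhead concrete, here is a rough model of the path listed above. The element names come from that list; the function name and the simplification of "ISP routing" to a single element are my assumptions, not a description of any real ISP's network.

```python
# Elements between a home device and the ISP, in the order a packet meets them.
ACCESS_SIDE = ["LAN", "firewall/router", "ADSL modem", "DSLAM", "backhaul"]


def pppoe_path(same_concentrator=True):
    """Return the elements a packet crosses between two PPPoE subscribers."""
    up = ["source " + element for element in ACCESS_SIDE]
    down = ["destination " + element for element in reversed(ACCESS_SIDE)]
    if same_concentrator:
        # Best case: both subscribers terminate on one Access Concentrator.
        core = ["Access Concentrator"]
    else:
        # Otherwise traffic is routed between concentrators (simplified to one step).
        core = ["source Access Concentrator", "ISP routing",
                "destination Access Concentrator"]
    return up + core + down


neighbour_path = pppoe_path(same_concentrator=True)
print(len(neighbour_path))
for element in neighbour_path:
    print("  ", element)
```

Even in the best case, traffic to a neighbour climbs all the way to the Access Concentrator and back down again, and every element on the list is a place where it can queue or fail.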

The Wikipedia PPPoE article discusses problems and current solutions.

So, if iCloud becomes a torrent-like service, will it overload the NBN??

2011/05/24

Microsoft Troubles XII: IBM market cap re-overtakes MSFT

Update 1. Influential hedge-fund manager David Einhorn, president of Greenlight Capital, calls for Ballmer to stand aside.

Update 2. The MSFT board stands behind Ballmer, rejects David Einhorn's call to stand aside.

Einhorn has 9M MSFT shares (roughly 0.1% of the company). He's bought because he thinks they're undervalued.
This could just be a media beat-up by him to make some money - the share price has increased.

Whatever the cause, this is a significant milestone.
The MSFT board has had to consciously and publicly defend their continued choice of Ballmer as CEO.

2011/05/11

Microsoft Troubles XI: AAPL more profitable than MSFT

Adam Hartung at Forbes wrote "Why Not All Earnings Are Equal; Microsoft Has the Wal-Mart Disease", bylined May 3, 2011.

Read the article; it says more than I can, with more (economic) facts and more eloquently.

It's only ever been about the company's economic performance.
Poor products and ignoring your customers only ever have one outcome.
Sad for the employees and shareholders, though.

Please note, I am specifically saying that Microsoft products are NOT doomed, just the company.

Winders on the Corporate Desktop isn't going away anytime soon (2-3 decades to run at least).
Too much invested, too many careers tied to it and the Lemmings Rush hasn't turned elsewhere yet...

The next two Big Questions for Microsoft, the company:
  • How soon before the Board notices and removes Ballmer?
  • Who will be the eventual purchaser of the profitable lines-of-business - Windows and Corporate solutions?

2011/01/15

Microsoft Troubles X: Ballmer as CEO being questioned

Richard Waters published a piece, "Ballmer's opportunity to prove his worth", on 12-Jan-2011 in the Financial Times. It's been picked up and reprinted - I became aware of this through an investment newsletter. Microsoft's performance is now a concern/topic for mainstream investors. That can't be good.