Planet Cloud [Computing]

January 27, 2012

Lori MacVittie

F5 Friday: Goodbye Defense in Depth. Hello Defense in Breadth.

#adcfw #infosec F5 is changing the game on security by unifying it at the application and service delivery layer.

Over the past few years we’ve seen firewalls fail repeatedly. We’ve seen business disrupted, security thwarted, and reputations damaged by the failure of the very devices meant to prevent such catastrophes from happening. These failures have been caused by a change in tactics from invaders who seek no longer to find a way through or over the walls, but who simply batter them down instead. A combination of traditional attacks – network-layer – and modern attacks – application-layer – has become a force to be reckoned with; one that traditional stateful firewalls are often not equipped to handle. Encrypted traffic flowing into and out of the data center often bypasses security solutions entirely, leaving another potential source of a breach unaddressed. And performance is being impeded by the increasing number of devices that must “crack the packet,” as it were, and examine it, oftentimes duplicating functionality with varying degrees of success. This is problematic because the resolution to this issue can be as disconcerting as the problem itself: disable security. Seriously. Security functions have been disabled, intentionally, in the name of performance.

IT security personnel within large corporations are shutting off critical functionality in security applications to meet network performance demands for business applications.

SURVEY: SECURITY SACRIFICED FOR NETWORK PERFORMANCE 

What the company [NSS Labs] found would likely startle any existing or potential customers: three of the six firewalls failed to stay operational when subjected to stability tests, five out of six didn't handle what is known as the "Sneak ACK attack," that would enable attackers to side-step the firewall itself. Finally, according to NSS Labs, the performance claims presented in the vendor datasheets "are generally grossly overstated."

Independent lab tests find firewalls fall down on the job 

Add in the complexity from the sheer number of devices required to implement all the different layers of security needed, which increases costs while impairing performance, and you’ve got a broken model in need of repair. This is a failure of the defense in depth strategy; the layered, multi-device (silo) approach to operational security. Most importantly, it’s one that’s failing to withstand attacks.

What we need is defense in breadth – the height of the stack – to assure availability and security using a more intelligent, unified security strategy.

DEFENSE in BREADTH

While it’s really not as catchy as “defense in depth,” the concept behind the admittedly awkward-sounding phrase is sound: to assure availability and security simultaneously requires a strong security strategy from the bottom to the top of the networking stack, i.e. the application layer. The ability of the F5 BIG-IP platform to provide security up and down the stack has existed for many years, and its capabilities to detect, prevent, and withstand concerted attacks have been appreciated by its customers (quietly) for some time. While basic firewalling functions have been a part of BIG-IP for years, there are certain capabilities required of a firewall – specifically an ICSA certified firewall – that it didn’t have. So we decided to do something about that.

The result is the ICSA certification of the BIG-IP platform as a network firewall. Combined with its existing ICSA certification for web application firewall (BIG-IP Application Security Manager) and SSL-TLS VPN 3.0 (BIG-IP Edge Gateway), the BIG-IP platform now supports a full-spectrum security solution in a single, unified system. What is unique about F5’s approach is that the security capabilities noted above can be deployed on BIG-IP Application Delivery Controllers (ADCs), best known for providing industry-leading intelligent traffic management and optimization capabilities. This firewall solution is part of F5’s comprehensive security architecture that enables customers to apply a unified security strategy. For the first time in the industry, organizations can secure their networks, data, protocols, applications, and users on a single, flexible, and extensible platform: BIG-IP.

Combining network-firewall services with the ability to plug the hole in modern security implementations (the application layer) with a platform-based solution provides the opportunity to consolidate security services and leverage a shared infrastructure platform resulting in a more comprehensive, strategic deployment that is not only more secure, but more cost effective. 


by Lori MacVittie at January 27, 2012 12:45 PM

OakLeaf Systems

Introducing Microsoft Codename “Cloud Numerics” from SQL Azure Labs

Introduction

Table of Contents

  • “Cloud Numerics” Background
  • The MSCloudNumerics.sln Project Template and Sample Solution
  • “Cloud Numerics” Prerequisites (updated 1/26/2012, see below)
  • Installing the HPC and “Cloud Numerics” Components
  • “Cloud Numerics” Mathematic Libraries for .NET
  • “Cloud Numerics” Distributed Array, Algorithm and Runtime Libraries for .NET
  • Limitations of “Cloud Numerics”
  • Running the MSCloudNumerics Sample Project Locally
  • References

Updated 1/27/2012: Added a prerequisite for submitting executable files to run on the Windows Azure HPC cluster: either install Visual C++ as an (undocumented) required component of Visual Studio 2010 SP1, or download the Microsoft Visual C++ 2010 SP1 Redistributable Package (x64) and copy its .dll files to newly added folders. (See the “Cloud Numerics” Prerequisites section.)

Updated 1/25/2012: My (@rogerjenn) Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters of 1/25/2012 describes how to configure and deploy two 8-core HPC clusters hosted in Windows Azure and submit the Latent Semantic Indexing (LSICloudApplication) project to the Windows Azure HPC Scheduler for processing.


“Cloud Numerics” Background

Codename “Cloud Numerics” is the latest in a series of new SQL Azure Labs tools for managing and analyzing Big Data in the Cloud with Windows Azure and SQL Azure. Ronnie Hoogerwerf’s introductory The “Cloud Numerics” Programming and runtime execution model post of 1/11/2012 to the Microsoft Codename “Cloud Numerics” blog begins:

Microsoft Codename “Cloud Numerics” is a new .NET® programming framework tailored towards performing numerically-intensive computations on large distributed data sets. It consists of

  • a programming model that exposes the notion of a partitioned or distributed array to the user
  • an execution framework or runtime that efficiently maps operations on distributed arrays to a collection of nodes in a cluster
  • an extensive library of pre-existing operations on distributed arrays and tools that simplify the deployment and execution of a “Cloud Numerics” application on the Windows Azure™ platform

Writing numerical algorithms is challenging and requires thorough knowledge of the underlying math; typically this line of work is the realm of experts with job titles such as: data scientist, quantitative analyst, engineer, etc. Writing numerical algorithms that scale-out to the cloud is even harder. At the same time the ever increasing appetite for and availability of data is making it more and more important to be able to scale-out data analytics models and this is exactly what “Cloud Numerics” is all about. For example, with “Cloud Numerics” it is possible to write document classification applications using powerful linear algebra and statistical methods, such as Singular Value Decomposition or Principal Component Analysis, or to write applications that search for correlations in financial time series or genomic data that work on today’s cloud-scale datasets. [Links added.]

“Cloud Numerics” provides a complete [C#] solution for writing and developing distributed applications that run on Windows Azure. To use “Cloud Numerics” you start in Visual Studio with our custom project definition that includes an extensive library of numerical functions. You develop and debug your numerical application on your desktop, using a dataset that is appropriate for the size of your machine. You can read large datasets in parallel, allocate and manipulate large data objects as distributed arrays, and apply numerical functions on these distributed array[s]. When your application is ready and you want to scale-out and run on the cloud you start our deployment wizard, fill out your Azure information, deploy, and run you[r] application.

An important takeaway from the preceding excerpt is that the Big Data input to “Cloud Numerics” applications must be a partitioned or distributed numeric array. You can load data into distributed arrays with a class that implements the Numerics.Distributed.IO.ParallelReader interface, such as the sample Distributed.IO.CSVLoader class.
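
To make the idea of a partitioned array more concrete, here is a minimal sketch in plain JavaScript. It is not the “Cloud Numerics” API (which is C#), and the excerpt doesn't describe how the runtime actually lays out data; the partitionRows and distributedSum helpers and the contiguous row-block layout are assumptions chosen for illustration only.

```javascript
// Illustrative only: split the rows of a matrix into contiguous blocks,
// one block per worker, as a row-partitioned array might be laid out.
// partitionRows is a hypothetical helper, not part of "Cloud Numerics".
function partitionRows(matrix, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError("workerCount must be a positive integer");
  }
  const blocks = [];
  const base = Math.floor(matrix.length / workerCount);
  const extra = matrix.length % workerCount; // the first `extra` workers get one more row
  let start = 0;
  for (let w = 0; w < workerCount; w++) {
    const size = base + (w < extra ? 1 : 0);
    blocks.push(matrix.slice(start, start + size));
    start += size;
  }
  return blocks;
}

// Each worker computes a partial result on its own block; the partial
// results are then combined (here, a sum over all elements).
function distributedSum(matrix, workerCount) {
  return partitionRows(matrix, workerCount)
    .map(block => block.reduce((s, row) => s + row.reduce((a, b) => a + b, 0), 0))
    .reduce((a, b) => a + b, 0);
}

const data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]];
console.log(partitionRows(data, 2).map(b => b.length)); // rows held by each worker
console.log(distributedSum(data, 2));                   // same total as a serial sum
```

The point of the sketch is only that an operation on the whole array can be expressed as independent work on each partition followed by a combine step.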

Note: Source code for the Distributed.IO.CSVLoader class is included in the Cloud Numerics - Examples download, which is described in the “Installing the HPC and ‘Cloud Numerics’ Components” section below.

Ronnie’s Using Data post of 1/20/2012 is a useful reference for array data; it contains the following topics:

A rectangular array of numbers, symbols or expressions is called a matrix. Wikipedia has very detailed Matrix Theory and Linear Algebra topics. Matrix theory is a part of linear algebra.


The MSCloudNumerics.sln Project Template and Sample Solution

The first “Cloud Numerics” deliverable is a C# project template and sample program for Visual Studio 2010 Professional or Ultimate edition that takes advantage of the following newly available High-Performance Computing (HPC) components, which supersede Microsoft Research’s Dryad and DryadLINQ initiatives for high-performance, parallel computing in the cloud:

An MSI installer for the “Cloud Numerics” software sets up the following components for Visual Studio 2010:

  • Math, Statistics, and Signal Processing libraries as managed Dynamically Linked Library (DLL) files.
  • DLLs for initializing and running jobs on Windows Azure used by the Math, Signal, and Statistics libraries.
  • Associated IntelliSense files for the DLLs.
  • A project deployment template and utility for deploying your application package to Azure.

I’ve covered the following three earlier SQL Azure Labs with illustrated, multi-part tutorials and overview articles:

All four SQL Azure Labs projects require self-nomination for access. Sign up for “Cloud Numerics” here. The above three projects require an invitation code for access to resources; “Cloud Numerics” doesn’t.


“Cloud Numerics” Prerequisites

The project template and sample program have the following operating system and software prerequisites:

  • Windows 7 or Windows Server 2008 R2 SP1, 32 or 64-bit
  • Visual Studio 2010 Professional or Ultimate Edition with SP1 with Visual C++ components installed*
  • SQL Server 2008 R2 Express or higher
  • Windows Azure SDK v1.6 and Windows Azure Tools for Visual Studio, November 2011 or later edition
  • A Windows Azure subscription for deploying projects from local, debugging mode to Windows Azure.

If any of the following obsolete components are present, installation will appear to succeed but you probably won’t be able to open a new “Cloud Numerics” project:

  • Microsoft HPC Pack 2008 R2 Azure Edition
  • Microsoft HPC Pack 2008 R2 Client Components
  • Microsoft HPC Pack 2008 R2 MS-MPI Redistributable Pack
  • Microsoft HPC Pack 2008 R2 SDK
  • Windows Azure SDK v1.5
  • Windows Azure AppFabric v1.5
  • Windows Azure Tools for Microsoft VS2010 1.5

*Update 1/27/2012: The build script expects to find msvcp100.dll and msvcr100.dll files installed by VS2010 in the C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\redist\x64\Microsoft.VC100.CRT folder and the msvcp100d.dll and msvcr100d.dll debug versions in the C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\redist\Debug_NonRedist\x64\Microsoft.VC100.DebugCRT folder. If these files aren’t present in the specified locations, the Windows Azure HPC Scheduler will fail when attempting to run MSCloudNumericsApp.exe as Task 5.

Note: You only need to take the following steps if you intend to submit the application to the Windows Azure

If you don’t have the Visual C++ compilers installed, do the following:

  1. Download the Microsoft Visual C++ 2010 SP1 Redistributable Package (x64) (vcredist_x64.exe) to a well-known location
  2. Run vcredist_x64.exe to add the msvcp100.dll, msvcp100d.dll, msvcr100.dll and msvcr100d.dll files to the C:\Windows\System32 folder.
  3. Create a C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\redist\x64\Microsoft.VC100.CRT folder and copy the msvcp100d.dll and msvcr100d.dll files to it.
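
Before submitting a job, it can help to confirm that the files are where the build script is said to look for them. The following is a rough sketch, not part of the “Cloud Numerics” tooling: it uses Node.js only because that is the scripting language used elsewhere in this digest, the missingCrtFiles helper is hypothetical, and the folder and file names are copied from the update note above (written with the standard C:\Program Files (x86) folder name).

```javascript
const fs = require("fs");
const path = require("path");

// Folder and file names are taken from the update note above.
const VC_REDIST = "C:\\Program Files (x86)\\Microsoft Visual Studio 10.0\\VC\\redist";
const EXPECTED = [
  { dir: path.win32.join(VC_REDIST, "x64", "Microsoft.VC100.CRT"),
    files: ["msvcp100.dll", "msvcr100.dll"] },
  { dir: path.win32.join(VC_REDIST, "Debug_NonRedist", "x64", "Microsoft.VC100.DebugCRT"),
    files: ["msvcp100d.dll", "msvcr100d.dll"] },
];

// Returns the full paths of the expected files that are missing.
// `exists` is injectable so the check can be exercised without touching the disk.
function missingCrtFiles(exists = fs.existsSync) {
  const missing = [];
  for (const { dir, files } of EXPECTED) {
    for (const file of files) {
      const full = path.win32.join(dir, file);
      if (!exists(full)) missing.push(full);
    }
  }
  return missing;
}

const missing = missingCrtFiles();
console.log(missing.length === 0 ? "All CRT files found." : "Missing:\n" + missing.join("\n"));
```

Run it with node on the development machine; an empty result only means the files exist, not that the HPC Scheduler will accept the job.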

Stay tuned for more details about this issue.


Installing the HPC and “Cloud Numerics” Components

Follow the instructions in the Microsoft Codename "Cloud Numerics" wiki article’s “Software Requirements” section to install the four components listed earlier.

Note: Links to http://connect.microsoft.com/ in the wiki article won’t work because you don’t receive an invitation code to enter.

Go directly to the "Cloud Numerics" Microsoft Connect Site to download the components.

The wiki article’s Simple Examples section includes several example programs that you can run by replacing the code in the MSCloudNumerics project’s Sample.cs file. (See the “Running the MSCloudNumerics Sample Project Locally” section below.)


“Cloud Numerics” Mathematic Libraries for .NET

The CloudNumericsLab.chm help file provides the details of the Microsoft.Numerics classes and their members’ syntax, categorized by namespace.

Note: The sample application that follows uses the Cholesky Decomposition. You can replace the code in the MSCloudNumerics sample application’s Program.cs file with sample code from the help files.
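
For readers unfamiliar with the Cholesky decomposition mentioned in the note above, here is a small single-process sketch in JavaScript. It is not the sample application's code and does not use the Microsoft.Numerics libraries; it is the textbook algorithm, which factors a symmetric positive-definite matrix A into a lower-triangular L with A = L·Lᵀ. The cholesky function name and the test matrix are my own choices.

```javascript
// Textbook Cholesky decomposition of a symmetric positive-definite matrix A:
// returns the lower-triangular L such that A = L * L^T.
// Serial and in-memory; a distributed implementation would partition the work.
function cholesky(a) {
  const n = a.length;
  const l = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      // Subtract the contribution of the columns already computed.
      let sum = a[i][j];
      for (let k = 0; k < j; k++) sum -= l[i][k] * l[j][k];
      if (i === j) {
        if (sum <= 0) throw new Error("matrix is not positive definite");
        l[i][j] = Math.sqrt(sum);
      } else {
        l[i][j] = sum / l[j][j];
      }
    }
  }
  return l;
}

const a = [
  [4, 12, -16],
  [12, 37, -43],
  [-16, -43, 98],
];
console.log(cholesky(a)); // [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]
```

Multiplying the result by its transpose reproduces the input matrix, which is an easy way to check any implementation.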

A table in the TechNet wiki article describes the Cloud Numerics Mathematic Libraries for .NET.


“Cloud Numerics” Distributed Array, Algorithm and Runtime Libraries for .NET

A table in the TechNet wiki article describes the Cloud Numerics Distributed Array and Runtime Libraries for .NET.


Limitations of “Cloud Numerics”

From Ronnie’s The “Cloud Numerics” Programming and runtime execution model post of 1/11/2012:

First, the “Cloud Numerics” programming model is primarily based around distributed array operations (c.f. data parallel or SIMD-style of programming). Certain relational operations such as “selects” with user-defined functions or complex joins are simpler to express on top of languages such as Pig, Hive and SCOPE. Similarly, while “Cloud Numerics” is designed to deal with large data sets, it is currently constrained to operate on arrays that can fit in the main memory of a cluster. On the other hand, data on disk can be pre-processed via existing “big data” processing tools and ingested into a “Cloud Numerics” application for further processing.

Second, “Cloud Numerics” is not just a convenient C# wrapper around message-passing libraries such as MPI, for example MPI.NET [3]; all aspects of parallelism are expressed via operations on distributed arrays and the “Cloud Numerics” runtime transparently handles the efficient execution of these high-level array operations on a cluster.

A key aspect that distinguishes “Cloud Numerics” from parallelization techniques such as PLINQ and DryadLINQ [4], that are based on implementing a custom LINQ provider, is that parallelization in “Cloud Numerics” occurs purely at runtime and does not involve any code generation from (say) LINQ expression trees; a user’s application can be developed as a regular .NET application by referencing the “Cloud Numerics” runtime and library DLLs and executed on the cluster in Azure.

Finally, the underlying communication layer in “Cloud Numerics” is built on top of the message passing interface (MPI) and does inherit some of the limitations in the underlying implementation such as:

  1. The process model is currently inelastic; once a “Cloud Numerics” application has been launched on (say) P cores in a cluster, it is not possible to dynamically grow or shrink the resources as the application is running.
  2. The implementation is not resilient against hardware failure. Unlike frameworks like Hadoop that are designed explicitly to operate on unreliable hardware, if one or more nodes in a cluster fails, it is not possible for a “Cloud Numerics” application to automatically recover and continue executing.

On the other hand, having MPI as the underlying communication layer in the “Cloud Numerics” runtime does endow it with certain advantages. For instance, “Cloud Numerics” applications can automatically take advantage of high-speed interconnects such as Infiniband between nodes in a cluster and optimizations such as zero-copy memory transfers and shared-memory-aware collectives within a single multi-core node. More importantly, array operators in “Cloud Numerics” can leverage the vast ecosystem of high-performance distributed memory numerical libraries such as ScaLAPACK built on top of MPI.
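
The in-memory constraint in the excerpt above can be made concrete with some back-of-the-envelope arithmetic: a dense array of 64-bit doubles needs 8 bytes per element. The sketch below is only an estimate under stated assumptions; the cluster size (4 nodes with 14 GiB usable each) is hypothetical, and the calculation ignores runtime overhead and temporary arrays, so real limits will be lower.

```javascript
// Rough check: does a dense array of 64-bit doubles fit in the combined
// main memory of a cluster? Ignores runtime overhead and temporaries,
// so treat a "true" result as optimistic.
const BYTES_PER_DOUBLE = 8;

function denseArrayBytes(rows, cols) {
  return rows * cols * BYTES_PER_DOUBLE;
}

function fitsInClusterMemory(rows, cols, nodeCount, bytesPerNode) {
  return denseArrayBytes(rows, cols) <= nodeCount * bytesPerNode;
}

const GiB = 2 ** 30;
// Hypothetical cluster: 4 nodes with 14 GiB of usable memory each (56 GiB total).
console.log((denseArrayBytes(50000, 50000) / GiB).toFixed(1) + " GiB"); // 50,000 x 50,000 array
console.log(fitsInClusterMemory(50000, 50000, 4, 14 * GiB));
console.log(fitsInClusterMemory(100000, 100000, 4, 14 * GiB));
```

Doubling both dimensions quadruples the memory requirement, which is why pre-processing data on disk with other tools, as the excerpt suggests, matters for very large inputs.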


Running the MSCloudNumerics Sample Project Locally

1. Launch Visual Studio, choose New, Project, Visual C#, and select Microsoft Cloud Numerics Application.

2. Click OK to generate a new MSCloudNumerics1 console project and press F5 to run it. Mark the Windows Security Alert’s Private Networks check box.

Note: The firewall must permit interprocess communication between cores on your machine in the form of network calls to localhost.

3. Click Allow Access for the application to close the dialog and repeat step 2 for the HPC MPI Process manager.

4. Click Allow Access to close the dialog. The console displays the dimensions of the distributed array processed by the sample code.

Note: Wikipedia has more information about the Cholesky Decomposition.

5. While the application is running, launch Task Manager and display the CPU cores’ usage.

Note: The lab release of the local distributed application runs on a maximum of two cores. Microsoft states that you will be able to specify the number of cores in future versions. My development computer runs Windows 7 on a 2.83 GHz Q9550 Intel Core 2 Quad CPU on a DQ45CB motherboard with 8 GB of RAM.

6. Press Enter to close the console.

The application’s references include the Microsoft.Numerics namespaces from C:\Program Files\Microsoft Numerics\v0.1\Bin.


References

Ronnie Hoogerwerf’s articles from the Microsoft Codename “Cloud Numerics” blog, in chronological order:

My (@rogerjenn) Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters of 1/25/2012 describes how to configure and deploy two 8-core HPC clusters hosted in Windows Azure and submit the Latent Semantic Indexing (LSICloudApplication) project to the Windows Azure HPC Scheduler for processing.


Stay tuned for additional tutorials detailing local execution of Statistics and Time-series applications, as well as deployment of these sample projects to Windows Azure.

by Roger Jennings (--rj) (noreply@blogger.com) at January 27, 2012 11:53 AM

Windows Azure and Cloud Computing Posts for 1/26/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI Access Control, Connect, SQL Azure Database, and other cloud-computing articles.

• Updated 1/27/2012 with new articles marked •.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue and Hadoop Services

Avkash Chauhan (@avkashchauhan) described Node.js and Windows Azure: Creating a blog application using Node.JS and Windows Azure Table & Blob Storage Part 1 in a 1/25/2012 post:

In this example I will create a node.js-based blog application which will store all the blog articles in Azure Storage. When the application starts, it reads blog articles from Windows Azure table storage and then renders them using the EJS view engine. This sample is part of the Azure Node SDK; however, I am going to enhance it to make it look like a full-scale blog application. This is just a start. I will write this blog assuming you are new to node programming. This application uses the following node packages:

  • Express
  • EJS
  • Jade
  • Stylus
  • Azure
  • Node-uuid

Let’s start by downloading the packages one by one:

Express:

C:\Azure\nodeprojects\BlogUsingAzureStorage>npm install express
npm http GET https://registry.npmjs.org/express
npm http 200 https://registry.npmjs.org/express
npm http GET https://registry.npmjs.org/mime
npm http GET https://registry.npmjs.org/qs
npm http GET https://registry.npmjs.org/mkdirp/0.0.7
npm http GET https://registry.npmjs.org/connect
npm http 304 https://registry.npmjs.org/qs
npm http 304 https://registry.npmjs.org/mkdirp/0.0.7
npm http 304 https://registry.npmjs.org/connect
npm http 200 https://registry.npmjs.org/mime
npm http GET https://registry.npmjs.org/formidable
npm http 304 https://registry.npmjs.org/formidable
express@2.5.6 ./node_modules/express
├── mime@1.2.4
├── qs@0.4.0
├── mkdirp@0.0.7
└── connect@1.8.5

EJS:

C:\Azure\nodeprojects\BlogUsingAzureStorage>npm install ejs
npm http GET https://registry.npmjs.org/ejs
npm http 304 https://registry.npmjs.org/ejs
ejs@0.6.1 ./node_modules/ejs

Jade:

C:\Azure\nodeprojects\BlogUsingAzureStorage>npm install jade
npm http GET https://registry.npmjs.org/jade
npm http 200 https://registry.npmjs.org/jade
npm http GET https://registry.npmjs.org/mkdirp
npm http GET https://registry.npmjs.org/commander
npm http 304 https://registry.npmjs.org/mkdirp
npm http 304 https://registry.npmjs.org/commander
jade@0.20.0 ./node_modules/jade
├── commander@0.2.1
└── mkdirp@0.3.0

Stylus:

C:\Azure\nodeprojects\BlogUsingAzureStorage>npm install stylus
npm http GET https://registry.npmjs.org/stylus
npm http 304 https://registry.npmjs.org/stylus
npm http GET https://registry.npmjs.org/mkdirp/0.0.7
npm http GET https://registry.npmjs.org/growl/1.1.0
npm http GET https://registry.npmjs.org/cssom/0.2.1
npm http 304 https://registry.npmjs.org/mkdirp/0.0.7
npm http 304 https://registry.npmjs.org/growl/1.1.0
npm http 304 https://registry.npmjs.org/cssom/0.2.1
stylus@0.22.6 ./node_modules/stylus
├── growl@1.1.0
├── mkdirp@0.0.7
└── cssom@0.2.1

Azure:

C:\Azure\nodeprojects\BlogUsingAzureStorage>npm install azure
npm http GET https://registry.npmjs.org/azure
npm http 304 https://registry.npmjs.org/azure
npm http GET https://registry.npmjs.org/qs
npm http GET https://registry.npmjs.org/mime
npm http GET https://registry.npmjs.org/sax
npm http GET https://registry.npmjs.org/xmlbuilder
npm http GET https://registry.npmjs.org/xml2js
npm http GET https://registry.npmjs.org/log
npm http 304 https://registry.npmjs.org/qs
npm http 304 https://registry.npmjs.org/mime
npm http 304 https://registry.npmjs.org/sax
npm http 304 https://registry.npmjs.org/xmlbuilder
npm http 304 https://registry.npmjs.org/xml2js
npm http 304 https://registry.npmjs.org/log
azure@0.5.1 ./node_modules/azure
├── xmlbuilder@0.3.1
├── mime@1.2.4
├── log@1.2.0
├── qs@0.4.0
├── xml2js@0.1.13
└── sax@0.3.5

Node-uuid:

C:\Azure\nodeprojects\BlogUsingAzureStorage>npm install node-uuid
npm http GET https://registry.npmjs.org/node-uuid
npm http 200 https://registry.npmjs.org/node-uuid
npm WARN node-uuid@1.3.3 dependencies field should be hash of <name>:<version-range> pairs
node-uuid@1.3.3 ./node_modules/node-uuid

Now if you look in your application’s node_modules folder you will see that all the packages are downloaded, as below:

C:\Azure\nodeprojects\BlogUsingAzureStorage>dir node_modules
Volume in drive C has no label.
Volume Serial Number is 8464-7B7C
Directory of C:\Azure\nodeprojects\BlogUsingAzureStorage\node_modules
01/25/2012 11:00 PM <DIR> .
01/25/2012 11:00 PM <DIR> ..
01/25/2012 10:59 PM <DIR> .bin
01/25/2012 11:00 PM <DIR> azure
01/25/2012 10:58 PM <DIR> ejs
01/25/2012 10:57 PM <DIR> express
01/25/2012 10:59 PM <DIR> jade
01/25/2012 11:00 PM <DIR> node-uuid
01/25/2012 10:59 PM <DIR> stylus
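
The same check can be scripted instead of read off the directory listing. This is a minimal sketch, assuming each package installs into a folder of the same name under node_modules (as the listing above shows); the missingPackages helper is hypothetical and is not part of the sample repo.

```javascript
const fs = require("fs");
const path = require("path");

// Package names as installed in the steps above.
const REQUIRED = ["express", "ejs", "jade", "stylus", "azure", "node-uuid"];

// Returns the packages that have no folder under <projectDir>/node_modules.
function missingPackages(projectDir, required = REQUIRED) {
  return required.filter(
    name => !fs.existsSync(path.join(projectDir, "node_modules", name))
  );
}

const missing = missingPackages(process.cwd());
console.log(missing.length ? "Missing packages: " + missing.join(", ") : "All packages installed.");
```

Save it in the project folder and run it with node; it only checks that the folders exist, not that the installed versions are compatible.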

Now please clone nodeblogwithazurestorage.git repo from GitHub as below:

C:\Azure\nodeprojects\BlogUsingAzureStorage>git clone https://Avkash@github.com/Avkash/nodeblogwithazurestorage.git
Cloning into 'nodeblogwithazurestorage'...
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 16 (delta 0), reused 16 (delta 0)
Unpacking objects: 100% (16/16), done.

You will see a new folder named “nodeblogwithazurestorage” which includes all the files from the repo. Please copy all of these files to your root folder so your work folder looks like the listing below:

C:\Azure\nodeprojects\BlogUsingAzureStorage>dir
01/25/2012 11:25 PM 2,828 blog.js
01/25/2012 11:25 PM <DIR> nodeblogwithazurestorage
01/25/2012 11:00 PM <DIR> node_modules
01/25/2012 11:25 PM 161 package.json
01/25/2012 11:29 PM <DIR> public
01/25/2012 11:29 PM <DIR> routes
01/25/2012 11:25 PM 2,073 server.js
01/25/2012 11:29 PM <DIR> views

That’s it. Let’s run it.

C:\Azure\nodeprojects\BlogUsingAzureStorage>node server.js
Express server listening on port 40506 in development mode

Now open your browser at http://localhost:40506 or http://127.0.0.1:40506 and you will see the node blog application running.

In the next blog post:

  • We will use Windows Azure Table Storage to store and retrieve blog articles.
  • We will change package.json to list the correct dependencies.
  • We will update the package.
  • We will deploy this application to Windows Azure.




SQL Azure Database, Federations and Reporting

• Gregory Leake posted Announcing SQL Azure Data Sync Preview Refresh to the Windows Azure blog on 1/26/2012:

It has been just over three months since we made the SQL Azure Data Sync Preview release available in the Windows Azure portal. We are thrilled with the adoption of the service and are pleased to make available an updated preview release with some requested features and fixes. SQL Azure Data Sync allows organizations to easily synchronize between multiple on-premises databases and SQL Azure cloud databases, a key hybrid IT scenario. If you have not used the Data Sync service and want to learn about it, there is a new video demonstration available that provides an overview of the service capabilities, target scenarios and shows the service in use. There is also a series of videos available on Channel 9 here.

This release addresses several pieces of feedback we’ve heard from customers and brings us a step closer to General Availability. This is the third update to the Preview service since October and contains the following updates:

  • Data Sync servers can now be created in all Windows Azure data centers, enabling Data Sync servers to be created close to the SQL Azure databases for the best possible performance.
  • The Data Sync section of the Windows Azure portal is now localized in ten languages.
  • Miscellaneous fixes and numerous usability improvements including:
    • Progress indicators are now available in the log for long running synchronizations.
    • Error messages have been improved to better help you troubleshoot problems.
    • Synchronization of self-referencing tables is now supported.
    • A new version of the Data Sync Agent is available in the Download Center and it is highly recommended that existing agents are updated to the new version (available here).

Here’s a brief summary of the changes we made in the first two service updates:

  • Addressed the issue that sometimes led to failed syncs for narrow tables with a small number of columns.
  • Allow logins when either username@server or just username are specified.
  • Column names with spaces are now supported.
  • Columns with a NewSequentialID constraint are converted to NewID for SQL Azure databases in the sync group.
  • Both Administrators and non-Administrators are able to install the Data Sync Agent.
  • A new version of the Data Sync Agent was made available on the Download Center.

The team is hard at work on future updates as we approach General Availability and we really appreciate your feedback to date! Please keep the feedback coming and use the SQL Azure Forum to ask questions or get assistance with issues. Have a feature you’d like to see in SQL Azure Data Sync? Be sure to vote on features you’d like to see added or updated using the Feature Voting Forum.


Cihan Biyikoglu (@cihangirb) asked How much overcapacity are you running with today? I bet SQL Azure Federations can trim that! in a 1/25/2012 post:

I know I have already posted a whole bunch on “why use federations” and “what are federations for”, but in most conversations on federations I still get the question ‘why?’, so I wanted to go back to basics and explain why the combination of SQL Azure (a.k.a. a PaaS database in the cloud) and Federations is a killer combination.

Obviously federations can be used in many different ways: for multi-tenancy, for scaling to spikes and bursts, or for gradually growing workloads. They are great for getting you over the capacity limitation of a single node in the public cloud (typically a commodity machine) or the limitations of a single SQL Azure database, such as storage or computation capacity, or simply the transaction throughput you can get from a single SQL Azure database before you get throttled. But all that aside, the main reason in all these cases is the amount of overcapacity you maintain. Mark Russinovich shows a similar chart in his talks and I’ll gladly borrow it for federations:

Imagine the isolated capacity you need at the database tier. Here is your capacity for the next 6-18 months and here is what you maintain as capacity on premises: you buy some more hardware, fire it up, and you get more cores, more memory, more IO capacity and so on. You release new functionality that changes your workload, you get more customers, your customers’ data grows, or something else changes your workload over time; you push things to the limit, so you are under-provisioned, so you acquire some more hardware, and life goes on…

image

The above picture is also the representation of systems with static partitioning or sharding today on any system that offers no repartitioning operations. Let’s say you start life with 20-30 partitions; you distribute and size things for the peak loads. Or, if you already have a multi-tenant architecture, you place 100 or 200 tenants per database or shard. But those tenants change and grow, so these static decisions require some level of overprovisioning to be safe, because repartitioning is offline and can be error-prone every time.

With the cloud, the picture looks like the one below: you provision just in time and simply trace along the capacity line closely.

image

Federations are there for trimming overcapacity as well. You don’t need to make a static decision about how many tenants to put into a shard, and you don’t need to decide up front how many shards the web app needs for the next 3 months or the year. You can change your mind over time, and Federations let you repartition online without downtime, so you don’t need to take down the app or the database. If it turns out some tenants grow and you can no longer fit 10000 tenants into 1 database and need to go to 5000 tenants per database… OR if you want to handle Black Friday or tax day or end-of-month reporting and you need more capacity… OR if your service takes off and you acquire a whole bunch of customers… OR if you release a new version or new functionality that changes the workload, you can prepare for it with federations. Kick off a SPLIT and it will engage more nodes. All online!
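To make the idea of a SPLIT concrete, here is a minimal sketch in Python of a range-partitioned "federation" held in memory. It is a toy model only, not the SQL Azure API: the `ToyFederation` class, its methods, and the assumption that tenant ids are non-negative integers are all hypothetical and introduced purely for illustration. It shows one member covering 10000 tenants being split into two members of 5000 tenants each, with routing continuing to work afterwards.

```python
from bisect import bisect_right


class ToyFederation:
    """Toy range-partitioned federation (hypothetical, in-memory only).

    Each member owns the key range [low, next_low). Tenant ids are
    assumed to be non-negative integers.
    """

    def __init__(self):
        # One member initially covers the whole key range, starting at 0.
        self.boundaries = [0]   # sorted low keys, one per member
        self.rows = {0: {}}     # low key -> {tenant_id: data}

    def member_for(self, tenant_id):
        # Route a tenant to the member whose range contains its id.
        idx = bisect_right(self.boundaries, tenant_id) - 1
        return self.boundaries[idx]

    def insert(self, tenant_id, data):
        self.rows[self.member_for(tenant_id)][tenant_id] = data

    def split(self, at):
        # Split the member containing `at` into [low, at) and [at, high).
        low = self.member_for(at)
        if at == low:
            raise ValueError("split point is already a boundary")
        old = self.rows[low]
        self.rows[low] = {k: v for k, v in old.items() if k < at}
        self.rows[at] = {k: v for k, v in old.items() if k >= at}
        self.boundaries.insert(bisect_right(self.boundaries, at), at)


fed = ToyFederation()
for tenant in range(10000):
    fed.insert(tenant, {"name": f"tenant-{tenant}"})

# Go from 10000 tenants in one member to 5000 tenants per member.
fed.split(at=5000)
print({low: len(rows) for low, rows in fed.rows.items()})
```

The real service performs the split online and moves data between databases; the sketch only illustrates that the number of members is a decision you can revisit later rather than one you must size for peak load up front.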



MarketPlace DataMarket, Social Analytics and OData

Glen Gailey (@ggailey777) described a New and Improved T4 Template for OData Client and Local Database in a 1/25/2012 post:

If you recall from my previous post Sync’ing OData to Local Storage in Windows Phone (Part 1), I had written a T4 template for my Windows Phone 7.5 (“Mango”) project to generate a proxy client needed to access both an OData service and a local database on the device. My template was based on an existing T4 template, published in a blog post by Alexey Zakharov on Silverlight Show, that generated a generic OData proxy client. I had promised to publish my first stab at a T4 template to generate this hybrid proxy. However, because my original template was based on Alexey’s OSS sample, it was taking a long time to get the go-ahead to post it.

A New T4 Template for OData Clients

Fortunately, the other day I heard about a new T4 template written by the OData team to generate an OData client proxy to access an OData v3 data service.

Perfect!

With this new Microsoft-developed template, I have been able to port my previous LINQ-to-SQL additions into a new template without too much work. I have also updated my previously published project Using Local Storage with OData on Windows Phone To Reduce Network Bandwidth to include the actual T4 template. To use this project on your computer, follow the instructions on the main page.

Considerations for My New Hybrid T4 Template

Since I have posted this template to the MSDN Code Samples Gallery under the Apache 2.0 license, I should probably mention a few caveats about using this template:

  • This template requires the libraries that are part of WCF Data Services 1.0 (for OData v3), which you can install from the Microsoft WCF Data Services October 2011 CTP. In particular, it uses EDMLib to parse the .edmx metadata.
  • The original T4 template that I used as my starting point is a preview version that is published to Nuget.org. Since it’s a preview, I will need to port my updates into the final version, when it becomes available.
  • The original T4 template was designed to support the upcoming release of WCF Data Services 1.0, which includes new behaviors like collection properties. My template does not (yet) support collection properties because I have not yet figured out the best way to do this (I will probably have to end up serializing them to string values).
  • The original T4 template doesn’t yet include the data contract serialization attributes needed to support tombstoning on Windows Phone, so I added those too in my version.
  • As before, my template supports complex type properties, but I’m not sure that it will handle nested complex types.
  • I’ve tested my template against the Netflix service (since that’s what my sample app consumes), which is the most complex public OData service that I have found. However, I haven’t tested it against a true OData v3 service.
  • You have to manually set the namespace and path variable to the generated .edmx file on your local machine (T4 doesn’t support Visual Studio macros).

Installing The Hybrid T4 Template into a New Project

In case you want to try out my T4 template in your own Windows Phone project, here’s how you would do it:

  1. Make sure that you have NuGet installed. You can install it from here: https://nuget.org/.
  2. If you haven’t already done so, use the Add Service Reference tool in Visual Studio to add a reference to the OData service.
    (The template needs the service.edmx file generated by the tool).
  3. In your project, use the NuGet Package Manager Console to download and install the ODataT4-CS package:
    PM> Install-Package ODataT4-CS
  4. Remove the Reference.tt template and replace it with the ReferenceWithLocalDatabase.tt template from my sample.
  5. Open the ReferenceWithLocalDatabase.tt template file, change the value of the MetadataFilepath property in the TransformContext constructor to the location of the .edmx file generated by the service reference, and update the Namespace property to a namespace that doesn’t collide with the one generated by the service reference.

Now, when you save the template file, VS should access the local .edmx file to generate a new proxy class in C#.

As I mentioned, I will post an update to my hybrid template after the final T4 template is released by the OData team.



Windows Azure Access Control, Service Bus and Workflow

Brian Loesgen (@BrianLoesgen) reported a New Azure ServiceBus Demo Available in a 1/22/2012 post (missed when published):

I’m pleased to announce that I FINALLY have finished and packaged up a cool little ServiceBus demo.

I say “finally” because this demo has a long history; it began over a year ago. I enhanced it, and showed it to a colleague, Tony Guidici, for his comments. He ended up enhancing it, and putting it into his Azure book. I then took it back, enhanced it further, and, well, here it is. Thanks also to my colleagues David Chou and Greg Oliver for their feedback.

There are several resources associated with this demo:

Note that this is based on the current-when-I-did-this version 1.6 of the Azure SDK and .NET libraries.

At a high level, the scenario is a system that listens for events; when critical events occur, they are multicast to listeners/subscribers through the Azure ServiceBus. The listeners use the ServiceBus relay bindings, while the subscribers use the topic-based pub/sub mechanism of the ServiceBus.

Why relay *and* subscription? They serve different models. For example, using the subscription model, a listener could subscribe to all messages, or just a subset based on a filter condition (in this demo, we have examples of both). All subscribers will get all messages. By contrast, a great example of the relay bindings is having a Web service deployed on-prem, and remoting that by exposing an endpoint on the ServiceBus. The ServiceBus recently introduced a load balancing feature, where you could have multiple instances of the same service running, but if a message is received only one of them is called.

Both models work very well for inter-application and B2B scenarios.
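The subscription model described above, where one subscription receives every message and another receives only a filtered subset, can be sketched with a small in-memory model. This is not the ServiceBus SDK: the `ToyTopic` class and the message shape are hypothetical, and the only detail taken from the demo is that critical messages carry priority 0.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory topic with filtered subscriptions."""

    def __init__(self):
        self.subscriptions = {}             # name -> filter predicate
        self.delivered = defaultdict(list)  # name -> messages received

    def subscribe(self, name, predicate=lambda msg: True):
        # A subscription without a filter receives every message.
        self.subscriptions[name] = predicate

    def publish(self, msg):
        # Each subscription gets its own copy if its filter matches.
        for name, predicate in self.subscriptions.items():
            if predicate(msg):
                self.delivered[name].append(msg)


topic = ToyTopic()
topic.subscribe("all-events")
topic.subscribe("critical-only", lambda msg: msg["priority"] == 0)

# Five messages, three of them critical (priority 0).
for priority in (0, 2, 0, 1, 0):
    topic.publish({"priority": priority, "text": "event"})

print(len(topic.delivered["all-events"]), len(topic.delivered["critical-only"]))
```

The point of the design is that publishers never need to know who is listening; adding another filtered subscription does not change the publishing code.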

The moving parts in this particular demo look like this:

image

Subscriptions are shown above as ovals, the direct lines are relay bindings. The red lines are critical events, the black line is all events.

The projects in the solutions are:

image

Their purposes are:

Client

  • EventPoint.ConsoleApp: Listens for critical messages multicast through the ServiceBus relay binding
  • EventPoint.ConsoleApp.Topics: Listens for critical messages multicast through the ServiceBus eventpoint-topics namespace
  • EventPoint.Generator: Test harness, publishes messages to the ServiceBus eventpoint-topics namespace
  • EventPoint.Monitor: WinForms app that listens for critical messages multicast through the ServiceBus relay binding

Cloud

  • EventPoint.CriticalPersister: Listens for critical messages multicast through the ServiceBus relay binding and persists them to SQL Azure
  • EventPoint.Data: Message classes
  • EventPoint_WebRole: Table browser UI to see all events that have been persisted to Azure table storage
  • EventPoint_WorkerRole: Worker role that sets up eventpoint-topics subscriptions for 1) All events and 2) critical (priority 0) messages that get multicast to the ServiceBus relay

Common

  • EventPoint.Common: Config, message factory to support push notifications
  • Microsoft.Samples.ServiceBusMessaging: NuGet package to support push notifications

There are a few things you’ll need to do in order to get the demo working. Remarkably few things actually, considering the number of moving parts in the flow diagram!

First off, in the admin portal, you will need to create two ServiceBus namespaces:

image

NOTE THAT SERVICEBUS NAMESPACES MUST BE GLOBALLY UNIQUE. The ones shown above are ones I chose, if you want to run the code you will have to choose your own and cannot re-use mine (unless I delete them).

The “eventpoint-critical” namespace is used for the relay bindings, the “eventpoint-topics” is used for the pub/sub (apparently you cannot use the same namespace for both purposes, at least at the time this was written). You don’t have to use those names, but if you change them, you’ll need to change them in the config file too, so I’d suggest just leaving it this way.

Because there are multiple types of apps, ranging from Azure worker roles through console and winforms apps, I created a single shared static config class that is shared among the apps. You can, and need to, update the app.config file with your appropriate account information:

image

Note: there are more things you need to change that did not fit in the screen shot, they will be self-evident when you look at the App.Config file.

To get the ServiceBus issuer name and secret, you may need to scroll, as they are at the bottom right-hand side of the ServiceBus page:

image

Lastly, you’ll need to add the name/creds of your storage account to the Web and worker roles.

When you run the app, five visible projects will start, plus a web role and a worker role running in the emulator.

In the screen shot below, I generated 5 random messages. Three of them were critical, and you can see they were picked up by the console apps and the WinForms app.

image

Just as with Windows Azure queues, the Azure ServiceBus is a powerful tool you can use to decouple parts of your application. I hope you find this demo helpful, and that it gives you new ideas about how you can use it in your own solutions.



Windows Azure VM Role, Virtual Network, Traffic Manager, Connect, RDP and CDN

Michael Washam (@MWashamMS) announced a New Management API for Windows Azure Traffic Manager in a 1/26/2012 post:

As you may have noticed in the Windows Azure developer portal, we recently released a new management API for Windows Azure Traffic Manager. The new API improves Traffic Manager by allowing developers and IT professionals to script interactions with the service and to interface with the service programmatically.

For those of you who aren’t familiar with Windows Azure Traffic Manager, it gives you control of how traffic is distributed between hosted services in different datacenters. Traffic Manager increases the perceived performance of your application by sending customer traffic to the closest datacenter, and it improves reliability by not sending traffic to hosted services that are down.

With the release of the new API, developers now have full access to the management and creation of Traffic Manager policies, including the creation of a profile from scratch. In this post, we’ll walk through how to create, update, and manage profiles using the new API. Documentation for the new Traffic Manager REST APIs can be found here.

Let's look at a typical configuration in the portal to see how we could accomplish the same configuration using the APIs. Before we get started, if you are new to Windows Azure Traffic Manager, I highly recommend reading an overview of how the service works before continuing.

For reference, I have included a screenshot of a configuration I have set up that uses the performance load balancing method to distribute traffic between an application endpoint set up in the North Central US datacenter and one in the North Europe datacenter.

Figure 1: Traffic Manager Policy Page

The Edit Traffic Manager Policy dialog allows you to configure a Windows Azure Traffic Manager policy in one screen. Behind the scenes, multiple API calls create multiple entities on your behalf to represent this policy configuration, and as a developer you will need to be aware of them.

So how can you accomplish the same configuration programmatically?

Understanding the entities is the first step. The policy represented above consists of a profile with a domain name specified and at least one definition, which in turn consists of the following configuration: load balancing method, DNS TTL, endpoints and a monitoring configuration, among other things.

Figure 2: Traffic Manager Entities

Each profile can have multiple definitions associated with it. However, only one definition can be active at a time. Creating multiple definitions is not currently exposed in the portal. Through the API, however, it is entirely possible to define multiple distinct definitions and to switch between them without rebuilding them.

Create Profile API

POST

https://management.core.windows.net/<subscription-id>/services/WATM/profiles

Figure 3: Create Profile Request Parameters

The Create Profile API requires you to specify a profile name and the Traffic Manager Domain name. The domain name consists of a DNS prefix (host name) and .trafficmanager.net. In the management portal there is not a location for a profile name; this is generated for you when you use the portal. This is not the case when you create the profile programmatically. How the profile name is generated is something important to understand as a developer. When you create a profile from the portal the name is generated by taking the hostname of the domain name you are specifying and appending -trafficmanager-net to it. For example if the domain name you specified was: woodgrove.trafficmanager.net the internal name of the profile would be woodgrove-trafficmanager-net. When creating a profile programmatically the profile name is whatever you pass into the Create Profile API.
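The portal's naming rule described above can be sketched in a few lines of Python. The function name `portal_profile_name` is hypothetical and not part of any Microsoft API; it only mirrors the rule as stated here (take the DNS prefix and append -trafficmanager-net), which is useful if a script needs to find a profile that was originally created in the portal.

```python
TRAFFIC_MANAGER_SUFFIX = ".trafficmanager.net"


def portal_profile_name(domain_name: str) -> str:
    """Derive the profile name the portal would generate for a domain name.

    Mirrors the rule described in the text: DNS prefix (host name)
    followed by "-trafficmanager-net".
    """
    if not domain_name.lower().endswith(TRAFFIC_MANAGER_SUFFIX):
        raise ValueError("expected a <dnsprefix>.trafficmanager.net domain name")
    dns_prefix = domain_name[: -len(TRAFFIC_MANAGER_SUFFIX)]
    if not dns_prefix:
        raise ValueError("domain name has no DNS prefix")
    return dns_prefix + "-trafficmanager-net"


print(portal_profile_name("woodgrove.trafficmanager.net"))
# woodgrove-trafficmanager-net
```

Profiles created programmatically are not bound by this rule, so a management tool should treat the derived name as a lookup hint rather than a guarantee.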

Create Definition API

POST

https://management.core.windows.net/<subscription-id>/services/WATM/profiles/<profile-name>/definitions

Figure 4: Create Definition Request Parameters

Once a profile is created, you can then create a definition using the Create Definition API to specify the rest of your Windows Azure Traffic Manager configuration.

The definition configuration is not as complex as it looks. Defining the monitor consists of specifying the relative path to an HTTP/HTTPS resource that will tell Traffic Manager the health of your application via the returned status code. You may change the port, protocol and the relative path but the remaining settings have to be set to the default values.

Each endpoint consists of the URL to one of the Windows Azure applications that you want managed in the Windows Azure Traffic Manager and a flag indicating whether it is currently enabled or disabled.

The URL specified when creating the Traffic Manager profile (<dnsprefix>.trafficmanager.net) will be mapped to one of the specified endpoints when a DNS name is resolved. Which endpoint is resolved is based on the load balancing method specified (Performance, Failover or RoundRobin).

For Example:
WoodGroveUS.cloudapp.net could reside in the North Central data center.
WoodGroveEU.cloudapp.net could reside in the North Europe data center.

WoodGrove.trafficmanager.net would be the parent domain name that when resolved would be mapped to one of the data center endpoints.
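As a rough illustration of how the three load balancing methods differ, the following Python sketch picks an endpoint from a list. It is a simplified model, not how Traffic Manager is implemented: the `Endpoint` class, the `healthy` flag, and the `latency_ms` field are assumptions made for the example (the real service derives health from its monitor and makes performance decisions from its own latency data).

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Endpoint:
    domain: str
    enabled: bool = True
    healthy: bool = True       # assumed result of the monitoring probe
    latency_ms: float = 0.0    # hypothetical latency as seen by the caller


_round_robin_counter = count()


def resolve(endpoints, method):
    """Pick the endpoint a DNS query would be mapped to (toy model)."""
    candidates = [e for e in endpoints if e.enabled and e.healthy]
    if not candidates:
        return None
    if method == "Failover":
        # First usable endpoint in the configured order.
        return candidates[0]
    if method == "RoundRobin":
        # Rotate through the usable endpoints on successive calls.
        return candidates[next(_round_robin_counter) % len(candidates)]
    if method == "Performance":
        # Usable endpoint with the lowest latency for this caller.
        return min(candidates, key=lambda e: e.latency_ms)
    raise ValueError(f"unknown load balancing method: {method}")


us = Endpoint("WoodGroveUS.cloudapp.net", latency_ms=40.0)
eu = Endpoint("WoodGroveEU.cloudapp.net", latency_ms=120.0)

print(resolve([us, eu], "Performance").domain)  # US is closer for this caller
us.healthy = False
print(resolve([us, eu], "Performance").domain)  # falls back to EU
```

In every method, endpoints that are disabled or failing the health check are skipped, which is what gives the service its reliability benefit.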

Update Profile

PUT

https://management.core.windows.net/<subscription-id>/services/WATM/profiles/<profile-name>

Figure 5: Update Profile Request Parameters

There can be multiple definitions associated with a profile, but at most one can be active at a time. For a Traffic Manager profile to be active, you must enable one of the definitions associated with the profile. You enable a definition by calling the Update Profile API, passing in the version that was returned when you called the Create Definition API.
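Put together, activating a new policy takes three calls in order. The sketch below only assembles the HTTP methods and URLs given in this post; the helper name `activation_calls` is hypothetical, and request bodies, headers and authentication are deliberately left out because the request parameter figures are not reproduced here.

```python
MANAGEMENT_BASE = "https://management.core.windows.net"


def activation_calls(subscription_id: str, profile_name: str):
    """Return the (method, url) pairs needed to create and activate a profile."""
    profiles = f"{MANAGEMENT_BASE}/{subscription_id}/services/WATM/profiles"
    return [
        # 1. Create Profile: supplies the profile name and domain name.
        ("POST", profiles),
        # 2. Create Definition: returns the version of the new definition.
        ("POST", f"{profiles}/{profile_name}/definitions"),
        # 3. Update Profile: enables the definition version from step 2.
        ("PUT", f"{profiles}/{profile_name}"),
    ]


for method, url in activation_calls("<subscription-id>", "woodgrove-trafficmanager-net"):
    print(method, url)
```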

Managing Existing Profiles and Definitions

Beyond the core operations of creating a profile and its associated definitions, the Traffic Manager REST API also supports List Profiles, Get Profile, List Definitions, Get Definition and Delete Profile. These APIs provide full functionality for building an application to manage Windows Azure Traffic Manager configurations.

If you would like to automate the management of your Windows Azure Traffic Manager profiles but do not want to write code against the REST API to do it, we also have an answer for you. We have updated the Windows Azure PowerShell Cmdlets (now version 2.2) to have full support for the Windows Azure Traffic Manager.

Windows Azure Traffic Manager Cmdlets

  • New-TrafficManagerProfile
  • Get-TrafficManagerProfile
  • Remove-TrafficManagerProfile
  • Set-TrafficManagerProfile
  • Get-TrafficManagerDefinition
  • New-TrafficManagerDefinition
  • Add-TrafficManagerEndpoint
  • New-TrafficManagerEndpoint
  • Set-TrafficManagerEndpoint
  • Remove-TrafficManagerEndpoint
  • New-TrafficManagerMonitor

Here is an example of how you can use PowerShell to create a new profile and definition:


Windows Azure Traffic Manager is a key technology for enabling global and highly available applications. The new REST APIs will allow application developers to build applications that make the management of Traffic Manager a native part of their application. We have also opened the door for automating deployments to Windows Azure customers by exposing this functionality in the new release of the Windows Azure PowerShell Cmdlets 2.2.



Live Windows Azure Apps, APIs, Tools and Test Harnesses

Tim Huckaby interviewed Vishwas Lele in a 00:05:42 Bytes by MSDN: January 24 - Vishwas Lele video on 1/26/2012:

Join Tim Huckaby, Founder of InterKnowlogy and Actus Software, and Vishwas Lele, CTO of Applied Information Sciences, as they discuss the latest announcements around Windows Azure from the Build 2011 Conference. Vishwas is the king of Azure and has worked with many companies, including ISVs, point-of-sale system vendors, start-ups and more, to migrate to Windows Azure. One new feature in Windows Azure announced at Build that Vishwas is excited about is the geo-replication capability, where you can geo-replicate a Windows Azure storage account in one data center to another data center. This is a great interview showing exciting times in the Azure world with many more changes to come!

About Vishwas
Vishwas Lele is an AIS Chief Technology Officer and is responsible for the company vision and execution of creating business solutions using .NET technologies. Vishwas brings over 20 years of experience and thought leadership to his position, and has been at AIS for 17 years. A noted industry speaker and author, Vishwas is the Microsoft Regional Director for the Washington, D.C. area.

About Tim
Tim Huckaby is focused on the Natural User Interface (NUI): Touch, Gesture, and Neural, in Rich Client Technologies on a broad spectrum of devices.

Tim has been called a "Pioneer of the Smart Client Revolution" by the press. Tim has been awarded many times for the highest rated technical presentations and keynotes for Microsoft and many other technology conferences around the world. Tim has been on stage with, and done numerous keynote demos for many Microsoft executives including Bill Gates and Steve Ballmer.

Tim founded InterKnowlogy, a custom application development company, in 1999 and Actus Interactive Software in 2011 and has over 30 years of experience including serving on a Microsoft product team as a development lead on an architecture team on a Server Product. Tim is a Microsoft Regional Director, a Microsoft MVP and serves on many Microsoft councils and boards like the Microsoft .NET Partner Advisory Council.



Visual Studio LightSwitch and Entity Framework 4.1+

The Visual Studio LightSwitch (@VSLightSwitch) team published a link to a List of 97 LightSwitch Controls, Extensions and Add-Ins on 1/26/2012. Here’s a screen capture of the first few:

 image


Satish Kumar wrote a Microsoft Visual Studio LightSwitch 2011 Review for Software News Daily on 1/25/2012:

Many organizations are constantly looking to address their business needs with flexible and scalable applications. In most cases, the time and resources necessary to build those applications are not always available.

Here’s the solution: Microsoft Visual Studio LightSwitch 2011 has the potential to meet these business needs.

Microsoft Visual Studio LightSwitch 2011 is a flexible development tool that helps developers to rapidly develop polished and fantastic business applications for the desktop as well as the Cloud.

With an intuitive development environment, timesaving tools and templates, Visual Studio LightSwitch helps speed the development process. It also reduces the complexity from user-interface design to Windows Azure cloud deployment. LightSwitch is essential in the development of affordable, scalable custom software solutions that connect with existing applications, legacy systems and web services. It automatically handles the routine code and lets you focus on developing the custom logic that makes your application unique. It facilitates comprehensive and user friendly views of your business data.

Key Features of LightSwitch 2011

  • LightSwitch 2011 supports exporting data to Microsoft Office Excel for easy reporting.
  • The asynchronous data loading routines help in building load-responsive applications.
  • The built-in authentication models provide all the users with varying degrees of authorization and accessibility.
  • Automatic generation of administration console.
  • A simple and innate way of setting user roles and permissions.
  • The pre-built components and templates of LightSwitch 2011 are absolutely extensible.
  • LightSwitch 2011 ships with a set of Application Shells that gives a feel of popular Microsoft software.
  • It consists of predefined data types for commonly used fields like email addresses and phone numbers.

Major Benefits

  • You can easily enhance the functionality of LightSwitch application by adding extensions from third party vendors.
  • It is very easy to collect, analyze and reuse the content from various sources such as Microsoft SQL Azure, Microsoft SQL Server, Microsoft SharePoint, Oracle and other databases.
  • As LightSwitch 2011 handles the code, you can create user friendly business applications.
  • It helps in building the applications with built-in paging, filtering and sorting capabilities. This makes it easier in handling huge amounts of data.
  • The development environment simplifies all the phases of the development by providing assistance as and when required.
  • You can create custom business logic and rules that are unique to a particular business and the users.
  • You can change the behavior and appearance of an application by just changing one setting of shell and theme extensions.
  • With LightSwitch 2011, you can build applications that can be deployed to desktop clients, browser clients or through the Cloud. You can choose the deployment method according to your requirements.

As you create a new LightSwitch project, the only decision you need to make is whether to use Visual C# or Visual Basic. The projects are logically 3-tier applications and follow n-tier best practices. They also utilize Entity Framework and RIA services.

Microsoft Visual Studio LightSwitch 2011 is in stock at SoftwareMedia.com!

Satish also includes a link to a 00:02:12 promotional video in his post.



Windows Azure Infrastructure and DevOps

Mary Jo Foley (@maryjofoley) asserted “Microsoft is moving steadily ahead with its plan to enable Linux to run on its Windows Azure cloud platform” in a deck for her Microsoft seeking open-source expert to help put Linux on Azure article for ZDNet’s All About Microsoft blog:

As I blogged earlier this month, Microsoft is preparing to enable Linux to run on its Windows Azure cloud platform. A test build of the coming Linux virtual-machine capability is slated for March, according to my contacts.

For those still doubting this is on the Microsoft roadmap, I’ve got a new piece of evidence. A contact of mine provided me with a link to a Microsoft job posting for a software development engineer at Microsoft that calls for some serious Linux credentials.

The job posting states quite plainly that the person the Server and Tools team is seeking will be charged with “Defin(ing) and scop(ing) open source projects designed to enable Linux on Microsoft’s virtualization and cloud platforms.” (Emphasis mine.)

Here is the pertinent part of the post:

SR Software Development Engineer (SDE) Job
Date: Jan 22, 2012
Location: Redmond, WA, US
Job Category: Software Engineering: Development
Job ID: 764856-52821
Division: Server & Tools Business

Senior Software Development Engineer/Linux Virtualization

This position requires a proven track record in the open source community.

The Windows Interoperability Team at Microsoft has an immediate opening for a senior software development engineer. The purpose of this position is to become a key member of a highly specialized development team whose mission is to identify, define, scope, implement and drive to completion software projects that promote full, transparent interoperability between Windows and Linux in Microsoft virtual and cloud environments.

The primary responsibilities for this position are the following:

Define and scope open source projects designed to enable Linux on Microsoft’s virtualization and cloud platforms

Work directly with the Linux kernel community to develop Linux device drivers and kernel technology to support Linux on Microsoft platforms

Work with Microsoft product groups to help ensure the design and implementation of Microsoft virtualization and cloud technology will support Linux architectures and runtime paradigms.

Mary Jo continues with the job qualifications and a report about a forthcoming IaaS workshop.


Gavin Clarke (@gavin_clarke) asserted “Cloud biz falls short of $80m revenue target” in a deck for his Microsoft's magic bullet for Azure: Red Hat Linux article of 1/26/2012 for The Register:

If Microsoft loves money, and it does, then making Linux publicly available on its proprietary Azure cloud can't come soon enough.

Last June Microsoft ran a build of Linux on its Windows Azure compute fabric in the labs of the Server and Tools division, which is responsible for its cloud.

What flavour of Linux? Red Hat, sources close to the company now tell The Reg.

That's a critical pick given North Carolina's favourite brand of Linux continues to reign as the market's number-one distro and is a preferred choice for Windows shops when going Linux.

Microsoft knows Red Hat is important: as much as it hurt his eyes, in 2005 Steve Ballmer presided over a demonstration of Microsoft's Virtual Server at a Microsoft Management Summit running Red Hat and managed through Operations Manager. This rapprochement came five years into a Redmond campaign to dismiss and vilify Linux with Ballmer saying his company had listened to customers who'd demanded better support for non-Windows machines.

Microsoft now loves Linux when it's running as a virtualised instance on its gear.

By embracing Linux, Microsoft managed to contain the Penguin's once rapid advance in the server room and, according to IDC, Windows now accounts for nearly 50 per cent of server revenues compared to just under 20 per cent for Linux.

The closed, controlled environment of the server room however is no longer Microsoft's big problem: it's the cloud.

We knew that several years into Windows Azure, Microsoft's cloud platform was struggling, only we didn't know by how much. Now we have some unofficial figures.

Sources tell us the revenue target for Windows Azure in Microsoft's current fiscal year, which started on 1 July 2011, is $80m - a relatively modest number for a company the size of Microsoft. Halfway in, it looks like the target will be missed and come in at $60m, The Reg has been told.

We asked Microsoft to comment on the numbers, but the company declined.

How that $80m figure compares

To give some perspective: Microsoft's Server and Tools division, which runs Azure, raked in an overall $4.7bn for the most recent quarter, up 11 per cent. Amazon, the game everybody wants to beat, in October reported $407m revenue for a business segment it calls "other". That segment contains money made from EC2 as the retailer doesn't break out cloud figures.

Amazon also doesn't release customer data, but does tell you how much data is pouring through its cloud: 566 billion objects by the end of 2011, almost double the number of 2010. To help contain that and grow, Amazon opened three data centres in 2011.

Microsoft's struggle towards cloud revenue is believable. In the last year or so, Microsoft's been tweaking and re-working Windows Azure pricing with the direction consistently towards cheaper at the low end as an on-ramp for new developers. Microsoft claims to have more than 10,000 Windows Azure customers; if that's correct then they are either paying tiny amounts of money for the service or paying nothing because Microsoft is giving it away to existing Windows shops.

Microsoft's been trying to emulate Amazon as a haven for developers of all languages and tools: it's made Azure friendly for Java and PHP in addition to .NET. It's used startups and internet companies as poster children to lure consumers and web entrepreneurs to Azure. …

Gavin continues with a “There's no business like Node.js business” section.



Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds


No significant articles today.



Cloud Security and Governance

No significant articles today.



Cloud Computing Events

Brian Loesgen (@BrianLoesgen) reported new Azure Discovery Events in a 1/22/2012 post:

We are going to be running another series of Windows Azure Discovery events in the US West region.

Azure Discovery Events

We would like to offer you the opportunity to attend the Windows Azure Platform Discover Event. A Discover Event will provide a business and technical overview of the Windows Azure Platform. The target audience for these events includes business decision makers, technical decision makers, architects, and development leads. The sessions are targeted at the 100-200 level with a mix of business focused information as well as technical information.

Date                Location            Time                 Registration
January 31, 2012    Redmond, WA         9:00 AM – 1:00 PM    Register here
February 8, 2012    Boulder, CO         9:00 AM – 1:00 PM    Register here
February 27, 2012   Mountain View, CA   9:00 AM – 1:00 PM    Register here
March 1, 2012       Irvine, CA          9:00 AM – 1:00 PM    Register here

To register by phone, call: 1.877.MSEVENT (1.877.673.8368).



Other Cloud Computing Platforms and Services

AT&T (@ATTBusiness) announced the availability of its new AT&T Cloud Architect Public, Private and Bare Metal Instances in a Web page that appeared 1/26/2012:

AT&T Cloud Architect - Public Instance
Start here, scale there

Whether you need at-the-ready cloud resources for rapid deployment, extra compute capacity for unexpected workloads or a short-term testing and development platform without a long-term investment, consider a public instance from AT&T Cloud Architect.

It can be a great starting point for gaining basic cloud benefits and a scalable springboard into other cloud server solutions that meet more specialized computing needs.

A public instance from AT&T Cloud Architect lets you turn computing capacity up when you need it and down when you don’t via our online customer portal, where it’s also fast and easy to reconfigure and resize cloud servers on the fly. A monthly fee or pay-as-you go pricing makes this multi-tenant cloud solution both flexible and affordable. Start your cloud servers now.

AT&T Cloud Architect provides a scalable stepping stone into the cloud.

Standard Configuration Pricing

Monthly and Hourly plans include unlimited inbound and private network bandwidth. Monthly plans include 1000GB of outbound bandwidth per month ($0.10/GB charge for additional bandwidth). Hourly plans do not include any outbound bandwidth ($0.10/GB for all outbound bandwidth).

Local Storage Based

[Pricing table image not reproduced]

AT&T’s pricing is competitive, but I haven’t found SLA details so far.
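
The outbound-bandwidth terms in the AT&T pricing note above reduce to simple arithmetic. The following is a minimal Python sketch that uses only the figures quoted there (1000GB included on monthly plans, none on hourly plans, $0.10/GB beyond the allowance). The function name and the plan labels are illustrative assumptions, not part of any AT&T API, and the sketch ignores compute and storage charges entirely.

```python
# Figures taken from the pricing text; everything else is hypothetical.
INCLUDED_OUTBOUND_GB = {"monthly": 1000, "hourly": 0}
OVERAGE_CENTS_PER_GB = 10  # $0.10/GB, kept in cents to avoid float rounding


def outbound_charge_cents(plan: str, outbound_gb: int) -> int:
    """Return the outbound bandwidth charge in US cents for one month."""
    if plan not in INCLUDED_OUTBOUND_GB:
        raise ValueError(f"unknown plan: {plan!r}")
    if outbound_gb < 0:
        raise ValueError("outbound_gb must be non-negative")
    # Only traffic above the plan's included allowance is billable.
    billable_gb = max(0, outbound_gb - INCLUDED_OUTBOUND_GB[plan])
    return billable_gb * OVERAGE_CENTS_PER_GB
```

For 1500GB of outbound traffic in a month, this gives 5000 cents ($50) on a monthly plan and 15000 cents ($150) on an hourly plan, since inbound and private network traffic are unlimited on both.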


Jeff Barr (@jeffbarr, pictured below) published AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB (Guest Post) as a guest post on 1/25/2012:

Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.

-- Jeff;


Apache Hadoop and NoSQL databases are complementary technologies that together provide a powerful toolbox for managing, analyzing, and monetizing Big Data. That’s why we were so excited to provide out-of-the-box Amazon Elastic MapReduce (Amazon EMR) integration with Amazon DynamoDB, providing customers an integrated solution that eliminates the often prohibitive costs of administration, maintenance, and upfront hardware. Customers can now move vast amounts of data into and out of DynamoDB, as well as perform sophisticated analytics on that data, using EMR’s highly parallelized environment to distribute the work across the number of servers of their choice. Further, as EMR uses a SQL-based engine for Hadoop called Hive, you need only know basic SQL while we handle distributed application complexities such as estimating ideal data splits based on hash keys, pushing appropriate filters down to DynamoDB, and distributing tasks across all the instances in your EMR cluster.

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

We will also use sample product order data stored in S3 to demonstrate how you can keep current data in DynamoDB while storing older, less frequently accessed data, in S3. By exporting your rarely used data to Amazon S3 you can reduce your storage costs while preserving low latency access required for high velocity data. Further, exported data in S3 is still directly queryable via EMR (and you can even join your exported tables with current DynamoDB tables).

The sample order data uses the schema below. This includes Order ID as its primary key, a Customer ID field, an Order Date stored as the number of seconds since epoch, and Total representing the total amount spent by the customer on that order. The data also has folder-based partitioning by both year and month, and you’ll see why in a bit.

Creating a DynamoDB Table
Let’s create a DynamoDB table for the month of January, 2012 named Orders-2012-01. We will specify Order ID as the Primary Key. By using a table for each month, it is much easier to export data and delete tables over time when they no longer require low latency access.
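
The table-per-month scheme and the year/month partitioning both derive from the Order Date field. As a rough illustration, the Python sketch below maps an epoch-seconds order date to its partition values and to a table name in the Orders-2012-01 style. The helper names are hypothetical, and interpreting the timestamp as UTC is an assumption; the sample data may have been bucketed in a different time zone.

```python
from datetime import datetime, timezone
from typing import Tuple


def partition_for(order_date: int) -> Tuple[str, str]:
    """Return (year, month) partition strings for an epoch-seconds date.

    Assumes UTC; the post does not say which time zone the data uses.
    """
    dt = datetime.fromtimestamp(order_date, tz=timezone.utc)
    return f"{dt.year:04d}", f"{dt.month:02d}"


def monthly_table_name(order_date: int) -> str:
    """Build a per-month table name following the Orders-YYYY-MM pattern."""
    year, month = partition_for(order_date)
    return f"Orders-{year}-{month}"
```

Under the UTC assumption, an order stamped 1325376000 (midnight on January 1, 2012) belongs in Orders-2012-01, while one stamped a second earlier belongs in the December 2011 table.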

Launching an EMR Cluster
Please follow Steps 1-3 in the EMR for DynamoDB section of the Elastic MapReduce Developer Guide to launch an interactive EMR cluster and SSH to its Master Node to begin submitting SQL-based queries. Note that we recommend you use at least three instances of m1.large size for this sample.

At the hadoop command prompt for the current master node, type hive. You should see a hive prompt: hive>

As no other applications will be using our DynamoDB table, let’s tell EMR to attempt to use 100% of the available read throughput (by default it tries to use 50%). Keep in mind that this is a best-effort attempt, not a guarantee of throughput usage. Note also that this setting can adversely affect the performance of other applications that are simultaneously using your DynamoDB table, so set it cautiously.

SET dynamodb.throughput.read.percent=1.0;

Creating Hive Tables
Outside data sources are referenced in your Hive cluster by creating an EXTERNAL TABLE. First let’s create an EXTERNAL TABLE for the exported order data in S3. Note that this simply creates a reference to the data; no data is moved yet.

CREATE EXTERNAL TABLE orders_s3_export ( order_id string, customer_id string, order_date int, total double )
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://elastic-mapreduce/samples/ddb-orders' ;

You can see that we specified the data location, the ordered data fields, and the folder-based partitioning scheme.

Now let’s create an EXTERNAL TABLE for our DynamoDB table.

CREATE EXTERNAL TABLE orders_ddb_2012_01 ( order_id string, customer_id string, order_date bigint, total double )
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES (
"dynamodb.table.name" = "Orders-2012-01",
"dynamodb.column.mapping" = "order_id:Order ID,customer_id:Customer ID,order_date:Order Date,total:Total"
);

This is a bit more complex. We need to specify the DynamoDB table name, the DynamoDB storage handler, the ordered fields, and a mapping between the EXTERNAL TABLE fields (which can’t include spaces) and the actual DynamoDB fields.

Now we’re ready to start moving some data!

Importing Data into DynamoDB
In order to access the data in our S3 EXTERNAL TABLE, we first need to specify which partitions we want in our working set via the ADD PARTITION command. Let’s start with the data for January 2012.

ALTER TABLE orders_s3_export ADD PARTITION (year='2012', month='01') ;

Now if we query our S3 EXTERNAL TABLE, only this partition will be included in the results. Let’s load all of the January 2012 order data into our external DynamoDB Table. Note that this may take several minutes.

INSERT OVERWRITE TABLE orders_ddb_2012_01
SELECT order_id, customer_id, order_date, total
FROM orders_s3_export ;

Looks a lot like standard SQL, doesn’t it?

Querying Data in DynamoDB Using SQL
Now let’s find the top 5 customers by spend over the first week of January. Note the use of unix_timestamp, as order_date is stored as the number of seconds since epoch.

SELECT customer_id, sum(total) spend, count(*) order_count
FROM orders_ddb_2012_01
WHERE order_date >= unix_timestamp('2012-01-01', 'yyyy-MM-dd')
AND order_date < unix_timestamp('2012-01-08', 'yyyy-MM-dd')
GROUP BY customer_id
ORDER BY spend desc
LIMIT 5 ;
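
To make the logic of that query concrete, here is a rough Python equivalent: filter on a half-open time window, group by customer, sum and count, sort by spend, and keep the top five. It is only a sketch of what the query computes, not of how Hive executes it. The sample rows are made up, and the window bounds are computed in UTC as an assumption (Hive's unix_timestamp uses the cluster's own time zone).

```python
from collections import defaultdict
from datetime import datetime, timezone


def epoch_seconds(day: str) -> int:
    """Midnight UTC of a 'YYYY-MM-DD' day as seconds since epoch."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def top_customers(orders, start, end, limit=5):
    """orders: iterable of (order_id, customer_id, order_date, total)."""
    spend = defaultdict(float)
    order_count = defaultdict(int)
    for _order_id, customer_id, order_date, total in orders:
        if start <= order_date < end:  # same half-open window as the WHERE clause
            spend[customer_id] += total
            order_count[customer_id] += 1
    ranked = sorted(spend, key=lambda c: spend[c], reverse=True)
    return [(c, spend[c], order_count[c]) for c in ranked[:limit]]


# Hypothetical rows, for illustration only.
sample = [
    ("o-1", "c-A", epoch_seconds("2012-01-02"), 20.0),
    ("o-2", "c-B", epoch_seconds("2012-01-03"), 50.0),
    ("o-3", "c-A", epoch_seconds("2012-01-05"), 40.0),
    ("o-4", "c-B", epoch_seconds("2012-01-09"), 99.0),  # outside the window
]
result = top_customers(sample, epoch_seconds("2012-01-01"), epoch_seconds("2012-01-08"))
```

With these rows, customer c-A ranks first with two orders, and the January 9 order is excluded because the upper bound is exclusive.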

Querying Exported Data in S3
It looks like customer ‘c-2cC5fF1bB’ was the biggest spender for that week. Now let’s query our historical data in S3 to see what that customer spent in each of the final 6 months of 2011. First, though, we have to include the additional data in our working set. The RECOVER PARTITIONS command makes it easy to add all of the table's available partitions at once:

ALTER TABLE orders_s3_export RECOVER PARTITIONS;

We will now query the 2011 exported data for customer ‘c-2cC5fF1bB’ from S3. Note that the partition fields, both month and year, can be used in your Hive query.

SELECT year, month, customer_id, sum(total) spend, count(*) order_count
FROM orders_s3_export
WHERE customer_id = 'c-2cC5fF1bB'
AND month >= 7
AND year = 2011
GROUP BY customer_id, year, month
ORDER by month desc;

Exporting Data to S3
Now let’s export the January 2012 DynamoDB table data to a different S3 bucket owned by you. We’ll first need to create an EXTERNAL TABLE for that S3 bucket. Note that we again partition the data by year and month.

CREATE EXTERNAL TABLE orders_s3_new_export ( order_id string, customer_id string, order_date int, total double )
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://';

Now export the data from DynamoDB to S3, specifying the appropriate partition values for that table’s month and year.

INSERT OVERWRITE TABLE orders_s3_new_export
PARTITION (year='2012', month='01')
SELECT * from orders_ddb_2012_01;

Note that if this was the end of a month and you no longer needed low latency access to that table’s data, you could also delete the table in DynamoDB. You may also now want to terminate your job flow from the EMR console to ensure you do not continue being charged.

That’s it for now. Please visit our documentation for more examples, including how to specify the format and compression scheme for your exported files.

-- Adam Gray, Product Manager, Amazon Elastic MapReduce.



by Roger Jennings (--rj) (noreply@blogger.com) at January 27, 2012 07:55 AM

Amazon Web Services


New Tagging for Auto Scaling Groups

You can now add up to 10 tags to any of your Auto Scaling Groups. You can also, if you'd like, propagate the tags to the EC2 instances launched from your groups.

Adding tags to your Auto Scaling groups will make it easier for you to identify and distinguish them.

Each tag has a name, a value, and an optional propagation flag. If the flag is set, then the corresponding tag will be applied to EC2 instances launched from the group. You can use this feature to label or distinguish instances created by distinct Auto Scaling groups. You might be using multiple groups to support multiple scalable applications, or multiple scalable tiers or components of a single application. Either way, the tags can help you to keep your instances straight.
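
The tag model described here (name, value, optional propagation flag, at most 10 tags per group) can be sketched in a few lines of Python. This is only an illustrative data model with hypothetical class and function names; it does not call, or mirror the request format of, the real Auto Scaling API.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_TAGS_PER_GROUP = 10  # limit stated in the post


@dataclass(frozen=True)
class GroupTag:
    name: str
    value: str
    propagate: bool = False  # the optional propagation flag


def instance_tags(group_tags: List[GroupTag]) -> Dict[str, str]:
    """Return the tags an instance launched from the group would receive."""
    if len(group_tags) > MAX_TAGS_PER_GROUP:
        raise ValueError("a group may carry at most 10 tags")
    # Only tags whose flag is set are copied to launched instances.
    return {tag.name: tag.value for tag in group_tags if tag.propagate}
```

A group tagged with a propagating "tier" tag and a non-propagating "owner" tag would therefore launch instances that carry only the "tier" tag.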

Read more in the newest version of the Auto Scaling Developer Guide.

-- Jeff;

by AWS Evangelist at January 27, 2012 03:29 AM

Cloud Musings (Kevin L Jackson)


NJVC® and Virtual Global Announce Release of PaaS White Paper: Paper Clarifies the Confusion Surrounding PaaS for Federal IT Buyers—Why It Is Important and How It Can Cut Development Costs by 50 Percent

VIENNA, Va., Jan. 23, 2012 —NJVC®, one of the largest information technology solutions providers supporting the U.S. Department of Defense, and Virtual Global, a premier provider of software and cloud computing platform solutions for a variety of industry and federal customers, announce the release of a joint white paper, “Platform as a Service (Paas): What Is It? Why Is It So Important?.” 
 
The paper clarifies the confusion surrounding PaaS for IT decision makers in the federal government. The National Institute of Standards and Technology (NIST) suggests PaaS as a component of the Federal Cloud Computing Reference Architecture, but one major challenge exists: Most buyers do not understand what PaaS is, why it is important and how it can help federal agencies cut development costs by more than 50 percent.

“Federal Chief Information Officer Steven VanRoekel said that platform as a service is the next major value set for federal cloud computing, and it also aligns closely with his Shared Services initiative to knock down stovepipe software and save money,” said Kevin Jackson, co-author and general manager, NJVC cloud services. “I hope that this whitepaper will help raise awareness of its importance in the federal marketplace.”

Many PaaS vendors require their customers to make long-term commitments to proprietary infrastructures. Some early adopters of PaaS unknowingly have already made casual, long-term commitments to infrastructure providers. “It's somewhat like buying gum at the counter, but needing to rent the store for 10 years,” said Cary Landis, co-author and Virtual Global senior platform architect and founder. “That is why NIST is stressing the importance of openness and portability. IT buyers must understand PaaS to make the right decisions early.”

PaaS makes it possible for software developers to participate in the cloud. Until recently, the cloud has been dominated by big email and infrastructure providers selling commodity services.  “PaaS changes the landscape—it opens the playing field to hundreds of thousands of software developers and integrators, giving them a way to actively participate,” according to Jackson. “Whereas the first wave of cloud computing was about consolidating data centers, the PaaS wave is about consolidating applications. It will be a more complex ride, but the savings will be greater,” Landis said.

Download the white paper at no cost at http://www.slideshare.net/kvjacksn/njvcvirtual-global-paas-white-paper. NJVC and Virtual Global are team members on the GovCloud™ initiative.


###

About NJVC

With a focus on information technology automation, NJVC specializes in supporting highly secure, complex IT enterprises in mission-critical environments, particularly for the intelligence and defense communities. We offer a wide breadth of IT and strategic solutions to our customers, ranging from strategic consulting to managed flexible services in five business areas:  Cloud Services, Cyber Security, Data Center Services, IT Services and Print Solutions.  Our global workforce includes dedicated and talented employees with 94 percent holding security clearances located at more than 170 customer sites. We partner with our customers to support their missions. To learn more, visit www.njvc.com.


About Virtual Global

Virtual Global is a premier provider of software and cloud computing platform solutions for a variety of industry and federal clients. The SaaS Maker™ family of platform products is open and modular, so that you can integrate with existing open source, legacy and 3rd-party web services. It is also portable across data centers. http://www.virtualglobal.com


Contact

Michelle Snyder, NJVC, 703.893.7609, michelle.snyder@njvc.com
Audra Capas, 5StarPR, 703.437.9301, audra@5starpr.com




by noreply@blogger.com (Kevin L. Jackson) at January 27, 2012 02:51 AM

ReadWriteCloud


Cloud Roundup for January 26, 2012

On tap for today, we've got a new jQuery Mobile release, a look at Tendril Connect, and the latest BitNami Stack for Ruby on Rails.

jQuery Mobile 1.0.1 Released – The jQuery Mobile folks have pushed 1.0.1 out the door. This fixes a bunch of issues and adds Samsung's Bada platform and Dolphin browser to the "officially supported" list. See the post for a full list of supported platforms and their "grades." If you're using iOS, Android and newer BlackBerry devices you should be fine.


Tendril courting developers for its cloud-delivered energy app platform – Tom Raftery takes a look at Tendril Connect. "The idea is to allow developers to build on Tendril's cloud platform and to deploy the developed applications on Tendril's Tendril Connect cloud platform. For developers this is an opportunity to develop applications addressing the energy challenge and have them deployed in a ready-made marketplace of up-to 70 million addressable households."

Smooth Scaling with Stackato and vSphere – Explaining how to run ActiveState's Stackato on vSphere.

Launch Relational Database Service Instances in the Virtual Private Cloud – Amazon has set it up so you can use their Relational Database Service (RDS) with their Virtual Private Cloud (VPC). Works in all regions, except AWS GovCloud.

New RubyStack upgraded to Rails 3.2.0 – BitNami has upgraded its RubyStack to Rails 3.2.0. It now includes Ruby 1.9.3-p0, SQLite 3.7.3, and Nginx 1.0.10.

Have a cloud news tip for me? Drop me a note at jzb@readwriteweb.com or to @jzb on Twitter.


by Joe Brockmeier at January 27, 2012 12:30 AM

January 26, 2012

Cristofer Hoff


With Cloud, The PaaSibilities Are Endless…

I read a very interesting article from ZDNet UK this morning titled “Amazon Cuts Off Stack at the PaaS.”

The gist of the article is that according to Werner Vogels (@werner,) AWS’ CTO, they have no intention of delivering a PaaS service and instead expect to allow an ecosystem of PaaS providers, not unlike Heroku, to flourish atop their platform:

“We want 1,000 platforms to bloom,” said Vogels, before explaining Amazon has “no desire to go and really build a [PaaS].”

That’s all well and good, but it led me to scratch my head, especially with regard to what I *thought* AWS already offered in terms of PaaS with BeanStalk, which is described thusly in their FAQ:

Q: What is AWS Elastic Beanstalk?
AWS Elastic Beanstalk makes it even easier for developers to quickly deploy and manage applications in the AWS Cloud. Developers simply upload their application, and Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring.

Q: How is AWS Elastic Beanstalk different from existing application containers or platform-as-a-service solutions?
Most existing application containers or platform-as-a-service solutions, while reducing the amount of programming required, significantly diminish developers’ flexibility and control. Developers are forced to live with all the decisions pre-determined by the vendor – with little to no opportunity to take back control over various parts of their application’s infrastructure. However, with AWS Elastic Beanstalk, developers retain full control over the AWS resources powering their application. If developers decide they want to manage some (or all) of the elements of their infrastructure, they can do so seamlessly by using AWS Elastic Beanstalk’s management capabilities.

While these snippets from the FAQ certainly seem to describe infrastructure components that enable PaaS (meta-PaaS?) when you combine the other elements of AWS’ offerings, it sure as heck sounds like PaaS regardless of what you call it.

In fact, a Twitter exchange with @GeorgeReese, @krishnan and @jamessaull well summarized the headscratching:

With all those components, AWS can certainly enable PaaS platforms like Heroku to “flourish.”

However, having all the raw components and simply not pointing to them and saying “PaaS” is like having all the components to assemble a bomb, not packaging it as such, and declaring it’s not dangerous because in that state it won’t go off.

I’d say the potential for going BOOM! is real.  It appears Marten Mickos was hinting at the same thing:

However, Mickos disputed Vogels’ claim that Amazon is going to let a thousand platforms bloom.

“He will always say that, and Amazon will slowly take a step higher and higher,” he said, before pointing to Beanstalk as an example. “[But] in my view PaaS has middleware components… and I could agree that it is okay to add [those] to an IaaS.”

In the long term, as I’ve stated prior, the value in platforms will be in how easy they make it for developers to create and deliver applications fluidly.

I may not be as good at marketing as some, but that sounds less like an infrastructure-centric business model and much more like an application-centric one.

Moving on up is where it’s at.  I saw the scratching on the cave walls when I wrote “Silent Lucidity: IaaS — Already A Dinosaur. The Evolution of PaaSasarus Rex” back in 2009.

What do you think?  Is AWS being coy?


by beaker at January 26, 2012 09:46 PM

SearchCloudComputing (Carl Brooks)


The rise of the non-Amazon IaaS cloud provider

Amazon isn’t the only vendor in the IaaS universe. With several other large vendors out there and niche providers cropping up, enterprises have choices.


by Dan Sullivan, Contributor (editor@searchcloudcomputing.com) at January 26, 2012 07:26 PM

CloudHarmony


An unofficial EC2 outage postmortem - the sky is not falling

Last week Amazon Web Services (AWS) experienced a high-profile outage affecting Elastic Compute Cloud (EC2) and Elastic Block Store (EBS) in 1 of 4 data centers in the US East region. This outage took down some prominent websites, including Reddit, Quora and FourSquare, and generated plenty of negative PR. In the days since, media outlets and bloggers have written literally hundreds of articles such as Amazon's Trouble Raises Cloud Computing Doubts (New York Times), The Day The Cloud Died (Forbes), Amazon outage sparks frustration, doubts about cloud (Computerworld), and many others.

EC2 and EBS in a nutshell

In case you are not familiar with the technical jargon and acronyms, EBS is one of two methods provided by AWS for attaching storage volumes (basically cloud hard drives) to an EC2 instance (an EC2 instance is essentially a server). Unlike a traditional hard drive that is located physically inside of a computer, EBS is stored externally on dedicated storage boxes and connected to EC2 instances over a network. The second storage option provided by EC2 is called ephemeral, which uses the more traditional method of hard drives located physically inside the same hardware that an EC2 instance runs on. Using EBS is encouraged by AWS and provides some unique benefits not available with ephemeral storage. One such benefit is the ability to recover quickly from a host failure (a host is the hardware that an EC2 instance runs on). If the host fails for an EBS EC2 instance, it can quickly be restarted on another host because its storage does not reside on the failed host. By contrast, if the host fails for an ephemeral EC2 instance, that instance and all of the data stored on it will be permanently lost. EBS instances can also be shut down temporarily and restarted later, whereas ephemeral instances are deleted if shut down. EBS also theoretically provides better performance and reliability when compared to ephemeral storage.

Other technical terms you may hear and should understand regarding EC2 are virtualization and multi-tenancy. Virtualization allows AWS to run multiple EC2 instances on a single physical host by creating simulated "virtual" hardware environments for each instance. Without virtualization, AWS would have to maintain a 1-to-1 ratio between EC2 instance and physical hardware, and the economics just wouldn't work. Multi-tenancy is a consequence of virtualization in that multiple EC2 instances share access to physical hardware. Multi-tenancy often causes performance degradation in virtualized environments because instances may need to wait briefly to obtain access to physical resources like CPU, hard disk or network. The term noisy neighbor is often used to describe this scenario in very busy environments where virtual instances are waiting frequently for physical resources causing noticeable declines in performance.

EC2 is generally a very reliable service. Without a strong track record, high-profile websites like Netflix would not use it. We conduct ongoing independent outage monitoring of over 100 cloud services, which shows 3 of the 5 AWS EC2 regions having 100% availability over the past year. In fact, our own EBS-backed EC2 instance in the affected US East region remained online throughout last week's outage.

AWS endorses a different type of architectural philosophy called designing for failure. In this context, instead of deploying highly redundant and fault tolerant (and very expensive) "enterprise" hardware, AWS uses low cost commodity hardware and designs their infrastructure to expect and deal gracefully with failure. AWS deals with failure using replication. For example, each EBS volume is stored on 2 separate storage arrays. In theory, if one storage array fails, its volumes are quickly replaced with the backup copies. This approach provides many of the benefits of enterprise hardware, such as fault tolerance and resiliency, while at the same time providing substantially lower hardware costs enabling AWS to price their services competitively.

The outage - what went wrong?

Disclaimer: This is our own opinion of what occurred during last week's EC2 outage based on our interpretation of the comments provided on the AWS Service Health Dashboard and basic knowledge of the EC2/EBS architecture.

At about 1AM PST on Thursday April 21st, one of the four availability zones in the AWS US East region experienced a network fault that caused connectivity failures between EC2 instances and EBS. This event triggered a failover sequence wherein EC2 automatically swapped out the EBS volumes that had lost connectivity with backup copies. At the same time, EC2 attempted to create new backup copies of all of the affected EBS volumes (they refer to this as "re-mirroring"). While this procedure works fine for a few isolated EBS failures, this event was more widespread which created a very high load on the EBS infrastructure and the network that connects it to EC2. To make matters worse, some AWS users likely noticed problems and began attempting to restore their failed or poorly performing EBS volumes on their own. All of this activity appears to have caused a meltdown of the network connecting EC2 to EBS and exhausted the available EBS physical storage in this availability zone. Because EBS performance is dependent on network latency and throughput to EC2, and because those networks were saturated with activity, EBS performance became severely degraded, or in many cases completely failed. These issues likely bled into other availability zones in the region as users attempted to recover their services by launching new EBS volumes and EC2 instances in those availability zones. Overall, a very bad day for AWS and EC2.

The sky is not falling

Despite what some media outlets, bloggers and AWS competitors are claiming, we do not believe this event is reason to question the viability of AWS, external instance storage, or the cloud in general. AWS has stated they will evaluate closely the events that triggered this outage, and apply appropriate remedies. The end result will be a more robust and battle hardened EBS architecture. For users of AWS affected by this outage, this should be cause to re-evaluate their cloud architecture. There are many techniques suggested by AWS and prominent AWS users that will help to deal with these types of outages in the future without incurring significant downtime. These include deploying load balanced servers across multiple availability zones and using more than one AWS region.

Netflix is a large and very visible client of AWS that was not affected by this outage. The reason for this is that they have learned to design for failure. In a recent blog post, Adrian Cockcroft (Netflix's Cloud Architect) wrote about some of the technical details and shortcomings of EBS. At a high level, the takeaway points from his post are:

  • EC2, EBS and the network that connects them are all shared resources. As such, performance will vary significantly depending on multi-tenancy and shared load. Performance variance will be greater on smaller EC2 instances and EBS volumes where multi-tenancy is a greater factor
  • Users can reduce the potential effects of multi-tenancy by using larger EC2 instances and EBS volumes. To reduce EBS multi-tenancy, Netflix uses the largest possible volume size, 1TB. Because each EBS storage array has a limited amount of storage capacity, using larger volumes reduces the number of other users that may share that hardware. The same is true of larger EC2 instances. In fact, the largest EC2 instances (any of the 4xlarges) run on dedicated hardware. Because each physical EC2 host has one shared network interface, use of larger EBS volumes and EC2 instances also has the added benefit of increased network throughput
  • Use ephemeral storage on EC2 instances where predictable and consistent performance is necessary. Netflix uses ephemeral storage for their Cassandra datastore and has found it to be more consistently reliable compared to EBS

Too early to throw in the towel

AWS is not alone in experiencing performance and reliability issues with external storage. Based on our independent monitoring, Visi, GigeNet, Tata InstaCompute, Flexiscale, Ninefold and VPS.NET have all experienced similar outages. Our monitoring shows that external storage failures are a very significant cause of cloud outages. When external storage systems fail, vendors often have a very difficult time recovering quickly. Designing fault-tolerant and performant external storage for the cloud is a very complex problem, so much so that many vendors, including Rackspace Cloud and Joyent, avoid it entirely. Joyent, for example, recently documented their unsuccessful attempt to deploy external storage in their cloud service. However, despite the complexity of this problem, we believe it is far too early for cloud vendors and users to throw in the towel. There are significant advantages to external storage versus ephemeral, including:

  • Host failure tolerance: If the power supply, motherboard, or any component of a host system fails, the instances running on it can be quickly migrated to another host
  • Shutdown capability: With most providers, external storage instances can be shut down temporarily and then incur only storage fees
  • Greater flexibility: External storage offers features and flexibility generally unavailable with ephemeral storage. These may include the ability to back up volumes, create snapshots, clone, create custom OS templates, resize partitions and attach multiple storage volumes to a single instance

Innovation in external storage

Besides AWS, there are other providers innovating in the external storage space. OrionVM, a cloud startup in Australia, has developed their own distributed, horizontally scalable, external storage architecture based on a high performance communication link called InfiniBand. Instead of using dedicated storage hardware, OrionVM uses the same hardware for both storage and server instances. The server instances use storage located on multiple external hosts connected to them via redundant 40 Gb/s InfiniBand links. If a physical host fails, the instances running on it can be restored on another host because their storage resides externally. OrionVM also replicates storage across multiple host systems, allowing for fault tolerance should a storage host fail.

This hybrid approach combines the benefits of ephemeral storage (i.e. lower multi-tenancy ratio, faster IO throughput) with those of external storage (i.e. host failure tolerance). Multi-tenancy performance degradation is also not a significant factor because OrionVM uses a distributed, non-centralized storage architecture. This approach scales well horizontally because adding a new host increases both instance and storage capacity. Use of 40 Gb/s InfiniBand also provides very high instance-to-storage throughput.

Our own benchmarking shows very good IO performance with OrionVM. Complete results for these benchmarks are available on our website. A summary is provided below comparing OrionVM to both external and ephemeral instances with EC2, GoGrid, Joyent, Rackspace and SoftLayer. In these results, OrionVM performed very well, as did EC2's cluster compute instance using ephemeral or EBS RAID 0 volumes. GoGrid also performed well running on their new Westmere hardware and ephemeral storage. Details on the IO metric are available here. We are including these benchmark results to demonstrate that external storage can perform as well as or better than ephemeral storage.

Legend

Label | Storage Type | Description
ec2-us-east.cc1.4xlarge-raid0-local | Ephemeral | EC2 cluster instance cc1.4xlarge, Raid 0, 2 ephemeral volumes
ec2-us-east.cc1.4xlarge-raid0x4-ebs | External | EC2 cluster instance cc1.4xlarge, Raid 0, 4 EBS volumes
ec2-us-east.cc1.4xlarge-local | Ephemeral | EC2 cluster instance cc1.4xlarge, single ephemeral volume
gg-16gb-us-east | Ephemeral | 16GB GoGrid instance
or-16gb | External | 16GB OrionVM instance
jy-16gb-linux | Ephemeral | 16GB Joyent Linux Virtual Machine
ec2-us-east.cc1.4xlarge | External | EC2 cluster instance cc1.4xlarge, single EBS volume
ec2-us-east.m2.4xlarge-raid0x4-ebs | External | EC2 high memory instance m2.4xlarge, Raid 0, 4 EBS volumes
rs-16gb | Ephemeral | 16GB Rackspace Cloud instance
ec2-us-east.m2.4xlarge | External | EC2 high memory instance m2.4xlarge, single EBS volume
sl-16gb-wdc | External | 16GB SoftLayer CloudLayer instance

Summary

Last week's EBS outage has shed some light on what we consider to be one of the biggest challenges of the cloud: the problem of external storage. However, we see this event as a glass half full. First, we believe that AWS will thoroughly dissect this outage and use it to improve the fault tolerance and reliability of EBS in the future. Next, cloud users affected by this outage will re-evaluate their own cloud architecture and adopt a more failure-tolerant approach. Finally, we hope that AWS and other vendors like OrionVM will continue to innovate in the external storage space.

 

by CloudHarmony.com (noreply@blogger.com) at January 26, 2012 07:15 AM

ReadWriteCloud

Cloud Roundup for January 25, 2012

FireHost is expanding and offering European services, Dell is letting its customers have Linux their way, and EnterpriseDB wants to "cloudify" PostgreSQL.

FireHost's European-Based Secure Cloud Hosting Services Go Live – FireHost has announced an expansion into Europe, with services through data centers in London and Amsterdam.

Microsoft's plan for Hadoop and big data – Edd Dumbill looks into Microsoft's plan for Hadoop. "One of the most interesting features of Microsoft's work with Hadoop is the addition of a JavaScript API. Working with Hadoop at a programmatic level can be tedious: this is why higher-level languages such as Pig emerged."

Dell OEM Solutions Makes Available SUSE Linux Enterprise Server to Customers – Now it's possible to get Dell servers preloaded with images built using SUSE Studio. Would be interesting to know just how many SUSE Studio images are actually being used in production. It's a very slick tool for creating custom Linux distributions based on SUSE or openSUSE.

Postgres Plus Cloud Database – EnterpriseDB is launching a "cloudified" version of PostgreSQL that runs on top of Amazon EC2. The company is offering stock PostgreSQL or its "Postgres Plus Advanced Server with Oracle database compatibility features." Pricing starts at $0.11 an hour on EC2.

Have a cloud news tip for me? Drop me a note at jzb@readwriteweb.com or to @jzb on Twitter.

by Joe Brockmeier at January 26, 2012 01:45 AM

Amazon Web Services

AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB (Guest Post)

Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.

-- Jeff;


Apache Hadoop and NoSQL databases are complementary technologies that together provide a powerful toolbox for managing, analyzing, and monetizing Big Data. That’s why we were so excited to provide out-of-the-box Amazon Elastic MapReduce (Amazon EMR) integration with Amazon DynamoDB, providing customers an integrated solution that eliminates the often prohibitive costs of administration, maintenance, and upfront hardware. Customers can now move vast amounts of data into and out of DynamoDB, as well as perform sophisticated analytics on that data, using EMR’s highly parallelized environment to distribute the work across the number of servers of their choice. Further, as EMR uses a SQL-based engine for Hadoop called Hive, you need only know basic SQL while we handle distributed application complexities such as estimating ideal data splits based on hash keys, pushing appropriate filters down to DynamoDB, and distributing tasks across all the instances in your EMR cluster.

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

We will also use sample product order data stored in S3 to demonstrate how you can keep current data in DynamoDB while storing older, less frequently accessed data, in S3. By exporting your rarely used data to Amazon S3 you can reduce your storage costs while preserving low latency access required for high velocity data. Further, exported data in S3 is still directly queryable via EMR (and you can even join your exported tables with current DynamoDB tables).

The sample order data has four fields: Order ID as its primary key, a Customer ID field, an Order Date stored as the number of seconds since epoch, and Total representing the total amount spent by the customer on that order. The data also has folder-based partitioning by both year and month, and you’ll see why in a bit.
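
As a rough illustration of that layout, the Python sketch below builds one tab-delimited order record and the partition folder it would live under. The year=.../month=... folder naming is the usual Hive convention and is an assumption here, as are the helper names and the sample values; the sample files themselves are not shown in this post.

```python
from datetime import datetime, timezone

def to_epoch_seconds(dt):
    """Return whole seconds since the Unix epoch for a timezone-aware datetime."""
    return int(dt.timestamp())

def order_record(order_id, customer_id, order_dt, total):
    """Format one order as a tab-delimited line: id, customer, epoch seconds, total."""
    fields = [order_id, customer_id, str(to_epoch_seconds(order_dt)), f"{total:.2f}"]
    return "\t".join(fields)

def partition_folder(order_dt):
    """Folder for the order's year and month (assumed Hive-style key=value naming)."""
    return f"year={order_dt:%Y}/month={order_dt:%m}"

# Hypothetical sample order placed at midnight UTC on 1 January 2012.
when = datetime(2012, 1, 1, tzinfo=timezone.utc)
line = order_record("o-0001", "c-0001", when, 19.99)
folder = partition_folder(when)
```

Keeping each month in its own folder is what later lets a query touch only the months it needs.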

Creating a DynamoDB Table
Let’s create a DynamoDB table for the month of January, 2012 named Orders-2012-01. We will specify Order ID as the Primary Key. By using a table for each month, it is much easier to export data and delete tables over time when they no longer require low latency access.

For this sample, a read capacity and a write capacity of 100 units should be more than sufficient. When setting these values you should keep in mind that the larger the EMR cluster, the more capacity it will be able to take advantage of. Further, you will be sharing this capacity with any other applications utilizing your DynamoDB table.

Launching an EMR Cluster
Please follow Steps 1-3 in the EMR for DynamoDB section of the Elastic MapReduce Developer Guide to launch an interactive EMR cluster and SSH to its Master Node to begin submitting SQL-based queries. Note that we recommend you use at least three instances of m1.large size for this sample.

At the hadoop command prompt for the current master node, type hive. You should see a hive prompt: hive>

As no other applications will be using our DynamoDB table, let’s tell EMR to use 100% of the available read throughput (by default it will use 50%). Note that this can adversely affect the performance of other applications simultaneously using your DynamoDB table and should be set cautiously.

SET dynamodb.throughput.read.percent=1.0;
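
To make the effect of that setting concrete, here is a small Python sketch of the arithmetic. It assumes the percentage is applied directly to the table's provisioned read capacity; the function name is hypothetical and the actual rate limiting inside EMR may be more involved.

```python
def emr_read_budget(provisioned_read_units, read_percent):
    """Read capacity units EMR would aim to consume under the given setting."""
    if read_percent <= 0:
        raise ValueError("read_percent must be positive")
    return provisioned_read_units * read_percent

# With the 100 read units provisioned for this sample:
default_budget = emr_read_budget(100, 0.5)  # default of 50%
full_budget = emr_read_budget(100, 1.0)     # the setting used in this walkthrough
```

Whatever EMR does not use remains available to other applications reading the same table, which is why the default leaves half of it free.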

Creating Hive Tables
Outside data sources are referenced in your Hive cluster by creating an EXTERNAL TABLE. First let’s create an EXTERNAL TABLE for the exported order data in S3. Note that this simply creates a reference to the data; no data is moved yet.

CREATE EXTERNAL TABLE orders_s3_export ( order_id string, customer_id string, order_date int, total double )
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://elastic-mapreduce/samples/ddb-orders' ;

You can see that we specified the data location, the ordered data fields, and the folder-based partitioning scheme.

Now let’s create an EXTERNAL TABLE for our DynamoDB table.

CREATE EXTERNAL TABLE orders_ddb_2012_01 ( order_id string, customer_id string, order_date bigint, total double )
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES (
"dynamodb.table.name" = "Orders-2012-01",
"dynamodb.column.mapping" = "order_id:Order ID,customer_id:Customer ID,order_date:Order Date,total:Total"
);

This is a bit more complex. We need to specify the DynamoDB table name, the DynamoDB storage handler, the ordered fields, and a mapping between the EXTERNAL TABLE fields (which can’t include spaces) and the actual DynamoDB fields.

Now we’re ready to start moving some data!

Importing Data into DynamoDB
In order to access the data in our S3 EXTERNAL TABLE, we first need to specify which partitions we want in our working set via the ADD PARTITION command. Let’s start with the data for January 2012.

ALTER TABLE orders_s3_export ADD PARTITION (year='2012', month='01') ;

Now if we query our S3 EXTERNAL TABLE, only this partition will be included in the results. Let’s load all of the January 2012 order data into our external DynamoDB Table. Note that this may take several minutes.

INSERT OVERWRITE TABLE orders_ddb_2012_01
SELECT order_id, customer_id, order_date, total
FROM orders_s3_export ;

Looks a lot like standard SQL, doesn’t it?

Querying Data in DynamoDB Using SQL
Now let’s find the top 5 customers by spend over the first week of January. Note the use of unix_timestamp, as order_date is stored as the number of seconds since epoch.

SELECT customer_id, sum(total) spend, count(*) order_count
FROM orders_ddb_2012_01
WHERE order_date >= unix_timestamp('2012-01-01', 'yyyy-MM-dd')
AND order_date < unix_timestamp('2012-01-08', 'yyyy-MM-dd')
GROUP BY customer_id
ORDER BY spend desc
LIMIT 5 ;
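
For readers who want to check the logic outside Hive, the following Python sketch mirrors the query: a half-open time window on epoch seconds, a per-customer sum and count, then the top spenders. The rows and function names are made up for illustration. The window here is computed in UTC, whereas Hive's unix_timestamp, as far as I know, interprets the date in the cluster's default time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone

def epoch(date_text):
    """Epoch seconds for midnight UTC on a yyyy-MM-dd date."""
    dt = datetime.strptime(date_text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def top_customers(rows, start_date, end_date, limit=5):
    """rows: iterable of (customer_id, order_date_epoch, total).
    Returns [(customer_id, spend, order_count)], highest spend first."""
    start, end = epoch(start_date), epoch(end_date)
    spend = defaultdict(float)
    count = defaultdict(int)
    for customer_id, order_date, total in rows:
        # Same half-open window as the WHERE clause: start <= order_date < end.
        if start <= order_date < end:
            spend[customer_id] += total
            count[customer_id] += 1
    ranked = sorted(spend, key=lambda c: spend[c], reverse=True)
    return [(c, spend[c], count[c]) for c in ranked[:limit]]

# Hypothetical orders; the last one falls on 8 January and is excluded.
rows = [
    ("c-1", epoch("2012-01-02"), 40.0),
    ("c-2", epoch("2012-01-03"), 25.0),
    ("c-1", epoch("2012-01-07"), 10.0),
    ("c-2", epoch("2012-01-08"), 99.0),
]
result = top_customers(rows, "2012-01-01", "2012-01-08")
```

Using a strict upper bound on 8 January keeps the window at exactly seven days without having to spell out the last second of 7 January.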

Querying Exported Data in S3
It looks like customer: ‘c-2cC5fF1bB’ was the biggest spender for that week. Now let’s query our historical data in S3 to see what that customer spent in each of the final 6 months of 2011. Though first we will have to include the additional data into our working set. The RECOVER PARTITIONS command makes it easy to

ALTER TABLE orders_s3_export RECOVER PARTITIONS;

We will now query the 2011 exported data for customer ‘c-2cC5fF1bB’ from S3. Note that the partition fields, both month and year, can be used in your Hive query.

SELECT year, month, customer_id, sum(total) spend, count(*) order_count
FROM orders_s3_export
WHERE customer_id = 'c-2cC5fF1bB'
AND month >= 7
AND year = 2011
GROUP BY customer_id, year, month
ORDER BY month desc;
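
Because year and month are partition columns, a filter on them decides which folders get read at all. Here is a minimal Python sketch of that selection, assuming zero-padded month strings as in the ADD PARTITION example above and a hypothetical list of available partitions:

```python
def select_partitions(partitions, year, first_month):
    """Keep (year, month) string pairs for the given year from first_month onward."""
    return [
        (y, m) for (y, m) in partitions
        if int(y) == year and int(m) >= first_month
    ]

# Hypothetical partitions: every month of 2011 plus January 2012.
available = [("2011", f"{m:02d}") for m in range(1, 13)] + [("2012", "01")]

# July through December is the final six months of 2011.
chosen = select_partitions(available, 2011, 7)
```

Only the six selected folders would need to be scanned; the remaining months are never touched.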

Exporting Data to S3
Now let’s export the January 2012 DynamoDB table data to a different S3 bucket owned by you (denoted by YOUR BUCKET in the command). We’ll first need to create an EXTERNAL TABLE for that S3 bucket. Note that we again partition the data by year and month.

CREATE EXTERNAL TABLE orders_s3_new_export ( order_id string, customer_id string, order_date int, total double )
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://YOUR BUCKET';

Now export the data from DynamoDB to S3, specifying the appropriate partition values for that table’s month and year.

INSERT OVERWRITE TABLE orders_s3_new_export
PARTITION (year='2012', month='01')
SELECT * from orders_ddb_2012_01;

Note that if this was the end of a month and you no longer needed low latency access to that table’s data, you could also delete the table in DynamoDB. You may also now want to terminate your job flow from the EMR console to ensure you do not continue being charged.

That’s it for now. Please visit our documentation for more examples, including how to specify the format and compression scheme for your exported files.

-- Adam Gray, Product Manager, Amazon Elastic MapReduce.

by AWS Evangelist at January 26, 2012 01:42 AM

ReadWriteCloud

How Salesforce Chatter Connect Ate the Social Network

One thing you can plainly say about Salesforce CEO Marc Benioff: You know where he stands, and he's never on the fence. Over the past two years, one of Benioff's key themes at conferences and speeches is how software design, as part of the inevitable journey of all software to the cloud, is embracing the concepts of social networking. Facebook, he professes, is a lesson in itself.

Then last August at the Dreamforce conference, Salesforce kicked the evolution of its Chatter platform into overdrive. Chatter is the communications layer that's integrated into its cloud-based CRM platform, but which is open for other developers to utilize as well - not freely, mind you, but by way of extending the Salesforce ecosystem. In a demonstration for RWW, Salesforce's director of product marketing for Chatter, Dave King, revealed elements of the platform that showed the direction Salesforce is intending for it - as clear and unmistakable a direction as a theme in a Marc Benioff speech: Chatter has already become a social network for business, and we're just now waking up to that fact.

"It really changes the paradigm of how you consume information," says King. He's referring to a function in the current Chatter application where resources, schedule items, projects, 'opportunities,' or groups that collect any of these things together with people, may be followed like a feed in Twitter or a member of Google+.
"In the past, you would have to go and search. You would say, 'Gosh, has anything changed with my opportunity?' You'd go log in, search, and look for that update. Well now, in the social era, those updates come to you. You just specify, 'What am I interested in staying up-to-date on?'"

Each Chatter user's feed fills with updates: some submitted, others generated, others triggered by events. In fact, we learned, some of the events which trigger updates that appear in the feed may be programmed using Force.com instructions. The rules of these triggers create actions, which may in turn generate triggers for events in other users' feeds. One example King showed us appears here: In this test system, there's a business object representing a deal for "Green Dot Media" which has been followed by the fictitious user "Valerie Eastwood." The workflow rules programmed with Force.com dictate how the terms of this approval appear. This isn't some e-mail message where someone typed, "Discount %," hit the Tab key, typed "15%," hit Enter, and went to the next line. Instead, user "Sean Reynolds" entered the required parameters and triggered the approval request, which was then forwarded to Valerie.

These are the same concepts that Facebook and Google+ are using to develop functional apps for their users around their social graph APIs. But Chatter has sneaked up on business from an unexpected angle, not by competing with Microsoft Office up front but by absorbing the social ethos in which Microsoft has lagged behind. Salesforce's obvious goal: to replace e-mail.

One aspect of business communication which Outlook cannot possibly get a handle on is its quality. Much of Web communication today is impacted by analytics - by assessments of its value in a broader context. For individuals, the fear of being shy in public has recently been replaced with the threat of being declared irrelevant by social status indicators. Since Chatter is effectively, in a broad context, a content management system, it can analyze the relevance of businesspersons' individual contributions to the network of business transactions. It can leverage what we've learned from digital sociology to drive greater business value from personal interactions.

"You need to be connected to your social graph wherever you are," explains King, "whether it's in a browser or it's on your mobile device, or maybe it's in a separate application." He goes on to state that the typical software-based collaboration scenario, found in programs like SharePoint, tend to corral teams into silos for the convenience of the software. Those silos become echo chambers where employees eventually hear little else but their own noise.

"We believe that collaboration has to be in the context of your business process. There's a lot of communication tools, but what's really valuable is when you take collaboration and you put it on top of the work you're actually doing - around an account, or maybe a customer service case. That's where the power of Chatter comes in... Silos of collaboration are really not very powerful," the Salesforce team leader remarks. "What's really powerful is when you take the social experience, and you marry it up with a process."

One of the many resources that a Chatter user may follow is a file. (Thus not only is Salesforce applying social leverage to compete with SharePoint, it's also outmoding Megaupload along the way.) Following a file such as a presentation enables the user to track the processes that others do around that file - downloading, commenting, editing. If there are other processes that one can imagine, then conceivably a Force.com workflow may be developed for them.

Another concept borrowed from social networking by Chatter is the recommendations engine - where the software actually provides the user with leads as to whom to include in a project. In a way, it's not just social networking - some might see it as coming dangerously close to a managerial role. "Maybe you'd like to include Wendy in this project," for example.

"We're building this with a lot of social intelligence," explains King, "so the system - based on who you are, what you click on, what you like, what files you access, what accounts you follow - presents recommendations on who you should follow, what groups you should participate in. We're helping you with the discovery of finding out what you don't even know... It's a powerful way of building the social IQ of employees."
There's a large and growing number of functions in Salesforce Chatter Connect that resemble, or mirror, or borrow ideas from concepts we've seen in LinkedIn. Does Chatter compete with LinkedIn? Should we start considering the two in the same market segment?

Dave King answers no, citing the fact that LinkedIn and Salesforce are partners on a social data integration project. "But Facebook showed us the initial way of the user interface and the feed paradigm. And a lot of other social networks have adopted that. What's different about Chatter is that it's around your co-workers. It's how you're getting work done, and it includes business process and workflow."

That having been said, King did count Chatter as at least among the other social networks. The line between online activity and application is being blurred. And when it's Salesforce that's doing the blurring, both competitors and partners will need to take heed of whether what comes next is Salesforce doing all the talking.

by Scott M. Fulton, III at January 26, 2012 12:00 AM

January 25, 2012

ReadWriteCloud

New VMware vCenter Ops Suite Geared More Toward Managers

On the surface, it would seem to make sense that management is a task best performed in an organization by managers. When you apply that ethic to the emerging structure of data centers, which now use virtualization and private cloud foundations, you realize there are changes that can be made. Casting business resources as cloud services moves the budgeting process from capital expenditures to operating expenditures. And for more organizations, it means relocating management responsibilities from IT administration to a newly combined resource administration.

For these managers newly tasked with administering clouds along with people, admin tools don't make much sense. In a sweeping restructuring of its key virtualization management tools suite this morning, VMware is introducing a completely renovated dashboard for monitoring virtual data center operations, with graphs and 100-point-scale ratings designed to make better sense to people who might not, at first glance - or even second - know what any of this means.

Retooling VM administration as analytics

"This transition from an architecture standpoint requires a new approach to IT management," states Martin Klaus, VMware's director of product marketing for vCenter Operations Management Suite, in an interview with RWW. With managers having a more marketing-centered view of the world, the new vCenter Ops will take more cues from the marketing mindset, beginning with using analytics more prominently in boiling down streams of data into need-to-know bullet points.

"As there are more moving parts that come and go more quickly, especially when you think about self-service portals," Klaus continues, "where the demand for more resources cannot be predicted up front, you need something that allows you to use analytics that give the IT administrator much more predictive controls over what's happening in environments, to intervene and intercept issues that are building before those issues impact the end users."

The revised dashboard is designed to communicate more ideas in less space. Perhaps you've noticed dashboards these days taking their cues from mobile apps, which have learned (out of necessity) to pack more information into smaller screens. VMware group product manager Jai Malkani says the inspiration for nugget-izing vCenter's information came from a recent reassessment of how its customers divide responsibilities among themselves.

"The customer teams working today in a cloud environment, what are they really focused on, and what's the top thing on their minds?" remarks Malkani. "As I worked with some 140-odd customers in the beta program over the last year, [I found] the main two areas that an ops team focuses on are: insuring and restoring service levels, making sure the problems are resolved and the environment is up and running at all times; and being pro-active toward optimizing the environment for efficiency and costs."

Administration is a two-cycle engine

Some of VMware's competitors perceive such tasks as workflows that can be diagrammed using flowchart tools. VMware is gambling that these two task areas that Malkani has identified are instead perpetual and ongoing cycles, where problems are identified, mitigated, and resolved; and ideas are generated, perfected, and implemented on a continual basis.

With respect to the cycle on the left, Malkani explains, the cloud administrator is focused on three areas: 1) The applications profile of the VM that appears to be the source of the problem. In other words, an app on that VM may be the actual source of the problem, but any remedial measure involving that VM will affect the entire profile of the apps or services in its purview; 2) Determining whether the problem is on account of how a problem app may be behaving in its environment, or instead the characteristics of the VM which presents that environment to the app - such as available capacity, or the current state of security monitoring; 3) Whether a corrective action is available in an immediately accessible manner - preferably, something which the admin can simply do and be done with it. A manager acting as an admin here would like to be given a multiple choice question, and choose the answer that appears to best resolve the problem - or to use Malkani's phrase for it, "closes the loop."

The "ideas" part of VMware's cycle is what Malkani calls optimizations - little improvements that come incrementally, instead of in great batches or overhauls. We see this concept emerging in so-called "resilience architectures," which replace typical crisis remediation methods with regular workflows that mitigate problems by their very nature, so that the responses to problems are essentially the same as everyday maintenance. "Do this [on the right-hand side] so that the left-hand side doesn't happen in the first place," he illustrates.

Fewer silos sounds like a good idea

The drive toward easier-to-understand metrics, and reducing and compartmentalizing dashboards with graphs and icons, is not a trend that was started by VMware. It's an increasingly competitive field, with an emerging ecosystem around management tools that fill in the gaps that management suites have left open.

Speaking with RWW, VMware's Martin Klaus admitted that one of his company's explicit goals for vCenter Ops was the reduction of the need for certain third-party tools, or for anyone else to come in thinking they need to patch the holes in vCenter. Its strategy here takes a cue from its competition, even borrowing some of its language: The new suite will incorporate tools that were offered for VMware's previous vCenter Operations tools as add-ons, including the Chargeback Manager tool that produces forecasts of future expenditures when current conditions are left as they are, compared to the savings recouped from making adjustments.

"Before, there were too many element levels, management tools in place that made for overlaps in some of the data," says Klaus. "But there's only one person sitting in front of these tools. And it was not possible for him to correlate the data that each of these tools were collecting. So with vCenter Operations, you can now see the data from third-party monitoring tools - from the storage layer, the networking layer, the database layer, the Web server layer - coming together. And now you can take a step back: What is really needed in terms of net-new data points? In this case, you can reduce the overall number of monitoring tools that are needed in the environment."

General availability of vCenter Operations Management Suite begins now, with various licensing rates beginning at $50 per VM.

by Scott M. Fulton, III at January 25, 2012 05:30 PM

OakLeaf Systems

Windows Azure and Cloud Computing Posts for 1/23/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI Access Control, Connect, SQL Azure Database, and other cloud-computing articles.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue and Hadoop Services

Brian Swan (@brian_swan) described Improving Performance by Batching Azure Table Storage Inserts in a 1/25/2012 post to the [Windows Azure’s] Silver Lining blog:

This is a short post to share the results of a little investigation I did that was inspired by comments on a post I wrote about using SQL Azure for handling session data. The comment was by someone reporting that SQL Azure seemed to be faster than Azure Table Storage for handling session data. My experiments show that SQL Azure and Table Storage have very similar performance when doing single writes (YMMV), so I can’t verify or refute the claim. However, I got to wondering which is faster for inserting and retrieving many “rows” of data. I know that Table Storage is supposed to be faster, but I wondered how much faster. So I wrote a two-part PHP script that does the following:

  1. Connects to SQL Azure.
  2. Inserts 100 rows to an existing database.
  3. Retrieves the 100 rows.

Here’s the code:

// Connect to SQL Azure with the SQL Server Driver for PHP.
$conn = sqlsrv_connect(SQLAZURE_SERVER_ID.".database.windows.net,1433", array("UID"=>SQLAZURE_USER."@".SQLAZURE_SERVER_ID
                                                                            , "PWD"=>SQLAZURE_PASSWORD
                                                                            , "Database"=>SQLAZURE_DB
                                                                            , "ReturnDatesAsStrings"=>true));
if($conn === false)
    die(print_r(sqlsrv_errors(), true));

// Insert 100 rows, one parameterized INSERT per row.
for($i = 0; $i < 100; $i++)
{
    $id = $i;
    $data = "GolferMessage".$i;

    $params = array($id, $data);
    $stmt1 = sqlsrv_query($conn, "INSERT INTO Table_1 (id, data) VALUES (?,?)", $params);
    if($stmt1 === false)
        die(print_r(sqlsrv_errors(), true));
}

// Retrieve the rows (the loop body is left empty, as in the original test).
$stmt2 = sqlsrv_query($conn, "SELECT id, data, timestamp FROM Table_1");
while($row = sqlsrv_fetch_array($stmt2, SQLSRV_FETCH_ASSOC))
{

}

Note: The code above uses the SQL Server Driver for PHP to connect to SQL Azure.

The second part of the script does the equivalent for Table Storage:

  1. Connects to Azure Storage.
  2. Inserts 100 entities to an existing table.
  3. Retrieves the 100 entities.

Here’s the code:

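// Create the Table Storage client from the Windows Azure SDK for PHP.
$tableStorageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', STORAGE_ACCOUNT_NAME, STORAGE_ACCOUNT_KEY);

// Start a batch so that the inserts below are sent together on commit().
$batch = $tableStorageClient->startBatch();
for($i = 0; $i < 100; $i++)
{
    $name = $i;
    $message = "GolferMessage".$i;

    // MessageBoardEntry is the entity class (defined elsewhere, not shown here).
    $mbEntry = new MessageBoardEntry();
    $mbEntry->golferName = $name;
    $mbEntry->golferMessage = $message;
    $tableStorageClient->insertEntity('MessageBoardEntry', $mbEntry);
}
$batch->commit();

// Retrieve the entities (the loop body is left empty, as in the original test).
$messages = $tableStorageClient->retrieveEntities("MessageBoardEntry", null, "MessageBoardEntry");

foreach($messages as $message)
{

}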

Note: The code above uses the Windows Azure SDK for PHP to connect to Azure Storage.

The result of the test was that Table Storage was consistently 4 to 5 times faster than SQL Azure (again, YMMV). The key, however, was to use the $tableStorageClient->startBatch() and $batch->commit() methods with Table Storage. Without using batches, Table Storage opens and closes a new HTTP connection for each write, which results in slower performance than SQL Azure (which keeps a connection open for writes). When using batches with Table Storage, the connection is kept open for all writes.
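To make the batching idea concrete outside the PHP SDK, here is a minimal Python sketch of how writes might be grouped before they are sent. It assumes two constraints of Table Storage batch (entity group) transactions: every entity in a batch shares the same PartitionKey, and a batch holds at most 100 entities. The function name `plan_batches` and the dictionary layout of the entities are hypothetical, and no network calls are made.

```python
from collections import defaultdict

MAX_BATCH_SIZE = 100  # assumed Table Storage limit: 100 entities per batch


def plan_batches(entities, max_batch_size=MAX_BATCH_SIZE):
    """Group entities into batches that share a PartitionKey.

    `entities` is an iterable of dicts with a "PartitionKey" field
    (a hypothetical in-memory representation, not an SDK type).
    Returns a list of batches, each a list of entities.
    """
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    batches = []
    for rows in by_partition.values():
        # Split each partition's rows into chunks of at most max_batch_size.
        for start in range(0, len(rows), max_batch_size):
            batches.append(rows[start:start + max_batch_size])
    return batches


entities = [
    {"PartitionKey": "golf", "RowKey": str(i), "golferMessage": f"GolferMessage{i}"}
    for i in range(250)
]
batches = plan_batches(entities)
print(len(entities), "single writes vs", len(batches), "batched requests")
```

Each batch corresponds to one request instead of one request per entity, which is where the saving described above would come from.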

Note: Many thanks to Maarten Balliauw who, when I was perplexed about the results of my tests without batching (I expected Table Storage to be faster, but because I didn’t know about batches for Table Storage, I was not getting the results I expected), suggested I try batching.

The complete script (with set up/tear down of database and Table) is attached in case you want to try for yourself.


Richard Mitchell produced a 00:13:57 SQL Azure Training 7: Blobs video for Red Gate Software’s ACloudyPlace blog:

The Blob – a pun that just won’t die! Also, it’s a Binary large object used in Windows Azure storage and our subject for today. This is the first of a couple of videos that really delve into Azure Storage, and it’s slightly longer at 13 minutes, so let’s begin! There are three main kinds of Blobs:

Simple Blobs – max 64MB used for binary and text files.

Block Blobs – max 200GB used for images/videos. Block Blobs have a little bit more going on than Simple Blobs, which we probably could have figured out from the naming conventions.

Page Blobs – max 1TB – really for Drives, watch the video to find out more.

RMitch also discusses Snapshots, Shared Access Signatures (SAS) and some Best Practices. More information can also be found in Nuno’s article on Connected Device Applications where he discusses Blobs, security, and SASs.

Full disclosure: I’m a paid contributor to Red Gate Software’s ACloudyPlace blog.


SQL Azure Database, Federations and Reporting

Gregory Leake posted Announcing SQL Azure Import/Export Service Now in Production to the Windows Azure Team blog on 1/24/2012:

We are pleased to announce the general availability of SQL Azure Import/Export! Now available as a production service, SQL Azure Import/Export helps organizations deploy on-premises databases to SQL Azure, and archive SQL Azure and SQL Server databases to Windows Azure Storage. Key improvements in the new production release include:

  • Increased performance & resiliency
  • Progress reporting
  • Selective Export
  • Production support
  • New usage sample EXE

This service is provided free of charge to customers using SQL Azure and Windows Azure Storage. For more information and video tutorials visit the DAC blog.


Benjamin Guinebertière posted B2B: SQL Azure as a meeting point | B2B: SQL Azure comme point de rencontre on 1/22/2012. From the English version:

FABRIKAM is a manufacturing company that has CONTOSO as one of its resellers. FABRIKAM would like to get its sales numbers through the CONTOSO channel, but CONTOSO does not want to give access to its own database and doesn’t have the time to create an application to expose this data.

CONTOSO is willing to put the data they have about the FABRIKAM products they sell into a database, but CONTOSO cannot expose their own database to the Internet either. So they decide to rent a database on the Internet.

SQL Azure is a perfect fit for this. There is no hardware, infrastructure or other high availability mechanisms to worry about: SQL Azure is a relational, cloud, PaaS-level database as a service (note from the author: I think I put quite a few keywords in that sentence!).

CONTOSO will be able to push filtered data (only FABRIKAM sales data) to a database outside their firewall, and FABRIKAM will be able to query that data with SQL or build reports with SQL Azure Reporting Services.

FABRIKAM creates a SQL Azure server (let’s call it axcdlm02uz.database.windows.net). On this server, they create a SQL Azure database named salesthrucontoso with a dedicated login, salesthrucontosodbo.

CONTOSO has communicated to FABRIKAM the range of IP addresses they use to reach the salesthrucontoso database, so that FABRIKAM can configure the SQL Azure server firewall rules.

In return, FABRIKAM communicates to CONTOSO all the details to connect to the SQL Azure database:

  • server name: axcdlm02uz.database.windows.net
  • database name: salesthrucontoso
  • login: salesthrucontosodbo and its password

As SQL Azure uses the same tabular data stream (TDS) protocol as SQL Server, CONTOSO can use SQL Server drivers to access SQL Azure. They do it through their usual ETL(*) tool. So CONTOSO can easily build an interface from their database (SQL Server, Oracle, DB2, or any other vendor’s database) to the cloud database and run it every day, so that FABRIKAM can easily access the data without any risk of compromising CONTOSO’s internal databases.

(*) ETL = Extract Transform Load.
As an example, SQL Server’s ETL is SSIS (SQL Server Integration Services).


Cihan Biyikoglu (@cihangirb) continued his series with Fan-out Querying for Federations in SQL Azure (Part 2): Scalable Fan-out Queries with TOP, ORDER BY, DISTINCT and Other Powerful Aggregates, MapReduce Style! on 1/19/2012 (missed when posted):

Welcome back. In the previous post, Introduction to Fan-out Querying, we covered the basics and defined the fragments that make up a fan-out query, namely the member and summary queries. Fan-out querying refers to querying multiple members.

In the previous post we looked at example queries that had member queries but no summary queries; that is, the member query results were simply UNIONed. In this post, we’ll take a look at summary queries. When are they needed? What are some common patterns and examples?

Summary queries are required for post-processing the member query results. Simply put, summary queries can help reshape the unioned member results into the desired final shape. Summary queries refer to objects generated by the member queries and, depending on the implementation, can be executed on the client side or the server side. That said, today Federations do not provide a built-in server-side processing option for summary queries. Here are a few options for processing summary queries:

  • LINQ to DataSet offers a great option for querying datasets. Some examples here. LINQ is best suited for the job in my opinion, with flexible language constructs, dynamic execution and parallelism options.
  • ADO.NET expressions in DataSets offer a number of options for summary query processing as well. For example, DataColumn.Expression allows you to add aggregate expressions to your DataSet for this type of processing. Or you can use DataTable.Compute for processing a rollup value.
  • Server-side full fan-out processing is also an option. This option refers to running the member and summary queries on the server side in a single round trip from the client to SQL Azure. However, as of Jan 2012, this is not built into SQL Azure Federations. We’ll take a look at a simulated version of this in the sample tool here. You can use the deployment closest to your database for efficiency:

Americas Deployment: http://federationsutility-scus.cloudapp.net/
European Deployment: http://federationsutility-weu.cloudapp.net/
Asian Deployment: http://federationsutility-seasia.cloudapp.net/

The tool provides a basic and a full ‘fanout’ page. The basic page contains only member queries through a simplified interface. The results are simply unioned (or ‘union all’-ed to be precise) together. The full page provides additional capabilities including member and summary queries. Both member and summary queries are expressed in TSQL. The full page also allows for parallelism and allows specifying a federation key range other than all members. The tool has a help page with detailed notes on each of these capabilities.
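The member/summary split can also be tried locally. The sketch below is only a simulation: it uses in-memory SQLite databases as stand-ins for federation members, and the helper names `fan_out` and `make_member` and the scratch table name `member_results` (playing the role of #Table) are assumptions made for illustration, not part of SQL Azure or the sample tool.

```python
import sqlite3


def fan_out(members, member_sql, summary_sql=None):
    """Run member_sql on every member and UNION ALL the rows.

    If summary_sql is given, the combined rows are loaded into a scratch
    table named member_results (standing in for #Table) and summary_sql
    is run over it.
    """
    combined, columns = [], None
    for conn in members:
        cursor = conn.execute(member_sql)
        columns = [col[0] for col in cursor.description]
        combined.extend(cursor.fetchall())
    if summary_sql is None or columns is None:
        return combined

    scratch = sqlite3.connect(":memory:")
    quoted = ", ".join('"' + name.replace('"', '""') + '"' for name in columns)
    scratch.execute(f"CREATE TABLE member_results ({quoted})")
    marks = ", ".join("?" for _ in columns)
    scratch.executemany(f"INSERT INTO member_results VALUES ({marks})", combined)
    return scratch.execute(summary_sql).fetchall()


def make_member(rows):
    """Build one in-memory 'federation member' holding some blog entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blog_entries_tbl (blog_id INTEGER, blog_entry_title TEXT)")
    conn.executemany("INSERT INTO blog_entries_tbl VALUES (?, ?)", rows)
    return conn


members = [make_member([(1, "a"), (1, "b")]), make_member([(7, "c")])]

# Aligned to the federation key: the union of member results is the answer.
per_blog = fan_out(
    members,
    "SELECT blog_id, COUNT(*) AS cnt FROM blog_entries_tbl GROUP BY blog_id",
)

# Not aligned: a summary query combines the partial counts.
total = fan_out(
    members,
    "SELECT COUNT(*) AS cnt FROM blog_entries_tbl",
    "SELECT SUM(cnt) FROM member_results",
)
print(per_blog, total)
```

Running the summary step on the client, as here, corresponds to the client-side options listed above; only the shape of the two-stage processing is meant to carry over.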

Let’s dive in and take a look at where summary queries can be useful. We’ll cover GROUP BY and HAVING, ordering, TOP, aggregations and finally DISTINCT processing. By the way, for the examples in this post, I’ll continue to use the BlogsRUs_DB schema posted at the bottom of this article.

GROUP BY and HAVING with Fan-out Queries

With simple group-by items the rule is simple; if the grouping is aligned to the federation key in fan-out queries in federations, processing of group-by and having needs no special consideration. Simply union the results (or union-all to be precise) and we are done. However processing unaligned groupings (any grouping that does not include the federation key) requires a summary query.

When grouping isn’t aligned to the federation key, grouping isn’t completed by the member queries. That means processing HAVING in the member queries will generate incorrect results. Let’s take the example from the previous post; here is the query reporting the months with more than 100,000 posts, along with their post counts:

SELECT DATEPART(mm, be.created_date) mon, COUNT(be.blog_entry_title) cnt 
FROM blog_entries_tbl be 
GROUP BY DATEPART(mm, be.created_date) 
HAVING COUNT(be.blog_entry_title) > 100000

The grouping on month in each member yields the results from that member, but grouping across all members isn’t done yet! We need to GROUP BY the same columns and expressions to finish grouping before we can apply the HAVING predicate.

So to correctly process the HAVING predicate, we need to push it to the summary query;

SELECT DATEPART(mm, be.created_date) mon, COUNT(be.blog_entry_title) cnt 
FROM blog_entries_tbl be 
GROUP BY DATEPART(mm, be.created_date)

and the summary query should process the HAVING.

SELECT mon, sum(cnt) FROM #Table 
GROUP BY mon 
HAVING sum(cnt) > 100000
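A small Python sketch may help show why HAVING has to wait for the summary step when the grouping is not aligned to the federation key. The data, the threshold of 2 and the function names are made up for illustration; each member is represented as a list of (month, title) rows.

```python
from collections import Counter


def member_query(rows):
    """Per-member GROUP BY month: returns {month: count}."""
    return Counter(month for month, _title in rows)


def summary_query(member_results, threshold):
    """Re-group across members, then apply HAVING on the combined totals."""
    totals = Counter()
    for partial in member_results:
        totals.update(partial)
    return {mon: cnt for mon, cnt in totals.items() if cnt > threshold}


member_a = [(1, "t1"), (1, "t2"), (2, "t3")]
member_b = [(1, "t4"), (2, "t5"), (2, "t6")]
partials = [member_query(member_a), member_query(member_b)]

# HAVING applied inside each member (wrong): no member alone exceeds 2.
wrong = [{m: c for m, c in p.items() if c > 2} for p in partials]

# HAVING applied in the summary (right): both months total 3 posts.
right = summary_query(partials, threshold=2)
print(wrong, right)
```

Filtering inside the members drops every group here, even though both months pass the threshold once the partial counts are added together.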

‘ORDER BY’ and ‘TOP’ with Fan-out Queries:

Let’s look at ordering and TOP functionality in TSQL. The member queries can include ORDER BY and TOP, but you will need the summary query to reprocess and finalize the ordering and top filtering of the member query results. Take the following example: the query returns the 10 most recently created blog entries across the entire BlogsRUs_DB. Remember that blog_entries_tbl is federated on blog_id.

SELECT TOP 10 blog_entry_text FROM blog_entries_tbl 
ORDER BY created_date DESC

Here is how to break this into a fan-out query with a member and a summary query. The #Table in the summary query refers to the result set generated from the member query.

Member Query:

SELECT TOP 10 blog_entry_text, created_date FROM blog_entries_tbl 
ORDER BY created_date DESC

Summary Query:

SELECT TOP 10 blog_entry_text FROM #Table 
ORDER BY created_date DESC

Here is the output for the member query:

[Image: member query output]

And the output with the summary query. Now the output only contains the true TOP 10 rows.

[Image: summary query output]
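The same two-stage TOP processing can be sketched in Python. This is an illustration only: rows are hypothetical (text, created_date) tuples with ISO date strings, and `member_top` and `summary_top` are made-up names that mirror the member and summary queries above.

```python
import heapq


def member_top(rows, n):
    """TOP n ... ORDER BY created_date DESC within one member."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


def summary_top(member_results, n):
    """Re-apply ORDER BY and TOP n over the unioned member rows."""
    unioned = [row for rows in member_results for row in rows]
    newest = heapq.nlargest(n, unioned, key=lambda row: row[1])
    return [text for text, _created in newest]


member_a = [("a1", "2012-01-03"), ("a2", "2012-01-01")]
member_b = [("b1", "2012-01-04"), ("b2", "2012-01-02")]

partials = [member_top(member_a, 2), member_top(member_b, 2)]
latest = summary_top(partials, 2)
print(latest)
```

Because each member returns at most n rows, the summary step only has to sort n rows per member rather than the whole table.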

Additive Aggregates: MIN, MAX & SUM with Fan-out Queries:

When the processing of MIN, MAX and SUM aligns with a grouping on the federation key, a UNION ALL of the member results yields the final result. For example, the count of blog entries per blog could be processed with the following query:

SELECT blog_id, COUNT(blog_entry_id) 
FROM blog_entries_tbl GROUP BY blog_id

The same applies to the OVER clause used for defining windows when processing aggregates. That is, as long as the PARTITION BY includes the federation key, a UNION ALL of the results is sufficient for the summary query.

However when the grouping does not align to the federation key, you will need summary queries. Here is the example that gets us the latest date that a blog entry was created. The grouping in the absence of a GROUP BY clause is a single bucket.

SELECT MAX(created_date) FROM blog_entries_tbl

Here is the way to break this down into a fan-out query;

Member Query:

SELECT MAX(created_date) FROM blog_entries_tbl

Summary Query:

SELECT MAX(Column1) FROM #Table
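As a rough illustration of what makes MIN, MAX and SUM convenient, the Python sketch below applies the same function once inside each member and once more over the partial results. The helper name `fan_out_aggregate` and the sample values are hypothetical.

```python
def fan_out_aggregate(members, aggregate):
    """Apply `aggregate` in each member, then again over the partial results."""
    partials = [aggregate(values) for values in members if values]
    return aggregate(partials)


members = [[3, 9, 4], [12, 1], [7]]
all_values = [value for values in members for value in values]

# Re-applying the same function to the partial results gives the global answer.
print(fan_out_aggregate(members, max) == max(all_values))
print(fan_out_aggregate(members, min) == min(all_values))
print(fan_out_aggregate(members, sum) == sum(all_values))

# Counting does not combine this way: counting the partial counts yields the
# number of members, not the number of values.
print(fan_out_aggregate(members, len), "vs", len(all_values))
```

The last line hints at why counts need a SUM in the summary step rather than another COUNT.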

Fan-out Queries Processing AVG and COUNT

Average is a non-additive aggregate, so it takes a rewrite to compute an average when the grouping does not align to the federation key. Here is an example of an average aggregate that aligns with the federation key: the next query calculates the average length of the blog entry text for each blog.

SELECT blog_id, AVG(LEN(blog_entry_text)) 
FROM blog_entries_tbl 
GROUP BY blog_id

There is no need for a summary query other than a UNION ALL of the results when fanning this query out given it is aligned to the federation key: blog_id.

However, when the grouping is not aligned, there is work to do! Average and count are not additive in this sense: one cannot take two averages and average them to get the correct average, or take two counts and apply another COUNT to combine them. For two counts we need SUM to get the correct final count, and for an average we need the sum of the values and the item count from each member to calculate the average correctly. Here is an example: let’s take the average length of blog entries for each month across all members. This time the grouping is on the month of the blog posts.

SELECT DATEPART(mm,be.created_date) month_of_entry, 
AVG(LEN(blog_entry_text)) 
FROM blog_entries_tbl be 
GROUP BY DATEPART(mm,be.created_date)

Let’s use SUM and COUNT to get to the correct average.

Member Query:

SELECT DATEPART(mm,created_date) month_of_entry, 
SUM(LEN(blog_entry_text)) sum_len_blog_entry, 
COUNT(blog_entry_text) count_blog_entry 
FROM blog_entries_tbl 
GROUP BY DATEPART(mm,created_date)

Summary Query:

SELECT month_of_entry, 
SUM(sum_len_blog_entry)/SUM(count_blog_entry) avg_len_blog_entry 
FROM #Table 
GROUP BY month_of_entry
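A quick numeric check of the SUM/COUNT rewrite, as a Python sketch with made-up entry lengths for a single month spread over two members. The function names are illustrative only, and Python's `/` returns a float here, whereas the integer arithmetic in the T-SQL above would truncate.

```python
def member_query(lengths):
    """Return the additive pieces for one member: (sum of lengths, row count)."""
    return (sum(lengths), len(lengths))


def summary_query(partials):
    """Combine the pieces: total sum divided by total count."""
    total = sum(part_sum for part_sum, _count in partials)
    count = sum(part_count for _sum, part_count in partials)
    return total / count


member_a = [10, 20, 30]   # three short entries
member_b = [100]          # one long entry

partials = [member_query(member_a), member_query(member_b)]
correct = summary_query(partials)

# Averaging the per-member averages weights both members equally (wrong).
average_of_averages = (sum(member_a) / len(member_a) + sum(member_b) / len(member_b)) / 2
print(correct, average_of_averages)
```

The member with a single row pulls the average of averages far from the true value, which is why the member query ships the sum and the count instead of the average.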

‘DISTINCT’ with Fan-out Queries:

DISTINCT is fairly easy to calculate, much like the additive aggregates, as long as the grouping is on the federation key. When grouping isn’t aligned to the federation key, a summary query reapplying the DISTINCT for de-duplication is needed. Here is the query for the distinct languages used for blog comments across our entire dataset:

SELECT DISTINCT bec.language_id 
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc 
ON bec.language_id=lc.language_id

To break this apart, you have to run distinct in each member and then rerun the distinct on the summary query much like MIN and MAX;

Member Query:

SELECT DISTINCT bec.language_id 
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc 
ON bec.language_id=lc.language_id

Summary Query:

SELECT DISTINCT language_id FROM #Table

Now let’s take a look at some more challenging non-additive aggregates.

Fan-out Queries with TOP N PERCENT and DISTINCT COUNT

TOP N PERCENT is not as popular as TOP N, but for those who use it, let’s dissect what you need to do with fan-out queries. TOP N PERCENT simply requires that you collect all results without TOP from the members and only apply the TOP N PERCENT in the summary query. Imagine the same query we used for TOP, this time with TOP 10 PERCENT. The query gets the latest 10% of blog entries.

SELECT TOP 10 PERCENT blog_entry_text FROM blog_entries_tbl 
ORDER BY created_date DESC

Here are the member and summary queries.

Member Query:

SELECT blog_entry_text, created_date FROM blog_entries_tbl 
ORDER BY created_date DESC

Summary Query:

SELECT TOP 10 PERCENT blog_entry_text FROM #Table 
ORDER BY created_date DESC

Alarm bells should go off whenever we are not able to push the filtering predicate (the TOP clause) down to the member queries. Processing TOP N PERCENT is costlier than TOP N and the additive aggregates.
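To see why the percentage cannot be pushed down, here is a small Python sketch with two hypothetical members of ten rows each, where one member happens to hold all of the newest entries. The helper `top_percent` is a made-up stand-in for TOP N PERCENT and assumes that fractional row counts are rounded up.

```python
import math


def top_percent(rows, percent):
    """Rows sorted newest-first, truncated to ceil(percent% of the row count)."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    keep = math.ceil(len(ordered) * percent / 100)
    return ordered[:keep]


# Member A holds the ten newest entries, member B the ten oldest.
member_a = [(f"a{day}", f"2012-01-{day:02d}") for day in range(11, 21)]
member_b = [(f"b{day}", f"2012-01-{day:02d}") for day in range(1, 11)]

# Correct: ship every row, apply the percentage once in the summary step.
correct = top_percent(member_a + member_b, 10)

# Wrong: apply the percentage in each member, then again in the summary step.
pushed_down = top_percent(top_percent(member_a, 10) + top_percent(member_b, 10), 10)

print([text for text, _ in correct])
print([text for text, _ in pushed_down])
```

The pushed-down variant loses a row that belongs in the result, because no member knows the total row count that the percentage refers to.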

DISTINCT COUNT calculation is trivial if grouping is on the federation key. However, when grouping isn’t aligned, it takes more work to calculate distinct count given that it isn’t additive. That means you cannot simply add the distinct counts from each member, because without full de-duplication of all the items you may be counting certain things multiple times. Here is an example; this query gets the distinct count of languages per month across the whole result set:

SELECT DATEPART(mm,created_date), COUNT(DISTINCT bec.language_id) 
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc 
ON bec.language_id=lc.language_id 
GROUP BY DATEPART(mm,created_date)

We need centralized processing: the comment language ids for each month first need to be collected across all members, and then the distinct count can be calculated on that result set. Here are the member and summary queries for calculating distinct count:

Member Query:

SELECT DISTINCT DATEPART(mm,created_date), bec.language_id 
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc 
ON bec.language_id=lc.language_id

Summary Query:

SELECT Column1, COUNT(DISTINCT language_id) 
FROM #Table 
GROUP BY Column1

Here is what the member query output looks like;

[Image: member query output]

Here is the output with the summary query;

[Image: summary query output]
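The distinct-count split can be mimicked with Python sets. In this sketch each member is a hypothetical list of (month, language_id) pairs, and the function names simply mirror the member and summary queries above.

```python
from collections import defaultdict


def member_query(comments):
    """SELECT DISTINCT month, language_id within one member."""
    return set(comments)


def summary_query(member_results):
    """COUNT(DISTINCT language_id) GROUP BY month over the unioned pairs."""
    languages_by_month = defaultdict(set)
    for pairs in member_results:
        for month, language_id in pairs:
            languages_by_month[month].add(language_id)
    return {month: len(langs) for month, langs in languages_by_month.items()}


member_a = [(1, 1), (1, 2), (1, 1)]
member_b = [(1, 2), (1, 3), (2, 1)]

partials = [member_query(member_a), member_query(member_b)]
distinct_counts = summary_query(partials)

# Adding per-member distinct counts for month 1 double-counts language 2.
naive_month_1 = sum(len({lang for month, lang in p if month == 1}) for p in partials)
print(distinct_counts, naive_month_1)
```

Language 2 appears in both members for month 1, so only the de-duplicated pairs, not per-member counts, can be combined safely.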

In conclusion, additive operations like MIN, MAX, COUNT or SUM and predicates like TOP or ORDER BY can easily be processed with fan-outs using simple modifications to the original queries to break them into member and summary queries. With additive queries the big advantage is that you can push these operations to the member queries for efficiency. In cases where you need to deal with non-additive operations like average, some rewriting helps push filtering and aggregate predicates to the member queries. However, there are a few cases like TOP N PERCENT or DISTINCT COUNT where fan-out queries may require shoveling larger datasets and will thus be more expensive to calculate, because you cannot push these predicates to member queries and can only process them in the summary queries.


*Sample Schema

Here is the schema I used for the sample queries above.

-- Connect to BlogsRUs_DB
CREATE FEDERATION Blogs_Federation(id bigint RANGE) 
GO 
USE FEDERATION blogs_federation (id=-1) WITH RESET, FILTERING=OFF 
GO 
CREATE TABLE blogs_tbl( 
blog_id bigint not null, 
user_id bigint not null, 
blog_title nchar(256) not null, 
created_date datetimeoffset not null DEFAULT getdate(), 
updated_date datetimeoffset not null DEFAULT getdate(), 
language_id bigint not null default 1, 
primary key (blog_id) 
) 
FEDERATED ON (id=blog_id) 
GO 
CREATE TABLE blog_entries_tbl( 
blog_id bigint not null, 
blog_entry_id bigint not null, 
blog_entry_title nchar(256) not null, 
blog_entry_text nchar(2000) not null, 
created_date datetimeoffset not null DEFAULT getdate(), 
updated_date datetimeoffset not null DEFAULT getdate(), 
language_id bigint not null default 1, 
blog_style bigint null, 
primary key (blog_entry_id,blog_id) 
) 
FEDERATED ON (id=blog_id) 
GO 
CREATE TABLE blog_entry_comments_tbl( 
blog_id bigint not null, 
blog_entry_id bigint not null, 
blog_comment_id bigint not null, 
blog_comment_title nchar(256) not null, 
blog_comment_text nchar(2000) not null, 
user_id bigint not null, 
created_date datetimeoffset not null DEFAULT getdate(), 
updated_date datetimeoffset not null DEFAULT getdate(), 
language_id bigint not null default 1, 
primary key (blog_comment_id,blog_entry_id,blog_id) 
) 
FEDERATED ON (id=blog_id) 
GO 
CREATE TABLE language_code_tbl( 
language_id bigint primary key, 
name nchar(256) not null, 
code nchar(256) not null 
) 
GO


<Return to section navigation list>

MarketPlace DataMarket, Social Analytics and OData

No significant articles today.


<Return to section navigation list>

Windows Azure Access Control, Service Bus and Workflow

No significant articles today.


<Return to section navigation list>

Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

No significant articles today.


<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

My (@rogerjenn) Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters post of 1/25/2012 begins:

Introduction

In addition to the basic MSCloudNumerics Visual Studio template and “Cloud Numerics” sample application described in my Introducing Microsoft Codename “Cloud Numerics” from SQL Azure Labs of 1/23/2012, the "Cloud Numerics" Microsoft Connect Site’s Example applications download offers three additional end-to-end examples:

  • Latent Semantic Indexing (LSI) document classification example
  • Statistics functionality demonstration
  • Time-series analysis of cereal yield data

This post describes how to configure and deploy two 8-core HPC clusters hosted in Windows Azure and submit the LSICloudApplication to the Windows Azure HPC Scheduler for processing.

Table of Contents

  • Creating a Windows Azure Pay-As-You-Go Subscription
  • Learning about the Latent Semantic Indexing Example
  • Extracting and Running the Latent Semantic Indexing Example
  • Configuring and Deploying the Windows Azure HPC Cluster
  • Using the Windows Azure HPC Scheduler Web Portal
  • Running the Latent Semantic Indexing Example Locally

And continues with the remaining sections described in the TOC above.


My (@rogerjenn) Microsoft cloud service lets citizen developers crunch big data article of 1/24/2012 about Microsoft Codename “Data Explorer” for SearchCloudComputing.com begins:

Microsoft is devoting substantial resources to developing codenamed apps and APIs targeting a new breed of IT teams who spelunk actionable business information from massive files, or big data. According to Gartner Research, these “citizen developers” will build at least 25% of new business applications by 2014.

Citizen developers differ from traditional programmers in that they’re end users who “create new business applications for consumption by others using development and runtime environments sanctioned by corporate IT.” Gartner calls this process “end-user application development,” or EUAD.

Workgroup and department members often turn to citizen developers to analyze or help them analyze structured information from on-premises and cloud-based Web server log files, semi-structured data from social websites like Twitter and Facebook, as well as hordes of unstructured files, such as Word documents and Excel worksheets.

Historically, IT staffs have written ad-hoc Microsoft Office applications with Excel or Access macros to manage structured data. Microsoft designed Visual Studio LightSwitch as a forms-over-data alternative for EUAD that provides simple data modeling features and easy deployment as on-premises or Windows Azure Web applications.

Microsoft’s LINQ Pack, LINQ to HPC and Project “Daytona” as well as Microsoft Research’s forthcoming open source Excel DataScope app were designed to make unstructured big data analytics in Windows Azure accessible to citizen developers. However, in November 2011, Microsoft announced plans to not release LINQ to HPC to production in favor of Apache Hadoop on Windows Azure, now in invitation-only private preview. To obtain a private preview, you must apply to Microsoft’s team for an invitation to test-drive the platform and its application programming interface (API). It remains to be seen if citizen developers will be capable of populating Hadoop clusters and programming MapReduce analytics.

Figure 1. This .NET Windows form app summarizes a stream of Twitter data, including buzz (daily tweet counts) and calculated sentiment (positive and negative tweet tone) about Windows 8, as well as estimated reliability of the sentiment calculations, from a data feed provided by the Codename “Social Analytics” API.

New Microsoft products for EUAD
The SQL Azure Labs team released in late 2011 private previews of Codename “Social Analytics”, “Data Explorer” and “Data Transfer,” all of which target EUAD. Social Analytics delivers a real-time stream consisting mostly of Twitter tweets, retweets, replies and direct messages. It also includes a few Facebook “likes,” posts and comments, as well as occasional StackOverflow questions and answers.

The Windows Azure Marketplace DataMarket currently supplies two live OData streams that incorporate sentiment data about Windows 8 or Bill Gates. Microsoft's Social Analytics experimental cloud uses a Social Analytics’ API and a downloadable Visual Studio 2010 SP1 Windows application that retrieves, displays and summarizes data from the Windows 8 data feed (see Figure 1.) This app required almost 500 lines of C# code to generate, display and save the summary data as a comma-separated value (CSV) text file. It took me about a day to code and test and is probably too complex for most citizen developers to program.

Figure 2. Selecting a resource from the list at the left displays 12 tool icons with galleries for more Filters as well as Insert Column and Split Column Transform tools. ContentItems is a table included in the Codename “Social Analytics” VancouverWindows8 data feed from the Windows Azure Marketplace.

Simplifying cloud-based mashups
Microsoft touts Codename “Data Explorer” (DE) as a way for ordinary PC users to automatically discover data available to download from the Windows Azure Marketplace; enrich data by combining it in mashups with related data from the Marketplace, Web, databases and other data types; and publish results from cloud-based workspaces stored in Windows Azure. DE also is an easily approachable, composable extract-transform-load (ETL) tool that provides many of the capabilities of SQL Server Integration Services (SSIS) without the long learning curve. DE provides a set of tools to manipulate data resources in the sequence you specify (Figure 2.)

Data Explorer lets you emulate a complex set of procedural operations on tabular data, such as those needed to display source data and aggregate the daily buzz and sentiment values shown in Figure 1, by applying tools to resources. ContentItemTypes is an enumeration resource that the Lookup Column tool uses to translate numeric ContentTypeId values to readable ContentTypeName values in the second column of Figure 2’s ContentItems table display.

Figure 3. The Daily Summary resource delivers a row for each of the 58 days that Social Analytics data was available with a Tweet Count column from Daily Items, Tones Positive and Reliability Pos columns from the Daily Positives table, as well as Tones Negative and Reliability Neg columns from the Daily Negatives table.

The Tones resource provides a similar Lookup Column for ToneValues in the fifth column. Daily Items is the initial table for the DailySummary aggregation resource. Daily Summary has Published On and Tweet Count columns and a row for December 27, 2011 and the preceding 58 days. Merging aggregated DailyPositives and DailyNegatives table resources with the equivalent of an SQL left outer join on the Published On column creates a table with Published On, Tweet Count, Tones Positive, Tones Negative, Reliability Pos and Reliability Neg columns, as shown in Figure 3.

Writing formulas with graphical builder UIs
DE has a full-blown formula-based programming language that’s based on Microsoft’s M (for Modeling) language, a component of the ill-fated Oslo repository database and the Quadrant query and visualization tool. However, most DE users won’t need to write M code because DE’s user interface includes graphical builders for the fx expressions that appear at the top of tables. The following appears at the top of the table in Figure 3:

fx = Table.RenameColumns(ReorderedColumns,{{"Rt.Positives", "Tones Positive"}, {"Rt2.Negatives", "Tones Negative"}, {"Rt.ReliabilityPos", "Reliability Pos"}, {"Rt2.ReliabilityNeg", "Reliability Neg"}})

To expose complete multi-line formulas, click the v-shaped icon at the right of the first line.

Following are the formulas for all actions that define the Daily Summary and Merge resources:

shared #"Daily Summary" = let

#"Daily Summary" = Table.Join(Merge,{"PublishedOn"},Table.PrefixColumns(DailyNegatives, "Rt2"),{"Rt2.PublishedOn"},JoinKind.LeftOuter),

HiddenColumns = Table.RemoveColumns(#"Daily Summary",{"Rt.PublishedOn", "Rt2.PublishedOn"}),

ReorderedColumns = Table.ReorderColumns(HiddenColumns,{"PublishedOn", "Tweet Count", "Rt.Positives", "Rt2.Negatives", "Rt.ReliabilityPos", "Rt2.ReliabilityNeg"}),

RenamedColumns = Table.RenameColumns(ReorderedColumns,{{"Rt.Positives", "Tones Positive"}, {"Rt2.Negatives", "Tones Negative"}, {"Rt.ReliabilityPos", "Reliability Pos"}, {"Rt2.ReliabilityNeg", "Reliability Neg"}})

in

RenamedColumns;

shared Merge = Table.Join(DailyItems,{"PublishedOn"},Table.PrefixColumns(DailyPositives, "Rt"),{"Rt.PublishedOn"},JoinKind.LeftOuter);
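For readers who don't know M, the two Table.Join steps above can be approximated in a few lines of Python. This is only a sketch of the left outer join semantics, not of how Data Explorer evaluates formulas. The `left_outer_join` helper and the sample rows are hypothetical, and unlike the M formulas it drops the right-hand key column directly instead of hiding it in a later step.

```python
def left_outer_join(left, right, key, prefix):
    """Left outer join of two lists of dicts on `key`.

    Columns from `right` are renamed to "<prefix>.<column>", loosely
    mirroring Table.PrefixColumns. Left rows without a match get None
    for the right-hand columns. Assumes `key` is unique in `right`.
    """
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        out = dict(row)
        for col in right_cols:
            out[f"{prefix}.{col}"] = match[col] if match else None
        joined.append(out)
    return joined


# Hypothetical sample rows; the real tables come from the data feed.
daily_items = [
    {"PublishedOn": "2011-12-26", "Tweet Count": 120},
    {"PublishedOn": "2011-12-27", "Tweet Count": 95},
]
daily_positives = [
    {"PublishedOn": "2011-12-26", "Positives": 14, "ReliabilityPos": 0.8},
]
daily_negatives = [
    {"PublishedOn": "2011-12-27", "Negatives": 6, "ReliabilityNeg": 0.7},
]

# Same order as the formulas: Merge first, then Daily Summary.
merge = left_outer_join(daily_items, daily_positives, "PublishedOn", "Rt")
daily_summary = left_outer_join(merge, daily_negatives, "PublishedOn", "Rt2")

if __name__ == "__main__":
    for row in daily_summary:
        print(row)
```

Every day in DailyItems survives both joins, which is the point of JoinKind.LeftOuter: days with no positive or negative tweets keep their tweet count and get empty tone columns.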

Figure 4. All tools have graphical formula builders, which appear when you click the Edit button at the left of the formula. This early version of the graphical UI for the Merge builder requires that you select PublishedOn in one of the lists and add a prefix to the right table’s column names.

Figure 4 shows the builder UI for the Merge resource’s action that generates the formula for the left outer join between the DailyItems and DailyPositives tables on the PublishedOn field. As noted in my post about problems discovered with values and merging tables in Data Explorer, this builder UI is far from intuitive and was undergoing usability improvement by the DE team at press time. …

I continue with the details of “Publishing your big data mashup.”

Full disclosure: I’m a paid contributor to Search Cloud Computing.com.


My (@rogerjenn) Introducing Microsoft Codename “Cloud Numerics” from SQL Azure Labs post of 1/23/2012 begins:

Introduction

Table of Contents

  • “Cloud Numerics” Background
  • The MSCloudNumerics.sln Project Template and Sample Solution
  • “Cloud Numerics” Prerequisites
  • Installing the HPC and “Cloud Numerics” Components
  • “Cloud Numerics” Mathematic Libraries for .NET
  • “Cloud Numerics” Distributed Array, Algorithm and Runtime Libraries for .NET
  • Limitations of “Cloud Numerics”
  • Running the MSCloudNumerics Sample Project Locally
  • References

Updated 1/25/2012: My (@rogerjenn) Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters of 1/25/2012 describes how to configure and deploy two 8-core HPC clusters hosted in Windows Azure and submit the Latent Semantic Indexing (LSICloudApplication) project to the Windows Azure HPC Scheduler for processing


“Cloud Numerics” Background

Codename “Cloud Numerics” is the latest in a series of new SQL Azure Labs tools for managing and analyzing Big Data in the Cloud with Windows Azure and SQL Azure. Ronnie Hoogerwerf’s introductory The “Cloud Numerics” Programming and runtime execution model post of 1/11/2012 to the Microsoft Codename “Cloud Numerics” blog begins:

Microsoft Codename “Cloud Numerics” is a new .NET® programming framework tailored towards performing numerically-intensive computations on large distributed data sets. It consists of

  • a programming model that exposes the notion of a partitioned or distributed array to the user
  • an execution framework or runtime that efficiently maps operations on distributed arrays to a collection of nodes in a cluster
  • an extensive library of pre-existing operations on distributed arrays and tools that simplify the deployment and execution of a “Cloud Numerics” application on the Windows Azure™ platform

Writing numerical algorithms is challenging and requires thorough knowledge of the underlying math; typically this line of work is the realm of experts with job titles such as: data scientist, quantitative analyst, engineer, etc. Writing numerical algorithms that scale-out to the cloud is even harder. At the same time the ever increasing appetite for and availability of data is making it more and more important to be able to scale-out data analytics models and this is exactly what “Cloud Numerics” is all about. For example, with “Cloud Numerics” it is possible to write document classification applications using powerful linear algebra and statistical methods, such as Singular Value Decomposition or Principal Component Analysis, or to write applications that search for correlations in financial time series or genomic data that work on today’s cloud-scale datasets. [Links added.]

“Cloud Numerics” provides a complete [C#] solution for writing and developing distributed applications that run on Windows Azure. To use “Cloud Numerics” you start in Visual Studio with our custom project definition that includes an extensive library of numerical functions. You develop and debug your numerical application on your desktop, using a dataset that is appropriate for the size of your machine. You can read large datasets in parallel, allocate and manipulate large data objects as distributed arrays, and apply numerical functions on these distributed array[s]. When your application is ready and you want to scale-out and run on the cloud you start our deployment wizard, fill out your Azure information, deploy, and run you[r] application.

An important takeaway from the preceding excerpt is that the BigData input to “Cloud Numerics” applications must be a partitioned or distributed numeric array. You can load data into distributed arrays with data that implements the Numerics.Distributed.IO.ParallelReader interface or is processed by the sample Distributed.IO.CSVLoader class, which implements that interface.

Note: Source code for the Distributed.IO.CSVLoader class is included in the Cloud Numerics - Examples download, which is described in the “Install the HPC and ‘Cloud Numerics’ Components” section below.

Ronnie’s Using Data post of 1/20/2012 is a useful reference for array data; it contains the following topics:

A rectangular array of numbers, symbols or expressions is called a matrix. Wikipedia has very detailed Matrix Theory and Linear Algebra topics. Matrix theory is a part of linear algebra.

My article continues with the “The MSCloudNumerics.sln Project Template and Sample Solution” and later topics.


Toddy Mladenov described Accessing Windows Azure REST APIs with cURL in a 1/25/2012 post:

Tonight I was playing with cURL on my Mac, wondering how easy it would be to develop a few scripts to manage Windows Azure applications from a non-Windows machine. As it turns out, getting access to the Windows Azure REST APIs was quite simple. Here are the steps I had to go through in order to receive a valid response from the APIs:

Set up Windows Azure management certificate from your Mac machine

The first thing I had to do was create a self-signed certificate that I can use for Service Management. Creating the cert with openssl (which is available on Mac) is quite simple - just type:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout azure-cert.pem -out azure-cert.pem

During the creation openssl will ask you for all the necessary information like country name, organization name etc. and at the end will generate .pem file that contains the public and the private key.

In order to upload the certificate to your Windows Azure subscription using the Management Portal though you need to have the certificate in PKCS12 (or .pfx) format. Here is the openssl command that will do the work:

openssl pkcs12 -export -out azure-cert.pfx -in azure-cert.pem -name "My Self Signed Cert"

Now that you have the PKCS12 file you can go ahead and upload this to your Management Certificates using the portal.

Update: By writing this in the middle of the night I totally messed up what you need to do. PKCS12 you need if you want to enable SSL for your service. For management you only need the public key that you can export in .CER file. Here is the command that you use for this:

openssl x509 -outform der -in azure-cert.pem -out azure-cert.cer

Now you can upload the .CER to the Management Certificates section using the portal.
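If you would rather script that last conversion, the PEM-to-DER step can also be sketched with Python's standard library. This is a minimal sketch, assuming the .pem file has the layout produced by the openssl command above (private key and certificate in one file). The `pem_to_der` and `convert_file` names are made up for this example, and only `ssl.PEM_cert_to_DER_cert` is a real library call.

```python
import re
import ssl

# Matches one PEM certificate block, skipping any private key in the same file.
_CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_to_der(pem_text):
    """Return the DER bytes of the first certificate found in `pem_text`."""
    match = _CERT_BLOCK.search(pem_text)
    if match is None:
        raise ValueError("no certificate block found")
    # The standard library strips the header/footer and base64-decodes the body.
    return ssl.PEM_cert_to_DER_cert(match.group(0))


def convert_file(pem_path, cer_path):
    """File-based wrapper: read the combined .pem, write the public .cer."""
    with open(pem_path, "r", encoding="ascii") as src:
        der = pem_to_der(src.read())
    with open(cer_path, "wb") as dst:
        dst.write(der)


if __name__ == "__main__":
    # File names follow the openssl commands above.
    convert_file("azure-cert.pem", "azure-cert.cer")
```

The private key never leaves the .pem file, because only the certificate block is decoded and written.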

Windows Azure Management Certificates - Management Portal Screenshot

The initial set-up is done!

Using cURL to Access Windows Azure REST APIs

Now that you have the cert created and uploaded to Windows Azure you can easily play with the REST APIs. For example if you want to list all your existing hosted services you can use the List Hosted Services API as follows:

curl -E [cert-file] -H "x-ms-version: 2011-10-01" "https://management.core.windows.net/[subscr-id]/services/hostedservices"

where:

  • cert-file is the path to the .pem file containing the certificate
  • subscr-id is your Windows Azure subscription ID

Don't forget to specify the version header (the -H flag for cURL), or else Windows Azure will return an error. As a result of the call above you will receive an XML response with a list of all the hosted services in your Windows Azure subscription.

You can access any of the REST APIs by manually constructing the request and the URL as described in the Windows Azure Service Management REST API Reference.
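As a starting point for such scripts, the same call can be sketched in Python with only the standard library. This is an illustration rather than a tested client: the `build_request` and `send` helpers are hypothetical names. It mirrors the cURL call above, with the same URL shape, the same x-ms-version header and the .pem file as the client certificate.

```python
import ssl
import urllib.request

API_VERSION = "2011-10-01"  # the x-ms-version value used in the cURL call above
BASE_URL = "https://management.core.windows.net"


def build_request(subscription_id, resource="services/hostedservices"):
    """Build (but do not send) a Service Management GET request."""
    url = f"{BASE_URL}/{subscription_id}/{resource}"
    return urllib.request.Request(url, headers={"x-ms-version": API_VERSION})


def send(request, pem_path):
    """Send the request, presenting the certificate and key in the .pem file."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=pem_path)  # one file holds key and cert
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder subscription ID; replace it with your own before running.
    req = build_request("[subscr-id]")
    print(send(req, "azure-cert.pem"))
```

Keeping the request construction separate from the network call makes it easy to reuse `build_request` for other resources described in the API reference.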

I didn't get to any of my planned scripts, but I can explore the APIs easily with cURL.


Bruce Kyle reported MSDN Subscribers Receive Up to $3,700 in Annual Windows Azure Benefits in a 1/25/2012 post to the US ISV Evangelism blog:

If you are an existing Visual Studio Professional, Premium or Ultimate with MSDN subscriber, you get free access to Windows Azure each month, and up to $3,700.00 in annual Windows Azure benefits at no charge.

This offer provides a base level of Compute, Storage, SQL Azure database, Access Control, Service Bus and Caching each month at no charge.

Spending limits are a new feature we added to Windows Azure last month, and ensure that you never have to worry about accidentally going over the resources included in a free offer and being charged.

You can easily track what resources you’ve used on Windows Azure by clicking the “Account” tab of the www.windowsazure.com web-site. This is another new feature we added to Windows Azure last month, and it allows customers (both free trial and paid) to easily see what resources they’ve used and how much it is costing them.

Benefits are offered world-wide.

For more details, see Get up to $3,700.00 in annual Windows Azure benefits at no charge.

Getting Started with MSDN Subscription

There are several ways to get an MSDN Subscription:

  • Purchase one of Visual Studio subscription offers.
  • Qualify for the Silver or Gold ISV Competency (or other competencies)
  • If you are a startup, join BizSpark.

Or if you are in an enterprise, there’s a good chance you already have access to it.

How to Purchase an MSDN Subscription

Compare subscriptions.

To buy or renew a MSDN subscription please contact your Microsoft Partner or local Reseller.

Purchasing more than one license? Microsoft Volume Licensing offers flexible licensing solutions for companies needing multiple licenses and helps volume customers save time & money.

Also Available as Benefit through Microsoft Partner Network

Another way you may have an MSDN Subscription is through the Microsoft Partner Network.

ISVs can earn the ISV Competency and receive MSDN Subscriptions that entitle you to compute hours and benefits on Windows Azure.

To qualify for the Silver ISV Competency:

  1. Deliver one product or application that has passed one qualifying Microsoft Platform Test. (Get details.)
  2. Provide three verifiable customer references. (Get details.)
  3. Complete a full profile and pay the silver membership fee

You can use the License Calculator to help you better understand your software benefits based on the type and number of competencies you earn.

To get started, see ISV Competency Requirements.

BizSpark for Startups

BizSpark will fast-track the success of your startup with software, support & visibility. One of the benefits of the program is to help jumpstart your business. If you are a development company, privately held, less than three years old, and are making less than $1 million/year (US) in revenue, you qualify. Sign up for BizSpark.

I use my MSDN Subscription benefit to run the publicly accessible demonstration of GridView paging and iterative operations on Northwind Customer entities in Windows Azure Table Storage with the OakLeaf Systems Azure Table Services Sample Project demo from my Cloud Computing with the Windows Azure Platform book.


Brian Goldfarb (@bgoldy) posted Announcing Native Windows Azure Libraries and Special Free Pricing Using SendGrid for Windows Azure Customers to the Windows Azure Blog on 1/25/2012:

Last week our friends over at SendGrid shipped new native libraries on GitHub (C#, Node.JS) for Windows Azure developers that make it extremely easy to integrate their mail service into any application built and running in Windows Azure. In addition, SendGrid launched a new offer for Windows Azure customers that provides 25,000 free emails a month! We’ve heard from customers consistently that sending email was too hard and we listened! See detailed, step by step tutorials written by us on how to use SendGrid with Windows Azure in the Developer Center (C#, Node, PHP, Java).

Sending email from Windows Azure has never been so easy. For example, with C#:

Add the SendGrid NuGet package to your Visual Studio project by entering the following command in the NuGet Package Manager Console window:

PM > Install-Package SendGrid

Add the following namespace declarations:

using System.Net;
using System.Net.Mail;
using SendGridMail;
using SendGridMail.Transport;

It can be this easy:

// Create an email message and set the properties.
SendGrid message = SendGrid.GenerateInstance();
message.AddTo("anna@contoso.com");
message.From = new MailAddress("john@contoso.com", "John Smith");
message.Subject = "Testing the SendGrid Library";
message.Text = "Hello World!";
// Create an SMTP transport for sending email.
var transport = SMTP.GenerateInstance(new NetworkCredential("username", "password"));
// Send the email.
transport.Deliver(message);

Sign up for 25,000 free emails a month today!


Wade Wegner (@WadeWegner) described a cause of the Cannot create database ‘DevelopmentStorageDb20110816’ for the Windows Azure Storage Emulator problem in a 1/25/2012 post:

Have you seen this error before? If you’ve spent any time with the Windows Azure storage emulator it’s highly probable. Here’s the full text:

    Added reservation for http://127.0.0.1:10000/ in user account COMPUTER\User.
    Added reservation for http://127.0.0.1:10001/ in user account COMPUTER\User.
    Added reservation for http://127.0.0.1:10002/ in user account COMPUTER\User.

    Creating database DevelopmentStorageDb20110816...
    Cannot create database 'DevelopmentStorageDb20110816' : CREATE DATABASE permission
    denied in database 'master'.

    One or more initialization actions have failed. Resolve these errors before attempting
    to run the storage emulator again. These errors can occur if SQL Server was installed
    by someone other than the current user. Please refer to
    http://go.microsoft.com/fwlink/?LinkID=205140 for more details.

And an image of the error:

Development Storage initialization error

This error can occur when running the storage emulator (or running DSINIT.exe) for the first time. The storage emulator needs to initialize itself, which includes creating a local SQL Server database that is used to store data for local Windows Azure storage. The above error indicates that there’s a permissions [problem] when trying to create the database.

There are a number of ways to resolve this issue and, like others, I have my favorite approach. I have a script that I run which will add the executing user to the SQL Server sysadmin role.

I’ve published the entire script here: https://gist.github.com/1677788. Simply download and unzip the file. Open up an elevated command prompt and execute the file (i.e. run addselftosqlsysadmin.cmd). Once the script is executed the user can successfully initialize the storage emulator.


Don Pattee reported the availability of REST API docs [for Windows HPC Server 2008 R2 SP2 in Windows Azure] on MSDN on 1/25/2012:

I know there were a bunch of folks waiting on this, so I'm happy to say the REST API documentation for the HPC Pack 2008 R2 SP3 release is available.

Windows HPC Server 2008 R2 with Service Pack 2 (SP2) provides access to the HPC Job Scheduler Service by using an HTTP web service that is based on the representational state transfer (REST) model. You can use this REST API to create client applications that users can use to define, submit, modify, list, view, requeue, and cancel jobs.

Windows HPC Server 2008 R2 with Service Pack 3 (SP3) expands the HTTP web service to provide additional operations that provide information about nodes, node groups, and the cluster name. Windows HPC Server 2008 R2 with SP3 also provides operations that allow you to create and manage SOA sessions, and to send SOA requests and receive SOA responses for these sessions. The SOA operations are available only when the REST web service is hosted in Windows Azure. All other operations are available either when the REST web service is hosted on an on-premise cluster or when the REST web service is hosted in Windows Azure.

Check it out at http://msdn.microsoft.com/en-us/library/hh560254(VS.85).aspx and http://msdn.microsoft.com/en-us/library/hh560258(VS.85).aspx

See my Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters post of 1/25/2012 for details of uploading HPC Clusters to Windows Azure and running .NET numeric analysis applications in the clusters.


<Return to section navigation list>

Visual Studio LightSwitch and Entity Framework 4.1+

Jan Van der Haegen (@janvanderhaegen) described MyBizz Portal: The “smallest” LightSwitch application you have ever seen (LightSwitch Star Contest entry) in a 1/24/2012 post:

Dear blog reader, thank you (again) for taking the time to come to my blog and read about my LightSwitch adventures. In this blog post series, you find my (elaborate) submission to the CodeProject LightSwitch Star Contest. If you enjoy this article, I would like it if you tweet about it, link from your site to it or send me your firstborn daughter as a token of gratitude, however it would also mean a lot to me if you hopped over to the CodeProject site and voted on the submission as you see fit. You can also leave any questions, remarks or comments there, however there’s a very good chance that (especially in the distant future) I’ll get your feedback much quicker if you just post on the bottom of this blog post.

Link to CodeProject submission

Addicted to the power of LightSwitch since the first time I ever saw a demo of it, I immediately realized that LightSwitch offered me a window of opportunity to found my own software company, which will officially be born on April 2nd 2012 (after office hours at first)… LightSwitch is a truly unique tool to write small to medium-sized business applications in virtually no time, with enough flexibility to avoid hitting a “brick wall”, which is all too often the case in classic RAD frameworks. The previous sentence contains a lot of LightSwitch praising and loving, but also contains one of my biggest worries. What about large enterprise applications? What if my applications start off small, but become giant successes? Will my beloved technology become a burden once business starts growing?

  • Can LightSwitch handle extremely large applications? Is there a limit on how much functionality I can fit in one LSML? Will an enterprise LightSwitch application load fast enough?
  • Can LightSwitch handle horizontal scaling scenarios? Can I do a distributed installation of a LightSwitch application over several servers to support a massive amount of simultaneous users?
  • How can I reuse screens and entities between my customers when they have similar needs? Is a VSIX extension really the only way? And will I be able to partially update one functionality without fear of breaking other parts of the application?
  • LightSwitch can help me develop at lightning speed, but is that the only aspect of a smooth running company? How can LightSwitch help me to align customers, developers, salesmen, … Or help me keep track of versions, servers, feature requests, installations, …
  • How are multiple developers going to work on the same LightSwitch application? Merging the LSML file on a daily basis will be hell…

Will LightSwitch not only lift me off my feet, but still carry me once I hit cruise speed?

One night – for some reason, my best ideas always hit me about 4-5 hours before my alarm clock tells me to get up and get ready for work – I remembered an essay which contained the statement that “our vision on reality is shaped, and thus limited, by the language that we use“. By simply using a different lingo (the one used when talking about enterprise service oriented architecture taxonomy), my vision of LightSwitch, and thus its limitations, drastically changed.

LightSwitch taxonomy 101
LightSwitch application (LS APP)

To try to speak the same language again, let’s have a quick look at the anatomy of one of the most “complex” LightSwitch applications that we can develop today. Not mentioning cosmetic extensions, a LightSwitch application itself will generate three core components: a single Silverlight application, that connects to a single WCF service, which in turn connects to a single SQL database, and perhaps you’ll use one of three other data sources: an external database, a SharePoint database, or an external WCF RIA Service.

This is what we formerly called a LightSwitch application, however from now on a LightSwitch app will be considered no more than a LightSwitch project structure in Visual Studio.

When creating a new LightSwitch application, you can choose to think about the result as a “module”. A LightSwitch “module” should immediately shape the mental picture that what you are designing is a well-scoped, reusable part of a greater organism.

LightSwitch Entity Module (LS EM)

A LightSwitch Entity Module is a LS APP whose only responsibility is to design simple database entities on which other modules can perform CRUD operations. It contains no business logic or validation. (Note: a LS APP will generate a WCF service and a rather empty Silverlight component as well, we’ll just never use them.)

External Process Module (EX PcM)

An External Process Module is any module that can provide access to external processes to the end-user, or to our LightSwitch Process Modules. By external I am not necessarily trying to indicate that the source code of those modules is not under our control, but merely referring to the fact that they are not written in LightSwitch.

LightSwitch Process Module (LS PcM)

A LightSwitch Process Module is a LS APP that implements a particular business process. It (optionally) contains its own entities, logic, screens, and references whatever LS EM, EX PcM or other LS PcM it needs to materialize that business process.

Any LightSwitch Module needs careful scoping / designing before development, but this is especially easy to violate when talking about LS PcM. Consider the functionality of managing customers for my software company. My sales force needs software to work with those customers, so that they can schedule meetings to talk about improvements or new feature requests. The development team will need software to work with these customers as well, so that they can implement user stories for them, contact them to discuss that implementation, … Engineering will also need software to work with these customers, as they’ll want to keep track of the installed software versions, the servers, staging areas, … And finally, the accountant will need these customers in his accounting suite, so that he/she can bill the customer correctly. Although it might seem a good idea to create a “Manage Customers LightSwitch Process Module”, it’s a much better and maintainable idea to create four separate LS PcM, because you are dealing with four separate business processes, each with its own logic, references and screens, and most likely with a shared “Customer LS EM” containing common “Customer” entities. The four separate LS PcM will not only be smaller and more maintainable, but they will load faster too.

LightSwitch Utility Module (LS UM)

A LightSwitch Utility Module is a non-application specific module that provides some reusable functionality. Ideally, but not always, you can package these as a VSIX extension. Good examples are generic reporting extensions, logging functionality, or the built-in LightSwitch security module (users, groups & rights).

LightSwitch Portal Module (LS PtM)

A lot of the popular application frameworks, such as PRISM or Caliburn, implement the concept of a “shell”, which blends together different modules or subsystems into a single UI at runtime. Even after numerous attempts I still haven’t succeeded in implementing this runtime weaving in LightSwitch. However, you can still create a LS APP that provides a rather similar experience to the end-user, considering you can host a web browser in a LightSwitch (desktop) app (thanks Tim Leung for that article, you’re a genius!), and a Silverlight application (LS PcM) is hosted in a normal web page:

Above is my first successful attempt to create a LightSwitch Portal Module. I was working on a proof-of-concept of two interacting LightSwitch applications (hence the “student”/”teacher” entities) when I read Tim Leung’s article. I just had to try it right there and then. If you click on the image and pay attention to the web page property, you’ll see that you could create and test your own LS PtM with two simple LS APPs, right from the Visual Studio debugger!

A LightSwitch Portal Module is a LS APP that provides a user a single entry point into a Business Network and the ability to “portal” to the process module of his choice.

Business Network

A Business Network is a collection of different modules that support a company in its business. This network can be partially or completely private (not accessible outside the company domain for example). It can contain numerous External and LightSwitch Modules. Business networks of allied companies could also partially overlap.

MyBizz network

In the picture above you can see a (simplified) example of such a network. Two actually. The network on the left is the network that will fulfill my startup’s needs, the network on the right is one that I created for demo purposes… Let’s just say – to avoid legal matters – that “a certain real estate company” already saw this “demo that I just had lying around” and “might” become my first paying customer on April 2nd. Both networks happen to be distributed over both a private (Windows 2008) and a public (Windows Azure) server, totaling 4 servers…

What the picture doesn’t show is that both networks actually overlap. It would be more obvious if I’d draw the dependencies between the different components, but that quickly turns the overview into a messy web… The demo customer’s skin repository pulls resources from my business (central) skin repository, my STS (see later: single-sign-on) is a trusted STS of the demo customer’s STS, and his MITS (My Issue Tracking System) reads and writes to my PITS (Public Issue Tracking System). Our two Business Networks contain over 235 entities, hundreds of screens, about every extension I could think of writing (including the Achievements extension to encourage my sales force to write up decent specs for each customer requirement, for which they earn points, which could materialize in a bonus plan, meanwhile making the work for the developer easier… and a very early alpha version of the skinning extension), almost all of the free extensions I could get my hands on (thanks to all the community contributors!), and some commercial extensions.

MyBizz Portal

I’m still playing around with the portal application at the moment, but have come to a point where I am satisfied enough to “reveal it to the public”.

Getting started with MyBizz Portal…

Upon a “clean” install, an administrator user can start the application, and after logging in is greeted with the main – and only – screen of the application…

To explain what you are seeing here… I have a LS EM that contains the core data, one of the entities is called a “MyBizzModule”. In my portal application, I connect to that data, and show it in a custom Silverlight Control. Because there are no other screens, and no commands, I opted to use a blank shell extension.

Because there’s only one module configured (in other words: only one MyBizzModule record in the database), the only option this application offers us now is to click the lonely red box icon at the bottom…

When clicked, the module opens and displays its login screen to us… (You might have to enlarge the image to see that it’s actually a LightSwitch application being shown IN another LightSwitch application).

Single-Sign-On

After logging in, we can see the main screen of the module. This LS PcM allows us to manage all the modules in a Business Network, only one is configured at the moment.


Name, Image and Portal site should need no explanation, but the fourth one is a very special one which will take more than an hour to explain in detail.

In my Business Network, I have a WCF service that fulfills the role of a Security Token Service. The login screens used throughout the Business Network are actually all “hacked” custom implementations. Instead of authenticating with the current LightSwitch application, they request an encrypted token from the STS, and use that token as a key to log in – or, and only if automatic authentication fails, present the login screen to the user.

When I logged in to the portal application, the portal application kept the token in memory.

Now, if I check the “Include Security Token” checkbox, and portal to the module, the encrypted token will be passed in the query (I did the same thing with the “LightSwitch Deep Linking“) to the module. The “hacked” login screen will pick up the token and try to use that token to log in automagically. This functionality will be really important for the end-user, we do not want him to enter a username & password every time he/she portals to a different module, hence the “single-sign-on”…
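To make the hand-off concrete, here is a minimal Python sketch of passing a token in the query string and picking it up on the receiving side. It is only an illustration of the idea: the real portal is a LightSwitch/Silverlight application whose code is not shown in this post, and the parameter name `token`, the function names and the example URLs are all assumptions.

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse


def build_portal_url(module_url, token, include_token):
    """Return the URL to portal to, adding the token only when asked to.

    `include_token` plays the role of the "Include Security Token" checkbox.
    The query parameter name "token" is a hypothetical choice.
    """
    if not include_token or token is None:
        return module_url
    parts = urlparse(module_url)
    query = parse_qs(parts.query)   # keep any parameters already present
    query["token"] = [token]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def read_token(url):
    """Return the token from the query string, or None if there is none.

    A None result is the case where the module would fall back to
    showing its login screen.
    """
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None


url = build_portal_url("https://example.invalid/module?skin=blue", "abc+/=", True)
print(read_token(url))
```

Note that `urlencode` percent-encodes characters such as `+`, `/` and `=`, which matters when the token is Base64-like text.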

Another major benefit is that one STS can “trust” another STS. This means that I can log on to my Business Network, and navigate to a “public” module from an allied company’s Network. That module will recognize that the token I’m trying to use was issued from my STS, and map my claims (permissions) accordingly.

Another thing that I’m trying to pass through is information about the “default skin” that should be used. This feature is in an EXTREMELY early alpha stage, so I won’t go into it too deep.

Last property, the required claim, simply allows the administrator to indicate that only users with the given permission (claim) can view / use the module, allowing fine-tuning of users&groups at a much higher level than the classic screens & entities level that we LightSwitch developers have been doing so far…
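The required-claim check can be pictured as a simple filter over the configured modules. The sketch below is a hypothetical Python rendering, not the portal's actual code: the `PortalModule` class, its field names and the sample claims are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortalModule:
    # Field names are illustrative; they mirror the properties described above.
    name: str
    portal_site: str
    required_claim: Optional[str] = None  # None means "visible to everyone"


def visible_modules(modules, user_claims):
    """Keep only the modules whose required claim the user holds."""
    claims = set(user_claims)
    return [m for m in modules
            if m.required_claim is None or m.required_claim in claims]


modules = [
    PortalModule("Module Manager", "https://example.invalid/admin", "CanManageModules"),
    PortalModule("Twitter", "https://twitter.com"),
]
print([m.name for m in visible_modules(modules, ["CanViewReports"])])
```

A user without the `CanManageModules` claim would only be offered the Twitter module in this example.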

Moving on

Enough explaining, let’s give that module some values & restart the application (changes in the configured modules still require a restart at this point… ).

You can see that the “Cosmopolitan.Blue” skin has been loaded (unfinished, based on a Silverlight theme and the LightSwitch Metro theme sources), and that I did not need to log on a second time to enter the module… (Well, kind of hard to post a screenshot of the latter.)

Let’s have some fun and add two sample modules, one portal to Twitter and one to MSDN.

Clicking the bird in the middle shows that the portal application is (obviously) not limited to LS PcM alone, any link that can be shown in a web browser will do…

Finally, just to show my custom Silverlight control – 3 icons really don’t do it justice – these screenshots below are from my Business Network as it is today, roughly 2 months before my company will actually be born…

Which results in this layout…

And for bonus points…
Basic questions

What does your application (or extension) do? What business problem does it solve?

From a technical point of view: it’s a webbrowser wrapper with an advanced URL management system.

From a LightSwitch point of view: it’s a LightSwitch application that manages my LightSwitch applications.

From a functional point of view: it’s an application that allows a user to portal anywhere he wants ( / is allowed to ) inside my LightSwitch Business Network…

From a personal point of view: it’s the last of the missing pieces of the puzzle, it will help me manage my startup (customers, sales force, expenses, keeping track of projects & LightSwitch applications, …) , and gave me the confidence that truly anything is possible with LightSwitch.

How many screens and entities does this application have?

The application has 1 screen and 1 entity.

Additional questions

Did LightSwitch save your business money? How?

Not yet, but I strongly believe it will save me loads once my company is born… I can however say that without LightSwitch, the chances of me starting a company after my day job, and actually producing working software, would pretty much be zero.

Would this application still be built if you didn’t have LightSwitch? If yes, with what?

Most likely, it wouldn’t have been built.

How many users does this application support?

In theory, over 7,023,324,899; especially since the modules can be deployed on and load balanced over separate servers. The databases would be the only bottleneck.

How long did this application take to actually build using LightSwitch?

Less than a week. Most time was spent trying to make a radial ItemsControl. I have limited experience as a developer, and none as a designer… Another two evenings were spent to write this article & take the screenshots. The LightSwitch part took me 15 minutes.

(Developing both Business Networks took me 6+ months, of course)

Does this application use any LightSwitch extensions? If so, which ones? Did you write any of these extensions yourself? If so, is it available to the public? Where?

The LS PcM in the Business Networks also use…

  • LightSwitch Achievements – homebrewed and part of a blog post series, so available to the public soon.
  • LightSwitch STS – homebrewed and won’t be available to the public anytime soon, however I’m porting the ability to inject a custom login screen to EME soon.
  • LightSwitch Skin Studio – homebrewed and available to the public later this year… Much, much later…
  • Many, many, many commercial and free extensions – a special thanks to the LightSwitch crew, Allesandro, Tim, Yann, Bala and Michael for contributing so much to the community! (So sorry in advance if I forgot anyone special!!)

How did LightSwitch make your developer life better? Was it faster to build compared to other options you considered?

LightSwitch gave me an incredible passion & energy to build applications that support the customer’s process, I wouldn’t find the energy to do this in any other technology that I know of at the time of writing… It does all the tedious and boring tasks for you with a few clicks, and lets you focus on what really makes your application stand out.

Additional additional questions

Can LightSwitch handle extremely large applications? Is there a limit on how much functionality I can fit in one LSML? Will an enterprise LightSwitch application load fast enough?

Yes, if you find any way to split your extremely large applications into many different modules, and bring them all together, there should be no limits for any LightSwitch developer. The solution I chose (using a webbrowser control) fits my business’ needs perfectly, and because modules are only loaded when you actually access them, drastically reduces the load time of the application.

Can LightSwitch handle horizontal scaling scenarios? Can I do a distributed installation of a LightSwitch application over several servers to support a massive amount of simultaneous users?

Yes, it does!

How can I reuse screens and entities between my customers when they have similar needs? Is a VSIX extension really the only way? And will I be able to partially update one functionality without fear of breaking other parts of the application?

A VSIX is one way, and sometimes the best way. If you truly want to reuse screens & entities, distributing your application over multiple, reusable modules, adds a whole new dimension of benefits to the LightSwitch way of development.

LightSwitch can help me develop at lightning speed, but is that the only aspect of a smooth running company? How can LightSwitch help me to align customers, developers, salesmen, … Or help me keep track of versions, servers, feature requests, installations, …

LightSwitch helped me build several LS PcM to keep track of my business process in just a couple of days…

How are multiple developers going to work on the same LightSwitch application? Merging the LSML file on a daily basis will be hell…

Not if they work on one module each…

Where is the source?

- You wish ;)

(Although parts could be polished, then published on my blog, on demand).



Windows Azure Infrastructure and DevOps

William Bellamy posted Troubleshooting Best Practices for Developing Windows Azure Applications to the MSDN Library on 1/24/2012. From the introduction:

Author: William Bellamy, Microsoft Principal Escalation Engineer

Contributors:

  • Bryan Lamos, Microsoft Senior Program Manager, Product Quality
  • Kevin Williamson, Microsoft Senior Escalation Engineer, Azure Developer Support
  • Pranay Doshi, Microsoft Senior Program Manager, Windows Azure Production Services
  • Tom Christian, Microsoft Senior Escalation Engineer, Azure Developer Support

Published: January 2012

Summary


The number one priority of Microsoft is to help Windows Azure customers keep their applications up and running. The Windows Azure Service Level Agreements define a 99.95% availability of external connectivity when you deploy two or more role instances. However, external connectivity ensures that you are able to reach your application from outside the Microsoft data centers, which is not the same as "site up." Most Windows Azure services have multiple dependencies: SQL Azure, Caching, Content Delivery Network, internal resources (through Windows Azure connect), etc. The failure of any one of these dependencies can cause your Windows Azure service to not function as expected.
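A quick back-of-the-envelope calculation shows why connectivity availability and "site up" differ. The Python sketch below converts the 99.95% figure into allowed downtime and multiplies availabilities for a chain of dependencies. The 99.9% values used for the dependencies are made-up placeholders, not published SLA numbers, and the multiplication assumes failures are independent and that every dependency is required.

```python
import math

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Minutes of downtime an availability target permits over a period."""
    return (1.0 - availability) * period_minutes


def serial_availability(availabilities):
    """Combined availability when every component must be up.

    Assumes independent failures, which is a simplification.
    """
    return math.prod(availabilities)


# 99.95% external connectivity, taken from the summary above.
print(round(allowed_downtime_minutes(0.9995), 1))  # about 21.6 minutes per 30 days

# Hypothetical: the service also needs two dependencies at an assumed 99.9% each.
combined = serial_availability([0.9995, 0.999, 0.999])
print(round(combined * 100, 2))                    # about 99.75 percent
print(round(allowed_downtime_minutes(combined)))   # about 108 minutes per 30 days
```

Under these assumed numbers the effective availability of the whole service is noticeably lower than that of any single component.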

This paper focuses on the different troubleshooting challenges and recommended approaches to design and develop more supportable applications for Microsoft’s Windows Azure platform. When (and not if) a problem occurs, time is of the essence. Proper planning can enable you to find and correct problems without having to contact Microsoft for support. The approach advocated in this paper will also speed the resolution of problems that require Microsoft assistance.

Intended Audience


This paper is intended to be a resource for technical software audiences: software designers, architects, IT Professionals, System Integrators, developers and testers who design, build and deploy Windows Azure solutions.

We assume that you have a basic understanding of the application development lifecycle of a Windows Azure application, including terminology and the various components of the Windows Azure development and runtime environment.

We also assume that basic guidelines for Windows Azure will be followed, such as using the latest version of the Windows Azure SDK and testing code changes before they are put into production.

Document Structure


This paper is organized into two sections:

  • Overview of Windows Azure diagnostic resources:
    • Windows Azure resources
    • Third-party resources
  • Best practices for supportable design, development and deployment:
    • Before you deploy your application.
    • Fail fast design and monitoring.
    • What to do when a problem happens.





Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds


No significant articles today.



Cloud Security and Governance

No significant articles today.



Cloud Computing Events

Glenn Block (@gblock) reported Windows Azure[, Scott Guthrie] and Cloud9 IDE at Node Summit in a 1/24/2012 post:

Last month we launched our new Windows Azure SDK for Node.js. The release came after months of hard work between Microsoft and Joyent. Since that time we’ve seen a lot of excitement in the Node community around the support for Node.js in Windows Azure. We’re thankful for all the support!

Today at the Node Summit in San Francisco, Scott Guthrie demonstrated the SDK, which provides a streamlined experience for Windows developers to build and deploy Node.js apps to Windows Azure using PowerShell cmdlets and their editor of choice. Scott also showed the “Azure” npm module, which enables developers hosting Node apps in any environment to utilize Windows Azure Storage services like tables, queues and blobs. You can find out more about the PowerShell tools and the npm package at our dev center.

Additionally, Scott showed a new way to deploy to Azure, Cloud9 IDE!

Cloud9 IDE offers a cross-platform, browser-based development environment for Node.js. It is one of the de-facto tools for Node developers today. Cloud9 runs completely in the browser, and it’s available to developers working on any OS. In the second part of his keynote, Scott demonstrated using Cloud9 IDE on a Mac to build and deploy an application to Azure.

With Cloud9 IDE you can easily create a new Node application, connect it to your Windows Azure account, and deploy. Cloud9 makes it easier for you by packaging up source, creating your hosted service, and publishing the package. It supports publishing to Staging and Production and offers Windows Azure portal integration. Combining that with Cloud9’s integration with distributed version control providers like GitHub and BitBucket offers a fantastic experience!

Below you can see a screenshot of the new Cloud9 experience.

Along with the announcement, we’ve published a brand new tutorial on our Node.js dev center to show you how easy it is to get started developing for Windows Azure in Cloud9. In addition, check out these resources from Cloud9 about their Windows Azure support.

We are very excited about the collaboration with Cloud9 and the opportunity to offer both Windows and non-Windows developers an awesome experience developing for Windows Azure.

Read more about this announcement in the most recent posts on the Cloud9 and Interoperability @ Microsoft blogs.



Other Cloud Computing Platforms and Services

Jeff Barr (@jeffbarr) reported The AWS Storage Gateway - Integrate Your Existing On-Premises Applications with AWS Cloud Storage on 1/25/2012:

Warning: If you don't have a data center, or if all of your IT infrastructure is already in the cloud, you may not need to read this post! But feel free to pass it along to your friends and colleagues.

The Storage Gateway
Our new AWS Storage Gateway service connects an on-premises software appliance with cloud-based storage to integrate your existing on-premises applications with the AWS storage infrastructure in a seamless, secure, and transparent fashion. Watch this video for an introduction:

Data stored in your current data center can be backed up to Amazon S3, where it is stored as Amazon EBS snapshots. Once there, you will benefit from S3's low cost and intrinsic redundancy. In the event you need to retrieve a backup of your data, you can easily restore these snapshots locally to your on-premises hardware. You can also access them as Amazon EBS volumes, enabling you to easily mirror data between your on-premises and Amazon EC2-based applications.

You can install the AWS Storage Gateway's software appliance on a host machine in your data center. Here's how all of the pieces fit together:

The AWS Storage Gateway allows you to create storage volumes and attach these volumes as iSCSI devices to your on-premises application servers. The volumes can be Gateway-Stored (right now) or Gateway-Cached (soon) volumes. Gateway-Stored volumes retain a complete copy of the volume on the local storage attached to the on-premises host, while uploading backup snapshots to Amazon S3. This provides low-latency access to your entire data set while providing durable off-site backups. Gateway-Cached volumes will use the local storage as a cache for frequently-accessed data; the definitive copy of the data will live in the cloud. This will allow you to offload your storage to Amazon S3 while preserving low-latency access to your active data.

Gateways can connect to AWS directly or through a local proxy. You can connect through AWS Direct Connect if you would like, and you can also control the amount of inbound and outbound bandwidth consumed by each gateway. All data is compressed prior to upload.
Each gateway can support up to 12 volumes and a total of 12 TB of storage. You can have multiple gateways per account and you can choose to store data in our US East (Northern Virginia), US West (Northern California), US West (Oregon), EU (Ireland), Asia Pacific (Singapore), or Asia Pacific (Tokyo) Regions.
The first release of the AWS Storage Gateway takes the form of a VM image for VMware ESXi 4.1 (we plan on supporting other virtual environments in the future). Adequate local disk storage, either Direct Attached or SAN (Storage Area Network), is needed for your application storage (used by your iSCSI storage volumes) and working storage (data queued up for writing to AWS). We currently support mounting of our iSCSI storage volumes using the Microsoft Windows and Red Hat iSCSI Initiators.

Up and Running
During the installation and configuration process you will be able to create up to 12 iSCSI storage volumes per gateway. Once installed, each gateway will automatically download, install, and deploy updates and patches. This activity takes place during a maintenance window that you can set on a per-gateway basis.

The AWS Management Console includes complete support for the AWS Storage Gateway. You can create volumes, create and restore snapshots, and establish a schedule for snapshots. Snapshots can be scheduled at 1, 2, 4, 8, 12, or 24 hour intervals. Each gateway reports a number of metrics to Amazon CloudWatch for monitoring.

The snapshots are stored as Amazon EBS (Elastic Block Store) snapshots. You can create an EBS volume using a snapshot of one of your local gateway volumes, or vice versa. Does this give you any interesting ideas?

The Gateway in Action
I expect the AWS Storage Gateway will be put to use in all sorts of ways. Some that come to mind are:

  • Disaster Recovery and Business Continuity - You can reduce your investment in hardware set aside for Disaster Recovery using a cloud-based approach. You can send snapshots of your precious data to the cloud on a regular and frequent basis and you can use our VM Import service to move your virtual machine images to the cloud.
  • Backup - You can back up local data to the cloud without worrying about running out of storage space. It is easy to schedule the backups, and you don't have to arrange to ship tapes off-site or manage your own infrastructure in a second data center.
  • Data Migration - You can now move data from your data center to the cloud, and back, with ease.

Security Considerations
We believe that the AWS Storage Gateway will be at home in the enterprise, so I'll cover the inevitable security questions up front. Here are the facts:

  • Data traveling between AWS and each gateway is protected via SSL.
  • Data at rest (stored in Amazon S3) is encrypted using AES-256.
  • The iSCSI initiator authenticates itself to the target using CHAP (Challenge-Handshake Authentication Protocol).
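For readers unfamiliar with CHAP, the sketch below shows the general challenge-response idea as defined for CHAP with MD5 in RFC 1994: the response is a hash over the identifier, the shared secret and the challenge, so the secret itself never crosses the wire. This is a generic Python illustration, not the Storage Gateway's or any iSCSI initiator's actual code; the function names and sample values are assumptions.

```python
import hashlib
import hmac
import os


def chap_response(identifier, secret, challenge):
    """Compute a CHAP-MD5 response: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def verify_chap(identifier, secret, challenge, response):
    """Target-side check, using a constant-time comparison."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# The target issues a random challenge; the initiator answers with the hash.
challenge = os.urandom(16)
answer = chap_response(1, b"example-shared-secret", challenge)
print(verify_chap(1, b"example-shared-secret", challenge, answer))
```

Because each challenge is random, a captured response cannot simply be replayed against a later challenge.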

Costs
All AWS users are eligible for a free trial of the AWS Storage Gateway. After that, there is a charge of $125 per month for each activated gateway. The usual EBS snapshot storage rates apply ($0.14 per Gigabyte-month in the US-East Region), as do the usual AWS prices for outbound data transfer (there's no charge for inbound data transfer). More pricing information can be found on the Storage Gateway Home Page. If you are eligible for the AWS Free Usage Tier, you get up to 1 GB of free EBS snapshot storage per month as well as 15 GB of outbound data transfer.
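As a rough worked example of the pricing above, the following Python sketch adds the per-gateway fee to the snapshot storage charge. It uses only the two figures quoted in this post ($125 per activated gateway per month and $0.14 per GB-month in US-East) and deliberately leaves out outbound data transfer, the free trial and the Free Usage Tier; the function name and the 500 GB example are assumptions.

```python
GATEWAY_FEE_PER_MONTH = 125.00        # USD per activated gateway, from the post
SNAPSHOT_RATE_PER_GB_MONTH = 0.14     # USD per GB-month, US-East, from the post


def estimate_monthly_cost(gateways, snapshot_gb):
    """Rough monthly estimate: gateway fees plus EBS snapshot storage.

    Outbound data transfer and any free-tier allowances are not included.
    """
    return gateways * GATEWAY_FEE_PER_MONTH + snapshot_gb * SNAPSHOT_RATE_PER_GB_MONTH


# One gateway holding 500 GB of snapshots: 125 + 500 * 0.14 = 195.
print(f"${estimate_monthly_cost(1, 500):.2f}")
```

Snapshot storage grows with the data actually stored, so the real bill depends on how much of each volume changes between snapshots.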

On the Horizon
As I mentioned earlier, the first release of the AWS Storage Gateway supports Gateway-Stored volumes. We plan to add support for Gateway-Cached volumes in the coming months.

We'll add more features to our roadmap as soon as our users (this means you) start to use the AWS Storage Gateway and send feedback our way.

Learn More
You can visit the Storage Gateway Home Page or read the Storage Gateway User Guide to learn more.

We will be hosting a Storage Gateway webinar on Thursday, February 23rd. Please attend if you would like to learn more about the Storage Gateway and how it can be used for backup, disaster recovery, and data mirroring scenarios. The webinar is free and open to all, but space is limited and you need to register!


Andrew R. Hickey (@andrewrhickey) reported IBM Joins Google, Microsoft In Cloud Productivity War in a 1/24/2012 post to CRN:

IBM is looking to flex its cloud muscle against rivals Google and Microsoft with a cloud-based productivity and collaboration play that pits the three tech titans against each other for cloud dominance.

Dubbed IBM Docs, Big Blue's new cloud productivity offering, which IBM unveiled in a brief video, includes a word processing application, spreadsheets and slide presentation software. IBM Docs is tied into IBM's SmartCloud for Social Business suite, which is the new moniker bestowed upon IBM LotusLive.

IBM Docs puts IBM in a three-way battle for cloud productivity as it squares off in the cloud against Google Docs and Microsoft Office 365. According to IBM, IBM Docs is now in beta and will be available this year.

"IBM Docs allows organizations, both inside and outside the firewall, to simultaneously collaborate on word processing, spreadsheet and presentation documents in the cloud to improve productivity," IBM said in a statement. "IBM Docs authors will be able to store and share documents in IBM SmartCloud, co-edit documents in real time or assign users sections of the document so they can work privately easing the management of multiple revisions from multiple authors in team-based documents."

Cloud solution providers said it's too soon to tell whether IBM Docs will shake up the market, but it does validate that the cloud market is growing and desktop software is on its way out.

"Whether or not IBM Docs is a threat to Google Docs remains to be seen; they have a long way to go before reaching the mainstream mindshare and market share already achieved by Google," said Michael Cohn, founder and senior vice president of marketing for Atlanta-based Cloud Sherpas, a major Google cloud provider. "That said, IBM is a smart company. To see them enter the cloud productivity space is another clear indication that the days of desktop software are numbered."

Allen Falcon, CEO of Cumulus Global, a Westborough, Mass.-based cloud solution provider, said IBM Docs is a logical extension of LotusLive as IBM looks to stem declining market share. Though he hasn't yet gotten a real feel for IBM Docs, Falcon added that "Lotus has long-enough been an also-ran in this space, so it will be interesting to see if IBM can gain traction."

IBM Docs comes as part of a sweeping cloud and social assault that IBM has planned in the new year. And at the heart of it is the ability to apply data analytics to social and cloud initiatives.

Along with access to IBM Docs, IBM SmartCloud for Social Business offers single-click access to social networking, file sharing, online meetings, e-mail, calendar and instant messaging to enable internal and external collaboration.

The Armonk, N.Y.-based tech giant also rallied around the new social revolution with the beta of the next release of its enterprise social networking platform, IBM Connections, which offers wikis, blogs and activities for collaboration while also offering access to e-mail, calendar and business tasks from the social networking platform. The landing page in Connections offers a single point for users to view and interact with content from a third party along with their company's content.

IBM also launched IBM Connections Enterprise Content Edition, which is an integrated social content management solution combining social networking with enterprise content management, compliance and control.

"There is boundless opportunity for social business to transform how we connect people and processes, and increase the speed and flexibility of business," said Alistair Rennie, general manager of IBM's Social Business unit, in a statement. "A successful social business can break down barriers to collaboration and put social networking in the context of everyday work, from the device or delivery vehicle of your choice, to improve productivity and speed decision-making."


Jeff Barr (@jeffbarr) described how to Launch Relational Database Service Instances in the Virtual Private Cloud

You can now launch Amazon Relational Database Service (RDS) DB instances inside of a Virtual Private Cloud (VPC).

Some Background
The Relational Database Service takes care of all of the messiness associated with running a relational database. You don't have to worry about finding and configuring hardware, installing an operating system or a database engine, setting up backups, arranging for fault detection and failover, or scaling compute or storage as your needs change.

The Virtual Private Cloud lets you create a private, isolated section of the AWS Cloud. You have complete control over IP address ranges, subnetting, routing tables, and network gateways to your own data center and to the Internet.

Here We Go
Before you launch an RDS DB Instance inside of a VPC, you must first create the VPC and partition its IP address range into the desired subnets. You can do this using the VPC wizard pictured above, the VPC command line tools, or the VPC APIs.
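
As a rough illustration of what partitioning a VPC's address range looks like, the sketch below uses Python's standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size, and the partition_vpc helper are assumptions chosen for the example, not values from the post, and the code only does the address arithmetic; it does not call any AWS API.

```python
import ipaddress
from typing import List


def partition_vpc(cidr: str, new_prefix: int, count: int) -> List[str]:
    """Return the first `count` subnets of size /new_prefix inside `cidr`."""
    network = ipaddress.ip_network(cidr)
    subnets = []
    for subnet in network.subnets(new_prefix=new_prefix):
        if len(subnets) == count:
            break
        subnets.append(str(subnet))
    return subnets


# Hypothetical VPC range split into three /24 subnets.
subnets = partition_vpc("10.0.0.0/16", 24, 3)
print(subnets)  # ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']
```

Each resulting range could then be assigned to a subnet in a different Availability Zone.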

Then you need to create a DB Subnet Group. The Subnet Group should have at least one subnet in each Availability Zone of the target Region; it identifies the subnets (and the corresponding IP address ranges) where you would like to be able to run DB Instances within the VPC. This will allow a Multi-AZ deployment of RDS to create a new standby in another Availability Zone should the need arise. You need to do this even for Single-AZ deployments, just in case you want to convert them to Multi-AZ at some point.
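
The "at least one subnet in each Availability Zone" guideline is easy to check before creating the group. The sketch below is a plain-Python illustration under assumed inputs: the subnet IDs, zone names, and the missing_zones helper are hypothetical placeholders, and nothing here talks to AWS.

```python
from typing import Dict, Iterable, List


def missing_zones(subnet_zones: Dict[str, str], region_zones: Iterable[str]) -> List[str]:
    """Return the zones in the Region that have no subnet in the group."""
    covered = set(subnet_zones.values())
    return sorted(zone for zone in region_zones if zone not in covered)


# Hypothetical mapping of subnet ID -> Availability Zone.
group = {"subnet-a": "us-east-1a", "subnet-b": "us-east-1b"}
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

print(missing_zones(group, zones))  # ['us-east-1c']
```

An empty result would mean the group covers every zone, which is what leaves room for a later Multi-AZ conversion.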

You can create a DB Security Group, or you can use the default. The DB Security Group gives you control over access to your DB Instances; you can allow access from EC2 instances with specific EC2 Security Group or VPC Security Groups membership, or from designated ranges of IP addresses. You can also use VPC subnets and the associated network Access Control Lists (ACLs) if you'd like. You have a lot of control and a lot of flexibility.

The next step is to launch a DB Instance within the VPC while referencing the DB Subnet Group and a DB Security Group. With this release, you are able to use the MySQL DB engine (we plan to add additional options over time). The DB Instance will have an Elastic Network Interface using an IP address selected from your DB Subnet Group. You can use the IP address to reach the instance if you'd like, but we recommend that you use the instance's DNS name instead since the IP address can change during failover of a Multi-AZ deployment.

Upgrading to VPC
If you are running an RDS DB Instance outside of a VPC, you can snapshot the DB Instance and then restore the snapshot into the DB Subnet Group of your choice. You cannot, however, access or use snapshots taken from within a VPC outside of the VPC. This is a restriction that we have put into place for security reasons.

Use Cases and Access Options
You can put this new combination (RDS + VPC) to use in a variety of ways. Here are some suggestions:

  • Private DB Instances Within a VPC - This is the most obvious and straightforward use case, and is a perfect way to run corporate applications that are not intended to be accessed from the Internet.
  • Public-facing Web Application with Private Database - Host the web site on a public-facing subnet and the DB Instances on a private subnet that has no Internet access. The application server and the RDS DB Instances will not have public IP addresses.

Your Turn
You can launch RDS instances in your VPCs today in all of the AWS Regions except AWS GovCloud (US). What are you waiting for?


Barton George (@Barton808) completed his series with Web Glossary part three: Infrastructure tier on 1/24/2012:

This is the last in my three-part Web Glossary series. As I previously explained, in compiling this I pulled information from various and sundry sources across the Web including Wikipedia, community and company web sites and the brain of Cote.

The idea behind the glossary is to help our teams get a better understanding of the wild and wacky world of the Web and Web developers as we move forward with our Web|Tech vertical. I figured I might as well also share it with a few friends.

Today’s focus, having worked our way down from the top, is the infrastructure tier (with a short catch-all bucket at the end, “Misc.”)

Infrastructure

General Terms

  • DevOps: The goal of the DevOps movement is to drive out inefficiency in web shops by bridging the gap (and lessening conflict) between traditional development activity and operations activity. It seeks to address this issue by providing tools and practices to bring these two groups closer together and provide for greater automation of processes. Key tools in this effort are Opscode’s Chef and Puppet Labs’ Puppet, which automate the set-up and management of infrastructure.
  • PUE: Power Usage Effectiveness is a measure of how efficiently a computer data center uses its power; specifically, how much of the power is actually used by the computing equipment (in contrast to cooling and other overhead). PUE is the ratio of total amount of power used by a computer data center facility to the power delivered to computing equipment. The closer to 1.0, the better the PUE.
  • Distributed management: refers to the setup, provisioning, maintenance and management of the scale-out infrastructure (either physical or virtual) that has historically been characteristic of web firms and is increasingly typical within traditional enterprise customers. This includes players like Chef and Puppet for provisioning and configuration, New Relic and Splunk for monitoring and management, and Loggly/Eucalyptus/OpenStack/VMware for management monitoring.
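
To make the PUE ratio defined above concrete, here is a minimal sketch of the calculation. The function name and the power figures are made up for illustration.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW reaching the servers.
print(pue(1500.0, 1000.0))  # 1.5
```

In this example a PUE of 1.5 means that for every kW delivered to computing equipment, another 0.5 kW goes to cooling and other overhead.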

Projects/Entities

  • Crowbar: Crowbar is a Dell-developed open source software framework designed to speed up the installation and configuration of open source cloud software onto bare metal systems. By automating the process, Crowbar can reduce the time needed for installation from days to hours. The software is modular in design so while the basic functionality is in Crowbar itself, “barclamps” sit on top of it to allow it to work with a variety of projects. There have been barclamps built for OpenStack, Hadoop, CloudFoundry and Dreamhost.
  • Ubuntu: The most popular desktop Linux distribution. On the server side they are supporting OpenStack and have an offering called the Ubuntu Enterprise Cloud. Backed by the commercial company Canonical.
  • Puppet: a configuration management tool designed to automate the set up and management of infrastructure. A key DevOps tool. It is produced by Puppet Labs.
  • Chef: a configuration management tool designed to automate the set up and management of infrastructure. A key DevOps tool. It is produced by Opscode, who hosts a cloud-based version of Chef called the Opscode Platform.
  • Nagios: a popular open source computer system and network monitoring software application. It watches hosts and services, alerting users when things go wrong and again when they get better.
  • Ganglia: an open source scalable distributed monitoring system for high-performance computing systems such as clusters and grids.

Misc

  • LAMP stack: Open source stack that provides a viable general purpose web server. The name comes from the first letters of its components: Linux, Apache web server, MySQL and PHP (or Perl or Python). LAMP has become a de facto development standard and is an excellent example of how open source software has made its way into enterprise environments through unofficial channels.
  • Apache Software Foundation: A decentralized group of developers that produce open source software under the Apache license. Notable projects include: Apache web server, Hadoop, CouchDB, Cassandra, Tomcat, Subversion
  • Nginx: an open source web server that recently has been gaining considerable traction
  • Recipes: They encapsulate collections of software resources which are executed in the order defined to configure a system.



Werner Vogels (@werner) described Expanding the Cloud - The AWS Storage Gateway in a 1/23/2012 post to his All Things Distributed blog:

Today Amazon Web Services has launched the AWS Storage Gateway, making the power of secure and reliable cloud storage accessible from customers’ on-premises applications.

We have been working closely with our customers on their requests to bring the power of the Amazon Web Services cloud closer to their existing on-premises compute infrastructures. The Amazon Virtual Private Cloud extends on-premises compute with all the power of AWS, making it elastic, scalable and highly reliable. AWS Identity and Access Management brings together on-premises and cloud identity management. VM Import allows our customers to move virtual machine images from their datacenters to the Cloud and Amazon Direct Connect makes the network latencies and bandwidth between on-premises and AWS more predictable. With the launch of the AWS Storage Gateway our customers can now integrate their on-premises IT environment with AWS’s storage infrastructure.

The AWS Storage Gateway is a service connecting an on-premises software appliance with cloud-based storage. Once the AWS Storage Gateway’s software appliance is installed on a local host, you can mount Storage Gateway volumes to your on-premises application servers as iSCSI devices, enabling a wide variety of systems and applications to make use of them. Data written to these volumes is maintained on your on-premises storage hardware while being asynchronously backed up to AWS, where it is stored in Amazon S3 in the form of Amazon EBS snapshots. Snapshots are encrypted to make sure that customers do not have to worry about encrypting sensitive data themselves. When customers need to retrieve data, they can restore snapshots locally, or create Amazon EBS volumes from snapshots for use with applications running in Amazon EC2.

Here are three example use cases that we envision for the AWS Storage Gateway. The first one is using the AWS Storage Gateway to back up your data to Amazon S3’s highly reliable storage environment. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, redundantly storing your data on multiple devices across multiple facilities in an AWS Region. So, backing up your data to Amazon S3 means a lot fewer headaches worrying about your local storage environment.

The second use case is where customers want to move data between local infrastructure and the Amazon Web Services cloud to provide access to applications and other computations running in Amazon EC2. The use of the Amazon EBS snapshot format means the data that was on-premises can be restored as an Amazon EBS volume mounted to an Amazon EC2 instance.

The third use case, cloud-based Disaster Recovery, is a specific variation of the previous two. If there is a failure in your local infrastructure, you can quickly launch a DR environment in Amazon EC2 which will have full access to the data snapshots backed up into Amazon S3 by the AWS Storage Gateway.

For more information on the AWS Storage Gateway, you can visit the detail page. Jeff Barr over at the AWS Developer Blog has more details.


Jo Maitland (@JoMaitlandSF) posted Updated: AWS DynamoDB and the eventual consistency issue to GigaOm Pro (subscription or free trial required) on 1/19/2012:

Amazon’s new NoSQL database service DynamoDB launched this week, not with a bang but a whimper. Minutes into the live stream announcing the service, the video link went down, which was a bummer for the hundreds of people who tuned in and kicked off lots of jokes on Twitter about cloud database services and the notion of “eventual consistency” being a synonym for “inconsistency.” DynamoDB is a NoSQL database service that will run in the AWS cloud, but in the world of cloud computing and the Internet at large, predictability and consistency are often a stretch. What does this mean for developers using the service?

Behind the jokes is an interesting and complex issue that companies using cloud services will need to embrace. We have come to expect that modern distributed systems supporting large web applications must provide low read and write latency. Think of entering your info when purchasing something on the web: One or two seconds too long and you’re out and onto the next website. To achieve this low latency, cloud systems often eschew protocols that guarantee consistency and instead opt for eventual consistency protocols. This means there is no guarantee on the recency of the version of data you are seeing except that the system will “eventually” return the most recent version in the absence of new writes. But how “eventual” is eventual consistency? …

Jo continues with an update to the post.
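
The eventual-consistency behavior described in the excerpt can be shown with a toy model. This sketch is purely illustrative: the class, its methods, and the single lagging replica are assumptions made for the example and say nothing about how DynamoDB is actually built.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on a primary copy; a replica catches up later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, consistent=False):
        # A consistent read goes to the primary; otherwise the replica answers.
        source = self.primary if consistent else self.replica
        return source.get(key)

    def replicate(self):
        # Apply queued writes in order, then clear the queue.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart", ["book"])
stale = store.read("cart")                   # replica has not caught up yet
fresh = store.read("cart", consistent=True)  # primary already has the write
store.replicate()
converged = store.read("cart")               # replica now agrees
print(stale, fresh, converged)
```

The gap between the write and the replicate call is the "eventual" window: with no new writes, reads converge on the latest value, but until then a reader may see older data.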

Full Disclosure: I’m a GigaOm Pro analyst.



by Roger Jennings (--rj) (noreply@blogger.com) at January 25, 2012 04:47 PM

Links to My Cloud Computing Tips at TechTarget’s SearchCloudComputing Site

I’m a regular contributor of tips and techniques articles for cloud computing development and strategy to TechTarget’s (@TechTarget) SearchCloudComputing site. The following table lists the topics I’ve covered to date:

Date        Title
1/24/2012   Microsoft cloud service lets citizen developers crunch big data (“Data Explorer”)
12/1/2011   Microsoft tests Social Analytics experimental cloud (Project “Social Analytics”)
11/7/2011   Google, IBM, Oracle want piece of big data in the cloud
9/15/2011   Developments in the Azure and Windows Server 8 pairing (from //BUILD/)
9/8/2011    DevOps: Keep tabs on cloud-based app performance (Resources links)
8/2011      Microsoft's, Google's big data [analytics] plans give IT an edge (Resources links)
7/2011      Connecting cloud data sources with the OData API
7/2011      Sharding relational databases in the cloud
6/2011      Choosing a cloud data store for big data
4/2011      Microsoft brings rapid application development to the cloud
3/2011      How DevOps brings order to a cloud-oriented world
2/2011      Choosing from the major Platform as a Service providers
2/2011      How much are free cloud computing services worth?

Updated 1/25/2012 for “Social Analytics” article.

I’ll update this table with new articles as the SearchCloudComputing editors post them.

Links to my cover stories for 1105 Media’s Visual Studio Magazine from November 2003 to the present are here.

by Roger Jennings (--rj) (noreply@blogger.com) at January 25, 2012 03:49 PM

ReadWriteCloud


Tibbr Has a New Twist on Geolocation


We haven't written much about TIBCO's enterprise social media tool tibbr in the past year. But they have interesting news, including updates to the service, that they are announcing today with v3.5, scheduled to be available next month.


Perhaps the most unusual aspect of this version is its geo-location service that checks in the location to you, rather than the other way around as Foursquare, Facebook Places and others do. So for example a gate at an airport can become the context for all sorts of events, including the agents servicing that gate and the other ramp operations that are happening outside on the airfield. Or an oil well can be used in the context of being able to find experts in particular skillsets who are in the vicinity and can service that well. Or a pallet of goods can check in to a warehouse for a supply manager. As one TIBCO product manager told me, "It isn't about Billie's vacation pictures or what you have eaten in the company cafeteria, but what you are doing every day for your job responsibilities."

The idea is to turn a location into a data hub and make this data easier to consume. The new version can also work with augmented reality displays to show who is nearby, for example. Here is what the current version looks like:

tibbr-new.jpg

tibbr continues to have a very simple pricing model of $12 per user per month, either in the cloud or on premises.


by David Strom at January 25, 2012 02:00 PM

Lori MacVittie


The Mobile Chimera

#mobile #vdi #IPv6 In the case of technology – as with mythology - the whole is often greater (and more challenging) than the sum of its parts.


The chimera is a mythological beast of scary proportions. Not only is it fairly large, but it’s also got three independent heads – traditionally a lion, a goat, and a snake. Some variations on this theme exist, but the basic principle remains: it’s a three-headed, angry beast that should not be taken lightly should one encounter it in the hallway.

Individually, one might have a strategy to meet the challenge of a lion or a goat head on. But when they converge into one very angry and dangerous beast, the strategies and tactics employed to best any one of them will almost certainly not work to address all three of them simultaneously.

The world of mobility is rapidly approaching its own technological chimera, one composed of three individual technology trends. While successful stratagems and tactics exist that address each one individually, when taken together they form a new challenge requiring a new strategic approach.

THE MOBILE CHIMERA

Three technology trends - VDI, mobile, and IPv6 - are rapidly converging upon the enterprise. Each is driven in part by the other, and each requires in part functionality and support of another. Addressing the challenges accompanying this trifecta requires a serious evaluation of the enterprise infrastructure with an eye toward performance, scalability, and flexibility, lest it be overwhelmed by demand originating both internally and externally.

Mobile

The myriad articles, blogs, and editorial orations on mobile device growth have to date focused on the need for organizations to step up and accept the need for device-ready enterprise applications. This focus has thus far ignored the reality of the diversity of the device client base, the ramifications of which those with long careers in IT will painfully recall from the client-server era. Thus it is no surprise that interest in and adoption of technology such as VDI is on the rise, as virtualization serves as a popular solution to the problem of delivering applications to a highly-diverse set of clients.

But virtualization, as popular a solution as it may be, is not a panacea. Security and control over corporate resources and applications is a growing necessity today because of the ease with which users can take advantage of mobile technology to access them.

Access control does not entirely solve the challenges of a diverse mobile client audience, as attackers turn their attention on mobile platforms as a means to gain access to resources and data previously beyond their reach. The need for endpoint security inspection continues to grow as the threat posed by mobile devices continues to rear its ugly head.

VDI

It was inevitable that as mobile device usage in the enterprise continued to grow, so, too, would VDI grow as the most efficient way to deliver applications without requiring mobile platform-specific versions. The desire by business owners and security practitioners to keep data securely within the data center "walls", too, is a factor in the rising desire to deploy VDI. VDI enables organizations to deliver applications remotely while maintaining control over data inside the data center, preserving enforcement of corporate security policies and minimizing risk.

But VDI deployments are not trivial, regardless of the virtualization platform chosen. Each virtualization solution has its challenges and most of those challenges revolve around the infrastructure necessary to support such an initiative. Scalability and flexibility are important facets of VDI delivery infrastructure, and performance cannot be overlooked if such deployments are to be considered successful.

IPv6

Who could forget that the Internet is being pressured to move to IPv6 sooner rather than later, in part because of the growth of mobile clients? The strain placed on service providers to maintain IPv4 support as a means to not "break the Internet" can only be borne so long before IPv6 becomes, as has been predicted, the Y2K for the network.

The ability to deliver applications via VDI to mobile devices will soon require support for IPv6, but will not obviate the need to support IPv4 just yet. A dual stack approach will be required during the transition period, putting delivery infrastructure again front and center in the battle to deploy and support applications for mobile devices.

With all accounts numbering mobile devices in the four billion range across multiple platforms and effectively 0 IPv4 addresses left to assign to those devices, it should be no surprise that as these three technology trends collide the result will be the need for a new mobility strategy. 
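
The arithmetic behind that squeeze is simple enough to check. The sketch below uses Python's standard ipaddress module; the four billion figure is the device count cited above, and the IPv4 total is the theoretical maximum, which overstates what is usable because it includes private, multicast, and other reserved ranges.

```python
import ipaddress

# Theoretical size of each address space (reserved ranges included).
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

devices = 4_000_000_000  # the "four billion range" cited in the post

print(ipv4_total)              # 4294967296
print(ipv4_total - devices)    # 294967296
print(ipv6_total == 2 ** 128)  # True
```

Even before subtracting reserved space, mobile devices alone would consume most of the 32-bit address space, which is why dual-stack delivery infrastructure ends up front and center.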

This is why solutions are strategic and technology is tactical. There exist individual products that easily solve each of these problems individually, but very few solutions that address the juggernaut that is the three combined. It is necessary to coordinate and architect a solution that can solve all three challenges simultaneously as a means to combat complexity and its associated best friend forever, operational risk.

A flexible and scalable delivery strategy will be necessary to ensure performance and security without sacrificing operational efficiency.




by Lori MacVittie at January 25, 2012 11:56 AM

Amazon Web Services


The AWS Storage Gateway - Integrate Your Existing On-Premises Applications with AWS Cloud Storage

Warning: If you don't have a data center, or if all of your IT infrastructure is already in the cloud, you may not need to read this post! But feel free to pass it along to your friends and colleagues.

The Storage Gateway
Our new AWS Storage Gateway service connects an on-premises software appliance with cloud-based storage to integrate your existing on-premises applications with the AWS storage infrastructure in a seamless, secure, and transparent fashion. Watch this video for an introduction:

Data stored in your current data center can be backed up to Amazon S3, where it is stored as Amazon EBS snapshots. Once there, you will benefit from S3's low cost and intrinsic redundancy. In the event you need to retrieve a backup of your data, you can easily restore these snapshots locally to your on-premises hardware. You can also access them as Amazon EBS volumes, enabling you to easily mirror data between your on-premises and Amazon EC2-based applications.

You can install the AWS Storage Gateway's software appliance on a host machine in your data center. Here's how all of the pieces fit together:

 

The AWS Storage Gateway allows you to create storage volumes and attach these volumes as iSCSI devices to your on-premises application servers. The volumes can be Gateway-Stored (right now) or Gateway-Cached (soon) volumes. Gateway-Stored volumes retain a complete copy of the volume on the local storage attached to the on-premises host, while uploading backup snapshots to Amazon S3. This provides low-latency access to your entire data set while providing durable off-site backups. Gateway-Cached volumes will use the local storage as a cache for frequently-accessed data; the definitive copy of the data will live in the cloud. This will allow you to offload your storage to Amazon S3 while preserving low-latency access to your active data.

Gateways can connect to AWS directly or through a local proxy. You can connect through AWS Direct Connect if you would like, and you can also control the amount of inbound and outbound bandwidth consumed by each gateway. All data is compressed prior to upload.

Each gateway can support up to 12 volumes and a total of 12 TB of storage. You can have multiple gateways per account and you can choose to store data in our US East (Northern Virginia), US West (Northern California), US West (Oregon), EU (Ireland), Asia Pacific (Singapore), or Asia Pacific (Tokyo) Regions.
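
For capacity planning, the per-gateway limits above translate into a quick lower bound on how many gateways a set of volumes requires. This is a back-of-the-envelope sketch: the helper name and the sample volume sizes are hypothetical, and the result is only a lower bound because it ignores how individual volumes actually pack onto gateways.

```python
import math

MAX_VOLUMES_PER_GATEWAY = 12  # limit stated in the post
MAX_TB_PER_GATEWAY = 12       # limit stated in the post


def min_gateways(volume_sizes_tb):
    """Lower bound on gateways needed for the given volume sizes (in TB)."""
    if not volume_sizes_tb:
        return 0
    by_count = math.ceil(len(volume_sizes_tb) / MAX_VOLUMES_PER_GATEWAY)
    by_capacity = math.ceil(sum(volume_sizes_tb) / MAX_TB_PER_GATEWAY)
    return max(by_count, by_capacity)


# Hypothetical workload: thirty volumes of 0.5 TB each (15 TB in total).
print(min_gateways([0.5] * 30))  # 3
```

Here the volume count, not the total capacity, is the binding limit: 15 TB would fit on two gateways, but thirty volumes need three.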

The first release of the AWS Storage Gateway takes the form of a VM image for VMware ESXi 4.1 (we plan on supporting other virtual environments in the future). Adequate local disk storage, either Direct Attached or SAN (Storage Area Network), is needed for your application storage (used by your iSCSI storage volumes) and working storage (data queued up for writing to AWS). We currently support mounting of our iSCSI storage volumes using the Microsoft Windows and Red Hat iSCSI Initiators.

Up and Running
During the installation and configuration process you will be able to create up to 12 iSCSI storage volumes per gateway. Once installed, each gateway will automatically download, install, and deploy updates and patches. This activity takes place during a maintenance window that you can set on a per-gateway basis.

The AWS Management Console includes complete support for the AWS Storage Gateway. You can create volumes, create and restore snapshots, and establish a schedule for snapshots. Snapshots can be scheduled at 1, 2, 4, 8, 12, or 24 hour intervals. Each gateway reports a number of metrics to Amazon CloudWatch for monitoring.

The snapshots are stored as Amazon EBS (Elastic Block Store) snapshots. You can create an EBS volume using a snapshot of one of your local gateway volumes, or vice versa. Does this give you any interesting ideas?

The Gateway in Action
I expect the AWS Storage Gateway will be put to use in all sorts of ways. Some that come to mind are:

  • Disaster Recovery and Business Continuity - You can reduce your investment in hardware set aside for Disaster Recovery using a cloud-based approach. You can send snapshots of your precious data to the cloud on a regular and frequent basis and you can use our VM Import service to move your virtual machine images to the cloud.
  • Backup - You can back up local data to the cloud without worrying about running out of storage space. It is easy to schedule the backups, and you don't have to arrange to ship tapes off-site or manage your own infrastructure in a second data center.
  • Data Migration - You can now move data from your data center to the cloud, and back, with ease.

Security Considerations
We believe that the AWS Storage Gateway will be at home in the enterprise, so I'll cover the inevitable security questions up front. Here are the facts:

  • Data traveling between AWS and each gateway is protected via SSL.
  • Data at rest (stored in Amazon S3) is encrypted using AES-256.
  • The iSCSI initiator authenticates itself to the target using CHAP (Challenge-Handshake Authentication protocol).
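
For readers unfamiliar with CHAP, the exchange boils down to proving knowledge of a shared secret without sending it. The sketch below follows the response calculation in RFC 1994 (an MD5 hash over the identifier, the secret, and the challenge); the secret and identifier values are made up, and real iSCSI initiators and targets handle this negotiation internally.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """RFC 1994 response: MD5(identifier octet + shared secret + challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


secret = b"example-shared-secret"  # hypothetical secret known to both sides
identifier = 1                     # hypothetical exchange identifier
challenge = os.urandom(16)         # target sends a random challenge

# Initiator computes the response; target recomputes it and compares.
response = chap_response(identifier, secret, challenge)
expected = chap_response(identifier, secret, challenge)
accepted = hmac.compare_digest(response, expected)

# A peer with the wrong secret produces a different response.
rejected = hmac.compare_digest(chap_response(identifier, b"wrong", challenge), expected)
print(accepted, rejected)  # True False
```

Because the challenge is random for each exchange, a captured response cannot simply be replayed later.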

Costs
All AWS users are eligible for a free trial of the AWS Storage Gateway. After that, there is a charge of $125 per month for each activated gateway. The usual EBS snapshot storage rates apply ($0.14 per Gigabyte-month in the US-East Region), as do the usual AWS prices for outbound data transfer (there's no charge for inbound data transfer). More pricing information can be found on the Storage Gateway Home Page. If you are eligible for the AWS Free Usage Tier, you get up to 1 GB of free EBS snapshot storage per month as well as 15 GB of outbound data transfer.
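
A quick way to estimate a monthly bill from the rates quoted above is sketched below. The function name and the sample workload are hypothetical, and the estimate deliberately leaves out outbound data transfer charges and any free trial or Free Usage Tier allowance.

```python
GATEWAY_FEE_PER_MONTH = 125.00     # per activated gateway, from the post
SNAPSHOT_RATE_PER_GB_MONTH = 0.14  # EBS snapshot storage, US-East, from the post


def estimated_monthly_cost(gateways: int, snapshot_gb: float) -> float:
    """Gateway fees plus snapshot storage; excludes outbound data transfer."""
    return gateways * GATEWAY_FEE_PER_MONTH + snapshot_gb * SNAPSHOT_RATE_PER_GB_MONTH


# Hypothetical deployment: two gateways holding 500 GB of snapshots.
print(round(estimated_monthly_cost(2, 500), 2))  # 320.0
```

In this example the fixed gateway fee ($250) outweighs the storage charge ($70), so consolidating volumes onto fewer gateways matters more than trimming snapshot data.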

On the Horizon
As I mentioned earlier, the first release of the AWS Storage Gateway supports Gateway-Stored volumes. We plan to add support for Gateway-Cached volumes in the coming months.

We'll add more features to our roadmap as soon as our users (this means you) start to use the AWS Storage Gateway and send feedback our way.

Learn More
You can visit the Storage Gateway Home Page or read the Storage Gateway User Guide to learn more.

We will be hosting a Storage Gateway webinar on Thursday, February 23rd. Please attend if you would like to learn more about the Storage Gateway and how it can be used for backup, disaster recovery, and data mirroring scenarios. The webinar is free and open to all, but space is limited and you need to register!

-- Jeff;

by AWS Evangelist at January 25, 2012 08:56 AM

Launch Relational Database Service Instances in the Virtual Private Cloud

You can now launch Amazon Relational Database Service (RDS) DB instances inside of a Virtual Private Cloud (VPC).

Some Background
The Relational Database Service takes care of all of the messiness associated with running a relational database. You don't have to worry about finding and configuring hardware, installing an operating system or a database engine, setting up backups, arranging for fault detection and failover, or scaling compute or storage as your needs change.

The Virtual Private Cloud lets you create a private, isolated section of the AWS Cloud. You have complete control over IP address ranges, subnetting, routing tables, and network gateways to your own data center and to the Internet.

Here We Go
Before you launch an RDS DB Instance inside of a VPC, you must first create the VPC and partition its IP address range into the desired subnets. You can do this using the VPC wizard pictured above, the VPC command line tools, or the VPC APIs.

Then you need to create a DB Subnet Group. The Subnet Group should have at least one subnet in each Availability Zone of the target Region; it identifies the subnets (and the corresponding IP address ranges) where you would like to be able to run DB Instances within the VPC. This will allow a Multi-AZ deployment of RDS to create a new standby in another Availability Zone should the need arise. You need to do this even for Single-AZ deployments, just in case you want to convert them to Multi-AZ at some point.

You can create a DB Security Group, or you can use the default. The DB Security Group gives you control over access to your DB Instances; you can allow access from EC2 instances with specific EC2 Security Group or VPC Security Groups membership, or from designated ranges of IP addresses. You can also use VPC subnets and the associated network Access Control Lists (ACLs) if you'd like. You have a lot of control and a lot of flexibility.

The next step is to launch a DB Instance within the VPC while referencing the DB Subnet Group and a DB Security Group. With this release, you are able to use the MySQL DB engine (we plan to add additional options over time). The DB Instance will have an Elastic Network Interface using an IP address selected from your DB Subnet Group. You can use the IP address to reach the instance if you'd like, but we recommend that you use the instance's DNS name instead since the IP address can change during failover of a Multi-AZ deployment.

Upgrading to VPC
If you are running an RDS DB Instance outside of a VPC, you can snapshot the DB Instance and then restore the snapshot into the DB Subnet Group of your choice. You cannot, however, access or use snapshots taken from within a VPC outside of the VPC. This is a restriction that we have put into place for security reasons.

Use Cases and Access Options
You can put this new combination (RDS + VPC) to use in a variety of ways. Here are some suggestions:

  • Private DB Instances Within a VPC - This is the most obvious and straightforward use case, and is a perfect way to run corporate applications that are not intended to be accessed from the Internet.
  • Public-facing Web Application with Private Database - Host the web site on a public-facing subnet and the DB Instances on a private subnet that has no Internet access. The application server and the RDS DB Instances will not have public IP addresses.

Your Turn
You can launch RDS instances in your VPCs today in all of the AWS Regions except AWS GovCloud (US). What are you waiting for?

-- Jeff;

 

by AWS Evangelist at January 25, 2012 01:34 AM

ReadWriteCloud

Cloud Roundup for January 24, 2012

Craigslist loves Perl, Amazon wants to help customers use geo-blocking, and if you're looking for an overview of Hadoop solutions then we've got a good link for you.

Geo-Blocking Content With Amazon CloudFront – Geo-targeting has its good and bad side. I'll let you decide where geo-blocking content falls. If it's something your company needs to do, though, Amazon has a short post by Nihar Bihani of the CloudFront team on using geo-blocking for content with CloudFront.

Big data market survey: Hadoop solutions – Edd Dumbill has a survey of the leading Hadoop distributions and solutions. Covers Cloudera, EMC Greenplum, Hortonworks, IBM, MapR, Microsoft and Platform Computing.

Node 0.7.1 Released – Another unstable release from the Node.js folks that brings Node up to V8 3.8.8 and has a number of other fixes and improvements.

Craigslist Charitable Fund Donates $100,000 to the Perl Foundation – A few weeks ago, I caught a link to a site that claims Perl is only being used on 1% of the top n-something Web sites. While I'm fairly confident Perl usage is down from its peak in the early 2000s, I'm very skeptical that it's quite that low. At the very least, though, Craigslist seems pretty interested in the future of Perl – kicking up $100,000 to support the Perl Foundation.

Have a cloud news tip for me? Drop me a note at jzb@readwriteweb.com or to @jzb on Twitter.

by Joe Brockmeier at January 25, 2012 12:00 AM

January 24, 2012

ReadWriteCloud

The 4-Terabyte Data Object Store: CAStor's Latest Volley Against RAID

Last month, we introduced you to a cloud-based backup system called CTERA - a practical demonstration of the flexibility of the underlying object storage platform. That platform, called CAStor, is essentially a mapping system for files stored over a widely distributed pool of clusters in the cloud. CAStor takes care of where things are located in the cloud; applications like CTERA map those locations using systems that make sense to humans.

Today, the company behind CAStor - Austin, Texas-based Caringo Inc. - altered the definition of "things" in that context, with the introduction to its customers of CAStor version 5.5. With the help of a little process learned from Web mechanics called chunked encoding, data centers will become able to store widely distributed chunks of files up to 4 TB in total length. It's part of Caringo's latest effort to squash RAID using the cloud as its weapon.

The idea of chunked encoding is for a system to begin storing an object as it streams in, rather than have it wait in a cache while its final size is ascertained. For a system that can accept a 4 TB single file, that's a lot of cache; and if the same system is to be used for objects that may also be infinitesimally small, a huge cache could actually work against you. So chunked encoding enables clusters to be provisioned while huge files are being uploaded. CAStor sorts it out once it receives the "zero byte" signaling everything's done.
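
For readers unfamiliar with the Web mechanism being borrowed here, the following Python sketch decodes an HTTP/1.1 chunked body and hands each chunk to a storage callback as it arrives, stopping at the zero-length chunk. It illustrates the general technique only; it says nothing about how CAStor implements it, and the function and callback names are assumptions.

```python
import io


def read_chunked(stream, store):
    """Decode a chunked body from stream, passing each chunk to store().

    Returns the total number of payload bytes once the zero-length
    chunk that ends the transfer has been seen.
    """
    total = 0
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        # The chunk size is hexadecimal; drop any extension after ';'.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Final chunk: skip optional trailer lines up to the blank line.
            while stream.readline() not in (b"\r\n", b"\n", b""):
                pass
            return total
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        store(data)  # the object is stored piece by piece, no full-size cache
        total += size
        stream.readline()  # consume the CRLF that follows the chunk data


body = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
parts = []
total = read_chunked(io.BytesIO(body), parts.append)
print(total, b"".join(parts))
```

The receiver never needs to know the final size up front, which is the property the paragraph above relies on.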

120124 Caringo 5.5 01.jpg

Caringo's other upgrades for CAStor version 5.5 include a zero-provisioning system in the new Cluster Services Node (above) for setting up additional nodes in networks. Rather than manually install the CAStor software on every node in the network, the new system enables nodes added to the network to access and install the software for themselves.

These latest upgrades represent the latest effort by Caringo to reintroduce its customers to the notion of replication as a viable method for protecting data. Replication is a service that CAStor does perform, but which its end users never have to be concerned with. A July 2011 Caringo white paper on the subject (PDF available here) made the point that enterprises that came to depend on RAID1 architecture for redundant disk arrays started moving away from replication as costly and inefficient, perhaps believing that redundancy and replication together were... well, redundant.

120124 Caringo 5.5 02.jpg

The graph from that white paper demonstrates that CAStor can implement optimized per-object replication, as well as other resilience measures such as performance reserve (holding open a small amount of space for periodic defragmentation) and reserve storage pool space for "hot spares" when a RAID5/RAID6 rebuild becomes necessary (the right column, above), while consuming no more storage space than raw replication alone would have required under RAID1 (the left column).
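
As background for that comparison, the textbook usable-capacity fractions of the schemes being discussed can be worked out in a few lines of Python. This is baseline arithmetic only, not a reproduction of the white paper's graph: it ignores the performance reserve and hot-spare space mentioned above, and the eight-disk group and two-copy replication are arbitrary assumptions.

```python
from fractions import Fraction


def usable_fraction(scheme, disks=None, copies=None):
    """Fraction of raw capacity left for data under a protection scheme."""
    if scheme == "replication":
        return Fraction(1, copies)         # every object is stored `copies` times
    if scheme == "raid1":
        return Fraction(1, 2)              # two-way mirror
    if scheme == "raid5":
        return Fraction(disks - 1, disks)  # one disk's worth of parity
    if scheme == "raid6":
        return Fraction(disks - 2, disks)  # two disks' worth of parity
    raise ValueError(f"unknown scheme: {scheme}")


# Arbitrary example: an 8-disk group versus 2-copy replication.
for label, value in [
    ("RAID1", usable_fraction("raid1")),
    ("RAID5, 8 disks", usable_fraction("raid5", disks=8)),
    ("RAID6, 8 disks", usable_fraction("raid6", disks=8)),
    ("replication, 2 copies", usable_fraction("replication", copies=2)),
]:
    print(f"{label}: {value} of raw capacity")
```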

"if you are using RAID 5 or 6 as a data resilience method it is only a matter of time before you experience data loss (if you haven't already)," the white paper concluded. "Replication will ensure the resilience and recoverability of your data for the full life cycle of your applications. The CAStor storage architecture delivers the value of replication without compromising capacity when compared to RAID alternatives. CAStor customer environments reap the benefits of future proofing of their storage investment spanning the entire life of the system for a superior TCO experience."

Caringo did concede at that time that RAID5/RAID6 configurations could be more efficient than CAStor with file systems where file size tended to be larger; the advantage swung back to CAStor's favor where file sizes were smaller. But that was before the implementation of chunked encoding for version 5.5, which could conceivably have swung the pendulum back in Caringo's direction.

by Scott M. Fulton, III at January 24, 2012 10:30 PM

Running Out of Time for Monki Gras Tickets

The RedMonk folks are getting ready to close the door on signups for The Monki Gras. The conference is scheduled for February 1st and 2nd in London, and features a delightful pairing of industry experts and beer. If you want to attend, you need to speak up today – the organizers are closing ticket sales on January 25th.

The Monki Gras is a follow-on conference to Monktoberfest, which took place last October in Portland, Maine. (As some would have it, "the Real Portland.")

What's It About?

The Monktoberfest conference was the first time RedMonk went about organizing a conference. As RedMonk's Stephen O'Grady likes to note, the conference started as a joke. What if they threw a conference that focused on beer as well as an agenda for developers?

Here's what happens: People show up. Also? They have a pretty good time while also learning quite a bit and sharing with other conference attendees. The "hallway track" is often the best track of any technical conference, and Monktoberfest put the hallway track front and center by arranging fantastic food and beer for the breaks and dinners. Oh, and the talks were quite good too.

One More Round

Monktoberfest was a resounding success, so they said that they'd do it again. This time, they're doing it in London a few days before another (slightly larger) developer-oriented conference: the Free and Open Source Software Developers' European Meeting (FOSDEM).

Once again, the focus is on tech and craft beer. This time, the beer focus is on the UK's "burgeoning craft beer startup scene" and one or two Belgian beers, perhaps. The talk focus has been expanded a bit, though. Many of the folks at Monktoberfest complained that the event was great, but there wasn't enough of it. This time around, the RedMonk crew is adding a half-day of talks on day two starting at 10 a.m. (not too early).

Some of the speakers you don't want to miss: Bit.ly's Matt LeMay will be doing the "Kitteh vs Chikin" talk again; Kohsuke Kawaguchi from CloudBees will be talking about Jenkins and building an OSS community; Mike Milinkovich of Eclipse will be discussing open source foundations; and Laura Merling of Alcatel-Lucent will speak on "how telcos got API religion and what comes next."

The Monki Gras will also feature folks from Lanyrd, Zendesk, Adobe PhoneGap, Joyent and others. I'll also be doing a talk on day two on how developers can "bootstrap" coverage for their projects.

Tickets for the event are £140.00 and registration closes tomorrow. See you in London!

by Joe Brockmeier at January 24, 2012 09:00 PM

Cloud9 IDE to Enable Node.js App Posting to Windows Azure Cloud

As the Windows Azure platform began branching out last year from support for purely Microsoft frameworks like .NET, going so far as to incorporate Java, one possibility that was overlooked at the time was to support JavaScript. The reason seemed obvious: JavaScript, as its creators would tell you, is a client language. Well, that's no longer true, now that Node.js makes it about as easy to write JavaScript for the V8 interpreter on the server as it is for V8 in Google Chrome on the client.

Last month, Azure demonstrated how much both its platform and its proprietor's attitude had matured by opening up support for Node.js. Today at a summit of Node.js developers in San Francisco, Cloud9, the maker of a SaaS-based IDE for developers, announced it has added the ability for developers to deploy Node.js apps to Azure.

Cloud9 began supporting Node.js on the Joyent cloud last July, and then added Node.js support for Heroku just last September.

The Cloud9 IDE provides all the basic functions that a developer would expect from an "Express" IDE, except it doesn't have to be installed anyplace. Taking a cue from cloud-delivered word processors, it provides a full development and debugging console, including the ability to set breakpoints and run immediate JS instructions using a console window. Previous editions of Cloud9 bore a greater similarity to Visual Studio, but the latest edition deployed now utilizes a more distinct style, with functional icons along the left side, a column for logging events in the middle, and the editor window as the rightmost two-thirds.

The development team released a set of videos this morning (part 1 of which appears above) showing how deployment and execution of a task remains a one-click process for Azure, just as it has been for Heroku and Joyent.

Cloud9 is typically available to developers through a kind of voluntary subscription model. A developer can opt to use the product for free, so long as he makes his source code available to others through an open source license and shares it through the Cloud9 public project. Developers who wish to retain the right to use their own proprietary licenses pay $15 per month. A Cloud9 spokesperson confirmed to RWW this afternoon that both options remain available for developers deploying to Azure.

by Scott M. Fulton, III at January 24, 2012 08:00 PM

SearchCloudComputing (Carl Brooks)

Microsoft cloud service lets citizen developers crunch big data

Microsoft's "Data Explorer" cloud service puts coding power into the hands of citizen developers. It remains to be seen how deep into big data users get.

by Roger Jennings, Contributor (editor@searchcloudcomputing.com) at January 24, 2012 05:37 PM

ReadWriteCloud

Cloud Roundup for January 23, 2012

Much has been said about Facebook's Timeline feature, but very little attention has been paid to the actual tech behind the feature. Timeline goes well beyond the scope of Facebook's previous profile pages and deals with years of Facebook activity. Starting this Fall, O'Reilly and Cloudera are going to be smooshing together their conferences, and Siddharth Anand has some thoughts on the state of NoSQL in 2012.

The State of NoSQL in 2012 – Anand has some thoughts on the limitations of today's NoSQL options. "Many of the NoSQL vendors view the "battle of NoSQL" to be akin to the RDBMS battle of the 80s, a winner-take-all battle. In the NoSQL world, it is by no means a winner-take-all battle. Distributed Systems are about compromises."

BitNami Cloud Tools now supports DynamoDB – BitNami has added support for DynamoDB with its BitNami Cloud Tools stack. Cloud Tools includes the most popular command line tools (and dependencies, of course) for working with the Amazon APIs for EC2, Beanstalk, RDS, SES and others.

Cloudera Teams With O'Reilly Media to Merge Hadoop World and Strata Conferences – It can be tough to attend all the trade shows that are relevant, so this is good news for Hadoop folks. Cloudera is going to be folding Hadoop World into the 2012 Strata Conference New York this Fall. The Strata Conference New York is being held October 24 and 25; the call for papers starts on February 28.

Building Timeline: Scaling up to hold your life story – I still haven't decided quite how I feel about Facebook's Timeline feature, but I do admit that the technical challenges behind it are quite interesting. Ryan Mack describes the tech behind Timeline, how its team whipped data into shape and how the project got started. "Timeline started as a Hackathon project in late 2010 with two full-time engineers, an engineering intern, and a designer building a working demo in a single night. The full team ramped up in early 2011, and the development team was split into design, front-end engineering, infrastructure engineering, and data migrations."

Have a cloud news tip for me? Drop me a note at jzb@readwriteweb.com or to @jzb on Twitter.

by Joe Brockmeier at January 24, 2012 01:24 AM

January 23, 2012

ReadWriteCloud

PaaS Makes Progress in 2011

While Platform-as-a-Service (PaaS) has always had its cheerleaders - yours truly included - the harsh reality is that, commercially speaking, PaaS offerings have underperformed relative to expectations for several years running. This is particularly the case among enterprises, which have, by and large, turned a blind eye to the technology.

Past performance notwithstanding, many industry watchers have predicted 2012 to be the breakout year for PaaS in the enterprise. Gartner, for example, reportedly communicated at its November Application Architecture Development & Integration conference its belief that 2012 marks the beginning of a rise in PaaS adoption from almost zero (3% of enterprises) to nearly half of all enterprises (43%) in 2015.

While it remains to be seen whether 2012 goes down in the history books as the year PaaS makes good, much of the groundwork for PaaS' predicted success was laid in 2011. Here are some trends from the past year:

Heroku Beyond Ruby

Salesforce.com's acquisition of Heroku for $250M, an estimated 50-100x revenue, while announced in December 2010, set the stage for a year of brisk investment in PaaS. In January, on the heels of the acquisition, we saw a flurry of Heroku investments and product launches including PHPFog (PHP), Gondor.io (Python/Django), Nodejitsu (Node.js) and CloudBees (Java). With this activity, the reach of PaaS was significantly broadened.

The Rise of Polyglot Platforms

The early flood of single-language PaaS platforms gave way to a move towards multi-language platforms later in the year, perhaps precipitated by DotCloud's entrance in the market with a vision of "One Platform, Any Stack." Established providers like Red Hat OpenShift and Heroku broadly expanded platform support, while the aforementioned PHPFog relaunched as AppFog with a new multi-language platform.

Sam Charrington is the principal of CloudPulse Strategies, an analyst and consulting firm focusing exclusively on cloud computing, big data and related technologies and markets. He can be followed on Twitter at @samcharrington.

The rise of so-called polyglot PaaS platforms, while derided by some as stifling innovation, is significant in that it marks a departure from early "one size fits few" approaches to PaaS, towards something more flexible, familiar, accommodating, and with a bit less lock-in... Just what the enterprise user is looking for.

Enter Cloud Foundry

VMware's Cloud Foundry, which launched in April of 2011 and was covered extensively in ReadWriteCloud, was not the first multi-language PaaS. Nor was it the first open-source PaaS, or the first PaaS backed by a major player in enterprise IT. It wasn't the first PaaS to be readily deployable in both public and private cloud environments. Nor was it the first PaaS to embrace the power of an extended ecosystem of developers and partners.

What made Cloud Foundry a game-changer in 2011 is the fact that it was the first PaaS to offer all of those things--on your laptop, in your data center, or in the cloud.

PaaS Ecosystems Flourish

Last but not least, the developer services ecosystems that have formed around the major offerings were expanded greatly in 2011. Users of these platforms can now easily add a wide variety of services such as caching, messaging and databases (SQL and NoSQL) to their applications. These ecosystems are powerful in that they simultaneously have made PaaS platforms more productive for developers and more profitable for providers, while they have reduced the threat of lock-in for users.

The role of these ecosystems is key, and I'm planning to explore this topic further in a future article.

In the meantime, I'll continue rooting for the success of PaaS customers and providers, with full knowledge that they are building upon solid foundations laid throughout the last year.

by Sam Charrington at January 23, 2012 08:00 PM

InfiniBand Acquisition Puts Intel Back in the Networking Business

Two technologies have made the quantum speed leaps in high-performance computing possible. One is the rapid ascent of commercial, off-the-shelf (COTS) processors that made computing speed cheaper. The second is InfiniBand (IB), the switching technology that Sun Microsystems helped evolve into a fabric - the underlying infrastructure of a carrier-grade cloud.

Today, after an on-again, off-again relationship with InfiniBand that stretches back to its very beginning, Intel is back in the networking fabric business in a big way. With as big a message of "we're back" as you can send, the company has agreed to purchase the InfiniBand production assets, along with many of the employees, of QLogic. Analysts estimate QLogic to be the #2 player in the InfiniBand switch market, with over one-fourth of the global market. The deal has a reported value of $125 million.

You know Intel's hurting as a company whenever it sheds its assets in the networking department. Back in 2003, Intel found itself cancelling a once-promising project to build semiconductors for InfiniBand switching, the ultra-high-speed switching fabric it helped create with Sun Microsystems. During its 2006 reorganization, it first sold its IXP handheld network processor division to Marvell, then sold its optical networking systems division to Cortina.

A diagram of the components of an InfiniBand-switched network. [Courtesy QLogic]


Consequently, you know Intel's on a healing streak when it gets back into networking in a big way. QLogic itself will not be acquired in this transaction. It appears the company will scale down, focusing more on producing device controllers and bus adapters. QLogic is believed to have more than half the market in Fibre Channel (FC) adapters, though its share of the switch market is significantly lower.

Signs of a warm relationship between Intel and QLogic began appearing in 2008, when Intel began sponsoring QLogic's "test track" facility for testing Intel processors on QLogic's fabric. The facility was made available for hardware makers and high-performance software developers to experiment with new configurations for InfiniBand switches and COTS processors - a technology that was about to go through the roof. The following year, the two companies declared themselves officially "allied."

Analysts soon saw that time as an opportunity for the then-#1 player in the IB switch market, Voltaire, to acquire QLogic outright. But last February, #2 player Mellanox acquired Voltaire whole for $208 million, giving the new company almost 60% market share in InfiniBand switches.

The question on investors' minds had become whether QLogic could compete. Just last September, it put everything it had toward one last, big market push, completely overhauling its IB and FC product lines for what it called its "Adaptive Convergence" strategy. That's where the company produces bus adapters and network adapters on the same chip. Analysts believe the strategy should keep QLogic ahead in the Fibre Channel market, though it did try to string InfiniBand along by tying the umbrella branding to a new line of IB switches as well.

Intel may have played its hand very smartly, acquiring a reputable production firm for what, for Intel, counts as a song - about 21% of QLogic's annual revenue for 2011. If anyone can compete with Mellanox using a QLogic product line, it's Intel.


Intel is a ReadWriteWeb sponsor.

by Scott M. Fulton, III at January 23, 2012 06:08 PM

Lori MacVittie

The API is the Center of the Application (Integration) Universe

#mobile #fasterapp #ccevent Today, at least. Tomorrow, who knows?

Some have tried to distinguish between “mobile cloud” and “cloud” by claiming the former is the use of the web browser on a mobile device to access services while the latter uses device-native applications. Like all things cloud, the marketing fluff is purposefully obfuscating and sweeping under the rug the technology required to make things work for consumers, whether those consumers be your kids or IT professionals. Infrastructure is not eliminated when organizations take to the cloud nor do the constraints of web-based protocols and methodologies become irrelevant when Bob uses a service to store photos of his kid’s piano recital on Flickr.

The applications and web browsers on a mobile device are using the same technology, the same protocols, suffering under the same constraints as the rest of us in wireline land. If developers are as smart as they are lazy (and I say that as a compliment because it is the laziness of developers that more often than not leads to innovation) they have already moved to an API-centric model in which web site and device native-app interfaces both leverage the same APIs.

This isn’t just a social integration phenomenon – it isn’t just about Twitter and Facebook and Google. API usage and demand are growing, and that growth is not expected to stop any time soon. When developers were asked which services they would like to connect to (assuming service = API), the overwhelming response was that they would like to connect to “everything, if it were easy.” (API Integration Pain Survey Results)

The API is rapidly becoming (if it isn’t already) the center of the application (integration) universe. This unfortunately has the potential to cause confusion and chaos in the data center. When a single API is consumed by multiple clients – mobile, remote, applications, partners, etc. – solutions unique to each quickly seem to make their way into the code to deal with “exceptions” and “peculiarities” inherent to the client platform.

That’s inefficient and, when one considers the growing number of platforms and form-factors associated with mobile communications alone, it is not scalable from a people and process perspective.

But the reality is that these exceptions and peculiarities – often caused by a lack of feature parity across form-factors and platforms – must be addressed somewhere, and that somewhere is unfortunately almost unilaterally determined to be the application. Do we need to treat mobile devices differently? In terms of performance and delivery concerns, yes. But that’s where we leverage the application delivery tier to differentiate by device to ensure delivery. That’s the beauty of an abstracted, service-enabled data center – there’s an intelligent and agile layer of application delivery services that mediates between clients (regardless of their form factor) and services to ensure that delivery needs (security, performance, and availability) are met in part by addressing the unique characteristics and reality of access via mobile devices.
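
A toy Python sketch of that mediation idea follows: the delivery tier classifies the client and picks a delivery policy, while the API behind it stays identical for everyone. The policy fields, their values, and the User-Agent hints are all invented for illustration and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPolicy:
    """Hypothetical per-client delivery settings applied in front of the API."""
    compress: bool
    image_quality: int   # percent
    cache_seconds: int


# Illustrative values only; a real tier would carry far richer policies.
POLICIES = {
    "mobile": DeliveryPolicy(compress=True, image_quality=60, cache_seconds=600),
    "desktop": DeliveryPolicy(compress=True, image_quality=90, cache_seconds=120),
}

MOBILE_HINTS = ("iphone", "ipad", "android", "mobile")


def classify_client(user_agent):
    """Very rough form-factor guess based on the User-Agent header."""
    lowered = user_agent.lower()
    return "mobile" if any(hint in lowered for hint in MOBILE_HINTS) else "desktop"


def policy_for(user_agent):
    """Pick the delivery policy; the API itself needs no per-device code."""
    return POLICIES[classify_client(user_agent)]


print(policy_for("Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X)"))
print(policy_for("Mozilla/5.0 (Windows NT 6.1; WOW64)"))
```

The point of the design is where the branch lives: in the delivery layer, not in the application logic.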

ABSTRACT and ISOLATE

This is exactly the type of problem application delivery is designed to address. Multiple clients, multiple networks, all accessing the same application service or API but requiring specific authentication, security, and delivery characteristics to ensure that operational risk is mitigated in the most efficient manner possible.

This includes the ability to throttle services based on user and client, a common approach used by mega-sites such as Twitter. This includes the ability to provide single sign-on capabilities to all clients, regardless of platform or form-factor, and to support enterprise-grade authentication integration for the same API or application service. This includes leveraging the appropriate security policies to ensure inbound and outbound security of data regardless of client, such that corporate data is not infected and spread to other consumers.
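
Throttling by user and client is commonly done with a token bucket per (user, client) pair. The Python sketch below shows that general technique under simple assumptions (one process, an injectable clock for testing); it is not a description of how Twitter or any delivery product implements throttling, and the class names are made up.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Throttle:
    """Keeps one bucket per (user, client) pair, created on first use."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._settings = (rate, capacity, clock)
        self._buckets = {}

    def allow(self, user, client):
        key = (user, client)
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(*self._settings)
        return self._buckets[key].allow()


# A fake clock keeps the demonstration deterministic.
now = [0.0]
throttle = Throttle(rate=1.0, capacity=2, clock=lambda: now[0])
results = [throttle.allow("bob", "mobile") for _ in range(3)]
now[0] += 1.0                       # one second later, one token has refilled
results.append(throttle.allow("bob", "mobile"))
other_client = throttle.allow("bob", "desktop")  # separate bucket, unaffected
print(results, other_client)
```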

A flexible, scalable application delivery tier addresses the problem of a single API being utilized by a variety of clients in a way that precludes the need to codify specific functionality on a per-platform or form-factor basis in the application logic itself, making the API simpler and easier to maintain as well as test and upgrade. It makes APIs and application services more scalable in terms of people and processes, which in turn makes the development and deployment process more efficient and able to focus on new services rather than constantly modifying and updating existing ones.

Service-oriented architecture may have begun in the application demesne as a means to abstract and isolate services such that they could more easily be integrated, maintained, and changed without disruption, but the concept is applicable to the data center as a whole. By leveraging SOA concepts at the data center architecture level, the entire technological landscape of the business can be transformed into one that is ultimately more adaptable, more scalable, and more secure.


I’ll be at CloudConnect 2012 and we’ll discuss the subject of cloud and performance a whole lot more at the show!

by Lori MacVittie at January 23, 2012 12:42 PM

OakLeaf Systems

Windows Azure and Cloud Computing Posts for 1/20/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI Access Control, Connect, SQL Azure Database, and other cloud-computing articles.

• Updated 1/20/2012 with new articles marked •.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue, Numerics and Hadoop Services

• Ronnie Hoogerwerf (@rhoogerw) described Using Data in a 1/20/2012 post to the Microsoft Codename “Cloud Numerics” blog:

This post contains the following topics:

Working with Arrays

You can create either dense n-dimensional arrays or distributed dense n-dimensional arrays using Microsoft codename “Cloud Numerics” lab.

Creating Arrays

You can create dense arrays with Numerics.Local. For example:


using local = Microsoft.Numerics.Local;
var a = local.NumericDenseArrayFactory.CreateFromSystemArray<double>(
    new double[,] { { -0.5, 1.0 },
                    {  0.5, 1.0 } });

Creating Distributed Arrays

You can create distributed dense arrays with Numerics.Distributed. For example:


using dist = Microsoft.Numerics.Distributed;
var c = new dist.NumericDenseArray<double>(a); // Explicit distributed data creation

Casting Arrays

You can cast from a distributed array to a local array. For example:

var d = c.ToLocalArray(); // Implicit distributed data recast

You can also assign local data to distributed data. For example:


var a = local.NumericDenseArrayFactory.CreateFromSystemArray<double>( new double [,]
{ {-0.5, 1.0},
{ 0.5, 1.0} } );


dist.NumericDenseArray<double> c = a; // Assignment with backend distributed data

Loading Distributed Data from a File

The “Cloud Numerics” lab provides an interface you can implement for loading data from a file.

The steps to loading distributed data from a file are:

1. Create a class that returns an object that conforms to the Numerics.Distributed.IO.IParallelReader interface or else use or modify the Distributed.IO.CSVLoader class provided in the Cloud Numerics lab distribution.

2. Use the Distributed.IO.Loader.LoadData() method to load your data into a distributed dense array.

For more details, see the blog post titled Using the IParallelReader Interface.

Creating Distributed Arrays from Azure Blobs

For more information on Windows Azure Blob storage, navigate to the following Getting Started page http://www.microsoft.com/windowsazure/learn/get-started/

Creating Serial IO from Blobs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;
using msnl = Microsoft.Numerics.Local;
using msnd = Microsoft.Numerics.Distributed;

namespace ExampleWithSerialIO
{
    class Program
    {
        // Sample blobs that hold matrices of random numbers as binary data
        static string accountName = @"https://cloudnumericslab.blob.core.windows.net/";

        // 1000-by-1000 matrix
        static string blobAddress = @"https://cloudnumericslab.blob.core.windows.net/arraycollection/mediummatrix";

        // Method to read blob data and convert it into local NumericDenseArray of doubles
        public static msnl.NumericDenseArray<double> ReadBlob()
        {
            long i, j;

            // Get reference to blob
            var blobClient = new CloudBlobClient(accountName);
            var blob = blobClient.GetBlobReference(blobAddress);

            // Read number of rows and columns from blob metadata
            blob.FetchAttributes();
            long rows = Convert.ToInt64(blob.Metadata["dimension0"]);
            long columns = Convert.ToInt64(blob.Metadata["dimension1"]);

            // Convert blob binary data to local NumericDenseArray
            var outArray = msnl.NumericDenseArrayFactory.Create<double>(new long[] { rows, columns });
            var blobData = blob.DownloadByteArray();
            for (i = 0; i < rows; i++)
            {
                for (j = 0; j < columns; j++)
                {
                    outArray[i, j] = BitConverter.ToDouble(blobData, (int)(i * columns + j) * 8);
                }
            }
            return outArray;
        }

        static void Main()
        {
            // Initialize runtime
            Microsoft.Numerics.NumericsRuntime.Initialize();

            // Read data and implicitly cast to distributed array
            msnd.NumericDenseArray<double> data = ReadBlob();

            // Compute mean of dataset
            var mean = Microsoft.Numerics.Statistics.Descriptive.Mean(data);

            // Write result. When running on Windows Azure cluster,
            // the output is available in job output
            Console.WriteLine("Mean of data: {0}", mean);

            // Shut down runtime
            Microsoft.Numerics.NumericsRuntime.Shutdown();
        }
    }
}
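
The inner loop of ReadBlob depends on the blob holding its doubles row by row, 8 bytes each, so element (i, j) starts at byte (i * columns + j) * 8. The following is a minimal sketch of that offset arithmetic using only plain .NET arrays. The class RowMajorDecodeSketch and its Encode and Decode helpers are hypothetical names introduced for illustration; they are not part of "Cloud Numerics" or the storage client.

```csharp
using System;

public static class RowMajorDecodeSketch
{
    // Hypothetical helper: packs a 2-D array into bytes, row by row, 8 bytes per double.
    public static byte[] Encode(double[,] values)
    {
        int rows = values.GetLength(0);
        int columns = values.GetLength(1);
        var bytes = new byte[rows * columns * sizeof(double)];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < columns; j++)
            {
                int offset = (i * columns + j) * sizeof(double);
                BitConverter.GetBytes(values[i, j]).CopyTo(bytes, offset);
            }
        }
        return bytes;
    }

    // Reverses Encode using the same offset formula as the ReadBlob loop.
    public static double[,] Decode(byte[] bytes, int rows, int columns)
    {
        if (bytes.Length != rows * columns * sizeof(double))
        {
            throw new ArgumentException("Byte count does not match rows * columns * 8.");
        }
        var result = new double[rows, columns];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < columns; j++)
            {
                result[i, j] = BitConverter.ToDouble(bytes, (i * columns + j) * sizeof(double));
            }
        }
        return result;
    }

    public static void Main()
    {
        var original = new double[,] { { 1.0, 2.0, 3.0 }, { 4.0, 5.0, 6.0 } };
        double[,] decoded = Decode(Encode(original), 2, 3);
        Console.WriteLine(decoded[1, 2]); // last element of the second row
    }
}
```

The length check is an addition of this sketch: it catches a mismatch between the metadata dimensions and the downloaded byte count before any element is read.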

Creating Distributed IO from Blobs

using System;
using System.Linq;
using msnl = Microsoft.Numerics.Local;
using msnd = Microsoft.Numerics.Distributed;
using Microsoft.Numerics;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// An example method for reading an array from blob storage
// Each blob contains a piece of 2-D array

namespace AzureArrayReader
{
    [Serializable()]
    public class AzureArrayReader : msnd.IO.IParallelReader<double>
    {
        private string accountName;
        private string containerName;

        public AzureArrayReader(string accountName, string containerName)
        {
            this.accountName = accountName;
            this.containerName = containerName;
        }

        // Assign blobs to MPI ranks
        public object[] ComputeAssignment(int nranks)
        {
            Object[] blobs = new Object[nranks];

            var blobClient = new CloudBlobClient(accountName);
            var matrixContainer = blobClient.GetContainerReference(containerName);
            var blobCount = matrixContainer.ListBlobs().Count();
            int maxBlobsPerRank = (int)Math.Ceiling((double)blobCount / (double)nranks);
            int currentBlob = 0;
            for (int i = 0; i < nranks; i++)
            {
                int step = Math.Max(0,
                    Math.Min(maxBlobsPerRank,
                        blobCount - currentBlob));
                blobs[i] = new int[] { currentBlob, step };
                currentBlob = currentBlob + step;
            }
            return (object[])blobs;
        }

        // Assume pieces are concatenated along column dimension
        public int DistributedDimension
        {
            get { return 1; }
            set { }
        }

        // Read data from blobs
        public msnl.NumericDenseArray<double> ReadWorker(Object assignment)
        {
            var blobClient = new CloudBlobClient(accountName);
            var matrixContainer = blobClient.GetContainerReference(containerName);
            int[] blobs = (int[])assignment;
            long i, j, k;
            msnl.NumericDenseArray<double> outArray;
            var firstBlob = matrixContainer.GetBlockBlobReference("slab0");
            firstBlob.FetchAttributes();
            long rows = Convert.ToInt64(firstBlob.Metadata["dimension0"]);
            long[] columnsPerSlab = new long[blobs[1]];
            if (blobs[1] > 0)
            {
                // Get blob metadata, validate that each piece has equal number of rows
                for (i = 0; i < blobs[1]; i++)
                {
                    var matrixBlob = matrixContainer.GetBlockBlobReference(
                        "slab" + (blobs[0] + i).ToString());
                    matrixBlob.FetchAttributes();
                    if (Convert.ToInt64(matrixBlob.Metadata["dimension0"]) != rows)
                    {
                        throw new System.IO.InvalidDataException("Invalid slab shape");
                    }
                    columnsPerSlab[i] =
                        Convert.ToInt64(matrixBlob.Metadata["dimension1"]);
                }

                // Construct output array
                outArray =
                    msnl.NumericDenseArrayFactory.Create<double>(
                        new long[] { rows, columnsPerSlab.Sum() });

                // Read data
                long columnCounter = 0;
                for (i = 0; i < blobs[1]; i++)
                {
                    var matrixBlob =
                        matrixContainer.GetBlobReference("slab" + (blobs[0] + i).ToString());
                    var blobData = matrixBlob.DownloadByteArray();
                    for (j = 0; j < columnsPerSlab[i]; j++)
                    {
                        for (k = 0; k < rows; k++)
                        {
                            outArray[k, columnCounter] =
                                BitConverter.ToDouble(blobData, (int)(j * rows + k) * 8);
                        }
                        columnCounter = columnCounter + 1;
                    }
                }
            }
            else
            {
                // If a rank was assigned zero blobs, return empty array
                outArray =
                    msnl.NumericDenseArrayFactory.Create<double>(new long[] { rows, 0 });
            }
            return outArray;
        }
    }
}
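
To see how ComputeAssignment spreads blobs over ranks, the same ceiling-and-clamp arithmetic can be run without any storage access. This is only a sketch: the class BlobAssignmentSketch and its Assign method are hypothetical names, and the blob count is passed in directly instead of being read from a container.

```csharp
using System;

public static class BlobAssignmentSketch
{
    // Returns one { firstBlob, blobCount } pair per rank, using the same
    // arithmetic as ComputeAssignment in the listing above.
    public static int[][] Assign(int blobCount, int nranks)
    {
        var assignments = new int[nranks][];
        int maxBlobsPerRank = (int)Math.Ceiling((double)blobCount / nranks);
        int currentBlob = 0;
        for (int rank = 0; rank < nranks; rank++)
        {
            // Clamp so that ranks past the last blob receive a count of zero.
            int step = Math.Max(0, Math.Min(maxBlobsPerRank, blobCount - currentBlob));
            assignments[rank] = new[] { currentBlob, step };
            currentBlob += step;
        }
        return assignments;
    }

    public static void Main()
    {
        int[][] assignments = Assign(5, 4);
        for (int rank = 0; rank < assignments.Length; rank++)
        {
            Console.WriteLine("rank {0}: first blob {1}, count {2}",
                rank, assignments[rank][0], assignments[rank][1]);
        }
    }
}
```

With 10 blobs and 4 ranks the pairs are (0, 3), (3, 3), (6, 3), (9, 1). With 5 blobs and 4 ranks they are (0, 2), (2, 2), (4, 1), (5, 0), so the last rank receives no blobs and would take the empty-array branch of ReadWorker.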
Accessing Data with LINQ

This section provides the following examples of how to use the C# LINQ extensions to access array data.

  • Extracting Selected Data by Index
  • Filtering out NaN Values
Extracting Selected Data by Index

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Numerics;
using Microsoft.Numerics.Local;

namespace HowToRecipes
{
    class LINQtoNDAExtractExample
    {
        public static void Run()
        {
            // Create Numeric Dense Array
            var numbers = NumericDenseArrayFactory.CreateFromSystemArray<int>(
                new int[] { 1, 2, 3, 4, 5, 6 });
            // Set indexes of start and end of the part to be extracted
            int idxStart = 1;
            int idxEnd = 4;

            Console.WriteLine("All numbers: {0}", numbers);
            Console.WriteLine("Start index: {0}, End index {1}", idxStart, idxEnd);

            // Extract
            NumericDenseArray<int> outArray =
                NumericDenseArrayFactory.CreateFromSystemArray<int>(
                    numbers
                        .Where((x, i) => (i >= idxStart && i <= idxEnd))
                        .ToArray());

            Console.WriteLine("Extracted array: {0}", outArray);
        }
    }
}
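
The indexed Where overload used above keeps every element whose position falls in the inclusive range from idxStart to idxEnd. The sketch below runs the same filter on a plain int[] so it can be tried without the "Cloud Numerics" assemblies, and compares it with Skip/Take, which selects the same range when 0 <= idxStart <= idxEnd. The class IndexRangeSketch and its method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class IndexRangeSketch
{
    // Same predicate as the example above: keep positions idxStart..idxEnd inclusive.
    public static int[] ByIndexedWhere(int[] source, int idxStart, int idxEnd)
    {
        return source.Where((x, i) => i >= idxStart && i <= idxEnd).ToArray();
    }

    // Alternative: skip the leading elements, then take the length of the range.
    public static int[] BySkipTake(int[] source, int idxStart, int idxEnd)
    {
        return source.Skip(idxStart).Take(idxEnd - idxStart + 1).ToArray();
    }

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(", ", ByIndexedWhere(numbers, 1, 4))); // prints "2, 3, 4, 5"
        Console.WriteLine(string.Join(", ", BySkipTake(numbers, 1, 4)));     // prints "2, 3, 4, 5"
    }
}
```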
Filtering out NaN Values

using System;
using System.Linq;
using System.Collections;
using System.Collections.Generic;
using Microsoft.Numerics;
using Microsoft.Numerics.Local;

namespace HowToRecipes
{
    class LINQtoNDATrimNaNsExample
    {
        public static void Run()
        {
            // Create Numeric dense array with NaNs
            var sampleNan = NumericDenseArrayFactory.CreateFromSystemArray<double>(
                new double[] { double.NaN, 1.0, 2.0, 3.0, double.NaN, 4.0, 5.0, 6.0 }
            );
            Console.WriteLine("Array with NaNs: {0}", sampleNan);

            // Trim NaN
            var cleanedNDA = NumericDenseArrayFactory.CreateFromSystemArray<double>(
                sampleNan
                    .Where(x => (!double.IsNaN(x)))
                    .ToArray());

            Console.WriteLine("Trimmed array: {0}", cleanedNDA);
        }
    }
}

The default “Cloud Numerics” C# project performs a Cholesky decomposition on a 50 x 50 element array. Stay tuned for my forthcoming Getting Acquainted with Microsoft Codename “Cloud Numerics” post.


• Avkash Chauhan (@avkashchauhan) posted Understanding Map/Reduce job in Apache Hadoop on Windows Azure (A Reverse Approach) on 1/20/2012:

When you run [a] Map/Reduce job in [a] Hadoop cluster on Windows Azure, you will get an aggregated progress and log directly on [the] portal, so you can see what is happening with your job. This log is different [than] what [you] see when you check individual job status in the datanode[. I]nstead this log gives you cumulative details about how the job was started and how it was completed.

2012-01-20 22:27:37,646 [main] INFO org.apache.pig.Main - Logging error messages to: c:\apps\dist\bin\pig_1327098457646.log

2012-01-20 22:27:38,036 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.114.178.123:9000

2012-01-20 22:27:38,443 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 10.114.178.123:9010

2012-01-20 22:27:38,661 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: ORDER_BY,LIMIT,NATIVE

2012-01-20 22:27:38,661 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.

2012-01-20 22:27:40,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: q2: Store(asv://hadoop/outjan20:org.apache.pig.builtin.PigStorage) - scope-12 Operator Key: scope-12)

2012-01-20 22:27:40,302 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false

2012-01-20 22:27:40,380 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 5

2012-01-20 22:27:40,380 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 5

2012-01-20 22:27:40,489 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job

2012-01-20 22:27:40,505 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

At this point, verification of your job is complete and the job can be sent to the job queue. If there had been a problem with your data source, or with the location where the results will be stored, the job would not have reached this point. Reaching this point means that verification is done and the map/reduce job has been pushed into the queue:

2012-01-20 22:27:41,724 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2012-01-20 22:27:41,755 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2012-01-20 22:27:42,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

Depending on your input(s), at this point the data from the data source is ready to be processed:

2012-01-20 22:27:43,771 [Thread-4] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2012-01-20 22:27:43,771 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2012-01-20 22:27:43,802 [Thread-4] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

At this point the input data source has been processed and can be used later, and a Hadoop job ID is associated with your job. If you look up this job ID at the following address, you will get more information about the tasks associated with the job:

http://<datanode_ipaddress>:50030/jobdetailshistory.jsp?*

2012-01-20 22:27:44,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201201202036_0012

2012-01-20 22:27:44,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.114.178.123:50030/jobdetails.jsp?jobid=job_201201202036_0012

2012-01-20 22:28:18,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete

2012-01-20 22:28:30,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete

2012-01-20 22:28:35,130 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

At this point a new job is being processed:

2012-01-20 22:28:35,473 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_201201202036_0013

2012-01-20 22:28:36,473 [main] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%

2012-01-20 22:29:04,489 [main] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 0%

2012-01-20 22:29:25,489 [main] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 100%

2012-01-20 22:29:36,489 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_201201202036_0013

After the job is completed, here are the job details, which are the same as what you would find on the cluster:

http://localhost:50030/jobdetailshistory.jsp?logFile=file:/c:/Apps/dist/logs/history/done/version-1/10.114.178.123_1327091770147_/2012/01/20/000000/job_201201202036_0013_1327098515411_avkash_MRjs

Visit http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/JobCounter.html to learn more about job counters.
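
Because the aggregated log above is plain text, the job IDs and the "% complete" milestones can be pulled out with a couple of regular expressions. The following is a rough sketch that assumes the line formats shown in the excerpts above; the class PigLogSketch and its methods are hypothetical and are not part of Hadoop or Pig.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PigLogSketch
{
    // Job IDs in the excerpt look like job_201201202036_0012.
    private static readonly Regex JobIdPattern = new Regex(@"job_\d+_\d+");
    // Progress lines from MapReduceLauncher end in "<n>% complete".
    private static readonly Regex ProgressPattern = new Regex(@"MapReduceLauncher - (\d+)% complete");

    // Returns each distinct job ID in order of first appearance.
    public static List<string> JobIds(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var ids = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match match in JobIdPattern.Matches(line))
            {
                if (seen.Add(match.Value)) ids.Add(match.Value);
            }
        }
        return ids;
    }

    // Returns the reported completion percentages in log order.
    public static List<int> Progress(IEnumerable<string> lines)
    {
        var percentages = new List<int>();
        foreach (string line in lines)
        {
            Match match = ProgressPattern.Match(line);
            if (match.Success) percentages.Add(int.Parse(match.Groups[1].Value));
        }
        return percentages;
    }

    public static void Main()
    {
        // Lines copied from the log excerpt above, with the logger prefix shortened.
        var lines = new[]
        {
            "2012-01-20 22:27:44,771 [main] INFO MapReduceLauncher - HadoopJobId: job_201201202036_0012",
            "2012-01-20 22:28:18,286 [main] INFO MapReduceLauncher - 10% complete",
            "2012-01-20 22:28:30,286 [main] INFO MapReduceLauncher - 20% complete",
            "2012-01-20 22:28:35,473 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_201201202036_0013"
        };
        Console.WriteLine(string.Join(", ", JobIds(lines)));
        Console.WriteLine(string.Join(", ", Progress(lines)));
    }
}
```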


<Return to section navigation list>

SQL Azure Database, Federations and Reporting

• Benjamin Guinebertière explained SQL Azure: create a login that has only access to one database | SQL Azure: créer un login qui n’a accès qu’à une base de données in a 1/20/2012 post. From the English version:

You have a SQL Azure server. You can access its databases through the administrator login you specified when you created the server.

In the SQL Azure server, you have the following databases:

  • master
  • DB001
  • DB002

You would like to create a SQL Azure login that has full access to DB001 but no access to the other databases.

Here is how to do that.

In the Windows Azure management portal, select the master database and click manage.

Connect as the SQL Azure administrator.

Create the new login by entering the following statement in a new Query:

create login DB001Admin with password = 'IYtfidgu18';
go

(Replace IYtfidgu18 with a password of your choice.) Then click Run.


Close the browser tab and go back to the Windows Azure management portal. Sele