EMC Isilon: Data Lake 2.0

ESG Solution Showcase
Date: November 2015
Author: Scott Sinclair, Analyst
Abstract: With the rise of new workloads such as big data analytics and the Internet of Things, data scales not only in the
data center, but also at enterprise edge locations and in the cloud. With the release of IsilonSD Edge and Isilon CloudPools,
EMC is extending data awareness and understanding outside of the data center to the next-generation data lake.
Introduction
When discussing the challenges of IT storage environments, identifying the underlying culprit can often be oversimplified
by focusing solely on the rapid rate of data growth. With the amount of data created and the length of time organizations
wish to store data increasing, the challenge of data growth is a very real phenomenon that can extend well beyond the
simple cost of storing and managing additional capacity. Higher levels of data growth can impact backup and protection
schemes, and create power and cooling challenges. While these challenges have created and will likely continue to create
concerns for IT storage leaders, many IT organizations are also grappling with an added layer of data storage complexity
resulting from the advent of new-generation workloads such as business intelligence (or big data) analytics and the
Internet of Things (IoT).
Digital repositories for business intelligence analytics are often referred to as data lakes. While these architectures may
provide the scale to store the added influx of content, a greater level of flexibility and manageability may be required to
make data lake architecture truly effective. In many cases, these newer workloads extend the acts of data creation and
access well beyond the somewhat predictable confines of the centralized data center. As businesses
integrate IoT workloads, sensor data may be created at the edge (i.e., a remote site or system) just as often as it is created
within the data center. Additionally, as more departments in the business look to leverage business intelligence analytics,
broad access to digital content will likely be desired from a wider range of locations. As the viability of the traditional
storage silos looks to be coming to an end, global organizations appear to require the next generation of the data lake
architecture.
EMC, a market leader in storage, understands the evolving infrastructure demands of IT organizations and has augmented
its Isilon storage technology to enable the next generation of the data lake. With the release of IsilonSD Edge and Isilon
CloudPools, the capabilities of Isilon’s OneFS file system are extended well beyond the data center. IsilonSD Edge delivers
Isilon’s OneFS with a software-deployment model for a software-defined storage solution that can leverage new or existing
commodity hardware and help simplify storage manageability at the edge. CloudPools extends OneFS capabilities to public
cloud deployments as well. The resulting solution allows a content repository to take advantage of the benefits of a public
cloud infrastructure while offering the seamless accessibility of data on-premises. With these two additions, Isilon delivers
a next-generation data lake offering with an expanded level of flexibility to serve a new generation of workloads.
This ESG Solution Showcase was commissioned by EMC and is distributed under license from ESG.
© 2015 by The Enterprise Strategy Group, Inc. All Rights Reserved.
Solution Showcase: EMC Isilon: The Next-generation Data Lake
The Need for a New Storage Architecture
In recent years, IT organizations have often looked toward scale-out storage architectures as a means to keep pace with
the challenges of data growth. With the advent of big data analytics, these scale-out architectures added new capabilities
such as broader protocol support to serve a wider variety of applications. The goal was to deliver what some in the
industry refer to as a data lake—a single, scalable storage repository of digital content that can be leveraged for business
intelligence and big data analytics. Recently, however, new innovations are driving organizations to seek a more flexible
and capable storage infrastructure layer. For example, the rise of IoT workloads and the collection of sensor data have
expanded the breadth of locations where data may be created. The emergence of public cloud storage has enticed IT
organizations to migrate data off-premises to free up on-premises resources. The net result is an increased desire to
extend the data lake concept to a more flexible and more capable storage solution.
In an attempt to quantify some of these trends, ESG recently surveyed IT decision makers responsible for their
organizations’ data storage environments, which revealed a number of insights including:

• The rapid rate of data growth continues to be a top storage challenge.
• The application/workload most widely identified as driving this data growth, and the resulting storage capacity
  spending, over the next 24 months was business intelligence and analytics.
• There is an early awareness of and focus on IoT and its potential impact on data storage infrastructure and
  strategy.1
In other words, the demand for the data lake storage infrastructure will likely continue to increase. However, as mentioned
previously, the storage silo architecture, regardless of its scalability, will likely be sub-optimal for IT organizations as they
seek to extend their business intelligence capabilities. These organizations will likely require a next-generation data lake.
To address the growing data storage demands, next-generation data lake architectures must continue to be resilient and
highly available, but for global scale, geo-dispersed protection capabilities are ideal. Planned or unplanned downtime of
the data lake can have a critical impact on the business. Additionally, the next-generation data lake cannot simply be
isolated to the data center; it should integrate data from the edge and leverage public cloud resources as well.
Software-defined Storage: IsilonSD Edge Software
EMC’s Isilon storage is a market leader in scale-out file storage and offers a robust level of capabilities designed for growing
unstructured data environments. In addition to providing a scale-out storage architecture, Isilon offers support for a
variety of storage protocols including NFS, SMB/CIFS, HDFS, and OpenStack Swift, along with automated data migration
across tiers and a solid complement of data protection capabilities including snapshots and replication. With the advent of
IsilonSD Edge, EMC is able to offer Isilon OneFS storage technology as a software-only option. With OneFS delivered as a
software-defined storage component, EMC can extend its benefits to remote office environments with a simple and flexible
software deployment model. IsilonSD Edge can be deployed on new or even existing commodity hardware to simplify the
deployment and help reduce the cost of storage equipment, power, and cooling. IsilonSD Edge continues, however, to
offer the same levels of capability as Isilon OneFS in addition to leveraging the same management tools, interfaces, and
VMware integration, increasing management simplicity.
ESG 2015 storage research conducted earlier this year identified that the emergence of software-defined storage (SDS) has
seen an emphatic level of interest from the IT industry. When IT decision makers were asked to identify their organization’s
perception of software-defined storage, 60% of IT decision makers reported that their organizations are committed to SDS
as a long-term strategy (68%) or at least conceptually interested in SDS (26%).2
1 Source: ESG Research Report, 2015 Data Storage Market Trends, October 2015.
For additional detail on the rationale driving this high level of interest, the data in Figure 1 offers perspective on the factors
responsible for the consideration of software-defined storage.3 Although many respondents identified potential
benefits that focus on cost savings by means of reducing operational or capital expenditures as drivers, the most often-cited
response across all factors was simplified storage management. This data lends credence to the potential impact that
the flexibility enabled by SDS environments can have on simplifying storage management. It is this simplicity that
contributes to a large portion of the benefit behind IsilonSD Edge and Isilon CloudPools.
Figure 1. Factors Responsible for Organization’s Consideration of Software-defined Storage
To the best of your knowledge, which of the following factors are responsible for your
organization’s consideration of software-defined storage? (Percent of respondents, N=307)

Most important factor   All factors   Factor
17%                     55%           Simplified storage management
15%                     50%           Reduction in operational expenditures
13%                     50%           Reduction in capital expenditures
17%                     50%           Total cost of ownership (TCO)
13%                     47%           Greater agility to better align with evolving and fluid needs of the business
14%                     44%           Support server virtualization workload consolidation
10%                     43%           Support virtual desktop infrastructure (VDI) deployment
1%                      1%            Don't know

Source: Enterprise Strategy Group, 2015
IsilonSD Edge and Isilon CloudPools: Delivering the Next-generation Data Lake
As mentioned previously, the data lake concept potentially represents only the first step in delivering an architecture to
serve not only the rapid rate of data growth, but also the new types of workloads being deployed by IT organizations.
The Next-generation Data Lake
The promise of big data or business intelligence can be quite alluring: Take the data you are storing already and run some
additional analysis to glean business insights, then use those insights to help your business run more efficiently and
effectively. The effectiveness of these analytics applications, however, can be limited by the storage infrastructure. If the
underlying storage foundation does not scale enough or does not offer the right performance, the completeness of the
results could suffer. As a result, the concept of a data lake emerged, offering a storage foundation designed to present the
benefits of storage consolidation, which provides simplified management and reduced infrastructure costs, but those
benefits are often limited to the data center.
2 Source: ESG Brief, Software-defined Storage Trends, September 2015.
3 Source: ibid.
To deliver a next-generation data lake, EMC has introduced a new OneFS operating system for Isilon that provides
increased reliability and availability at the core of the data center. It includes support for non-disruptive operations,
non-disruptive upgrades, and rollback of upgrades. As data lakes grow in size and become critical repositories of
massive-scale business data, resiliency is key. OneFS now also includes support for Microsoft’s SMB3 Continuous
Availability protocol, which enables newer Windows clients to seamlessly fail over in case of any outage.
EMC has also introduced IsilonSD Edge and Isilon CloudPools to extend the Isilon ecosystem well beyond the data center,
delivering a next-generation data lake architecture that can extend the aggregation and accessibility benefits to both the
edge (e.g., remote offices or sites) and the cloud. The net result significantly increases deployment and infrastructure
flexibility, helping the IT organization to design the optimal storage ecosystem for its specific workload needs.
IsilonSD Edge Benefit Overview
• Extends enterprise data lake from data center to enterprise edge locations.
• Simple, software-only deployment model.
• Ability to leverage (new or existing and unused) commodity hardware storage infrastructure, and reduce power and
  cooling.
• Improved data protection at the edge with the capabilities of Isilon.
• Support for a number of emerging use cases including IoT and analysis at the edge (health care, video surveillance,
  and content collaboration).
Addressing the Challenges of the Edge
The management of data at remote sites can create a challenge for IT administrators. Lack of direct accessibility to
storage hardware can often add an extra layer of management complexity, slowing both planned and unplanned
maintenance tasks. ESG recently conducted a research study into the challenges associated with managing remote office
environments. When IT decision makers were asked to identify their top IT priorities with respect to supporting ROBO
locations, four of the top five most-cited involved the protection, storage, and accessibility of data: improving
information security measures (45%), managing data growth (37%), improving backup and recovery processes (37%), and
improving employees’ abilities to share files/collaborate with other employees (36%).4 When considered in aggregate,
these priorities can represent a myriad of specific IT ecosystem concerns. For example, multiple challenges such as the
need for greater efficiency, management simplicity, reduced power and cooling, and superior data protection and security
can fall under managing storage growth. These priorities further support the rising interest and demand for
next-generation data lake environments that can consolidate data from the edge into a central data lake ecosystem,
simplifying the management of data on the edge.
The Promise of Cloud Infrastructure
While managing data at the edge with traditional storage can create challenges, the emergence of the public cloud storage
tiers introduces opportunities. Often, IT organizations look to off-premises cloud storage as a potential low-cost bastion for
unused, “cold” or “frozen” data storage. In ESG’s aforementioned storage research, more than one-third (37%) of IT
decision makers identified leveraging public cloud-based storage as an initiative expected to impact storage spending over
the next 12 to 18 months. This finding is understandable given the cost savings often associated with leveraging public cloud
storage tiers. These savings stem from benefits such as reduced infrastructure, simpler manageability, and reduced power
and cooling, to name a few.
4 Source: ESG Research Report, Remote Office/Branch Office Technology Trends, May 2015.
Isilon CloudPools Benefit Overview
• Integrates data center with cloud storage resources.
• Simple solution management.
• Seamless visibility to content on the cloud.
• Access to low-cost public and private cloud resources for cold, unused data.
• Data stored on cloud resources remains accessible for analysis performed in the data center on the entire data lake.
For some years, Isilon has offered policy-based, automated storage tiering on an Isilon cluster with SmartPools software to
provide the most appropriate storage resources for specific data sets. EMC’s Isilon CloudPools leverages the SmartPools
policy engine to extend storage tiering to cloud storage resources as part of a larger Isilon storage ecosystem. With
CloudPools, Isilon provides automated and policy-based data migration to the cloud as a new storage tier for less active
data sets. To secure this data, all data that is moved to the cloud with CloudPools is sharded (divided up and separated)
and then encrypted. In addition to the ability to automatically migrate data to the cloud, Isilon’s CloudPools also
provides the ability for the data to remain accessible as a part of the enterprise’s Isilon data lake. This capability
lets organizations more effectively leverage public cloud resources by allowing local on-premises workloads to retain
access to data even when it has been migrated off premises. The net result can allow for more efficient utilization of
both on- and off-premises resources, reducing the cost and complexity of data management while being transparent to users
and applications.
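The policy-based tiering and the shard-then-encrypt sequence described for CloudPools can be illustrated with a minimal Python sketch. It is not EMC's implementation or any Isilon API. The names FileRecord, TieringPolicy, select_for_cloud, shard, and prepare_for_cloud are hypothetical, the idle-time threshold is an assumed policy criterion, and the encrypt argument is a placeholder for a cipher the caller would have to supply.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical file metadata; not an Isilon or OneFS data structure."""
    path: str
    last_accessed: datetime


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical rule: files idle for at least `min_idle` go to the cloud tier."""
    name: str
    min_idle: timedelta

    def matches(self, record: FileRecord, now: datetime) -> bool:
        return now - record.last_accessed >= self.min_idle


def select_for_cloud(records: Iterable[FileRecord], policy: TieringPolicy,
                     now: datetime) -> List[FileRecord]:
    """Return the records the policy would migrate, preserving input order."""
    return [r for r in records if policy.matches(r, now)]


def shard(data: bytes, shard_size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most `shard_size` bytes."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def prepare_for_cloud(data: bytes, shard_size: int,
                      encrypt: Callable[[bytes], bytes]) -> List[bytes]:
    """Shard first, then encrypt each shard, following the order in the text.

    `encrypt` is supplied by the caller and is an assumption of this sketch;
    nothing here performs real encryption.
    """
    return [encrypt(piece) for piece in shard(data, shard_size)]


if __name__ == "__main__":
    now = datetime(2015, 11, 1)
    records = [
        FileRecord("/data/projects/plan.doc", now - timedelta(days=3)),
        FileRecord("/data/archive/logs-2012.tar", now - timedelta(days=400)),
    ]
    policy = TieringPolicy(name="cold-to-cloud", min_idle=timedelta(days=180))
    for record in select_for_cloud(records, policy, now):
        print("would migrate:", record.path)
```

Keeping the policy as plain data, separate from the migration step, mirrors the idea of one policy engine driving several tiers; the actual criteria and mechanics CloudPools uses are not specified in this paper.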
The Bigger Truth
Ultimately, an organization’s data and its technology should enable the business to do more, be more competitive, and be
more successful. Achieving these goals requires a storage architecture similar to that which Isilon is delivering with IsilonSD
Edge and CloudPools, where data can reside at the right location for the business—whether that is in the data center, at
the edge, or in the cloud—while providing the management and simplicity of one single pool. This design has become
increasingly important as organizations continue to increase their usage of analytics and extend the collection and analysis
of digital content to a wider variety of locations.
The new data lake extends beyond the data center to the edge and to the cloud, which simplifies management and
reduces storage costs and complexity. When looking to deploy a foundation for the next generation of digital workloads,
organizations should ensure that the storage foundation can provide the resiliency and flexibility to extend to all the
locations where data may be created, analyzed, and retained. EMC understands that organizations require a storage
solution that can evolve to meet the specific needs of their environments, and that those requirements will continue to
evolve with the organization’s demands. As such, EMC Isilon is delivering the next-generation data lake that supports
traditional and next-generation workloads.
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject
to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this
publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express
consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable,
criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.