New Alluxio Release Accelerates Cloud Deployments for Analytics and Machine Learning

Customers can improve performance and reduce cost with location-aware data management; new object storage optimizations ease migration from HDFS to the cloud


SAN MATEO, Calif., July 31, 2018 (GLOBE NEWSWIRE) -- Alluxio, developer of the world's first software system that unifies data at memory speed, today announced the release of Alluxio 1.8 to accelerate cloud adoption for analytics and machine learning workloads. Location-aware data management tools provide a wide range of policy-based controls within hybrid cloud environments and within cloud availability zones. Optimizations for object storage and for each major cloud provider close semantic differences with the Hadoop Distributed File System (HDFS) and ease application portability between cloud platforms. The new Filesystem in Userspace (FUSE) interface enables machine learning frameworks to access cloud data as if it were in a local file system.

Additionally, developers now have deeper insight into data metrics within an Alluxio cluster, as well as expanded visibility into the application and persistent storage layers. Increased metrics coverage within the Alluxio cluster, the application layer, and the underlying storage systems makes it easier to connect and manage multiple data sources. A new dashboard in the UI provides overall health and utilization metrics, with accompanying command line interface (CLI) tools for live cluster statistics. All remote procedure call (RPC) requests are recorded, providing a detailed set of machine-consumable metrics with API-based statistics generation for third-party tools such as Grafana and Prometheus. Developers can quickly diagnose storage system performance issues with new tools that include latency histograms and capacity utilization reporting. The application configuration checker provides a one-click integration check for third-party applications such as Hive, MapReduce, Spark, and others.

“Innovative companies are looking for new ways to interact with data in a complex ecosystem with a wide variety of application frameworks, heterogeneous storage systems, and hybrid cloud environments,” said Haoyuan Li, co-founder and CEO of Alluxio. “Alluxio is innovating rapidly and leveraging the open source model to give developers new capabilities to extract value from their data and build new services as enterprises navigate the digital transformation.”

Developing for cloud environments (public, private, and hybrid) is paramount for modern application deployment. New data management features in Alluxio include location-aware data capabilities, so companies can set policies to control data placement across availability zones and simplify tiered data placement. These features help organizations ensure business continuity and also boost performance by putting data close to compute resources for cloud workloads. Department-level isolation can be enforced, and expensive data transfers between zones can be avoided, all backed by a single persistent data source. Alluxio can be deployed in a container and is certified for use with Kubernetes container orchestration.

This release includes optimizations that apply object storage semantics to the interfaces for major public cloud providers. A standard Amazon Web Services (AWS) S3 interface ensures broad support for independent storage vendors. Applications connect to Alluxio and can access data from cloud storage without code changes. The standard interface also ensures application portability across cloud vendors.

Object storage optimizations ease migration from expensive HDFS storage solutions to more cost-effective object storage. They also help decouple compute and storage in big data deployments by making it easier to move live datasets to HDFS when required and to keep the rest of the data in object storage. Performance optimizations include intelligent metadata services that speed up critical big data operations such as directory listing (‘ls’) and ‘rename’. Alluxio implements POSIX-style security to maintain compatibility with permission models such as Access Control Lists (ACLs) when moving to the cloud.

The new Filesystem in Userspace (FUSE) interface brings the power of Alluxio to data scientists and analysts without involving IT operations, while providing improved insight and control for both developers and administrators. Data sources mount like a local filesystem, a key feature that simplifies self-service data access and is particularly useful for frameworks such as TensorFlow.

To help developers and data scientists take advantage of Alluxio faster, the latest release of Alluxio includes a new starter kit. The kit includes:

  • Alluxio pre-built binaries;
  • How-to guide: Install Alluxio on a local machine;
  • How-to guide: Install Alluxio, mount an S3 bucket, and accelerate remote reads;
  • Video: walk-through from installation to accelerating remote reads;
  • How-to guide: Run Spark on Alluxio;
  • Learn more: Architecture overview.

To improve usability, the new release simplifies cluster configuration management. Centralized configuration settings can be applied at the master and propagated automatically through the cluster. Different client applications, such as Spark jobs, can initialize their configuration by retrieving the defaults from the master. Improved journaling and snapshots provide guaranteed data consistency, faster restarts, and disaster recovery support.

Alluxio solutions help customers across a wide range of use cases maximize the value of their data. Alluxio enables a flexible data infrastructure that meets the volume, variety, and velocity challenges of data-driven enterprises with a scalable virtual data layer in the cloud, on-premises, or hybrid. With Alluxio, customers can scale beyond petabytes across storage silos, geographic locations, and cloud providers, allowing concurrent access to shared data sources without modifying applications. Alluxio provides standard access to multiple object or file data sources concurrently to deliver data at memory speed regardless of physical location.

About Alluxio
Proven at global web scale in production for modern data services, Alluxio is the world's first system that unifies data at memory speed. Alluxio provides a single-source virtual data layer connecting applications to data in any format, running on premises or in public clouds. Intelligent caching and data management deliver fast performance, data consistency, and high availability to customers in financial services, high tech, retail, and telecommunications. Venture-backed by Andreessen Horowitz, Alluxio was founded at UC Berkeley’s AMPLab by the creators of the Tachyon open source project. For more information, contact info@alluxio.com or follow us on LinkedIn, Twitter or Facebook.

Editorial Contact
Lonn Johnston for Alluxio
+1.650.219.7764
lonn@flak42.com