Databricks Expands Platform for Turnkey Production Apache Spark Deployments in the Cloud

Company Launches Enhanced Reliability and Security Capabilities for Data Engineering on its Managed Spark Platform


LAS VEGAS, NV--(Marketwired - Nov 30, 2016) - Databricks®, the company founded by the team that created the popular Apache® Spark™ project, announced new capabilities for its platform that further simplify the production deployment of Spark in the cloud. The production enhancements complement the existing Databricks environment for data science, which enables users to collaboratively analyze data in real time with data science notebooks and immediately deploy them as production Spark jobs and workflows. The announcement was made today at the 2016 Amazon Web Services (AWS) re:Invent conference.

The production features announced today enable users to effortlessly set up and run Spark jobs and workflows via APIs without humans in the loop, monitor performance and troubleshoot errors with detailed logs, manage AWS EC2 costs with AWS Tags, control access to resources with AWS IAM Roles, and increase the scalability of long-running workloads with encrypted AWS Elastic Block Store (EBS). Databricks is the first and only vendor to offer a SOC2- and HIPAA-compliant Spark platform that provides turnkey deployment of both real-time analysis and production Spark workloads with a seamless transition from analysis to production.

As organizations across industries deploy Apache Spark in the public cloud, the task of minimizing costly downtime for mission-critical workloads, such as applications that predict equipment failure, falls on data engineering teams. Yet building sophisticated systems around Spark to ensure that such workloads are resilient, easy to troubleshoot, and secure requires a high level of technical expertise and meticulous effort that most organizations struggle to spare.

"As enterprises increasingly rely on Apache Spark to power more diverse production workloads supporting more people, it becomes critical to prevent business system outages that could cost millions of dollars," said Nik Rouda, Senior Analyst at Enterprise Strategy Group.

In Databricks' production environment, data engineers can bypass the difficult and tedious tasks of developing, configuring, tuning, and securing infrastructure, and easily meet production requirements with features such as:

  • HIPAA- and SOC2-compliant Apache Spark clusters fully managed and tuned by the Spark committers at Databricks;
  • REST APIs to orchestrate and monitor sophisticated Spark jobs and workflows programmatically, without humans in the loop;
  • End-to-end logs and performance metrics to easily debug and fine-tune Spark workloads, accessible programmatically via APIs or in the Databricks user interface;
  • Customizable AWS tags to manage the AWS EC2 usage of each Spark cluster;
  • Encrypted AWS Elastic Block Store (EBS) to increase the reliability of long-running Spark jobs on AWS EC2 instances by automatically providing additional storage;
  • AWS IAM Roles integration to provide secure access to AWS resources to diverse user groups in the same organization;
  • Direct integration with the data science environment to let organizations instantly move exploratory work to production without re-engineering;
  • SSH access to give engineers direct access to the production environment to troubleshoot and inspect the Spark clusters.

"Databricks is experiencing unprecedented demand for a robust and secure Apache Spark platform in the cloud to run production workloads," said Ali Ghodsi, CEO and Co-Founder of Databricks. "We are proud to enable one of our core user groups, the data engineers, to meet the most stringent of operational requirements."

Visit databricks.com or Booth #1341 at AWS re:Invent to learn more.

Contact Databricks to get started: http://go.databricks.com/contact-databricks.

About Databricks
Databricks' vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team that created Apache® Spark™, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache Spark project. The company has also trained over 20,000 users on Apache Spark and has the largest number of customers deploying Spark to date. Databricks provides a just-in-time data platform to simplify data integration, real-time experimentation, and robust deployment of production applications. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, contact info@databricks.com.

© Databricks 2016. All rights reserved. Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.

Contact Information:

Suzanne Block for Databricks
P: 617-824-0981
E: databricks@merrittgrp.com