Databricks Announces DataFrames for Spark

SAN FRANCISCO, CA--(Marketwired - Feb 17, 2015) - Databricks, the company founded by the creators of the popular open-source big data processing engine Apache Spark and whose flagship product is Databricks Cloud, today announced that it has developed DataFrames, a new API for Spark 1.3 designed to simplify distributed data processing and to let data scientists who are familiar with single-machine tools use Spark immediately. DataFrames will support a variety of data sources and automatically optimize computations, making big data processing easier and accelerating the rate at which organizations can put big data to use.

The DataFrames API, inspired by data frames in R and Python (Pandas), gives data scientists a familiar and efficient interface that they can now run on big data, making large-scale computing accessible to users with experience in these languages. The API builds on the Spark SQL query optimizer to automatically execute code efficiently on a cluster of machines. Finally, through Spark SQL's external data sources API, DataFrames can access a wide array of third-party data sources such as databases and NoSQL stores. The DataFrames API will be incorporated into Spark 1.3, to be released in early March, and is expected to play a major role in the R API for Spark coming later this year.

"There has always been a major gap in the skill sets required to work on 'small data' and 'big data'. By adding DataFrames to Spark, we are making the power of the Spark ecosystem available to the wide set of data scientists and analysts. Along with our other ongoing work in Spark and Databricks Cloud, we expect this feature to make big data accessible to a much wider range of users," said Matei Zaharia, CTO of Databricks and the creator of Apache Spark.

Matei Zaharia Keynote at Strata and Hadoop World
Databricks is also announcing today that its chief technology officer and co-founder, Matei Zaharia, will deliver a keynote at Strata and Hadoop World. Zaharia's keynote, "New Directions for Spark in 2015," will address the significant momentum Spark is gathering in 2015, specifically how its improvements and additions in 2014 have positioned the platform for enterprise utilization, performance, and scalability. The session will highlight key framework enhancements coming to Spark in 2015 and what these enhancements mean for pushing the envelope in the cloud analytics space.

What: "New Directions for Spark in 2015"
Who: Matei Zaharia, chief technology officer and co-founder at Databricks
When: Friday, February 20 at 9:15am PT
Where: Strata and Hadoop World (San Jose Convention Center, Grand Ballroom 220)

Zaharia will be joined at the conference by several Databricks colleagues who will also be presenting, including Paco Nathan, Patrick Wendell, Reynold Xin and more, to address Spark adoption and growth slated for 2015.

Spark at Strata and Hadoop World
For the first time, Strata and Hadoop World is including a Spark-focused track at the conference, "Spark in Action," where attendees can get deep dives into best practices, architectural considerations, and real-world case studies drawn from organizations ranging from startups to large enterprises. Databricks will train more than 300 attendees via interactive demos and tutorials, deliver an advanced training program for more than 40 people, and run a Spark certification exam in partnership with O'Reilly. This certification exam is the industry's first and most comprehensive Spark certification.

Additional Resources:

More information on the DataFrames API can be found here: https://databricks.com/?p=2757
All of the Databricks-led sessions and keynotes can be found here: http://strataconf.com/big-data-conference-ca-2015/public/schedule/speakers
Apache Spark developer certification exam, Fri Feb 20 10:40-12:40 PST: https://www.eventbrite.com/e/developer-certification-for-apache-spark-san-jose-2015-registration-14954172332
More information on the Spark track at Strata and Hadoop World can be found here: http://strataconf.com/big-data-conference-ca-2015/public/schedule
Bay Area Spark Users meetup, Tue Feb 17 19:00-21:00 PST, "DataFrames for Large-Scale Data Science", Reynold Xin: http://www.meetup.com/spark-users/events/220031485/

About Databricks:
Databricks was founded by the team that created and continues to drive Apache Spark, the most active open source project in the Big Data ecosystem. Databricks' vision is to dramatically simplify big data processing and free users to focus on turning data into value. Databricks Cloud, a cloud platform built around Apache Spark, delivers on this vision by combining the power of Spark with a zero-management hosted platform and an initial set of applications built around common workflows. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, visit http://www.databricks.com.

Contact Information:

For media inquiries:
Suzanne Block
617-824-0981
databricksmg@merrittgrp.com

Company Unveils Contributions to Spark Roadmap, Delivers Spark Education Sessions and Certification at Strata and Hadoop World