Today at the Spark Summit in San Francisco I announced the release of Streaming SQL for Apache Spark*, part of a project to develop a complete open source framework for streaming analytics and to make these capabilities pervasive. This release is the latest milestone in our initiative with the Apache Hadoop* and Apache Spark communities and their ecosystems to accelerate Big Data analytics.
It’s a significant accomplishment.
Why? Because Moore’s Law has driven us to a tipping point. Computing technology is now so small and affordable that one can integrate intelligence into anything from bracelets to buildings. We’re in the midst of a continuously emerging digital service economy, an Internet of Things. And as we build out this global network of connected devices and machines, our opportunity (and our need) to sense, understand, and react to a world in motion is exploding.
Increasing Data, Increasing Challenges
The epic collection and availability of massive amounts of data is no longer news. The genie is out of the bottle, and demand for real-time analytics is going in only one direction: up. Enterprises want to visualize and monitor their business in real time, take immediate action, and listen to reactions in social media to better understand consumers. Governments want to make cities smarter, air cleaner, and traffic smoother. Both need to identify and contain malware and cyberattacks before the damage is done.
These usage models require data to be analyzed in the moment, including the ability to extract critical insights as it streams in. Apache Spark is well-suited to doing precisely that.
But Spark adoption has been a heavy lift. One reason is the need for specialized programming skills to achieve timely business insights. Specifically, SQL-savvy data analysts who don’t know Java* or Scala* must hire developers or purchase proprietary software to leverage Spark.
Two recent studies underscore this point:
- According to KDnuggets, which recently surveyed 3,000 members of the data mining community on 93 different tools, SQL continues to be very popular. It ranked as the #3 software tool, used by more than 30% of respondents.
- In a Stack Overflow survey, 48% of developers said they know SQL, making it the #2 most popular coding language after JavaScript (Java was third at 37.4%). Scala didn’t make the list.
Streaming SQL for Apache Spark is a query language that makes it easy and efficient for data analysts to write stream-processing applications. There is no need to learn a new programming language or to rely on infrastructure developers for streaming applications.
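To give a feel for the idea of expressing stream processing as a query, here is a minimal sketch in Python. It is not the Streaming SQL for Apache Spark syntax or API: it uses Python's built-in sqlite3 module as a stand-in and runs one SQL query over each small batch of incoming records. The table name, columns, query, and sample data are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical analyst-written query: order count and revenue per category.
QUERY = """
    SELECT category, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM events
    GROUP BY category
    ORDER BY category
"""

def run_query_per_batch(batches, query=QUERY):
    """Apply one SQL query to each micro-batch of (category, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (category TEXT, amount REAL)")
    results = []
    for batch in batches:
        conn.execute("DELETE FROM events")  # keep only the current batch
        conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
        results.append(conn.execute(query).fetchall())
    conn.close()
    return results

# Made-up sample data standing in for a live stream of events.
sample_batches = [
    [("books", 12.0), ("toys", 5.0), ("books", 8.0)],
    [("toys", 7.5)],
]
batch_results = run_query_per_batch(sample_batches)
```

The point of the sketch is the division of labor: the analyst only writes the query text, while the loop that feeds batches to it stands in for the streaming infrastructure.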
Boosting Productivity: A Real-World Case Study
Intel® Software experts recently provided this capability to JD.com, an electronic commerce company headquartered in Beijing and one of the largest online retailers in China by transaction volume.
The Need: Provide real-time “statistics as a service” for JD.com’s Cloud service.
The Problem: In order to query data and develop business logic, their SQL-savvy analyst had to work with infrastructure developers to write a streaming application in Scala or Java, and then store the data in a database. This process took days or weeks.
The Solution: The infrastructure team provided the analyst with a direct SQL interface using Streaming SQL for Apache Spark. This allowed the analyst to write new applications in just a few hours.
“It sometimes took weeks to write new business applications. It was a cumbersome process involving two different teams to analyze real-time streaming data,” said Xiaohui, Lia, Senior Development Manager. “With Streaming SQL for Apache Spark, our analysts were able to develop business applications in a matter of hours – a huge boost to productivity.”
Enabling Efficiency Everywhere
At Intel, our goal is to make streaming analytics easy and efficient for data analysts everywhere. We want to give businesses, governments, and even consumers the ability to react to events quickly and intelligently as they happen.
With the growing ubiquity of intelligent devices, smart objects and digital assistants will increasingly rely on real-time streaming capabilities to help us find things we need, discover things we want, and live more productive (and stress-free, if we’re lucky) lives.
We strive to make this happen by giving software developers the tools they need to create.
Intel Inside, New Analytics Possibilities Outside.
###
*Other names and brands may be claimed as the property of others.
No computer system can be absolutely secure. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
The post Sparking Real-Time Analytics, Igniting Real-Time Intelligence appeared first on Intel Software and Services.