Spark Streaming Dashboard with Dynamic Widgets

Using the new Jupyter extensions Dynamic Dashboards and Declarative Widgets along with Apache Spark (PySpark), we can build a dashboard that visualizes data as it changes over time.

We have built an example notebook as a tutorial on how to construct such a dashboard. The link points to the version at the time of this writing; feel free to browse the GitHub repository for newer versions as the projects evolve. You can try the notebook yourself on our temporary notebook site at http://jupyter.cloudet.xyz. Click the Meetup.com Streaming Demo link at the bottom of the first page. This tutorial works best in Google Chrome.

Meetup WebSocket Stream

In this tutorial, we use the publicly available Meetup WebSocket RSVP stream as our data source. This is a real-time WebSocket stream of RSVPs made by Meetup users. The notebook is fully documented and will walk you through the code in detail.

To run the dashboard, select the View -> View Dashboard menu item in the notebook, then select the Cell -> Run All menu item. The dashboard is then ready to stream. To start the stream, click the toggle button that reads “Stream” in the top corner. The dashboard should look like the screenshot in Figure 1.

Figure 1: The full dashboard

The rest of this article will illustrate key techniques used to move data between the server and the browser, and describe a potential business scenario that could make use of the concepts and technologies presented here.

Data Flow

Note: This section assumes prior knowledge of Python, Jupyter notebooks, and Spark Streaming.

Using Python and PySpark, we read in and process the RSVP data stream to extract Meetup event topics. Using the tools provided by Declarative Widgets, we can control the flow of data from the Spark stream to the browser and back to the Spark stream. This allows us to visualize the data as well as control and manipulate the data flow using UI elements.
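As a rough illustration of the topic-extraction step, the sketch below pulls topic names out of a single RSVP message. The field names (group, group_topics, topic_name) are assumptions about the shape of the RSVP JSON, and extract_topics is a hypothetical helper, not a function from the notebook.

```python
import json


def extract_topics(raw_message):
    """Return the topic names attached to one RSVP message.

    Assumption: topics are nested under group -> group_topics, each entry
    carrying a "topic_name" field. Adjust the keys if the feed differs.
    """
    try:
        rsvp = json.loads(raw_message)
    except ValueError:
        # Skip malformed messages rather than failing the whole batch.
        return []
    if not isinstance(rsvp, dict):
        return []
    group = rsvp.get("group")
    if not isinstance(group, dict):
        return []
    topics = group.get("group_topics") or []
    return [
        topic["topic_name"]
        for topic in topics
        if isinstance(topic, dict) and "topic_name" in topic
    ]


# A made-up message in the assumed shape, for illustration only.
sample = json.dumps({
    "group": {
        "group_topics": [
            {"urlkey": "big-data", "topic_name": "Big Data"},
            {"urlkey": "apache-spark", "topic_name": "Apache Spark"},
        ]
    }
})

print(extract_topics(sample))
```

In a Spark job, a function like this could be applied to each message with a flatMap so that every topic becomes its own record.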

Once we have the Spark streaming data in a format we would like to visualize, we can send it to the browser using an urth-core-channel connection. In a notebook code cell, we send data over the channel under the name myvar, where data is in a structure consumable by the browser, such as a pandas DataFrame or a Python dictionary. Then, in an HTML code cell, we write a template element that binds to that variable.

The myvar variable will be sent over the urth-core-channel and bound to the Polymer template as myvar. The template in this example creates a bar chart visualizing the data in myvar.data, which should be an array of arrays, and myvar.columns, which should be an array of column names.
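To make the expected shape concrete, here is a minimal sketch that turns a dictionary of topic counts into a structure with a columns list and a data array of arrays. The helper name to_chart_payload and the column labels "Topic" and "Count" are illustrative assumptions; the sketch only builds the value and does not call the channel itself.

```python
from collections import Counter


def to_chart_payload(topic_counts, top_n=10):
    """Build a chart-friendly dictionary from a {topic: count} mapping.

    "columns" is a list of column names and "data" is a list of rows,
    matching the myvar.columns / myvar.data layout described above.
    """
    top = Counter(topic_counts).most_common(top_n)
    return {
        "columns": ["Topic", "Count"],  # hypothetical column labels
        "data": [[topic, count] for topic, count in top],
    }


payload = to_chart_payload(
    {"Big Data": 7, "Hiking": 3, "Apache Spark": 5},
    top_n=2,
)
print(payload)
```

Limiting the rows to the top few topics keeps the bar chart readable as the number of distinct topics grows.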

If the Python code that sets myvar is executed as part of a Spark DStream operation, the chart will be continuously updated as new RSVP data comes in (Figure 2).

Figure 2: Animated bar chart
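The per-batch update could be sketched roughly as follows. To keep the example runnable without a Spark cluster, batches are plain lists of (topic, count) pairs and publish is a stand-in callable for the channel; handle_batch, fake_publish, and the column labels are all hypothetical names.

```python
from collections import Counter

# Running totals across all batches seen so far.
running_totals = Counter()


def handle_batch(batch_counts, publish, top_n=10):
    """Fold one micro-batch into the running totals and publish the result.

    batch_counts: iterable of (topic, count) pairs for one batch.
    publish: callable taking (name, value); a stand-in for sending the
             value over the channel as myvar.
    """
    for topic, count in batch_counts:
        running_totals[topic] += count
    top = running_totals.most_common(top_n)
    publish("myvar", {
        "columns": ["Topic", "Count"],
        "data": [[topic, count] for topic, count in top],
    })


# Stand-in publisher that records what would be sent to the browser.
sent = []


def fake_publish(name, value):
    sent.append((name, value))


# Two simulated micro-batches.
batches = [
    [("Big Data", 2), ("Hiking", 1)],
    [("Big Data", 1), ("Apache Spark", 4)],
]
for batch in batches:
    handle_batch(batch, fake_publish)

print(sent[-1][1]["data"])
```

With Spark Streaming, one plausible wiring is to call a function like this from foreachRDD on a DStream of (topic, count) pairs, passing in rdd.collect(), so that each micro-batch triggers one chart update.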

Now that we can visualize streaming data, it would be nice to manipulate the data stream using UI controls. For example, we can filter the incoming data by hooking an input field to an urth-core-function, which can bind a value from a UI widget to a Python function argument and invoke the function when the argument value changes. The Python function, set_topic_filter, stores the value it receives in a global variable, topic_filter.

The global value topic_filter could be broadcast to a Spark stream to filter the data. We can then define another template with an urth-core-function to hook up a <paper-input> to the Python function.

This will cause the Python function set_topic_filter to be invoked when the <paper-input> value changes. Such a filter can be seen in Figure 1.
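A minimal sketch of the Python side of this filter might look like the following. The names set_topic_filter and topic_filter come from the text above; the normalization, the matches_filter helper, and the case-insensitive substring match are assumptions made for illustration.

```python
# Current filter text; an empty string means "show everything".
topic_filter = ""


def set_topic_filter(value):
    """Store the latest filter text typed into the input field."""
    global topic_filter
    # Normalize so that matching is case-insensitive and ignores padding.
    topic_filter = (value or "").strip().lower()


def matches_filter(topic_name):
    """Return True if the topic passes the current filter."""
    return topic_filter in topic_name.lower()


topics = ["Big Data", "Apache Spark", "Hiking"]

# Simulate the user typing into the input field.
set_topic_filter("  Spark ")
print([t for t in topics if matches_filter(t)])
```

This sketch applies the filter to a local list. In a streaming job, the current filter value would still need to reach the code that processes each batch, for example by reading it on the driver when each batch is handled.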

The above has been a brief overview of streaming dashboards with Spark and notebooks. If you’re ready to dive into the details, remember, you can try the notebook yourself on our temporary notebook site at http://jupyter.cloudet.xyz. Click the Meetup.com Streaming Demo link at the bottom of the first page.

Business Scenario

The work we have done in this tutorial can be extended to meet real business needs and gain business insights. Consider this potential business scenario:

We want to increase awareness and knowledge of our product, and one way to do that is to increase attendance at upcoming Meetups relating to our product. To help us target people who may be interested, we can build a dashboard to collect and monitor Meetup user actions using the public Meetup APIs, such as the RSVP stream we have used in this tutorial. This dashboard can use Spark and the Jupyter Declarative Widgets components to help gain the insights we need.

By filtering on topics relevant to our product, we could find popular upcoming meetups that we can sponsor or otherwise take part in to drive awareness of our product. We can use more declarative widgets, such as a map, to further focus on relevant meetups and topics. Finally, we could add a way for the user to act on the insights gained from the dashboard.

Dynamic Dashboards and Declarative Widgets can help us harness the power of streaming analytics. They allow us to assemble a UI from pre-built building blocks that let us see what we need to see while data is being processed in real time. This is a powerful way for businesses to quickly start understanding their data and drive business insight.

If you are interested in learning more about our organization or emerging technologies such as dashboards or widgets, contact us.

Drew Logsdon

Advisory Software Engineer at IBM
Drew is a software developer in IBM Emerging Internet Technologies. He enjoys learning new things and working on User Interfaces.