Making High-level Charts

The high-level bokeh.charts interface provides a fast, convenient way to create common statistical charts with a minimum of code. Wherever possible, the interface is designed to be simple to use with Pandas: it accepts a DataFrame and the names of columns directly to specify data.

Key Concepts

Data: Input data is either a pandas.DataFrame or another table-like structure. Simpler formats are also handled by converting them to a DataFrame internally.
Smart Defaults: Charts attempt to assign unique chart attributes (color, marker, etc.) according to one or more column names, while supporting custom or advanced configuration through the same keyword argument.

Accepted Charts Data Formats

Charts use a Pandas DataFrame internally, so any inputs provided are coerced into this format. The Charts interface also accepts simpler types, which is useful for building charts quickly and avoids having to remember how to import and create a DataFrame.

The input types accepted are:

Array-like: one or more of list, tuple, numpy.ndarray, pandas.Series

Table-like:
  • records: a list(dict)
  • columns: a dict(list), pandas.DataFrame, or blaze resource
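
The difference between the array-like, records, and columns shapes can be sketched in plain Python. The helper below is hypothetical and is not how bokeh.charts performs the conversion; it only shows that all three shapes carry the same information once they are laid out by column. The function name to_columns and the fallback column name "values" are assumptions made for this illustration.

```python
def to_columns(data):
    """Coerce simple inputs into a column-oriented dict of lists.

    Hypothetical helper for illustration; not part of bokeh.charts.
    """
    # columns: a dict of lists is already column-oriented
    if isinstance(data, dict):
        return {name: list(values) for name, values in data.items()}

    data = list(data)

    # records: a list of dicts, one dict per row
    if data and all(isinstance(row, dict) for row in data):
        names = []
        for row in data:
            for name in row:
                if name not in names:
                    names.append(name)
        # a key missing from a row becomes None in that column
        return {name: [row.get(name) for row in data] for name in names}

    # array-like: one flat sequence becomes a single column
    # (the column name "values" is an arbitrary choice for this sketch)
    return {"values": data}


# toy data, not the autompg sample set
records = [{"cyl": 4, "mpg": 30.0}, {"cyl": 8, "mpg": 15.0}]
columns = {"cyl": [4, 8], "mpg": [30.0, 15.0]}

same_table = to_columns(records) == to_columns(columns)  # True
single_column = to_columns((1, 2, 3))
```

With real data, constructing a pandas.DataFrame from either the records or the columns form gives the same table.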

Attribute Specification

An AttrSpec is a model for generating a look-up from a unique data label (e.g. ('a', 3)) into a chained iterable of attribute values. This functionality is what powers one-liner chart generation, while also providing flexibility for customized inputs.

If you were to manually generate the glyphs in a plot, you might start by using Pandas groupby() to identify unique subsets of your data that you’d like to differentiate. You would iterate over each data label and data group and assign unique attributes to the group.

Simple Use Case: However, what if we don’t want one specific attribute type per group? Instead, let’s say we grouped by ['a', 'b'], where a has 3 unique values and b has 10 unique values. We want to change the color by a and change the marker by b. In the groupby iteration, you will see each value of a multiple times, meaning you’ll need some way of keeping track of which unique value of which column will result in the assignment of each attribute value.

Supporting Exploratory Use: More importantly, you’ll need to pre-define enough unique attribute values to assign one to each value you have grouped on. This isn’t necessarily complicated, but it can be especially time-consuming for new or sporadic users. The process of assigning attributes is also generally of little interest to users who prioritize interactive data discovery over novel charts. In the discovery use case, you are trying to understand what relationships exist within the data, so it is counter-productive to require the user to understand the data before plotting it.

Attribute Specifications avoid this issue, while still allowing specific behavior to be configured. The typical pattern of use is shown below in pseudocode:

from bokeh.charts import color, marker

# generally any chart attribute can be handled with attribute specifications

Chart(df, color='red')          # single constant value supported
Chart(df, color='a')            # typical use is with column name input
Chart(df, color=['a', 'b'])     # or multiple column names
Chart(df, color=color(['a', 'b']))     # equivalent to previous line

# input of custom iterables that are automatically chained
Chart(df, color=color('a', palette=['red', 'green', 'blue']))
Chart(df, color=color('a', palette=['red', 'green', 'blue']),
      marker=marker('b', markers=['circle', 'x']))
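
The bookkeeping that an attribute specification takes over can be sketched in plain Python. This is a minimal sketch, not the bokeh.charts AttrSpec class: the helper name build_attr_map is hypothetical, and sorting the unique labels is an assumption made so the result is deterministic. Each unique label is paired with the next value from a cycled iterable, so the look-up never runs out of values even when there are more labels than choices.

```python
from itertools import cycle


def build_attr_map(labels, choices):
    """Map each unique label to a choice, cycling when choices run out.

    Hypothetical helper for illustration; not the bokeh.charts AttrSpec.
    """
    unique = sorted(set(labels))
    # zip stops when the unique labels are exhausted, so cycle is safe here
    return dict(zip(unique, cycle(choices)))


# toy rows with two grouping columns, a and b
rows = [
    {"a": "x", "b": 1},
    {"a": "y", "b": 2},
    {"a": "x", "b": 3},
    {"a": "z", "b": 1},
]

# color varies with column a, marker varies with column b
color_for = build_attr_map([r["a"] for r in rows], ["red", "green", "blue"])
marker_for = build_attr_map([r["b"] for r in rows], ["circle", "x"])

# each row picks up its attributes by looking up its own labels
styled = [(color_for[r["a"]], marker_for[r["b"]]) for r in rows]
```

Because b has three unique values but only two markers were supplied, the third value wraps around to 'circle' again.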

Bar Charts

The Bar high-level chart can produce bar charts in various styles. Bar charts are configured with a DataFrame data object and a column to group by. This column labels the x-axis range. Each group is aggregated over the values column, and bars are shown for the totals:

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Bar(df, 'cyl', values='mpg', title="Total MPG by CYL")

output_file("bar.html")

show(p)

Aggregations

The agg parameter may be used to specify how each group should be aggregated:

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Bar(df, label='yr', values='mpg', agg='mean',
        title="Average MPG by YR")

output_file("bar.html")

show(p)

Available aggregations are:

  • 'sum'
  • 'mean'
  • 'count'
  • 'nunique'
  • 'median'
  • 'min'
  • 'max'
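
What each aggregation computes per group can be sketched with the standard library. This is an illustration under assumptions, not the bokeh.charts implementation: the aggregate helper is hypothetical, and the mapping from each name to a Python function is a guess at the intended meaning (for example, 'nunique' as the number of distinct values).

```python
from collections import defaultdict
from statistics import mean, median

# names mirror the agg values listed above; the mapping is an assumption
AGGREGATIONS = {
    "sum": sum,
    "mean": mean,
    "count": len,
    "nunique": lambda values: len(set(values)),
    "median": median,
    "min": min,
    "max": max,
}


def aggregate(rows, label, values, agg="sum"):
    """Group rows by the label column and reduce the values column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row[values])
    reducer = AGGREGATIONS[agg]
    return {key: reducer(vals) for key, vals in sorted(groups.items())}


# toy rows, not the autompg sample set
rows = [
    {"cyl": 4, "mpg": 30.0},
    {"cyl": 4, "mpg": 26.0},
    {"cyl": 8, "mpg": 15.0},
    {"cyl": 8, "mpg": 15.0},
]

totals = aggregate(rows, "cyl", "mpg")                   # {4: 56.0, 8: 30.0}
averages = aggregate(rows, "cyl", "mpg", agg="mean")     # {4: 28.0, 8: 15.0}
distinct = aggregate(rows, "cyl", "mpg", agg="nunique")  # {4: 2, 8: 1}
```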

Bar Width

The bar_width parameter can be used to specify the width of the bars, as a fraction of the category width:

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Bar(df, 'yr', values='displ',
        title="Total DISPL by YR", bar_width=0.4)

output_file("bar.html")

show(p)

Bar Color

The color parameter can be used to specify the color of the bars:

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Bar(df, 'yr', values='displ',
        title="Total DISPL by YR", color="wheat")

output_file("bar.html")

show(p)

Grouping

Groups in the data may be visually grouped using the group parameter:

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Bar(df, label='yr', values='mpg', agg='median', group='origin',
        title="Median MPG by YR, grouped by ORIGIN", legend='top_right')

output_file("bar.html")

show(p)

Stacking

Groups in the data may be visually stacked using the stack parameter:

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Bar(df, label='origin', values='mpg', agg='mean', stack='cyl',
        title="Avg MPG by ORIGIN, stacked by CYL", legend='top_right')

output_file("bar.html")

show(p)
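
The geometry behind stacking can be sketched as follows. Each segment of a stacked bar starts where the previous one ended, so the aggregated heights are turned into (bottom, top) pairs. The helper stack_segments and the toy heights are hypothetical and only illustrate the idea; they are not taken from bokeh.charts.

```python
def stack_segments(heights_by_label):
    """Turn per-label segment heights into (stack_key, bottom, top) tuples.

    heights_by_label maps a label to an ordered list of
    (stack_key, height) pairs. Hypothetical helper for illustration.
    """
    segments = {}
    for label, parts in heights_by_label.items():
        bottom = 0.0
        stacked = []
        for stack_key, height in parts:
            top = bottom + height
            stacked.append((stack_key, bottom, top))
            bottom = top  # the next segment starts where this one ended
        segments[label] = stacked
    return segments


# toy aggregated heights: label -> [(stack value, height), ...]
heights = {
    "origin 1": [(4, 10.0), (6, 5.0), (8, 2.5)],
    "origin 2": [(4, 7.5), (6, 2.5)],
}
segments = stack_segments(heights)
# segments["origin 1"] is [(4, 0.0, 10.0), (6, 10.0, 15.0), (8, 15.0, 17.5)]
```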

Box Plots

The BoxPlot can be used to summarize the statistical properties of different groups of data. The label specifies a column in the data to group by, and a box plot is generated for each group:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl',
            title="MPG Summary (grouped by CYL)")

output_file("boxplot.html")

show(p)

The label can also accept a list of column names, in which case the data is grouped by all the columns in the list:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label=['cyl', 'origin'],
            title="MPG Summary (grouped by CYL, ORIGIN)")

output_file("boxplot.html")

show(p)

Box Color

The color of the box in a BoxPlot can be set to a fixed color using the color parameter:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl', color='#00cccc',
            title="MPG Summary (grouped by CYL)")

output_file("boxplot.html")

show(p)

As with Bar charts, the color can also be given a column name, in which case the boxes are shaded automatically according to the group:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl', color='cyl',
            title="MPG Summary (grouped and shaded by CYL)")

output_file("boxplot.html")

show(p)

Whisker Color

The color of the whiskers can be similarly controlled using the whisker_color parameter. For a single color:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl', whisker_color='goldenrod',
            title="MPG Summary (grouped by CYL, shaded whiskers)")

output_file("boxplot.html")

show(p)

Or shaded automatically according to a column grouping:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl', whisker_color='cyl',
            title="MPG Summary (grouped and whiskers shaded by CYL)")

output_file("boxplot.html")

show(p)

Outliers

By default, BoxPlot charts show outliers above and below the whiskers. However, the display of outliers can be turned on or off with the outliers parameter:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl', outliers=False,
            title="MPG Summary (grouped by CYL, no outliers)")

output_file("boxplot.html")

show(p)
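
For readers unfamiliar with how a box plot separates whiskers from outliers, the sketch below computes the summary for one group. It assumes the common rule that points more than 1.5 times the interquartile range beyond the quartiles count as outliers; whether BoxPlot uses exactly this rule and this quartile method is not stated here, so treat both as assumptions. The helper name box_stats is hypothetical.

```python
from statistics import quantiles


def box_stats(values, whisker=1.5):
    """Return quartiles, whisker ends, and outliers for one group.

    Uses the common 1.5 * IQR rule, which is an assumption in this sketch.
    """
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # whiskers extend to the most extreme points still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "lower": min(inside),
        "upper": max(inside),
        "outliers": outliers,
    }


# toy values with one obvious outlier
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
# quartiles are 3.25, 5.5, 7.75; whiskers end at 1 and 9; 40 is an outlier
```

Under this rule, hiding outliers removes only the individually drawn points; the box and whiskers are unchanged.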

Markers

The marker used for displaying outliers is controlled by the marker parameter:

from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = BoxPlot(df, values='mpg', label='cyl', marker='square',
            title="MPG Summary (grouped by CYL, square marker)")

output_file("boxplot.html")

show(p)

Histograms

The Histogram high-level chart can be used to quickly display the distribution of values in a set of data. It can be used by simply passing it a literal sequence of values (e.g. a Python list, a NumPy array, or a Pandas DataFrame column):

from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Histogram(df['mpg'], title="MPG Distribution")

output_file("histogram.html")

show(p)

It can also be used by passing in a Pandas DataFrame as the first argument and specifying the name of the column to use for the data. The column name can be provided as the second positional argument:

from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Histogram(df, 'hp', title="HP Distribution")

output_file("histogram.html")

show(p)

Or explicitly as the values keyword argument:

from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Histogram(df, values='displ', title="DISPL Distribution")

output_file("histogram.html")

show(p)

Number of Bins

The bins argument can be used to specify the number of bins to use when computing the histogram:

from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Histogram(df, values='mpg', bins=50,
              title="MPG Distribution (50 bins)")

output_file("histogram_bins.html")

show(p)
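
The effect of the number of bins can be sketched with a small equal-width binning routine. This is a minimal sketch assuming equal-width bins that span the minimum to the maximum value; the actual binning used by Histogram may differ, and the helper name histogram is hypothetical.

```python
def histogram(values, bins=10):
    """Count values in equal-width bins spanning min(values)..max(values).

    Hypothetical helper for illustration; real binning code may differ.
    """
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        # the maximum value belongs to the last bin, not a bin past the end
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
# edges are 0.0, 2.0, ..., 10.0 and counts are [2, 2, 2, 2, 3]
```

More bins give narrower bars and a more detailed but noisier picture of the same data.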

Bar Color

It is also possible to control the color of the histogram bins by setting the color parameter:

from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Histogram(df, values='hp', color='navy', title="HP Distribution")

output_file("histogram_color.html")

show(p)

Color Groups

The color parameter can also be used to group the data. If the value of the color parameter is one of the DataFrame column names, the data is first grouped by this column, and a histogram is generated for each group. Each histogram is automatically colored differently, and a legend is displayed:

from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Histogram(df, values='hp', color='cyl',
              title="HP Distribution (color grouped by CYL)",
              legend='top_right')

output_file("histogram_color.html")

show(p)

Scatter Plots

The Scatter high-level chart can be used to generate 1D or (more commonly) 2D scatter plots. It is used by passing in a DataFrame-like object as the first argument, then specifying the columns to use for the x and y coordinates:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='mpg', y='hp', title="HP vs MPG",
            xlabel="Miles Per Gallon", ylabel="Horsepower")

output_file("scatter.html")

show(p)

Color

The color parameter can be used to control the color of the scatter markers:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='mpg', y='hp', title="HP vs MPG", color="navy",
            xlabel="Miles Per Gallon", ylabel="Horsepower")

output_file("scatter.html")

show(p)

Color Groups

If color is supplied with the name of a data column, the data is first grouped by the values of that column, and a different color is used for each group:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='mpg', y='hp', color='cyl', title="HP vs MPG (shaded by CYL)",
            xlabel="Miles Per Gallon", ylabel="Horsepower")

output_file("scatter.html")

show(p)

Legends

When grouping, a legend is usually useful, and its location can be specified with the legend parameter:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='displ', y='hp', color='cyl',
            title="HP vs DISPL (shaded by CYL)", legend="top_left",
            xlabel="Displacement", ylabel="Horsepower")

output_file("scatter.html")

show(p)

Legends are not sorted by default, but this behavior can be changed by using the legend_sort_field parameter to specify the attribute to sort by and legend_sort_direction to set the order (ascending or descending):

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='displ', y='hp', color='cyl',
            title="HP vs DISPL (shaded by CYL)", legend="top_left",
            legend_sort_field='color',
            legend_sort_direction='ascending',
            xlabel="Displacement",
            ylabel="Horsepower")

output_file("scatter.html")

show(p)

Markers

The marker parameter can be used to control the shape of the scatter marker:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='displ', y='hp', marker='square',
            title="HP vs DISPL", legend="top_left",
            xlabel="Displacement", ylabel="Horsepower")

output_file("scatter.html")

show(p)

As with color, the marker parameter can be given a column name to group by the values of that column, using a different marker shape for each group:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='displ', y='hp', marker='cyl',
            title="HP vs DISPL (marked by CYL)", legend="top_left",
            xlabel="Displacement", ylabel="Horsepower")

output_file("scatter.html")

show(p)

Often it is most useful to group both the color and marker shape together:

from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df

p = Scatter(df, x='displ', y='hp', marker='cyl', color='cyl',
            title="HP vs DISPL (shaded and marked by CYL)", legend="top_left",
            xlabel="Displacement", ylabel="Horsepower")

output_file("scatter.html")

show(p)

Chart Defaults

The bokeh.charts module contains a defaults attribute. Setting attributes on this object is an easy way to control default properties for all charts in one place. For instance:

from bokeh.charts import defaults

defaults.width = 450
defaults.height = 350

will set the default width and height for any chart. The full list of attributes that can be set can be seen in the bokeh.charts section of the Reference Guide.
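
The general pattern behind a shared defaults object can be sketched in plain Python: module-level values apply to every chart, and keyword arguments passed for an individual chart take precedence. The names below (a SimpleNamespace stand-in for defaults, the make_chart function, and the initial 600 by 400 size) are hypothetical and chosen only for illustration; they do not describe how bokeh.charts resolves its options internally.

```python
from types import SimpleNamespace

# hypothetical stand-in for a module-level defaults object;
# the starting values are arbitrary
defaults = SimpleNamespace(width=600, height=400)


def make_chart(**kwargs):
    """Resolve chart options: explicit keyword arguments win over defaults."""
    options = dict(vars(defaults))  # copy, so defaults are not mutated
    options.update(kwargs)
    return options


# change the defaults once, in one place
defaults.width = 450
defaults.height = 350

first = make_chart()            # picks up both defaults
second = make_chart(width=800)  # overrides width only
```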