The high-level bokeh.charts interface provides a fast, convenient way to create common statistical charts with a minimum of code. Wherever possible, the interface is designed to be simple to use with Pandas: it accepts a DataFrame and the names of columns directly to specify data.
Warning
This guide describes a new charts API introduced in release 0.10. Some older chart types have not yet been converted. However, this new API is such an important and dramatic improvement that it was decided not to wait any longer to release it. All of the older charts are still available in the bokeh._legacy_charts module, which will be removed later, once all chart types are converted to the new API.
The Bar high-level chart can produce bar charts in various styles. Bar charts are configured with a DataFrame data object and a column to group by. This column labels the x-axis range. Each group is aggregated over the values column, and bars are shown for the totals:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, 'cyl', values='mpg', title="Total MPG by CYL")
output_file("bar.html")
show(p)
The agg parameter may be used to specify how each group should be aggregated:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, label='yr', values='mpg', agg='mean',
title="Average MPG by YR")
output_file("bar.html")
show(p)
Available aggregations are:
'sum'
'mean'
'count'
'nunique'
'median'
'min'
'max'
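To make the aggregation names concrete, here is a minimal pure-Python sketch of what grouping by a label column and aggregating a values column amounts to. It is not Bokeh's implementation: the aggregate helper, the AGGREGATIONS table, and the sample rows are hypothetical, and each name is assumed to behave like the Pandas aggregation of the same name.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical mapping from aggregation name to a function over a list of values.
AGGREGATIONS = {
    "sum": sum,
    "mean": mean,
    "count": len,
    "nunique": lambda values: len(set(values)),
    "median": median,
    "min": min,
    "max": max,
}

def aggregate(rows, label, values, agg="sum"):
    """Group rows (dicts) by the label column and aggregate the values column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row[values])
    func = AGGREGATIONS[agg]
    return {key: func(vals) for key, vals in groups.items()}

# Made-up sample rows, for illustration only.
rows = [
    {"cyl": 4, "mpg": 30.0},
    {"cyl": 4, "mpg": 26.0},
    {"cyl": 8, "mpg": 15.0},
    {"cyl": 8, "mpg": 15.0},
    {"cyl": 8, "mpg": 18.0},
]

totals = aggregate(rows, "cyl", "mpg")             # totals, as in the first Bar example
averages = aggregate(rows, "cyl", "mpg", "mean")   # one mean per group
```

Each resulting dictionary has one entry per distinct label value, which corresponds to one bar per group in the chart.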
The bar_width parameter can be used to specify the width of the bars, as a fraction of the category width (for example, 0.4 for 40%):
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, 'yr', values='displ',
title="Total DISPL by YR", bar_width=0.4)
output_file("bar.html")
show(p)
The color parameter can be used to specify the color of the bars:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, 'yr', values='displ',
title="Total DISPL by YR", color="wheat")
output_file("bar.html")
show(p)
Groups in the data may be visually grouped using the group parameter:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, label='yr', values='mpg', agg='median', group='origin',
title="Median MPG by YR, grouped by ORIGIN", legend='top_right')
output_file("bar.html")
show(p)
Groups in the data may be visually stacked using the stack parameter:
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Bar(df, label='origin', values='mpg', agg='mean', stack='cyl',
title="Avg MPG by ORIGIN, stacked by CYL", legend='top_right')
output_file("bar.html")
show(p)
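As a rough illustration of what stacking means geometrically, the sketch below computes where each segment of a single stacked bar starts and ends. The stack_segments helper and the input values are hypothetical; this only shows the cumulative layout idea and makes no claim about how Bar computes it internally.

```python
def stack_segments(heights):
    """Return (bottom, top) pairs for segments stacked in the given order."""
    segments = []
    bottom = 0.0
    for height in heights:
        top = bottom + height
        segments.append((bottom, top))
        bottom = top  # the next segment starts where this one ends
    return segments

# Hypothetical aggregated values for one label, one entry per stack group.
segments = stack_segments([10.0, 5.0, 2.5])
```

With grouping, by contrast, each of these values would be drawn as its own bar starting at zero, placed side by side within the category.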
The BoxPlot chart can be used to summarize the statistical properties of different groups of data. The label parameter specifies a column in the data to group by, and a box plot is generated for each group:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl',
title="MPG Summary (grouped by CYL)")
output_file("boxplot.html")
show(p)
The label parameter can also accept a list of column names, in which case the data is grouped by all the columns in the list:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label=['cyl', 'origin'],
title="MPG Summary (grouped by CYL, ORIGIN)")
output_file("boxplot.html")
show(p)
The color of the box in a BoxPlot can be set to a fixed color using the color parameter:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl', color='#00cccc',
title="MPG Summary (grouped by CYL)")
output_file("boxplot.html")
show(p)
As with Bar charts, the color parameter can also be given a column name, in which case the boxes are shaded automatically according to the group:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl', color='cyl',
title="MPG Summary (grouped and shaded by CYL)")
output_file("boxplot.html")
show(p)
The color of the whiskers can be similarly controlled using the whisker_color parameter. For a single color:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl', whisker_color='goldenrod',
title="MPG Summary (grouped by CYL, shaded whiskers)")
output_file("boxplot.html")
show(p)
Or shaded automatically according to a column grouping:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl', whisker_color='cyl',
title="MPG Summary (grouped and whiskers shaded by CYL)")
output_file("boxplot.html")
show(p)
By default, BoxPlot charts show outliers above and below the whiskers. However, the display of outliers can be turned on or off with the outliers parameter:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl', outliers=False,
title="MPG Summary (grouped by CYL, no outliers)")
output_file("boxplot.html")
show(p)
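For readers unfamiliar with how boxes, whiskers, and outliers relate, the sketch below computes them for one group using only the standard library. The text above does not say which rule BoxPlot uses to separate whiskers from outliers, so the common 1.5 * IQR convention is an assumption here, as are the box_stats helper and the sample data.

```python
from statistics import quantiles

def box_stats(values, whisker_factor=1.5):
    """Quartiles, whisker ends, and outliers under the (assumed) 1.5 * IQR rule.

    Requires at least two data points.
    """
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Made-up values with one obvious outlier.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
```

Under this convention, hiding outliers changes only which points are drawn; the box and whisker positions stay the same.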
The marker used for displaying outliers is controlled by the marker parameter:
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = BoxPlot(df, values='mpg', label='cyl', marker='square',
title="MPG Summary (grouped by CYL, square marker)")
output_file("boxplot.html")
show(p)
The Histogram high-level chart can be used to quickly display the distribution of values in a set of data. It can be used by simply passing it a literal sequence of values (e.g. a Python list, a NumPy array, or a Pandas DataFrame column):
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df['mpg'], title="MPG Distribution")
output_file("histogram.html")
show(p)
It can also be used by passing in a Pandas DataFrame as the first argument and specifying the name of the column to use for the data. The column name can be provided as the second positional argument:
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df, 'hp', title="HP Distribution")
output_file("histogram.html")
show(p)
Or explicitly as the values keyword argument:
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df, values='displ', title="DISPL Distribution")
output_file("histogram.html")
show(p)
The bins argument can be used to specify the number of bins to use when computing the histogram:
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df, values='mpg', bins=50,
title="MPG Distribution (50 bins)")
output_file("histogram_bins.html")
show(p)
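To show what the number of bins controls, here is a small standard-library sketch of equal-width binning. Equal-width bins spanning the data range are an assumption (this is the default behavior of numpy.histogram, but the text above does not confirm how Histogram bins its data), and the histogram_counts helper and sample values are hypothetical.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if width > 0:
            # The maximum value lands on the last edge, so clamp it into the last bin.
            index = min(int((v - low) / width), bins - 1)
        else:
            index = 0  # all values identical: everything goes in the first bin
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return counts, edges

# Made-up values, for illustration only.
counts, edges = histogram_counts([1, 2, 2, 3, 5, 8, 9, 9, 9, 10], bins=3)
```

More bins give narrower bars and a more detailed (but noisier) picture of the distribution; fewer bins smooth it out.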
It is also possible to control the color of the histogram bins by setting the color parameter:
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df, values='hp', color='navy', title="HP Distribution")
output_file("histogram_color.html")
show(p)
However, the color parameter can also be used to group the data. If the value of the color parameter is one of the DataFrame column names, the data is first grouped by this column, and a histogram is generated for each group. Each histogram is automatically colored differently, and a legend is displayed:
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df, values='hp', color='cyl',
title="HP Distribution (color grouped by CYL)",
legend='top_right')
output_file("histogram_color.html")
show(p)
The Scatter high-level chart can be used to generate 1D or (more commonly) 2D scatter plots. It is used by passing in a DataFrame-like object as the first argument, then specifying the columns to use for the x and y coordinates:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='mpg', y='hp', title="HP vs MPG",
xlabel="Miles Per Gallon", ylabel="Horsepower")
output_file("scatter.html")
show(p)
The color parameter can be used to control the color of the scatter markers:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='mpg', y='hp', title="HP vs MPG", color="navy",
xlabel="Miles Per Gallon", ylabel="Horsepower")
output_file("scatter.html")
show(p)
If color is supplied with the name of a data column, the data is first grouped by the values of that column, and a different color is used for every group:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='mpg', y='hp', color='cyl', title="HP vs MPG (shaded by CYL)",
xlabel="Miles Per Gallon", ylabel="Horsepower")
output_file("scatter.html")
show(p)
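The idea of shading by group can be sketched without Bokeh as a mapping from each distinct column value to a color. The palette, the sorted assignment order, and the assign_group_colors helper below are all hypothetical; the colors Bokeh actually assigns are not specified in this guide.

```python
from itertools import cycle

# Hypothetical palette, chosen only for illustration.
PALETTE = ["navy", "firebrick", "olive", "goldenrod"]

def assign_group_colors(values, palette=PALETTE):
    """Map each distinct value (in sorted order) to a palette color, cycling if needed."""
    colors = cycle(palette)
    return {value: next(colors) for value in sorted(set(values))}

# Made-up column of cylinder counts.
cyl = [4, 8, 6, 4, 8, 3, 5]
color_map = assign_group_colors(cyl)

# One color per data point, looked up from its group.
point_colors = [color_map[c] for c in cyl]
```

The same lookup structure would apply to marker shapes when grouping by marker instead of color.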
When grouping, a legend is usually useful, and its location can be specified by the legend parameter:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='displ', y='hp', color='cyl',
title="HP vs DISPL (shaded by CYL)", legend="top_left",
xlabel="Displacement", ylabel="Horsepower")
output_file("scatter.html")
show(p)
The marker parameter can be used to control the shape of the scatter marker:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='displ', y='hp', marker='square',
title="HP vs DISPL", legend="top_left",
xlabel="Displacement", ylabel="Horsepower")
output_file("scatter.html")
show(p)
As with color, the marker parameter can be given a column name to group by the values of that column, using a different marker shape for each group:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='displ', y='hp', marker='cyl',
title="HP vs DISPL (marked by CYL)", legend="top_left",
xlabel="Displacement", ylabel="Horsepower")
output_file("scatter.html")
show(p)
Often it is most useful to group both the color and marker shape together:
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Scatter(df, x='displ', y='hp', marker='cyl', color='cyl',
title="HP vs DISPL (marked by CYL)", legend="top_left",
xlabel="Displacement", ylabel="Horsepower")
output_file("scatter.html")
show(p)
The bokeh.charts module contains a defaults attribute. Setting attributes on this object is an easy way to control default properties on all charts created, in one place. For instance:
from bokeh.charts import defaults
defaults.width = 450
defaults.height = 350
These assignments set the default width and height for any chart. The full list of attributes that can be set is below:
ChartOptions(**kwargs)
Bases: bokeh.plot_object.PlotObject
legend
property type: Either(Bool, Enum('top_right', 'top_left', 'bottom_left', 'bottom_right'))
A location where the legend should draw itself.
notebook
property type: Either(Bool, String)
Whether to display the plot inline in an IPython/Jupyter notebook.
responsive
property type: Bool
If True, the chart will automatically resize based on the size of its container. The aspect ratio of the plot will be preserved, but plot_width and plot_height will act only to set the initial aspect ratio.
[
{
"attributes": {
"doc": null,
"filename": false,
"height": 400,
"id": "bda3a822-b175-4b8b-9efc-315307744f1c",
"legend": null,
"name": null,
"notebook": false,
"responsive": false,
"server": false,
"tags": [],
"title": null,
"title_text_font_size": "12pt",
"tools": true,
"width": 600,
"xgrid": true,
"xlabel": null,
"xscale": "auto",
"ygrid": true,
"ylabel": null,
"yscale": "auto"
},
"id": "bda3a822-b175-4b8b-9efc-315307744f1c",
"type": "ChartOptions"
}
]
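The shared-defaults pattern described above can be illustrated with a small stand-alone sketch. The ChartDefaults class and the make_chart function are hypothetical stand-ins, not part of Bokeh; only the initial width and height values (600 and 400) are taken from the listing above.

```python
class ChartDefaults:
    """Hypothetical stand-in for a module-level defaults object."""

    def __init__(self):
        self.width = 600
        self.height = 400

# A single shared instance, analogous to bokeh.charts.defaults.
defaults = ChartDefaults()

def make_chart(title, **overrides):
    """Build a chart description, filling unspecified options from `defaults`."""
    options = {"width": defaults.width, "height": defaults.height}
    options.update(overrides)  # explicit keyword arguments win over defaults
    options["title"] = title
    return options

# Change the defaults once, in one place.
defaults.width = 450
defaults.height = 350

a = make_chart("first")               # picks up both new defaults
b = make_chart("second", width=800)   # overrides the width only
```

Because the defaults are read when each chart is created, charts built before the assignment keep their old size, and any chart can still override a default explicitly.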