Warning
The bokeh.charts interface is still new and very likely to change in upcoming releases. Although we always try to be consistent, we cannot guarantee backwards compatibility for now. Please take this into consideration when using it.
bokeh.charts provides a very high-level API for creating commonly used rich charts without having to access lower-level components.
The current bokeh.charts interface implementation supports the following chart types:
Area (overlapped and stacked)
Bar (grouped and stacked)
BoxPlot
Donut
Dot
HeatMap
Histogram
Horizon
Line
Scatter
Step
TimeSeries
To use them, you only have to import the chart factory of interest from bokeh.charts:
from bokeh.charts import Histogram
then initialize your chart, passing the chart-specific arguments that customize it:
import numpy as np
from collections import OrderedDict
mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
normal_dist = OrderedDict(normal=normal)
hist = Histogram(normal_dist, bins=50, mu=mu, sigma=sigma,
                 title="kwargs, dict_input", ylabel="frequency", legend="top_left",
                 width=400, height=350, notebook=True)
and finally call the show() method:
hist.show()
or use the plotting interface functions:
from bokeh.plotting import output_file, show
output_file('histogram.html')
show(hist)
Charts support a long list of arguments that you can pass when instantiating a class, as we have shown before. Available optional arguments are:
title (str): the title of your chart.
xlabel (str): the x-axis label of your chart.
ylabel (str): the y-axis label of your chart.
legend (str, bool): the legend of your chart.
xscale (str): the x-axis scale type of your chart.
yscale (str): the y-axis scale type of your chart.
xgrid (bool): whether to draw an x-grid.
ygrid (bool): whether to draw a y-grid.
width (int): the width of your plot in pixels.
height (int): the height of your plot in pixels.
tools (str or bool): to enable or disable the tools in your chart.
palette (list): a list containing the colormap as hex values.
filename (str or bool): the name of the file where your chart will be written.
server (str or bool): the name of your chart in the server.
notebook (bool): whether or not to output your chart into the IPython notebook.
As inputs, bokeh.charts supports any of the following:
list
dict
OrderedDict
arrays
DataFrame objects
In general, inputs are expected to be iterables in which each element holds the values of a single data series (e.g., a list of lists, or a dict/OrderedDict of lists, each series being an iterable of scalar values). The idea behind this canonical format is to make it easy to represent groups of data and plot them through the interface.
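To make the canonical format concrete, here is a minimal sketch of a check for it, assuming that a mapping's series are its values and that any other input is iterated directly. The helper name is_canonical is hypothetical and not part of bokeh.charts. Scatter expects (x, y) pairs rather than scalars, so it would fail this check by design.

```python
from collections.abc import Iterable, Mapping
from numbers import Number


def is_canonical(values):
    """Hypothetical helper (not part of bokeh.charts).

    Return True if `values` is an iterable of data series where every
    series is a 1d iterable of scalar numbers. For mappings (dict,
    OrderedDict, ...) the series are the mapping's values.
    """
    series_iter = values.values() if isinstance(values, Mapping) else values
    for series in series_iter:
        # strings are iterable, but they are not a series of scalars
        if isinstance(series, (str, bytes)) or not isinstance(series, Iterable):
            return False
        if not all(isinstance(v, Number) for v in series):
            return False
    return True


print(is_canonical([[1, 2, 3], [4, 5, 6]]))                # True: list of lists
print(is_canonical({"python": [2, 3], "pypy": [12, 33]}))  # True: dict of lists
print(is_canonical([1, 2, 3]))                             # False: bare scalars
```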
Note
The Scatter chart also supports pandas groupby objects as input. As mentioned above, bokeh.charts is still very experimental, so the number of supported inputs is very likely to grow.
Here are a few examples showing charts using different kind of inputs:
Using a pandas groupby object (only supported by Scatter):
from bokeh.sampledata.iris import flowers
from bokeh.charts import Scatter
df = flowers[["petal_length", "petal_width", "species"]]
g = df.groupby("species")
scatter = Scatter(g, filename="iris_scatter.html", title="iris dataset GroupBy")
scatter.show()
Using an OrderedDict (or dict-like objects):
from collections import OrderedDict
xyvalues = OrderedDict()
for i in ['setosa', 'versicolor', 'virginica']:
    group = g.get_group(i)
    xyvalues[i] = list(zip(group['petal_length'], group['petal_width']))
scatter = Scatter(xyvalues, filename="iris_scatter.html", title="iris dataset, OrderedDict")
scatter.show()
Using a hierarchical pandas DataFrame:
import pandas as pd
dfvalues = pd.DataFrame(xyvalues)
scatter = Scatter(dfvalues, filename="iris_scatter.html", title="iris dataset, DataFrame")
scatter.show()
Using a list:
lxyvalues = list(xyvalues.values())
scatter = Scatter(lxyvalues, filename="iris_scatter.html", title="iris dataset, List")
scatter.show()
Using a numpy array:
import numpy as np
nxyvalues = np.array(list(xyvalues.values()))
scatter = Scatter(nxyvalues, filename="iris_scatter.html", title="iris dataset, Array")
scatter.show()
All the previous examples render the same Scatter chart. The only difference is that, when legend is True, numpy array and list inputs render different legends from mappings like dict, OrderedDict, pandas DataFrame or GroupBy objects, since plain sequences carry no series names.
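The legend difference comes down to where series names can come from. The sketch below is a hypothetical helper, not bokeh.charts code: mappings supply names through their keys, while plain sequences only have positions. The positional labels used here are placeholders; the exact labels bokeh.charts generates for list and array inputs are not specified on this page.

```python
from collections import OrderedDict
from collections.abc import Mapping


def legend_labels(values):
    """Hypothetical helper: one legend label per data series.

    Mappings name their series through their keys; plain sequences
    (lists, arrays) do not, so positional placeholders are used here.
    """
    if isinstance(values, Mapping):
        return [str(key) for key in values]
    return [str(position) for position, _ in enumerate(values)]


xyvalues = OrderedDict([("setosa", [(1.4, 0.2)]), ("versicolor", [(4.7, 1.4)])])
print(legend_labels(xyvalues))                 # ['setosa', 'versicolor']
print(legend_labels(list(xyvalues.values())))  # ['0', '1']
```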
Some chart types support specific arguments that only make sense in that chart's context, and some also constrain their inputs. For instance, in a TimeSeries chart the x-value (index) of each group must be datetime values, and in the categorical HeatMap the column names and the specified index must be string values.
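These input requirements can be checked before building a chart. The two functions below are a minimal sketch with hypothetical names; bokeh.charts may validate differently or report different errors. They rely only on the standard library: datetime is a subclass of date, and pandas Timestamp subclasses datetime, so all of them pass the TimeSeries check.

```python
from datetime import date, datetime


def check_timeseries_index(index):
    """Hypothetical pre-check: every index value must be a date/datetime."""
    bad = [v for v in index if not isinstance(v, date)]
    if bad:
        raise TypeError("TimeSeries index needs datetime values, got %r" % bad[:3])


def check_heatmap_labels(columns, index):
    """Hypothetical pre-check: categorical HeatMap labels must be strings."""
    bad = [v for v in list(columns) + list(index) if not isinstance(v, str)]
    if bad:
        raise TypeError("HeatMap column names and index must be strings, got %r" % bad[:3])


check_timeseries_index([datetime(2000, 1, 3), datetime(2000, 1, 4)])  # passes
check_heatmap_labels(["Jan", "Feb"], ["1948", "1949"])                # passes
try:
    check_heatmap_labels(["Jan", "Feb"], [1948, 1949])  # integer years fail
except TypeError as err:
    print(err)
```

This is why the HeatMap example converts its index with astype(str) before plotting.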
Going ahead with a few more examples: as you have seen before, the Histogram chart requires you to set up the bins and, additionally, lets you pass mu and sigma to get the pdf and cdf line plots of the theoretical normal distribution for these parameters.
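The theoretical curves are the standard normal density and cumulative distribution for the given mu and sigma. The sketch below evaluates them with the standard library only (the cdf uses the error function); it shows the math behind the two lines, not the code bokeh.charts uses to draw them.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of N(mu, sigma**2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability of N(mu, sigma**2) up to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


mu, sigma = 0, 0.5
# sample both curves from mu - 3*sigma to mu + 3*sigma, as a line plot would
xs = [mu + sigma * k / 2 for k in range(-6, 7)]
pdf = [normal_pdf(x, mu, sigma) for x in xs]
cdf = [normal_cdf(x, mu, sigma) for x in xs]
print(round(pdf[6], 4), round(cdf[6], 4))  # peak density and median, at x = mu
```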
In the Bar chart case, if you pass several groups, they are shown grouped by default. If you set the stacked argument to True, they are shown as stacked bars instead.
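Stacking means each series is drawn on top of the running total of the series before it. The sketch below computes those (bottom, top) spans per category; the function name is hypothetical and this is the general technique, not bokeh.charts' internal code. The same cumulative logic applies to stacked Area charts.

```python
from collections import OrderedDict


def stack_series(series):
    """Compute (bottom, top) spans per category for stacked bars or areas.

    Each series starts where the running total of the previous ones ends,
    so the last series' tops equal the per-category totals.
    """
    running = None
    spans = OrderedDict()
    for name, values in series.items():
        if running is None:
            running = [0] * len(values)
        spans[name] = [(bottom, bottom + v) for bottom, v in zip(running, values)]
        running = [top for _, top in spans[name]]
    return spans


medals = OrderedDict([("bronze", [1, 2]), ("silver", [3, 0]), ("gold", [2, 5])])
for name, pairs in stack_series(medals).items():
    print(name, pairs)
```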
So, besides the shared arguments described in Generic arguments and the general Interface inputs listed above, each class supports the following custom arguments:
Area
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
index (str | 1d iterable of any sort, optional): can be used to specify a common custom index for all data series as follows:
  the key of the mapping to be used as index (and not as a data series) if area.values is a mapping (like a dict, an OrderedDict or a pandas DataFrame)
stacked (bool, optional):
  True: areas are drawn as a stack to show the relationship of parts to a whole.
  False: areas are layered on the same chart figure. Defaults to False.
Example:
from collections import OrderedDict
from bokeh.charts import Area, show, output_file
# create some example data
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111],
pypy=[12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130],
jython=[22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160],
)
output_file(filename="area.html")
area = Area(
    xyvalues, title="Area Chart",
    xlabel='time', ylabel='memory',
    stacked=True, legend="top_left"
)
show(area)
Bar
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
cat (list, optional): list of strings representing the categories. Defaults to None.
stacked (bool, optional):
  True: bars are drawn as a stack to show the relationship of parts to a whole.
  False: bars are grouped on the same chart figure. Defaults to False.
continuous_range (Range, optional): an explicit range for the continuous axis of the chart (the y-dimension). In the case where no continuous_range object is passed, it is calculated based on the data provided in values, according to the following rules:
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.olympics2014 import data
df = pd.io.json.json_normalize(data['data'])
# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 0]
df = df.sort("medals.total", ascending=False)
# get the countries and we group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict(bronze=bronze, silver=silver, gold=gold)
# any of the following (commented out) are also valid Bar inputs
#medals = pd.DataFrame(medals)
#medals = list(medals.values())
output_file("stacked_bar.html")
bar = Bar(medals, countries, title="Stacked bars", stacked=True)
show(bar)
BoxPlot
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
marker (int or string, optional): the marker type to use if outliers=True (e.g., circle). Defaults to circle.
outliers (bool, optional): whether or not to plot outliers. Defaults to True.
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.olympics2014 import data
# create a DataFrame with the sample data
df = pd.io.json.json_normalize(data['data'])
# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 0]
df = df.sort("medals.total", ascending=False)
# get the countries and group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict(bronze=bronze, silver=silver, gold=gold)
# any of the following commented are valid BoxPlot inputs
#medals = pd.DataFrame(medals)
#medals = list(medals.values())
#medals = tuple(medals.values())
#medals = np.array(list(medals.values()))
output_file("boxplot.html")
boxplot = BoxPlot(
medals, marker='circle', outliers=True, title="boxplot test",
xlabel="medal type", ylabel="medal count", width=800, height=600)
show(boxplot)
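A box plot summarizes each series by its quartiles and flags outliers. This sketch assumes the common Tukey rule (points more than 1.5 × IQR beyond the quartiles are outliers) and the standard library's statistics.quantiles (Python 3.8+); the exact quartile method and whisker rule bokeh.charts uses are not stated here, and box_stats is a hypothetical name.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Quartiles, whisker fences and outliers for one data series.

    Assumes the Tukey rule: points beyond whisker * IQR from the
    quartiles count as outliers. Requires Python 3.8+.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats["median"], stats["outliers"])
```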
Donut
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Donut, show, output_file
from bokeh.sampledata.olympics2014 import data
# throw the data into a pandas data frame
df = pd.io.json.json_normalize(data['data'])
# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 8]
df = df.sort("medals.total", ascending=False)
# get the countries and we group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict()
medals['bronze'] = bronze
medals['silver'] = silver
medals['gold'] = gold
# any of the following commented are also valid Donut inputs
#medals = list(medals.values())
#medals = np.array(list(medals.values()))
#medals = pd.DataFrame(medals)
output_file("donut.html")
donut = Donut(medals, countries)
show(donut)
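Each ring of a donut divides the full circle among its categories in proportion to their values. The sketch below computes the start and end angles of one ring; wedge_angles is a hypothetical helper and the counts are made up, so treat it as an illustration of the geometry rather than bokeh.charts' implementation.

```python
import math


def wedge_angles(values):
    """Start/end angles (radians) of the wedges of one donut ring.

    Each wedge spans a share of the full circle proportional to its value.
    """
    total = float(sum(values))
    start = 0.0
    wedges = []
    for v in values:
        end = start + 2 * math.pi * v / total
        wedges.append((start, end))
        start = end
    return wedges


# made-up totals for bronze, silver and gold medals
for medal, (start, end) in zip(["bronze", "silver", "gold"], wedge_angles([10, 10, 20])):
    print(medal, round(math.degrees(start)), round(math.degrees(end)))
```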
Dot
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
cat (list, optional): list of strings representing the categories. Defaults to None.
Example:
from collections import OrderedDict
from bokeh.charts import Dot, show, output_file
# create some example data
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26],
pypy=[12, 33, 47, 15, 126],
jython=[22, 43, 10, 25, 26],
)
# any of the following commented are also valid Dot inputs
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("dots.html")
dots = Dot(
xyvalues, cat=['lists','loops','dicts', 'gen exp', 'exceptions'],
title="Dots Example", ylabel='Performance', legend=True
)
show(dots)
HeatMap
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
cat (list, optional): list of strings representing the categories. Defaults to None.
Example:
from bokeh.charts import HeatMap, output_file, show
from bokeh.sampledata.unemployment1948 import data
# drop the last two columns, index the rows by the first column (as strings), then transpose
df = data[data.columns[:-2]]
df2 = df.set_index(df[df.columns[0]].astype(str))
df2.drop(df.columns[0], axis=1, inplace=True)
df3 = df2.transpose()
output_file("cat_heatmap.html")
hm = HeatMap(df3, title="categorical heatmap", width=800)
show(hm)
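A heat map turns each cell value into a color from its palette. The sketch below assumes a simple linear mapping that splits [vmin, vmax] into one equal-width slot per palette color; the helper name, the palette and the rates are hypothetical, and bokeh.charts' own color mapping may differ.

```python
def value_to_color(value, vmin, vmax, palette):
    """Map a value linearly onto one of the palette's hex colors.

    [vmin, vmax] is split into len(palette) equal-width slots; vmax itself
    falls into the last slot and values outside the range are clamped.
    """
    if vmax == vmin:
        return palette[0]
    fraction = (value - vmin) / float(vmax - vmin)
    slot = max(0, min(int(fraction * len(palette)), len(palette) - 1))
    return palette[slot]


palette = ["#f7fbff", "#c6dbef", "#6baed6", "#08519c"]  # light to dark blues
rates = [2.5, 4.0, 6.8, 10.8]  # made-up unemployment rates
print([value_to_color(r, min(rates), max(rates), palette) for r in rates])
```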
Histogram
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
bins (int): number of bins to use when building the Histogram.
mu (float, optional): theoretical mean value for the normal distribution. Defaults to None.
sigma (float, optional): theoretical sigma value for the normal distribution. Defaults to None.
Example:
from collections import OrderedDict
import numpy as np
import pandas as pd
from bokeh.charts import Histogram, show, output_file
# build some distributions and load them into a dict
mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
lognormal = np.random.lognormal(mu, sigma, 1000)
distributions = OrderedDict(normal=normal, lognormal=lognormal)
# create a pandas data frame from the dict
df = pd.DataFrame(distributions)
distributions = df.to_dict()
for k, v in distributions.items():
    distributions[k] = list(v.values())
# any of the following commented are valid Histogram inputs
#df = list(distributions.values())
#df = tuple(distributions.values())
#df = tuple([tuple(x) for x in distributions.values()])
#df = np.array(list(distributions.values()))
#df = list(distributions.values())[0]
output_file("histograms.html")
hist = Histogram(df, bins=50, legend=True)
show(hist)
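The bins argument sets how many equal-width intervals the data range is cut into. Below is a minimal sketch of equal-width binning using the same edge convention as numpy.histogram (every bin half-open except the last, which includes the maximum); whether bokeh.charts bins exactly this way is an assumption, and bin_counts is a hypothetical name.

```python
def bin_counts(data, bins):
    """Count samples in `bins` equal-width bins spanning [min(data), max(data)].

    Every bin is half-open [left, right) except the last, which also
    includes max(data). Returns (counts, edges) with len(edges) == bins + 1.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for x in data:
        i = int((x - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1  # max(data) goes into the last bin
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges


counts, edges = bin_counts([0.1, 0.4, 0.5, 0.9, 1.0, 2.0], bins=4)
print(counts, edges)
```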
Horizon
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
index (str | 1d iterable of any sort, optional): can be used to specify a common custom index for all data series as follows:
  the key of the mapping to be used as index (and not as a data series) if values is a mapping (like a dict, an OrderedDict or a pandas DataFrame)
num_folds (int, optional): number of folds stacked on top of each other. Defaults to 3.
pos_color (color, optional): the color of the positive folds. Defaults to #006400.
neg_color (color, optional): the color of the negative folds. Defaults to #6495ed.
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Horizon, output_file, show
# read in some stock data from the Yahoo Finance API
AAPL = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
MSFT = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
IBM = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=IBM&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
xyvalues = OrderedDict(
AAPL=AAPL['Adj Close'],
Date=AAPL['Date'],
MSFT=MSFT['Adj Close'],
IBM=IBM['Adj Close'],
)
output_file("horizon.html")
hp = Horizon(
xyvalues, index='Date',
title="horizon plot using stock inputs",
width=800, height=300
)
show(hp)
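A horizon chart slices each series into num_folds equal-height bands and overlays them, so large values show up as darker, stacked layers in a compact strip. The sketch below computes those bands with the general horizon-chart technique; it is not taken from bokeh.charts, and horizon_folds is a hypothetical name.

```python
def horizon_folds(values, num_folds=3):
    """Split a series into horizon-chart bands.

    [0, max|v|] is divided into num_folds equal bands; band k holds the part
    of each value lying between k*band and (k+1)*band. Positive and negative
    values get separate band sets (drawn with pos_color and neg_color).
    """
    peak = max(abs(v) for v in values) or 1.0
    band = peak / float(num_folds)

    def clip(v, k):
        # portion of v that falls inside band k, between 0 and band
        return min(max(v - k * band, 0.0), band)

    pos = [[clip(v, k) for v in values] for k in range(num_folds)]
    neg = [[clip(-v, k) for v in values] for k in range(num_folds)]
    return pos, neg


pos, neg = horizon_folds([3, -3, 1.5], num_folds=3)
print(pos)
print(neg)
```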
Line
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
index (str | 1d iterable of any sort, optional): can be used to specify a common custom index for all chart data series as follows:
  the key of the mapping to be used as index (and not as a data series) if values is a mapping (like a dict, an OrderedDict or a pandas DataFrame)
Example:
from collections import OrderedDict
from bokeh.charts import Line, show, output_file
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111],
pypy=[12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130],
jython=[22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160],
)
# any of the following commented are also valid Line inputs
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("lines.html", title="line.py example")
chart = Line(xyvalues, title="Lines", ylabel='measures', legend=True)
show(chart)
Scatter
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of (x, y) pairs, e.g. [(1, 2), (2, 7), ..., (20122, 91)]
Example:
from collections import OrderedDict
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.iris import flowers
# fill a data frame with the data of interest and create a groupby object
df = flowers[["petal_length", "petal_width", "species"]]
xyvalues = g = df.groupby("species")
# drop that groupby object into a dict
pdict = OrderedDict()
for i in g.groups.keys():
    group = g.get_group(i)
    # the first two columns hold the x and y values
    xname, yname = group.columns[0], group.columns[1]
    pdict[i] = list(zip(group[xname], group[yname]))
# any of the following commented are also valid Scatter inputs
#xyvalues = pdict
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("iris_scatter.html")
TOOLS="resize,crosshair,pan,wheel_zoom,box_zoom,reset,previewsave"
scatter = Scatter(xyvalues, tools=TOOLS, ylabel='petal_width')
show(scatter)
Step
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
index (str | 1d iterable of any sort, optional): can be used to specify a common custom index for all chart data series as follows:
  the key of the mapping to be used as index (and not as a data series) if values is a mapping (like a dict, an OrderedDict or a pandas DataFrame)
Example:
from collections import OrderedDict
from bokeh.charts import Step, show, output_file
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26, 81, 44, 93, 94, 105, 66, 67, 90, 83],
pypy=[12, 20, 47, 15, 126, 121, 144, 333, 354, 225, 276, 287, 270, 230],
jython=[22, 43, 70, 75, 76, 101, 114, 123, 194, 215, 201, 227, 139, 160],
)
# any of the following commented are also valid Step inputs
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("steps.html", title="line.py example")
chart = Step(xyvalues, title="Steps", ylabel='measures', legend='top_left')
show(chart)
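A step chart holds each value constant until the next sample and then jumps. The sketch below builds the vertices of such a line from (x, y) samples, assuming the value is held after each point; which step mode bokeh.charts uses is not stated here, and step_vertices is a hypothetical name.

```python
def step_vertices(xs, ys):
    """Vertices of a step line that holds each y until the next x.

    Between consecutive samples the line first moves horizontally at the
    previous y, then jumps vertically to the new y.
    """
    px, py = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i > 0:
            px.append(x)
            py.append(ys[i - 1])  # end of the horizontal segment
        px.append(x)
        py.append(y)
    return px, py


px, py = step_vertices([0, 1, 2], [5, 7, 6])
print(list(zip(px, py)))  # [(0, 5), (1, 5), (1, 7), (2, 7), (2, 6)]
```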
TimeSeries
values (see Interface inputs): data series to be plotted. Container values must be 1d iterables of scalars.
index (str | 1d iterable of any sort of datetime values, optional): can be used to specify a common custom index for all chart data series as follows:
  the key of the mapping to be used as index (and not as a data series) if values is a mapping (like a dict, an OrderedDict or a pandas DataFrame)
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import TimeSeries, show, output_file
# read in some stock data from the Yahoo Finance API
AAPL = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
MSFT = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
IBM = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=IBM&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
xyvalues = OrderedDict(
AAPL=AAPL['Adj Close'],
Date=AAPL['Date'],
MSFT=MSFT['Adj Close'],
IBM=IBM['Adj Close'],
)
# any of the following (commented out) are also valid TimeSeries inputs
#xyvalues = pd.DataFrame(xyvalues)
#lindex = xyvalues.pop('Date')
#lxyvalues = list(xyvalues.values())
#lxyvalues = np.array(list(xyvalues.values()))
TOOLS="resize,pan,wheel_zoom,box_zoom,reset,previewsave"
output_file("stocks_timeseries.html")
ts = TimeSeries(
xyvalues, index='Date', legend=True,
title="timeseries, pd_input", tools=TOOLS, ylabel='Stock Prices')
# usage with iterable index
#ts = TimeSeries(
# lxyvalues, index=lindex,
# title="timeseries, pd_input", ylabel='Stock Prices')
show(ts)
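The index argument accepts either the name of a key in the input mapping (index='Date' above) or a separate iterable (index=lindex in the commented variant). The sketch below separates the shared index from the data series for both cases; split_index is hypothetical, and falling back to positions when no index is given is an assumption, not documented bokeh.charts behavior.

```python
from collections import OrderedDict
from datetime import date


def split_index(values, index=None):
    """Separate the shared x-axis index from the data series.

    str index: pop that key from a copy of the mapping and use its values
    (it is then no longer plotted as a data series).
    iterable index: use it as given.
    None: fall back to positions 0, 1, 2, ... (an assumption).
    """
    series = OrderedDict(values)  # copy, so the caller's mapping is untouched
    if isinstance(index, str):
        idx = list(series.pop(index))
    elif index is not None:
        idx = list(index)
    elif series:
        idx = list(range(len(next(iter(series.values())))))
    else:
        idx = []
    return idx, series


data = OrderedDict([("AAPL", [1.0, 1.1]),
                    ("Date", [date(2000, 1, 3), date(2000, 1, 4)]),
                    ("MSFT", [2.0, 2.2])])
idx, series = split_index(data, index="Date")
print(idx)
print(list(series))  # ['AAPL', 'MSFT']
```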
Here you can find a summary table that makes it easier to group and visualize those differences:
Argument | Area | Bar | BoxPlot | HeatMap | Donut | Dot | Histogram | Horizon | Line | Scatter | Step | TimeSeries |
---|---|---|---|---|---|---|---|---|---|---|---|---|
values | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
index | Yes | No | No | No | No | No | No | Yes | Yes | No | Yes | Yes |
cat | No | Yes | No | Yes | No | Yes | No | No | No | No | No | No |
stacked | Yes | Yes | No | No | No | No | No | No | No | No | No | No |
palette | No | No | No | Yes | No | No | No | No | No | No | No | No |
bins | No | No | No | No | No | No | Yes | No | No | No | No | No |
mu | No | No | No | No | No | No | Yes | No | No | No | No | No |
sigma | No | No | No | No | No | No | Yes | No | No | No | No | No |
num_folds | No | No | No | No | No | No | No | Yes | No | No | No | No |
pos_color | No | No | No | No | No | No | No | Yes | No | No | No | No |
neg_color | No | No | No | No | No | No | No | Yes | No | No | No | No |
Note
Scatter values are expected to be iterables of coupled (x, y) values, e.g.: [[(1, 20), ..., (200, 21)], ..., [(1, 12), ..., (200, 19)]]
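When the data arrives as flat rows rather than pre-grouped pairs, it can be reshaped into this coupled-values format with the standard library. This is a minimal sketch with a hypothetical helper and made-up rows; it produces the same dict-of-(x, y)-lists shape as the OrderedDict example built from the iris groupby earlier.

```python
from collections import OrderedDict


def rows_to_xy_groups(rows):
    """Group (x, y, label) rows into an OrderedDict of label -> [(x, y), ...].

    The result has the coupled-values shape Scatter expects: one series per
    label, each an iterable of (x, y) pairs. Labels keep first-seen order.
    """
    groups = OrderedDict()
    for x, y, label in rows:
        groups.setdefault(label, []).append((x, y))
    return groups


rows = [(1.4, 0.2, "setosa"), (4.7, 1.4, "versicolor"), (1.3, 0.2, "setosa")]  # made-up rows
print(rows_to_xy_groups(rows))
```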
As with the low- and middle-level Bokeh plotting APIs, bokeh.charts also supports sending the chart output to:
a file:
hist = Histogram(distributions, bins=50, filename="hist.html")
hist.show()
# or use
from bokeh.plotting import output_file, show
output_file('hist.html')
show(hist)
to bokeh-server:
hist = Histogram(distributions, bins=50, server=True)
hist.show()
# or use
from bokeh.plotting import output_server, show
output_server('hist')
show(hist)
to the IPython notebook:
hist = Histogram(distributions, bins=50, notebook=True)
hist.show()
# or use
from bokeh.plotting import output_notebook, show
output_notebook()
show(hist)
Note
You can output to any or all of these three targets because, right now, they are not mutually exclusive.
Since the 0.8 release, chart creation is streamlined by specific objects called Builders. Builders are convenience classes that perform all the computation and validation and create the low-level geometries needed to render a high-level chart. This provides a clear pattern for extending the Charts interface with new charts. For more information, refer to the Builders reference.
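The pattern described here can be pictured as a template method: a base class fixes the order of the steps (validate, compute, make geometries) and each chart type fills in its own computation. The classes below are a hypothetical sketch of that idea only; they are not bokeh.charts' Builder API, whose actual method names and signatures are documented in the Builders reference.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class BuilderSketch(ABC):
    """Hypothetical base class, NOT bokeh.charts' Builder.

    build() fixes the order of the steps; subclasses fill in the
    chart-specific computation and geometry.
    """

    def __init__(self, values):
        self.values = OrderedDict(values)

    def validate(self):
        if not self.values:
            raise ValueError("at least one data series is required")

    @abstractmethod
    def compute(self):
        """Turn the input series into the data the glyphs need."""

    @abstractmethod
    def make_glyphs(self, data):
        """Describe the low-level geometries to draw."""

    def build(self):
        self.validate()
        return self.make_glyphs(self.compute())


class BarBuilderSketch(BuilderSketch):
    def compute(self):
        # one (series, position, height) triple per bar
        return [(name, pos, height)
                for name, heights in self.values.items()
                for pos, height in enumerate(heights)]

    def make_glyphs(self, data):
        return [{"glyph": "rect", "series": name, "x": pos, "height": height}
                for name, pos, height in data]


glyphs = BarBuilderSketch(OrderedDict([("gold", [2, 5])])).build()
print(glyphs)
```

Under this design, adding a new chart type means writing one subclass with its own compute and make_glyphs, while validation and the build sequence are shared.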