Warning
The bokeh.charts interface is still new and is very likely to change in upcoming releases. Although we try to keep it consistent, we cannot guarantee backwards compatibility for now. Please take this into account when using it.
bokeh.charts provides a very high-level API for creating commonly used rich charts without having to access lower-level components.
The current bokeh.charts interface implementation supports the following chart types: Area, Bar, BoxPlot, HeatMap, Donut, Dot, Histogram, Horizon, Line, Scatter, Step and TimeSeries.
To use them, you only have to import the chart factory of interest from bokeh.charts:
from bokeh.charts import Histogram
initialize your plot with the chart specific arguments to customize the chart:
from collections import OrderedDict
import numpy as np

mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
normal_dist = OrderedDict(normal=normal)
hist = Histogram(normal_dist, bins=50, mu=mu, sigma=sigma,
                 title="kwargs, dict_input", ylabel="frequency", legend="top_left",
                 width=400, height=350, notebook=True)
and finally call the show() method:
hist.show()
or use the plotting interface functions:
from bokeh.plotting import output_file, show
output_file('histogram.html')
show(hist)
Charts support a long list of arguments that you can pass when instantiating a class, as we have shown before. Available optional arguments are:
bokeh.charts supports any of the following:
In general, inputs are expected to be iterables holding the values of each data series (e.g. a list of lists, or a dict/OrderedDict of lists, each containing an iterable of scalar values). This canonical format makes it easy to represent groups of data and plot them through the interface.
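As a rough illustration of this canonical format, the sketch below normalizes either a mapping or a plain sequence of series into an ordered mapping of series name to values. The helper name to_series_dict and the generated series_N names are assumptions made for this example; they are not part of bokeh.charts and do not describe how the library handles inputs internally.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_series_dict(values):
    """Normalize chart-style input into an OrderedDict of name -> list.

    Hypothetical helper for illustration only; not part of bokeh.charts.
    """
    if isinstance(values, Mapping):
        # Mappings already carry a name for each data series.
        return OrderedDict((str(k), list(v)) for k, v in values.items())
    # Plain sequences of series have no names, so positional ones are made up.
    return OrderedDict(
        ("series_%d" % i, list(v)) for i, v in enumerate(values)
    )


named = to_series_dict({"python": [2, 3, 7], "pypy": [12, 33, 47]})
unnamed = to_series_dict([[2, 3, 7], [12, 33, 47]])
print(list(named))
print(list(unnamed))
```

The point of the sketch is only that mappings bring their own series names while lists and arrays do not, which is why the two kinds of input can end up labelled differently.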
Note
The Scatter chart also supports pandas groupby objects as input. As mentioned, bokeh.charts is still very experimental, so the number of supported inputs is very likely to grow.
Here are a few examples showing charts using different kinds of inputs:
Using a pandas groupby object (only supported by Scatter):
from bokeh.sampledata.iris import flowers
from bokeh.charts import Scatter
df = flowers[["petal_length", "petal_width", "species"]]
g = df.groupby("species")
scatter = Scatter(g, filename="iris_scatter.html", title="iris dataset GroupBy")
scatter.show()
Using OrderedDict (or dict-like objects):
from collections import OrderedDict
xyvalues = OrderedDict()
for i in ['setosa', 'versicolor', 'virginica']:
    x = getattr(g.get_group(i), 'petal_length')
    y = getattr(g.get_group(i), 'petal_width')
    xyvalues[i] = list(zip(x, y))
scatter = Scatter(xyvalues, filename="iris_scatter.html", title="iris dataset, OrderedDict")
scatter.show()
Using a hierarchical pandas dataframe:
import pandas as pd
dfvalues = pd.DataFrame(xyvalues)
scatter = Scatter(dfvalues, filename="iris_scatter.html", title="iris dataset, DataFrame")
scatter.show()
Using a list:
lxyvalues = list(xyvalues.values())
scatter = Scatter(lxyvalues, filename="iris_scatter.html", title="iris dataset, List")
scatter.show()
Using a numpy array:
import numpy as np
nxyvalues = np.array(list(xyvalues.values()))
scatter = Scatter(nxyvalues, filename="iris_scatter.html", title="iris dataset, Array")
scatter.show()
All the previous examples render the same Scatter chart; the only difference is that numpy array and list inputs produce different legends than mappings such as dict, OrderedDict, pandas DataFrame or GroupBy objects (if legend is True).
For some chart types we support specific arguments that only make sense in that chart's context. For instance, if you use a TimeSeries chart, the x-values (index) for each group have to be datetime values. Or, if you want to use the Categorical HeatMap, the column names and the specified index have to be strings.
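A minimal sketch of such input checks, written in plain Python. The helper names check_datetime_index and stringify_labels are hypothetical and not part of bokeh.charts; they only illustrate preparing data so that it meets the constraints described above.

```python
from datetime import date, datetime


def check_datetime_index(index):
    """Return the index as a list, or raise if a value is not a date/datetime.

    Hypothetical helper for illustration only; not part of bokeh.charts.
    """
    values = list(index)
    # datetime is a subclass of date, so this accepts both types.
    bad = [v for v in values if not isinstance(v, date)]
    if bad:
        raise TypeError("index values must be dates, got %r" % (bad[0],))
    return values


def stringify_labels(labels):
    """Convert column or index labels (for example years) to strings."""
    return [str(label) for label in labels]


index = check_datetime_index([datetime(2000, 1, 3), datetime(2000, 1, 4)])
columns = stringify_labels([1948, 1949, 1950])
print(columns)
```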
A few more examples: as shown before, the Histogram chart requires you to set up the bins; additionally, you can pass mu and sigma to get line plots of the pdf and the cdf of the theoretical normal distribution with those parameters.
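To make the roles of bins, mu and sigma concrete, here is a sketch in plain Python that bins some samples and evaluates the pdf and cdf of the normal distribution at the bin centers. It does not reproduce how Histogram computes or draws these curves; the function names are chosen for this example only.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def histogram(samples, bins):
    """Count samples in equal-width bins; returns (counts, edges)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


random.seed(0)
mu, sigma = 0, 0.5
samples = [random.gauss(mu, sigma) for _ in range(1000)]
counts, edges = histogram(samples, bins=50)

# Evaluate the theoretical curves at the center of each bin.
centers = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
pdf = [normal_pdf(c, mu, sigma) for c in centers]
cdf = [normal_cdf(c, mu, sigma) for c in centers]
```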
In the case of Bar charts, if you pass several groups, they are shown grouped by default.
If you set the stacked argument to True, they are instead shown as stacked bars.
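The difference between the two layouts can be sketched as the vertical extent of each bar. In the example below, grouped bars all start at zero, while stacked bars start where the previous group ended. The function names and the sample counts are made up for illustration and are not taken from bokeh.charts.

```python
from collections import OrderedDict


def grouped_segments(groups):
    """Each bar spans from zero to its own value."""
    return OrderedDict(
        (name, [(0.0, float(v)) for v in values])
        for name, values in groups.items()
    )


def stacked_segments(groups):
    """Each bar starts where the previous group's bar ended."""
    n = len(next(iter(groups.values())))
    running = [0.0] * n
    segments = OrderedDict()
    for name, values in groups.items():
        tops = [bottom + v for bottom, v in zip(running, values)]
        segments[name] = list(zip(running, tops))
        running = tops
    return segments


# Two categories (for example two countries), three groups of values.
medals = OrderedDict([("bronze", [1, 2]), ("silver", [3, 0]), ("gold", [2, 2])])
grouped = grouped_segments(medals)
stacked = stacked_segments(medals)
print(stacked["gold"])
```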
So, besides the shared arguments described in Generic arguments and the general Interface inputs listed above, each class supports the following custom arguments:
Example:
from collections import OrderedDict
from bokeh.charts import Area, show, output_file
# create some example data
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111],
pypy=[12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130],
jython=[22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160],
)
output_file(filename="area.html")
area = Area(
xyvalues, title="Area Chart",
xlabel='time', ylabel='memory',
stacked=True, legend="top_left"
).legend("top_left")
show(area)
In the case where no continuous_range object is passed, it is calculated based on the data provided in values, according to the following rules:
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.olympics2014 import data
df = pd.io.json.json_normalize(data['data'])
# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 0]
df = df.sort("medals.total", ascending=False)
# get the countries and group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict(bronze=bronze, silver=silver, gold=gold)
# any of the following commented are also valid Bar inputs
#medals = pd.DataFrame(medals)
#medals = list(medals.values())
output_file("stacked_bar.html")
bar = Bar(medals, countries, title="Stacked bars", stacked=True)
show(bar)
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import BoxPlot, output_file, show
from bokeh.sampledata.olympics2014 import data
# create a DataFrame with the sample data
df = pd.io.json.json_normalize(data['data'])
# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 0]
df = df.sort("medals.total", ascending=False)
# get the countries and group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict(bronze=bronze, silver=silver, gold=gold)
# any of the following commented are valid BoxPlot inputs
#medals = pd.DataFrame(medals)
#medals = list(medals.values())
#medals = tuple(medals.values())
#medals = np.array(list(medals.values()))
output_file("boxplot.html")
boxplot = BoxPlot(
medals, marker='circle', outliers=True, title="boxplot test",
xlabel="medal type", ylabel="medal count", width=800, height=600)
show(boxplot)
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Donut, show, output_file
from bokeh.sampledata.olympics2014 import data
# throw the data into a pandas data frame
df = pd.io.json.json_normalize(data['data'])
# filter by countries with more than 8 medals and sort
df = df[df['medals.total'] > 8]
df = df.sort("medals.total", ascending=False)
# get the countries and group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values
# build a dict containing the grouped data
medals = OrderedDict()
medals['bronze'] = bronze
medals['silver'] = silver
medals['gold'] = gold
# any of the following commented are also valid Donut inputs
#medals = list(medals.values())
#medals = np.array(list(medals.values()))
#medals = pd.DataFrame(medals)
output_file("donut.html")
donut = Donut(medals, countries)
show(donut)
Example:
from collections import OrderedDict
from bokeh.charts import Dot, show, output_file
# create some example data
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26],
pypy=[12, 33, 47, 15, 126],
jython=[22, 43, 10, 25, 26],
)
# any of the following commented are also valid Dot inputs
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("dots.html")
dots = Dot(
xyvalues, cat=['lists','loops','dicts', 'gen exp', 'exceptions'],
title="Dots Example", ylabel='Performance', legend=True
)
show(dots)
Example:
from bokeh.charts import HeatMap, output_file, show
from bokeh.sampledata.unemployment1948 import data
# pandas magic
df = data[data.columns[:-2]]
df2 = df.set_index(df[df.columns[0]].astype(str))
df2.drop(df.columns[0], axis=1, inplace=True)
df3 = df2.transpose()
output_file("cat_heatmap.html")
hm = HeatMap(df3, title="categorical heatmap", width=800)
show(hm)
Example:
from collections import OrderedDict
import numpy as np
import pandas as pd
from bokeh.charts import Histogram, show, output_file
# build some distributions and load them into a dict
mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
lognormal = np.random.lognormal(mu, sigma, 1000)
distributions = OrderedDict(normal=normal, lognormal=lognormal)
# create a pandas data frame from the dict
df = pd.DataFrame(distributions)
distributions = df.to_dict()
for k, v in distributions.items():
    distributions[k] = list(v.values())
# any of the following commented are valid Histogram inputs
#df = list(distributions.values())
#df = tuple(distributions.values())
#df = tuple([tuple(x) for x in distributions.values()])
#df = np.array(list(distributions.values()))
#df = list(distributions.values())[0]
output_file("histograms.html")
hist = Histogram(df, bins=50, legend=True)
show(hist)
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import Horizon, output_file, show
# read in some stock data from the Yahoo Finance API
AAPL = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
MSFT = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
IBM = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=IBM&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
xyvalues = OrderedDict(
AAPL=AAPL['Adj Close'],
Date=AAPL['Date'],
MSFT=MSFT['Adj Close'],
IBM=IBM['Adj Close'],
)
output_file("horizon.html")
hp = Horizon(
xyvalues, index='Date',
title="horizon plot using stock inputs",
width=800, height=300
)
show(hp)
Example:
from collections import OrderedDict
from bokeh.charts import Line, show, output_file
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111],
pypy=[12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130],
jython=[22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160],
)
# any of the following commented are also valid Line inputs
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("lines.html", title="line.py example")
chart = Line(xyvalues, title="Lines", ylabel='measures', legend=True)
show(chart)
Example:
from collections import OrderedDict
from bokeh.charts import Scatter, output_file, show
from bokeh.sampledata.iris import flowers
# fill a data frame with the data of interest and create a groupby object
df = flowers[["petal_length", "petal_width", "species"]]
xyvalues = g = df.groupby("species")
# drop that groupby object into a dict
pdict = OrderedDict()
for i in g.groups.keys():
    labels = g.get_group(i).columns
    xname = labels[0]
    yname = labels[1]
    x = getattr(g.get_group(i), xname)
    y = getattr(g.get_group(i), yname)
    pdict[i] = list(zip(x, y))
# any of the following commented are also valid Scatter inputs
#xyvalues = pdict
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("iris_scatter.html")
TOOLS="resize,crosshair,pan,wheel_zoom,box_zoom,reset,previewsave"
scatter = Scatter(xyvalues, tools=TOOLS, ylabel='petal_width')
show(scatter)
Example:
from collections import OrderedDict
from bokeh.charts import Step, show, output_file
xyvalues = OrderedDict(
python=[2, 3, 7, 5, 26, 81, 44, 93, 94, 105, 66, 67, 90, 83],
pypy=[12, 20, 47, 15, 126, 121, 144, 333, 354, 225, 276, 287, 270, 230],
jython=[22, 43, 70, 75, 76, 101, 114, 123, 194, 215, 201, 227, 139, 160],
)
# any of the following commented are also valid Step inputs
#xyvalues = pd.DataFrame(xyvalues)
#xyvalues = list(xyvalues.values())
#xyvalues = np.array(list(xyvalues.values()))
output_file("steps.html", title="line.py example")
chart = Step(xyvalues, title="Steps", ylabel='measures', legend='top_left')
show(chart)
Example:
from collections import OrderedDict
import pandas as pd
from bokeh.charts import TimeSeries, show, output_file
# read in some stock data from the Yahoo Finance API
AAPL = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
MSFT = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
IBM = pd.read_csv(
"http://ichart.yahoo.com/table.csv?s=IBM&a=0&b=1&c=2000&d=0&e=1&f=2010",
parse_dates=['Date'])
xyvalues = OrderedDict(
AAPL=AAPL['Adj Close'],
Date=AAPL['Date'],
MSFT=MSFT['Adj Close'],
IBM=IBM['Adj Close'],
)
# any of the following commented are also valid TimeSeries inputs
#xyvalues = pd.DataFrame(xyvalues)
#lindex = xyvalues.pop('Date')
#lxyvalues = list(xyvalues.values())
#lxyvalues = np.array(list(xyvalues.values()))
TOOLS="resize,pan,wheel_zoom,box_zoom,reset,previewsave"
output_file("stocks_timeseries.html")
ts = TimeSeries(
xyvalues, index='Date', legend=True,
title="timeseries, pd_input", tools=TOOLS, ylabel='Stock Prices')
# usage with iterable index
#ts = TimeSeries(
# lxyvalues, index=lindex,
# title="timeseries, pd_input", ylabel='Stock Prices')
show(ts)
The following summary table makes it easier to compare those differences:
Argument | Area | Bar | BoxPlot | HeatMap | Donut | Dot | Histogram | Horizon | Line | Scatter | Step | TimeSeries |
---|---|---|---|---|---|---|---|---|---|---|---|---|
values | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
index | Yes | No | No | No | No | No | No | Yes | Yes | No | Yes | Yes |
cat | No | Yes | No | Yes | No | Yes | No | No | No | No | No | No |
stacked | Yes | Yes | No | No | No | No | No | No | No | No | No | No |
palette | No | No | No | Yes | No | No | No | No | No | No | No | No |
bins | No | No | No | No | No | No | Yes | No | No | No | No | No |
mu | No | No | No | No | No | No | Yes | No | No | No | No | No |
sigma | No | No | No | No | No | No | Yes | No | No | No | No | No |
num_folds | No | No | No | No | No | No | No | Yes | No | No | No | No |
pos_color | No | No | No | No | No | No | No | Yes | No | No | No | No |
neg_color | No | No | No | No | No | No | No | Yes | No | No | No | No |
Note
Scatter values are expected to be iterables of coupled (x, y) values, e.g. [[(1, 20), ..., (200, 21)], ..., [(1, 12), ..., (200, 19)]]
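If your x and y values live in separate sequences, a sketch like the following can pair them into that coupled form. The couple helper and the small sample values are assumptions made for this example and are not part of bokeh.charts.

```python
def couple(xs, ys):
    """Pair x and y values into a list of (x, y) tuples.

    Hypothetical helper for illustration only; not part of bokeh.charts.
    """
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same length")
    return list(zip(xs, ys))


# Made-up sample values: one x list and one y list per group.
x_by_group = {"setosa": [1.4, 1.3], "virginica": [6.0, 5.1]}
y_by_group = {"setosa": [0.2, 0.2], "virginica": [2.5, 1.9]}

# One list of coupled values per group, in the order of the groups.
values = [couple(x_by_group[name], y_by_group[name]) for name in x_by_group]
print(values)
```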
As with the low- and middle-level Bokeh plotting APIs, bokeh.charts also supports sending chart output to:
a file:
hist = Histogram(distributions, bins=50, filename="hist.html")
hist.show()
# or use
from bokeh.plotting import output_file, show
output_file('hist.html')
show(hist)
to bokeh-server:
hist = Histogram(distributions, bins=50, server=True)
hist.show()
# or use
from bokeh.plotting import output_server, show
output_server('hist')
show(hist)
to IPython notebook:
hist = Histogram(distributions, bins=50, notebook=True)
hist.show()
# or use
from bokeh.plotting import output_notebook, show
output_notebook()
show(hist)
Note
You can output to any or all of these three targets because, right now, they are not mutually exclusive.
Since the 0.8 release, chart creation is streamlined by specific objects called Builders. Builders are convenience classes that perform all the computation and validation and create the low-level geometries needed to render a high-level chart. This provides a clear pattern for extending the Charts interface with new charts. For more information, refer to the Builders reference.
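The general shape of such a pattern can be sketched in plain Python: a base class fixes the order of the steps (validate, then compute geometry) and each new chart only overrides the part that differs. The class and method names below are invented for this sketch and are not Bokeh's actual Builder API; refer to the Builders reference for the real interface.

```python
class SketchBuilder:
    """Hypothetical base class; NOT Bokeh's actual Builder API."""

    def __init__(self, values):
        self.values = values

    def validate(self):
        # Shared validation step for every chart type.
        if not self.values:
            raise ValueError("no data series given")

    def compute(self):
        # Chart-specific geometry; subclasses must override this.
        raise NotImplementedError

    def create(self):
        # Fixed pipeline: validate first, then compute the geometry.
        self.validate()
        return self.compute()


class StepSketchBuilder(SketchBuilder):
    """Turns each series into the coordinates of a step line."""

    def compute(self):
        geometry = {}
        for name, ys in self.values.items():
            xs_step, ys_step = [], []
            for i, y in enumerate(ys):
                # Each value becomes a flat segment from i to i + 1.
                xs_step.extend([i, i + 1])
                ys_step.extend([y, y])
            geometry[name] = (xs_step, ys_step)
        return geometry


geometry = StepSketchBuilder({"python": [2, 3]}).create()
print(geometry)
```

Adding another chart type in this sketch means writing one more subclass with its own compute method, while validation and the overall flow stay in the base class.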