Handling Categorical Data¶
Note
Several examples in this chapter use Pandas, for ease of presentation
and because it is a common tool for data manipulation. However, Pandas
is not required to create anything shown here.
Bars¶
Basic¶
Bokeh make it simple to create basic bar charts using the
hbar()
and
vbar()
glyphs methods. In the example
below, we have the following sequence of simple 1-level factors:
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
To inform Bokeh that the x-axis is categorical, we pass this list of factors
as the x_range
argument to :fund:~bokeh.plotting.figure.figure
:
p = figure(x_range=fruits, ... )
Note that passing the list of factors is a convenient shorthand notation for
creating a FactorRange
. The equivalent explicit
notation is:
p = figure(x_range=FactorRange(factors=fruits), ... )
This more explicit for is useful when you want to customize the
FactorRange
, e.g. by changing the range or category padding.
Next we can call vbar
with the list of fruit name factors as the x
coordinate, the bar height as the top
coordinate, and optionally any
width
or other properties that we would like to set:
p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)
All put together, we see the output:
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
p = figure(x_range=fruits, plot_height=250, title="Fruit Counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
As usual, the data could also be put into a ColumnDataSource
supplied as
the source
parameter to vbar
instead of passing the data directly
as parameters. Later examples will demonstrate this.
Sorted¶
Since Bokeh displays bars in the order the factors are given for the range, “sorting” bars in a bar plot is identical to sorting the factors for the range.
In the example below the fruit factors are sorted in increasing order according to their corresponding counts, causing the bars to be sorted:
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("bar_sorted.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
# sorting the bars means sorting the range factors
sorted_fruits = sorted(fruits, key=lambda x: counts[fruits.index(x)])
p = figure(x_range=sorted_fruits, plot_height=350, title="Fruit Counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
Filled¶
Colors¶
Often times we may want to have bars that are shaded some color. This can be
accomplished in different ways. One way is to supply all the colors up front.
This can be done by putting all the data, including the colors for each bar,
in a ColumnDataSource
. Then the name of the column containing the colors
is passed to vbar
as the color
(or line_color
/fill_color
)
arguments. This is shown below:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6
from bokeh.plotting import figure
output_file("colormapped_bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Spectral6))
p = figure(x_range=fruits, y_range=(0,9), plot_height=250, title="Fruit Counts",
toolbar_location=None, tools="")
p.vbar(x='fruits', top='counts', width=0.9, color='color', legend="fruits", source=source)
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
Another way to shade the bars is to use a CategoricalColorMapper
that
colormaps the bars inside the browser. There is a function
factor_cmap()
that makes this simple to do:
factor_cmap('fruits', palette=Spectral6, factors=fruits))
This can be passed to vbar
in the same way as the column name in the
previous example. Putting everything together we obtain the same plot in
a different way:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
output_file("colormapped_bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts))
p = figure(x_range=fruits, plot_height=250, toolbar_location=None, title="Fruit Counts")
p.vbar(x='fruits', top='counts', width=0.9, source=source, legend="fruits",
line_color='white', fill_color=factor_cmap('fruits', palette=Spectral6, factors=fruits))
p.xgrid.grid_line_color = None
p.y_range.start = 0
p.y_range.end = 9
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
Stacked¶
Another common operation or bar charts is to stack bars on top of one
another. Bokeh makes this easy to do with the specialized
hbar_stack()
and
vbar_stack()
functions. The example
below shows the fruits data from above, but with the bars for each
fruit type stacked instead of grouped:
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("stacked.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
colors = ["#c9d9d3", "#718dbf", "#e84d60"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, plot_height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="")
p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=data,
legend=[value(x) for x in years])
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
Note that behing the scenes, these functions work by stacking up the
successive columns in separate calls to vbar
or hbar
. This kind of
operation is akin the to dodge example above (i.e. the data in this case is
not in a “tidy” data format).
Sometimes we may want to stack bars that have both positive and negative extents. The example below shows how it is possible to create such a stacked bar chart that is split by positive and negative values:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import GnBu3, OrRd3
from bokeh.plotting import figure
output_file("stacked_split.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
exports = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
'2015' : [-1, 0, -1, -3, -2, -1],
'2016' : [-2, -1, -3, -1, -2, -2],
'2017' : [-1, -2, -1, 0, -2, -2]}
p = figure(y_range=fruits, plot_height=250, x_range=(-16, 16), title="Fruit import/export, by year",
toolbar_location=None)
p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),
legend=["%s exports" % x for x in years])
p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),
legend=["%s imports" % x for x in years])
p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None
show(p)
Hover Tools¶
For stacked bar plots, Bokeh provides some special hover variables that are useful for common cases.
When stacking bars, Bokeh automatically sets the name
property for each
layer in the stack to be the value of the stack column for that layer. This
name value is accessible to hover tools via the $name
special variable.
Additionally, the hover variable @$name
can be used to look up values from
the stack column for each layer. For instance, if a user hovers over a stack
glyph with the name "US East"
, then @$name
is equivalent to
@{US East}
.
The example below demonstrates both of these hover variables:
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("stacked.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
colors = ["#c9d9d3", "#718dbf", "#e84d60"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, plot_height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")
p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=data,
legend=[value(x) for x in years])
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
Note that it is also possible to override the value of name
by passing it
manually to vbar_stack
and hbar_stack
. In this case, $@name
will
look up the column names provided by the user.
It may also sometimes be desirable to have a different hover tool for each
layer in the stack. For such cases, the hbar_stack
and vbar_stack
functions return a list of all the renderers created (one for each stack).
These can be used to customize different hover tools for each layer:
renderers = p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source,
legend=[value(x) for x in years], name=years)
for r in renderers:
year = r.name
hover = HoverTool(tooltips=[
("%s total" % year, "@%s" % year),
("index", "$index")
], renderers=[r])
p.add_tools(hover)
Grouped¶
When creating bar charts, it is often desirable to visually display the data according to sub-groups. There are two basic methods that can be used, depending on your use case: using nested categorical coordinates, or applying vidual dodges.
Nested Categories¶
If the coordinates of a plot range and data have two or three levels, then Bokeh will automatically group the factors on the axis, including a hierarchical tick labeling with separators between the groups. In the case of bar charts, this results in bars grouped together by the top-level factors. This is probably the most common way to achieve grouped bars, especially if you are starting from “tidy” data.
The example below shows this approach by creating a single column of
coordinates that are each 2-tuples of the form (fruit, year)
. Accordingly,
the plot groups the axes by fruit type, with a single call to vbar
:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
output_file("bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack
source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
We can also apply a color mapping, similar to the earlier example. To obtain
same grouped bar plot of fruits data as above, except with the bars shaded by
the year, changethe vbar
function call to use factor_cmap
for the
fill_color
:
p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",
# use the palette to colormap based on the the x[1:2] values
fill_color=factor_cmap('x', palette=palette, factors=years, start=1, end=2))
Recall that the factors are of the for (fruit, year)
. The start=1
and end=2
in the call to factor_cmap
select the second part of data
factors to use when color mapping.
Visual Dodge¶
Another method for achieving grouped bars is to explicitly specify a visual displacement for the bars. Such a visual offset is also referred to as a dodge.
In this scenario, our data is not “tidy”. Instead a single table with
rows indexed by factors (fruit, year)
, we have separate series for each
year. We can plot all the year series using separate calls to vbar
but
since every bar in each group has the same fruit
factor, the bars would
overlap visually. We can prevent this overlap and distinguish the bars
visually by using the dodge()
function to provide an
offset for each different call to vbar
:
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
output_file("dodged_bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
source = ColumnDataSource(data=data)
p = figure(x_range=fruits, y_range=(0, 10), plot_height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="")
p.vbar(x=dodge('fruits', -0.25, range=p.x_range), top='2015', width=0.2, source=source,
color="#c9d9d3", legend=value("2015"))
p.vbar(x=dodge('fruits', 0.0, range=p.x_range), top='2016', width=0.2, source=source,
color="#718dbf", legend=value("2016"))
p.vbar(x=dodge('fruits', 0.25, range=p.x_range), top='2017', width=0.2, source=source,
color="#e84d60", legend=value("2017"))
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
Stacked and Grouped¶
The above techiques for stacking and grouping may also be used together to crate a stacked, grouped bar plot.
Continuing the example above with bars grouped by quarter, we might stack each individual bar by region.
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
output_file("bar_stacked_grouped.html")
factors = [
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
regions = ['east', 'west']
source = ColumnDataSource(data=dict(
x=factors,
east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],
west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ],
))
p = figure(x_range=FactorRange(*factors), plot_height=250,
toolbar_location=None, tools="")
p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=["blue", "red"], source=source,
legend=[value(x) for x in regions])
p.y_range.start = 0
p.y_range.end = 18
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"
show(p)
Mixed Factors¶
When dealing with hierarchical categories of two or three levels, it’s possible to use just the “higher level” portion of a coordinate to position glyphs. For example, if you have range with the hierarchical factors
factors = [
("East", "Sales"), ("East", "Marketing"), ("East", "Dev"),
("West", "Sales"), ("West", "Marketing"), ("West", "Dev"),
]
Then it is possible to use just “Sales” and “Marketing” etc. as positions
for glyphs. In this case the position is the center of the entire group. The
example below shows bars for each month, grouped by financial quarter, and
also adds a line (perhaps for a quarterly average) at the coordinates for
Q1
, Q2
, etc.:
from bokeh.io import show, output_file
from bokeh.models import FactorRange
from bokeh.plotting import figure
output_file("mixed.html")
factors = [
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
p = figure(x_range=FactorRange(*factors), plot_height=250,
toolbar_location=None, tools="")
x = [ 10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16 ]
p.vbar(x=factors, top=x, width=0.9, alpha=0.5)
p.line(x=["Q1", "Q2", "Q3", "Q4"], y=[12, 9, 13, 14], color="red", line_width=2)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
This example also demonstrates that other glyphs such as lines also function with categorical coordinates.
Pandas¶
Pandas is a powerful and common tool for doing data analysis on tabular and timeseries data in Python. Although it is not required by Bokeh, Bokeh tries to make life easier when you do.
Below is a plot that demonstrates some advantages when using Pandas with Bokeh:
Pandas
GroupBy
objects can be used to initialize aCoumnDataSource
, automatically creating columns for many statistical measures such as the group mean or countGroupBy
objects may also be passed directly as a range argument tofigure
.
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg as df
from bokeh.transform import factor_cmap
output_file("groupby.html")
df.cyl = df.cyl.astype(str)
group = df.groupby('cyl')
source = ColumnDataSource(group)
cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))
p = figure(plot_height=350, x_range=group, title="MPG by # Cylinders",
toolbar_location=None, tools="")
p.vbar(x='cyl', top='mpg_mean', width=1, source=source,
line_color=cyl_cmap, fill_color=cyl_cmap)
p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "some stuff"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
Not that in the example above, we grouped by the column 'cyl'
so our CDS
has a column 'cyl'
for this index. Additionally, other non-grouped columns
like 'mpg'
have had associated columns such 'mpg_mean'
added, that
give the mean MPG value for each group.
This usage also works when the grouping is multi-level. The example below shows
how grouping the same data by ('cyl', 'mfr')
results in a hierarchical
nested axis. In this case, the index column name 'cyl_mfr'
is made by
joining the names of the grouped columns together.
from bokeh.io import show, output_file
from bokeh.plotting import figure
from bokeh.palettes import Spectral5
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
output_file("bar_pandas_groupby_nested.html")
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(by=['cyl', 'mfr'])
index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)
p = figure(plot_width=800, plot_height=300, title="Mean MPG by # Cylinders and Manufacturer",
x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
Intervals¶
So far we have seen the bar glyphs used to create bar charts, which imply bars drawn from a common baseline. However, the bar glyphs can also be used to represent arbitrary intervals across a range.
The example below uses hbar
with both left
and right
properties
supplied, to show the spread in times between bronze and gold medalists in
Olympic sprinting over many years:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.sprint import sprint
output_file("sprint.html")
sprint.Year = sprint.Year.astype(str)
group = sprint.groupby('Year')
source = ColumnDataSource(group)
p = figure(y_range=group, x_range=(9.5,12.7), plot_width=400, plot_height=550, toolbar_location=None,
title="Time Spreads for Sprint Medalists (by Year)")
p.hbar(y="Year", left='Time_min', right='Time_max', height=0.4, source=source)
p.ygrid.grid_line_color = None
p.xaxis.axis_label = "Time (seconds)"
p.outline_line_color = None
show(p)
Scatters¶
Adding Jitter¶
When plotting many scatter points in a single categorical category, it is
common for points to start to visually overlap. In this case, Bokeh provides
a jitter()
function that can automatically apply
a random dodge to every point.
The example below shows a scatter plot of every commit time for a GitHub user
between 2012 and 2016, grouped by day of the week. A naive plot of this data
would result in thousands of points overlapping in a narrow line for each day.
By using jitter
we can differentiate the points to obtain a useful plot:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.commits import data
from bokeh.transform import jitter
output_file("bars.html")
DAYS = ['Sun', 'Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon']
source = ColumnDataSource(data)
p = figure(plot_width=800, plot_height=300, y_range=DAYS, x_axis_type='datetime',
title="Commits by Time of Day (US/Central) 2012—2016")
p.circle(x='time', y=jitter('day', width=0.6, range=p.y_range), source=source, alpha=0.3)
p.xaxis[0].formatter.days = ['%Hh']
p.x_range.range_padding = 0
p.ygrid.grid_line_color = None
show(p)
Categorical Offsets¶
We’ve seen above how categorical locations can be modified by operations like
dodge and jitter. It is also possible to supply an offset to a categorical
location explicitly. This is done by adding a numeric value to the end of a
category, e.g. ["Jan", 0.2]
is the category “Jan” offset by a value of 0.2.
For hierachical categories, the value is added at the end of the existing
list, e.g. ["West", "Sales", -0,2]
. Any numeric value at the end of a
list of categories is always interpreted as an offset.
As an example, suppose we took our first example from the beginning and modified it like this:
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
offsets = [-0.5, -0.2, 0.0, 0.3, 0.1, 0.3]
# This results in [ ['Apples', -0.5], ['Pears', -0.2], ... ]
x = list(zip(fruits, offsets))
p.vbar(x=x, top=[5, 3, 4, 2, 4, 6], width=0.8)
Then the resulting plot has bars that are horizontally shifted by the amount of each corresponding offset:
Below is a more sophisticated example of a Ridge Plot that displays timeseries associated with different categories. It uses categorical offsets to specify patch coordinates for the timeseries inside each category.
from numpy import linspace
from scipy.stats.kde import gaussian_kde
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, FixedTicker, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.sampledata.perceptions import probly
import colorcet as cc
output_file("ridgeplot.html")
def ridge(category, data, scale=20):
return list(zip([category]*len(data), scale*data))
cats = list(reversed(probly.keys()))
palette = [cc.rainbow[i*15] for i in range(17)]
x = linspace(-20,110, 500)
source = ColumnDataSource(data=dict(x=x))
p = figure(y_range=cats, plot_width=700, x_range=(-5, 105), toolbar_location=None)
for i, cat in enumerate(reversed(cats)):
pdf = gaussian_kde(probly[cat])
y = ridge(cat, pdf(x))
source.add(y, cat)
p.patch('x', cat, color=palette[i], alpha=0.6, line_color="black", source=source)
p.outline_line_color = None
p.background_fill_color = "#efefef"
p.xaxis.ticker = FixedTicker(ticks=list(range(0, 101, 10)))
p.xaxis.formatter = PrintfTickFormatter(format="%d%%")
p.ygrid.grid_line_color = None
p.xgrid.grid_line_color = "#dddddd"
p.xgrid.ticker = p.xaxis[0].ticker
p.axis.minor_tick_line_color = None
p.axis.major_tick_line_color = None
p.axis.axis_line_color = None
p.y_range.range_padding = 0.12
show(p)
Heat Maps¶
In all of the cases above, we have had one categorical axis, and one continuous axis. It is possible to have plots with two categorical axes. If we shade the rectangle that defines each pair of categories, we end up with a Categorical Heatmap
The plot below shows such a plot, where the x-axis categories are a list of
years from 1948 to 2016, and the y-axis categories are the months of the
years. Each rectangle corresponding to a (year, month)
combination is
color mapped by the unemployment rate for that month and year. Since the
unemployment rate is a continuous variable, a LinearColorMapper
is used
to colormap the plot, and is also passed to a color bar to provide a visual
legend on the right:
import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import BasicTicker, ColorBar, ColumnDataSource, LinearColorMapper, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.sampledata.unemployment1948 import data
from bokeh.transform import transform
output_file("unemploymemt.html")
data.Year = data.Year.astype(str)
data = data.set_index('Year')
data.drop('Annual', axis=1, inplace=True)
data.columns.name = 'Month'
# reshape to 1D array or rates with a month and year for each row.
df = pd.DataFrame(data.stack(), columns=['rate']).reset_index()
source = ColumnDataSource(df)
# this is the colormap from the original NYTimes plot
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
mapper = LinearColorMapper(palette=colors, low=df.rate.min(), high=df.rate.max())
p = figure(plot_width=800, plot_height=300, title="US Unemployment 1948—2016",
x_range=list(data.index), y_range=list(reversed(data.columns)),
toolbar_location=None, tools="", x_axis_location="above")
p.rect(x="Year", y="Month", width=1, height=1, source=source,
line_color=None, fill_color=transform('rate', mapper))
color_bar = ColorBar(color_mapper=mapper, location=(0, 0),
ticker=BasicTicker(desired_num_ticks=len(colors)),
formatter=PrintfTickFormatter(format="%d%%"))
p.add_layout(color_bar, 'right')
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "5pt"
p.axis.major_label_standoff = 0
p.xaxis.major_label_orientation = 1.0
show(p)
A final example combines many of the techniques in this chapter: color mappers, visual dodges, and Pandas DataFrames. These are used to create a different sort of “heatmap” that results in a periodic table of the elements. A hover tool as also been added so that additional information about each element can be inspected:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.periodic_table import elements
from bokeh.transform import dodge, factor_cmap
output_file("periodic.html")
periods = ["I", "II", "III", "IV", "V", "VI", "VII"]
groups = [str(x) for x in range(1, 19)]
df = elements.copy()
df["atomic mass"] = df["atomic mass"].astype(str)
df["group"] = df["group"].astype(str)
df["period"] = [periods[x-1] for x in df.period]
df = df[df.group != "-"]
df = df[df.symbol != "Lr"]
df = df[df.symbol != "Lu"]
cmap = {
"alkali metal" : "#a6cee3",
"alkaline earth metal" : "#1f78b4",
"metal" : "#d93b43",
"halogen" : "#999d9a",
"metalloid" : "#e08d49",
"noble gas" : "#eaeaea",
"nonmetal" : "#f1d4Af",
"transition metal" : "#599d7A",
}
source = ColumnDataSource(df)
p = figure(plot_width=900, plot_height=500, title="Periodic Table (omitting LA and AC Series)",
x_range=groups, y_range=list(reversed(periods)), toolbar_location=None, tools="hover")
p.rect("group", "period", 0.95, 0.95, source=source, fill_alpha=0.6, legend="metal",
color=factor_cmap('metal', palette=list(cmap.values()), factors=list(cmap.keys())))
text_props = {"source": source, "text_align": "left", "text_baseline": "middle"}
x = dodge("group", -0.4, range=p.x_range)
r = p.text(x=x, y="period", text="symbol", **text_props)
r.glyph.text_font_style="bold"
r = p.text(x=x, y=dodge("period", 0.3, range=p.y_range), text="atomic number", **text_props)
r.glyph.text_font_size="8pt"
r = p.text(x=x, y=dodge("period", -0.35, range=p.y_range), text="name", **text_props)
r.glyph.text_font_size="5pt"
r = p.text(x=x, y=dodge("period", -0.2, range=p.y_range), text="atomic mass", **text_props)
r.glyph.text_font_size="5pt"
p.text(x=["3", "3"], y=["VI", "VII"], text=["LA", "AC"], text_align="center", text_baseline="middle")
p.hover.tooltips = [
("Name", "@name"),
("Atomic number", "@{atomic number}"),
("Atomic mass", "@{atomic mass}"),
("Type", "@metal"),
("CPK color", "$color[hex, swatch]:CPK"),
("Electronic configuration", "@{electronic configuration}"),
]
p.outline_line_color = None
p.grid.grid_line_color = None
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_standoff = 0
p.legend.orientation = "horizontal"
p.legend.location ="top_center"
show(p)