Handling categorical data#
In addition to plotting numerical data on continuous ranges, you can also use Bokeh to plot categorical data on categorical ranges.
Basic categorical ranges are represented in Bokeh as sequences of strings. For example, a list of the four seasons:
seasons = ["Winter", "Spring", "Summer", "Fall"]
Bokeh can also handle hierarchical categories. For example, you can use nested sequences of strings to represent the individual months within each yearly quarter:
months_by_quarter = [
("Q1", "Jan"), ("Q1", "Feb"), ("Q1", "Mar"),
("Q2", "Apr"), ("Q2", "May"), ("Q2", "Jun"),
("Q3", "Jul"), ("Q3", "Aug"), ("Q3", "Sep"),
("Q4", "Oct"), ("Q4", "Nov"), ("Q4", "Dec"),
]
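If the hierarchy already lives in a mapping, the nested factors can be generated rather than typed out. The following is a minimal sketch, assuming a hypothetical `quarters` dictionary (an illustrative name, not part of Bokeh) that maps each quarter to its months:

```python
# Hypothetical mapping from quarter to months (illustrative, not a Bokeh API)
quarters = {
    "Q1": ["Jan", "Feb", "Mar"],
    "Q2": ["Apr", "May", "Jun"],
    "Q3": ["Jul", "Aug", "Sep"],
    "Q4": ["Oct", "Nov", "Dec"],
}

# One (quarter, month) tuple per month, in insertion order
months_by_quarter = [
    (quarter, month)
    for quarter, months in quarters.items()
    for month in months
]
```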
Depending on the structure of your data, you can use different kinds of charts: bar charts, categorical heatmaps, jitter plots, and others. This chapter will present several kinds of common plot types for categorical data.
Bars#
One of the most common ways to handle categorical data is to present it in a bar chart. Bar charts have one categorical axis and one continuous axis. Bar charts are useful when there is one value to plot for each category.
The values associated with each category are represented by drawing a bar for that category. The length of this bar along the continuous axis corresponds to the value for that category.
Bar charts may also be stacked or grouped together according to hierarchical sub-categories. This section will demonstrate how to draw a variety of different categorical bar charts.
Basic#
To create a basic bar chart, use the hbar() (horizontal bars) or vbar() (vertical bars) glyph methods. The example below shows a sequence of simple 1-level categories.
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
To assign these categories to the x-axis, pass this list as the x_range argument to figure().
p = figure(x_range=fruits, ... )
Doing so is a convenient shorthand for creating a FactorRange object.
The equivalent explicit notation is:
p = figure(x_range=FactorRange(factors=fruits), ... )
This form is useful when you want to customize the FactorRange, for example, by changing the range or category padding.
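As a small sketch of that customization, the snippet below builds the range explicitly and sets `range_padding`, a `FactorRange` property that later examples in this chapter also adjust. The value 0.1 is only illustrative.

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

# Explicit form: configure the range before handing it to figure()
x_range = FactorRange(factors=fruits, range_padding=0.1)
p = figure(x_range=x_range, height=250, title="Fruit counts")

# Shorthand form: passing the list also results in a FactorRange
q = figure(x_range=fruits, height=250)
```

Both figures end up with a `FactorRange` on the x-axis; only the explicit form lets you set its properties at construction time.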
Next, call vbar() with the list of fruit names as the x coordinate and the bar height as the top coordinate. You can also specify width or other optional properties.
p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)
Combining the above produces the following output:
from bokeh.io import output_file, show
from bokeh.plotting import figure
output_file("bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
p = figure(x_range=fruits, height=250, title="Fruit counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
You can also assign the data to a ColumnDataSource and supply it as the source parameter to vbar() instead of passing the data directly as parameters. You will see this in later examples.
Sorting#
To order the bars of a given plot, sort the categories by value.
The example below sorts the fruit categories in ascending order based on counts and rearranges the bars accordingly.
from bokeh.io import output_file, show
from bokeh.plotting import figure
output_file("bar_sorted.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
# sorting the bars means sorting the range factors
sorted_fruits = sorted(fruits, key=lambda x: counts[fruits.index(x)])
p = figure(x_range=sorted_fruits, height=350, title="Fruit counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
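The `fruits.index(x)` lookup above is easy to read but rescans the list for every item. One alternative, sketched here in plain Python, pairs each category with its count before sorting. Passing `reverse=True` gives descending order.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

# Pair each fruit with its count, then sort the pairs by count (ascending)
pairs = sorted(zip(fruits, counts), key=lambda pair: pair[1])
sorted_fruits = [fruit for fruit, _ in pairs]

# Largest first; sorted() is stable, so ties keep their original order
pairs_desc = sorted(zip(fruits, counts), key=lambda pair: pair[1], reverse=True)
sorted_desc = [fruit for fruit, _ in pairs_desc]
```

Either list can be passed as `x_range` to `figure()` in the same way as `sorted_fruits` in the example above.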
Filling#
Colors#
You can color the bars in several ways:
Supply all the colors along with the rest of the data to a ColumnDataSource and assign the name of the color column to the color argument of vbar().
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Bright6
from bokeh.plotting import figure
output_file("colormapped_bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Bright6))
p = figure(x_range=fruits, y_range=(0,9), height=250, title="Fruit counts",
toolbar_location=None, tools="")
p.vbar(x='fruits', top='counts', width=0.9, color='color', legend_field="fruits", source=source)
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
You can also use the color column with the line_color and fill_color arguments to change outline and fill colors, respectively.
Use the CategoricalColorMapper model to map bar colors in a browser. You can do this with the factor_cmap() function.
factor_cmap('fruits', palette=Bright6, factors=fruits)
You can then pass the result of this function to the color argument of vbar() to achieve the same result:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Bright6
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
output_file("colormapped_bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts))
p = figure(x_range=fruits, height=250, toolbar_location=None, title="Fruit counts")
p.vbar(x='fruits', top='counts', width=0.9, source=source, legend_field="fruits",
line_color='white', fill_color=factor_cmap('fruits', palette=Bright6, factors=fruits))
p.xgrid.grid_line_color = None
p.y_range.start = 0
p.y_range.end = 9
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
See Using mappers for more information on using Bokeh’s color mappers.
Stacking#
To stack vertical bars, use the vbar_stack() function. The example below uses three sets of fruit data. Each set corresponds to a year. This example produces a bar chart for each set and stacks each fruit’s bar elements on top of each other.
from bokeh.io import output_file, show
from bokeh.palettes import HighContrast3
from bokeh.plotting import figure
output_file("stacked.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, height=250, title="Fruit counts by year",
toolbar_location=None, tools="")
p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
legend_label=years)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
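Conceptually, each layer of a stack starts where the layers before it end. The following plain-Python sketch computes those bottom and top coordinates for the data above. It illustrates the arithmetic only and is not how Bokeh implements stacking internally.

```python
years = ["2015", "2016", "2017"]
data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 4, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# Running total per fruit; every stack starts at the baseline of 0
running = [0] * len(data[years[0]])
layers = {}
for year in years:
    bottoms = list(running)
    tops = [b + v for b, v in zip(bottoms, data[year])]
    layers[year] = (bottoms, tops)
    running = tops  # the next layer starts where this one ends
```

The final `running` list holds the total height of each stack, which can help when choosing an explicit end for the y-range.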
You can also stack bars that represent positive and negative values:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import GnBu3, OrRd3
from bokeh.plotting import figure
output_file("stacked_split.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
exports = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
'2015' : [-1, 0, -1, -3, -2, -1],
'2016' : [-2, -1, -3, -1, -2, -2],
'2017' : [-1, -2, -1, 0, -2, -2]}
p = figure(y_range=fruits, height=250, x_range=(-16, 16), title="Fruit import/export, by year",
toolbar_location=None)
p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),
legend_label=["%s exports" % x for x in years])
p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),
legend_label=["%s imports" % x for x in years])
p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None
show(p)
Tooltips#
Bokeh automatically sets the name property of each layer to its name in the data set. You can use the $name variable to display the names on tooltips. You can also use the @$name tooltip variable to retrieve values for each item in a layer from the data set.
The example below demonstrates both behaviors:
from bokeh.io import output_file, show
from bokeh.palettes import HighContrast3
from bokeh.plotting import figure
output_file("stacked.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, height=250, title="Fruit counts by year",
toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")
p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
legend_label=years)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
You can override the value of name by passing it manually to the vbar_stack or hbar_stack function. In this case, @$name will correspond to the names you provide.
The hbar_stack and vbar_stack functions return a list of all the renderers (one per bar stack). You can use this list to customize the tooltips for each layer.
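from bokeh.models import HoverTool
# assumes colors (one color per year) and source (the data) are already defined
renderers = p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source,
legend_label=years, name=years)
# add one hover tool per layer, restricted to that layer's renderer
for r in renderers:
    year = r.name
    hover = HoverTool(tooltips=[
        ("%s total" % year, "@%s" % year),
        ("index", "$index")
    ], renderers=[r])
    p.add_tools(hover)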
Grouping#
Instead of stacking, you also have the option to group the bars. Depending on your use case, you can achieve this in two ways:
Nested categories#
If you provide several subsets of data, Bokeh automatically groups the bars into labeled categories, tags each bar with the name of the subset it represents, and adds a separator between the categories.
The example below creates a sequence of fruit-year pairs (tuples) and groups the bars by fruit name with a single call to vbar().
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
output_file("bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack
source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), height=250, title="Fruit counts by year",
toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
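The `sum(zip(...), ())` idiom interleaves the yearly columns so that the values line up with the `(fruit, year)` factors. An equivalent sketch using `itertools.chain`, which some readers may find easier to follow:

```python
from itertools import chain

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# Same factor order as in the example above: all years for one fruit, then the next fruit
x = [(fruit, year) for fruit in fruits for year in years]

# zip(...) yields one (2015, 2016, 2017) triple per fruit; chain flattens the triples
counts = list(chain.from_iterable(zip(*(data[year] for year in years))))
```

Because the list of columns is built from `years`, adding another year to `years` and `data` requires no further changes.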
To apply different colors to the bars, use factor_cmap() for fill_color in the vbar() function call as follows:
p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",
# use the palette to colormap based on the x[1:2] values
fill_color=factor_cmap('x', palette=palette, factors=years, start=1, end=2))
The start=1 and end=2 in the call to factor_cmap() use the year in the (fruit, year) pair for color mapping.
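To see which part of each factor the mapper looks at, the plain-Python sketch below applies the same `[1:2]` slice to a few factors. It only illustrates the slicing; the actual color mapping is performed by Bokeh in the browser.

```python
x = [("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015")]

start, end = 1, 2

# Slicing each (fruit, year) tuple with [start:end] leaves only the year level
sub_factors = [factor[start:end] for factor in x]
years_used = [sub[0] for sub in sub_factors]
```

Each entry of `years_used` is then what gets matched against the `factors=years` list to pick a color from the palette.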
Visual offset#
Take a scenario with separate sequences of (fruit, year) pairs instead of a single data table. You can plot the sequences with separate calls to vbar(). However, since every bar in each group belongs to the same fruit category, the bars will overlap. To avoid this behavior, use the dodge() function to provide an offset for each call to vbar().
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
output_file("dodged_bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
source = ColumnDataSource(data=data)
p = figure(x_range=fruits, y_range=(0, 10), height=250, title="Fruit counts by year",
toolbar_location=None, tools="")
p.vbar(x=dodge('fruits', -0.25, range=p.x_range), top='2015', width=0.2, source=source,
color="#c9d9d3", legend_label="2015")
p.vbar(x=dodge('fruits', 0.0, range=p.x_range), top='2016', width=0.2, source=source,
color="#718dbf", legend_label="2016")
p.vbar(x=dodge('fruits', 0.25, range=p.x_range), top='2017', width=0.2, source=source,
color="#e84d60", legend_label="2017")
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
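The three offsets above were chosen by hand. For a different number of series, evenly spaced offsets centered on the category can be computed instead. The helper below, `dodge_offsets`, is a hypothetical name introduced for this sketch and is not a Bokeh function:

```python
def dodge_offsets(n, step):
    """Return n offsets spaced `step` apart and centered on 0."""
    first = -step * (n - 1) / 2
    return [first + i * step for i in range(n)]

# Three series, 0.25 apart, as in the example above
offsets = dodge_offsets(3, 0.25)
```

Each offset can then be passed to `dodge()` in its own `vbar()` call. Keeping `step` at least as large as the bar `width` prevents neighboring bars from overlapping.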
Stacking and grouping#
You can also combine the above techniques to create plots of stacked and grouped bars. Here is an example that groups bars by quarter and stacks them by region:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
output_file("bar_stacked_grouped.html")
factors = [
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
regions = ['east', 'west']
source = ColumnDataSource(data=dict(
x=factors,
east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],
west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ],
))
p = figure(x_range=FactorRange(*factors), height=250,
toolbar_location=None, tools="")
p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=["blue", "red"], source=source,
legend_label=regions)
p.y_range.start = 0
p.y_range.end = 18
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"
show(p)
Mixed factors#
You can use any level in a multi-level data structure to position glyphs.
The example below groups bars for each month into financial quarters and adds a quarterly average line at the group center coordinates from Q1 to Q4.
from bokeh.io import output_file, show
from bokeh.models import FactorRange
from bokeh.plotting import figure
output_file("mixed.html")
factors = [
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
p = figure(x_range=FactorRange(*factors), height=250,
toolbar_location=None, tools="")
x = [ 10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16 ]
p.vbar(x=factors, top=x, width=0.9, alpha=0.5)
p.line(x=["Q1", "Q2", "Q3", "Q4"], y=[12, 9, 13, 14], color="red", line_width=2)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
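The quarterly values for the line above were entered by hand. If you would rather derive them from the monthly data, a plain-Python sketch such as the following groups the bar heights by the first level of each factor and averages them:

```python
from collections import defaultdict

factors = [
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
values = [10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]

# Collect the monthly values under their quarter (the first factor level)
by_quarter = defaultdict(list)
for (quarter, _month), value in zip(factors, values):
    by_quarter[quarter].append(value)

# Dictionaries preserve insertion order, so quarters stay in Q1..Q4 order
quarters = list(by_quarter)
means = [sum(v) / len(v) for v in by_quarter.values()]
```

`quarters` and `means` can then be passed as the `x` and `y` arguments of `p.line()`. Note that the exact Q1 mean is slightly higher than the rounded value of 12 used in the example above.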
Using pandas#
pandas is a powerful and popular tool for analyzing tabular and time series data in Python. While not necessary, it can make working with Bokeh easier.
For example, you can use the GroupBy objects offered by pandas to initialize a ColumnDataSource and automatically create columns for many statistical parameters, such as group mean and count. You can also pass these GroupBy objects as a range argument to figure.
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg as df
from bokeh.transform import factor_cmap
output_file("groupby.html")
df.cyl = df.cyl.astype(str)
group = df.groupby('cyl')
source = ColumnDataSource(group)
cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))
p = figure(height=350, x_range=group, title="MPG by # cylinders",
toolbar_location=None, tools="")
p.vbar(x='cyl', top='mpg_mean', width=1, source=source,
line_color=cyl_cmap, fill_color=cyl_cmap)
p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Number of cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
The example above groups data by the column 'cyl', which is why the ColumnDataSource includes this column. It also adds statistics columns for the non-grouped columns such as 'mpg', providing, for instance, the mean number of miles per gallon in the 'mpg_mean' column.
This also works with multi-level groups. The example below groups the same data by ('cyl', 'mfr') and displays it in nested categories distributed along the x-axis. Here, the index column name 'cyl_mfr' is made by joining the names of the grouped columns.
from bokeh.io import output_file, show
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
output_file("bar_pandas_groupby_nested.html")
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(by=['cyl', 'mfr'])
index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)
p = figure(width=800, height=300, title="Mean MPG by # cylinders and manufacturer",
x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
Intervals#
You can use bars for more than just bar charts with a common baseline. If each category has both a start and an end value, you can also use bars to represent intervals across a range for each category.
The example below supplies the hbar() function with both left and right properties to show the spread in times between gold and bronze medalists in Olympic sprinting over many years.
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.sprint import sprint
output_file("sprint.html")
sprint.Year = sprint.Year.astype(str)
group = sprint.groupby('Year')
source = ColumnDataSource(group)
p = figure(y_range=group, x_range=(9.5,12.7), width=400, height=550, toolbar_location=None,
title="Time spreads for sprint medalists (by year)")
p.hbar(y="Year", left='Time_min', right='Time_max', height=0.4, source=source)
p.ygrid.grid_line_color = None
p.xaxis.axis_label = "Time (seconds)"
p.outline_line_color = None
show(p)
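The `Time_min` and `Time_max` columns come from the pandas group statistics. If your data is not in a DataFrame, the same per-category interval can be computed directly. The sketch below uses arbitrary placeholder records, not the actual sprint data:

```python
# Placeholder (category, value) records for illustration only
records = [
    ("A", 10.4), ("A", 10.1), ("A", 10.9),
    ("B", 11.2), ("B", 10.8),
]

# Track the smallest and largest value seen for each category
intervals = {}
for category, value in records:
    low, high = intervals.get(category, (value, value))
    intervals[category] = (min(low, value), max(high, value))

categories = list(intervals)
left = [intervals[c][0] for c in categories]
right = [intervals[c][1] for c in categories]
```

The resulting `categories`, `left`, and `right` lists have the shape that `hbar()` expects for its `y`, `left`, and `right` arguments.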
Scatters#
Sometimes there are many values associated with each category, for example, a series of measurements on different days of the week. In this case, you can visualize your data using a categorical scatter plot.
Adding jitter#
To avoid overlap between numerous scatter points for a single category, use the jitter() function to give each point a random offset.
The example below shows a scatter plot of every commit time for a GitHub user between 2012 and 2016. It groups commits by day of the week. By default, this plot would show thousands of points overlapping in a narrow line for each day. The jitter function lets you differentiate the points to produce a useful plot:
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.commits import data
from bokeh.transform import jitter
output_file("bars.html")
DAYS = ['Sun', 'Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon']
source = ColumnDataSource(data)
p = figure(height=300, y_range=DAYS, x_axis_type='datetime',
toolbar_location=None, sizing_mode="stretch_width",
title="Commits by time of day (US/Central) 2012—2016")
p.circle(x='time', y=jitter('day', width=0.6, range=p.y_range), source=source, alpha=0.3)
p.xaxis[0].formatter.days = '%Hh'
p.x_range.range_padding = 0
p.ygrid.grid_line_color = None
show(p)
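Conceptually, jittering adds a small random offset to each point's position within its category. The plain-Python sketch below draws uniform offsets from a band of the given width centered on zero. It illustrates the idea under the assumption of a uniform distribution and is not Bokeh's implementation, which computes the jitter in the browser. The helper name `uniform_offsets` is hypothetical.

```python
import random

def uniform_offsets(n, width, seed=None):
    """Return n random offsets drawn uniformly from [-width/2, width/2]."""
    rng = random.Random(seed)
    return [rng.uniform(-width / 2, width / 2) for _ in range(n)]

# Same width as in the example above; the seed only makes the sketch repeatable
offsets = uniform_offsets(1000, width=0.6, seed=0)
```

With a width of 0.6, every point stays within 0.3 of its category's center, so points from neighboring days do not run into each other.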
Series#
There may also be ordered series of data associated with each category. In such cases, the series can be represented as a line or area plotted for each category. To accomplish this, Bokeh has a concept of categorical offsets that can afford explicit control over positioning “within” a category.
Categorical offsets#
Outside of the dodge and jitter functions, you can also supply an offset to a categorical location explicitly. To do so, add a numeric value to the end of a category. For example, ["Jan", 0.2] gives the category “Jan” an offset of 0.2.
For multi-level categories, add the value at the end of the existing list: ["West", "Sales", -0.2]. Bokeh interprets any numeric value at the end of a list of categories as an offset.
Take the fruit example above and modify it as follows:
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
offsets = [-0.5, -0.2, 0.0, 0.3, 0.1, 0.3]
# This results in [ ('Apples', -0.5), ('Pears', -0.2), ... ]
x = list(zip(fruits, offsets))
p.vbar(x=x, top=[5, 3, 4, 2, 4, 6], width=0.8)
This will shift each bar horizontally by the corresponding offset.
Below is a more sophisticated example of a ridge plot. It uses categorical offsets to specify patch coordinates for each category.
import colorcet as cc
from numpy import linspace
from scipy.stats import gaussian_kde
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, FixedTicker, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.sampledata.perceptions import probly
output_file("ridgeplot.html")
def ridge(category, data, scale=20):
    return list(zip([category]*len(data), scale*data))
cats = list(reversed(probly.keys()))
palette = [cc.rainbow[i*15] for i in range(17)]
x = linspace(-20,110, 500)
source = ColumnDataSource(data=dict(x=x))
p = figure(y_range=cats, width=700, x_range=(-5, 105), toolbar_location=None)
for i, cat in enumerate(reversed(cats)):
    pdf = gaussian_kde(probly[cat])
    y = ridge(cat, pdf(x))
    source.add(y, cat)
    p.patch('x', cat, color=palette[i], alpha=0.6, line_color="black", source=source)
p.outline_line_color = None
p.background_fill_color = "#efefef"
p.xaxis.ticker = FixedTicker(ticks=list(range(0, 101, 10)))
p.xaxis.formatter = PrintfTickFormatter(format="%d%%")
p.ygrid.grid_line_color = None
p.xgrid.grid_line_color = "#dddddd"
p.xgrid.ticker = p.xaxis[0].ticker
p.axis.minor_tick_line_color = None
p.axis.major_tick_line_color = None
p.axis.axis_line_color = None
p.y_range.range_padding = 0.12
show(p)
Heatmaps#
It is possible to have values associated with pairs of categories. In this situation, applying different color shades to rectangles that represent a pair of categories will produce a categorical heatmap. Such a plot has two categorical axes.
The following plot lists years from 1948 to 2016 on its x-axis and months of the year on the y-axis. Each rectangle of the plot corresponds to a (year, month) pair. The color of the rectangle indicates the rate of unemployment in a given month of a given year.
This example uses the LinearColorMapper to map the colors of the plot because the unemployment rate is a continuous variable. This mapper is also passed to the color bar to provide a visual legend on the right:
import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import (BasicTicker, ColorBar, ColumnDataSource,
LinearColorMapper, PrintfTickFormatter)
from bokeh.plotting import figure
from bokeh.sampledata.unemployment1948 import data
from bokeh.transform import transform
output_file("unemployment.html")
data.Year = data.Year.astype(str)
data = data.set_index('Year')
data.drop('Annual', axis=1, inplace=True)
data.columns.name = 'Month'
# reshape to 1D array of rates with a month and year for each row.
df = pd.DataFrame(data.stack(), columns=['rate']).reset_index()
source = ColumnDataSource(df)
# this is the colormap from the original NYTimes plot
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
mapper = LinearColorMapper(palette=colors, low=df.rate.min(), high=df.rate.max())
p = figure(width=800, height=300, title="US unemployment 1948—2016",
x_range=list(data.index), y_range=list(reversed(data.columns)),
toolbar_location=None, tools="", x_axis_location="above")
p.rect(x="Year", y="Month", width=1, height=1, source=source,
line_color=None, fill_color=transform('rate', mapper))
color_bar = ColorBar(color_mapper=mapper,
ticker=BasicTicker(desired_num_ticks=len(colors)),
formatter=PrintfTickFormatter(format="%d%%"))
p.add_layout(color_bar, 'right')
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "7px"
p.axis.major_label_standoff = 0
p.xaxis.major_label_orientation = 1.0
show(p)
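The `data.stack()` call reshapes the table from one column per month into one row per `(Year, Month)` pair, which is the layout `rect()` needs. The plain-Python sketch below shows the same kind of reshaping on a tiny table with placeholder values, not the real unemployment figures:

```python
# Placeholder wide table: one entry per year, one inner key per month
wide = {
    "1948": {"Jan": 1.0, "Feb": 2.0},
    "1949": {"Jan": 3.0, "Feb": 4.0},
}

# Long form: one record per (year, month) cell
long_rows = [
    {"Year": year, "Month": month, "rate": rate}
    for year, months in wide.items()
    for month, rate in months.items()
]
```

Each record now carries both categorical coordinates and the value to be color-mapped, mirroring the `Year`, `Month`, and `rate` columns used in the example above.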
The following periodic table is a good example of the techniques in this chapter. It combines color mappers, visual offsets, pandas DataFrames, and tooltips.
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.periodic_table import elements
from bokeh.transform import dodge, factor_cmap
output_file("periodic.html")
periods = ["I", "II", "III", "IV", "V", "VI", "VII"]
groups = [str(x) for x in range(1, 19)]
df = elements.copy()
df["atomic mass"] = df["atomic mass"].astype(str)
df["group"] = df["group"].astype(str)
df["period"] = [periods[x-1] for x in df.period]
df = df[df.group != "-"]
df = df[df.symbol != "Lr"]
df = df[df.symbol != "Lu"]
cmap = {
"alkali metal" : "#a6cee3",
"alkaline earth metal" : "#1f78b4",
"metal" : "#d93b43",
"halogen" : "#999d9a",
"metalloid" : "#e08d49",
"noble gas" : "#eaeaea",
"nonmetal" : "#f1d4Af",
"transition metal" : "#599d7A",
}
source = ColumnDataSource(df)
p = figure(width=900, height=500, title="Periodic table (omitting LA and AC series)",
x_range=groups, y_range=list(reversed(periods)), toolbar_location=None, tools="hover")
p.rect("group", "period", 0.95, 0.95, source=source, fill_alpha=0.6, legend_field="metal",
color=factor_cmap('metal', palette=list(cmap.values()), factors=list(cmap.keys())))
text_props = {"source": source, "text_align": "left", "text_baseline": "middle"}
x = dodge("group", -0.4, range=p.x_range)
r = p.text(x=x, y="period", text="symbol", **text_props)
r.glyph.text_font_style="bold"
r = p.text(x=x, y=dodge("period", 0.3, range=p.y_range), text="atomic number", **text_props)
r.glyph.text_font_size="11px"
r = p.text(x=x, y=dodge("period", -0.35, range=p.y_range), text="name", **text_props)
r.glyph.text_font_size="7px"
r = p.text(x=x, y=dodge("period", -0.2, range=p.y_range), text="atomic mass", **text_props)
r.glyph.text_font_size="7px"
p.text(x=["3", "3"], y=["VI", "VII"], text=["LA", "AC"], text_align="center", text_baseline="middle")
p.hover.tooltips = [
("Name", "@name"),
("Atomic number", "@{atomic number}"),
("Atomic mass", "@{atomic mass}"),
("Type", "@metal"),
("CPK color", "$color[hex, swatch]:CPK"),
("Electronic configuration", "@{electronic configuration}"),
]
p.outline_line_color = None
p.grid.grid_line_color = None
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_standoff = 0
p.legend.orientation = "horizontal"
p.legend.location ="top_center"
show(p)