The basis for any data visualization is the underlying data. This section describes the various ways to provide data to Bokeh, from passing data values directly to creating a ColumnDataSource (CDS) and filtering the data with a CDSView.
Use standard Python lists of data to pass values directly into a plotting function.
In this example, the lists x_values and y_values pass data to the circle() function (see plotting function for more examples):
```python
from bokeh.plotting import figure

x_values = [1, 2, 3, 4, 5]
y_values = [6, 7, 2, 3, 6]

p = figure()
p.circle(x=x_values, y=y_values)
```
In the same way as Python lists, you can also pass NumPy arrays to Bokeh:
```python
import numpy as np
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
random = np.random.standard_normal(5)
cosine = np.cos(x)

p = figure()
p.circle(x=x, y=random)
p.line(x=x, y=cosine)
```
The ColumnDataSource (CDS) is the core of most Bokeh plots. It provides the data to the glyphs of your plot.
When you pass sequences like Python lists or NumPy arrays to a Bokeh renderer, Bokeh automatically creates a ColumnDataSource with this data for you. However, creating a ColumnDataSource yourself gives you access to more advanced options.
For example, creating your own ColumnDataSource allows you to share data between multiple plots and widgets. If you use a single ColumnDataSource together with multiple renderers, those renderers also share information about data you select with a select tool from Bokeh’s toolbar (see Linked selection).
Think of a ColumnDataSource as a collection of sequences of data that each have their own, unique column name.
To create a basic ColumnDataSource object, you need a Python dictionary to pass to the object’s data parameter:
Bokeh uses the dictionary’s keys as column names.
The dictionary’s values are used as the data values for your ColumnDataSource.
The data you pass as part of your dict can be any non-string ordered sequences of values, such as lists or arrays (including NumPy arrays and pandas Series):
```python
from bokeh.models import ColumnDataSource

data = {'x_values': [1, 2, 3, 4, 5],
        'y_values': [6, 7, 2, 3, 6]}

source = ColumnDataSource(data=data)
```
Note
All columns in a ColumnDataSource have the same length. Therefore, all sequences of values that you pass to a single ColumnDataSource must have the same length as well. If you try to pass sequences of different lengths, Bokeh will not be able to create your ColumnDataSource.
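Because all columns must have the same length, it can help to check a dict before passing it to ColumnDataSource. The following is a minimal plain-Python sketch. `validate_cds_data` is a hypothetical helper and not part of Bokeh's API. It only checks the two rules described above: columns must be non-string sequences, and all columns must have equal length.

```python
def validate_cds_data(data):
    """Return the common column length, or raise ValueError.

    Hypothetical helper (not part of Bokeh): checks that no column is a
    string and that all columns have the same length. Lists, NumPy arrays,
    and pandas Series all support len(), so they pass the check.
    """
    lengths = {}
    for name, values in data.items():
        # Strings are sequences too, but they are not valid columns
        if isinstance(values, (str, bytes)):
            raise ValueError(f"column {name!r} must not be a string")
        lengths[name] = len(values)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    # An empty dict has no columns, so report a length of 0
    return next(iter(lengths.values()), 0)


n_rows = validate_cds_data({'x_values': [1, 2, 3, 4, 5],
                            'y_values': [6, 7, 2, 3, 6]})
```

If the check passes, you can pass the same dict to `ColumnDataSource(data=...)`.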
To use a ColumnDataSource with a renderer function, you need to pass at least these three arguments:
x: the name of the ColumnDataSource’s column that contains the data for the x values of your plot
y: the name of the ColumnDataSource’s column that contains the data for the y values of your plot
source: the ColumnDataSource object that contains the columns you just referenced for the x and y arguments.
For example:
```python
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource

# create a Python dict as the basis of your ColumnDataSource
data = {'x_values': [1, 2, 3, 4, 5],
        'y_values': [6, 7, 2, 3, 6]}

# create a ColumnDataSource by passing the dict
source = ColumnDataSource(data=data)

# create a plot using the ColumnDataSource's two columns
p = figure()
p.circle(x='x_values', y='y_values', source=source)
```
To modify the data of an existing ColumnDataSource, update the .data property of your ColumnDataSource object:
To add a new column to an existing ColumnDataSource:
```python
new_sequence = [8, 1, 4, 7, 3]
source.data["new_column"] = new_sequence
```
The length of the column you are adding must match the length of the existing columns.
To replace all data in an existing ColumnDataSource, assign the .data property an entirely new dict:
```python
source.data = new_dict
```
Replacing the entire contents of a ColumnDataSource is also the only way to change the lengths of its columns. When you update data in a way that changes the length of any column, you must update all columns at the same time by assigning a new dict. It is not possible to update column lengths one column at a time.
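One way to follow this rule is to build the complete replacement dict first and then assign it in a single step. The sketch below uses a hypothetical helper, `with_rows_appended`, which is not part of Bokeh. It returns a new dict in which every column grows by the same number of rows. You would then assign the result with `source.data = ...`.

```python
def with_rows_appended(data, rows):
    """Return a new dict with `rows` (a list of dicts) appended to every column.

    Hypothetical helper: every row must provide a value for every column,
    so all columns keep the same length.
    """
    new_data = {}
    for name, values in data.items():
        # list() copies the column so the original dict is not mutated
        new_data[name] = list(values) + [row[name] for row in rows]
    return new_data


old = {'x_values': [1, 2, 3], 'y_values': [6, 7, 2]}
new = with_rows_appended(old, [{'x_values': 4, 'y_values': 3}])
# With Bokeh, replace all columns at once:
# source.data = new
```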
The data parameter can also be a pandas DataFrame or GroupBy object:
```python
source = ColumnDataSource(df)
```
If you use a pandas DataFrame, the resulting ColumnDataSource in Bokeh will have columns that correspond to the columns of the DataFrame. The naming of the columns follows these rules:
If the DataFrame has a named index column, the ColumnDataSource will also have a column with this name.
If the index name is None, the ColumnDataSource will have a generic name: either index (if that name is available) or level_0.
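The naming rule for the index column fits in a few lines of plain Python. This is only a sketch of the rule described above, not Bokeh's own code. `index_column_name` is a hypothetical helper, and it ignores the MultiIndex case covered next.

```python
def index_column_name(index_name, column_names):
    """Sketch of the rule above: the name a DataFrame's index gets as a column.

    Hypothetical helper, not part of Bokeh.
    """
    if index_name is not None:
        # A named index keeps its name
        return index_name
    # Unnamed index: use "index" unless a column already has that name
    return "index" if "index" not in column_names else "level_0"


named = index_column_name("year", ["mpg"])
unnamed = index_column_name(None, ["mpg"])
clashing = index_column_name(None, ["index", "mpg"])
```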
If you use a pandas MultiIndex as the basis for a Bokeh ColumnDataSource, Bokeh flattens the columns and indices before creating the ColumnDataSource. For the index, Bokeh creates an index of tuples and joins the names of the MultiIndex with an underscore. The column names will also be joined with an underscore. For example:
```python
import pandas as pd
from bokeh.models import ColumnDataSource

df = pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
                   ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
                   ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})
cds = ColumnDataSource(df)
```
This will result in a column named index with [(A, B), (A, C), (A, D)], as well as columns named a_b, b_a, and b_b.
This process only works with column names that are strings. If you are using non-string column names, you need to manually flatten the DataFrame before using it as the basis of a Bokeh ColumnDataSource.
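For non-string column names, one option is to convert every level to a string yourself before you create the ColumnDataSource. The sketch below computes flattened names using the same underscore-joining rule. `flatten_names` is a hypothetical helper. With pandas, you could apply its result with `df.columns = flatten_names(df.columns)`, because iterating over MultiIndex columns yields tuples.

```python
def flatten_names(columns):
    """Join each (possibly tuple) column name into one string with underscores.

    Hypothetical helper: str() makes non-string levels such as integers
    joinable.
    """
    flat = []
    for col in columns:
        if isinstance(col, tuple):
            flat.append("_".join(str(level) for level in col))
        else:
            flat.append(str(col))
    return flat


string_levels = flatten_names([('a', 'b'), ('b', 'a'), ('b', 'b')])
mixed_levels = flatten_names([(2020, 'mean'), 'x'])
```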
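```python
# group by two columns; pass the keys as a list (recent pandas versions
# treat a tuple as a single key)
group = df.groupby(['colA', 'ColB'])
source = ColumnDataSource(group)
```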
If you use a pandas GroupBy object, the columns of the ColumnDataSource correspond to the result of calling group.describe(). The describe method generates columns for statistical measures such as mean and count for all the non-grouped original columns.
The resulting DataFrame has MultiIndex columns with the original column name and the computed measure. Bokeh flattens the data using the rules described above.
For example: If a DataFrame has the columns 'year' and 'mpg', passing df.groupby('year') to a ColumnDataSource will result in columns such as 'mpg_mean'.
Adapting GroupBy objects requires pandas version 0.20.0 or above.
ColumnDataSource streaming is an efficient way to append new data to a ColumnDataSource. When you use the stream() method, Bokeh only sends new data to the browser instead of sending the entire dataset.
The stream() method takes a new_data parameter. This parameter expects a dict that maps column names to the sequences of data that you want appended to the respective columns.
The method takes an additional, optional argument rollover. This is the maximum length of data to keep. When there is more data than defined by your maximum value, Bokeh will discard data from the beginning of the column. The default value for rollover is None. This default value allows data to grow unbounded.
```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(foo=[], bar=[]))

# has new, identical-length updates for all columns in source
new_data = {
    'foo' : [10, 20],
    'bar' : [100, 200],
}

source.stream(new_data)
```
For an example that uses streaming, see examples/app/ohlc.
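To illustrate what rollover does to the column contents, here is a plain-Python sketch that mimics the behavior described above on a dict of lists. `stream_into` is hypothetical and is not Bokeh's implementation. The real `stream()` method also takes care of sending only the new values to the browser.

```python
def stream_into(data, new_data, rollover=None):
    """Append new_data to the lists in data (in place), then apply rollover.

    Plain-Python sketch of the behavior described above, not Bokeh's code.
    """
    for name, values in new_data.items():
        column = data[name]
        column.extend(values)
        if rollover is not None and len(column) > rollover:
            # Discard the oldest values from the beginning of the column
            del column[:len(column) - rollover]
    return data


capped = stream_into({'foo': [1, 2], 'bar': [10, 20]},
                     {'foo': [3, 4], 'bar': [30, 40]}, rollover=3)
unbounded = stream_into({'foo': [1, 2]}, {'foo': [3, 4]})
```

With Bokeh itself, the equivalent call would be `source.stream(new_data, rollover=3)`.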
ColumnDataSource patching is an efficient way to update slices of a data source. By using the patch() method, Bokeh only sends new data to the browser instead of the entire dataset.
The patch() method requires a dict that maps column names to lists of tuples. Each tuple represents a patch change to apply.
Examples of tuples that you can use with patch():
```python
(index, new_value)  # replace a single column value

# or

(slice, new_values) # replace several column values
```
For a full example, see examples/howto/patch_app.py.
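The two tuple forms behave like ordinary Python item and slice assignment. The sketch below applies patches to a dict of lists in plain Python to show the effect. `apply_patches` is hypothetical. Unlike Bokeh's `patch()`, it does no validation, so a slice patch must supply exactly as many values as the slice covers to keep the column length unchanged.

```python
def apply_patches(data, patches):
    """Apply {column: [(index_or_slice, value_or_values), ...]} in place.

    Plain-Python sketch of the tuple forms above, not Bokeh's code.
    """
    for name, changes in patches.items():
        column = data[name]
        for where, value in changes:
            # An int index replaces one value; a slice replaces several
            column[where] = value
    return data


patched = apply_patches({'foo': [10, 20, 30, 40], 'bar': [1, 2, 3, 4]},
                        {'foo': [(0, 99)], 'bar': [(slice(1, 3), [22, 33])]})
```

With a real ColumnDataSource, the same patch dict would be passed as `source.patch({'foo': [(0, 99)], 'bar': [(slice(1, 3), [22, 33])]})`.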
So far, you have added data to a ColumnDataSource to control Bokeh plots. However, you can also perform some data operations directly in the browser.
Dynamically calculating color maps in the browser, for example, can reduce the amount of Python code. If the necessary calculations for color mapping happen directly in the browser, you will also need to send less data.
This section provides an overview of the different transform objects that are available.
Use the linear_cmap() function to perform linear color mapping directly in the browser. This function accepts the following arguments:
The name of a ColumnDataSource column containing the data to map colors to
A palette (which can be a built-in palette name or a list of colors)
low and high values for the color mapping range.
Pass the result as a color property of a glyph:
```python
fill_color=linear_cmap('counts', 'Viridis256', low=0, high=10)
```
```python
import numpy as np

from bokeh.plotting import figure, show
from bokeh.transform import linear_cmap
from bokeh.util.hex import hexbin

n = 50000
x = np.random.standard_normal(n)
y = np.random.standard_normal(n)

bins = hexbin(x, y, 0.1)

p = figure(tools="", match_aspect=True, background_fill_color='#440154')
p.grid.visible = False

p.hex_tile(q="q", r="r", size=0.1, line_color=None, source=bins,
           fill_color=linear_cmap('counts', 'Viridis256', 0, max(bins.counts)))

show(p)
```
In addition to linear_cmap(), there are two similar functions:
log_cmap() for color mapping on a log scale
factor_cmap() for color mapping categorical data (see the example below).
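To build intuition for the difference between linear and log color mapping, here is a rough plain-Python approximation of how a value could be assigned to one of `n_colors` palette entries. This sketch assumes that values outside the low/high range are clamped to the first or last color. Bokeh's mappers run in the browser and let you configure separate colors for out-of-range and NaN values (`low_color`, `high_color`, `nan_color`), which the sketch ignores. The function names are hypothetical.

```python
import math


def linear_index(value, low, high, n_colors):
    """Approximate palette index for linear color mapping (sketch)."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    # Split [low, high] into n_colors equal-width bins
    return int((value - low) / (high - low) * n_colors)


def log_index(value, low, high, n_colors):
    """Approximate palette index for log color mapping (sketch); low must be > 0."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    # Same binning, but applied to log-transformed values
    span = math.log(high) - math.log(low)
    return int((math.log(value) - math.log(low)) / span * n_colors)
```

For data spanning several orders of magnitude, a value such as 50 in the range 1 to 10000 falls into the first linear bin but into the second log bin. This is why log_cmap() is often the better choice for skewed data.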
When you use categorical data, you can use different markers for each of the categories in your data. Use the factor_mark() function to assign different markers to different categories automatically:
```python
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
from bokeh.transform import factor_cmap, factor_mark

SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['hex', 'circle_x', 'triangle']

p = figure(title="Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Sepal Width'

p.scatter("petal_length", "sepal_width", source=flowers, legend_field="species",
          fill_alpha=0.4, size=12,
          marker=factor_mark('species', MARKERS, SPECIES),
          color=factor_cmap('species', 'Category10_3', SPECIES))

show(p)
```
This example also uses factor_cmap() to color map those same categories.
The factor_mark() transform is usually only useful with the scatter glyph method because parameterization by marker type only makes sense with scatter plots.
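Conceptually, factor_mark() and factor_cmap() both pair the n-th factor with the n-th marker or color. The lookup can be sketched in plain Python as follows. `factor_lookup` is hypothetical. The fallback value for unknown factors is an assumption of this sketch, because Bokeh's mappers have their own configurable defaults.

```python
def factor_lookup(values, factors, choices, default):
    """Map each categorical value to the choice at the same position as its factor.

    Sketch: values that are not listed in `factors` get `default`.
    """
    mapping = dict(zip(factors, choices))
    return [mapping.get(v, default) for v in values]


SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['hex', 'circle_x', 'triangle']
markers = factor_lookup(['virginica', 'setosa', 'unknown'], SPECIES, MARKERS, 'circle')
```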
In addition to the built-in transformation functions above, you can use your own JavaScript code. Use the CustomJSTransform() function to add custom JavaScript code that is executed in the browser.
The example below uses the CustomJSTransform() function with the argument v_func. v_func is short for “vectorized function”. The JavaScript code you supply to v_func needs to expect an array of inputs in the variable xs, and return a JavaScript array with the transformed values:
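```python
from bokeh.models import CustomJSTransform
from bokeh.transform import transform

# assumes an existing figure `plot` and a ColumnDataSource `aapl_source`
# with the columns 'aapl_date' and 'aapl_close'
v_func = """
    const first = xs[0]
    const norm = new Float64Array(xs.length)
    for (let i = 0; i < xs.length; i++) {
        norm[i] = xs[i] / first
    }
    return norm
"""
normalize = CustomJSTransform(v_func=v_func)

plot.line(x='aapl_date', y=transform('aapl_close', normalize), line_width=2,
          color='#cf3c4d', alpha=0.6, legend_label="Apple", source=aapl_source)
```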
The code in this example converts raw price data into a sequence of normalized returns that are relative to the first data point.
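If you want to check what this transform produces before running it in the browser, you can mirror the same logic in Python. This is an illustrative sketch only, because Bokeh runs the JavaScript `v_func`, not this function. The prices are hypothetical sample values.

```python
def normalize(xs):
    """Divide every value by the first one, mirroring the v_func above."""
    first = xs[0]
    return [x / first for x in xs]


prices = [100.0, 110.0, 90.0]  # hypothetical closing prices
relative = normalize(prices)
```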
Bokeh uses a concept called “view” to select subsets of data. Views are represented by Bokeh’s CDSView class. When you use a view, you can use one or more filters to select specific data points without changing the underlying data. You can also share those views between different plots.
To plot with a filtered subset of data, pass a CDSView to the view argument of any renderer method on a Bokeh plot.
A CDSView has two properties, source and filters:
source is the ColumnDataSource that you want to apply the filters to.
filters is a list of Filter objects, listed and described below.
In this example, you create a CDSView called view. view uses the ColumnDataSource source and a list of two filters, filter1 and filter2. view is then passed to a circle() renderer function:
```python
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, CDSView

# some_data, filter1, and filter2 are placeholders for your own
# data dict and Filter objects
source = ColumnDataSource(some_data)
view = CDSView(source=source, filters=[filter1, filter2])

p = figure()
p.circle(x="x", y="y", source=source, view=view)
```
The IndexFilter is the simplest filter type. It has an indices property, which is a list of integers that are the indices of the data you want to include in your plot.
```python
from bokeh.layouts import gridplot
from bokeh.models import CDSView, ColumnDataSource, IndexFilter
from bokeh.plotting import figure, show

source = ColumnDataSource(data=dict(x=[1, 2, 3, 4, 5], y=[1, 2, 3, 4, 5]))
view = CDSView(source=source, filters=[IndexFilter([0, 2, 4])])

tools = ["box_select", "hover", "reset"]
p = figure(plot_height=300, plot_width=300, tools=tools)
p.circle(x="x", y="y", size=10, hover_color="red", source=source)

p_filtered = figure(plot_height=300, plot_width=300, tools=tools)
p_filtered.circle(x="x", y="y", size=10, hover_color="red", source=source, view=view)

show(gridplot([[p, p_filtered]]))
```
A BooleanFilter selects rows from a data source using a list of True or False values in its booleans property.
```python
from bokeh.layouts import gridplot
from bokeh.models import BooleanFilter, CDSView, ColumnDataSource
from bokeh.plotting import figure, show

source = ColumnDataSource(data=dict(x=[1, 2, 3, 4, 5], y=[1, 2, 3, 4, 5]))
booleans = [True if y_val > 2 else False for y_val in source.data['y']]
view = CDSView(source=source, filters=[BooleanFilter(booleans)])

tools = ["box_select", "hover", "reset"]
p = figure(plot_height=300, plot_width=300, tools=tools)
p.circle(x="x", y="y", size=10, hover_color="red", source=source)

p_filtered = figure(plot_height=300, plot_width=300, tools=tools,
                    x_range=p.x_range, y_range=p.y_range)
p_filtered.circle(x="x", y="y", size=10, hover_color="red", source=source, view=view)

show(gridplot([[p, p_filtered]]))
```
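IndexFilter and BooleanFilter can express the same selection. The hypothetical plain-Python helpers below are not part of Bokeh. They convert between the two forms, which can be handy if you have a boolean mask but want an IndexFilter, or the reverse.

```python
def booleans_to_indices(booleans):
    """Indices of the rows that a BooleanFilter with these booleans keeps."""
    return [i for i, keep in enumerate(booleans) if keep]


def indices_to_booleans(indices, length):
    """Boolean mask that selects the same rows as an IndexFilter with these indices."""
    selected = set(indices)
    return [i in selected for i in range(length)]


y = [1, 2, 3, 4, 5]
# The comparison already evaluates to True or False
booleans = [y_val > 2 for y_val in y]
indices = booleans_to_indices(booleans)
mask = indices_to_booleans([0, 2, 4], len(y))
```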
The GroupFilter is a filter for categorical data. With this filter, you can select rows from a dataset that are members of a specific category.
The GroupFilter has two properties:
column_name: the name of the column in the ColumnDataSource to apply the filter to
group: the name of the category to select for
In the example below, the data set flowers contains a categorical variable called species. All data belongs to one of the three species categories setosa, versicolor, or virginica. The second plot in this example uses a GroupFilter to only display data points that are a member of the category setosa:
```python
from bokeh.layouts import gridplot
from bokeh.models import CDSView, ColumnDataSource, GroupFilter
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers

source = ColumnDataSource(flowers)
view1 = CDSView(source=source, filters=[GroupFilter(column_name='species', group='setosa')])

plot_size_and_tools = {'plot_height': 300, 'plot_width': 300,
                       'tools': ['box_select', 'reset', 'help']}

p1 = figure(title="Full data set", **plot_size_and_tools)
p1.circle(x='petal_length', y='petal_width', source=source, color='black')

p2 = figure(title="Setosa only", x_range=p1.x_range, y_range=p1.y_range, **plot_size_and_tools)
p2.circle(x='petal_length', y='petal_width', source=source, view=view1, color='red')

show(gridplot([[p1, p2]]))
```
You can also use your own JavaScript or TypeScript code to create customized filters. To include your custom filter code, use Bokeh’s CustomJSFilter class. Pass your code as a string to the parameter code of the CustomJSFilter.
Your JavaScript or TypeScript code needs to return either a list of indices or a list of booleans representing the filtered subset. You can access the ColumnDataSource you are using with CDSView from within your JavaScript or TypeScript code. Bokeh makes the ColumnDataSource available through the variable source:
```python
from bokeh.models import CustomJSFilter

custom_filter = CustomJSFilter(code='''
var indices = [];

// iterate through rows of data source and see if each satisfies some constraint
for (var i = 0; i < source.get_length(); i++){
    if (source.data['some_column'][i] == 'some_value'){
        indices.push(true);
    } else {
        indices.push(false);
    }
}
return indices;
''')
```
Updating and streaming data works very well with Bokeh server applications. However, it is also possible to use similar functionality in standalone documents. The AjaxDataSource provides this capability without requiring a Bokeh server.
To set up an AjaxDataSource, you need to configure it with a URL to a REST endpoint and a polling interval.
In the browser, the data source requests data from the endpoint at the specified interval. It then uses the data from the endpoint to update the data locally.
Updating data locally can happen in two ways: either by replacing the existing local data entirely or by appending the new data to the existing data (up to a configurable max_size). Replacing local data is the default setting. Pass either "replace" or "append" as the AjaxDataSource's mode argument to control this behavior.
The endpoint that you are using with your AjaxDataSource needs to return a JSON dict that matches the standard ColumnDataSource format:
```
{
    "x" : [1, 2, 3, ...],
    "y" : [9, 3, 2, ...]
}
```
Otherwise, using an AjaxDataSource is identical to using a standard ColumnDataSource:
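```python
from bokeh.models import AjaxDataSource
from bokeh.plotting import figure

# setup AjaxDataSource with URL and polling interval (in milliseconds)
source = AjaxDataSource(data_url='http://some.api.com/data',
                        polling_interval=100)

# use the AjaxDataSource just like a ColumnDataSource
p = figure()
p.circle('x', 'y', source=source)
```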
This is a preview of what a stream of live data in Bokeh can look like using AjaxDataSource.
For the full example, see examples/howto/ajax_source.py in Bokeh’s GitHub repository.
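For local experiments, you can serve a JSON endpoint in the ColumnDataSource format using only the Python standard library. This is a minimal sketch under several assumptions. The port, the random data, and the names `make_payload`, `DataHandler`, and `run` are hypothetical. The handler answers both GET and POST, because AjaxDataSource can be configured to use either method. It also sends permissive CORS headers, including answers to preflight OPTIONS requests, so that a standalone HTML file loaded from another origin can reach it.

```python
import json
import random
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_payload(n=5):
    """Build a dict in the ColumnDataSource format: column name -> list of values."""
    x = list(range(n))
    y = [random.random() for _ in x]
    return {"x": x, "y": y}


class DataHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Allow a page served from another origin to read the response
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # Answer CORS preflight requests
        self.send_response(204)
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()

    def do_GET(self):
        self._send_json(make_payload())

    def do_POST(self):
        # Read and ignore any request body, then reply with fresh data
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._send_json(make_payload())


def run(port=5050):
    """Serve the endpoint at http://localhost:<port>/ until interrupted."""
    HTTPServer(("localhost", port), DataHandler).serve_forever()
```

Call `run()` in a separate process, then set `data_url='http://localhost:5050/'` on the AjaxDataSource.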
You can share selections between two plots if both of the plots use the same ColumnDataSource:
```python
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

output_file("brushing.html")

x = list(range(-20, 21))
y0 = [abs(xx) for xx in x]
y1 = [xx**2 for xx in x]

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "box_select,lasso_select,help"

# create a new plot and add a renderer
left = figure(tools=TOOLS, plot_width=300, plot_height=300, title=None)
left.circle('x', 'y0', source=source)

# create another new plot and add a renderer
right = figure(tools=TOOLS, plot_width=300, plot_height=300, title=None)
right.circle('x', 'y1', source=source)

p = gridplot([[left, right]])

show(p)
```
Using a ColumnDataSource, you can also have two plots that are based on the same data but each use a different subset of that data. Both plots still share selections and hovered inspections through the ColumnDataSource they are based on.
The following example demonstrates this behavior:
The second plot is a subset of the data of the first plot. The second plot uses a CDSView to include only y values that are either greater than 250 or less than 100.
If you make a selection with the BoxSelect tool in either plot, the selection is automatically reflected in the other plot as well.
If you hover on a point in one plot, the corresponding point in the other plot is automatically highlighted as well, if it exists.
```python
from bokeh.layouts import gridplot
from bokeh.models import BooleanFilter, CDSView, ColumnDataSource
from bokeh.plotting import figure, output_file, show

output_file("linked_selection_subsets.html")

x = list(range(-20, 21))
y0 = [abs(xx) for xx in x]
y1 = [xx**2 for xx in x]

# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

# create a view of the source for one plot to use
view = CDSView(source=source,
               filters=[BooleanFilter([True if y > 250 or y < 100 else False for y in y1])])

TOOLS = "box_select,lasso_select,hover,help"

# create a new plot and add a renderer
left = figure(tools=TOOLS, plot_width=300, plot_height=300, title=None)
left.circle('x', 'y0', size=10, hover_color="firebrick", source=source)

# create another new plot, add a renderer that uses the view of the data source
right = figure(tools=TOOLS, plot_width=300, plot_height=300, title=None)
right.circle('x', 'y1', size=10, hover_color="firebrick", source=source, view=view)

p = gridplot([[left, right]])

show(p)
```
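To see which points the right-hand plot keeps, you can evaluate the same filter condition in plain Python. This sketch only reproduces the filter expression from the example and doesn't involve Bokeh. The points in `hidden_x` exist only in the left plot, so hovering over them highlights nothing in the right plot.

```python
x = list(range(-20, 21))
y1 = [xx**2 for xx in x]

# Same condition as the BooleanFilter in the example above
booleans = [y > 250 or y < 100 for y in y1]
visible_x = [xx for xx, keep in zip(x, booleans) if keep]
hidden_x = [xx for xx, keep in zip(x, booleans) if not keep]
```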
You can also use Bokeh to render network graph data and geographical data. For more information about how to set up the data for these types of plots, see Visualizing network graphs and Mapping geo data.