Download
This notebook can be downloaded as 05_visual_coding-users.ipynb.
Exploring the Allen Institute’s Visual Coding dataset#
This notebook has had all its explanatory text removed and has not been run. It is intended to be downloaded and run locally (or on the provided binder), working through the questions with your small group.
This notebook serves as a group project: in groups of 4 or 5, you will analyze data from the Visual Coding - Neuropixels dataset, published by the Allen Institute. This dataset uses extracellular electrophysiology probes to record spikes from multiple regions in the brain during passive visual stimulation.
To start, we will focus on the activity of neurons in the visual cortex (VISp) during passive exposure to full-field flashes, either black (coded as “-1.0”) or white (coded as “1.0”), on a gray background. If you have time, you can apply the same procedure to other stimuli or brain areas.
In this notebook, the pre-filled section will first select visually responsive neurons in area VISp. Then, you will fit GLMs to the selected neurons using nemos.
As this is the last notebook, the instructions are a bit more hands-off: you will make more of the modeling decisions yourselves. As a group, you will use your neuroscience knowledge and the skills gained over this workshop to decide:
How to avoid overfitting.
What features to include in your GLMs.
Which basis functions (and parameters) to use for each feature.
How to regularize your features.
How to evaluate your model.
At the end of this session, we will regroup to discuss the decisions people made and evaluate each other’s models.
import workshop_utils
# Import everything
import jax
import matplotlib.pyplot as plt
import numpy as np
import pynapple as nap
# configure pynapple to ignore conversion warning
nap.nap_config.suppress_conversion_warnings = True
import nemos as nmo
# some helper plotting functions
from nemos import _documentation_utils as doc_plots
import matplotlib as mpl
from matplotlib.ticker import MaxNLocator
from scipy.stats import gaussian_kde
from matplotlib.patches import Patch
# configure plot style
plt.style.use(nmo.styles.plot_style)
Downloading and preparing data#
In this section, we will download the data and extract the relevant parts for analysis and modeling. This section is largely presented to you as is, so that you can get to the substantive sections more quickly.
First we download and load the data into pynapple.
data = workshop_utils.fetch_data("visual_coding_data.zip")
flashes, units = [nap.load_file(d) for d in sorted(data)]
Now that we have our spiking data, let’s restrict our dataset to the relevant part.

During the flashes presentation trials, mice were exposed to white or black full-field flashes on a gray background, each lasting 250 ms and separated by a 2-second inter-trial interval. In total, they were exposed to 150 flashes (75 black, 75 white).
# create a separate object for black and white flashes
flashes_white = flashes[flashes["color"] == "1.0"]
flashes_black = flashes[flashes["color"] == "-1.0"]
Let’s visualize our stimuli:
Preliminary analyses and neuron selection#
In this section, we will select a subset of the neurons that are visually responsive, which we will fit our GLM to.
First, we’ll construct an IntervalSet called extended_flashes which contains the peristimulus time. Right now, our flashes IntervalSet defines the start and end time for the flashes. In order to make sure we can model the pre-stimulus baseline and any responses to the stimulus being turned off, we would like to expand these intervals to go from 500 ms before the start of the stimuli to 500 ms after the end.
This IntervalSet will be the same shape as flashes and have the same metadata columns.
dt = .50 # 500 ms
start = flashes.start - dt # Start 500 ms before stimulus presentation
end = flashes.end + dt # End 500 ms after stimulus presentation
extended_flashes = nap.IntervalSet(start, end, metadata=flashes.metadata)
Now, we’ll create two separate IntervalSet objects, extended_flashes_black and extended_flashes_white, which contain this info for only the black and the white flashes, respectively.
extended_flashes_white = extended_flashes[extended_flashes["color"] == "1.0"]
extended_flashes_black = extended_flashes[extended_flashes["color"] == "-1.0"]
Now, we’ll select our neurons. There are four criteria we want to use:
Brain area: we are interested in analyzing VISp units for this tutorial
Quality: we will only select “good” quality units. If you’re curious, you can (optionally) read more about how the Allen Institute defines quality.
Firing rate: overall, we want units with a firing rate larger than 2 Hz around the presentation of stimuli
Responsiveness: we want units that actually respond to changes in the visual stimuli, i.e., their firing rate changes as a result of the stimulus.
We’ll create a new TsGroup, selected_units, which includes only those units that meet the first three criteria, then check that it passes the assertion block.
# Filter units according to criteria 1 & 2
selected_units = units[(units["brain_area"]=="VISp") & (units["quality"]=="good")]
# Restrict around stimuli presentation
selected_units = selected_units.restrict(extended_flashes)
# Filter according to criterion 3
selected_units = selected_units[(selected_units["rate"]>2.0)]
Now, in order to determine the responsiveness of the units, it’s helpful to use the compute_perievent function: this will align units’ spiking timestamps to the onset of each stimulus repetition, which we can then average across repetitions to build PSTHs.
Let’s use that function to construct two separate perievent dictionaries, one aligned to the start of the white stimuli and one aligned to the start of the black, each running from 250 ms before to 500 ms after the event.
# Set perievent window: 250 ms before to 500 ms after the start of the event
window_size = (-.250, .500)
peri_white = nap.compute_perievent(timestamps=selected_units,
                                   tref=nap.Ts(flashes_white.start),
                                   minmax=window_size)
peri_black = nap.compute_perievent(timestamps=selected_units,
                                   tref=nap.Ts(flashes_black.start),
                                   minmax=window_size)
Visualizing these perievents can help us determine what features we’ll want to include in our GLM. The following helper function should help.
# called like this, the function will visualize the first 9 units. play with the n_units
# and start_unit arguments to see the other units.
plot_raster_psth(peri_white, selected_units, "white", n_units=9, start_unit=0)
plot_raster_psth(peri_black, selected_units, "black", n_units=9, start_unit=0)
You could manually visualize each of our units and select those that appear, from their PSTH, to be responsive.
However, it would be easier to scale (and more reproducible) if you came up with some measure of responsiveness. So how do we compute something that captures “this neuron responds to visual stimuli”? Here, we will define “responsiveness” as the normalized difference between the average firing rate during stimulus presentation and the average firing rate before the stimulus was presented. We’ll define a function that does that in the following hidden cell.
If you have other ideas and the time to explore them, you can return to this section and try other definitions of responsiveness.
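For reference, here is one way such a measure could be written (a sketch only; the hidden cell’s actual implementation may differ, and the name get_responsiveness_sketch is hypothetical). It pools each unit’s spikes across trials in the 250 ms before and after stimulus onset and takes their normalized difference.
# A sketch of one possible responsiveness measure; the hidden cell may differ.
# `peri` is a dict of per-unit TsGroups, where each member holds one trial's
# spike times re-aligned so that stimulus onset is at time 0.
def get_responsiveness_sketch(peri):
    resp = []
    for unit_peri in peri.values():
        # total spikes across trials in the 250 ms before and after onset
        before = np.asarray(unit_peri.count(ep=nap.IntervalSet(-0.25, 0.0))).sum()
        during = np.asarray(unit_peri.count(ep=nap.IntervalSet(0.0, 0.25))).sum()
        # normalized difference; equal window lengths, so raw counts suffice
        resp.append((during - before) / (during + before + 1e-12))
    return np.array(resp)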
# Compute the responsiveness measure
responsiveness_white = get_responsiveness(peri_white)
responsiveness_black = get_responsiveness(peri_black)
# Get threshold for top 15% most responsive
thresh_black = np.percentile(responsiveness_black, 85)
thresh_white = np.percentile(responsiveness_white, 85)
# Only keep units that are within the 15% most responsive for either black or white
selected_units = selected_units[(responsiveness_black > thresh_black) | (responsiveness_white > thresh_white)]
Let’s visualize the selected units’ PSTHs to make sure they all look reasonable:
print(f"Remaining units: {len(selected_units)}")
peri_white = {k: peri_white[k] for k in selected_units.index}
peri_black = {k: peri_black[k] for k in selected_units.index}
plot_raster_psth(peri_black, selected_units, "black", n_units=len(peri_black))
plot_raster_psth(peri_white, selected_units, "white", n_units=len(peri_white))
Avoiding overfitting#
As we’ve seen throughout this workshop, it is important to avoid overfitting your model. We’ve covered two strategies for doing so: either separate your dataset into train and test subsets or set up a cross-validation scheme. Pick one of these approaches and use it when fitting your GLM model in the next section.
You might find it helpful to refer back to the advanced nemos (“How to know when to regularize” header) notebook and / or to use the following pynapple functions: IntervalSet.set_diff, IntervalSet.union, TsGroup.restrict (see phase precession notebook (“Aside: Cross-validation” admonition)).
Hints
Throughout the rest of the notebook we’ll include hints. They’ll either be links back to earlier notebooks from this workshop that show an example of how to do the step in question, or a hint like this, which you can expand to get a hint.
# enter code here
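If you are unsure where to start, here is one simple possibility: hold out a random subset of the flash trials as a test set (a sketch only, assuming the extended_flashes intervals from above; you could equally build your split with IntervalSet.set_diff / union, or set up cross-validation instead).
# A sketch of a trial-based train/test split; other splitting schemes are equally valid.
rng = np.random.default_rng(0)
n_trials = len(extended_flashes)
# mark roughly 20% of the peristimulus intervals as held-out test trials
test_mask = np.zeros(n_trials, dtype=bool)
test_mask[rng.choice(n_trials, size=n_trials // 5, replace=False)] = True
test_ep = extended_flashes[test_mask]
train_ep = extended_flashes[~test_mask]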
Fit a GLM#
In this section, you will use nemos to build a GLM. There are a lot of scientific decisions to be made here, so we suggest starting simple and then adding complexity. Construct a design matrix with a single predictor, using a basis of your choice, then construct, fit, and score your model on a single neuron (remembering to use your train/test split or cross-validation to avoid overfitting). Then add regularization to your GLM. Then return to the beginning and add more predictors. Then fit all the neurons. Then evaluate which basis functions and parameters are best for your predictors. Then use the tricks we covered in the advanced nemos notebook (“Feature selection” header) to evaluate which predictors are necessary for your model and which are the most important.
You don’t have to exactly follow those steps, but make sure you can go from beginning to end before getting too complex.
Good luck and we look forward to seeing what you come up with!
Prepare data#
Decide on bin size and create spike count data. (Hint: review the current injection notebook (“Extending the model to use injection history” header).)
bin_size =
units_counts =
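If you are unsure, here is one possible choice (a sketch; the bin size and the decision to restrict to the peristimulus epochs are yours to make).
# One possible choice, for illustration only: 10 ms bins, counted over the
# extended peristimulus epochs.
bin_size = 0.010
units_counts = selected_units.count(bin_size, ep=extended_flashes)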
Construct design matrix#
Decide on feature(s).
The code block below constructs stim, a TsdFrame containing 1s whenever the stimulus is being presented (in separate columns for white and black). You can use this, but you may also want to perform additional computations on stim to construct other features.
Decide on basis. (Hint: review the current injection (“Extending the model to use injection history”) or place cell (“Select basis” header) notebooks.)
If you set the label argument for your basis objects, interpreting the output will be easier.
Construct design matrix. (Hint: review the place cell (“Basis evaluation” header) notebook.)
What features should I include?
If you’re having trouble coming up with features to include, here are some possibilities:
Stimulus, using stim. (Review the current injection (“Preparing data” header) notebook.)
Stimulus onset. (Hint: you can use numpy.diff to find when the stimulus transitions from off to on; see the sketch after the stim construction below.)
Stimulus offset. (Hint: you can use numpy.diff to find when the stimulus transitions from on to off.)
For multiple neurons: neuron-to-neuron coupling, using units_counts. (Refer back to the head direction notebook from the first day (“Fitting the Model” header) to see an example of fitting coupling filters.)
For the stimuli predictors, you probably want to model white and black separately.
# Create a TsdFrame filled with zeros, the same length as units_counts
stim = nap.TsdFrame(
    t=units_counts.t,
    d=np.zeros((len(units_counts), 2)),
    columns=['white', 'black'],
)
# Check whether there is a flash within a given bin of spikes
idx_white = flashes_white.in_interval(units_counts)
idx_black = flashes_black.in_interval(units_counts)
# Put a 1 at those locations
stim.d[~np.isnan(idx_white), 0] = 1
stim.d[~np.isnan(idx_black), 1] = 1
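If you want onset and offset predictors, here is one possible way to derive them from stim with numpy.diff (a sketch; the names stim_onset and stim_offset are hypothetical).
# A sketch of onset/offset predictors derived from stim: the difference of the
# 0/1 stimulus trace is +1 at onsets and -1 at offsets.
stim_diff = np.diff(stim.d, axis=0, prepend=stim.d[:1])
stim_onset = nap.TsdFrame(t=stim.t, d=(stim_diff == 1).astype(float),
                          columns=['white_on', 'black_on'])
stim_offset = nap.TsdFrame(t=stim.t, d=(stim_diff == -1).astype(float),
                           columns=['white_off', 'black_off'])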
Construct and fit your model#
Decide on regularization. (Hint: review Edoardo’s presentation and the place cell (“How to know when to regularize?” header) notebook.)
Initialize GLM. (Hint: review the current injection (“Fitting the model” header) or place cell (“How to know when to regularize?” header) notebooks.)
Call fit. (Hint: review the current injection (“Fitting the model” header) or place cell (“How to know when to regularize?” header) notebooks.)
# enter code here
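To give a concrete picture of the workflow, here is a minimal end-to-end sketch. The basis type, window size, regularizer, and strength below are arbitrary examples rather than recommendations, and it assumes stim, units_counts, and train_ep from the cells above.
# A minimal sketch of design-matrix construction and fitting; all choices here
# are illustrative, not recommendations.
basis_white = nmo.basis.RaisedCosineLogConv(n_basis_funcs=8, window_size=50, label="white")
basis_black = nmo.basis.RaisedCosineLogConv(n_basis_funcs=8, window_size=50, label="black")
additive_basis = basis_white + basis_black
# convolve each stimulus column with its basis to build the design matrix
X = additive_basis.compute_features(stim["white"], stim["black"])
# fit a ridge-regularized Poisson GLM to one neuron's counts on the training epochs
count = units_counts[:, 0]  # first selected unit, as an example
model = nmo.glm.GLM(regularizer="Ridge", regularizer_strength=1e-3)
model.fit(X.restrict(train_ep), count.restrict(train_ep))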
Visualize model PSTH#
Generate model predictions (remember to convert to spikes/sec!).
Compute model PSTHs. (Note that you should use compute_perievent_continuous here! Otherwise, this looks very similar to our PSTH calculation above.)
Visualize these PSTHs.
The following helper function should help with the visualization step (it works for one or multiple neurons).
The following cell shows you how to call this visualization function for a PSTH computed from a GLM (i.e., single neuron fit). Its arguments are:
The PSTH object computed from the data. This should only contain the responses to either the black or white flashes, but can contain the PSTHs from one or more-than-one neurons.
A string, either “white” or “black”, which determines some of the styling.
Any number of keyword arguments (e.g., predictions= shown below) whose values are a tuple of (style, peri), where style is a valid matplotlib style (e.g., “red”) and peri is additional PSTHs to plot. Expected use is, as below, to plot the predictions in a different color on top of the actual data.
plot_pop_psth(peri_white[unit_id], "white", predictions=("red", peri_white_pred_unit))
plot_pop_psth(peri_black[unit_id], "black", predictions=("red", peri_black_pred_unit))
# enter code here
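For instance, here is a sketch of the prediction and alignment steps (assuming model, X, bin_size, and window_size from the cells above; peri_white_pred_unit matches the name used in the plotting call above).
# A sketch of generating predictions and aligning them to stimulus onset.
pred_rate = model.predict(X) / bin_size  # convert from spikes/bin to spikes/sec
# align the predicted rate to the flash onsets, mirroring the data PSTHs above
peri_white_pred_unit = nap.compute_perievent_continuous(
    pred_rate, nap.Ts(flashes_white.start), minmax=window_size
)
peri_black_pred_unit = nap.compute_perievent_continuous(
    pred_rate, nap.Ts(flashes_black.start), minmax=window_size
)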
Visualize learned model filters#
“Expand” model coefficients into filters.
Visualize these filters.
When using basis functions, GLM coefficients are hard to interpret directly. Recall that in the current injection (search for “Visualize the current history model’s learned filter”) and head direction (search for “We can plot the resluting response,”) notebooks, we multiplied these coefficients by the basis functions to create the filter for visualization. This can be done using numpy.matmul or numpy.einsum.
If you get stuck here, you can expand the following dropdown to see a hint and then expand the one after that to see a possible way of doing this.
How to compute the filters?
There are two components here:
If you’ve used a single basis object, your GLM will have weights of shape (n_basis_funcs,) (equivalently, a PopulationGLM will have weights of shape (n_basis_funcs, n_neurons)). If you call your basis’s evaluate_on_grid method, you’ll get back an array of shape (window_size, n_basis_funcs) containing the basis functions. You can then use either numpy.matmul or numpy.einsum to multiply the weights by the functions, computing the filter.
If you’ve used more than one basis object, how do you know which weights correspond to which basis?
You can slice the weights yourselves: each basis will have n_basis_funcs weights associated with it (where this is the argument passed on initialization and also an attribute of the object), and so you can do some algebra to figure out which weights correspond to which basis.
However, if you have used a single AdditiveBasis object to construct your GLM, you can take advantage of its split_by_feature method to do the splitting for you.
Code to compute the filters
If you have used a single AdditiveBasis object to construct your GLM (called additive_basis in the following), you can take advantage of its split_by_feature method:
weights = additive_basis.split_by_feature(model.coef_, 0)
filters = {}
for k, v in weights.items():
    this_basis = additive_basis[k]
    _, this_basis = this_basis.evaluate_on_grid(this_basis.window_size)
    filters[k] = np.matmul(this_basis, v)
filters is then a dictionary whose keys match the label of each basis object and whose values are numpy arrays.
# enter code here
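For example, one way to plot the resulting filters (a sketch, assuming the filters dictionary and bin_size from above, and a single-neuron GLM so each filter is one-dimensional):
# A sketch of plotting the learned filters; assumes a single-neuron GLM, so each
# entry of `filters` is a 1-d array of length window_size.
fig, ax = plt.subplots()
for label, filt in filters.items():
    time = np.arange(len(filt)) * bin_size  # convolution window, in seconds
    ax.plot(time, filt, label=label.strip())
ax.set(xlabel="Time from event (s)", ylabel="Filter weight (a.u.)")
ax.legend()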
Score your model#
If you used a train/test split, you trained on the train set, so now score on the test set. (Or use your cross-validation scheme.)
Get a score for your model that you can use to compare across the modeling choices outlined above. (Hint: refer back to the place cell (“How to know when to regularize?” header) notebook.)
# enter code here
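For example, a sketch of scoring on the held-out epochs (assuming model, X, count, and test_ep from the cells above; the pseudo-R2 score type is one of several options):
# A sketch of evaluating the fitted model on the held-out test epochs.
score = model.score(
    X.restrict(test_ep),
    count.restrict(test_ep),
    score_type="pseudo-r2-McFadden",
)
print(f"Test-set pseudo-R2: {score:.3f}")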
Try to improve your model?#
Go back to the beginning of this section and try to improve your model’s performance (as reflected by increased score).
Keep track of what you’ve tried and their respective scores.
You can do this by hand, but constructing a pandas DataFrame, as we’ve seen in the place cell notebook (“How to know when to regularize?” header), is useful:
# Example construction of dataframe.
# In this:
# - additive_basis is the single AdditiveBasis object we used to construct the entire design matrix
# - model is the GLM we fit to a single neuron
# - unit_id is the int identifying the neuron we're fitting
# - score is the float giving the model score, summarizing model performance (on the test set)
import pandas as pd
data = [
    {
        "model_id": 0,
        "regularizer": model.regularizer.__class__.__name__,
        "regularizer_strength": model.regularizer_strength,
        "solver": model.solver_name,
        "score": score,
        "n_predictors": len(additive_basis),
        "unit": unit_id,
        "predictor_i": i,
        "predictor": basis.label.strip(),
        "basis": basis.__class__.__name__,
        # any other info you think is important ...
    }
    for i, basis in enumerate(additive_basis)
]
df = pd.DataFrame(data)
df