Exercises "Neuronal Networks" - Summer School MCS
Jutta Kretzberg & Andreas Engel, Oldenburg 27.8.2009
Goal of the exercises:
Imagine your colleague from the Neuroscience department comes to you with an experimentally recorded data set and the question "How well do the neuronal responses represent the stimulus?"
As a world specialist in neural networks, your approach is to sort the neuronal responses into classes that represent the different stimuli. By using different aspects of the neuronal responses for the classification, you can help your colleague learn a lot about the data.
The data you will use were recorded with a multi-electrode array from a turtle retina. They contain the responses of 20 cells to 200 presentations of a moving light stimulus that was projected onto the isolated retina. The pattern moved with equal probability either to the left or to the right. Your task is to find out which response belongs to which stimulus direction.
The approach to solve this task is to train an artificial neural network with parts of the data, classify the rest of the responses with this neural network and compare your prediction with the actual stimulus that elicited the responses.
Introduction to Matlab:
Instructions for the practical sessions:
It is a good idea to write scripts or functions to solve these exercises rather than just typing commands at the command line. Otherwise it will be difficult to compare your solution to others...
- Preparations for the exercises:
- Please download the following files to your computer: CellRespMat.mat, sigma.mat, training.m, generror.m
- Load the data (.mat) files into the Matlab workspace. Which variables occur?
- Load the Matlab functions (.m) into the editor and take a look at them. What are their inputs and outputs?
- Take a look at the data set:
- The recorded neuronal data is stored in the 3D matrix CellRespMat. Look at the dimensions: The first dimension represents the cell number (20 cells), the second dimension the time (500 time bins of 1 ms length) and the third dimension the number of stimulus presentations (192 presentations). E.g. r=CellRespMat(5,100,45) is the 100th time bin of the response of cell number 5 to the 45th stimulus presentation.
- Use the array editor and the graphical display imagesc(M) to look at the data. You will need to apply squeeze to your matrix (e.g. M=squeeze(CellRespMat(5,:,:));) to get two-dimensional matrices, which are easier to handle. Look at all responses of one individual cell at a time. Do you see differences between the cells?
- Also look at the responses of all cells for individual stimulus presentations. Can you see differences (e.g. between presentations 1 and 5)?
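The two views described above can be sketched as follows. This is only a minimal illustration: the array is a synthetic stand-in with the stated dimensions (an assumption, so that the snippet runs without the data file), and the choice of cell 5 and presentation 1 simply follows the examples in the text.

```matlab
% Synthetic stand-in for CellRespMat (assumption: 0/1 spike indicators per 1 ms bin).
% With the real data, load CellRespMat.mat instead of generating random values.
rng(1);
CellRespMat = double(rand(20, 500, 192) < 0.02);

% All responses of one cell: squeeze removes the singleton first dimension.
cellIdx = 5;
M = squeeze(CellRespMat(cellIdx, :, :));   % 500 time bins x 192 presentations
figure;
imagesc(M');                               % rows = presentations, columns = time
xlabel('time bin (ms)'); ylabel('presentation');
title(sprintf('Cell %d, all presentations', cellIdx));

% All cells for one presentation: a single page of the array is already 2D.
presIdx = 1;
P = CellRespMat(:, :, presIdx);            % 20 cells x 500 time bins
figure;
imagesc(P);
xlabel('time bin (ms)'); ylabel('cell');
title(sprintf('Presentation %d, all cells', presIdx));
```

Transposing M before plotting is a display choice only; it puts time on the horizontal axis in both figures.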
- Reduce the data set to spike counts and apply a threshold:
- The first guess of neuroscientists about neural coding is always to ignore the temporal response structure and consider only the spike rate - the number of spikes a stimulus elicits in a given time window. Reduce the data set to a 20x1x192 matrix, representing the total number of spikes of each cell for each presentation.
- Take a look at the spike counts: do they differ between cells and between presentations?
- The neural network needs binary input. Therefore, we need to reduce the data further by applying a threshold to the spike counts of each cell. The idea is that a given cell responds (in most cases) with fewer than N spikes to movement in one direction and with ≥N spikes to movement in the other direction. What would be a good way to find the thresholds?
- Apply the thresholds to the data and produce a 20x1x192 matrix which consists only of the values -1 and 1. By applying squeeze to your matrix (e.g. M2=squeeze(M1);) you get a two-dimensional 20x192 matrix. (To make sure everything went ok so far you can compare it to this matrix: xi.mat)
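One possible way to carry out the reduction is sketched below. The per-cell median is used as threshold purely as an assumption for illustration (the exercise leaves the choice open), and the response array is again a synthetic stand-in, so the resulting matrix will not match xi.mat.

```matlab
% Synthetic stand-in for CellRespMat (assumption: 0/1 spike indicators).
rng(2);
CellRespMat = double(rand(20, 500, 192) < 0.02);

% Spike counts: sum over the time dimension (dimension 2).
counts = sum(CellRespMat, 2);                 % 20 x 1 x 192

% Per-cell threshold: median count over presentations (assumed choice).
theta = median(counts, 3);                    % 20 x 1

% Binarize: +1 if count >= threshold, -1 otherwise.
nPres = size(counts, 3);
xi3 = 2 * (counts >= repmat(theta, [1 1 nPres])) - 1;   % 20 x 1 x 192
xi  = squeeze(xi3);                           % 20 x 192
```

A median threshold splits each cell's responses roughly in half without using the stimulus labels; a threshold chosen from the labeled responses would be another option worth comparing.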
- Apply the provided learning algorithm to the data set:
- Separate your data set into training and test data. Randomly select 10 presentations as the training data set and use the remaining presentations as the test data set.
- Apply the provided learning algorithm training.m to your training data set.
- Apply the provided algorithm generror.m with the connection matrix you obtained to the test data set and determine the generalization error.
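The random split can be sketched like this. The inputs are random stand-ins of the right shape, and the names xi_te and sigma_te are hypothetical (only xi_tr and sigma_tr appear in the exercise text). The calls to training.m and generror.m are deliberately left out, because their inputs and outputs should be read from the files themselves.

```matlab
% Random stand-ins for the binarized inputs and the stimulus directions.
rng(3);
xi    = 2 * (rand(20, 192) > 0.5) - 1;   % 20 cells x 192 presentations
sigma = 2 * (rand(1, 192) > 0.5) - 1;    % one target (+1/-1) per presentation

% Random permutation of the presentation indices.
nTrain = 10;
perm  = randperm(size(xi, 2));
trIdx = perm(1:nTrain);                  % training presentations
teIdx = perm(nTrain+1:end);              % all remaining presentations

xi_tr = xi(:, trIdx);   sigma_tr = sigma(trIdx);
xi_te = xi(:, teIdx);   sigma_te = sigma(teIdx);   % hypothetical names

% Next step (not shown): pass the training set to training.m and the
% resulting connection matrix together with the test set to generror.m.
```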
- Evaluate the learning:
- Use several different randomly selected test data sets and calculate the average generalization error and its standard deviation. Also take a look at the average connection matrix.
- Vary the size of the training data set. Plot (with the command "errorbar") how the mean generalization error and its standard deviation depend on the size of the training data set. Is the separation of stimulus directions an easy or a difficult task compared to the example shown in the lecture?
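The evaluation loop might look like the sketch below. The function handles trainFcn and errFcn are hypothetical placeholders, not the provided files: they should be replaced by calls to training.m and generror.m once their signatures are known. The error measure used here (fraction of misclassified test presentations) and the synthetic data are assumptions as well.

```matlab
% Synthetic data: each cell agrees with the stimulus direction in 70% of trials.
rng(4);
nCells = 20; nPres = 192;
sigma = 2 * (rand(1, nPres) > 0.5) - 1;
agree = 2 * (rand(nCells, nPres) < 0.7) - 1;
xi    = agree .* repmat(sigma, nCells, 1);

% Placeholder learner and error measure (hypothetical stand-ins).
trainFcn = @(xi_tr, sigma_tr) (1 / size(xi_tr, 1)) * sigma_tr * xi_tr';
errFcn   = @(J, xi_te, sigma_te) mean(sign(J * xi_te) ~= sigma_te);

sizes = [5 10 20 40 80];     % training set sizes to compare
nRep  = 20;                  % random splits per size
errMean = zeros(size(sizes));
errStd  = zeros(size(sizes));
for s = 1:numel(sizes)
    err = zeros(1, nRep);
    for r = 1:nRep
        perm = randperm(nPres);
        tr = perm(1:sizes(s));
        te = perm(sizes(s)+1:end);
        J = trainFcn(xi(:, tr), sigma(tr));
        err(r) = errFcn(J, xi(:, te), sigma(te));
    end
    errMean(s) = mean(err);
    errStd(s)  = std(err);
end

figure;
errorbar(sizes, errMean, errStd);
xlabel('training set size'); ylabel('generalization error');
```

Drawing a fresh random split in every repetition means the standard deviation reflects both the choice of training set and the choice of test set.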
Additional exercises (please pick the most interesting topics):
- Apply a different learning rule:
- To compare the performance of different learning rules, redo exercises 4 and 5 from above (applying the learning algorithm and evaluating the learning), using the Hebb learning rule J=1/N*sigma_tr*xi_tr' (with N=length of input strings, sigma_tr=target outputs of training data set, xi_tr=training data set) instead of the Adatron algorithm (provided in training.m). Why do the weight matrices look different? Compare the generalization errors for different data set sizes.
- For a more general comparison of both algorithms you could use random rather than experimental data. Generate an input matrix with randomly selected values 1 or -1 for each element. Generate a random connection matrix J_rand and apply it to the random inputs to generate "correct" outputs. Then try to learn these outputs with both algorithms and varied data sizes. Which algorithm learns faster?
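A minimal sketch of the random-data setup with the Hebb rule is given below. The pattern count, the training set size, and the Gaussian choice for J_rand are assumptions, and the error is again taken as the fraction of misclassified test patterns. The Adatron side of the comparison is not shown, since it depends on training.m.

```matlab
rng(5);
N = 20;                               % length of each input string
P = 500;                              % number of random patterns (assumed)

% Random +1/-1 inputs and a random "teacher" that defines the correct outputs.
xi     = 2 * (rand(N, P) > 0.5) - 1;  % N x P
J_rand = randn(1, N);                 % random connection matrix (assumed Gaussian)
sigma  = sign(J_rand * xi);           % 1 x P "correct" outputs

% The patterns are random already, so the first nTrain serve as training set.
nTrain = 100;
xi_tr = xi(:, 1:nTrain);      sigma_tr = sigma(1:nTrain);
xi_te = xi(:, nTrain+1:end);  sigma_te = sigma(nTrain+1:end);

% Hebb rule from the exercise text.
J = 1/N * sigma_tr * xi_tr';          % 1 x N

genErr = mean(sign(J * xi_te) ~= sigma_te);
fprintf('Hebb generalization error with %d training patterns: %.3f\n', nTrain, genErr);
```

Repeating this for several values of nTrain, and for both learning rules, gives the learning curves asked for in the exercise.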
- A look at different cells:
- Take a look at the connection matrix: are all cells equally important for the classification?
- Do you find the same cells to have high impact if you repeat the learning several times?
- You would expect that the most direction-selective cells, meaning the cells with the clearest difference in spike counts between the two directions of motion, should have the highest impact on the classification. Calculate the mean and standard deviation of the responses for the two stimuli and find out which cells are most clearly direction-selective. Are they the ones with the high "synaptic weights"?
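One way to quantify direction selectivity is sketched here. The index used (difference of the means divided by the pooled standard deviation) is an assumption chosen for illustration, not something prescribed by the exercise, and the spike counts are synthetic, with cell 3 made selective on purpose.

```matlab
rng(6);
nCells = 20; nPres = 192;
sigma  = 2 * (rand(1, nPres) > 0.5) - 1;           % stimulus direction per presentation
counts = 10 + 2 * randn(nCells, nPres);            % synthetic "spike counts"
counts(3, :) = counts(3, :) + 8 * (sigma == 1);    % cell 3 prefers direction +1

% Mean and standard deviation per cell for each direction.
left  = counts(:, sigma == -1);
right = counts(:, sigma ==  1);
muL = mean(left, 2);   sdL = std(left, 0, 2);
muR = mean(right, 2);  sdR = std(right, 0, 2);

% Selectivity index: mean difference in units of the pooled standard deviation.
dsi = abs(muR - muL) ./ sqrt((sdL.^2 + sdR.^2) / 2);

% Rank cells from most to least selective.
[dsiSorted, ranked] = sort(dsi, 'descend');
disp([ranked(1:5), dsiSorted(1:5)]);
```

The ranking can then be compared with the absolute values of the entries of the connection matrix.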
- Stimulus encoding with different response features:
- Spike counts on shorter timescales: A biological system like the retina should be designed to work as fast as possible to enable the animal to detect and classify a stimulus. (It is a bad idea to spend 10 minutes deciding whether you really see a tiger coming towards you before you start to react...) Try to use only parts of the full 500 ms long responses, e.g. only the first 100 ms. How well can the two directions be discriminated compared to the full responses?
- Use response segments of different lengths to calculate and plot how the classification performance depends on the response integration time.
- Is it also possible to discriminate the stimuli based on later response parts, e.g. the last 100ms of each response?
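Restricting the spike counts to a time window can be sketched as follows. The helper winCounts is a hypothetical name, the window lengths are arbitrary examples, and the response array is a synthetic stand-in. Thresholding and classification would follow for each window as for the full responses.

```matlab
% Synthetic stand-in for CellRespMat (assumption: 0/1 spike indicators per 1 ms bin).
rng(7);
CellRespMat = double(rand(20, 500, 192) < 0.02);

% Hypothetical helper: spike counts of all cells in time bins t0..t1.
winCounts = @(R, t0, t1) squeeze(sum(R(:, t0:t1, :), 2));   % 20 x 192

first100 = winCounts(CellRespMat, 1, 100);     % first 100 ms
last100  = winCounts(CellRespMat, 401, 500);   % last 100 ms

% Counts for increasing integration times, all starting at response onset.
winLengths = [50 100 200 300 400 500];
for k = 1:numel(winLengths)
    c = winCounts(CellRespMat, 1, winLengths(k));
    fprintf('window 1-%d ms: mean count = %.2f\n', winLengths(k), mean(c(:)));
    % Threshold c, then train and test as for the full 500 ms responses.
end
```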
- Influence of stimulus history:
- For this exercise we use only part of the complete experimental data set. In the experiment, 9 different stimulus velocities were used (4 to the left, 4 to the right, and no motion). The vector PrevVelVec contains the stimulus history of each response in your data set. In this vector, 1 means maximum velocity to the left (corresponding to -1 in DirVec); 2, 3 and 4 are slower velocities to the left; 5 is no motion at all; and 6, 7, 8 and 9 are successively faster velocities to the right (9 in PrevVelVec corresponds to +1 in sigma). Try to classify the stimulus history:
- Try to classify whether the stimulus moved at all (PrevVelVec(N)~=5) before the response was measured.
- For those cases in which the stimulus moved (PrevVelVec(N)~=5), try to classify whether it moved in the same direction as during the response period.
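The target outputs for these two classification tasks could be built as in the sketch below. The short example vectors are made up for illustration, and the +1/-1 coding of the new targets (moved, sameDir) is an assumption chosen to match the coding of sigma.

```matlab
% Made-up example: previous velocity code and current direction per response.
PrevVelVec = [1 5 9 4 6 5 2 8];
sigma      = [-1 1 1 1 -1 -1 -1 1];

% Task 1: did the stimulus move before the response? (+1 = moved, -1 = no motion)
moved = 2 * (PrevVelVec ~= 5) - 1;

% Previous direction: codes 1-4 are leftward (-1), 6-9 are rightward (+1).
prevDir = sign(PrevVelVec - 5);

% Task 2: only responses with previous motion; +1 = same direction as now.
idx     = find(PrevVelVec ~= 5);
sameDir = 2 * (prevDir(idx) == sigma(idx)) - 1;

disp(moved);
disp(sameDir);
```

For the second task, the input matrix has to be restricted to the same columns idx before training.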
And if you have ideas about what else you would like to find out about the data or the learning algorithms, please feel free to pursue them!
Solution for 1-5: