== Introduction ==

This package contains all the source code needed to reproduce the numerical
experiments described in [1]. It contains a parallelized implementation of
the MCA generative model training algorithm.

If you have problems running the code, please contact Joerg Bornschein.

== Overview ==

pulp/        - Python library/framework for MPI-parallelized EM-based
               algorithms. The MCA implementation is in
               pulp/em/camodels/mca_et.py
examples/    - Small example programs for the pulp library
gf-learning/ - The main program that performs EM-based learning using pulp.em
preproc/     - Scripts used to extract patches from the Van Hateren database
               and to preprocess them using pseudo-whitening
data/        - Training data. This package does not include the training
               data itself, but contains a script that can be used to
               download the datasets.

== Data Sets ==

The script "data/download.sh" will download four datasets:

1) vanhateren-linear.h5   images from the Van Hateren database
2) patches16.h5           patches of size 16x16, preprocessed as described in [1]
3) patches20.h5           patches of size 20x20, preprocessed as described in [1]
4) patches26.h5           patches of size 26x26, preprocessed as described in [1]

== Software dependencies ==

* Python (>= 2.6)
* NumPy (reasonably recent)
* SciPy (reasonably recent)
* PyTables (reasonably recent)
* mpi4py (>= 1.2)

== Running ==

First, run some examples, e.g.:

  $ cd examples
  $ python mca-barstest.py

This should run the MCA algorithm on artificial bars data and visualize the
result.

To run the real experiments, change into the 'gf-learning/' directory and
choose a parameter file:

  $ python run-em.py params-16x16-h100.py

The results will be stored in a file called "output/.../results.h5". This
file contains the W, pi and sigma parameters for each EM iteration (the
appendices at the end of this README sketch how to inspect and plot these
results). To visualize the results you can use "./report", which will save
the generated plots into the "output/.../" directory again.

Running on big datasets with high values of H, Hprime and gamma is
computationally very expensive; use MPI to parallelize (the last appendix
below illustrates the underlying data-parallel pattern):

a) On a multi-core machine with 32 cores:

  $ mpirun -np 32 python run-em.py params-h200.py

b) On a cluster:

  $ mpirun --hostfile machines python run-em.py params-h200.py

where 'machines' contains a list of suitable machines. See your MPI
documentation for the details of how to start MPI-parallelized programs.

Using 160 parallel cores for the 16x16 patches, a single EM step takes about
1:30 minutes, so a full EM training run finishes in less than 2 hours.
Running the 26x26 patches takes much longer (more than 24 hours). The Hprime
and gamma parameters have a strong influence on the computational cost.

== References ==

[1] G. Puertas, J. Bornschein, J. Lucke. The Maximal Causes of Natural
    Scenes are Edge Filters. Advances in Neural Information Processing
    Systems 23, 2010.
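== Appendix: inspecting results.h5 (sketch) ==

The exact HDF5 layout of results.h5 is not documented above, so the
following is only a minimal sketch. It assumes that W, pi and sigma are
stored as arrays under the file root, with one entry per EM iteration; run
"ptdump" on your file to check the actual node names. It uses the
PyTables 3.x API (open_file); on PyTables 2.x, use openFile instead.

  import tables  # PyTables

  # NOTE: the node names "/W", "/pi" and "/sigma" are assumptions; run
  #   $ ptdump output/.../results.h5
  # to see the actual layout of your results file.
  h5 = tables.open_file("output/.../results.h5", "r")  # openFile() on PyTables 2.x
  try:
      W     = h5.root.W.read()      # learned fields, one entry per EM iteration
      pi    = h5.root.pi.read()     # sparseness parameter per iteration
      sigma = h5.root.sigma.read()  # noise parameter per iteration
      print("EM iterations stored: %d" % len(sigma))
      print("final pi=%.4f  sigma=%.4f" % (pi[-1], sigma[-1]))
  finally:
      h5.close()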
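== Appendix: plotting the learned fields (sketch) ==

The "./report" script already produces plots; if you want to look at the
learned generative fields directly, a minimal sketch like the one below can
be used. It assumes the final W has shape (H, D) with D = patch*patch
pixels (as in the 16x16 experiment), and it uses matplotlib, which is not
part of the dependency list above.

  import numpy as np
  import matplotlib.pyplot as plt  # NOT in the dependency list above

  def show_fields(W, patch=16, cols=10):
      """Show every row of W as a patch x patch grayscale image."""
      H = W.shape[0]
      rows = (H + cols - 1) // cols
      fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
      for h, ax in enumerate(np.atleast_1d(axes).ravel()):
          ax.set_axis_off()
          if h < H:
              ax.imshow(W[h].reshape(patch, patch), cmap="gray",
                        interpolation="nearest")
      plt.show()

  # e.g., with W read as in the previous appendix (last EM iteration):
  # show_fields(W[-1], patch=16)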
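== Appendix: the MPI data-parallel pattern (sketch) ==

As an illustration of why MPI parallelization helps here, the snippet below
sketches the generic data-parallel EM pattern: each rank computes sufficient
statistics on its own slice of the data points, and the statistics are
summed across all ranks before the global parameter update. This is only an
illustration of the pattern, not the actual pulp implementation; the array
shapes and the statistic itself are made-up examples.

  from mpi4py import MPI
  import numpy as np

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()
  size = comm.Get_size()

  # Made-up example data: each rank owns its own slice of the data points.
  my_y = np.random.randn(1000 // size, 16 * 16)

  # "E-step": every rank computes a sufficient statistic on its local slice...
  local_stat = my_y.sum(axis=0)

  # ...and the statistics are summed across all ranks for the global update.
  global_stat = np.empty_like(local_stat)
  comm.Allreduce(local_stat, global_stat, op=MPI.SUM)

  if rank == 0:
      print("summed statistic over %d ranks, first entry: %.3f"
            % (size, global_stat[0]))

Saved as, say, mpi-sketch.py (a hypothetical filename), this runs with:

  $ mpirun -np 4 python mpi-sketch.py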