Bayesian inference of conformational ensembles
A method based on Bayesian statistics that infers conformational ensembles from a structural library generated by all-atom Monte Carlo simulations. The first stage of the method involves a fast model selection approach based on variational Bayesian inference that maximizes the model evidence of the selected ensemble. This is followed by a complete Bayesian inference of population weights in the selected ensemble.
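Conceptually, both stages fit the experimental data with a population-weighted average of the curves computed for the individual models and infer the weights that best explain the data; the full inference then samples posterior distributions of the weights and of a scaling parameter rather than a single best fit. A minimal sketch of the underlying mixture model (not the package code; names and shapes are illustrative only):

import numpy as np

def ensemble_chi2(simulated, experimental, errors, weights, scale=1.0):
    """Chi-square of a population-weighted ensemble average against experimental data.

    simulated    : (n_models, n_points) intensities computed for each model
    experimental : (n_points,) experimental intensities
    errors       : (n_points,) experimental errors
    weights      : (n_models,) population weights, summing to 1
    """
    ensemble_curve = scale * (weights @ simulated)  # weighted average over models
    residuals = (ensemble_curve - experimental) / errors
    return float(np.sum(residuals ** 2))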
unzip bioce-1.0.zip
cd bioce-1.0
conda env create -f bioce.yml
source activate bioce
python setup.py build
python fullBayesian.py --help
python variationalBayesian.py --help
If you see no errors and the options menu appears, you are good to go.
There is a known issue with the Xcode installation on macOS 10.14 (Mojave). If you see the following error:
test.c:8:10: fatal error: stdio.h: No such file or directory
#include <stdio.h>
^~~~~~~~~
compilation terminated.
please try installing the following package:
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg
There may also be issues with running variationalBayesian.py. The solution may be to compile the module manually, which requires running the compile.sh script (you may first need to make the file executable, e.g. with chmod +x compile.sh):
https://github.com/Andre-lab/bioce/blob/master/compile.sh
This is not an optimal solution, and a proper fix will hopefully come soon.
The above example assumed that all input data are already in the right format. This may not be the case when you start from a set of PDB models. We assume here that you have generated a set of structural models and that experimental scattering or NMR chemical shift data are available. A typical workflow may look as follows.
prepareBayesian.py -s structure_lib_dir -e experimental_data
Apart from SimulatedIntensities.txt, which contains tabulated intensities for each model, a file with starting weights and a list of structure files are generated. These files are needed to run the Bayesian inference in the next step.
python ../variationalBayesian.py -s SimulatedIntensities.txt -e simulated.dat -f structures.txt -p weights.txt -w 0.01 -o output
where -w is the weight threshold used for pruning models after each iteration.
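In essence, models whose inferred weight falls below this threshold are removed before the next iteration. A minimal sketch of such pruning (not the package code; the renormalization is included only for illustration):

import numpy as np

def prune_models(names, weights, threshold=0.01):
    """Keep models whose weight is at or above the threshold and renormalize."""
    weights = np.asarray(weights, dtype=float)
    keep = weights >= threshold
    kept_weights = weights[keep] / weights[keep].sum()
    kept_names = [name for name, flag in zip(names, keep) if flag]
    return kept_names, kept_weights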
python ../fullBayesian.py -p weights.txt -s SimulatedIntensities.txt -e simulated.dat -f structures.txt
The script also produces two images, stan_weights.png and stan_scale.png, which illustrate the distributions of the population weights (shown below) and of the scaling parameter.
source deactivate
Once shiftx2 is installed, cd to the directory with the PDB files and run:
python shiftx2.py -b '*.pdb' -f csv
This will run a batch job to process all PDB files and generate output files in CSV format. Please refer to the shiftx2 manual for further options.
prepareChemicalShifts.py -s structure_lib_dir -e experimental_data
This will generate a few files: cs.dat and cs.err, with simulated chemical shifts and errors computed from the PDB structures, and cs_exp.dat, which contains the experimental data in the appropriate format.
Up to now, any method could be used to generate the pool of structural models, and we assumed that all models are equally probable (by assigning equal weights to all of them). However, one can also use the energy evaluated for each structural model. We use Rosetta to generate the structural library, and each of the models comes with an energy value. To make use of these energies, simply save them in a text file and tell the scripts to use them by supplying the "-P" flag instead of "-p", e.g.
python ../variationalBayesian.py -P energies.txt -s TrmSimulatedIntensities5models.dat -e synthetic_60p.dat -f names5models.txt -w 0.01
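The energies file is simply the energy values saved as plain text, presumably one value per model and in the same order as the structure list. A minimal sketch for extracting total scores from a Rosetta score file (assuming the usual score.sc layout; file names are illustrative):

# Write Rosetta total_score values, one per line, to energies.txt
with open("score.sc") as score_file:
    rows = [line.split() for line in score_file if line.startswith("SCORE:")]
header, scores = rows[0], rows[1:]
column = header.index("total_score")
with open("energies.txt", "w") as out:
    for row in scores:
        out.write(row[column] + "\n")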
To analyze data from a time-resolved SAS experiment, where multiple data sets are available, one should use the fullBayesianTR.py script. It requires the same input files and parameters as the single-file version, except that the experimental input is now a list of experimental files (experimental_list). In addition, the script reads in errors of the simulated scattering patterns, given for each q value. To run the script iteratively, one can specify the iteration with the '-I' parameter, e.g.
python ../fullBayesianTR.py -p weights.txt -s SimulatedIntensities.txt -E SimulatedErrors.txt -e experimental_list -o batch_results -f file_list -i 2000 -j 4 -c 4 -I 0
The fullBayesianTR.py script produces an iteration_* folder, which contains the necessary Python files. To perform Gaussian Process regression to smooth the weights, one needs to run:
python ../gpSmoothing.py iteration_0/batch_results
The gpSmoothing script will produce a set of weights for each species and save them to the prior_matrix.txt file. This file can be used as an input for the next iterations.
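Conceptually, the smoothing treats the weight of each species as a function of the time point (one value per data set) and fits a Gaussian Process regression to it, so that noisy weight trajectories become smooth priors for the next iteration. A minimal sketch of this idea using scikit-learn (not the package code; kernel choice and shapes are illustrative):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def smooth_weights(weight_matrix):
    """Smooth each species' weight trajectory across time points.

    weight_matrix : (n_timepoints, n_species) weights inferred per data set
    """
    weight_matrix = np.asarray(weight_matrix, dtype=float)
    time_points = np.arange(weight_matrix.shape[0])[:, None]
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
    smoothed = np.empty_like(weight_matrix)
    for k in range(weight_matrix.shape[1]):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(time_points, weight_matrix[:, k])
        smoothed[:, k] = gp.predict(time_points)
    return smoothed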
Webserver coming up soon!