CCP-EM: Collaborative Computational Project for Electron ...



Flex-EM & TEMPy, CCP-EM Icknield workshop, 4-6 Apr 2017

Setup

Files for this tutorial are in /dls/tmp/ccpem/flex-em/data.zip. Please copy this in to your local directory and unpack:

mkdir flex-em

cd flex-em

cp /dls/tmp/ccpem/flex-em/data.zip .

unzip data.zip

Part 1: Flexible fitting with Flex-EM

Local rigid fitting methods do not fully utilize the information in the cryoEM map. Often, the conformation of the atomic structure may differ from the conformation represented by the map. Also, if a comparative model is used for the fitting, the model may suffer from loop distortions, movement of secondary structure elements, or other errors introduced in the comparative modelling procedure itself (e.g. incorrect alignment). These problems can be overcome by using a flexible fitting method, whereby the structure is refined using a scoring function that optimizes the fit into the EM map and the stereochemistry and nonbonded interactions simultaneously.

For this tutorial, a homology model of e-coli adenylate kinase (generated by MODELLER using a homologous structure, PDB code: 1DVR:B) is refined within its 10 Å resolution native density map. The map was generated from adenylate kinase crystal structure (PDB code: 1AKE:A) using Chimera molmap command (using sigmaFactor=0.225). To improve the fit between the model and the density map we will run the Flex-EM program, using the MODELLER.

All Flex-EM files are in flex-em directory. In your data directory go to flex-em:

cd flex-em

For this part of the tutorial you should be familiar with the basic features of the Chimera or any other molecular visualization software that allows you to display density maps. We will use Chimera for demonstration.

Start Chimera and open 1akeA_10A.mrc and mdl1.pdb.

You can see in the Volume Viewer dialog that the size of 1ake_10A.map is 223 voxels where each voxel size is 3 Å/pixel. The resolution of this map is 10 Å. The origin index should be: 2.1647 –4.4603 1.8033. You can set the threshold level to 0.18 (The estimate uses 1.21 Å3 per dalton and molecular weight of ~24 kDa)

mdl1.pdb is the comparative model of e-coli adenylate kinase, already fitted in the map. To get a clearer view of the fit change the display to ribbons (if it is not already in ribbons) and make the map transparent. From visual inspection you can see that the fit can be improved (also by changing the contour level). For example, colour residues 31-74 by typing the command

color white #1:31-74

in the entry field at the bottom of the Chimera Graphics Window and press Enter. You can see that the residues in white lie outside the density.

i. Flexible fitting using secondary structure elements as rigid bodies

To improve the fit will run Flex-EM using MODELLER(version 9 and above).

In the current application of Flex-EM we will employ only 2 iterations of simulated annealing molecular dynamics (MD) refinement. We first have to edit the control file flex-em1.py. Open the file in an editor (gedit, nedit, vi, TextWrangler):

gedit flex-em1.py

Edit the parameters below INPUT PARAMETERS:

1. Define the mode of optimization:

optimization = 'MD'.

2. Set the input parameters of the atomic structure that you want to fit:

input_pdb_file=’mdl1.pdb’

3. Edit the EM map parameters (note that the origin is in Å, ie, multiply by voxel size ∗ -1):

em_map_file = '1akeA_10A.mrc' # name of EM density map (mrc)

format='MRC' # map format: MRC or XPLOR

apix=3 # voxel size: A/pixel

box_size=22 # size of the density map (cubic)

resolution=10.0 # resolution

x= -6.494; y=13.381; z=-5.410 # origin of the map (in Å)

4. Specify the directory in which you want the results to be found:

init_dir = 1

This will produce the results in a directory called 1_md.

5. Specify the number of simulated annealing iterations (for the purpose of this tutorial we are going to do only 2 iterations):

num_of_iter = 2

and close flex-em1.py.

In order to reduce the conformational space that has to be searched during this procedure, groups of atoms are defined and moved as rigid bodies (e.g., domains, sub-domains, secondary structure elements). This is done by editing the file rigid.txt.

Open the file in an editor. The file uses the following format:

- Comment lines begin with '#' (e.g., describing the rigid body: '#domain', '#helix', '#beta').

- Other lines: each line describes one rigid body by specifying the initial and final residue of each of the segments in that rigid body (e.g., '2 6 28 30' means that residues 2-6 and 28-30 will be included in the same rigid body).

- Note that you can add chain id when needed (not in this case). For example, '2:A 6:A 28:B 30:B' means that residues 2-6 from chain A and 28-30 from chain B will be included in the same rigid body.

In this tutorial, we use the secondary structure elements of mdl1.pdb as rigid bodies. This level of description is typically suitable for a map of 10 Å resolution. To obtain those elements you could use Chimera. Open the Model Panel dialog and select mdl1.pdb by clicking on the ID column next to it (#1). Click the sequence bar on the side menu. A new panel corresponding to the sequence of mdl1.pdb will appear. The secondary structure elements are indicated on top of the sequence – beta strands are coloured green and alpha helices are coloured yellow (based on the information in the PDB file or on DSSP (Kabsch & Sander, Biopolymers 1983). Placing the left mouse on top of an amino-acid letter will show its corresponding number at the bottom left of the sequence panel.

You probably noticed that we already edited the file using the secondary structure elements of mdl1.pdb as rigid bodies. Check that the numbers are correct.

Run the program by typing

module load EM/ccpem [required for every new terminal instance]

ccpem-python flex-em1.py > flex-em1.log &

Apart from the output file flex-em1.log, the program generates a number of files in the 1_md directory. After each iteration of simulated annealing a pdb file with the latest refined coordinates is generated (e.g., md1_1.pdb is generated after one cycle of simulated annealing, md1_2.pdb after a second cycle of simulated annealing, and final1_mdcg.pdb after a final Conjugate Gradient minimisation). You can open these files in Chimera to see the progression of the optimisation. On completion, the program generates a file final1_mdcg.pdb containing the final structure that has been refined by flexible fitting into the map.

To look at the change in CC during the optimization you can type the following command:

grep "Mod-EM" flex-em1.log | awk '{printf "%7.4f\n", $7}' | more

(You can press ctrl C to stop this command). When the optimisation is finished you could look directly at the final CC value by typing:

grep "Mod-EM" flex-em1.log | awk '{printf "%7.4f\n", $7}' | tail -1

Report the initial and final values of the CC. You can also save the CC values into a text file and open it in Excel in order to see the convergence of the score.

Open final1_mdcg.pdb in Chimera and change it into ribbon representation. By visual inspection of the initial model (mdl1.pdb, in magenta) and the final model (final1_mdcg.pdb, in cyan), it is clear that several secondary structure elements have moved towards the density.

ii. Assessment of accuracy

The level of refinement can be understood via comparison to the native structure (which in this case is already known). We will compare both models (initial and final) with the native structure using the Cα Root Mean Square Deviation (Cα RMSD) measure.

Open the crystal structure 1akeA.pdb (that corresponds to the simulated EM density map) in Chimera and change it into ribbon representation. To calculate the Cα RMSD between the initial model and the native structure type the command

rmsd #1:1-213@ca #3:1-213@ca

in the Chimera command line.

You can see that the Cα RMSD (shown in the bottom left of Chimera Graphics Window) is ~4.5 Å. Now check the RMSD from the native structure of your refined model by typing:

rmsd #2:1-213@ca #3:1-213@ca

Using the density information, the Cα RMSD of the model from the crystal structure has been reduced by ~2 Å reaching ~2.3 Å

iii. Hierarchical flexible fitting with Flex-EM/RIBFIND

It is possible to reduce ‘overfitting’ by first running Flex-EM with coarser rigid bodies and then remove some restraints by running it with rigid bodies defined as secondary-structure elements (as in Part 6: section 1). To get the larger rigid bodies you can submit mdl1.pdb to the RIBFIND server: . Open the website with Firefox (not Chrome) or Safari.

You can start the refinement using the rigid bodies downloaded from RIBFIND server (with the set that contains the maximal number of rigid bodies – in this case 2). Click on move the cutoff value to 11 and then “Download rigid body file”.

Call the file rigid_RF.txt not to confuse it with rigid.txt. The rigid bodies should look like this:

#RIBFIND_clusters: 6.5 14.0 2 8 10 2.8

13 24 202 212 2 6 28 30 81 84 105 110 193 197 161 185 7 12 25 27 186 192 198 201

31 41 44 54 61 73 90 95 42 43 55 60

#individual_rigid_bodies 2

113 119

123 126 132 134 153 154

You can select the rigid bodies in chimera in order to visualise them on mdl1.pdb by typing the following in the Chimera command line:

Color red #1:13-24,202-212,2-6,28-30,81-84,105-110,193-197,161-185,7-12,25-27,186-192,198-201

Press Enter. Then type:

color blue #1:31-41,44-54,61-73,90-95,42-43,55-60

and press Enter again.

The two large rigid bodies should be coloured red and blue respectively.

In the new input file (flex-em2.py) edit the following lines:

init_dir = 2

num_of_iter = 1

rigid_filename = 'rigid_RF.txt'

Re-run Flex-EM by typing

ccpem-python flex-em2.py > flex-em2.log &

Once this run is finished open the refined structure (md2_1.pdb) in Chimera. Now, calculate the Cα RMSD of your refined model from the native structure. Has the model improved? Report the RMSD value.

To further improve the results use this model as an initial model for further refinement using the rigid bodies as in the original rigid.txt.

Part 2: Model assessment with TEMPy

TEMPy is a toolkit for processing atomic models & maps and assessing fitted models in maps. Currently the main functionalities incorporated in TEMPy are assessment/improvement of fits by generating an ensemble of poses and using multiple functions to score fits locally and globally, in addition to map and model processing routines. TEMPy is available from .

A few lines of python code (as a script file) are usually required for carrying out these tasks. To make full use of TEMPy one needs basic python programming skills but different example scripts included in the package cover basic use cases and provide starting points for advanced tasks.

In this part of the tutorial, we will use scoring with the Segment-based Cross Correlation Coefficient (SCCC) implemented in TEMPy to assess the fitted models generated with Flex-EM (in Part 1). SCCC is a local cross correlation score calculated on user-defined segments in a protein chain. The segments that were allowed to move during flexible refinement can be scored through the refinement process to check the improvement. Density of each segment is simulated to the resolution of the target map and cross correlation is calculated on the segment density (generated using a mask).

The files to be used are in under tempy directory:

cd ../tempy

(You can also copy your own files resulting from the flex-em run into this folder).

The example script get_sccc.py can be used to get the SCCC scores for each segment mentioned in the corresponding rigid body file (from Part 1). The script file can be opened in a text editor to see the comments associated with different steps in SCCC score calculation with TEMPy.

The script has to be run as:

ccpem-python get_sccc.py –m map_file –p pdb_file –r resolution –rf rigid_body_file

- map_file corresponds to the map we use for scoring (here 1ake_10A.mrc)

- pdb_file corresponds to the fit we want to score:

For example: 1akeA.pdb - native structure; mdl1.pdb  - initial fit; final1_mdcg.pdb - final after SS refinement (from flex-em1.py in the flex-em/1_md/ folder)

- resolution – the resolution of the map (here: 10)

- rigid_body_file – the corresponding rigid bodies we want to score (here: rigid.txt, rigid_RF.txt)

Once the program is running the scores are printed on the terminal and a Chimera attribute file (filename_attribute.txt), which can be used to colour the model based on the SCCC scores, is generated. For example, the attribute file corresponding to model mdl1.pdb is mdl1.pdb_attribute.txt. The attribute file has the score corresponding to each residue in the segment.

Open the model in Chimera. As we are not scoring/colouring the fit of loops here, the whole model may be first coloured white (using the command color white #X where X is the model number).

Use the option:

Tools / Structure Analysis / Define Attribute

to assign the attribute file (e.g. mdl1.pdb_attribute.txt) for the model and colour based on the SCCC scores (note the restrict to model dialog). A new window Render/ Select by Attribute opens in Chimera where this can be done. Keep the attribute of residues, and press Apply. A single model should be coloured by opening the corresponding attribute file, at a time.

Note that the range of score values used to colour segments has to be consistent when you compare multiple models. The lower and upper bounds can be fixed by clicking on the red and blue sliders and entering the corresponding value in Render/Select by attribute window.

See for different ways to assign attributes to models.

Part 3: High-resolution refinement

In this part we refine the homology model of e-coli adenylate kinase (as in part 5) in its map at 4.5 Å resolution. The map was generated from adenylate kinase crystal structure (1AKE:A) with Chimera molmap command, using sigmaFactor=0.225.

Go to the directory flex-em_highres:

cd ../flex-em_highres

Now start Chimera and open two files: a density file: 1ake_molmap45_pad10_cubic.mrc and a structure file: mdl1.pdb.

You can see in the Volume Viewer dialog that the size of the map is 503 voxels where each voxel size is 1.5 Å/pixel. The resolution of this map is 4.5 Å. The origin index should be 7.996 -5.254 7.2733. You can set the threshold level to 0.09 (The estimate uses 1.21 Å3 per dalton and molecular weight of ~24 kDa)

As in part 1, from visual inspection you can see that the fit can be improved. For example, colour residues 31-74 by typing the command

color white #1:31-74

in the entry field at the bottom of the Chimera Graphics Window and press Enter. You can see that the residues in white lie outside the density.

[pic]

To improve the fit we will run Flex-EM. This is done by editing the file rigid.txt.

i. Stage 1 (SSE refinement)

In the current application of Flex-EM we will employ 2 iterations of simulated annealing molecular dynamics (MD) optimisation. We first have to edit the control file flex-em1.py. Edit the following parameters below INPUT PARAMETERS:

1. Define the mode of optimization:

optimization = 'MD'.

2. Set the input parameters of the atomic structure that you want to fit:

input_pdb_file=’mdl1.pdb’

3. Edit the EM map parameters (note that the origin is in Å):

em_map_file = '1ake_molmap45_pad10_cubic.mrc' # name of EM density map (mrc)

format='MRC' # map format: MRC or XPLOR

apix=1.5 # voxel size: A/pixel

box_size=50 # size of the density map (cubic)

resolution=4.5 # resolution (A)

x=-11.994; y=7.881; z=-10.910 # origin of the map (A) (Chimera origin* (-1.5))

4. Specify the directory in which you want the results to be found:

init_dir = 1

This will produce the results in a directory called 1_md.

5. Specify the number of simulated annealing iterations:

num_of_iter = 2

6. Change the maximum atom displacement for each MD step:

cap_shift = 0.15

With resolutions better than ~5 Å for large rigid bodies (domains or those from RIBFIND ( ) use the default value of 0.39. When secondary structures are restrained, use a lower value (e.g. 0.15) and for all-atom refinement reduce it further (e.g. 0.10). Lower atom shifts help to avoid distortions and maintain the geometry in the final stages of refinement.

Re-run Flex-EM by typing

ccpem-python flex-em1.py > flex-em1.log &

To visualise the result of the first stage of refinement open 1_md/md1_2.pdb in Chimera and compare it (visually) with mdl1.pdb.

ii. Stage 2 (all-atom refinement)

Flex-EM run can be repeated for a final ALL-ATOM refinement without any rigid body restraints.

Copy 1_md/md1_2.pdb to mdl_ss2.pdb (this is the final model following SSE refinement):

cp 1_md/md1_2.pdb mdl_ss2.pdb

In flex-em2.py edit the following parameters:

1. input_pdb_file=’mdl_ss2.pdb’

2. Change to cap_shift = 0.1 (to limit large atom displacements).

3. Specify the number of iterations: num_of_iter = 1

4. Change init_dir = 2 and close flex-em.py

5. Provide the rigid body filename: rigid_filename = 'rigid0.txt'

6. Since we do not use any rigid bodies at this stage, create an empty rigid0.txt file by creating a new empty file (you can do that with gedit).

Re-run the program with flex-em2.py. On completion, the program also generates a file final2_mdcg.pdb containing the final structure that has gone through a final CG refinement step. Copy 2_md/final2_mdcg.pdb to mdl_ss2_all1.pdb by:

cp 2_md/final2_mdcg.pdb mdl_ss2_all1.pdb

iii. Assessment of accuracy:

To visualise the initial model (mdl1.pdb model #1 in Chimera) and the final model open mdl_ss2_all1.pdb in Chimera (model #2). It is clear that several secondary structure elements have moved towards the density.

Open the original structure from which the map was simulated (1ake.pdb) in Chimera and change it into ribbon representation (model #3). To calculate the all-atom and Cα RMSDs between the initial model and the correct structure type the command. For all-atom RMSD, use:

rmsd #1:1-213 #3:1-213

For Cα RMSD:

rmsd #1:1-213@ca #3:1-213@ca

in the Chimera command line.

You can see that the Cα RMSD (shown in the bottom left of Chimera Graphics Window) is ~4.5 Å. Now check the RMSDs to the native structure and your refined model by typing:

rmsd #2:1-213 #3:1-213

rmsd #2:1-213@ca #3:1-213@ca

Using the density information, the Cα RMSD of the model from the native structure has been reduced from ~4.5 Å to ~1.3 Å (and to 2.1 Å all-atom RMSD).

iv. Model validation by local scoring of models (using TEMPy)

Another metric for local assessment – segment overlap coefficient score (SMOC) – is also available in TEMPy to evaluate the quality of local fit on sliding residue windows along the protein chain (Joseph et al. 2016). The segment overlap score is calculated over the voxels occupied by each overlapping model segment (window) along the chain and assigned to the central residue in the segment. Hence each residue gets a local goodness-of-fit score and this helps to plot the local score profiles of the initial and refined models and compare them.

For this specific set of generated output files, the script can be run as

ccpem-python score_smoc.py

For general purposes, this script can be run otherwise as:

ccpem-python score_smoc.py –m mapfile –p pdbfile [or –pdir directory with pdb files] –r resolution

For scoring and plotting multiple pdb files, use:

ccpem-python score_smoc.py –m mapfile –pdir dir_pdb –r resolution

where dir_pdb is the directory with pdb files. For pdb files in the current directory, use ‘.’ in place of pdir.

The plot generated can be saved (in the bottom panel) and closed for terminating the run.

The SMOC profile plot is generated, where the relative improvement in the local fit quality across different refinement steps is highlighted. Regions of poor fit can be identified.

[pic]

References:

Chimera: Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 25:1605-12, 2004.

MODELLER: N. Eswar, M. A. Marti-Renom, B. Webb, M. S. Madhusudhan, D. Eramian, M. Shen, U. Pieper, A. Sali. Comparative Protein Structure Modeling With MODELLER. Current Protocols in Bioinformatics, John Wiley & Sons, Inc., Supplement 15, 5.6.1-5.6.30, 2006.

Flex-EM: Topf M, Lasker K, Webb B, Wolfson H, Chiu W, Sali A. Protein structure fitting and refinement guided by CryoEM density. Structure, 16:295-307, 2008.

RIBFIND: Pandurangan AP, Topf M. Finding rigid bodies in protein structures: Application to flexible fitting into cryoEM maps. J Struct Biol. 177:520-531, 2012.

TEMPy: Farabella, I., Vasishtan, D., Joseph, AP, Pandurangan, AP, Sahota, H., and Topf, M. TEMPy : a Python library for assessment of three-dimensional electron microscopy density fits. J. Appl. Crystallogr. 48, 2015.

Flex-EM/RIBFIND/TEMPy (and applications):

Pandurangan AP, Shakeel S, Butcher SJ, Topf M. Combined approaches to flexible fitting and assessment in virus capsids undergoing conformational change.  J Struct Biol, 185:427–439, 2014.

Joseph AP, Malhotra S, Burnley T, Wood C, Clare DK, Winn M, Topf M. Refinement of atomic models in high resolution EM reconstructions using Flex-EM and local assessment.  Methods, 100:42-49, 2016.

Zeev-Ben-Mordehai T, Vasishtan D, Hernandez A, Vollmer B, White P, Pandurangan AP, Siebert A, Topf M, Grünewald K. Proc Natl Acad Sci USA. Two distinct trimeric conformations of natively membrane-anchored full-length Herpes simplex virus 1 glycoprotein B. 113:4176-418, 2016.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download