Convergence of Metadynamics simulation

Convergence of metadynamics or any enhanced sampling calculations plays an important role in deciding how much you can trust the outcomes from metadynamics simulations. here is a step by step tutorial and codes on how to check convergence in metadynamics simulations.

I ran two metadynamics simulations with DISTANCE CVs using Plumed 1.3 patched with Gromacs 4.6.2. As an output I got HILLS file whose first part looks like following

 2.000 0.312700063 0.676572204 0.050000001 0.050000001 0.555555556 10.000 
 4.000 0.304164201 0.674881995 0.050000001 0.050000001 0.543420527 10.000 
 6.000 0.343289763 0.649008214 0.050000001 0.050000001 0.539083481 10.000 
 8.000 0.297886729 0.662928343 0.050000001 0.050000001 0.525467237 10.000 
 10.000 0.291971087 0.575445116 0.050000001 0.050000001 0.547548082 10.000 
 12.000 0.343111038 0.602572322 0.050000001 0.050000001 0.531554010 10.000 
 14.000 0.285310268 0.667471945 0.050000001 0.050000001 0.513465246 10.000 
 16.000 0.319890589 0.655008733 0.050000001 0.050000001 0.497615549 10.000 
 18.000 0.389135718 0.529440880 0.050000001 0.050000001 0.550858286 10.000 
 20.000 0.286446780 0.555136204 0.050000001 0.050000001 0.533876313 10.000

DISTANCE CVs used in the calculation

PRINT W_STRIDE 50
HILLS RESTART W_STRIDE 1000 HEIGHT 0.5
WELLTEMPERED SIMTEMP 298 BIASFACTOR 10
DISTANCE LIST 1143 1221 SIGMA 0.05 
DISTANCE LIST 1143 520 SIGMA 0.05 
DISTANCE LIST 513 1153

UWALL CV 1 LIMIT 0.45 KAPPA 5000.0 EXP 4.0
LWALL CV 1 LIMIT 0.25 KAPPA 5000.0 EXP 4.0

UWALL CV 2 LIMIT 0.65 KAPPA 5000.0 EXP 4.0
LWALL CV 2 LIMIT 0.25 KAPPA 5000.0 EXP 4.0

UWALL CV 3 LIMIT 1.50 KAPPA 5000.0 EXP 4.0

ENDMETA

I also post processed all my trajectories to generate a file named trajfit.xtc using the following script

#Combining the .xtc files
/home/per/lun/gromacs-4.6.2/build/src/tools/trjcat_mpi -f traj.part00*.xtc -o trajall.xtc
#PBC fixed, echo 1 signifies only protein as output, 1was the indx number
echo 1 | /home/per/lun/gromacs-4.6.2/build/src/tools/trjconv_mpi -f trajall.xtc -pbc nojump -o trajnoj.xtc -s md.tpr 
# Aligning on CA and protein as output
echo 3 1 | /home/per/lun/gromacs-4.6.2/build/src/tools/trjconv_mpi -s md.tpr -f trajnoj.xtc -fit rot+trans -o trajfit.xtc

I used plumed driver on trajfit.xtc file to generate a COLVAR file with 1 ps time step. The plumed.dat file used with the driver looks like following

d1: DISTANCE ATOMS=1143,1221
d2: DISTANCE ATOMS=1143,520

PRINT ARG=d1,d2 STRIDE=1 FILE=COLVAR

I issued the driver command on the trajfit.xtc file

plumed driver --mf_xtc trajfit.xtc --plumed plumed.dat

I got a COLVAR file as output and removed the header as well as removed the value corresponds to 0.00 ps. The next step was to match the end part of COLVAR and HILLS files.  

Now I dumped fes files corresponds to each ns using the following command

~/plumed-1.3/utilities/WT_reweight/reweight -colvar COLVAR -hills HILLS -ncv 2 -nvar 2 -stride 2 -fes 1 -temp 300 -welltemp -timeout 1000 -kjoule

So the one dimensional reweighted free energy surface looks like following

fes1W50_1d

Figure 1. 1D FES reweighted on CV1.

The FES above shows two basins, basin 1: averaged over 0.27-0.33 nm and basin 2: averaged over 0.62-0.74 nm.

We created a script which is named calcfediff

#Usage: calcfediff fes.dat a b c d
#Calculates difference between the two regions a-b and c-d.

#This is the exponentially averaged energy
grep . $1 | awk 'BEGIN{s1='$2'; s2='$3'; s3='$4'; s4='$5'; kt=2.5}{e=$2; c=exp(-e/kt); if ($1>s1 && $1<=s2) {q1+=c} if($1>s3 && $1<s4) {q2+=c} }END{print -kt*log(q1/q2)}'
#grep . $1 | awk 'BEGIN{s1='$2'; s2='$3'; s3='$4'; s4='$5'; kt=2.5}{e=$2; c=exp(-e/kt); if ($1>s1 && $1<=s2) {q1+=c} if($1>s3 && $1<s4) {q2+=c} }END{print -kt*log(q1/q2), q1, q2}'

We incorporated calcfediff inside a .sh file by following away

shopt -s nullglob
rm conv2.dat
for a in fes.dat.? fes.dat.?? fes.dat.???; do
echo -n ${a#fes.dat.} " " >>conv2.dat
calcfediff $a 0.27 0.33 0.62 0.74 >>conv2.dat
done

This gives a plot of Free Energy as a function of time which looks something like following

conv

Figure 2. Free Energy as a function of time

We can say from Figure 2 that free energy value is converged to a certain value after ~300 ns. 

Another important part to see if the simulation is converged is to see if there is enough recrossings in CVs. Figure 3 shows the fluctuation of CVs during our metadynamics simulations. As our CVs are hydrogen bond distances we can definitely say from the fluctuation that we have enough alternative fluctuations between CV1 and CV2.

recrossing

Figure 3. Fluctuation of CV1 and CV2 in our metadynamics simulation.

 

However the free energy profiles showed in Figure 1 is not reweighted and corrected for the wall bias. We need to reweight it properly and apply the correction for the applied wall potential. Keep an eye on the upcoming post where I will explain how to reweight and correct for wall potential.

Parameter for zinc centre proteins

Parametrization of metallo-proteins always been a challenging problem during the set-up of classical molecular dynamics simulation. This tutorial will describe a short-cut way to set up and Zinc (Zn2+) co-ordinated metallo-protein where Zinc stays in co-ordination with two histidines (HID) and one glutamate as described in the following figure (Figure 1). Note: All the waters are explicitly treated.

zn_coordination

Figure 1. Zn co-ordinated with two histidines and one glutamate.

The parameters for this Zn and active site residues can be found here. The work has been carried out by Yang et al. See their paper here.

Step by step guide on how to execute this:

Pre-requisites:

  1. tleap
  2. Acpype to convert Amber topology and coordinates to Gromacs compatible versions.
  3. H++ server.
  4. Downloaded files in a particular folder

 

1. Upload your PDB in H++ server and set the protonation state (for example pH 7.0).

2. Download the output PDB (Make sure you modeled all non-terminal missing residues using Modeler or using any other packages).

3. Now rename two histidines that are bound to ZINC as HID instead of HIS and visualize the structure to see if this looks logical or not?

4. Inside M1_chg2.prep file you will see the histidines are named as HD1 and HD2 whereas glutamate is named as GL1. Edit the names in your PDB files. See below.

ATOM 2392 N HD1 301 -18.445 5.436 9.635 1.00 0.00 N
ATOM 2393 CA HD1 301 -18.898 4.830 10.883 1.00 0.00 C
ATOM 2394 CB HD1 301 -20.360 4.381 10.785 1.00 0.00 C
ATOM 2395 CG HD1 301 -20.924 3.935 12.100 1.00 0.00 C
ATOM 2396 ND1 HD1 301 -21.549 4.797 12.971 1.00 0.00 N
ATOM 2397 CE1 HD1 301 -21.908 4.137 14.063 1.00 0.00 C
ATOM 2398 NE2 HD1 301 -21.523 2.880 13.931 1.00 0.00 N
ATOM 2399 CD2 HD1 301 -20.904 2.727 12.714 1.00 0.00 C
ATOM 2400 C HD1 301 -18.018 3.660 11.319 1.00 0.00 C
ATOM 2401 O HD1 301 -17.449 3.688 12.388 1.00 0.00 O

ATOM 2424 N HD2 305 -15.025 3.292 14.546 1.00 0.00 N
ATOM 2425 CA HD2 305 -14.974 2.206 15.529 1.00 0.00 C
ATOM 2426 CB HD2 305 -15.786 0.998 15.064 1.00 0.00 C
ATOM 2427 CG HD2 305 -17.207 1.024 15.511 1.00 0.00 C
ATOM 2428 ND1 HD2 305 -17.561 1.168 16.834 1.00 0.00 N
ATOM 2429 CE1 HD2 305 -18.884 1.160 16.932 1.00 0.00 C
ATOM 2430 NE2 HD2 305 -19.390 0.999 15.723 1.00 0.00 N
ATOM 2431 CD2 HD2 305 -18.367 0.901 14.814 1.00 0.00 C
ATOM 2432 C HD2 305 -13.542 1.760 15.815 1.00 0.00 C
ATOM 2433 O HD2 305 -13.223 1.284 16.920 1.00 0.00 O

ATOM 2587 N GL1 324 -21.029 3.090 20.760 1.00 0.00 N
ATOM 2588 CA GL1 324 -21.816 3.640 19.651 1.00 0.00 C
ATOM 2589 CB GL1 324 -23.171 2.938 19.536 1.00 0.00 C
ATOM 2590 CG GL1 324 -23.080 1.441 19.199 1.00 0.00 C
ATOM 2591 CD GL1 324 -22.408 1.155 17.857 1.00 0.00 C
ATOM 2592 OE1 GL1 324 -22.275 2.081 17.022 1.00 0.00 O
ATOM 2593 OE2 GL1 324 -22.020 -0.009 17.626 1.00 0.00 O
ATOM 2594 C GL1 324 -22.002 5.163 19.683 1.00 0.00 C
ATOM 2595 O GL1 324 -22.066 5.793 18.636 1.00 0.00 O

5. Rename the rest of histidines as either HIP, HIE or HID based on the following

HID: Histidine with hydrogen on the delta nitrogen

HIE: Histidine with hydrogen on the epsilon nitrogen

HIP: Histidine with hydrogens on both nitrogens; this is positively charged.

6. Remove all the hydrogens from the amino acids.

7. Make sure the Zn is separated from the amino acids using a TER. Something like following. Renamed the ZN as ZN1 as in M1_chg2.prep file it is named as ZN1. Rest of the ZINC should be ZN not Zn.

ATOM 5146 CB  LEU 632 -45.178 -16.254 12.289 1.00 0.00 C
ATOM 5147 CG  LEU 632 -45.901 -16.066 10.953 1.00 0.00 C
ATOM 5148 CD1 LEU 632 -44.934 -16.199 9.786 1.00 0.00 C
ATOM 5149 CD2 LEU 632 -47.046 -17.057 10.822 1.00 0.00 C
ATOM 5150 C   LEU 632 -45.155 -16.312 14.792 1.00 0.00 C
ATOM 5151 O   LEU 632 -44.462 -15.489 15.391 1.00 0.00 O
ATOM 5152 OXT LEU 632 -44.797 -17.516 14.897 1.00 0.00 O
TER
ATOM    1  ZN ZN1 633 -21.374 1.098 15.223 1.00 0.00 ZN
END

8. Run the tleap. See the tleap input below

source leaprc.ff14SB
loadamberparams frcmod.ionsjc_tip3p
loadamberprep ~/Dropbox/LTA4H_collab/M1_chg2.prep
loadamberparams ~/Dropbox/LTA4H_collab/M1.frcmod
loadamberparams ~/Dropbox/LTA4H_collab/metals.frcmod
x = loadPDB test.pdb
savepdb x pdbout
bond x.301.NE2 x.633.ZN
bond x.305.NE2 x.633.ZN
bond x.324.OE1 x.633.ZN
check x
saveAmberParm x prot.top prot.crd
charge x
addIons2 x Na+ 0
solvateOct x TIP3PBOX 10.0
saveAmberParm x prot_solvated.top prot_solvated.crd
savepdb x prot_solvated.pdb
quit

Use the following command to run tleap

tleap -s -f tleap.all

If you get any error please check the leap.log and post it as comments.

9. You will get AMBER generated topology and coordinates as outputs.

10. Use acpype to convert them to Gromacs compatible version. See here for the usage of acpype.