Reweighting Metadynamics

In my last post I described how to check convergence of metadynamics simulation. However, the 1D free energy surface showed in the previous post is not reweighted and corrected properly because I had both upper and lower wall in my metadynamics simulation. Hence we have to correct the free energy surface for the walls and we can do it using Plumed-2.4.1. It is important to mention that I ran my simulations with Plumed-1.3 and have to do some adjustments to get a 1ps sparsed COLVAR file whose header looks like following

 

#! FIELDS time d1 d2 d3 metad.rbias uwall.bias    
    1.0000      0.297346503      0.718053460      1.629577041    0.000000000     1.516799331 
    2.0000      0.312700063      0.676572204      1.619366407    0.500000000     1.017568588 
    3.0000      0.337963104      0.671517015      1.576508045    0.437842011     0.172387958 
    4.0000      0.304164201      0.674881995      1.545186877    0.981566668     0.022762353 
    5.0000      0.281478077      0.678171694      1.506772399    0.851512551     0.003159886 
    6.0000      0.343289763      0.649008214      1.564545155    1.156349301     0.086780988 
    7.0000      0.314589500      0.690125763      1.542209387    1.231884360     0.028832838 
    8.0000      0.297886729      0.662928343      1.551209688    1.714575529     0.034525435 
    9.0000      0.293216735      0.665717304      1.541974306    1.670215130     0.015825573 

 

Worth mentioning that the original COLVAR output from Plumed 1.3 looks like following

 

#! FIELDS time cv1 cv2 cv3 vbias vwall
    0.0000      0.331873775      0.679191828      1.648215294    0.000000000     2.416546345 
    0.1000      0.305706650      0.693865001      1.633054614    0.000000000     1.585586548 
    0.2000      0.289182395      0.736790895      1.625117898    0.000000000     1.509020209 
    0.3000      0.306852549      0.739897966      1.635408998    0.000000000     2.007535696 
    0.4000      0.289105147      0.741805196      1.605738878    0.000000000     0.980214059 
    0.5000      0.314303517      0.736100793      1.611924648    0.000000000     1.059433341 
    0.6000      0.316665620      0.693137527      1.657169104    0.000000000     3.068289280 
    0.7000      0.313074738      0.689189851      1.670769334    0.000000000     4.263953686 
    0.8000      0.281177700      0.716560900      1.678995728    0.000000000     5.230778694

I edited cv1, cv2, cv3 to d1, d2 and d3 and changed vbias to metad.rbias and vwall to uwall.bias just for naming convenience.

vwall in the original COLVAR defines the external wall potential as a combination of both upper and lower wall. 

Note: You have to delete the values corresponds to 0.000 ps from the final COLVAR file. 

The method is well described here.

See the following script which produces 1D reweighted free energy surface on CV1 which is d1 in the new COLVAR 1ps sparsed file. You will get two outputs fes.dat and fes_wall.dat and try to see how different the wall corrected (fes_wall.dat) and with wall free energy surfaces (fes.dat) look like. It will look something like following

fes1W50_wall.png

Figure 1. 1D reweighted free energy surface on CV1.

I used the following script

 

# Read COLVAR file
distance:    READ FILE=COLVAR_sparse  IGNORE_TIME VALUES=d1
metad:       READ FILE=COLVAR_sparse IGNORE_TIME VALUES=metad.rbias
uwall:       READ FILE=COLVAR_sparse IGNORE_TIME VALUES=uwall.bias

# Define weights
weights1: REWEIGHT_METAD TEMP=300 ARG=metad.rbias
weights2: REWEIGHT_BIAS TEMP=300 ARG=metad.rbias,uwall.bias

# Calculate histograms
HISTOGRAM ...
  ARG=distance
  GRID_MIN=0.0
  GRID_MAX=1.0
  GRID_BIN=100
  BANDWIDTH=0.002
  LOGWEIGHTS=weights1
  LABEL=hh1
... HISTOGRAM

HISTOGRAM ...
  ARG=distance
  GRID_MIN=0.0
  GRID_MAX=1.0
  GRID_BIN=100
  BANDWIDTH=0.002
  LOGWEIGHTS=weights2
  LABEL=hh2
... HISTOGRAM

# Print histograms to file
ff1: CONVERT_TO_FES GRID=hh1 TEMP=300
DUMPGRID GRID=ff1 FILE=fes.dat
DUMPGRID GRID=hh1 FILE=histo FMT=%24.16e
ff2: CONVERT_TO_FES GRID=hh2 TEMP=300
DUMPGRID GRID=ff2 FILE=fes_wall.dat
DUMPGRID GRID=hh2 FILE=histo_wall FMT=%24.16e

Ofcourse we can reweight the free energy surface on both CV1 and CV2 using the following script

# Read COLVAR file

distance1:   READ FILE=COLVAR_sparse  IGNORE_TIME VALUES=d1
distance2:   READ FILE=COLVAR_sparse  IGNORE_TIME VALUES=d2
metad:       READ FILE=COLVAR_sparse IGNORE_TIME VALUES=metad.rbias
uwall:       READ FILE=COLVAR_sparse IGNORE_TIME VALUES=uwall.bias

# Define weights

weights1: REWEIGHT_METAD TEMP=300 ARG=metad.rbias
weights2: REWEIGHT_BIAS TEMP=300 ARG=metad.rbias,uwall.bias

# Calculate histograms

HISTOGRAM ...
  ARG=distance1,distance2
  GRID_MIN=0.0,0.0
  GRID_MAX=1.0,1.0
  GRID_BIN=200,200
  BANDWIDTH=0.05,0.05
  LOGWEIGHTS=weights1
  LABEL=hh1
... HISTOGRAM

HISTOGRAM ...

  ARG=distance1,distance2
  GRID_MIN=0.0,0.0
  GRID_MAX=1.0,1.0
  GRID_BIN=200,200
  BANDWIDTH=0.05,0.05
  LOGWEIGHTS=weights2
  LABEL=hh2
... HISTOGRAM

# Print histograms to file

ff1: CONVERT_TO_FES GRID=hh1 TEMP=300
DUMPGRID GRID=ff1 FILE=fes.dat
DUMPGRID GRID=hh1 FILE=histo FMT=%24.16e
ff2: CONVERT_TO_FES GRID=hh2 TEMP=300
DUMPGRID GRID=ff2 FILE=fes_wall.dat
DUMPGRID GRID=hh2 FILE=histo_wall FMT=%24.16e

Here I plotted fes.dat again which is two dimensional free energy surface reweighted on CV1 and CV2. You can plot fes.dat and fes_wall.dat in a 2D plot to see what was the effect of wall. Here I plotted fes_wall.dat which is wall corrected 2D FES.

fes2d_wall_colornew

 Figure 2. 2D reweighted free energy surface on CV1 and CV2.

Note: Sorry for bad quality rendering (email me for better picture)

Reweighting algorithm described here

Pratyush Tiwary and Michele Parrinello. A time-independent free energy estimator for metadynamics. The Journal of Physical Chemistry B, 119(3):736–742, 2015. PMID: 25046020.

Manual of how to use reweighting metadynamics described here.

 

 

 

 

Convergence of Metadynamics simulation

Convergence of metadynamics or any enhanced sampling calculations plays an important role in deciding how much you can trust the outcomes from metadynamics simulations. here is a step by step tutorial and codes on how to check convergence in metadynamics simulations.

I ran two metadynamics simulations with DISTANCE CVs using Plumed 1.3 patched with Gromacs 4.6.2. As an output I got HILLS file whose first part looks like following

 2.000 0.312700063 0.676572204 0.050000001 0.050000001 0.555555556 10.000 
 4.000 0.304164201 0.674881995 0.050000001 0.050000001 0.543420527 10.000 
 6.000 0.343289763 0.649008214 0.050000001 0.050000001 0.539083481 10.000 
 8.000 0.297886729 0.662928343 0.050000001 0.050000001 0.525467237 10.000 
 10.000 0.291971087 0.575445116 0.050000001 0.050000001 0.547548082 10.000 
 12.000 0.343111038 0.602572322 0.050000001 0.050000001 0.531554010 10.000 
 14.000 0.285310268 0.667471945 0.050000001 0.050000001 0.513465246 10.000 
 16.000 0.319890589 0.655008733 0.050000001 0.050000001 0.497615549 10.000 
 18.000 0.389135718 0.529440880 0.050000001 0.050000001 0.550858286 10.000 
 20.000 0.286446780 0.555136204 0.050000001 0.050000001 0.533876313 10.000

DISTANCE CVs used in the calculation

PRINT W_STRIDE 50
HILLS RESTART W_STRIDE 1000 HEIGHT 0.5
WELLTEMPERED SIMTEMP 298 BIASFACTOR 10
DISTANCE LIST 1143 1221 SIGMA 0.05 
DISTANCE LIST 1143 520 SIGMA 0.05 
DISTANCE LIST 513 1153

UWALL CV 1 LIMIT 0.45 KAPPA 5000.0 EXP 4.0
LWALL CV 1 LIMIT 0.25 KAPPA 5000.0 EXP 4.0

UWALL CV 2 LIMIT 0.65 KAPPA 5000.0 EXP 4.0
LWALL CV 2 LIMIT 0.25 KAPPA 5000.0 EXP 4.0

UWALL CV 3 LIMIT 1.50 KAPPA 5000.0 EXP 4.0

ENDMETA

I also post processed all my trajectories to generate a file named trajfit.xtc using the following script

#Combining the .xtc files
/home/per/lun/gromacs-4.6.2/build/src/tools/trjcat_mpi -f traj.part00*.xtc -o trajall.xtc
#PBC fixed, echo 1 signifies only protein as output, 1was the indx number
echo 1 | /home/per/lun/gromacs-4.6.2/build/src/tools/trjconv_mpi -f trajall.xtc -pbc nojump -o trajnoj.xtc -s md.tpr 
# Aligning on CA and protein as output
echo 3 1 | /home/per/lun/gromacs-4.6.2/build/src/tools/trjconv_mpi -s md.tpr -f trajnoj.xtc -fit rot+trans -o trajfit.xtc

I used plumed driver on trajfit.xtc file to generate a COLVAR file with 1 ps time step. The plumed.dat file used with the driver looks like following

d1: DISTANCE ATOMS=1143,1221
d2: DISTANCE ATOMS=1143,520

PRINT ARG=d1,d2 STRIDE=1 FILE=COLVAR

I issued the driver command on the trajfit.xtc file

plumed driver --mf_xtc trajfit.xtc --plumed plumed.dat

I got a COLVAR file as output and removed the header as well as removed the value corresponds to 0.00 ps. The next step was to match the end part of COLVAR and HILLS files.  

Now I dumped fes files corresponds to each ns using the following command

~/plumed-1.3/utilities/WT_reweight/reweight -colvar COLVAR -hills HILLS -ncv 2 -nvar 2 -stride 2 -fes 1 -temp 300 -welltemp -timeout 1000 -kjoule

So the one dimensional reweighted free energy surface looks like following

fes1W50_1d

Figure 1. 1D FES reweighted on CV1.

The FES above shows two basins, basin 1: averaged over 0.27-0.33 nm and basin 2: averaged over 0.62-0.74 nm.

We created a script which is named calcfediff

#Usage: calcfediff fes.dat a b c d
#Calculates difference between the two regions a-b and c-d.

#This is the exponentially averaged energy
grep . $1 | awk 'BEGIN{s1='$2'; s2='$3'; s3='$4'; s4='$5'; kt=2.5}{e=$2; c=exp(-e/kt); if ($1>s1 && $1<=s2) {q1+=c} if($1>s3 && $1<s4) {q2+=c} }END{print -kt*log(q1/q2)}'
#grep . $1 | awk 'BEGIN{s1='$2'; s2='$3'; s3='$4'; s4='$5'; kt=2.5}{e=$2; c=exp(-e/kt); if ($1>s1 && $1<=s2) {q1+=c} if($1>s3 && $1<s4) {q2+=c} }END{print -kt*log(q1/q2), q1, q2}'

We incorporated calcfediff inside a .sh file by following away

shopt -s nullglob
rm conv2.dat
for a in fes.dat.? fes.dat.?? fes.dat.???; do
echo -n ${a#fes.dat.} " " >>conv2.dat
calcfediff $a 0.27 0.33 0.62 0.74 >>conv2.dat
done

This gives a plot of Free Energy as a function of time which looks something like following

conv

Figure 2. Free Energy as a function of time

We can say from Figure 2 that free energy value is converged to a certain value after ~300 ns. 

Another important part to see if the simulation is converged is to see if there is enough recrossings in CVs. Figure 3 shows the fluctuation of CVs during our metadynamics simulations. As our CVs are hydrogen bond distances we can definitely say from the fluctuation that we have enough alternative fluctuations between CV1 and CV2.

recrossing

Figure 3. Fluctuation of CV1 and CV2 in our metadynamics simulation.

 

However the free energy profiles showed in Figure 1 is not reweighted and corrected for the wall bias. We need to reweight it properly and apply the correction for the applied wall potential. Keep an eye on the upcoming post where I will explain how to reweight and correct for wall potential.

tICA Metadynamics

Time-lagged independent component analysis is a dimensionality reduction technique which transforms a high dimensional dataset to lower dimensional co-ordinates. It’s philosophy is kind of similar to PCA (principal component analysis). 

For more details check here.

It is a very recently developed technique which has a lot of promise. tICA can separate slow degrees of freedom which can be used as an input in metadynamics or any other enhanced sampling techniques for better sampling of slow transitions.

Note: Google all the recent papers which used tICA technique and you will find this technique pretty interesting. However, it needs to be tested in large scale systems to actually say something about its advantage over PCA.

This is my own tutorial where I will show how to generate tICA from a simulation and render the tICA co-ordinates as Plumed input file.

You will need following softwares before we start the tutorial.

  1. Python 3.4 or higher
  2. Jupyter Notebook
  3. MSM Builder 3.8 or higher
  4. MSM explorer
  5. MDTraj 1.9.0

Here we go:

Download the trajectory and topology (in this case PDB file) from here

Make sure the path of input is correct. Open a new Python 3 terminal in Jupyter Notebook

#Importing all the modules
#msmbuilder imports 
from msmbuilder.dataset import dataset
from msmbuilder.featurizer import ContactFeaturizer
from msmbuilder.decomposition import tICA
from msmbuilder.cluster import MiniBatchKMeans
from msmbuilder.msm import ContinuousTimeMSM
from msmbuilder.utils import verbosedump,verboseload

#other imports
import os,glob
import numpy as np
import mdtraj as md
import pandas as pd 
import pickle

#prettier plots
import seaborn as sns
%pylab inline
matplotlib.rcParams['xtick.labelsize'] = 14 
matplotlib.rcParams['ytick.labelsize'] = 14 
matplotlib.rcParams['font.family'] = "sans-serif"
#It is time to import the trajectory and the topology files
trajectory_folder_path=("*.xtc")
t = md.load('../tfit.xtc', top='../prot0.pdb')
ds = dataset('../tfit.xtc', topology='../prot0.pdb')

#get some stats about the dataset
print("Number of trajectories is %d"%len(ds))

This will give a output like

Number of trajectories is 1
#define our featurizer. I want to use the distance as a CV
residue_ind = np.array([[35, 79],[215, 293]])
feat = ContactFeaturizer(scheme='ca', contacts= np.array([[35, 79],[215, 293]]))

#transform the data with the featurizer
ds_contact=ds.fit_transform_with(feat, "res_contact/",fmt='dir-npy')

#we can load the features using
ds_contact = dataset("./res_contact/")

#see how many trajectories were retained during featurization
print(len(ds_contact),len(ds))

You  will see an output like following

1 1
#define our tica model
tica_mdl = tICA(lag_time=1000,n_components=2)

#transform the dataset with tica
tica_features = ds_contact.fit_transform_with(tica_mdl, out_ds = 'reg_ticas_1000lt_2tics//')

#load the features
tica_features = dataset("./reg_ticas_1000lt_2tics/")

#for each feature from our contact featurization we now have 2 tICs
tica_features[0].shape

Note: Choose the lag_time carefully. Go through the literatures to know more about lag time and set it according to your suitability.

Output

(6809, 2)
#lets dump the tica mdl for future use
verbosedump(tica_mdl,"tica_mdl.pkl")

Output

Saving "tica_mdl.pkl"... (<class 'msmbuilder.decomposition.tica.tICA'>)

Time to render tICA model in a Plumed like file

import pandas as pd 
import mdtraj as md 
from msmbuilder.utils import load
from msmbuilder.featurizer import ContactFeaturizer
from tica_metadynamics.plumed_writer import render_tica_plumed_file

#python imports
import os,glob
import numpy as np
import pickle
#make sure that you have already fit the model using the notebooks above.
tica_model = load("./tica_mdl.pkl")
top = md.load("../prot0.pdb")
# swap this for whatever you have. The code for now supports contacts, dihedral, and angles. 
feat =ContactFeaturizer([[35,79],[215,293]], scheme='ca')
# this basically maps every feature to atom indices. 
df1 = pd.DataFrame(feat.describe_features(top))
print(df1)

Output

atominds featuregroup featurizer otherinfo resids resnames \
0 [[534], [1203]] ca Contact 20 [35, 79] [ASP, VAL] 
1 [[3336], [4558]] ca Contact 20 [215, 293] [ASH, LEU]

resseqs 
0 [36, 80] 
1 [216, 294]
# to see the first 2 contacts for example 
print(df1.head())

output  = render_tica_plumed_file(tica_mdl= tica_model, df= df1, n_tics= 2, multiple_tics=None)

# output is a dictionary whose integer values are the tICs
# to get the 0th tIC
print(output[0])

# to get the 1st tIC
print(output[1])

Output

RESTART
DISTANCE ATOMS=535,1204 LABEL=ca_35_79 

DISTANCE ATOMS=3337,4559 LABEL=ca_215_293 

MATHEVAL ARG=ca_35_79 FUNC=x-1.53548354803 LABEL=meanfree_None_ca_35_79 PERIODIC=NO 

MATHEVAL ARG=ca_215_293 FUNC=x-1.49968935084 LABEL=meanfree_None_ca_215_293 PERIODIC=NO 

COMBINE LABEL=tic0 ARG=meanfree_None_ca_35_79,meanfree_None_ca_215_293 COEFFICIENTS=-0.138211254458,-11.6562007437 PERIODIC=NO 
METAD ARG=tic0 SIGMA=0.2 HEIGHT=1.0 FILE=HILLS TEMP=300 PACE=1000 LABEL=metad BIASFACTOR=50
PRINT ARG=tic0,metad.bias STRIDE=1000 FILE=BIAS 
RESTART
DISTANCE ATOMS=535,1204 LABEL=ca_35_79 

DISTANCE ATOMS=3337,4559 LABEL=ca_215_293 

MATHEVAL ARG=ca_35_79 FUNC=x-1.53548354803 LABEL=meanfree_None_ca_35_79 PERIODIC=NO 

MATHEVAL ARG=ca_215_293 FUNC=x-1.49968935084 LABEL=meanfree_None_ca_215_293 PERIODIC=NO 

COMBINE LABEL=tic1 ARG=meanfree_None_ca_35_79,meanfree_None_ca_215_293 COEFFICIENTS=-2.78923032711,-0.309608739165 PERIODIC=NO 
METAD ARG=tic1 SIGMA=0.2 HEIGHT=1.0 FILE=HILLS TEMP=300 PACE=1000 LABEL=metad BIASFACTOR=50
PRINT ARG=tic1,metad.bias STRIDE=1000 FILE=BIAS

A little bit of adjustment will give you a Plumed file which you can use for Metadynamics simulations

DISTANCE ATOMS=535,1204 LABEL=d1
DISTANCE ATOMS=3337,4559 LABEL=d2

MATHEVAL ARG=d1 FUNC=x-1.53548354803 LABEL=c1 PERIODIC=NO 
MATHEVAL ARG=d2 FUNC=x-1.49968935084 LABEL=c2 PERIODIC=NO 

COMBINE LABEL=tic0 ARG=c1,c2 COEFFICIENTS=-0.138211254458,-11.6562007437 PERIODIC=NO
COMBINE LABEL=tic1 ARG=c1,c2 COEFFICIENTS=-2.78923032711,-0.309608739165 PERIODIC=NO 

UPPER_WALLS ARG=d1,d2 AT=2.2,1.65 KAPPA=5000.0,5000.0 EXP=4,4 EPS=1,1 OFFSET=0,0 LABEL=uwall
LOWER_WALLS ARG=d1,d2 AT=1.0,1.3 KAPPA=5000.0,5000.0 EXP=4,4 EPS=1,1 OFFSET=0,0 LABEL=lwall
 
#METAD ARG=tic0,tic1 SIGMA=0.02,0.02 HEIGHT=1.0 FILE=HILLS TEMP=300 PACE=1000 LABEL=metad BIASFACTOR=15
PRINT ARG=tic0,tic1,d1,d2,c1,c2 STRIDE=1 FILE=COLVAR

Try to run plumed driver using the above input to confirm if everything is okay or not.