tICA Metadynamics

Time-lagged independent component analysis is a dimensionality reduction technique which transforms a high dimensional dataset to lower dimensional co-ordinates. It’s philosophy is kind of similar to PCA (principal component analysis). 

For more details check here.

It is a very recently developed technique which has a lot of promise. tICA can separate slow degrees of freedom which can be used as an input in metadynamics or any other enhanced sampling techniques for better sampling of slow transitions.

Note: Google all the recent papers which used tICA technique and you will find this technique pretty interesting. However, it needs to be tested in large scale systems to actually say something about its advantage over PCA.

This is my own tutorial where I will show how to generate tICA from a simulation and render the tICA co-ordinates as Plumed input file.

You will need following softwares before we start the tutorial.

  1. Python 3.4 or higher
  2. Jupyter Notebook
  3. MSM Builder 3.8 or higher
  4. MSM explorer
  5. MDTraj 1.9.0

Here we go:

Download the trajectory and topology (in this case PDB file) from here

Make sure the path of input is correct. Open a new Python 3 terminal in Jupyter Notebook

#Importing all the modules
#msmbuilder imports 
from msmbuilder.dataset import dataset
from msmbuilder.featurizer import ContactFeaturizer
from msmbuilder.decomposition import tICA
from msmbuilder.cluster import MiniBatchKMeans
from msmbuilder.msm import ContinuousTimeMSM
from msmbuilder.utils import verbosedump,verboseload

#other imports
import os,glob
import numpy as np
import mdtraj as md
import pandas as pd 
import pickle

#prettier plots
import seaborn as sns
%pylab inline
matplotlib.rcParams['xtick.labelsize'] = 14 
matplotlib.rcParams['ytick.labelsize'] = 14 
matplotlib.rcParams['font.family'] = "sans-serif"
#It is time to import the trajectory and the topology files
trajectory_folder_path=("*.xtc")
t = md.load('../tfit.xtc', top='../prot0.pdb')
ds = dataset('../tfit.xtc', topology='../prot0.pdb')

#get some stats about the dataset
print("Number of trajectories is %d"%len(ds))

This will give a output like

Number of trajectories is 1
#define our featurizer. I want to use the distance as a CV
residue_ind = np.array([[35, 79],[215, 293]])
feat = ContactFeaturizer(scheme='ca', contacts= np.array([[35, 79],[215, 293]]))

#transform the data with the featurizer
ds_contact=ds.fit_transform_with(feat, "res_contact/",fmt='dir-npy')

#we can load the features using
ds_contact = dataset("./res_contact/")

#see how many trajectories were retained during featurization
print(len(ds_contact),len(ds))

You  will see an output like following

1 1
#define our tica model
tica_mdl = tICA(lag_time=1000,n_components=2)

#transform the dataset with tica
tica_features = ds_contact.fit_transform_with(tica_mdl, out_ds = 'reg_ticas_1000lt_2tics//')

#load the features
tica_features = dataset("./reg_ticas_1000lt_2tics/")

#for each feature from our contact featurization we now have 2 tICs
tica_features[0].shape

Note: Choose the lag_time carefully. Go through the literatures to know more about lag time and set it according to your suitability.

Output

(6809, 2)
#lets dump the tica mdl for future use
verbosedump(tica_mdl,"tica_mdl.pkl")

Output

Saving "tica_mdl.pkl"... (<class 'msmbuilder.decomposition.tica.tICA'>)

Time to render tICA model in a Plumed like file

import pandas as pd 
import mdtraj as md 
from msmbuilder.utils import load
from msmbuilder.featurizer import ContactFeaturizer
from tica_metadynamics.plumed_writer import render_tica_plumed_file

#python imports
import os,glob
import numpy as np
import pickle
#make sure that you have already fit the model using the notebooks above.
tica_model = load("./tica_mdl.pkl")
top = md.load("../prot0.pdb")
# swap this for whatever you have. The code for now supports contacts, dihedral, and angles. 
feat =ContactFeaturizer([[35,79],[215,293]], scheme='ca')
# this basically maps every feature to atom indices. 
df1 = pd.DataFrame(feat.describe_features(top))
print(df1)

Output

atominds featuregroup featurizer otherinfo resids resnames \
0 [[534], [1203]] ca Contact 20 [35, 79] [ASP, VAL] 
1 [[3336], [4558]] ca Contact 20 [215, 293] [ASH, LEU]

resseqs 
0 [36, 80] 
1 [216, 294]
# to see the first 2 contacts for example 
print(df1.head())

output  = render_tica_plumed_file(tica_mdl= tica_model, df= df1, n_tics= 2, multiple_tics=None)

# output is a dictionary whose integer values are the tICs
# to get the 0th tIC
print(output[0])

# to get the 1st tIC
print(output[1])

Output

RESTART
DISTANCE ATOMS=535,1204 LABEL=ca_35_79 

DISTANCE ATOMS=3337,4559 LABEL=ca_215_293 

MATHEVAL ARG=ca_35_79 FUNC=x-1.53548354803 LABEL=meanfree_None_ca_35_79 PERIODIC=NO 

MATHEVAL ARG=ca_215_293 FUNC=x-1.49968935084 LABEL=meanfree_None_ca_215_293 PERIODIC=NO 

COMBINE LABEL=tic0 ARG=meanfree_None_ca_35_79,meanfree_None_ca_215_293 COEFFICIENTS=-0.138211254458,-11.6562007437 PERIODIC=NO 
METAD ARG=tic0 SIGMA=0.2 HEIGHT=1.0 FILE=HILLS TEMP=300 PACE=1000 LABEL=metad BIASFACTOR=50
PRINT ARG=tic0,metad.bias STRIDE=1000 FILE=BIAS 
RESTART
DISTANCE ATOMS=535,1204 LABEL=ca_35_79 

DISTANCE ATOMS=3337,4559 LABEL=ca_215_293 

MATHEVAL ARG=ca_35_79 FUNC=x-1.53548354803 LABEL=meanfree_None_ca_35_79 PERIODIC=NO 

MATHEVAL ARG=ca_215_293 FUNC=x-1.49968935084 LABEL=meanfree_None_ca_215_293 PERIODIC=NO 

COMBINE LABEL=tic1 ARG=meanfree_None_ca_35_79,meanfree_None_ca_215_293 COEFFICIENTS=-2.78923032711,-0.309608739165 PERIODIC=NO 
METAD ARG=tic1 SIGMA=0.2 HEIGHT=1.0 FILE=HILLS TEMP=300 PACE=1000 LABEL=metad BIASFACTOR=50
PRINT ARG=tic1,metad.bias STRIDE=1000 FILE=BIAS

A little bit of adjustment will give you a Plumed file which you can use for Metadynamics simulations

DISTANCE ATOMS=535,1204 LABEL=d1
DISTANCE ATOMS=3337,4559 LABEL=d2

MATHEVAL ARG=d1 FUNC=x-1.53548354803 LABEL=c1 PERIODIC=NO 
MATHEVAL ARG=d2 FUNC=x-1.49968935084 LABEL=c2 PERIODIC=NO 

COMBINE LABEL=tic0 ARG=c1,c2 COEFFICIENTS=-0.138211254458,-11.6562007437 PERIODIC=NO
COMBINE LABEL=tic1 ARG=c1,c2 COEFFICIENTS=-2.78923032711,-0.309608739165 PERIODIC=NO 

UPPER_WALLS ARG=d1,d2 AT=2.2,1.65 KAPPA=5000.0,5000.0 EXP=4,4 EPS=1,1 OFFSET=0,0 LABEL=uwall
LOWER_WALLS ARG=d1,d2 AT=1.0,1.3 KAPPA=5000.0,5000.0 EXP=4,4 EPS=1,1 OFFSET=0,0 LABEL=lwall
 
#METAD ARG=tic0,tic1 SIGMA=0.02,0.02 HEIGHT=1.0 FILE=HILLS TEMP=300 PACE=1000 LABEL=metad BIASFACTOR=15
PRINT ARG=tic0,tic1,d1,d2,c1,c2 STRIDE=1 FILE=COLVAR

Try to run plumed driver using the above input to confirm if everything is okay or not.

Jupyter notebook and Matplotlib

I was thinking to use Jupyter notebook for quite a some time and finally I started using it. Here is a quick installation guide for Linux (Ubuntu).

If Python is already installed

pip3 install --upgrade pip

pip3 install jupyter

Also you can install it with Anaconda. Details here 

To run notebook 

jupyter notebook

This will open up a webpage in your default browser and you can click on New and create a new notebook.

Gnuplot with Jupyter

I always wanted to have a gnuplot module integrated with Jupyter hence I installed the Gnuplot kernel with Jupyter notebook. The instructions are here

Pre-requisites

  1. System installation og Gnuplot in either Linux or Mac
  2. Installation of Jupyter notebook
  3. Metakernel

Install Metakernel by following command

pip install metakernel --upgrade 

Installation of Gnuplot kernel

pip install git+https://github.com/has2k1/gnuplot_kernel.git@master

Now when you will open jupyter notebook you will see Python 3 and Gnuplot option under New.

You can use bash within Jupyter notebook cell by starting with the following command

%%bash

Matplotlib with Jupyter

Jupyter uses matplotlib as one of the default plotting programme. If you have a .dat file (space separated) you can use something like following and click Run to see the plot. Make sure you have numpy, pandas installed.

import pandas as pd
import numpy as np
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

# read data
NPT=pd.read_csv('data/NPT_time_vol_pres.dat',sep='\s+');

nDataPts = len(NPT.index);
selFrames = np.arange(0,nDataPts,10);

# compute running average 
def movingaverage(interval, window_size):
    window = np.ones(int(window_size))/float(window_size);
    return np.convolve(interval, window, 'same');
p_ma = movingaverage(NPT['PRESS'],1000);

#plot
plt.figure(figsize=(15,5)) 
plt.scatter(NPT["TIME(PS)"][selFrames], NPT["PRESS"][selFrames],alpha=0.3, edgecolors='none');
plt.plot(NPT['TIME(PS)'],p_ma);
plt.xlabel('time (ps)'); plt.ylabel('pressure (bar)'); plt.title('pressure fluctuations during NPT simulation');

 

Table of contents in Jupyter

Often Jupyter notebook gets long enough by inclusion of plots and figures. In order to navigate easily within the notebook it needs a hyperlinked Table of Contents. All the jupyter extensions such as Table of Contents can be downloaded from here.

Installation

sudo pip install https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tarball/master
jupyter contrib nbextension install --user

Now load the jupyter notebook. You can then open the nbextensions tab on the tree (dashboard/file browser) notebook page to configure nbextensions. You will have access there to a dashboard where extensions can be enabled/disabled via checkboxes. Additionally a short documentation for each extension is displayed, and configuration options are presented.

Note: To be continued