##### Life in Computational Biology

Biology in motion

Time-lagged independent component analysis is a dimensionality reduction technique which transforms a high dimensional dataset to lower dimensional co-ordinates. It’s philosophy is kind of similar to PCA (principal component analysis).

For more details check here.

It is a very recently developed technique which has a lot of promise. tICA can separate slow degrees of freedom which can be used as an input in metadynamics or any other enhanced sampling techniques for better sampling of slow transitions.

Note: Google all the recent papers which used tICA technique and you will find this technique pretty interesting. However, it needs to be tested in large scale systems to actually say something about its advantage over PCA.

This is my own tutorial where I will show how to generate tICA from a simulation and render the tICA co-ordinates as Plumed input file.

You will need following softwares before we start the tutorial.

1. Python 3.4 or higher
2. Jupyter Notebook
3. MSM Builder 3.8 or higher
4. MSM explorer
5. MDTraj 1.9.0

Here we go:

Download the trajectory and topology (in this case PDB file) from here

Make sure the path of input is correct. Open a new Python 3 terminal in Jupyter Notebook

```#Importing all the modules
#msmbuilder imports
from msmbuilder.dataset import dataset
from msmbuilder.featurizer import ContactFeaturizer
from msmbuilder.decomposition import tICA
from msmbuilder.cluster import MiniBatchKMeans
from msmbuilder.msm import ContinuousTimeMSM

#other imports
import os,glob
import numpy as np
import mdtraj as md
import pandas as pd
import pickle

#prettier plots
import seaborn as sns
%pylab inline
matplotlib.rcParams['xtick.labelsize'] = 14
matplotlib.rcParams['ytick.labelsize'] = 14
matplotlib.rcParams['font.family'] = "sans-serif"```
```#It is time to import the trajectory and the topology files
trajectory_folder_path=("*.xtc")
ds = dataset('../tfit.xtc', topology='../prot0.pdb')

#get some stats about the dataset
print("Number of trajectories is %d"%len(ds))```

This will give a output like

`Number of trajectories is 1`
```#define our featurizer. I want to use the distance as a CV
residue_ind = np.array([[35, 79],[215, 293]])
feat = ContactFeaturizer(scheme='ca', contacts= np.array([[35, 79],[215, 293]]))

#transform the data with the featurizer
ds_contact=ds.fit_transform_with(feat, "res_contact/",fmt='dir-npy')

#we can load the features using
ds_contact = dataset("./res_contact/")

#see how many trajectories were retained during featurization
print(len(ds_contact),len(ds))```

You  will see an output like following

`1 1`
```#define our tica model
tica_mdl = tICA(lag_time=1000,n_components=2)

#transform the dataset with tica
tica_features = ds_contact.fit_transform_with(tica_mdl, out_ds = 'reg_ticas_1000lt_2tics//')

tica_features = dataset("./reg_ticas_1000lt_2tics/")

#for each feature from our contact featurization we now have 2 tICs
tica_features.shape```

Note: Choose the lag_time carefully. Go through the literatures to know more about lag time and set it according to your suitability.

Output

`(6809, 2)`
```#lets dump the tica mdl for future use
verbosedump(tica_mdl,"tica_mdl.pkl")```

Output

`Saving "tica_mdl.pkl"... (<class 'msmbuilder.decomposition.tica.tICA'>)`

Time to render tICA model in a Plumed like file

```import pandas as pd
import mdtraj as md
from msmbuilder.featurizer import ContactFeaturizer

#python imports
import os,glob
import numpy as np
import pickle```
```#make sure that you have already fit the model using the notebooks above.
```# swap this for whatever you have. The code for now supports contacts, dihedral, and angles.
feat =ContactFeaturizer([[35,79],[215,293]], scheme='ca')```
```# this basically maps every feature to atom indices.
df1 = pd.DataFrame(feat.describe_features(top))```
`print(df1)`

Output

```atominds featuregroup featurizer otherinfo resids resnames \
0 [, ] ca Contact 20 [35, 79] [ASP, VAL]
1 [, ] ca Contact 20 [215, 293] [ASH, LEU]

resseqs
0 [36, 80]
1 [216, 294]```
```# to see the first 2 contacts for example

output  = render_tica_plumed_file(tica_mdl= tica_model, df= df1, n_tics= 2, multiple_tics=None)

# output is a dictionary whose integer values are the tICs
# to get the 0th tIC
print(output)

# to get the 1st tIC
print(output)```

Output

```RESTART
DISTANCE ATOMS=535,1204 LABEL=ca_35_79

DISTANCE ATOMS=3337,4559 LABEL=ca_215_293

MATHEVAL ARG=ca_35_79 FUNC=x-1.53548354803 LABEL=meanfree_None_ca_35_79 PERIODIC=NO

MATHEVAL ARG=ca_215_293 FUNC=x-1.49968935084 LABEL=meanfree_None_ca_215_293 PERIODIC=NO

COMBINE LABEL=tic0 ARG=meanfree_None_ca_35_79,meanfree_None_ca_215_293 COEFFICIENTS=-0.138211254458,-11.6562007437 PERIODIC=NO
RESTART
DISTANCE ATOMS=535,1204 LABEL=ca_35_79

DISTANCE ATOMS=3337,4559 LABEL=ca_215_293

MATHEVAL ARG=ca_35_79 FUNC=x-1.53548354803 LABEL=meanfree_None_ca_35_79 PERIODIC=NO

MATHEVAL ARG=ca_215_293 FUNC=x-1.49968935084 LABEL=meanfree_None_ca_215_293 PERIODIC=NO

COMBINE LABEL=tic1 ARG=meanfree_None_ca_35_79,meanfree_None_ca_215_293 COEFFICIENTS=-2.78923032711,-0.309608739165 PERIODIC=NO

A little bit of adjustment will give you a Plumed file which you can use for Metadynamics simulations

```DISTANCE ATOMS=535,1204 LABEL=d1
DISTANCE ATOMS=3337,4559 LABEL=d2

MATHEVAL ARG=d1 FUNC=x-1.53548354803 LABEL=c1 PERIODIC=NO
MATHEVAL ARG=d2 FUNC=x-1.49968935084 LABEL=c2 PERIODIC=NO

COMBINE LABEL=tic0 ARG=c1,c2 COEFFICIENTS=-0.138211254458,-11.6562007437 PERIODIC=NO
COMBINE LABEL=tic1 ARG=c1,c2 COEFFICIENTS=-2.78923032711,-0.309608739165 PERIODIC=NO

UPPER_WALLS ARG=d1,d2 AT=2.2,1.65 KAPPA=5000.0,5000.0 EXP=4,4 EPS=1,1 OFFSET=0,0 LABEL=uwall
LOWER_WALLS ARG=d1,d2 AT=1.0,1.3 KAPPA=5000.0,5000.0 EXP=4,4 EPS=1,1 OFFSET=0,0 LABEL=lwall