# Task 1
Read a space-separated file using Python. The file test.txt has two columns, time and distance:

    1 0.296275
    2 0.290138
    3 0.299182
    4 0.286541
    5 0.294814
    6 0.287718
    7 0.289807
    8 0.279831
    9 0.325793
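# Code

    # 'with' closes the file automatically when the block ends
    with open('test.txt', 'r') as f:
        for line in f:
            line = line.strip()
            columns = line.split()        # split on any run of whitespace
            time = columns[0]             # first column, kept as a string
            dist = float(columns[1])      # second column, converted to a number
            print(time, dist)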
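Printing is fine for a quick look, but often you want to keep the columns for later use. A minimal sketch, assuming the same two-column layout: `io.StringIO` stands in for test.txt so the example runs on its own, and the helper name `read_columns` is hypothetical.

```python
import io


def read_columns(f):
    """Collect column 1 (as str) and column 2 (as float) from a whitespace-separated file."""
    times, dists = [], []
    for line in f:
        columns = line.split()
        if not columns:  # skip blank lines, e.g. an empty line at the end of the file
            continue
        times.append(columns[0])
        dists.append(float(columns[1]))
    return times, dists


# In-memory stand-in for test.txt (first three rows of the sample data plus a blank line)
sample = io.StringIO("1 0.296275\n2 0.290138\n3 0.299182\n\n")
times, dists = read_columns(sample)
print(times, dists)
```

With the real file, call `read_columns(f)` inside `with open('test.txt', 'r') as f:`.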
If the file has a header line, for example

    #Time distance
    1 0.296275
    2 0.290138
    3 0.299182
    4 0.286541
    5 0.294814
    6 0.287718
    7 0.289807
    8 0.279831
    9 0.325793

remove the header with readline() before looping over the data. Here is the code:

    with open('test.txt', 'r') as f:
        header1 = f.readline()            # read and discard the header line
        for line in f:
            line = line.strip()
            columns = line.split()
            time = columns[0]
            dist = float(columns[1])
            print(time, dist)
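`f.readline()` works when there is exactly one header line. Because this header starts with `#`, another option is to skip every line that begins with `#`, which also copes with several header or comment lines. A sketch under that assumption; the function name is hypothetical and `io.StringIO` replaces the real file.

```python
import io


def read_skipping_comments(f):
    """Yield (time, dist) pairs, ignoring blank lines and lines that start with '#'."""
    for line in f:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        columns = line.split()
        yield columns[0], float(columns[1])


# In-memory stand-in for a test.txt that has a header line
sample = io.StringIO("#Time distance\n1 0.296275\n2 0.290138\n")
pairs = list(read_skipping_comments(sample))
print(pairs)
```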
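# If column 2 > cutoff, print the corresponding column 1

# Using pandas

With a cutoff of 0.29, on the file without a header line:

    import pandas as pd

    df = pd.read_csv('test.txt', sep=' ', header=None)   # columns are labelled 0 and 1
    # the boolean mask on column 1 (distance) selects rows; [0] then picks column 0 (time)
    print(df[df[1] > 0.29][0])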
# Output

The left column is the pandas row index; the right column holds the matching times.

    0    1
    1    2
    2    3
    4    5
    8    9
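If the file has the `#Time distance` header, pandas can skip it for you. A sketch, assuming the header is the only line starting with `#`: `comment='#'` makes `read_csv` ignore that line, `names=` gives the columns readable labels, and `sep=r'\s+'` accepts any run of whitespace. `io.StringIO` stands in for test.txt.

```python
import io

import pandas as pd

# In-memory copy of test.txt with its header line
data = io.StringIO(
    "#Time distance\n"
    "1 0.296275\n2 0.290138\n3 0.299182\n4 0.286541\n5 0.294814\n"
    "6 0.287718\n7 0.289807\n8 0.279831\n9 0.325793\n"
)

cutoff = 0.29
df = pd.read_csv(data, sep=r'\s+', header=None, comment='#', names=['time', 'dist'])
# .loc[row_mask, column] selects the 'time' values of rows whose 'dist' exceeds the cutoff
selected = df.loc[df['dist'] > cutoff, 'time']
print(selected.tolist())
```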
# Alternative

A plain-Python version:

    with open('test.txt', 'r') as f:
        for line in f:
            line = line.strip()
            columns = line.split()
            time = columns[0]
            dist = float(columns[1])
            if dist > 0.29:
                print(time)
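To reuse the filter with other cutoffs, the loop can be wrapped in a function that takes the cutoff as a parameter and returns the matching times instead of printing them. A sketch for the header-less file; `times_above` is a hypothetical name and `io.StringIO` replaces test.txt.

```python
import io


def times_above(f, cutoff):
    """Return the column 1 values whose column 2 value exceeds cutoff."""
    result = []
    for line in f:
        columns = line.split()
        if columns and float(columns[1]) > cutoff:  # 'columns and' skips blank lines
            result.append(columns[0])
    return result


# In-memory stand-in for a few rows of test.txt
sample = io.StringIO("1 0.296275\n4 0.286541\n9 0.325793\n")
result = times_above(sample, 0.29)
print(result)
```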
# Correlation coefficient between two columns

# Input (coor.txt)

    2 3
    3 6
    4 9
    5 12
    6 15
    7 21
    8 24
    9 27

# Command

    import pandas as pd

    df = pd.read_csv('coor.txt', sep=' ', header=None)
    print(df[0].corr(df[1]))   # Pearson correlation of column 0 with column 1
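`Series.corr` computes the Pearson correlation coefficient by default: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). To see where the number comes from, here is a plain-Python sketch of that formula on the coor.txt values; the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # covariance-style sum and the two sums of squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# The two columns of coor.txt
x = [2, 3, 4, 5, 6, 7, 8, 9]
y = [3, 6, 9, 12, 15, 21, 24, 27]
r = pearson(x, y)
print(round(r, 5))
```

Since Pearson is the default `method` of `Series.corr`, `df[0].corr(df[1])` on coor.txt should agree with this value.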
# Introduction to functions

The following is our dataset, test.txt (this one has a header line):

    #Time dist
    1 0.296275
    2 0.290138
    3 0.299182
    4 0.286541
    5 0.294814
    6 0.287718
    7 0.289807
    8 0.279831
    9 0.325793

The aim is to write a function that does the following: if the value in column 2 is above the cutoff, print the corresponding value in column 1; otherwise print the value in column 2.
# Code

    # Define the function once, outside the loop, and pass it the values it needs
    def is_cutoff(time, dist, cutoff=0.29):
        if dist > cutoff:
            return time
        else:
            return dist

    with open('test.txt', 'r') as f:
        header1 = f.readline()            # read and discard the header line
        for line in f:
            line = line.strip()
            columns = line.split()
            time = columns[0]
            dist = float(columns[1])
            bhakat_data = is_cutoff(time, dist)
            print(bhakat_data)
# Output

    1
    2
    3
    0.286541
    5
    0.287718
    0.289807
    0.279831
    9
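The same "time if above the cutoff, otherwise distance" rule can be written without an explicit loop in pandas with `Series.where(cond, other)`, which keeps values where `cond` is true and takes them from `other` elsewhere. A sketch, assuming the header line is the only one starting with `#`; `io.StringIO` stands in for test.txt.

```python
import io

import pandas as pd

# In-memory copy of test.txt with its header line
data = io.StringIO(
    "#Time dist\n"
    "1 0.296275\n2 0.290138\n3 0.299182\n4 0.286541\n5 0.294814\n"
    "6 0.287718\n7 0.289807\n8 0.279831\n9 0.325793\n"
)
df = pd.read_csv(data, sep=r'\s+', header=None, comment='#', names=['time', 'dist'])

cutoff = 0.29
# Keep 'time' where dist > cutoff; elsewhere substitute that row's 'dist' value
result = df['time'].where(df['dist'] > cutoff, df['dist'])
print(result.tolist())
```

Because one column now mixes times and distances, pandas stores the result as floats, so the times appear as 1.0, 2.0, and so on.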
# Task: Reshape a list into a 2×3 DataFrame (2 rows, 3 columns) using pandas

# Code

    import numpy as np
    import pandas as pd
    a = [-0.26087587488489344, -0.5473357965371729, -0.27654025363442625,
         1.404881305053428, -0.018092484992362462, 0.12383162178724855]
    assert len(a) == 2 * 3   # reshape needs exactly 6 values
    s = pd.DataFrame(np.array(a).reshape(2, 3), columns=list("abc"))
    print(s)
# Output

              a         b         c
    0 -0.260876 -0.547336 -0.276540
    1  1.404881 -0.018092  0.123832
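`reshape` only works when the number of elements equals rows × columns, which is why the length check matters (6 = 2 × 3). A sketch showing the `-1` shorthand, which lets NumPy infer one dimension, and what happens when the length does not fit; it assumes NumPy's default row-major order.

```python
import numpy as np
import pandas as pd

a = [-0.26087587488489344, -0.5473357965371729, -0.27654025363442625,
     1.404881305053428, -0.018092484992362462, 0.12383162178724855]

# -1 asks NumPy to work out the number of columns from len(a) and the 2 rows requested
arr = np.array(a).reshape(2, -1)
s = pd.DataFrame(arr, columns=list("abc"))

# Values fill row by row: the first three go to row 0, the next three to row 1
print(s.shape)
print(s.loc[1, "a"] == a[3])

# Five values cannot fill a 2-row array, so reshape raises ValueError
try:
    np.array(a[:5]).reshape(2, -1)
    reshaped_odd = True
except ValueError:
    reshaped_odd = False
print(reshaped_odd)
```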
# List to column (rounded to two decimal places)

# Code

    import pandas as pd

    a = [-0.26087587488489344, -0.5473357965371729]
    df = pd.Series(a)   # a single column (a Series, despite the name df)
    df = df.round(2)

# Output

    >>> df
    0   -0.26
    1   -0.55
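`round(2)` changes the stored numbers. If you only want two decimals for display or output while keeping full precision, formatting the values as strings is an alternative. A small sketch of the difference; the variable names are illustrative.

```python
import pandas as pd

a = [-0.26087587488489344, -0.5473357965371729]
s = pd.Series(a)

rounded = s.round(2)                    # new float Series with the values actually rounded
as_text = s.map(lambda v: f"{v:.2f}")   # strings with exactly two decimals
print(rounded.tolist())
print(as_text.tolist())
print(s.tolist() == a)                  # neither call modifies s, which keeps full precision
```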
# Task: Selecting multiple columns using pandas
The dataset (test_colvar.dat) looks like:

    0.000000 0.287274 0.281606
    1.000000 0.296275 0.278471
    2.000000 0.290138 0.265047
    3.000000 0.299182 0.282561
    4.000000 0.286541 0.285144
# Code

    import pandas as pd

    df = pd.read_csv('test_colvar.dat', sep=' ', header=None)
    df1 = df.iloc[:, 0:2]   # selects the first and second columns (slice end is exclusive)
    df2 = df.iloc[:, 1:3]   # selects the second and third columns
# Output

    >>> df1
         0         1
    0  0.0  0.287274
    1  1.0  0.296275
    2  2.0  0.290138
    3  3.0  0.299182
    4  4.0  0.286541
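A slice such as `0:2` only selects adjacent columns. For non-adjacent ones, `iloc` also accepts a list of positions, and because `header=None` labels the columns 0, 1, 2, the same selection can be made by label. A sketch on the first rows of the dataset, with `io.StringIO` standing in for test_colvar.dat.

```python
import io

import pandas as pd

# In-memory copy of the first rows of test_colvar.dat
data = io.StringIO(
    "0.000000 0.287274 0.281606\n"
    "1.000000 0.296275 0.278471\n"
    "2.000000 0.290138 0.265047\n"
)
df = pd.read_csv(data, sep=' ', header=None)

first_two = df.iloc[:, 0:2]            # positions 0 and 1 (the end of the slice is excluded)
first_and_third = df.iloc[:, [0, 2]]   # a list of positions selects non-adjacent columns
by_label = df[[0, 2]]                  # the same columns selected by their labels
print(first_two.columns.tolist(), first_and_third.columns.tolist())
print(first_and_third.equals(by_label))
```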
# Task: Deleting specific columns using pandas
Using the same dataset as the previous task.
# Code

    import pandas as pd

    df = pd.read_csv('test_colvar.dat', sep=' ', header=None)
    # drop() returns a new DataFrame, so assign the result; this removes the first and second columns
    df = df.drop(df.columns[[0, 1]], axis=1)
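A common surprise is that `drop()` leaves the original DataFrame unchanged unless you keep its return value. A sketch showing this, plus the `columns=` keyword as a label-based alternative to `axis=1`; `io.StringIO` stands in for test_colvar.dat.

```python
import io

import pandas as pd

# In-memory copy of the first rows of test_colvar.dat
data = io.StringIO(
    "0.000000 0.287274 0.281606\n"
    "1.000000 0.296275 0.278471\n"
)
df = pd.read_csv(data, sep=' ', header=None)

remaining = df.drop(columns=[0, 1])   # new DataFrame holding only column 2
untouched = df.columns.tolist()       # df itself still has columns 0, 1 and 2
print(remaining.columns.tolist(), untouched)

# Positional form; reassigning keeps the result under the same name
df = df.drop(df.columns[[0, 1]], axis=1)
print(df.columns.tolist())
```

`drop` also accepts `inplace=True`, but reassigning the result, as above, makes it explicit which object changes.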
# Task: Convert a column to a space-separated string

My dataset is a file named tmp that contains a single column:

    893
    896
    899
    902
    905
    908
    911
    914
    917
    920

We will use pandas to convert this column to a space-separated string.

# Code

    import pandas as pd

    df = pd.read_csv('tmp', sep=' ', header=None)
    values = df[0].tolist()               # converts the column to a Python list
    " ".join(str(i) for i in values)      # joins the items into a space-separated string

# Output

    '893 896 899 902 905 908 911 914 917 920'
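pandas can also do the join itself: convert the column to strings with `astype(str)` and concatenate them with `Series.str.cat(sep=' ')`. A sketch, with `io.StringIO` standing in for the file tmp.

```python
import io

import pandas as pd

# In-memory stand-in for the single-column file tmp
data = io.StringIO("893\n896\n899\n902\n905\n908\n911\n914\n917\n920\n")
df = pd.read_csv(data, header=None)

# str.cat joins the string values of a Series using the given separator
joined = df[0].astype(str).str.cat(sep=' ')
print(joined)
```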