
cyclica/cyclicaDrugClustDemo

 
 


What's the big picture?

  • Some problems are so difficult that no single researcher, research group, research institute, or multinational company can make meaningful progress alone. They take a worldwide effort and collaboration between industry and academia. Drug discovery is one such area. Advances in data science (also known as artificial intelligence or machine learning) are being applied to data sets from high-throughput experimental techniques and to historical databases of biomedical literature that are publicly available to the world community.

  • The process of small-molecule drug development involves gradually narrowing tens of thousands of small molecules down to a drug candidate that is eventually given to patients in clinical trials. This is a long (decades, often the whole career of a researcher) and costly process that engages all corners of our interconnected economy (scientists, physicians, entrepreneurs, investors, pharmaceutical companies, government officials). These real-world constraints push research away from risky questions and leave many diseases untreated. But computational methods that have become popular within the past decade can help make data-driven decisions earlier in the process, so that drugs can be developed better, faster, and cheaper. At this workshop you will get hands-on experience solving the types of problems that keep our researchers up at night.

The plan

  • Input data: a precomputed and relatively clean data set of ~1000 drug-like molecules by ~100 chemical features
  • Goal: Your job is to group the drug-like molecules into classes so that a smaller, diverse, and representative set can be picked from them. This is a real-world unsupervised learning (clustering) problem encountered in a biotech startup. There is underlying structure in this data set; we have solved it one way and are curious to see how you solve it.
  • Hints: you will be given clues about the structure of the data at the event, but for now it's top secret! We have prepared Python code snippets (pandas, numpy, scikit-learn) for a solution using k-means clustering to move you along toward the goal within the time constraints of the event.
  • This Jupyter notebook is here to help facilitate the workshop
from IPython.display import Image
Image("Screen Shot 2016-10-27 at 3.29.58 PM.png")


Technical remarks

  • If you don't have pandas, numpy, scikit-learn, and matplotlib installed, install them with

pip install pandas numpy scikit-learn matplotlib

  • You can check which libraries you have installed with

pip freeze

Import data

import pandas as pd
inputfile = 'chemicalDataForStudents20161027-110104.csv'
df = pd.read_csv(inputfile, sep=',')
# take a peek at the data
print(df.shape)
print(df.tail(3))
(1650, 111)
       LabuteASA  MaxAbsEStateIndex  MaxAbsPartialCharge  MaxEStateIndex  \
1514  164.703793          14.880596             0.496768       14.880596   
161   148.584776           5.798308             0.493601        5.798308   
1220  138.515751          12.549085             0.347020       12.549085   

      MaxPartialCharge  MinAbsEStateIndex  MinAbsPartialCharge  \
1514          0.350866           0.002799             0.350866   
161           0.215753           0.686138             0.215753   
1220          0.244674           0.081001             0.244674   

      MinEStateIndex  MinPartialCharge  MolLogP         ...          \
1514       -0.875036         -0.496768  3.48928         ...           
161         0.686138         -0.493601  4.33182         ...           
1220       -0.668981         -0.347020  1.25950         ...           

      fr_term_acetylene  fr_tetrazole  fr_thiazole  fr_thiophene  \
1514                  0             0            0             0   
161                   0             0            0             0   
1220                  0             0            0             0   

      fr_unbrch_alkane  fr_urea  \
1514                 0        0   
161                  5        0   
1220                 0        0   

                                                 smiles  \
1514  COc1ccc(cc1)c2ccc(c(c2)F)N\3C(=O)CS/C3=C(/C#N)...   
161                Cc1cc(on1)CCCCCCCOc2ccc(cc2)C3=NCCO3   
1220  CCCC[C@H](CN(C=O)O)C(=O)N[C@H](C(=O)N(C)C)C(C)...   

                                 codeName  y_pred  clusterSize_y_pred  
1514        JamesWatt-JeanBaptisteLamarck    1390                   1  
161         DanielBernoulli-CharlesDarwin    1391                   1  
1220  Empedocles-CharlesAugustindeCoulomb    1392                   1  

[3 rows x 111 columns]
# each chemical can be represented by a SMILES string.
print(df.head().smiles)

# each compound has a codeName
# the codeNames are how we will refer to them after the analysis (rather than by row number or smiles)
print(df.head().codeName)
525           Cc1cc(nc(c1)N)COC[C@H](CN)OCc2cc(cc(n2)N)C
526         Cc1cc(nc(c1)N)COCC[C@@H](CN)OCc2cc(cc(n2)N)C
527    Cc1cc(nc(c1)N)COC[C@@H]([C@H](C)OCc2cc(cc(n2)N...
528          Cc1cc(nc(c1)N)COC[C@@H](CN)OCc2cc(cc(n2)N)C
415    CC(C)(C)NC(=O)[C@@H](c1ccccc1)NC(=O)N(C)Cc2ccc...
Name: smiles, dtype: object
525              JamesClerkMaxwell-ErnstMayr
526                      BillNye-FrankHornby
527            CharlesLyell-ErwinSchrodinger
528                Empedocles-GustavKirchoff
415    CharlesAugustindeCoulomb-FrancisCrick
Name: codeName, dtype: object

Cleaning the data

  • Real data is messy. Data sanitization involves
    • removing features that didn't compute for all samples, or samples for which features didn't compute
    • removing outliers that you suspect are artefacts or that will wildly bias the predictions that come from the data
    • The data provided has been filtered a bit, but be warned that this is an important part of the process and can take a long time
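The cleaning steps listed above could look roughly like the sketch below. It runs on a made-up toy table, not the workshop data; the column names and the outlier cutoff (5 median absolute deviations) are assumptions chosen for illustration, and this is not necessarily how the provided data set was filtered.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the descriptor table (values are made up).
raw = pd.DataFrame({
    "featA": [1.0, 1.2, 0.9, 1.1, 50.0],     # last value is an obvious outlier
    "featB": [0.3, np.nan, 0.4, 0.5, 0.2],   # one descriptor failed to compute
    "featC": [np.nan] * 5,                   # descriptor failed for every sample
})

# 1. Drop features that did not compute for any sample.
cleaned = raw.dropna(axis=1, how="all")

# 2. Drop samples that still have a missing value.
cleaned = cleaned.dropna(axis=0, how="any")

# 3. Drop samples that sit far from the median on any feature.
#    The cutoff of 5 median absolute deviations is an arbitrary assumption.
median = cleaned.median()
mad = (cleaned - median).abs().median()
is_outlier = ((cleaned - median).abs() > 5 * mad).any(axis=1)
cleaned = cleaned[~is_outlier]

print(cleaned)
```

Whether to drop a feature or a sample first is a judgment call: dropping a sparse feature keeps more molecules, while dropping samples keeps more descriptors.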

Normalizing the data

  • The features need to be treated equally. Just because units change from grams to kilograms does not mean there is a 1000x difference in the quantity being measured
  • There are various ways to standardize data. You may have read about standard scores (z-scores). In the end each feature should be centred around the same value and span a comparable range.
  • The way this is done should preserve the variation in each feature. So remember your numerical methods class and beware of round-off problems such as catastrophic cancellation when subtracting nearly equal numbers.
# just get features from data, remove labels
df_un = df.drop(['codeName', 'smiles'], axis=1)

# normalize: centre each feature on its mean and divide by its range (max - min)
import numpy as np
df_norm = (df_un - df_un.mean()) / (df_un.max() - df_un.min())
X = np.array(df_norm)

# each column of X now has mean 0 and a range (max - min) of 1, so all values lie between -1 and 1
# you can uncomment this to check
# print('mean', np.mean(X, 0))
# print('max', np.max(X, 0))
# print('min', np.min(X, 0))
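To make the grams-versus-kilograms point concrete, here is a small sketch on made-up numbers. It applies the mean normalization used in the notebook and, for comparison, z-score standardization; the column names `mass_g` and `mass_kg` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: the same mass expressed in grams and in kilograms (values are made up).
toy = pd.DataFrame({"mass_g": [1000.0, 2000.0, 4000.0]})
toy["mass_kg"] = toy["mass_g"] / 1000.0

# Mean normalization: centre on the mean, divide by the range (max - min).
mean_norm = (toy - toy.mean()) / (toy.max() - toy.min())

# Z-score standardization: centre on the mean, divide by the standard deviation.
z_score = (toy - toy.mean()) / toy.std(ddof=0)

# After either rescaling the two columns agree: the choice of unit no longer matters.
print(mean_norm)
print(z_score)
```

Both rescalings remove the unit. Dividing by the range is sensitive to a single extreme value, while dividing by the standard deviation is somewhat less so, which may matter if outliers remain in the data.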

K-means clustering

y_pred = KMeans(n_clusters=k, random_state=random_state).fit_predict(X)

  • It takes the normalized data and assigns a cluster label to each sample, such that there are k unique clusters.
  • Properties of k
    • k is an integer, since clusters are countable
    • k is at least 1. This would be one big cluster
    • k is at most the number of samples (the rows of X). This would treat every sample as its own cluster (a singleton)
# cluster by kmeans
from sklearn.cluster import KMeans
import random
random.seed(0)
k = int(random.uniform(1, len(X))) # set k without any prior knowledge... any number between 1 and the number of samples
print('k %d' % k)
random_state = 0
y_pred = KMeans(n_clusters=k, random_state=random_state).fit_predict(X)
df['y_pred'] = y_pred # plot and analyze unnormalized data with labels
k 1393
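The snippet above picks k at random. One common alternative, shown here only as a sketch on synthetic data, is to try several values of k and keep the one with the highest silhouette score. The blob centres, the candidate range 2 to 8, and the names `X_toy` and `k_candidate` are assumptions for illustration and say nothing about the structure of the workshop data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X: 200 points around 4 well-separated centres.
centres = [[0, 0], [10, 0], [0, 10], [10, 10]]
X_toy, _ = make_blobs(n_samples=200, centers=centres, cluster_std=0.5, random_state=0)

# Score each candidate k; a higher silhouette means tighter, better separated clusters.
scores = {}
for k_candidate in range(2, 9):
    labels = KMeans(n_clusters=k_candidate, n_init=10, random_state=0).fit_predict(X_toy)
    scores[k_candidate] = silhouette_score(X_toy, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('best k by silhouette: %d' % best_k)
```

On real chemical data the silhouette curve is rarely this clean, so it is best read as one hint among several rather than a definitive answer.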
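  • Now that the clustering is done we can look at the sizes of the clusters. The function

np.histogram

  • returns two arrays: the number of clusters in each bin, then the bin edges. The bins here are the observed cluster sizes, so an output of (array([1161, 208, 23, 1]), array([1, 2, 3, 4, 5])) means 1161 clusters of size 1, 208 of size 2, 23 of size 3, and 1 of size 4.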
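As a small worked example of how the bins are built, the sketch below tallies cluster sizes for a handful of made-up labels (`toy_labels` is hypothetical). It also shows an equivalent tally with `value_counts`, which avoids constructing bin edges by hand.

```python
import numpy as np
import pandas as pd

# Toy cluster labels: cluster 0 has 3 members, cluster 1 has 1, clusters 2 and 3 have 2 each.
toy_labels = pd.Series([0, 0, 0, 1, 2, 2, 3, 3])
sizes = toy_labels.value_counts()   # members per cluster

# Bin edges: every observed size, plus one extra edge to close the last bin.
edges = np.append(np.unique(sizes), sizes.max() + 1)
counts, edges = np.histogram(sizes, bins=edges)
print(counts, edges)

# The same tally without explicit bins: cluster size -> number of clusters of that size.
size_tally = sizes.value_counts().sort_index()
print(size_tally)
```

Here there is one cluster of size 1, two of size 2, and one of size 3.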
# look at cluster size
cluster_sizes = df.groupby('y_pred').size()
# bin edges: each observed cluster size, plus one extra edge to close the last bin
bins = np.append(np.unique(cluster_sizes), cluster_sizes.max() + 1)
print(np.histogram(cluster_sizes, bins=bins))
# add in cluster size to df
df = pd.merge(df, pd.DataFrame({'clusterSize_y_pred': cluster_sizes}).reset_index(), on='y_pred')
print(df.tail())
(array([1161,  208,   23,    1]), array([1, 2, 3, 4, 5]))
       LabuteASA  MaxAbsEStateIndex  MaxAbsPartialCharge  MaxEStateIndex  \
1645  147.806545          14.525346             0.378511       14.525346   
1646  149.812648          12.574862             0.477880       12.574862   
1647  181.229439          14.001653             0.460949       14.001653   
1648  203.683877          11.743598             0.438042       11.743598   
1649  124.973421          11.101663             0.507823       11.101663   

      MaxPartialCharge  MinAbsEStateIndex  MinAbsPartialCharge  \
1645          0.154401           0.031220             0.154401   
1646          0.330899           0.050841             0.330899   
1647          0.258894           0.164579             0.258894   
1648          0.233112           0.053390             0.233112   
1649          0.230804           0.042424             0.230804   

      MinEStateIndex  MinPartialCharge  MolLogP         ...          \
1645       -0.910887         -0.378511  2.73290         ...           
1646       -1.397395         -0.477880  1.01617         ...           
1647       -0.569145         -0.460949  1.87380         ...           
1648       -0.053390         -0.438042  4.59410         ...           
1649       -1.291383         -0.507823  1.83460         ...           

      fr_term_acetylene  fr_tetrazole  fr_thiazole  fr_thiophene  \
1645                  0             0            0             0   
1646                  0             0            0             0   
1647                  0             0            0             0   
1648                  0             0            0             0   
1649                  0             0            0             0   

      fr_unbrch_alkane  fr_urea  \
1645                 0        0   
1646                 0        0   
1647                 0        0   
1648                 0        0   
1649                 0        0   

                                                 smiles  \
1645  Cn1cc(cn1)[C@H]2C[C@H]3CSC(=N[C@]3(CO2)c4ccc(c...   
1646  [H]/N=C/1\NC(=O)[C@]2(S1)C=C(C[C@H]([C@@H]2NC(...   
1647  c1cc(oc1)c2nc3nc(nc(n3n2)N)NCCN4CCN(CC4)c5ccc(...   
1648  CCC(=O)Nc1cccc(c1)Oc2c3cc[nH]c3nc(n2)Nc4ccc(cc...   
1649     c1cc2c(cc1O)OC[C@]3([C@@H]2Oc4c3cc5c(c4)OCO5)O   

                           codeName  y_pred  clusterSize_y_pred  
1645  MichaelFaraday-GalileoGalilei    1317                   1  
1646          CarlBosch-RobertHooke     404                   1  
1647      FrancisGalton-Anaximander     402                   1  
1648  BenjaminThompson-KonradLorenz    1332                   1  
1649    RobertKoch-AndreMarieAmpere     268                   1  

[5 rows x 111 columns]
# get top clusters (largest first)
topClusters = df[['y_pred', 'clusterSize_y_pred']].drop_duplicates().sort_values(by='clusterSize_y_pred', ascending=False).head()
print(topClusters)
      y_pred  clusterSize_y_pred
525      142                   4
1103     199                   3
341       57                   3
36      1233                   3
546      218                   3

Sanity check... 2D chemical structures

Submit your classes to Cyclica

  • Since we know the real clusters from another method, we can compare yours to ours
  • Output your final list of y_pred classes with the codeNames and smiles, and we can go back and check whether they match our classes
  • The code below outputs a csv file. Details of how to submit will be given at the workshop
# sort data
df = df.sort_values(by=['clusterSize_y_pred', 'y_pred'], ascending=[False, True])

# output data
import time
timestr = time.strftime("%Y%m%d-%H%M%S")
initials='gw'
output = 'predictedClasses' + initials + timestr +'.csv'
df.to_csv(output, sep=',', index=False)
df.head(50)
LabuteASA MaxAbsEStateIndex MaxAbsPartialCharge MaxEStateIndex MaxPartialCharge MinAbsEStateIndex MinAbsPartialCharge MinEStateIndex MinPartialCharge MolLogP ... fr_term_acetylene fr_tetrazole fr_thiazole fr_thiophene fr_unbrch_alkane fr_urea smiles codeName y_pred clusterSize_y_pred
525 141.729151 5.759574 0.383683 5.759574 0.123477 0.226210 0.123477 -0.226210 -0.383683 1.31854 ... 0 0 0 0 0 0 Cc1cc(nc(c1)N)COC[C@H](CN)OCc2cc(cc(n2)N)C JamesClerkMaxwell-ErnstMayr 142 4
526 148.094093 5.819164 0.383683 5.819164 0.123477 0.098079 0.123477 -0.098079 -0.383683 1.70864 ... 0 0 0 0 1 0 Cc1cc(nc(c1)N)COCC[C@@H](CN)OCc2cc(cc(n2)N)C BillNye-FrankHornby 142 4
527 148.094093 6.124918 0.383683 6.124918 0.123477 0.181412 0.123477 -0.259896 -0.383683 1.70704 ... 0 0 0 0 0 0 Cc1cc(nc(c1)N)COC[C@@H]([C@H](C)OCc2cc(cc(n2)N... CharlesLyell-ErwinSchrodinger 142 4
528 141.729151 5.759574 0.383683 5.759574 0.123477 0.226210 0.123477 -0.226210 -0.383683 1.31854 ... 0 0 0 0 0 0 Cc1cc(nc(c1)N)COC[C@@H](CN)OCc2cc(cc(n2)N)C Empedocles-GustavKirchoff 142 4
415 185.862478 12.947534 0.477530 12.947534 0.339488 0.007434 0.339488 -1.177893 -0.477530 2.91090 ... 0 0 0 0 0 1 CC(C)(C)NC(=O)[C@@H](c1ccccc1)NC(=O)N(C)Cc2ccc... CharlesAugustindeCoulomb-FrancisCrick 6 3
416 185.862478 12.911895 0.477530 12.911895 0.339488 0.003046 0.339488 -1.172572 -0.477530 2.91250 ... 0 0 0 0 1 1 CCCCNC(=O)[C@H](c1ccccc1)NC(=O)N(C)Cc2ccc3c(c2... WolfgangErnstPauli-Lucretius 6 3
417 201.824746 13.056169 0.477530 13.056169 0.339488 0.016616 0.339488 -1.181843 -0.477530 3.31260 ... 0 0 0 0 0 1 CN(Cc1ccc2c(c1C(=O)O)OCO2)C(=O)N[C@@H](c3ccccc... ErwinSchrodinger-EvangelistaTorricelli 6 3
151 112.519202 12.044467 0.504068 12.044467 0.200850 0.003845 0.200850 -0.738647 -0.504068 2.57680 ... 0 0 0 0 0 0 c1ccc(cc1)C2=CC(=O)c3c(cc(c(c3O)O)O)O2 LinusPauling-IsaacNewton 36 3
152 123.997689 12.233506 0.507966 12.233506 0.203372 0.033681 0.203372 -0.472106 -0.507966 2.58540 ... 0 0 0 0 0 0 COc1c(cc2c(c1O)C(=O)C=C(O2)c3ccc(cc3)O)O CarlFriedrichGauss-BillNye 36 3
153 112.519202 12.350655 0.507822 12.350655 0.199995 0.009887 0.199995 -0.312312 -0.507822 2.57680 ... 0 0 0 0 0 0 c1cc(c(cc1C2=COc3cc(ccc3C2=O)O)O)O SigmundFreud-BenjaminFranklin 36 3
220 130.436067 14.155215 0.378494 14.155215 0.154894 0.034118 0.154894 -0.853547 -0.378494 3.16290 ... 0 0 0 0 0 0 C[C@]1(C[C@H](SC(=N1)N)c2cncnc2)c3ccc(cc3F)F LeonardodaVinci-JamesWatson 44 3
221 130.436067 14.314858 0.378512 14.314858 0.154285 0.254836 0.154285 -0.794143 -0.378512 3.08860 ... 0 0 0 0 0 0 C[C@]1(CCSC(=N1)N)c2cc(c(cc2F)F)c3cncnc3 JeanBaptisteLamarck-ThomasKuhn 44 3
222 136.691989 14.247607 0.378494 14.247607 0.154895 0.047960 0.154895 -0.867130 -0.378494 3.97774 ... 0 0 0 0 0 0 Cc1c(c(on1)C)[C@@H]2C[C@@](N=C(S2)N)(C)c3ccc(c... CarolusLinnaeus-FranzBoas 44 3
341 194.858937 12.370274 0.488253 12.370274 0.488253 0.250590 0.423170 -1.478263 -0.423170 2.46130 ... 0 0 0 0 0 0 B(c1ccccc1CN2CCN(CC2)C3=NC(=O)/C(=C/c4ccc(c(c4... Lucretius-Avicenna 57 3
342 194.858937 12.356106 0.487918 12.356106 0.487918 0.236423 0.423177 -1.443879 -0.423177 2.46130 ... 0 0 0 0 0 0 B(c1ccc(cc1)CN2CCN(CC2)C3=NC(=O)/C(=C/c4ccc(c(... LouisdeBroglie-HenryMoseley 57 3
343 194.858937 12.362383 0.487928 12.362383 0.487928 0.242705 0.423177 -1.460291 -0.423177 2.46130 ... 0 0 0 0 0 0 B(c1cccc(c1)CN2CCN(CC2)C3=NC(=O)/C(=C/c4ccc(c(... FranzBoas-HermannvonHelmholtz 57 3
105 148.269255 12.380685 0.312156 12.380685 0.236417 0.064117 0.236417 -3.479281 -0.312156 3.31770 ... 0 0 0 0 0 0 CCCN1c2ccc(cc2CCC1=O)NS(=O)(=O)Cc3ccccc3 MaxPlanck-JackHorner 62 3
106 154.634197 12.465613 0.312156 12.465613 0.236417 0.064672 0.236417 -3.492131 -0.312156 3.62612 ... 0 0 0 0 0 0 CCCN1c2ccc(cc2CCC1=O)NS(=O)(=O)Cc3ccc(cc3)C IsaacNewton-HeinrichHertz 62 3
107 141.904313 12.363117 0.315211 12.363117 0.236417 0.066713 0.236417 -3.481908 -0.315211 2.84592 ... 0 0 0 0 0 0 Cc1ccc(cc1)CS(=O)(=O)Nc2ccc3c(c2)CCC(=O)N3C Lucretius-LouisPasteur 62 3
693 217.369820 13.986122 0.443692 13.986122 0.407311 0.053205 0.407311 -4.004868 -0.443692 3.13200 ... 0 0 0 0 0 0 CC(C)[C@H]1Cc2cc(ccc2S(=O)(=O)N(C1)C[C@H]([C@H... FrancisCrick-AlbertEinstein 114 3
694 223.734762 14.081573 0.443692 14.081573 0.407311 0.047279 0.407311 -4.031627 -0.443692 3.52210 ... 0 0 0 0 0 0 CC(C)(C)[C@@H]1Cc2cc(ccc2S(=O)(=O)N(C1)C[C@H](... WilliamHarvey-AlessandroVolta 114 3
695 223.734762 14.081573 0.443692 14.081573 0.407311 0.047279 0.407311 -4.031627 -0.443692 3.52210 ... 0 0 0 0 0 0 CC(C)(C)[C@H]1Cc2cc(ccc2S(=O)(=O)N(C1)C[C@H]([... MarieCurie-AlexanderVonHumboldt 114 3
1000 161.901036 7.541884 0.485185 7.541884 0.150640 0.007064 0.150640 -0.263058 -0.485185 1.24734 ... 0 0 0 0 0 0 c1cc(cc(c1)O[C@H]2CO[C@H]3[C@@H]2OC[C@H]3Oc4cc... JamesWatt-JamesWatson 148 3
1001 161.901036 7.544344 0.485160 7.544344 0.150640 0.009842 0.150640 -0.273739 -0.485160 1.24734 ... 0 0 0 0 0 0 c1cc(cc(c1)O[C@@H]2CO[C@H]3[C@@H]2OC[C@@H]3Oc4... FriedrichAugustKekule-MichaelFaraday 148 3
1002 161.901036 7.439745 0.485185 7.439745 0.150639 0.019691 0.150639 -0.234382 -0.485185 1.24734 ... 0 0 0 0 0 0 c1cc(ccc1C(=N)N)O[C@H]2CO[C@H]3[C@@H]2OC[C@H]3... GalileoGalilei-HenryMoseley 148 3
11 190.631761 12.020077 0.312609 12.020077 0.258254 0.175468 0.258254 -0.175468 -0.312609 3.65440 ... 0 0 0 0 0 0 c1cc(ccc1CC2CCN(CC2)CCc3cnn(c3)c4c5c(ccn4)C(=O... WernerHeisenberg-BenjaminFranklin 171 3
12 184.266819 12.009169 0.312609 12.009169 0.258254 0.177748 0.258254 -0.177748 -0.312609 3.57930 ... 0 0 0 0 0 0 c1cc(ccc1C2CCN(CC2)CCc3cnn(c3)c4c5c(ccn4)C(=O)... ArthurEddington-PeterDebye 171 3
13 194.570085 12.020277 0.312609 12.020277 0.258254 0.185669 0.258254 -0.185669 -0.312609 4.23270 ... 0 0 0 0 0 0 c1cnc(c2c1C(=O)NC=N2)n3cc(cn3)CCN4CCC(CC4)c5cc... JagadishChandraBose-AlbertEinstein 171 3
217 138.697590 9.485220 0.507966 9.485220 0.115120 0.191250 0.115120 0.191250 -0.507966 4.92030 ... 0 0 0 0 0 0 c1ccc(c(c1)N=C(c2ccc(cc2)O)c3ccc(cc3)O)Cl CharlesAugustindeCoulomb-Anaximander 180 3
218 134.759266 9.506331 0.507966 9.506331 0.115120 0.217313 0.115120 0.217313 -0.507966 4.57532 ... 0 0 0 0 0 0 Cc1ccccc1N=C(c2ccc(cc2)O)c3ccc(cc3)O HenryMoseley-AmedeoAvogadro 180 3
219 128.394324 9.462216 0.507966 9.462216 0.115120 0.216220 0.115120 0.216220 -0.507966 4.26690 ... 0 0 0 0 0 0 c1ccc(cc1)N=C(c2ccc(cc2)O)c3ccc(cc3)O Euclid-Avicenna 180 3
1103 146.057666 10.279459 0.385467 10.279459 0.138992 0.341144 0.138992 -0.613237 -0.385467 4.00098 ... 0 0 0 0 0 0 C[C@H](c1nc2cnc3c(c2n1C4CCC(CC4)CCC#N)cc[nH]3)O Avicenna-JamesWatt 199 3
1104 133.327782 10.193187 0.385467 10.193187 0.138992 0.157389 0.138992 -0.634183 -0.385467 3.22078 ... 0 0 0 0 0 0 C[C@H](c1nc2cnc3c(c2n1C4CCC(CC4)C#N)cc[nH]3)O ArthurEddington-LinusPauling 199 3
1105 139.692724 10.240533 0.385467 10.240533 0.138992 0.314854 0.138992 -0.622015 -0.385467 3.61088 ... 0 0 0 0 0 0 C[C@H](c1nc2cnc3c(c2n1C4CCC(CC4)CC#N)cc[nH]3)O MarianoArtigas-RichardFeynman 199 3
546 148.927912 12.326558 0.393567 12.326558 0.236881 0.205146 0.236881 -1.125546 -0.393567 -1.72870 ... 0 0 0 0 0 0 CCCC(C(=O)N[C@@H]1[C@@H]([C@H](O[C@H]1n2cnc3c2... JamesClerkMaxwell-GottfriedLeibniz 218 3
547 148.927912 12.378086 0.393567 12.378086 0.237148 0.080400 0.237148 -1.134309 -0.393567 -1.87280 ... 0 0 0 0 0 0 CC(C)[C@@H](C(=O)N[C@@H]1[C@@H]([C@H](O[C@H]1n... AlbertEinstein-GottfriedLeibniz 218 3
548 155.292854 12.536974 0.393567 12.536974 0.237158 0.029474 0.237158 -1.134521 -0.393567 -1.48270 ... 0 0 0 0 0 0 CC[C@H](C)[C@@H](C(=O)N[C@@H]1[C@@H]([C@H](O[C... Anaximander-HenryCavendish 218 3
188 158.025073 11.988379 0.496768 11.988379 0.407501 0.019053 0.407501 -1.083643 -0.496768 2.51200 ... 0 0 0 0 0 0 CC(C)NC(=O)O[C@@H]1CC[C@@](c2c1nnn2Cc3ccc(cc3)... Avicenna-JohnDalton 219 3
189 158.025073 11.988379 0.496768 11.988379 0.407501 0.019053 0.407501 -1.083643 -0.496768 2.51200 ... 0 0 0 0 0 0 CC(C)NC(=O)O[C@@H]1CC[C@](c2c1nnn2Cc3ccc(cc3)O... LouisdeBroglie-FrancisGalton 219 3
190 158.025073 11.988379 0.496768 11.988379 0.407501 0.019053 0.407501 -1.083643 -0.496768 2.51200 ... 0 0 0 0 0 0 CC(C)NC(=O)O[C@H]1CC[C@@](c2c1nnn2Cc3ccc(cc3)O... AlexanderFleming-CarlSagan 219 3
831 137.462123 13.727927 0.352008 13.727927 0.323307 0.144989 0.323307 -0.291978 -0.352008 2.80530 ... 0 0 0 0 0 0 C[C@@H](CC(=O)NCc1ccc2c(c1)NC(=O)N2)c3ccccc3F LouisPasteur-ThomasKuhn 233 3
832 136.772520 13.606261 0.348248 13.606261 0.323307 0.268656 0.323307 -0.368969 -0.348248 2.71500 ... 0 0 0 0 0 0 C/C(=C\c1ccccc1F)/C(=O)NCc2ccc3c(c2)NC(=O)N3 WernerHeisenberg-RobertHooke 233 3
833 143.827065 13.804605 0.349565 13.804605 0.323307 0.147304 0.323307 -0.293785 -0.349565 3.36630 ... 0 0 0 0 0 0 C[C@@H](CC(=O)N[C@H](C)c1ccc2c(c1)NC(=O)N2)c3c... AageBohr-EmilFischer 233 3
143 135.810779 12.106708 0.477639 12.106708 0.346775 0.073073 0.346775 -3.898899 -0.477639 1.66550 ... 0 0 0 1 0 0 c1cc(ccc1CCNS(=O)(=O)c2ccsc2C(=O)O)C(=O)O JaneGoodall-AlexanderVonHumboldt 258 3
144 142.175721 12.116470 0.477639 12.116470 0.346775 0.150715 0.346775 -3.863136 -0.477639 2.05560 ... 0 0 0 1 1 0 c1cc(ccc1CCCNS(=O)(=O)c2ccsc2C(=O)O)C(=O)O EvangelistaTorricelli-ClaudiusPtolemy 258 3
145 129.445837 12.105157 0.477639 12.105157 0.346775 0.072284 0.346775 -3.954005 -0.477639 1.62300 ... 0 0 0 1 0 0 c1cc(ccc1CNS(=O)(=O)c2ccsc2C(=O)O)C(=O)O AlessandroVolta-JohannesKepler 258 3
173 145.727128 11.780729 0.550172 11.780729 0.311615 0.003531 0.311615 -1.085590 -0.550172 0.87400 ... 0 0 0 0 3 0 c1c(cc(c(c1[N+](=O)[O-])O)I)CC(=O)NCCCCCC(=O)[O-] JohnvonNeumann-ReneDescartes 276 3
174 126.465290 11.681878 0.502092 11.681878 0.310480 0.017613 0.310480 -0.835222 -0.502092 1.60410 ... 0 0 0 0 3 0 c1cc(c(cc1CC(=O)NCCCCCC(=O)O)[N+](=O)[O-])O CarlSagan-WillardGibbs 276 3
175 126.465290 11.671878 0.550172 11.671878 0.310480 0.005082 0.310480 -1.085222 -0.550172 0.26940 ... 0 0 0 0 3 0 c1cc(c(cc1CC(=O)NCCCCCC(=O)[O-])[N+](=O)[O-])O FlorenceNightingale-ErnstHaeckel 276 3
210 169.244099 12.730152 0.496758 12.730152 0.259824 0.043983 0.259824 -0.381021 -0.496758 3.42917 ... 0 0 0 0 0 0 [H]/N=C(\Cc1cccc(c1)OC)/NC(=O)c2ccc(cc2OC3CCNC... FrancisGalton-FrancescoRedi 318 3

50 rows × 111 columns
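How the submitted classes will be compared to the reference classes is not stated here. One common way to compare two clusterings is the adjusted Rand index, which ignores the label numbers themselves and only looks at which samples are grouped together. The sketch below uses made-up labels for six compounds; it is an assumption about how such a comparison could be done, not a description of Cyclica's scoring.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels for six compounds (all values are made up).
reference = [0, 0, 1, 1, 2, 2]
predicted = [5, 5, 3, 3, 9, 9]   # same grouping as the reference, different label numbers
shuffled = [0, 1, 0, 1, 2, 2]    # only the last pair is grouped as in the reference

same_score = adjusted_rand_score(reference, predicted)
partial_score = adjusted_rand_score(reference, shuffled)
print(same_score)      # identical partitions score 1.0
print(partial_score)   # partial agreement scores lower
```

Because the score depends only on co-membership, there is no need to match your cluster numbers to anyone else's before comparing.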

More ideas
