At the core of my project is an attempt to predict the activity of compounds from their chemical structure. The compounds I am interested in are HIV inhibitors, and I am using a genetic algorithm (GA) for the task. In the past, the activity of these compounds has been predicted using multiple linear regression models; my hope is that a GA approach will provide more accurate predictions.
The compounds in question are 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT) and its derivatives, discovered by Tanaka and co-workers.
The electronic and molecular parameters used to describe each molecule are atomic charges on 8 atoms (carbon, hydrogen, oxygen and nitrogen), hydration enthalpy and molar refractivity. For each compound the corresponding experimental activity is known.
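As an illustration, the descriptors for one compound can be held in a small record like the one sketched below in Python. The class name `Compound`, its field names and the ordering of the descriptor vector are hypothetical choices made for this sketch; the only things taken from the description above are the counts (8 atomic charges, one hydration enthalpy, one molar refractivity, one experimental activity).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Compound:
    """One HEPT derivative described by 10 parameters plus its known activity.

    All names here are illustrative assumptions, not part of the original program.
    """
    name: str
    atomic_charges: Tuple[float, ...]  # charges on the 8 selected atoms
    hydration_enthalpy: float
    molar_refractivity: float
    activity: float  # experimental activity

    def __post_init__(self) -> None:
        # Guard against a record with the wrong number of atomic charges.
        if len(self.atomic_charges) != 8:
            raise ValueError("expected charges on exactly 8 atoms")

    def descriptors(self) -> List[float]:
        """Return the 10 descriptor values as one flat list (assumed ordering)."""
        return [*self.atomic_charges,
                self.hydration_enthalpy,
                self.molar_refractivity]
```

A flat list of 10 numbers per compound is convenient because any model, whether a regression or a GA, can then treat every compound in the same way.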
Compounds with similar values for atomic charges, hydration enthalpy and molar refractivity should exhibit similar activity. It is on this basis that the GA works, evolving an interpretive model from the data set. This model can then be used to predict the activity of a compound from the values of its electronic and molecular parameters. The preliminary results below show the predicted activity (from the model) plotted against the experimental activity.

These results are from a model left to evolve for 1000 generations (which takes a few minutes on a 450 MHz Pentium II). The GA has a number of parameters that influence the quality of the model produced. These parameters are still being adjusted in order to optimise the program.
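To make the idea of evolving a model over generations more concrete, here is a minimal Python sketch of a GA of this general kind. It is not the actual program. The model form (a bias plus a weighted sum of the descriptors), the fitness measure (mean squared error), the operators (tournament selection, uniform crossover, Gaussian mutation, elitism) and all parameter names and default values other than the 1000 generations are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

# One training row: (descriptor values, experimental activity).
Row = Tuple[Sequence[float], float]


def predict(weights: Sequence[float], descriptors: Sequence[float]) -> float:
    """Assumed model form: bias term plus weighted sum of the descriptors."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], descriptors))


def mean_squared_error(weights: Sequence[float], data: Sequence[Row]) -> float:
    """Fitness measure (lower is better): mean squared prediction error."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def evolve(data: Sequence[Row],
           generations: int = 1000,
           population_size: int = 50,   # hypothetical default
           mutation_rate: float = 0.2,  # hypothetical default
           mutation_scale: float = 0.1, # hypothetical default
           elite: int = 2,              # hypothetical default
           seed: int = 0) -> List[float]:
    """Evolve a weight vector that fits the data; returns the best one found."""
    rng = random.Random(seed)
    n_weights = len(data[0][0]) + 1  # one weight per descriptor, plus a bias
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(population_size)]

    for _ in range(generations):
        # Score every individual once per generation, best first.
        scored = sorted(((mean_squared_error(w, data), w) for w in population),
                        key=lambda pair: pair[0])
        # Elitism: the best individuals survive unchanged.
        next_gen = [list(w) for _, w in scored[:elite]]
        while len(next_gen) < population_size:
            # Tournament selection: the better of two random individuals.
            p1 = min(rng.sample(scored, 2), key=lambda pair: pair[0])[1]
            p2 = min(rng.sample(scored, 2), key=lambda pair: pair[0])[1]
            # Uniform crossover: each weight comes from either parent.
            child = [rng.choice(genes) for genes in zip(p1, p2)]
            # Gaussian mutation applied independently to each weight.
            child = [w + rng.gauss(0.0, mutation_scale)
                     if rng.random() < mutation_rate else w
                     for w in child]
            next_gen.append(child)
        population = next_gen

    return min(population, key=lambda w: mean_squared_error(w, data))
```

Calling `evolve(data)` on a list of `(descriptors, activity)` rows returns a weight vector, and `predict(weights, descriptors)` then gives the predicted activity for a compound. The keyword arguments stand in for the kind of parameters mentioned above: changing the population size, mutation rate or number of elite survivors changes how quickly and how well the population converges.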