GAINS package

gains.engine module

gains.engine.get_best(get_fitness, optimalFitness, geneSet, display, show_ion, target, parent_candidates)[source]

The primary public function of the engine.

Parameters:
  • get_fitness (function) – the fitness function, usually based on a molecular property. An example can be found in the salt_generator module.
  • optimalFitness (float) – a value between 0 and 1 specifying how close the engine should get to the target (1 = exact).
  • geneSet (object) – consists of atom types (by atomic number), RDKit molecular fragments, and custom fragments (currently hard-coded into the engine). These are the building blocks the engine uses to mutate the molecular candidate via the _mutate() function.
  • display (function) – prints results to the screen. display is called for every accepted mutation.
  • show_ion (function) – prints results to the screen. show_ion is called when a candidate has achieved the desired fitness score and is returned by the engine.
  • target (array, float, or int) – the desired property value to be achieved by the engine. If an array, a multi-output model must be supplied to the engine.
  • parent_candidates (array) – an array of SMILES strings from which the engine chooses a starting atomic configuration.
Returns:

child – the accepted molecular configuration. See the Chromosome class for details.

Return type:

Chromosome object
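
The roles of the two callbacks and of optimalFitness can be illustrated with a toy hill-climbing loop. This is a minimal sketch, not the gains implementation: candidates are plain strings rather than Chromosome objects, and get_best_sketch, mutate, fraction_matching, and the (genes, target) signature of the fitness function are hypothetical names and assumptions introduced for illustration.

```python
import random


def mutate(genes, gene_set, rng):
    """Replace one randomly chosen position with a random gene (toy mutation)."""
    index = rng.randrange(len(genes))
    return genes[:index] + rng.choice(gene_set) + genes[index + 1:]


def get_best_sketch(get_fitness, optimal_fitness, gene_set, display,
                    show_result, target, parent_candidates,
                    seed=0, max_steps=100_000):
    """Hill-climb from a parent until the fitness reaches optimal_fitness."""
    rng = random.Random(seed)
    best = rng.choice(parent_candidates)      # starting configuration
    best_fitness = get_fitness(best, target)
    for _ in range(max_steps):
        if best_fitness >= optimal_fitness:
            break
        child = mutate(best, gene_set, rng)
        child_fitness = get_fitness(child, target)
        if child_fitness > best_fitness:      # accept only improvements
            best, best_fitness = child, child_fitness
            display(best, best_fitness)       # called for every accepted mutation
    if best_fitness >= optimal_fitness:
        show_result(best, best_fitness)       # called once, on the returned candidate
    return best, best_fitness


def fraction_matching(genes, target):
    """Toy fitness in [0, 1]: share of positions that equal the target."""
    return sum(g == t for g, t in zip(genes, target)) / len(target)


accepted = []  # records every accepted mutation instead of printing it
best, score = get_best_sketch(
    get_fitness=fraction_matching,
    optimal_fitness=1.0,
    gene_set="CNOS",
    display=lambda genes, fitness: accepted.append((genes, fitness)),
    show_result=lambda genes, fitness: print("accepted:", genes, fitness),
    target="CCON",
    parent_candidates=["CCCC"],
)
```

Lowering optimal_fitness below 1.0 makes the loop stop as soon as a candidate is close enough, which mirrors the "1 = exact" meaning of optimalFitness described above.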

gains.engine.molecular_similarity(best, parent_candidates, all=False)[source]

Returns the similarity score (0-1) between best and its closest molecular relative in parent_candidates.

Parameters:
  • best (object) – Chromosome object, the current mutated candidate.
  • parent_candidates (array) – parent pool of molecules to compare with best, represented as SMILES strings.
  • all (boolean, optional, default = False) – if False, only the Tanimoto similarity score is returned. If True, the Tanimoto, Dice, cosine, Sokal, Kulczynski, and McConnaughey similarities are returned.
Returns:

  • similarity_score (float) – if all=False, the best Tanimoto similarity score; if all=True, an array of best scores.
  • similarity_index (int) – if all=False, the index of the closest molecular relative; if all=True, an array of indices of the closest molecular relatives.
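
The Tanimoto score and the "closest relative" lookup can be sketched in plain Python. The snippet below is an illustration under stated assumptions, not the library's code: molecular fingerprints are represented as plain sets of "on" bit positions rather than RDKit fingerprints computed from SMILES, and tanimoto and closest_relative are hypothetical helper names. Returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(a, b):
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_relative(candidate_bits, parent_bits):
    """Return (best score, index of the most similar parent)."""
    scores = [tanimoto(candidate_bits, bits) for bits in parent_bits]
    index = max(range(len(scores)), key=scores.__getitem__)
    return scores[index], index


candidate = {1, 2, 3, 4}
parents = [{1, 2}, {1, 2, 3, 5}, {7}]
similarity_score, similarity_index = closest_relative(candidate, parents)
```

Here the second parent shares three of five distinct bits with the candidate, so it is the closest relative.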

class gains.engine.suppress_rdkit_sanity[source]

Bases: object

Context manager for doing a “deep suppression” of stdout and stderr during certain calls to RDKit.
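
One common way to implement this kind of suppression is to redirect the process-level file descriptors 1 and 2 to os.devnull, because compiled libraries write to those descriptors directly and bypass Python's sys.stdout and sys.stderr. The class below is a minimal sketch of that pattern, assuming a POSIX-like system; SuppressOutput is a hypothetical name and the code is not taken from the gains source.

```python
import os
import sys


class SuppressOutput:
    """Temporarily send file descriptors 1 (stdout) and 2 (stderr) to os.devnull."""

    def __enter__(self):
        # Flush Python-level buffers so earlier output is not lost or reordered.
        sys.stdout.flush()
        sys.stderr.flush()
        self._null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        self._saved_fds = [os.dup(1), os.dup(2)]   # keep copies of the real targets
        os.dup2(self._null_fds[0], 1)
        os.dup2(self._null_fds[1], 2)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the original descriptors, then close every helper descriptor.
        os.dup2(self._saved_fds[0], 1)
        os.dup2(self._saved_fds[1], 2)
        for fd in self._null_fds + self._saved_fds:
            os.close(fd)
        return False   # do not swallow exceptions


with SuppressOutput():
    # A low-level write, standing in for a message emitted by compiled code.
    os.write(2, b"this message is discarded\n")
```

Only output is silenced; exceptions raised inside the with block still propagate.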

gains.engine.generate_geneset()[source]

Populates the GeneSet class with atoms and fragments to be used by the engine. These are currently hard-coded into the engine but will probably be adapted in future versions.

Parameters: None

Returns:

GeneSet – an instance of the GeneSet class containing atoms, rdkit fragments, and custom fragments

Return type:

object

gains.engine.load_data(data_file_name, pickleFile=False, simpleList=False)[source]

Loads data from module_path/data/data_file_name.

Parameters:
  • data_file_name (string) – name of the csv file to be loaded from module_path/data/data_file_name.
  • pickleFile (boolean, optional, default = False) – if True, opens a pickled file.
  • simpleList (boolean, optional, default = False) – if True, opens a saved list and properly handles split lines.
Returns:

data

Return type:

Pandas DataFrame

class gains.engine.Chromosome(genes, fitness)[source]

Bases: rdkit.Chem.rdchem.Mol

The main object handled by the engine. The Chromosome object inherits the RWMol and Mol attributes from RDKit and adds two attributes: genes and fitness. genes is the SMILES encoding of the molecule; fitness is the score (0-1) returned by the fitness function.
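
To make the 0-1 fitness score concrete, the snippet below maps a predicted property value to a score based on its relative error from the target. This is a hypothetical scoring rule chosen for illustration; the formula actually used by the gains fitness functions is not stated here, and relative_error_fitness is an invented name.

```python
def relative_error_fitness(prediction, target):
    """Map a predicted property value to a score in [0, 1] (1 = exact match)."""
    if target == 0:
        raise ValueError("relative error is undefined for a target of 0")
    error = abs(prediction - target) / abs(target)
    return max(0.0, 1.0 - error)   # clip so very poor predictions score 0


score = relative_error_fitness(prediction=95.0, target=100.0)
meets_goal = score >= 0.9   # compare against an optimalFitness of 0.9
```

Under this rule, a prediction within 10% of the target satisfies an optimalFitness of 0.9.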

class gains.engine.GeneSet(atoms, rdkitFrags, customFrags)[source]

Bases: object

Consists of atom types (by atomic number), RDKit molecular fragments, and custom fragments (currently hard-coded into the engine). These are the building blocks the engine uses to mutate the molecular candidate via the _mutate() function.

class gains.engine.Benchmark[source]

Bases: object

Benchmark method used by the unit tests.

static run()[source]

gains.salt_generator module

gains.salt_generator.generate_solvent(target, model_ID, heavy_atom_limit=50, sim_bounds=[0.4, 1.0], hits=1, write_file=False)[source]

The primary public function of the salt_generator module.

Parameters:
  • target (array, float, or int) – the desired property value to be achieved by the engine. If an array, a multi-output model must be supplied to the engine.
  • model_ID (str) – the name of the model to be used by the engine. GAINS has several built-in models to choose from.
  • heavy_atom_limit (int, optional) – the maximum number of heavy atoms allowed in the returned candidate.
  • sim_bounds (array, optional) – the lower and upper bounds on the Tanimoto similarity score between the returned candidate and its closest molecular relative in parent_candidates.
  • hits (int, optional) – the number of desired solutions.
  • write_file (boolean, optional) – defaults to False. If True, returns the solutions and a csv log file.
Returns:

new – by default, a pandas DataFrame that serves as a log of the solution(s). If write_file=True, the function will also return pdb files of the solutions.

Return type:

object
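
The way heavy_atom_limit and sim_bounds constrain a returned candidate can be sketched as a simple acceptance check. This is an illustration of the parameter descriptions above, not the module's code: is_acceptable is a hypothetical helper, the candidate records are made-up values, and treating both similarity bounds as inclusive is an assumption.

```python
def is_acceptable(similarity, heavy_atoms, sim_bounds=(0.4, 1.0), heavy_atom_limit=50):
    """Check a candidate against the similarity window and the heavy-atom cap."""
    lower, upper = sim_bounds
    return lower <= similarity <= upper and heavy_atoms <= heavy_atom_limit


# Made-up candidates: (label, similarity to closest parent, heavy-atom count).
candidates = [
    ("too dissimilar", 0.35, 20),
    ("acceptable", 0.72, 31),
    ("too large", 0.80, 64),
]
kept = [label for label, similarity, heavy_atoms in candidates
        if is_acceptable(similarity, heavy_atoms)]
```

Narrowing sim_bounds, for example to (0.4, 0.8), would additionally exclude candidates that are nearly identical to an existing parent.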