Compound Cloud Collection Details

Original Suppliers

The Compound Cloud collection comprises commercially available compounds originally sourced from the suppliers listed below. Over 120,000 are available in DMSO solution (2 mM and 10 mM), and 86,000 as solid stock.

  • Chembridge
  • Enamine
  • CSC-Chemdiv
  • Asinex
  • Maybridge
  • Specs


The Compound Cloud collection was originally part of MSD’s screening collection and had been selected by Organon/Schering-Plough/MSD medicinal chemists.


Compounds were selected for drug-like properties, for example:


  • Lipinski’s Rule of 5 compliance
  • polar surface area <120 Å²
  • avoiding structural features known to cause problems in biochemical assays and/or in the late candidate drug optimisation phase
  • focus on attractive chemotypes and on compounds found in recent drug discovery projects
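
The first two criteria above can be sketched as a simple property filter. This is a minimal illustration, not the selection procedure the medicinal chemists actually used: the `Descriptors` container, the function names, and the example values are hypothetical, the descriptors are assumed to have been computed beforehand, and requiring zero Rule of 5 violations is an assumption (a commonly used variant tolerates one).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one compound (hypothetical container)."""
    mol_weight: float          # Da
    logp: float                # calculated octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    polar_surface_area: float  # square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of 5 limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def is_drug_like(d: Descriptors, max_violations: int = 0,
                 max_psa: float = 120.0) -> bool:
    """Apply the Rule of 5 and the polar surface area cut-off.

    max_violations=0 is an assumption made for this sketch.
    """
    return (lipinski_violations(d) <= max_violations
            and d.polar_surface_area < max_psa)


# Illustrative values only, not taken from the collection.
small_polar = Descriptors(310.4, 2.8, 2, 5, 75.0)
large_greasy = Descriptors(612.7, 6.1, 1, 8, 90.0)   # two violations
too_polar = Descriptors(420.5, 1.2, 4, 9, 145.0)     # PSA above cut-off

for name, d in [("small_polar", small_polar),
                ("large_greasy", large_greasy),
                ("too_polar", too_polar)]:
    print(name, lipinski_violations(d), is_drug_like(d))
```

The remaining two criteria (problematic structural features and attractive chemotypes) rely on expert judgement and substructure knowledge, so they are not captured by a property filter of this kind.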

Prior to purchase, the quality analysis data provided by the six preferred suppliers were validated. All compounds were then bought in solid form; the quality criteria were confirmed mass and purity >80%.
A proportion of compounds are currently available in DMSO solution but without solid stock. This is because the solid was either depleted or stored at Merck USA at the time of the Newhouse site closure in 2010. Nevertheless, the solid form may be available from one of the original suppliers.

Chemical Space

Compound Cloud was designed for diversity and to provide good medicinal chemistry starting points for drug discovery projects. Its diversity is demonstrated by the 56.4k distinct Murcko molecular framework scaffolds it contains.

The collection has been analysed for drug-likeness and lead-likeness, using a model based on the same molecular properties as in a Drug Discovery Today review (Lusher et al., DDT 16:13/14, 2011). The marketed drugs and the leads they were derived from were presented on a principal component plot of chemical space. Blue diamonds represent the leads, and red diamonds the drugs they became:

The following diagrams show a plot of the Compound Cloud collection (green) overlaid onto these lead-like (blue) and drug-like (red) chemical spaces. Although diverse, the Compound Cloud collection fits well within these attractive chemical spaces:

Feature Distribution Summary
Preparation & Storage

All Compound Cloud solutions have been prepared in our facility. A few µmol of each compound were solubilised in Matrix 2D-barcoded storage tubes with 100% HPLC-grade DMSO (Fisher UK Ref#10387791) under a controlled atmosphere (all equipment operates in dehumidified enclosures) to preserve the integrity and purity of the compounds.


All solutions were prepared after 2005, when our facility first became operational. Following dissolution in DMSO, aliquots of these solutions were transferred into REMP microtubes at various volumes and concentrations. Each of these microtubes was individually sealed and stored in our REMP store at -20°C under dry conditions (<0.4 g/kg water).


The REMP microtube technology avoids the freeze/thaw cycles associated with other systems and therefore improves the chance of preserving the structural integrity of the compounds. All cherry-picking occurs within the -20°C store, so tubes remain sealed and frozen until you receive them on dry ice.


LC-MS analysis was performed at the dissolution stage for all solutions. The criteria for integration into the collection were confirmed mass and purity >80% (pharma industry standard). Solutions that failed these criteria were rejected.


The overall purity of these samples was:


  • Average purity: ~95% (94.61%)
  • Median: 98.5%
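
As a minimal sketch of the acceptance rule and the summary statistics above, the following applies the two criteria (mass confirmed, purity >80%) to a handful of made-up LC-MS results. The function name, the data layout, and all values are hypothetical and chosen only to illustrate the calculation.

```python
from statistics import mean, median

PURITY_THRESHOLD = 80.0  # percent, from the acceptance criteria above


def passes_qc(mass_confirmed: bool, purity_pct: float) -> bool:
    """A solution is accepted only if both criteria are met."""
    return mass_confirmed and purity_pct > PURITY_THRESHOLD


# Hypothetical LC-MS results: (mass_confirmed, purity in percent).
results = [
    (True, 99.1),
    (True, 98.5),
    (True, 97.8),
    (True, 84.0),
    (False, 99.0),  # rejected: mass not confirmed
    (True, 72.5),   # rejected: purity too low
]

accepted = [purity for ok, purity in results if passes_qc(ok, purity)]

print("accepted:", len(accepted), "of", len(results))
print("average purity:", round(mean(accepted), 2))
print("median purity:", round(median(accepted), 2))
```

An average that sits below the median, as it does for the collection, indicates that most samples are of very high purity with a smaller tail of samples closer to the 80% limit.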

Modelling for Target Classes

The Compound Cloud collection was primarily designed for maximum diversity with good medicinal chemistry starting points. However, it also contains compounds that are more focused towards specific drug target classes.

We have performed some computational analysis (which we will share if of value) to identify groups of compounds with the potential to interact with particular targets. However, we have not pre-selected these sets, as we understand users will want to access the collection in their own way.

We have analysed the collection using published computational models for kinase-likeness, GPCR interaction potential, and allosteric interaction potential.

27% of the collection demonstrated some features associated with kinase-interacting compounds; 15–24% demonstrated similarity to GPCR ligands; and 6–10% were selected using an allosteric modulator model.




We used the Vertex “2-0” kinase-likeness model (A. M. Aronov et al., J. Med. Chem. 2008, 51, 1214–1222). The model is based on a combination of structural fragment counts. In validation experiments, the rule provided approximately 5-fold enrichment in kinase-active compounds.
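
To make the enrichment figure concrete, the sketch below computes an enrichment factor as the hit rate among rule-selected compounds divided by the hit rate in the whole screened set. This is the usual definition, but the validation protocol of the cited paper is not described here, so treat it as an assumption; the function name and all counts are hypothetical.

```python
def enrichment_factor(actives_selected: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """Hit rate in the selected subset divided by the overall hit rate."""
    hit_rate_selected = actives_selected / n_selected
    hit_rate_overall = actives_total / n_total
    return hit_rate_selected / hit_rate_overall


# Hypothetical screen: 1,000 compounds with 20 actives (2% hit rate).
# A kinase-likeness rule selects 100 compounds, 10 of which are active (10%).
ef = enrichment_factor(actives_selected=10, n_selected=100,
                       actives_total=20, n_total=1000)
print(f"enrichment: {ef:.1f}-fold")
```

In this made-up case, screening only the rule-selected compounds would find actives about five times more often per compound tested than screening the full set.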


27.5% of BioAscent compounds are classified as kinase-like according to this model. The model may be limited by the conservative nature of early kinase discovery, which focused more on features related to known hinge-binding motifs. Compounds with novel modes of binding may therefore exist among the remaining Compound Cloud compounds.


A representative subset of 35 GPCR ligands from a range of GPCR families was used to build a model. The Compound Cloud collection was then searched for similarity to these ligands using various structural fingerprint techniques.


15–24% of the collection was found to be similar to one or more of the GPCR reference drugs, depending on the Tanimoto similarity threshold applied (>0.67 to >0.64). The potential for novel GPCR actives exists in the remaining compounds.
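
The Tanimoto similarity between two fingerprints is the number of shared "on" bits divided by the number of bits set in either. The sketch below shows how a small change in threshold changes which compounds count as similar. Fingerprints are represented as sets of bit positions; the bit values, compound names, and helper functions are hypothetical and do not reproduce the fingerprint methods actually used.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Shared bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_to_any(fp: frozenset, references: list, threshold: float) -> bool:
    """True if fp exceeds the threshold against at least one reference."""
    return any(tanimoto(fp, ref) > threshold for ref in references)


# Hypothetical reference ligands and library compounds.
references = [
    frozenset({1, 2, 3, 4, 5, 6}),
    frozenset({10, 11, 12, 13}),
]
library = {
    "cmpd_A": frozenset({1, 2, 3, 4, 5, 7}),         # 5 shared / 7 total
    "cmpd_B": frozenset({1, 2, 3, 4, 8, 9}),         # 4 shared / 8 total
    "cmpd_C": frozenset({10, 11, 12, 13, 14, 15}),   # 4 shared / 6 total
}

for threshold in (0.67, 0.64):
    hits = [name for name, fp in library.items()
            if similar_to_any(fp, references, threshold)]
    print(threshold, hits)
```

Here `cmpd_C` (similarity 4/6, about 0.667) is counted as similar at the 0.64 threshold but not at 0.67, which is the same effect that widens the similar fraction of the collection as the threshold is relaxed.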

Allosteric Interaction Potential

The Van Westen definition and curation of allosteric vs non-allosteric compounds was taken from ChEMBL14 (Van Westen et al., PLOS Computational Biology, 2014, 10(4)). This created a balanced data set of 18k allosteric and 18k non-allosteric compounds. Model building involved setting and testing multiple parameters (89 Van Westen descriptors), followed by statistical testing of model performance (using various measures of accuracy, specificity, and robustness). The model was built using 70% of the available data, then tested on the remaining 30%.
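
A 70/30 split of a balanced data set can be sketched as follows. Splitting each class separately (a stratified split) keeps both subsets balanced; whether the original work split the data this way is not stated, so this is an assumption, and the function name and toy records are hypothetical.

```python
import random


def stratified_split(records, train_fraction=0.7, seed=0):
    """Split (item, label) pairs so each label keeps the same proportion.

    Stratifying by label is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(record[1], []).append(record)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


# Toy balanced data set: 10 allosteric and 10 non-allosteric records.
records = ([(i, "allosteric") for i in range(10)]
           + [(i, "non-allosteric") for i in range(10, 20)])

train, test = stratified_split(records)
print(len(train), "training records,", len(test), "test records")
```

Holding back the test portion until the model is final gives a performance estimate on compounds the model has not seen during building.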


The tested model was then applied to the Compound Cloud collection. The results are ranked according to a confidence measurement to select subsets with higher allosteric interaction probability:


  • Confidence threshold 0.75: 13.3k predicted allosteric compounds
  • Confidence threshold 0.80: 7.9k predicted allosteric compounds
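
Selecting subsets by confidence amounts to keeping the compounds whose predicted score meets a cut-off. A minimal sketch follows; the compound names and scores are hypothetical, and treating the cut-off as inclusive (score >= threshold) is an assumption.

```python
def select_by_confidence(scores: dict, threshold: float) -> list:
    """Return compound IDs scoring at or above the threshold, best first."""
    selected = [cid for cid, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda cid: scores[cid], reverse=True)


# Hypothetical model confidences for four compounds.
scores = {"cmpd_A": 0.91, "cmpd_B": 0.78, "cmpd_C": 0.62, "cmpd_D": 0.83}

for threshold in (0.75, 0.80):
    print(threshold, select_by_confidence(scores, threshold))
```

Raising the threshold always yields a subset of the lower-threshold selection, which is why the 0.80 set is smaller than the 0.75 set.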