Mapping Protein Sequence Space: a high-performance workflow to compute the world's first protein sequence and structure map

description:The overall aim of this project is to systematically chart the vast space of all possible protein sequences of selected lengths, together with several of their structural properties. The resulting maps are intended as roadmaps for protein design and engineering efforts and for research into protein sequences and their function. Creating maps of sequence space is a technically straightforward, though data-intensive, three-step process: (1) generating appropriate pseudo-random sequences that systematically transect sequence space using the perfect sampling method; (2) predicting the structural properties associated with the sequences using various algorithms; (3) analyzing these data for relevant trends. The process will be implemented as a VLAM workflow that dynamically submits appropriate numbers of parallel jobs to the grid. It will be run for sequences of 12, 40, 80, 160, and 320 amino acid residues. For a specific use case in phage display experiments, all 1.28 billion (20^7) possible peptides of 7 amino acid positions will be enumerated and stored in the Sequenome database.
applicant:Marco Roos, LUMC Klinische genetica
results:
status:ongoing
team:Marco Roos, eBioGrid support team
type:This is a dedicated project.
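
The figures in the description can be checked with a small sketch. It is illustrative only and is not the project's VLAM workflow: it assumes the 20 standard amino acids in one-letter code, and it uses plain uniform pseudo-random sampling as a stand-in for the perfect sampling method named above. The function names (`sequence_space_size`, `enumerate_peptides`, `peptide_at`, `sample_sequences`) are hypothetical.

```python
import random
from itertools import islice, product

# Assumption: the 20 standard amino acids, one-letter code.
# The project description does not state the alphabet explicitly.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of the given length."""
    return len(AMINO_ACIDS) ** length


def enumerate_peptides(length: int):
    """Lazily yield every peptide of the given length in lexicographic order."""
    for residues in product(AMINO_ACIDS, repeat=length):
        yield "".join(residues)


def peptide_at(index: int, length: int) -> str:
    """Return the peptide at a given position in the enumeration order.

    The index is decoded as a base-20 number, so independent jobs could
    each process their own index range without generating earlier peptides.
    """
    if not 0 <= index < sequence_space_size(length):
        raise ValueError("index out of range")
    residues = []
    for _ in range(length):
        index, digit = divmod(index, len(AMINO_ACIDS))
        residues.append(AMINO_ACIDS[digit])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(residues))


def sample_sequences(length: int, count: int, seed: int) -> list[str]:
    """Draw `count` uniform pseudo-random sequences (a stand-in sampler,
    not the perfect sampling method mentioned in the description)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return ["".join(rng.choices(AMINO_ACIDS, k=length)) for _ in range(count)]


if __name__ == "__main__":
    print(sequence_space_size(7))                  # 1280000000
    print(list(islice(enumerate_peptides(7), 3)))  # first three 7-mers
    print(peptide_at(sequence_space_size(7) - 1, 7))
    print(sample_sequences(12, 2, seed=0))
```

Full enumeration is feasible for 7 positions, whereas the longer lengths (12 to 320 residues) span spaces far too large to enumerate, which is why the description relies on sampling for them. Indexing peptides by integer is one possible way to split the enumeration across parallel jobs; the source does not say how the workflow partitions the work.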
