Difference between revisions of "CH391L/S14/CAD systems"

Latest revision as of 19:41, 3 February 2014


Introduction

Computer-Aided Design (CAD) tools are software packages created to help design and engineer new systems. In traditional engineering fields, these programs have long been used to aid in optimizing production processes, modeling chemical reactions, and creating new products. Graphical User Interfaces (GUIs) act as the human-readable visualization of computer languages which are designed to assemble components into useful products or devices. Many of these programs include capabilities for simulating the outcome of a given assembled device as well as automating the assembly with a specific goal in mind. CAD tools offer high-throughput design and analysis of synthetic biological devices, making synthetic biology more accessible, cost-effective and powerful.
[Figure: CAD user process]

In general, the use of a CAD program in synthetic biology will involve the following steps:

Step 1: The user draws a biological system.

Step 2: The user performs some analysis and repeats step 1 if the results are not as expected.


Analysis in step 2 can include: mathematical analysis of non-linear systems; kinetic and chemical analysis; stochastic simulations, structural analysis, and methods from systems biology; prediction of evolutionary trajectories for directed evolution; and database look-up to find suitable components.
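As one illustration of the stochastic simulations mentioned above, the following is a minimal sketch of a Gillespie-style simulation of a single protein that is produced at a constant rate and degraded in proportion to its copy number. The function name, the rate parameters `k_prod` and `k_deg`, and the one-species model are assumptions chosen for illustration; they are not taken from any of the CAD tools discussed on this page.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, x0=0, seed=None):
    """Simulate constant production and first-order degradation of one species.

    Returns two lists: event times and the copy number after each event.
    All parameter names are hypothetical and chosen for this sketch only.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * x        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # nothing can happen any more
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose which reaction fires, weighted by its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=5.0, k_deg=0.1, t_end=50.0, seed=1)
    print(f"{len(times) - 1} events, final copy number {counts[-1]}")
```

Repeating the run with different seeds gives a feel for the cell-to-cell variability that a deterministic model of the same system would not show.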

Synthetic Biology CAD Tools

Vector Editor representation of an annotated plasmid sequence
Screen grab from TinkerCell software. A genetic NOR gate is pictured, with the accompanying basic model summary, plot and parameter input forms

Synthetic Biology CAD tools are programs which help to create novel biological constructs. At the most basic, these programs are essentially enhanced DNA editors which provide a user interface to facilitate easier manipulation of the basic “parts” which comprise biological devices. Some of the more advanced programs have a variety of functions including visualization, asserting validity of constructs, and simulations of metabolic networks. In general, CAD programs for synthetic biology should comply with SBOL (Synthetic Biology Open Language) to facilitate use with the Parts Registry and sharing of parts with other researchers.


Basic Design and Alignment Tools

In the majority of CAD programs for biology, the basic program is a GUI for editing and annotating DNA sequences. The interface often provides a way to edit the sequence of parts and devices, in addition to annotating various regions of the DNA. Most programs have, at the very least, a sequence/part editor which will output the information according to a standard for exchanging biological parts, e.g. SBOL. Many also contain visualization features which show the parts assembled into a vector or plasmid in a compact way, as in VectorEditor[1] or ApE. Others also include improved design features such as codon optimization (e.g. Gene Designer 2.0[2]). More advanced transcription/translation optimizer software is also available commercially (GeneOptimizer), and includes considerations such as mRNA secondary structure and GC content in choosing the most productive device design.
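GC content is one of the simpler quantities such optimizers take into account. The sketch below shows how it could be computed for a whole sequence and scanned in sliding windows; the function names, the window size and the 30-70% thresholds are assumptions made for illustration and do not describe how GeneOptimizer or any other named tool works.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(base in "GC" for base in seq) / len(seq)


def windows_outside_gc_range(seq, window=20, low=0.3, high=0.7):
    """Return (start, gc) for every window whose GC fraction is outside [low, high].

    The window size and thresholds are illustrative defaults, not a standard.
    """
    flagged = []
    for start in range(0, len(seq) - window + 1):
        gc = gc_content(seq[start:start + window])
        if gc < low or gc > high:
            flagged.append((start, gc))
    return flagged


if __name__ == "__main__":
    example = "ATGC" * 5 + "A" * 20
    print(gc_content(example))
    print(windows_outside_gc_range(example)[:3])
```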

Many of the basic DNA editors also allow for the design of primers for traditional cloning. In light of more recent advances in large-scale cloning techniques, some newer programs such as Gibthon provide automated design of primers for Gibson cloning and other new cloning strategies.
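To make the idea of automated primer design for Gibson cloning concrete, here is a simplified sketch for a single junction between two fragments. Each primer is built from an annealing region on its own template plus a 5' tail copied from the neighbouring fragment, so that the two PCR products share a homologous overlap. The function names and the default lengths are assumptions for illustration; a real tool would also check melting temperature, secondary structure and primer length limits, and this is not a description of Gibthon's algorithm.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gibson_junction_primers(upstream, downstream, overlap=20, anneal=20):
    """Design the two primers that span the junction upstream|downstream.

    forward: amplifies `downstream`, with a 5' tail from the end of `upstream`.
    reverse: amplifies `upstream`, with a 5' tail from the start of `downstream`.
    Lengths are hypothetical defaults; no Tm or structure checks are done.
    """
    needed = max(overlap, anneal)
    if len(upstream) < needed or len(downstream) < needed:
        raise ValueError("fragments are shorter than the requested primer regions")
    upstream, downstream = upstream.upper(), downstream.upper()
    forward = upstream[-overlap:] + downstream[:anneal]
    reverse = reverse_complement(upstream[-anneal:] + downstream[:overlap])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = gibson_junction_primers("TTTTACGA", "GATCAAAA", overlap=3, anneal=5)
    print(fwd, rev)
```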

BLAST (Basic Local Alignment Search Tool) is a web-based utility which aligns genetic sequences to a reference sequence. This tool is a basic requirement for almost all synthetic biology research, as it is used to verify that the sequencing results of a given part or device match the expected composition for the design. Moreover, BLAST is the tool of choice to detect similar/homologous or identical sequences (including non-contiguous sequences) within a user-defined set of genome sequences from publicly available nucleotide and protein data banks.

Other more advanced alignment programs (such as Chromas or Geneious) will align multiple sequences directly from the trace files which show signal intensity output from sequencing software. The program Geneious is particularly useful in generating a complete and organized view of the genome of choice. From annotating genome sequences to keeping track of everything related to a particular genetic engineering project (primer design, genetic modifications, sequence analysis, etc.), these sorts of multi-purpose stand-alone software tools are becoming very popular in the synthetic biology community.


Assembly Tools

A complex part (genetic toggle switch) comprised of simple parts (promoters, repressors, reporter) which can be assembled and validated in Eugene

Several of the more advanced CAD programs provide features which aid in the assembly of simple biological parts into more complex features and devices. In some cases, the framework provides a way to compile various simple parts into more complex features with error checking to validate the composition of a component. For example, the complex device at right (genetic toggle switch[3]), which is composed of several simple parts (e.g. promoters), can be error-checked using the Eugene Language[4], which strictly defines synthetic biology devices, part types, parts and properties, to validate a functional composition. More advanced algorithms automate the assembly of components by checking the entire set of permutations containing a given group of parts for valid constructs, returning only those designs which are likely to be functional for the desired task. There are also downloadable tools such as Genome Compiler or Gene Composer and web-based tools such as DNAWorks [5] or GeneDesign [6] which are designed to facilitate the assembly of much larger devices from simple and complex parts.
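The permutation-checking approach can be sketched in a few lines. The example below enumerates every ordering of a small part collection and keeps only those whose part types follow a required order. The part names, the part types and the single ordering rule are hypothetical and deliberately minimal; this is written in Python for illustration and is not Eugene syntax or the algorithm of any named tool.

```python
from itertools import permutations

# Hypothetical part collection: part name -> part type.
PART_TYPES = {
    "pA": "promoter",
    "pB": "promoter",
    "rbs1": "rbs",
    "gfp": "cds",
    "term1": "terminator",
}

# A single illustrative rule: the types must appear in exactly this order.
REQUIRED_ORDER = ("promoter", "rbs", "cds", "terminator")


def is_valid(construct):
    """Return True if the construct's part types match REQUIRED_ORDER."""
    return tuple(PART_TYPES[part] for part in construct) == REQUIRED_ORDER


def valid_constructs(parts, length=len(REQUIRED_ORDER)):
    """Check every permutation of `length` parts and keep the valid ones."""
    return [c for c in permutations(parts, length) if is_valid(c)]


if __name__ == "__main__":
    for construct in valid_constructs(list(PART_TYPES)):
        print(" - ".join(construct))
```

Brute-force enumeration grows factorially with the number of parts, which is one reason practical tools express constraints explicitly rather than testing every ordering.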

Database Tools

Several software programs are designed for maintaining records of BioBricks or other synthetic constructs. These programs are primarily focused on providing accessibility to collections of parts which are available. One example is the Joint BioEnergy Institute's JBEI GD-ICE program, which is a web-based tool for creating and maintaining an "Inventory of Composable Elements" for a lab group. The tool is primarily designed for creating private databases within a smaller group of researchers, but JBEI also maintains a public database of parts. Clotho also has built-in capability for maintaining a local database of biological parts within a lab group or institution.

Addgene is a non-profit plasmid repository that features a free online cloning vector analysis tool. Its mission is to maintain a high-quality library of published plasmids for use in research and discovery, and for preservation and distribution [7]. Its platform conveniently links plasmids with their corresponding research articles. The BioBricks Foundation is presently partnering with Addgene to distribute plasmids that have been contributed under the BioBrick™ Public Agreement.

Pathway prediction/construction tools

FMM can reconstruct metabolic pathways from one metabolite to another, and thus provides essential support for synthetic biologists. This user-friendly, freely available web service works by combining KEGG (metabolic pathway database) maps and KEGG LIGAND information to form combined pathway maps, identifying the corresponding genes and organisms, and giving an output in which different pathways can be compared. Although it is limited to characterized pathways in the KEGG framework, it can provide a convenient starting point for many investigations. A more advanced method, BNICE [8], predicts novel pathways on the basis of somewhat broader reaction rules of the Enzyme Commission classification system. Because BNICE is not restricted to entries from a specific database, it can also predict unknown pathways that are potentially chemically feasible. Another prediction system based on enzymatic reactions, DESHARKY [9], uses the choice of host organism as the starting point for pathway prediction. Its algorithm searches for all possible pathways that connect the metabolic network of the organism to a target compound, after which the thermodynamic favourability and the energy loss in transcription and translation are calculated. A comprehensive review of these and various other pathway prediction tools has been published recently [10].
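The core idea of reconstructing a pathway from one metabolite to another can be illustrated as a search over a reaction graph. The sketch below runs a breadth-first search over a toy network; the metabolite names, enzyme names and the network itself are hypothetical, and the code does not reproduce the actual algorithms of FMM, BNICE or DESHARKY, which draw on curated databases, reaction rules and thermodynamic scoring.

```python
from collections import deque

# Toy reaction network: substrate -> list of (enzyme, product). All names hypothetical.
REACTIONS = {
    "A": [("e1", "B"), ("e2", "C")],
    "B": [("e3", "D")],
    "C": [("e4", "D")],
    "D": [("e5", "E")],
}


def find_pathway(reactions, source, target):
    """Return the shortest list of (substrate, enzyme, product) steps, or None."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        metabolite, steps = queue.popleft()
        if metabolite == target:
            return steps
        for enzyme, product in reactions.get(metabolite, []):
            if product not in visited:
                visited.add(product)
                queue.append((product, steps + [(metabolite, enzyme, product)]))
    return None  # no route connects source to target


if __name__ == "__main__":
    for substrate, enzyme, product in find_pathway(REACTIONS, "A", "E"):
        print(f"{substrate} --{enzyme}--> {product}")
```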

Full Featured Tools

There are several stand-alone CAD applications which combine a number of these tools into a single package. From importing large sets of parts in spreadsheet format (e.g. Clotho) to simulating the metabolite levels from a network containing synthetic devices (e.g. TinkerCell[11]), these integrated packages aim to provide the entire toolbox of CAD capabilities to synthetic biologists. In addition to these full-featured packages, some programs are designed solely for the purpose of modeling metabolic networks (e.g. SynBioSS[12]).

j5 is a web-based tool that has multiple design features. It features automated assembly of scar-free devices from multiple biological parts. j5 can perform a variety of assembly protocols, including Gibson, Golden Gate, and circular polymerase extension cloning (CPEC). j5 also showcases engineering-related features such as cost optimization, enforcing design specification rules, and automated construction of combinatorial libraries.[13]

SnapGene Viewer is software that allows users to create, browse, edit and share richly annotated DNA sequence files up to 1 Gb in length. Sequence data may be entered directly, imported from a GenBank record, or opened from an annotated sequence stored in one of many common file formats. It has built-in automatic annotation of common features, such as identification of open reading frames (ORFs) with a single mouse click.

GenoCAD provides a framework that can automatically manage the constraints associated with the different standards, which helps the community better leverage ongoing standardization efforts. It uses a context-free grammar (CFG) [14] to model the structure of genetic constructs, making it possible for users to quickly assemble, from a rich library of genetic parts, constructs compliant with any of six BioBrick assembly standards [15]. Because GenoCAD represents the design strategy for synthetic genetic constructs as a grammatical model, it can be used in two different ways: a user can design a synthetic construct by successively selecting design rules to transform the structure of the design; or a user can upload a DNA sequence designed outside GenoCAD to validate its consistency with the grammatical model.
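The validation use case can be illustrated with a very small context-free grammar. In the sketch below, a construct is one or more expression cassettes, and each cassette is a promoter, RBS, coding sequence and terminator in that order; a recursive recognizer then checks whether a list of part types can be derived from the grammar. The grammar, the symbol names and the `matches` function are assumptions made for illustration and are far simpler than the grammars GenoCAD actually uses.

```python
# Toy grammar: non-terminal -> list of alternative productions. All names hypothetical.
GRAMMAR = {
    "construct": [["cassette"], ["cassette", "construct"]],
    "cassette": [["promoter", "rbs", "cds", "terminator"]],
}


def matches(symbols, tokens):
    """Return True if the symbol sequence can derive exactly the token sequence.

    Works for grammars without left recursion, such as the one above.
    """
    if not symbols:
        return not tokens  # success only if every token was consumed
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each production in turn.
        return any(matches(production + rest, tokens) for production in GRAMMAR[head])
    # Terminal: it must equal the next token.
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])


if __name__ == "__main__":
    design = ["promoter", "rbs", "cds", "terminator"] * 2
    print(matches(["construct"], design))
    print(matches(["construct"], ["promoter", "cds", "rbs", "terminator"]))
```

The design use case runs the same grammar in the other direction: starting from `construct`, the user picks a production at each step until only part types remain.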

Standardizing Representation of Synthetic Biology Parts

TinkerCell representation of parts in a lactose-inducible GFP part

If the images representing biological parts are not formalized and every CAD software developer creates their own symbols and representations, the result is confusion and a steeper CAD learning curve for the synthetic biology community. Standard representation of biological parts is therefore critical for the advancement of synthetic biology. The Synthetic Biology Open Language (SBOL) is an open-source standard for representing designs consisting of both DNA sequence information and higher-level annotation of parts with defined roles and behaviors [16]. The core specification of this system has been developed as an RFC [17]. Several different synthetic biology CAD software programs use this format. This higher-level representation of parts can be visualized and simulated in some of these systems (e.g., TinkerCell).

The Eugene Language[4] is an open-source human-readable language designed to facilitate automatic creation of new devices from a collection of parts. Eugene includes a standardized format for specifying devices and parts as well as constraints on how they can be assembled into higher-level devices (e.g. a genetic toggle switch). Eugene also features functions for automatic generation of functional assemblies into complex devices. Eugene does not support visualization of constructs.

iGEM Software Tools Development

The iGEM competition for development of software tools is designed to promote creation of publicly available CAD programs for synthetic biology. As with the Registry of Standard Biological Parts, the software tools entered into the competition must adhere to certain standards of interoperability and data format in order to facilitate reuse and ease of collaboration among researchers. There are several categories developers can pursue, including specific modular CAD frameworks (e.g. Clotho) as well as sharing data and interfacing with the Parts Registry. iGEM hosts a freely available repository of these open source software packages from past competitions.

One exciting tool is the MoClo Planner, a multi-touch interface for supporting the design of complex and useful biological constructs. It draws information from the MIT Registry of Biological Parts, PubMed, and the iGEM archive. Its design implements Golden Gate Modular Cloning (MoClo) [18], a novel laboratory method that allows the efficient creation of multi-gene constructs from a library of biological parts. Using this method, biological parts are permuted and joined together in a tiered fashion to create new synthetic biology constructs. The MoClo method includes: browsing a library of over 2,200 biological parts; selecting biological parts based on their function, genetic sequence, and other biological characteristics; computing possible permutations of parts in predefined arrangements; and designing primers and fusion recognition sites.
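The step of computing possible permutations of parts in predefined arrangements can be sketched as a Cartesian product over the slots of an arrangement, filtered so that the fusion site at the right end of each part matches the fusion site at the left end of the next. The part names and the 4-bp fusion site sequences below are illustrative assumptions, not values taken from the MoClo Planner or from the Registry.

```python
from itertools import product

# Hypothetical library: slot -> list of (part name, left fusion site, right fusion site).
LIBRARY = {
    "promoter": [("pA", "GGAG", "TACT"), ("pB", "GGAG", "TACT")],
    "rbs": [("rbs1", "TACT", "AATG")],
    "cds": [("gfp", "AATG", "GCTT"), ("rfp", "AATG", "GCTT")],
    "terminator": [("term1", "GCTT", "CGCT")],
}

ARRANGEMENT = ("promoter", "rbs", "cds", "terminator")


def sites_chain(parts):
    """Return True if each part's right site equals the next part's left site."""
    return all(left[2] == right[1] for left, right in zip(parts, parts[1:]))


def enumerate_constructs(library, arrangement):
    """List the part names of every slot-wise combination with compatible sites."""
    slots = [library[slot] for slot in arrangement]
    return [
        tuple(part[0] for part in combo)
        for combo in product(*slots)
        if sites_chain(combo)
    ]


if __name__ == "__main__":
    for names in enumerate_constructs(LIBRARY, ARRANGEMENT):
        print(" + ".join(names))
```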

Future Directions

Although there is a vast collection of useful synthetic biology CAD programs, there is a pressing need for improved standardization and modularity. This includes finding consensus for defining individual components or parts, and the implementation of restrictions intended to simplify the process of building synthetic networks while making these more robust and interchangeable. An existing standard is the standard assembly [19], which has made DNA assembly simpler. In the future, it is anticipated that standards will also exist for describing the dynamics of a part; for example, a standard promoter part might carry a "strength" value describing its efficiency in recruiting RNA polymerase under some standard environmental condition [20]. Standardization is also important in naming such future values, as well as parts, so that they are always kept in a computer-readable format such as the Resource Description Framework (RDF) [21] [22].

Current understanding of how DNA parts come together to make a functional biological device is incomplete. Advances are coming swiftly with the advent of high-throughput technologies, but Computer-Aided Design programs have yet to catch up. Specifically, it is not fully understood how a part changes its function when placed in different devices, so it has proven difficult to create a fully functional, complete language for combining parts efficiently while maintaining their expected functionality. Whereas we are currently capable of modeling metabolic networks to study the effects of a single step in the synthesis pathway of a relevant material (e.g. a biofuel), one can envision a time in the future when software tools will advance to the point of being able to create de novo networks for the synthesis of completely new products (e.g. non-protein/nucleic acid polymers) within the context of a cell. In the coming years, synthetic biology CAD programs will be able to facilitate the rapid advancement of completely new engineered biological devices [10].

References

  1. PMID 22718978 [VectorEditor2012]
    Design, implementation and practice of JBEI-ICE: an open source biological part registry platform and tools.
  2. PMID 16756672 [GeneDesigner2006]
    Gene Designer: a synthetic biology tool for constructing artificial DNA segments.
  3. PMID 10659857 [Togglepaper2000]
    Construction of a genetic toggle switch in Escherichia coli.
  4. PMID 21559524 [Eugene2011]
    Eugene: a domain specific language for specifying and constraining synthetic biological parts, devices, and systems.
  5. PMID 12000848 [DNAWorks2002]
    DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis.
  6. PMID 16481661 [Genedesign2006]
    GeneDesign: rapid, automated design of multikilobase synthetic genes.
  7. doi:10.1038/505272a Nature 505, 272 (16 January 2014) [Addgene2014]
    Repositories share key research tools.
  8. http://bioinformatics.oxfordjournals.org/content/21/8/1603 [Hatzimanikatis2005]
    Exploring the diversity of complex metabolic networks.
  9. PMID 18776195 [Rodrigo2008]
    DESHARKY: automatic design of metabolic pathways for optimal cell growth.
  10. PMID 22266781 [Medema2012]
    Computational tools for the synthetic design of biochemical pathways.
  11. PMID 19874625 [TinkerCell2009]
    TinkerCell: modular CAD tool for synthetic biology.
  12. PMID 20639523 [SynBioSS2010]
    SynBioSS designer: a web-based tool for the automated generation of kinetic models for synthetic biological constructs.
  13. doi:10.1021/sb2000116 [j52011]
    j5 DNA Assembly Design Automation Software.
  14. PMID 17804435 [CFG]
    A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts.
  15. PMID 20167639 [Cai2010]
    GenoCAD for iGEM: a grammatical approach to the design of standard-compliant constructs.
  16. PMID 21390321 [Galdzicki2011]
    Standard biological parts knowledgebase.
  17. http://dspace.mit.edu/handle/1721.1/66172 [SBOLRFC]
    Synthetic Biology Open Language (SBOL) Version 1.0.0.
  18. PMID 21364738 [MoClo]
    A modular cloning system for standardized assembly of multigene constructs.
  19. PMID 18410688 [Shetty2008]
    Engineering BioBrick vectors from BioBrick parts.
  20. PMID 19298678 [Kelly2009]
    Measuring the activity of BioBrick promoters using an in vivo reference standard.
  21. http://hdl.handle.net/1721.1/45537 [Galdzicki2009]
    Provisional BioBrick Language (PoBoL).
  22. http://openwetware.org/wiki/The_BioBricks_Foundation:Standards/Technical/Exchange [standards]
    Synthetic Biology Open Language (SBOL).