v2.0



  • Raw phylogenetic profiles [ 9 MB ]
  • Phylogenetic profile linkage pairs (with strengths) [ 34 MB ]
  • Phylogenetic profile clusters [ 132 MB ]
  • Treeview files [ .atr file | .gtr file | .cdt file ]
  • Rosetta links fusion details [ 1.1 MB ]
  • Rosetta stone linkage pairs (with strengths) [ 542 KB ]
  • Rosetta links (reverse; shows proteins that are fused in T. gondii) [ 13 MB ]
  • Rosetta linked clusters [ 1.7 MB ]

  • Note: The strength (confidence) of phylogenetic profile pairs increases away from zero, whereas the strength of Rosetta stone pairs increases as values approach zero.
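    Because the two scores run in opposite directions, any code that ranks pairs has to know which data set a score came from. The sketch below assumes the pairs have already been read into (protein_a, protein_b, strength) tuples; that in-memory layout and the function name `strongest_first` are hypothetical, not the documented file format.

```python
from typing import List, Tuple

# Hypothetical in-memory form of one linkage pair: (protein_a, protein_b, strength).
# The column layout of the downloadable files is an assumption, not documented here.
Pair = Tuple[str, str, float]


def strongest_first(pairs: List[Pair], source: str) -> List[Pair]:
    """Return pairs ordered from most to least confident.

    source="phylo":   confidence increases away from zero -> sort by |strength| descending.
    source="rosetta": confidence increases towards zero   -> sort by |strength| ascending.
    """
    if source == "phylo":
        return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
    if source == "rosetta":
        return sorted(pairs, key=lambda p: abs(p[2]))
    raise ValueError(f"unknown source: {source!r}")


phylo = [("a", "b", 0.5), ("a", "c", 0.9), ("b", "c", 0.7)]
print(strongest_first(phylo, "phylo")[0])  # -> ('a', 'c', 0.9)
```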


  • Clusters of families: Protein families as revealed by clustering (sequence similarity; single-linkage clustering at 30% sequence similarity over at least 30% of the sequence length, with the 30/30 rule applied to both candidates; computed with blastclust from NCBI). [ 497 KB ]
    See also: example of an alignment file: Cluster 3, 37 KB
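    The clustering rule above can be illustrated with a small union-find sketch. This is not blastclust itself; it only mirrors the stated criteria (single linkage, at least 30% similarity, hit covering at least 30% of both sequences) on precomputed BLAST hits. The hit-record layout and the name `single_linkage` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query, subject, percent_similarity,
# query_coverage, subject_coverage), coverages as fractions of each sequence's length.
Hit = Tuple[str, str, float, float, float]


def single_linkage(proteins: Iterable[str], hits: Iterable[Hit],
                   min_sim: float = 30.0, min_cov: float = 0.30) -> List[List[str]]:
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for q, s, sim, qcov, scov in hits:
        # 30/30 rule applied to both candidates: similarity >= 30% over
        # >= 30% of the query's length AND >= 30% of the subject's length.
        if sim >= min_sim and qcov >= min_cov and scov >= min_cov:
            parent.setdefault(q, q)
            parent.setdefault(s, s)
            rq, rs = find(q), find(s)
            if rq != rs:
                parent[rq] = rs  # single linkage: one qualifying hit joins two clusters

    clusters: Dict[str, List[str]] = {}
    for p in list(parent):
        clusters.setdefault(find(p), []).append(p)
    # Largest clusters first, members sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c[0]))
```

    Single linkage means one qualifying hit is enough to merge two families, so chains of moderately similar proteins can end up in one large cluster.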


  • NOTES

    Notes are available here.





    • Protein name (ID only, e.g. 80.m02289):

    • Threshold:  Greater than or equal to (>=; use for phylogenetic profiles)  |  Less than or equal to (<=; use for Rosetta links)

      Value   

    • Output type: HTML    TXT   

    • Data sets:
      Rosetta links (greater confidence as values approach zero):
        High conf. Rosetta links subset (lowest conf. score = 1e-5)
        Complete Rosetta links set (lowest conf. score = 0.194)
      Phylogenetic profiles (lower confidence as values approach zero):
        High conf. PhyloProfile subset (lowest conf. score = 0.863)
        Complete PhyloProfile set (lowest conf. score = 0.462)
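    The form combines a protein ID, a threshold direction and a data set. A minimal sketch of that lookup, assuming links are available as (protein_a, protein_b, score) tuples, is shown below; the data-set keys, `partners` and `threshold_is_redundant` are hypothetical names. The second function uses the lowest confidence scores listed above to tell whether a threshold would filter anything at all.

```python
from typing import List, Tuple

# Lowest confidence score of each data set, as listed on this page.
# The short data-set keys are hypothetical labels.
LOWEST_CONF = {
    "rosetta_high": 1e-5,
    "rosetta_all": 0.194,
    "phylo_high": 0.863,
    "phylo_all": 0.462,
}


def is_rosetta(dataset: str) -> bool:
    return dataset.startswith("rosetta")


def partners(links: List[Tuple[str, str, float]], protein: str,
             threshold: float, dataset: str) -> List[Tuple[str, float]]:
    """Partners of `protein` whose link passes the threshold.

    Rosetta scores use "<=" (smaller is stronger); phylogenetic profile
    scores use ">=" (larger is stronger).
    """
    found = []
    for a, b, score in links:
        if protein not in (a, b):
            continue
        other = b if a == protein else a
        passes = score <= threshold if is_rosetta(dataset) else score >= threshold
        if passes:
            found.append((other, score))
    return found


def threshold_is_redundant(threshold: float, dataset: str) -> bool:
    """True if every pair in the data set already passes the threshold."""
    lowest = LOWEST_CONF[dataset]
    return threshold >= lowest if is_rosetta(dataset) else threshold <= lowest
```

    For example, a threshold of 0.3 on the complete PhyloProfile set keeps every pair, because no pair in that set scores below 0.462.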




      The script fetches interaction partners for every protein annotated with the chosen GO category. For instance, the default GO category GO:0030604 contains only one protein, 33.m01313, so only the interaction partners of 33.m01313 will be shown.
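    The GO query amounts to expanding a GO term into its annotated proteins and then collecting the linked partners of each one. The sketch assumes a hypothetical mapping from GO IDs to sets of protein IDs and a list of linked protein pairs; neither structure is taken from the actual script.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partners_for_go_term(go_term: str,
                         annotations: Dict[str, Set[str]],
                         links: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each protein annotated with `go_term` to its linked partners.

    annotations: hypothetical GO ID -> set of protein IDs.
    links: hypothetical (protein_a, protein_b) pairs from one data set.
    """
    members = annotations.get(go_term, set())
    result: Dict[str, List[str]] = {p: [] for p in sorted(members)}
    for a, b in links:
        # Links are undirected, so check both ends of each pair.
        if a in result:
            result[a].append(b)
        if b in result:
            result[b].append(a)
    return result
```

    A term with a single annotated protein, like the default GO:0030604, produces a result with a single key.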



      High conf. Rosetta links subset      Complete Rosetta links set
      High conf. PhyloProfile subset      Complete PhyloProfile set [large data set; may take some time]





      Enter proteins as a list, one protein per line (maximum 50 proteins, i.e. 50 lines). The program checks, all vs. all, whether any of the proteins in the list are linked to each other in the Rosetta stone or phylogenetic profile sets.

      Example:
      80.m02289
      44.m02654
      59.m03482
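    The all-vs-all check can be sketched as follows: read one ID per line, enforce the 50-protein limit, and test every unordered pair against each link set. The index structure (data-set name mapped to `frozenset` pair keys) and the name `links_within_list` are hypothetical; dropping repeated IDs is a choice made here, not documented behavior.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

MAX_PROTEINS = 50  # limit stated on this page

# Hypothetical index: data-set name -> {frozenset({a, b}): score}
LinkIndex = Dict[FrozenSet[str], float]


def links_within_list(text: str,
                      link_sets: Dict[str, LinkIndex]) -> List[Tuple[str, str, str, float]]:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) > MAX_PROTEINS:
        raise ValueError(f"{len(lines)} proteins given; at most {MAX_PROTEINS} allowed")
    proteins = list(dict.fromkeys(lines))  # drop repeats, keep input order
    found = []
    # All vs. all: 50 proteins give at most 50 * 49 / 2 = 1225 unordered pairs.
    for a, b in combinations(proteins, 2):
        key = frozenset((a, b))
        for name, index in link_sets.items():
            if key in index:
                found.append((a, b, name, index[key]))
    return found
```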






    Use these features to visualize patterns of conservation for a given gene over 196 completely sequenced genomes.
    Protein list



    Zoom level:   1x   2x   3x   4x (default)

    Display gene ID:   On (default)   Off

    Orientation:   Horizontal (default)   Vertical (cell spacing ignored)

    Space between cells:  

    Display genome order based on:
    NCBI Taxonomy (default; see)
    User-defined

    For a user-defined taxonomic order, paste your genome order into the text box below, using entries from this NCBI list. Keep each entry's number, genome name and comma intact. For instance, to put E. coli first, the top line of your list would be: 60|Escherichia coli K12,
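    A user-defined order is essentially a list of number|name entries that fixes the column order of each profile. The sketch below parses entries in the format shown above and reorders a presence/absence profile keyed by genome number; the function names and the profile dictionary are hypothetical, and how the real script treats the trailing comma is an assumption.

```python
from typing import Dict, List, Tuple


def parse_genome_order(text: str) -> List[Tuple[int, str]]:
    """Parse lines such as '60|Escherichia coli K12,' into (number, name)."""
    order = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        line = line.rstrip(",")  # drop the trailing comma before splitting
        number, sep, name = line.partition("|")
        if not sep or not number.strip().isdigit():
            raise ValueError(f"expected 'number|genome name,', got {raw!r}")
        order.append((int(number), name.strip()))
    return order


def reorder_profile(profile: Dict[int, int], order: List[Tuple[int, str]]) -> List[int]:
    """Presence/absence values (genome number -> 0/1) in the user's column order."""
    return [profile[number] for number, _ in order]
```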

      

    Confidence bins: (currently not available)
    Users can define their own confidence bins (and the associated colors) or use the defaults. Custom bins are entered in order, starting at the lowest and ending at the highest.

    Note that it is up to the user to structure the bins logically; the program cannot check this. Use this chart to choose bin colors (the chart lists HTML RGB codes for individual colors). If you define your own bins, the 'zoom' option is ignored.
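    Since custom bins are listed from lowest to highest, assigning a color to a value is a sorted-edge lookup. The sketch assumes each bin is a (lower_edge, HTML color) pair; that layout and `assign_colors` are hypothetical, since the feature itself is not yet available. Unlike the page's program, this sketch does reject bins that are not in ascending order.

```python
from bisect import bisect_right
from typing import List, Tuple


def assign_colors(values: List[float], bins: List[Tuple[float, str]]) -> List[str]:
    """bins: hypothetical (lower_edge, html_color) pairs, lowest edge first."""
    edges = [edge for edge, _ in bins]
    if edges != sorted(edges):
        raise ValueError("bins must run from the lowest to the highest edge")
    colors = []
    for v in values:
        # Index of the last edge <= v, i.e. the bin that contains v.
        i = bisect_right(edges, v) - 1
        if i < 0:
            raise ValueError(f"{v} is below the lowest bin edge {edges[0]}")
        colors.append(bins[i][1])
    return colors
```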

    Use default
    User defined   

       


    Note: The time required for the script to draw the image is proportional to the number of rows you choose to display.
    Profiles have been modified to display only presence/absence. Occasionally, a gene may be absent from T. gondii itself; this means that no BLAST information could be recovered for that gene.
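    Reducing a profile to presence/absence, and spotting genes that are absent from T. gondii itself, can be sketched as below. The raw-profile layout (genome name mapped to a BLAST-derived value, or None when no hit was recovered) is an assumption, as is treating any recovered hit as "present"; the page does not state a cutoff.

```python
from typing import Dict, List, Optional

# Hypothetical raw profile: genome name -> BLAST-derived value, or None when no hit was recovered.
RawProfile = Dict[str, Optional[float]]


def to_presence_absence(raw: RawProfile, genomes: List[str]) -> List[int]:
    """1 where a BLAST hit exists for that genome, else 0, in the given genome order."""
    return [0 if raw.get(g) is None else 1 for g in genomes]


def absent_from_reference(profiles: Dict[str, RawProfile],
                          reference: str = "T. gondii") -> List[str]:
    """Genes with no recoverable BLAST information in the reference genome itself."""
    return sorted(gene for gene, raw in profiles.items() if raw.get(reference) is None)
```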

    To run this script on your own computer (Perl must be installed), download the script and the reference directory: Download directory tarball. Open the script in a text editor before running it; its parameters are set inside the script. Remember that you are still bound by the agreements displayed on the ToxoDB website.

      Show hits in reference genome (T. gondii)
      Show hits in Eukaryotes
      Show hits in Bacteria
      Show hits in Archaea
      Row start:       Row end:  

         
    Reference image:





      This script fetches presence/absence information only from the 196 completely sequenced genomes, not from the entire NCBI NR database (the list of organisms is available here). If you do not select any group, you will get a list of proteins that did not match any protein, including themselves, when compared using BLAST.

      Present in Archaea [any]
      Present in Bacteria [any]
      Present in Eukaryotes [any]
      Present in Plants [any]
      Present in Mammals [any]
      Present in Apicomplexa (other than T. gondii) [any]
      Present in Toxoplasma gondii
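    One reading of these options that matches the "no group selected" behavior described above is an exact pattern match: a protein is returned when the set of groups it has hits in equals the set of selected groups. That interpretation, the genome-to-group mapping and `select_proteins` are all assumptions made for illustration, not the script's documented logic.

```python
from typing import Dict, Iterable, List, Set


def groups_present(hit_genomes: Set[str], genome_groups: Dict[str, Set[str]]) -> Set[str]:
    """Groups in which the protein has a hit in at least one genome ("[any]")."""
    present: Set[str] = set()
    for genome in hit_genomes:
        present |= genome_groups.get(genome, set())
    return present


def select_proteins(hits: Dict[str, Set[str]],
                    genome_groups: Dict[str, Set[str]],
                    selected: Iterable[str]) -> List[str]:
    """Proteins present in exactly the selected groups and absent from all others.

    hits: hypothetical protein ID -> genomes with a BLAST hit.
    genome_groups: hypothetical genome -> group labels (a mammal maps to
    both "Eukaryotes" and "Mammals").
    """
    wanted = set(selected)
    return sorted(p for p, genomes in hits.items()
                  if groups_present(genomes, genome_groups) == wanted)
```

    With nothing selected, `wanted` is empty, so only proteins with no hits at all are returned.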


      Show annotation along with IDs   

    References

    1. Date, S.V., and Marcotte, E.M. 2003. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat. Biotechnol. 21: 1055-1062.

    2. Marcotte, E.M., Pellegrini, M., Ng, H.-L., Rice, D.W., Yeates, T.O., and Eisenberg, D. 1999a. Detecting protein function and protein-protein interactions from genome sequences. Science 285: 751-753.

    3. Marcotte, E.M., Pellegrini, M., Thompson, M.J., Yeates, T.O., and Eisenberg, D. 1999b. A combined algorithm for genome-wide prediction of protein function. Nature 402: 83-86.

    4. Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., and Yeates, T.O. 1999. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 96: 4285-4288.

    Questions/comments? Contact: Shailesh Date ()