To reduce sequence redundancy, sequence assembly software was used to join EST sequences representing the same gene into a single contig (consensus sequence). The unisequence set for each pathogen therefore represents a set of unique gene sequences, each consisting of either a single EST (a singleton) or a contig assembled from a group of ESTs. For more information on the source of the ESTs for each organism, see the organism pages listed below.
Colletotrichum gloeosporioides f. sp. aeschynomene
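The singleton/contig distinction above can be illustrated with a minimal sketch. It assumes, hypothetically, that the assembly software reports which cluster each EST was placed in; the mapping `est_to_cluster` and the IDs used below are invented for illustration and do not come from the actual dataset. The sketch does not perform assembly itself; it only groups ESTs by cluster and labels each resulting unisequence.

```python
from collections import defaultdict


def build_unisequences(est_to_cluster):
    """Group EST IDs by cluster and label each group as a singleton or a contig.

    est_to_cluster is a hypothetical mapping {est_id: cluster_id}, standing in
    for whatever membership report the assembly software produces.
    """
    clusters = defaultdict(list)
    for est_id, cluster_id in est_to_cluster.items():
        clusters[cluster_id].append(est_id)

    unisequences = {}
    for cluster_id, members in clusters.items():
        # One EST on its own is a singleton; two or more form a contig.
        kind = "singleton" if len(members) == 1 else "contig"
        unisequences[cluster_id] = (kind, sorted(members))
    return unisequences


# Toy input: three ESTs from one gene, one EST from another.
example = {"EST001": "U1", "EST002": "U1", "EST003": "U2", "EST004": "U1"}
print(build_unisequences(example))
# {'U1': ('contig', ['EST001', 'EST002', 'EST004']), 'U2': ('singleton', ['EST003'])}
```

Each key of the result corresponds to one unisequence, so the number of keys is the size of the non-redundant set.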
To assign putative gene products, each unisequence was compared against the NCBI database using the blastx algorithm (Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: J Mol Biol 1990 Oct 5;215(3):403-10). The top hits for each unisequence with e-values below 1 × 10⁻⁵ have been entered into the database, and a putative product or function was assigned to each unisequence on the basis of these hits. These assignments are not supported by experimental evidence, so some of them are speculative. Based on these assignments, the unisequences were classified by function according to a scheme adapted from the one used by MIPS (Munich Information Centre for Protein Sequences). The functional classification scheme is described on a separate page.
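The hit-filtering step can be sketched as follows. This assumes, as an illustration only, that blastx results were written in BLAST's 12-column tabular format (`-m 8` in the legacy `blastall`, `-outfmt 6` in BLAST+), where column 11 is the e-value and column 12 the bit score; the page does not say which output format was used. The `max_hits` limit of 5 is a hypothetical choice, since the text says only "top hits". Subject IDs such as `hitA` are placeholders, not real accessions.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def top_hits(tabular_lines, max_hits=5):
    """Return {query: [subject, ...]} keeping hits with e-value < cutoff.

    Hits are ordered best first: lowest e-value, ties broken by higher bit score.
    max_hits is a hypothetical limit on how many hits are kept per query.
    """
    hits = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < E_VALUE_CUTOFF:
            hits.setdefault(query, []).append((evalue, -bitscore, subject))
    return {q: [s for _, _, s in sorted(h)[:max_hits]] for q, h in hits.items()}


# Toy tabular lines (placeholder IDs and numbers, not real results).
blast_output = [
    "U1\thitA\t45.2\t120\t60\t2\t1\t360\t10\t130\t2e-30\t120.5",
    "U1\thitB\t60.0\t150\t55\t1\t1\t450\t5\t155\t1e-50\t190.0",
    "U1\thitC\t30.1\t40\t28\t0\t10\t130\t20\t60\t0.01\t30.0",
    "U2\thitD\t28.0\t35\t25\t0\t5\t110\t1\t35\t0.5\t25.0",
]
print(top_hits(blast_output))  # {'U1': ['hitB', 'hitA']}
```

A unisequence with no hit below the cutoff (here `U2`) simply gets no entry, which matches leaving it without a putative product.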
Database and Website Implementation
The relational database was implemented using MySQL version 3.23.28 for Solaris. Web-based database searching was implemented with CGI-Perl scripts (Perl 5.6.1) using the Perl modules DBI and DBD::mysql. To allow BLAST searching of the unisequence dataset, the BLAST suite of programs was downloaded from the NCBI BLAST FTP site, and a web page and CGI-Perl script were written to provide a user interface to the BLAST programs. The website and database are hosted on a Sun Ultra 10 workstation running Apache 2.0.36. Transcriptome data for the unisequences will be added once it becomes available. Comments and suggestions are welcome, especially if you spot any errors.
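The kind of keyword search the CGI scripts offer can be sketched with a small relational schema. The actual site uses MySQL through Perl DBI; this sketch instead uses Python's built-in `sqlite3` module as a stand-in so it runs without a database server. The table and column names are hypothetical, and the inserted rows are made-up toy data. As with DBI, the query uses a `?` placeholder rather than pasting user input into the SQL string.

```python
import sqlite3

# Hypothetical schema: one row per unisequence, one row per stored BLAST hit.
SCHEMA = """
CREATE TABLE unisequence (
    id        TEXT PRIMARY KEY,
    organism  TEXT NOT NULL,
    kind      TEXT CHECK (kind IN ('singleton', 'contig')),
    category  TEXT            -- functional class from the MIPS-style scheme
);
CREATE TABLE blast_hit (
    unisequence_id TEXT REFERENCES unisequence(id),
    subject        TEXT,
    description    TEXT,
    evalue         REAL
);
"""


def search_by_description(conn, keyword):
    """Return (id, organism, description, evalue) rows whose hit description
    contains keyword, best (lowest) e-value first."""
    sql = """
        SELECT u.id, u.organism, h.description, h.evalue
        FROM unisequence u JOIN blast_hit h ON h.unisequence_id = u.id
        WHERE h.description LIKE ?
        ORDER BY h.evalue
    """
    return conn.execute(sql, (f"%{keyword}%",)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO unisequence VALUES "
    "('U1', 'Colletotrichum gloeosporioides', 'contig', 'example category')"
)
conn.executemany(
    "INSERT INTO blast_hit VALUES (?, ?, ?, ?)",
    [("U1", "hitA", "example glucanase", 2e-30),
     ("U1", "hitB", "example kinase", 1e-50)],
)
print(search_by_description(conn, "kinase"))
# [('U1', 'Colletotrichum gloeosporioides', 'example kinase', 1e-50)]
```

Keeping hits in their own table lets one unisequence carry several top hits while the functional category stays a single column on the unisequence row.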
06/08/2001 Version 1.0 of the database online.