FSTVAL Overview



Although a huge number of genes have been predicted from whole-genome sequence information, the functions of most genes are still unknown. To determine gene function, insertional mutagenesis by T-DNA or transposons has been widely used. FST-based insertion mutant databases of rice and Arabidopsis are freely available, so researchers can search for mutants of the genes in which they are interested (GABI-Kat, Salk Institute, FLAG, OTL). Additionally, large numbers of mutants are still being developed to obtain sufficient coverage of whole genomes. The flanking sequences of the T-DNA or transposon must be recovered from the mutants to confirm the insertion positions along the chromosomes. This analysis inevitably involves many steps, which can be laborious and time-consuming.

Here, we have developed the Flanking Sequence Tag Validator (FSTVAL), a user-friendly tool, and the first open-access one, for managing bulk flanking sequence tags (FSTs) through the Web. FSTVAL automatically evaluates the FSTs and finds the best mapping position of each FST against the genome sequence; it also provides a graphical image and a frequency graph. Genome sequences of 16 plant organisms are available as reference sequences.

Input and Processing

FSTVAL analyzes FSTs in two steps: the first validates the FSTs, and the second maps them to the genome. A user can upload FSTs in FASTA, FASTQ, or PHD format. The SeqIO module in Biopython is used to generate a FASTA file from a PHD file. In addition, the user can upload the border sequences (e.g., the left or right border region of the T-DNA), the adaptor sequence from adaptor-ligation PCR methods, and the binary vector sequence. A ‘minimum sequence length (MSL)’ is also chosen for FST validation; the default MSL is 30 bp. BLASTN analysis is used to mask the vector sequence and to find the positions of the border and adaptor, with expectation values of 10.0, 10.0, and 1e-10, respectively.
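As a rough illustration of the masking step, the sketch below lower-cases the parts of an FST covered by border or adaptor hits and leaves the rest in upper case, which is the case convention used for the downloadable masked sequences. The function names, the 0-based end-exclusive hit intervals, and the idea of counting the upper-case bases are our assumptions for this example; they are not taken from the FSTVAL source code.

```python
def mask_fst(seq, hits):
    """Return seq in upper case with every hit interval lower-cased.

    hits: list of (start, end) pairs, 0-based and end-exclusive. This layout
    is an assumption; BLAST tabular output is 1-based and inclusive, so the
    coordinates would need converting first.
    """
    chars = list(seq.upper())
    for start, end in hits:
        # Clamp each interval to the sequence so stray coordinates are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def unmasked_length(masked):
    """Count upper-case bases, i.e. the putative genomic part of the FST."""
    return sum(1 for c in masked if c.isupper())


if __name__ == "__main__":
    masked = mask_fst("acgtacgtac", [(0, 4)])  # first 4 bases hit the border
    print(masked, unmasked_length(masked))
```

The unmasked length computed this way could then be compared with the MSL, although whether FSTVAL measures length before or after masking is not stated here.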

Binary vector sequences and T-DNA tandem repeats are identified by sequence homology and indicated as “vector”. Low-quality sequences and sequences shorter than the MSL are indicated as “Low (low quality sequence)” and “NA (not acceptable sequence)”, respectively. FSTs longer than the MSL are indicated as “A (acceptable sequence)”. The acceptable FSTs are matched to the genome using BLASTN with an expectation value of 5e-4. The highest-scoring region is selected as the integration site of the FST. These FSTs are divided into two types: genic region and intergenic region.
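The labelling and best-hit rules above can be sketched as follows. This is a minimal illustration, not the FSTVAL implementation: the function names, the order in which the labels are checked, the treatment of an FST whose length equals the MSL as acceptable, and the dictionary layout of a BLASTN hit are all assumptions made for the example.

```python
MSL_DEFAULT = 30      # default minimum sequence length from the text (bp)
GENOME_EVALUE = 5e-4  # BLASTN expectation value used for genome mapping


def fst_status(length, is_vector=False, is_low_quality=False, msl=MSL_DEFAULT):
    """Return the validation label for one FST.

    The precedence (vector, then Low, then NA) is an assumption; the flags
    are taken as given because the quality criteria are not described here.
    """
    if is_vector:
        return "vector"
    if is_low_quality:
        return "Low"
    if length < msl:
        return "NA"
    return "A"


def best_hit(hits, max_evalue=GENOME_EVALUE):
    """Pick the highest-scoring hit that passes the e-value cutoff.

    hits: list of dicts with 'bitscore' and 'evalue' keys (hypothetical layout).
    Returns None when no hit passes the cutoff.
    """
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return max(kept, key=lambda h: h["bitscore"]) if kept else None


if __name__ == "__main__":
    print(fst_status(25), fst_status(120), fst_status(120, is_vector=True))
    hits = [
        {"chrom": "Chr1", "pos": 1200, "bitscore": 95.0, "evalue": 1e-20},
        {"chrom": "Chr3", "pos": 8800, "bitscore": 180.0, "evalue": 1e-45},
        {"chrom": "Chr5", "pos": 40, "bitscore": 30.0, "evalue": 0.5},
    ]
    print(best_hit(hits))
```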


FSTVAL provides the analysis result as a table. Additionally, masked sequences can be downloaded, in which genomic sequence is shown in upper case and T-DNA border or adaptor sequence in lower case. The BLAST result is also presented as a table that shows the insertion type: genic (exon, intron, 5’upstream-1000, 3’downstream-300) or intergenic. If the insertion occurred in a genic region, the interrupted gene ID (which is linked to the GBrowser at its web site) and its description are shown. If the FST is matched to an intergenic region, the gene nearest to the insertion position is provided. We further generate a distribution map and a frequency graph of the insertions along the chromosomes using the Python Imaging Library (PIL) and Matplotlib. FSTVAL also provides 0.1-2 kb of sequence extending 5’ and 3’ from the insertion position (which is indicated by five stars).
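The insertion types listed above can be illustrated with a small strand-aware classifier. This is only a sketch under stated assumptions: the function name, the 1-based inclusive coordinates, the gene model layout, and the choice to measure the 1000 bp and 300 bp windows from the gene boundaries are ours, and FSTVAL may define them differently.

```python
UPSTREAM_BP = 1000   # 5' upstream window from the text
DOWNSTREAM_BP = 300  # 3' downstream window from the text


def insertion_type(pos, gene_start, gene_end, strand, exons):
    """Classify a 1-based insertion position relative to one gene model.

    gene_start <= gene_end are chromosome coordinates, strand is '+' or '-',
    and exons is a list of inclusive (start, end) pairs. This layout is
    hypothetical and is not FSTVAL's data model.
    """
    if gene_start <= pos <= gene_end:
        if any(s <= pos <= e for s, e in exons):
            return "exon"
        return "intron"
    # Outside the gene: the 5' side depends on the strand.
    if strand == "+":
        up = (gene_start - UPSTREAM_BP, gene_start - 1)
        down = (gene_end + 1, gene_end + DOWNSTREAM_BP)
    else:
        up = (gene_end + 1, gene_end + UPSTREAM_BP)
        down = (gene_start - DOWNSTREAM_BP, gene_start - 1)
    if up[0] <= pos <= up[1]:
        return "5'upstream-1000"
    if down[0] <= pos <= down[1]:
        return "3'downstream-300"
    return "intergenic"


if __name__ == "__main__":
    exons = [(5000, 5500), (7000, 8000)]
    for p in (5200, 6000, 4500, 8200, 9500):
        print(p, insertion_type(p, 5000, 8000, "+", exons))
```

For a gene on the minus strand the two windows swap sides, so a position just past the higher gene coordinate is reported as 5' upstream rather than 3' downstream.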

Organisms available in the FSTVAL

Scientific name | Common name | Source | Version | Reference
Arabidopsis lyrata | Lyrate rockcress | JGI | JGI release v1.0 | Hu TT et al. (2011)
Arabidopsis thaliana | Arabidopsis | TAIR | TAIR10.0 | Swarbreck D et al. (2008)
Brachypodium distachyon | Purple false brome | JGI | JGI v1.0 8x assembly, MIPS/JGI v1.0 annotation | International Brachypodium Initiative (2010)
Carica papaya | Papaya | ASGPB | December 2007 | Ming R et al. (2008)
Chlamydomonas reinhardtii | Green algae | JGI | v4.3 release | Merchant SS et al. (2007)
Cucumis sativus | Cucumber | JGI | Phytozome v7.0 | Huang S et al. (2009)
Glycine max | Soybean | JGI | Glyma1 release | Schmutz J et al. (2010)
Oryza sativa | Rice | MSU | MSU release 6.1 | Ouyang S et al. (2007)
Oryza sativa | Rice | RAP-DB | IRGSP/RAP build 5 | Rice Annotation Project et al. (2007)
Physcomitrella patens | Moss | JGI | version 1.6 | Rensing SA et al. (2008)
Populus trichocarpa | Western poplar | JGI | version 2.0 | Tuskan GA et al. (2006)
Ricinus communis | Castor bean plant | JCVI | TIGR/JCVI release v0.1 | Chan AP et al. (2010)
Sorghum bicolor | Cereal grass | JGI | v1.0 release (Sbi1 assembly, Sbi1.4 gene set) | Paterson AH et al. (2009)
Theobroma cacao | Cacao | USDA-ARS | Release 1.0 | Argout X et al. (2011)
Vitis vinifera | Grape vine | Genoscope | March 2010 release | Jaillon O et al. (2007)
Volvox carteri | Volvox | JGI | Phytozome v7.0 | Prochnik SE et al. (2010)
Zea mays | Maize | Maizesequence | Release 5a | Schnable PS et al. (2009)
Brassica rapa | Chinese cabbage | BRAD | Release v1.1 | Cheng F et al. (2011)

Download FSTVAL

Copyright (c) 2010-2012, MyongJi Univ. (or GreenGene BioTech Inc.), All rights reserved.
This source code is distributed under the terms of the GNU General Public License.
If you would like the FSTVAL source code, send an email to llmon7711@gmail.com.

Last updated on May 10, 2012.