Welcome to the Weizmann MST server (Version 1.0).

This server is designed to help find unique regions in microbial 16S DNA sequences for the purpose of designing probes for MST (Microbial Source Tracking) analysis.

The input is a DNA sequence, or a key code issued by a previous run if you only want to review that run's results.

Using the FASTA search algorithm, the sequence is compared to a database of bacterial 16S sequences culled from RDP (http://rdp.cme.msu.edu/). The database contains only sequences of fecal origin whose host is known.

The resulting sequence hits are then clustered using CD-Hit at the level of 90% identity, and the longest representative sequence of each cluster is taken.
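As a minimal sketch of the "longest representative" step, assuming the clusters have already been parsed into a dictionary mapping a cluster ID to a list of (name, sequence) pairs. The function name and the data layout are hypothetical and are not taken from the server's code.

```python
def longest_representatives(clusters):
    """Return {cluster_id: (name, sequence)} holding the longest member of each cluster.

    `clusters` is a hypothetical layout: {cluster_id: [(name, sequence), ...]}.
    """
    reps = {}
    for cluster_id, members in clusters.items():
        if not members:
            raise ValueError(f"cluster {cluster_id!r} has no members")
        # max() returns the first maximal item, so ties resolve to the earliest member.
        reps[cluster_id] = max(members, key=lambda member: len(member[1]))
    return reps


# Toy data for illustration only.
example_clusters = {
    0: [("seqA", "ACGTACGT"), ("seqB", "ACGTACGTAC")],
    1: [("seqC", "TTGACA")],
}
example_reps = longest_representatives(example_clusters)
```

Here `example_reps[0]` is the `seqB` entry, because it is the longest member of cluster 0.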

The representative sequences are aligned using Muscle, and the resulting alignment is trimmed to the region aligning with the input sequence.
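The trimming step can be illustrated with a short sketch, assuming the alignment is held as a dictionary of equal-length gapped strings and that the input sequence is one of its rows. The function name, the `-` gap character and the row names are assumptions made for illustration.

```python
def trim_to_query(alignment, query_name, gap="-"):
    """Keep only the alignment columns spanned by the query row.

    `alignment` is a hypothetical layout: {name: gapped_sequence}, all the same length.
    """
    query = alignment[query_name]
    # Columns where the query has a residue rather than a gap.
    occupied = [i for i, ch in enumerate(query) if ch != gap]
    if not occupied:
        raise ValueError("query row contains only gaps")
    start, end = occupied[0], occupied[-1] + 1
    # Internal gaps in the query are kept; only the flanking columns are removed.
    return {name: row[start:end] for name, row in alignment.items()}


# Toy alignment for illustration only.
example_alignment = {
    "query": "--ACGT-A--",
    "hit1":  "GGACGTTAGG",
}
example_trimmed = trim_to_query(example_alignment, "query")
```

In this toy case the two leading and two trailing columns are dropped from every row.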

The alignment is scanned with a sliding window to find the regions of the initial input sequence that differ most from the other aligned sequences, and candidate probes are suggested from these regions.
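One way a sliding-window scan of this kind could work is sketched below. The scoring rule (mean per-column fraction of sequences that differ from the input, with a gap against a base counted as a difference), the window width and the function name are all assumptions. The server's actual scoring is described in the paper, not here.

```python
def window_percent_difference(query, others, width):
    """Percent difference between the query and the other rows for each window start.

    Assumed scoring: for each column, the fraction of `others` whose character
    differs from the query's; a window's score is the mean of its columns, times 100.
    """
    if not others:
        raise ValueError("need at least one sequence to compare against")
    if any(len(row) != len(query) for row in others):
        raise ValueError("all rows must have the same aligned length")
    if width <= 0 or width > len(query):
        raise ValueError("window width must be between 1 and the alignment length")

    # Per-column fraction of other sequences that differ from the query.
    col_diff = [
        sum(row[i] != query[i] for row in others) / len(others)
        for i in range(len(query))
    ]
    return [
        100.0 * sum(col_diff[start:start + width]) / width
        for start in range(len(query) - width + 1)
    ]


# Toy data for illustration only.
example_scores = window_percent_difference("ACGTACGT", ["ACGTTTTT", "ACGAACGA"], 4)
# Index of the first highest-scoring window, i.e. the most distinctive region.
example_best_start = max(range(len(example_scores)), key=example_scores.__getitem__)
```

The list of scores corresponds to the kind of per-position curve that a percent-difference plot would show, and the highest-scoring windows are natural candidates for probe regions.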

The outputs are the potential probe sequences, a graphic representation of the percent difference between the sequences (the output of the sliding window), and the alignment of the representative sequences, annotated with the host species name and the bacterial taxon assignment where available.

The resulting sequences should be checked further for uniqueness against either NCBI or RDP. Our database was filtered to keep only sequences whose source and host organism were known, so it may not include all fecal sequences.

Full details of Version 1.0 are available in the paper:

The Development of a Novel qPCR Assay-Set for Identifying Fecal Contamination Originating from Domestic Fowls and Waterfowl in Israel. Shoshanit Ohad, Shifra Ben-Dor, Jaime Prilusky, Valeria Kravitz, Bareket Dassa, Vered Caspi, Yechezkel Kashi and Efrat Rorman. Submitted.

Updates and changes will be listed on this page.

Questions about the server and requests for the bacterial fecal 16S database can be addressed to: bioinfo@weizmann.ac.il


Algorithm References:

FASTA: Pearson, W.R., and Lipman, D.J. (1988). Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85, 2444-2448.

CD-Hit: Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152.

Muscle: Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792-1797.