SPatt (Statistic for Patterns) is a suite of C++ programs designed to compute p-values of pattern occurrences in texts. Assuming the text is generated according to a Markov model, the p-value of a given observation is its probability of occurring under that model. The lower the p-value, the more unlikely the observation. For example, these tools can be used to find patterns with unusual behaviour in DNA or protein sequences.
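To make the notion of an over-representation p-value concrete, here is a minimal Python sketch. It is not SPatt's method: it swaps in a plain Monte Carlo simulation under the simplest model (independent, uniformly distributed letters) and estimates the probability of seeing at least a given number of occurrences. The function names and parameters are hypothetical and chosen for illustration only.

```python
import random


def count_occurrences(text, words):
    """Count occurrences of any word in `words`, overlapping ones included."""
    total = 0
    for word in words:
        start = text.find(word)
        while start != -1:
            total += 1
            # Restart one position further so overlapping matches are counted.
            start = text.find(word, start + 1)
    return total


def monte_carlo_over_pvalue(n_obs, length, words, alphabet="ACGT",
                            trials=2000, seed=0):
    """Estimate P(N >= n_obs) for random texts of the given length.

    Assumption: letters are independent and uniformly distributed.
    This is a simulation-based illustration, not an exact computation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sequence = "".join(rng.choices(alphabet, k=length))
        if count_occurrences(sequence, words) >= n_obs:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    words = ["GCTGG", "CCAGC"]
    print(count_occurrences("GCTGGCCAGC", words))
    print(monte_carlo_over_pvalue(1, 200, words))
```

A simulation with a few thousand trials cannot resolve very small p-values: an event of probability 1e-20 will essentially never be drawn, and the estimate is simply 0. Dedicated computations of the count distribution are needed in that regime.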
The DNA motif/pattern
Here is the command to run ("-S" for the provided sequence, "-p" for the pattern, "-a" for the alphabet descriptor, "-m" for the Markov model order, where "-1" means independent and identically uniformly distributed letters, and "--over" to compute the over-representation p-value):
spatt -S phage_lambda.fasta -p "GCTGG|CCAGC" -a "ACGT" -m -1 --over
and here is its (truncated) result:
distribution:
P(N=0)=9.698565e-42
P(N=1)=9.130414e-40
P(N=2)=4.298067e-38
P(N=3)=1.348945e-36
P(N=4)=3.175459e-35
[...]
P(N=206)=2.629107e-23
P(N=207)=1.210827e-23
P(N=208)=5.549911e-24
P(N=209)=2.531807e-24
P(N>=210)=2.090885e-24
pattern=GCTGG|CCAGC Nobs=210 P(N>=Nobs)=2.090885e-24
This result indicates that the observation of "at least 210 occurrences of
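To relate the reported quantities to one another, the following Python sketch shows two small calculations under the "-m -1" model (independent, uniformly distributed letters): the expected number of occurrences of a set of words, and the tail probability P(N >= Nobs) read off a count distribution. The function names are hypothetical, the distribution in the example is a toy one, and the sequence length used in the example is an assumption, since the text does not state the length of phage_lambda.fasta.

```python
def expected_count(seq_length, words, alphabet_size=4):
    """Expected number of occurrences of distinct words in a random text.

    Assumption: independent, uniformly distributed letters. By linearity
    of expectation, each of the (seq_length - k + 1) start positions
    matches a given word of length k with probability alphabet_size ** -k.
    """
    total = 0.0
    for word in words:
        k = len(word)
        if k <= seq_length:
            total += (seq_length - k + 1) / alphabet_size ** k
    return total


def tail_probability(pmf, n_obs):
    """Return P(N >= n_obs), where pmf[k] = P(N = k) for k = 0..len(pmf)-1."""
    return sum(pmf[n_obs:])


if __name__ == "__main__":
    # Assumed length: 48502 is the commonly cited size of the phage lambda
    # genome; the actual length of the input file is not given in the text.
    print(expected_count(48502, ["GCTGG", "CCAGC"]))

    # Toy distribution: number of heads in 4 fair coin tosses.
    toy_pmf = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
    print(tail_probability(toy_pmf, 3))
```

Under the assumed length, the expected count is roughly 95, well below the 210 occurrences observed. Note also that the tail is summed directly rather than computed as one minus the lower part of the distribution: with a tail as small as 2.090885e-24, subtracting from 1 in double precision would lose the value entirely.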
Computing the distribution of pattern occurrences in random sequences is a challenging and computationally intensive task for which many competing approaches exist. The goal of SPatt is to implement the most relevant ones and make them available in a single easy-to-use package. Here is a list of the features currently implemented in SPatt:
If you need more details on the statistical framework and the methodology implemented in SPatt, please check the Statistics section. If you want to learn how to get SPatt on your system, please check the Download section. For a step-by-step tutorial, check the Tutorial section. Finally, to learn more about the contributors to SPatt and how to cite SPatt, check the Credits section.