Sequence Motif Search

The sequence motif search option finds occurrences of amino acid or nucleotide sequence fragments in an entry's FASTA sequence.

Three types of syntax can be used to search for sequence patterns:

  • Simple: Search for sequence fragments written in IUPAC one-letter amino acid codes, such as MQTIF. Use the symbol ‘X’ to allow any amino acid at a position. For example, a query for SH3 domains using the sequence -X-P-P-X-P (where X is a variable residue and P is proline) can be expressed as: XPPXP.
  • PROSITE: Complex queries can be expressed using PROSITE patterns. For details, see the PROSITE pattern definitions.
  • RegEx: Regular expressions are supported as an alternative representation of complex queries. For instance:
    • Fixed-length runs of variable residues are specified with the .{n} notation, where '.' matches any residue and n is the number of variable residues. To query a motif with seven variable residues between W and G and twenty variable residues between G and L, use the following notation: W.{7}G.{20}L
    • Variable-length runs are expressed with the {n,m} notation, where n is the minimum and m the maximum number of repetitions. For example, the zinc finger motif that binds Zn in a DNA-binding domain can be expressed as: C.{2,4}C.{12}H.{3,5}H
    • The '^' operator anchors a sequence motif to the beginning of a protein sequence. The following two queries both find sequences with N-terminal histidine tags: ^HHHHHH or ^H{6}
    • Square brackets specify alternative residues at a particular position. The Walker (P loop) motif that binds ATP or GTP can be expressed as: [AG]....GK[ST] (A or G is followed by 4 variable residues, then G and K, and finally S or T)
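
The relationship between the three syntaxes can be illustrated with a short Python sketch. It is not the search service's implementation: the helper names simple_to_regex and prosite_to_regex are hypothetical, the PROSITE conversion assumes only the basic pattern elements (x, [..], {..}, (n), (n,m), < and >), and the test sequences are made up for illustration.

```python
import re

def simple_to_regex(pattern: str) -> str:
    """Translate a Simple-syntax query into a regex: X means any residue."""
    return pattern.upper().replace("X", ".")

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a regex (assumed subset only).

    Not handled: anchors inside brackets such as [G>].
    """
    regex = pattern.rstrip(".")                          # optional trailing period
    regex = regex.replace("-", "")                       # element separators
    regex = regex.replace("{", "[^").replace("}", "]")   # {..} = excluded residues
    regex = regex.replace("(", "{").replace(")", "}")    # (n) or (n,m) = repetition
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex

# The motifs from the list above, written in each syntax.
sh3 = simple_to_regex("XPPXP")                           # '.PP.P'
walker = prosite_to_regex("[AG]-x(4)-G-K-[ST]")          # '[AG].{4}GK[ST]'
zinc = prosite_to_regex("C-x(2,4)-C-x(12)-H-x(3,5)-H")   # 'C.{2,4}C.{12}H.{3,5}H'
his_tag = prosite_to_regex("<H(6)")                      # '^H{6}'

# Made-up sequences, used only to exercise the patterns.
print(re.search(sh3, "AAPPLPRK").group())                # APPLP
print(re.search(walker, "MSTAESGSGKTLLA").group())       # AESGSGKT
print(bool(re.search(his_tag, "HHHHHHMK")))              # True
print(bool(re.search(his_tag, "MHHHHHH")))               # False: tag is not N-terminal
```

The conversions show that the Walker and zinc finger queries above are the same motifs whether written in PROSITE or RegEx form; which form to submit is a matter of preference.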

Searches can query protein, DNA, or RNA sequences.

The sequential numbering of the matched sequence region (corresponding to PDBx/mmCIF file numbering) is shown in the “Display Results as Polymer Entities” option.



Last updated: 2/1/2021