PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 100%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.



Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  14                               27
1978  3                                30
1979  6                                36
1980  4                                40
1981  10                               50
1982  19                               69
1983  13                               82
1984  12                               94
1985  12                               106
1986  9                                115
1987  10                               125
1988  43                               168
1989  49                               217
1990  80                               297
1991  108                              405
1992  104                              509
1993  406                              915
1994  751                              1666
1995  554                              2220
1996  598                              2818
1997  859                              3677
1998  1232                             4909
1999  1423                             6332
2000  1583                             7915
2001  1656                             9571
2002  1662                             11233
2003  2350                             13583
2004  3109                             16692
2005  3301                             19993
2006  3854                             23847
2007  4400                             28247
2008  4120                             32367
2009  4238                             36605
2010  4433                             41038
2011  4337                             45375
2012  4658                             50033
2013  5080                             55113
2014  5999                             61112
2015  5290                             66402
2016  5983                             72385
2017  6265                             78650
2018  6194                             84844
2019  6856                             91700
2020  8045                             99745
2021  7410                             107155
2022  5292                             112447
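
In the figures above, each year's total equals the previous year's total plus that year's new sequences. The following minimal Python sketch illustrates that running sum using the 1976 to 1980 counts from the table. The names `new_per_year` and `cumulative_totals` are hypothetical and chosen for illustration only; this is not how the statistics page itself computes its numbers.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980, copied from the table.
new_per_year = {1976: 13, 1977: 14, 1978: 3, 1979: 6, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total}, processing years in ascending order."""
    years = sorted(yearly)
    totals = accumulate(yearly[y] for y in years)
    return dict(zip(years, totals))


totals = cumulative_totals(new_per_year)
print(totals)
# prints {1976: 13, 1977: 27, 1978: 30, 1979: 36, 1980: 40}
```

The printed totals match the third column of the table for the same years, which is a quick way to check a locally parsed copy of the data for transcription errors.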