PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time, and the yearly bars (dark blue) show how many new protein sequences were added in each year.
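
The cumulative bars are a running (prefix) sum of the yearly bars: each year's total is the previous year's total plus that year's new sequences. The short Python sketch below illustrates this using the 1976 to 1980 values from the table on this page. It assumes the total is exactly the sum of the yearly additions, which is consistent with every row of the table; the function name cumulative_totals is hypothetical and is not part of any RCSB tool.

```python
from itertools import accumulate


def cumulative_totals(new_per_year):
    """Return the running total after each year, given per-year counts of new sequences.

    Hypothetical helper for illustration only.
    """
    # accumulate() yields the prefix sums: x0, x0 + x1, x0 + x1 + x2, ...
    return list(accumulate(new_per_year))


# Yearly counts of new protein sequences for 1976 to 1980, copied from the table on this page.
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [12, 11, 3, 2, 2]
cumulative = cumulative_totals(new_per_year)

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}: {new} new, {total} total")
```

Running it prints totals of 12, 23, 26, 28 and 30, matching the Total column for those years.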

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence-cluster group search results page. For performance reasons, these numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the count approaches or exceeds 10,000. The group search results page gives the exact count; this statistics page is intended to show the trend.

Year  New Protein Sequences  Total Protein Sequences
1976  12                     12
1977  11                     23
1978  3                      26
1979  2                      28
1980  2                      30
1981  7                      37
1982  18                     55
1983  5                      60
1984  11                     71
1985  8                      79
1986  8                      87
1987  9                      96
1988  20                     116
1989  33                     149
1990  30                     179
1991  37                     216
1992  52                     268
1993  154                    422
1994  324                    746
1995  266                    1012
1996  300                    1312
1997  453                    1765
1998  543                    2308
1999  710                    3018
2000  819                    3837
2001  851                    4688
2002  912                    5600
2003  1324                   6924
2004  1841                   8765
2005  2067                   10832
2006  2337                   13169
2007  2510                   15679
2008  2344                   18023
2009  2403                   20426
2010  2405                   22831
2011  2170                   25001
2012  2294                   27295
2013  2398                   29693
2014  2784                   32477
2015  2471                   34948
2016  2682                   37630
2017  2827                   40457
2018  2813                   43270
2019  3036                   46306
2020  3619                   49925
2021  2914                   52839
2022  2125                   54964