A protein’s chemical function depends largely on the amino acids it is built from. Yet although the genetic code has 64 unique codons, which in theory could encode 63 distinct amino acids plus a stop signal, organisms use only 20 standard amino acids. For researchers, this limits the chemical diversity of the proteins they can build. It is therefore useful to expand the genetic code with non-standard amino acids (nsAAs) whose chemical properties differ from those of the standard amino acids. To do this, tRNAs that read redundant codons must be charged with new amino acids. This in turn requires engineering aminoacyl-tRNA synthetases, the enzymes that charge tRNAs, to recognize and accept nsAAs as substrates. Ultimately, this will provide the biological tools to build new biomaterials and pharmaceuticals.
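The codon arithmetic above can be checked with a short sketch. It builds the standard genetic code (NCBI translation table 1) from its usual compact string form and counts how many codons map to each amino acid. The DNA alphabet (T in place of U) and all variable names are illustrative choices, not part of GCEdb.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1) in its compact form.
# Codons are enumerated with bases in the order T, C, A, G; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# All 4^3 = 64 triplets, in the same order as AMINO_ACIDS.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# Number of codons assigned to each amino acid (and to the stop signal).
degeneracy = Counter(codon_table.values())
stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
standard_amino_acids = sorted(aa for aa in degeneracy if aa != "*")

print(f"codons: {len(codons)}")
print(f"standard amino acids: {len(standard_amino_acids)}")
print(f"stop codons: {stop_codons}")
for aa, count in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {count} codon(s)")
```

Most amino acids are encoded by more than one codon, and it is this redundancy that leaves room for reassigning codons to nsAAs.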
Because this is a fast-growing field, new tRNA synthetases and nsAAs are being developed at a rapidly increasing rate, and the resulting body of literature has become difficult to categorize. A database for genetic code expansion is therefore needed to make this work easier to search and reference. An organized dataset would also allow more advanced analysis of nsAAs, for example with machine learning. The aim of GCEdb is to catalogue these nsAAs from the literature.