Here we present CaviDB, a novel database of cavities and their features in known protein structures. It integrates the results of common cavity detection software with protein features derived from sequence, structural, and functional analyses. CaviDB aims to provide a comprehensive database that can be used for various aspects of drug design and discovery and that also contributes to a better understanding of the fundamentals of protein structure-function relationships.
Cavity prediction and categorization
CaviDB provides users with structural and sequence-based features for characterizing protein cavities. Cavity predictions were made using Fpocket (Le Guilloux et al., 2009), with its default settings, for all entries in the Protein Data Bank (Berman et al., 2000; PDBe-KB consortium, 2020) and in the AlphaFold database (Varadi et al., 2022). We retrieved and annotated all the properties linked to each cavity, as well as the residues that form it.
Cavities were categorized as druggable when their druggability score was greater than 0.5, as proposed in previous work (Schmidtke and Barril, 2010).
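The categorization step can be illustrated with a minimal Python sketch. It assumes the per-pocket summary is available as text made of "Pocket N :" headers followed by "Key : value" lines, which is our understanding of the layout of Fpocket's pocket information file; that layout, the key name "Druggability Score", and the example values are assumptions for illustration, not the CaviDB pipeline itself.

```python
import re

DRUGGABILITY_THRESHOLD = 0.5  # cutoff stated in the text


def parse_pockets(text):
    """Parse 'Pocket N :' blocks of 'Key : value' lines.

    Returns {pocket_id: {key: float_value}}. The layout is an assumption.
    """
    pockets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"^Pocket\s+(\d+)\s*:$", line)
        if header:
            current = int(header.group(1))
            pockets[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            try:
                pockets[current][key.strip()] = float(value)
            except ValueError:
                pass  # ignore non-numeric fields
    return pockets


def is_druggable(features, threshold=DRUGGABILITY_THRESHOLD):
    """A cavity is druggable when its score is strictly greater than the threshold."""
    return features.get("Druggability Score", 0.0) > threshold


# Hypothetical example input (values invented for illustration)
example = """Pocket 1 :
\tScore : \t0.412
\tDruggability Score : \t0.812

Pocket 2 :
\tScore : \t0.250
\tDruggability Score : \t0.500
"""

pockets = parse_pockets(example)
druggable = [pid for pid, feats in pockets.items() if is_druggable(feats)]
print(druggable)
```

Note that the comparison is strict, so a cavity scoring exactly 0.5 is not labeled druggable under this reading of "greater than 0.5".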
Cavity features calculation
The pKa values (pH=7) of the ionizable residues and their shifts (predicted pKa minus expected pKa) were calculated using Propka (Olsson et al., 2011). The net pKa shift per cavity was calculated as the sum of the absolute pKa shifts of all ionizable residues belonging to that cavity.
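As a worked illustration of the net pKa shift, the sketch below sums the absolute shifts per cavity. The input format (tuples of cavity identifier, predicted pKa, expected pKa) and the numbers are hypothetical; in practice these values would come from the Propka output.

```python
from collections import defaultdict


def net_pka_shift_per_cavity(residues):
    """Sum |predicted pKa - expected pKa| over the ionizable residues of each cavity.

    residues: iterable of (cavity_id, predicted_pka, expected_pka) tuples
    (a hypothetical input format chosen for this sketch).
    """
    totals = defaultdict(float)
    for cavity_id, predicted, expected in residues:
        totals[cavity_id] += abs(predicted - expected)
    return dict(totals)


# Hypothetical values, for illustration only
residues = [
    (1, 3.0, 4.0),    # shift of -1.0
    (1, 7.5, 6.5),    # shift of +1.0
    (2, 10.5, 10.5),  # no shift
    (2, 5.0, 4.5),    # shift of +0.5
]

net_shift = net_pka_shift_per_cavity(residues)
print(net_shift)
```

Because absolute values are summed, opposite shifts in the same cavity (as in cavity 1 here) add up rather than cancel.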
The per-site contacts predicted by Propka are presented aggregated by cavity: side-chain hydrogen bonds (SCHBOND), backbone hydrogen bonds (BBHBOND), and Coulombic interactions. The cavity contacts shown in the network plot connect pairs of cavities whose sites have at least one contact between them. The binding energy matrices were built by summing the absolute binding energies between the residues in contact in each pair of cavities.
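The aggregation of per-site contacts into cavity pairs can be sketched as follows. This is a minimal illustration under assumptions: the residue labels, the contact list format (residue, residue, energy), and the residue-to-cavity mapping are hypothetical, and contacts between residues of the same cavity are skipped because the matrix describes pairs of distinct cavities.

```python
from collections import defaultdict
from itertools import product


def cavity_energy_matrix(contacts, residue_to_cavities):
    """Sum absolute contact energies for every pair of distinct cavities.

    contacts: iterable of (residue_a, residue_b, energy)
    residue_to_cavities: dict mapping a residue label to the set of cavity ids
        it belongs to (a residue may line more than one cavity).
    Returns {(cavity_i, cavity_j): summed |energy|} with cavity_i < cavity_j.
    """
    matrix = defaultdict(float)
    for res_a, res_b, energy in contacts:
        cavs_a = residue_to_cavities.get(res_a, ())
        cavs_b = residue_to_cavities.get(res_b, ())
        # Collect each unordered cavity pair once per contact
        pairs = {
            tuple(sorted((ca, cb)))
            for ca, cb in product(cavs_a, cavs_b)
            if ca != cb
        }
        for pair in pairs:
            matrix[pair] += abs(energy)
    return dict(matrix)


# Hypothetical data, for illustration only
residue_to_cavities = {
    "ASP10": {1}, "GLU30": {1}, "LYS45": {2}, "ARG5": {2}, "HIS77": {3},
}
contacts = [
    ("ASP10", "LYS45", -1.5),
    ("GLU30", "ARG5", -0.5),
    ("LYS45", "HIS77", 0.25),
    ("ASP10", "GLU30", 0.75),  # same cavity, not counted
]

matrix = cavity_energy_matrix(contacts, residue_to_cavities)
edges = sorted(matrix)  # cavity pairs with at least one contact
print(matrix, edges)
```

The keys of the resulting dictionary correspond to the edges of the cavity network (pairs with at least one contact), and the values to the entries of the binding energy matrix.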
The per-site properties were calculated using CIDER (Holehouse et al., 2017), modlamp (Müller et al., 2017), and Biopython (Chapman and Chang, 2000). Each property was then assigned to a cavity as the mean of its values over the residues belonging to that cavity.
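The averaging of a per-site property over a cavity can be illustrated with a short sketch. Here the Kyte-Doolittle hydropathy scale, written out by hand, stands in for the properties that the text obtains from CIDER, modlamp, and Biopython; the function name and the example residue strings are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale, used here only as a stand-in property
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cavity_mean_property(cavity_residues, scale=KYTE_DOOLITTLE):
    """Return {cavity_id: mean per-residue value} for one property scale.

    cavity_residues: dict mapping a cavity id to the one-letter codes of the
        residues lining it (a hypothetical input format for this sketch).
    """
    return {
        cavity_id: mean(scale[aa] for aa in residues)
        for cavity_id, residues in cavity_residues.items()
        if residues  # skip empty cavities to avoid averaging nothing
    }


# Hypothetical cavities, for illustration only
cavity_residues = {1: "ILV", 2: "DK"}
means = cavity_mean_property(cavity_residues)
print(means)
```

Any other per-residue property could be averaged in the same way by passing a different mapping as `scale`.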
Global Protein Features calculation and annotation
The global protein features were calculated as described in the previous section. Each CaviDB entry was annotated with biologically relevant database identifiers, such as CATH (Sillitoe et al., 2015) and Pfam (Finn et al., 2016), through SIFTS (Velankar et al., 2013) to help users with subsequent analyses.