
bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown
Published: Dec. 22, 2024
Abstract: Antibody next-generation sequencing (NGS) datasets have become crucial to develop computational models addressing this successful class of therapeutics. Although antibodies are composed of both heavy and light chains, most NGS depositions provide them in unpaired form, reducing their utility. Here we introduce PairedAbNGS, a novel database with paired heavy/light antibody chains. To the best of our knowledge, this is the largest resource for paired natural antibody sequences, with 58 bioprojects and over 14 million assembled productive sequences. We make the database accessible at http://naturalantibody.com/paired-ngs as a valuable tool for biological and machine-learning applications. Using this dataset, we investigated heavy and light chain variable (V) gene pairing preferences and found significant biases beyond gene usage frequencies, possibly due to receptor editing favoring less autoreactive combinations. Analyzing the available antibody structures from the Protein Data Bank, we studied conserved contact residues between heavy and light chains, particularly interactions between the CDR3 region of one chain and the FWR2 region of the opposite chain. Examination of amino acid pairs at key contact sites revealed deviations in amino acid distributions compared to random pairings, in one chain's CDR3 region contacting the opposite chain, indicating that specific interactions might be crucial for proper chain pairing. This observation is further reinforced by preferential IGHV-IGLJ and IGLV-IGHJ pairing preferences. We hope that both the resources and the findings would contribute to improving the engineering of antibody drugs.
Language: English
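
The abstract reports V gene pairing biases "beyond gene usage frequencies" but does not state how they were measured. One common way to frame such a comparison is to set the observed count of each heavy/light V gene pair against the count expected if the two chains paired independently. The sketch below illustrates that idea only and is not the authors' method: the function name `pairing_enrichment`, the input layout (a list of `(heavy_v, light_v)` tuples), and the toy counts are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def pairing_enrichment(pairs):
    """Return log2(observed / expected) for each observed (heavy_v, light_v) pair.

    Expected counts assume heavy and light V genes pair independently:
    expected = n * P(heavy_v) * P(light_v).
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        return {}

    pair_counts = Counter(pairs)
    heavy_counts = Counter(heavy for heavy, _ in pairs)
    light_counts = Counter(light for _, light in pairs)

    enrichment = {}
    for (heavy, light), observed in pair_counts.items():
        # Marginal usage of each gene fixes the independence baseline.
        expected = heavy_counts[heavy] * light_counts[light] / n
        enrichment[(heavy, light)] = log2(observed / expected)
    return enrichment


# Toy counts, invented for illustration; they are not taken from PairedAbNGS.
toy_pairs = (
    [("IGHV3-23", "IGKV1-39")] * 6
    + [("IGHV3-23", "IGLV2-14")] * 2
    + [("IGHV1-69", "IGKV1-39")] * 2
    + [("IGHV1-69", "IGLV2-14")] * 6
)
scores = pairing_enrichment(toy_pairs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A positive score means the pair occurs more often than gene usage alone would predict, and a negative score means less often. On a real dataset the scores would need a significance test and multiple-testing correction before any pair is called biased.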