How JOINSOLVER® Works
Introduction

JOINSOLVER® was developed specifically to analyze the CDR3 region of the immunoglobulin genes in human B cells. Its strategy is to search for D germline sequences flanked by the VH and JH germline genes. It also searches for P and N type additions in the VHD and DJH junctions. The human D germline gene database employed includes all D segments from the IMGT databank as well as the reverse and DIR germline genes.

Assignment of the 5' boundary of the CDR3H

JOINSOLVER® first interrogates the sequence to find the beginning of the CDR3H region, defined as codon 93. This codon was chosen as the start of the CDR3 based on the results of structural analyses of VHDJH rearrangements (2, 3), as recommended (3-5). To identify this codon, JOINSOLVER® searches for the sequence “TAT TAC TGT”, which comprises codons 90 to 92 of the VH region (after Kabat et al. (1)) and is a conserved motif in most of the human VH germline genes. If an exact “TAT TAC TGT” motif is not found, the search is repeated with one base-pair change allowed in the sequence. If a “TAT TAC TGT” with one nucleotide change is not found either, homologies with the germline genes are used to find the most likely start of the CDR3H region. If the start is still not identified, JOINSOLVER® marks the CDR3H as not found and defers locating the CDR3H region until after V and J matching.

Assignment of the 3' boundary of the CDR3H

After the VH end of the CDR3H is defined, JOINSOLVER® screens for the JH border of the CDR3H. A “C TGG GG” motif demarcates the 3’ end of the CDR3H region and is conserved in all JH sequences. An algorithm similar to the one used for the 5’ boundary locates the “C TGG GG” at the 3’ end of the CDR3H.

Assignment of the VH, D and JH segments

Once the CDR3H region is identified, VH, JH and D assignment is carried out.
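The motif searches that define the two CDR3H boundaries can be illustrated with a short sketch. This is not JOINSOLVER®'s own code: the function names `find_motif` and `locate_cdr3h` and the example sequence are hypothetical, the germline-homology fallback is omitted, and placing the 3’ end just before the “C” of “C TGG GG” is an assumption, since the text does not say whether that base is included.

```python
V_MOTIF = "TATTACTGT"  # codons 90 to 92 of the VH region
J_MOTIF = "CTGGGG"     # conserved motif at the 3' end of the CDR3H


def find_motif(seq, motif, max_mismatches=1):
    """Return (position, mismatches) for the first hit, or None.

    An exact match is tried first; only if none exists is the search
    repeated with one substitution allowed, as the text describes.
    """
    seq, motif = seq.upper(), motif.upper()
    for allowed in range(max_mismatches + 1):
        for i in range(len(seq) - len(motif) + 1):
            window = seq[i:i + len(motif)]
            mismatches = sum(a != b for a, b in zip(window, motif))
            if mismatches <= allowed:
                return i, mismatches
    return None


def locate_cdr3h(seq):
    """Return (start, end) indices of the CDR3H, or None if a motif is missing.

    start is the first base of codon 93. end is the index of the "C" of
    "C TGG GG" and is treated as exclusive here (an assumption).
    The germline-homology fallback is not implemented in this sketch.
    """
    v_hit = find_motif(seq, V_MOTIF)
    if v_hit is None:
        return None
    start = v_hit[0] + len(V_MOTIF)
    j_hit = find_motif(seq[start:], J_MOTIF)
    if j_hit is None:
        return None
    return start, start + j_hit[0]


# Made-up sequence for illustration only: flank + V motif + junction + J motif.
example = ("GACACGGCCGTG" + "TATTACTGT"
           + "GCGAGAGGATACTACTTTGACTA" + "CTGGGGCCAGGGA")
bounds = locate_cdr3h(example)
if bounds is not None:
    print(example[bounds[0]:bounds[1]])
```

Running the exact pass over the whole sequence before the one-mismatch pass ensures that an exact motif further downstream is preferred over an earlier near-match, which is one reading of "the search is reinitiated".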
The V region is matched to a database of germline genes from the “TAT TAC TGT” back 3’→5’ toward the beginning of the sequence, and forward in the 5’→3’ direction to the end of the germline gene. The JH region is matched from the “C TGG GG” back to the beginning of the germline gene and forward until the end of the sequence or the end of the germline gene is reached. The VH and JH regions are scored with an alignment score that assigns +5 to a nucleotide match and -4 to a mismatch between the unknown sequence and the germline (7).

The end of the VH region is identified when the unknown sequence matches the complete VH germline gene or has a mismatch after the “TAT TAC TGT” with the highest scoring VH germline. The beginning of the JH region is defined when the unknown sequence has 1 mismatch before the “C TGG GG” with the highest scoring JH region or the sequence matches the complete JH germline gene.

In the event that the CDR3H was initially not found, JOINSOLVER® looks for matches between the V and J germline databases and the unknown sequence. The unknown sequence is aligned to the highest scoring germline genes. The CDR3H region is then defined as the region from codon 93 to the “C” of the “C TGG GG” motif. The VH end and JH start are defined the same way as if the CDR3H region had been found first.

After VH and JH assignment, D segment assignment is carried out using a consecutive match scoring system. All matches to the D germline genes are scored and sorted based on the VH-JH distance (the distance in nucleotides between the end of the VH segment and the beginning of the JH segment). The longest matches are aligned and returned to the user. A Monte Carlo simulation was used to determine the minimal length of a D segment match necessary to ensure that the match was unlikely to be a random occurrence. As shown in the table below, the match length required for identification depends on the VH-JH distance. Depending on that distance, 7 to 11 consecutively matching base pairs are required before a D segment match is unlikely to have arisen by chance.
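The scoring steps described above can be sketched in a few lines. Everything here is illustrative: the function names, the toy D sequences and the junction are hypothetical and are not real germline genes, the alignment score assumes an ungapped comparison of equal-length strings, and the simulation assumes uniformly random nucleotides because the text does not describe how the original Monte Carlo simulation was set up. The threshold `min_len` is passed in by the caller, since the mapping from VH-JH distance to required match length is not reproduced here.

```python
import random


def alignment_score(query, germline, match=5, mismatch=-4):
    """Ungapped score: +5 per matching base, -4 per mismatch."""
    return sum(match if q == g else mismatch for q, g in zip(query, germline))


def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest run of
    consecutive identical bases shared by a and b."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def assign_d(junction, d_germlines, min_len):
    """List (name, run_length) for D genes whose longest consecutive
    match reaches min_len, longest first."""
    hits = []
    for name, d_seq in d_germlines.items():
        length = longest_common_run(junction, d_seq)[0]
        if length >= min_len:
            hits.append((name, length))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def chance_of_random_match(junction_len, d_germlines, min_len,
                           trials=2000, seed=0):
    """Estimate how often a random junction of the given length matches
    any D gene over at least min_len consecutive bases."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rand = "".join(rng.choice("ACGT") for _ in range(junction_len))
        if any(longest_common_run(rand, d)[0] >= min_len
               for d in d_germlines.values()):
            hits += 1
    return hits / trials


# Toy data, invented for this example only.
toy_d = {
    "D_toy_1": "GTATTACTATGGTTCGGGGAG",
    "D_toy_2": "AGCATATTGTGGTGGTGACTGC",
}
toy_junction = "AGAGG" + "TACTATGGTTCG" + "CCTT"
print(assign_d(toy_junction, toy_d, min_len=7))
print(chance_of_random_match(len(toy_junction), toy_d, min_len=7, trials=500))
```

A longer junction offers more positions at which a short run can match by chance, which is consistent with the required match length depending on the VH-JH distance.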