I run predictions for the HLA-A, HLA-B, and HLA-C genes using genotypes of 7,103 individuals derived from three different array platforms: GSA v3, Oncoarray, and Omni2.5Exome. These genotypes include a 500 kb flanking region and are mapped to the hg19 assembly. I used the corresponding pre-fit classifier (or the closest available choice, v2, in the case of GSA). To add an extra level of certainty to the resulting HLA predictions, I run every genotype through every other platform's pre-fit classifier to check whether the best results come from the matching platform. I also used the four race-specific pre-fit classifiers mentioned in the original paper. Below is a representative example of the R code I used for each combination of platform and gene:

```r
library(HIBAG)

# Choose the pre-fit classifier we will use
model_name <- "European-HLA4-hg19.RData"

# Load the model list from the specified file
mlst <- get(load(model_name))

# Load HLA-A genotyping data
geno <- hlaBED2Geno("hla_A.bed", "hla_A.fam", "hla_A.bim")

# Load the HLA-A pre-fit classifier into memory
model <- hlaModelFromObj(mlst$A)

# Run the prediction
hla_a <- hlaPredict(model, geno, cl=8)  # use 8 threads for parallel computation
```

Here I present a table showing the mean of the highest probability scores for every pre-fit classifier (grouped into four categories, "4C") across every analysis platform (array_platform):
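The cross-platform sweep described above can be sketched roughly as follows. This is only a sketch: the classifier file names are placeholders for whatever pre-fit models you downloaded, and `prob`/`matching` are the columns HIBAG returns in the prediction's `value` data frame:

```r
library(HIBAG)

# Placeholder classifier file names -- substitute the models from your own setup
model_files <- c(
    GSA       = "GSA-HLA4-hg19.RData",
    Oncoarray = "Oncoarray-HLA4-hg19.RData",
    Omni      = "Omni25Exome-HLA4-hg19.RData")

# Genotypes for one platform, loaded once
geno <- hlaBED2Geno("hla_A.bed", "hla_A.fam", "hla_A.bim")

# Predict HLA-A with every platform's classifier and summarize the mean
# best posterior probability and the mean matching proportion
results <- lapply(model_files, function(fn) {
    mlst  <- get(load(fn))
    model <- hlaModelFromObj(mlst$A)
    pred  <- hlaPredict(model, geno, cl=8)
    c(mean_best_prob = mean(pred$value$prob),
      mean_matching  = mean(pred$value$matching))
})
do.call(rbind, results)  # one row per classifier
```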
Then, the same approach, but for the matching scores: As you can see, on average, the best probability scores match their corresponding array platforms, which aligns with expectations. However, the matching-score results are quite intriguing to me and prompt the following questions:
Q: Do these matching scores make any sense? Are these matching scores within the normal range for this kind of data? What does "matching score" really mean?

Q: If a probability score is better with a different platform than the corresponding one, which HLA prediction should be prioritized?

Q: Considering the ethnic background of a particular individual (e.g., African or Hispanic), how should this influence our choice? Suppose we obtain better probability results with the paper's race-specific pre-fit classifiers than with the array-specific multi-ethnic ones. How should this impact our decision-making process?

In your data, it is suggested to aggregate the prediction results from the three array platforms (GSA, Omni2.5, Oncoarray), using …
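On the aggregation point, one option (a sketch, not necessarily the exact workflow the truncated sentence was about to describe) is HIBAG's `hlaPredMerge()`, which combines predictions from multiple classifiers by averaging their posterior probabilities. Each input must be produced with `type="response+prob"` so the probability table is retained; the object names below are hypothetical:

```r
library(HIBAG)

# pred_gsa, pred_onco, pred_omni: hlaPredict() results for the same samples,
# each run with type="response+prob" so the posterior probability table is kept
# (hypothetical object names)
merged <- hlaPredMerge(pred_gsa, pred_onco, pred_omni)

# merged$value holds the consensus best-guess alleles and their
# averaged posterior probabilities
head(merged$value)
```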
Q: Do these matching scores make any sense? Are these matching scores within the normal range for this kind of data? What does "matching score" really mean?
A: Yes, the matching scores make sense here. Internally, a missing genotype is counted as a match against any pair of SNP alleles, so you will see higher matching scores when the SNP overlap between the array-specific model and the tested platform is lower.
"Matching" is a measure of how well the observed SNP profile matches the haplotypes observed in the training set (so missing SNP genotypes "always" match any pair of SNP alleles with higher probabilities). The matching proportion is not directly related to confidence…
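This overlap effect can be checked directly. A minimal sketch, assuming `model` is a loaded `hlaAttrBagClass` classifier and `geno` is the tested platform's `hlaSNPGenoClass` object, using HIBAG's `hlaSNPID()`:

```r
library(HIBAG)

# Fraction of the classifier's SNPs actually typed on the tested array;
# the lower this is, the more genotypes are missing at prediction time
# and the more inflated the matching score tends to be
model_snps <- hlaSNPID(model)
geno_snps  <- hlaSNPID(geno)
overlap <- length(intersect(model_snps, geno_snps)) / length(model_snps)
overlap
```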