Introduction
Infertility is a significant global health issue, affecting approximately 15% of couples worldwide (1). Clinical infertility is defined as the inability of a couple to conceive after 12 months of unprotected intercourse (2). Male factors account for around half of all infertility cases (1). Teratozoospermia, characterized by abnormal sperm morphology, is a leading cause of male infertility and often involves defects in the sperm head and tail (3). Spermatogenesis is a complex and dynamic process that depends on accurate gene expression at each developmental stage (4). Understanding the molecular signaling pathways underlying spermatogenesis is critical for developing interventions to address infertility (5). Protein kinases, which share a conserved catalytic domain, transfer phosphate groups from ATP to amino acid residues that carry a free hydroxyl group, most often serine and threonine (serine/threonine kinases). This phosphorylation is vital for proper protein function in sperm (6, 7). Understanding the role of kinases is therefore crucial for developing targeted therapeutic strategies for male infertility. Knowledge of differential gene expression between normal men and patients with male infertility is essential for understanding the molecular basis of the condition, and microarray technology is a powerful tool for detecting such expression changes (8). This study addresses a critical knowledge gap in male infertility research. Although previous studies have examined individual kinases, a comprehensive, integrated transcriptomic analysis focused specifically on kinase gene expression alterations in teratozoospermia is lacking. To fill this gap, three publicly available transcriptomic datasets comparing teratozoospermia and normozoospermia samples were analyzed to identify differentially expressed kinase genes.
It was hypothesized that dysregulated expression of specific kinases contributes to the abnormal sperm morphology, characteristic of teratozoospermia. Furthermore, the biological pathways involving these kinases were explored and their potential as diagnostic markers associated with male infertility was evaluated.
Methods
Data collection: The Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) was searched using the following keywords: "teratozoospermia", "homo sapiens", and "expression profiling by array". Datasets were selected based on the following criteria: (i) inclusion of both teratozoospermia and normozoospermia samples with at least 5 samples per group; (ii) high-quality metadata describing sperm morphology defects; (iii) human-specific expression profiling by microarray; and (iv) availability of raw data for reprocessing. After an extensive search, three GSE profiles (GSE6967, GSE6968, and GSE6872) belonging to the GSE6969 superseries were selected. These datasets contained teratozoospermia samples, and transcript expression was profiled using the GPL2507 (Sentrix Human-6 Expression BeadChip), GPL2700 (Sentrix HumanRef-8 Expression BeadChip), and GPL570 ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array) platforms. The KinHub database (http://www.kinhub.org/kinases.html) was used to compile a comprehensive list of human kinase genes.
Microarray data processing: Data processing and integration were performed using the R statistical programming language. After the three GEO datasets were combined, the ComBat method from the sva package was used to correct for batch effects (9). To address platform heterogeneity beyond batch correction, probe-to-gene mapping was performed using platform-specific annotation files (Bioconductor annotation packages for Affymetrix and Illumina arrays), ensuring consistent gene-level summarization across datasets. Principal component analysis (PCA) and boxplots were used to confirm the elimination of batch effects (10). Given the modest sample size (n=22 teratozoospermia vs. n=22 normozoospermia), stringent statistical thresholds were applied and multiple datasets were integrated to enhance statistical power, although larger cohorts would further strengthen the findings. As the final output, a unified expression matrix was generated by combining data from the three datasets.
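The idea behind batch correction can be illustrated with a much simpler adjustment than the one used in this study. The sketch below is written in Python for illustration only (the analysis itself was performed in R) and centers one gene's log2 expression values within each batch. It is not ComBat, which additionally adjusts scale and shrinks the batch estimates with an empirical Bayes model; the function name `center_by_batch` and the expression values are hypothetical.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Remove per-batch mean shifts from one gene's log2 expression values.

    Location-only adjustment for illustration; ComBat also adjusts scale
    and uses empirical Bayes shrinkage, which is not reproduced here.
    """
    overall = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    # Re-add the overall mean so values stay on the original scale.
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Hypothetical log2 expression of one gene measured on two platforms.
values = [8.0, 8.4, 10.0, 10.4]
batches = ["GPL570", "GPL570", "GPL2507", "GPL2507"]
adjusted = center_by_batch(values, batches)
```

After this adjustment both batches share the same mean, so the platform offset no longer dominates the variation. Plain centering of this kind can also remove biological signal when the sample groups are unevenly distributed across batches, which is one reason model-based methods such as ComBat are generally preferred.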
Identification of differentially expressed genes: Differentially expressed genes (DEGs) were extracted from the unified expression matrix by comparing "teratozoospermia" and "normozoospermia" sample groups using the R package limma (11). A threshold of |log2 fold change| ≥ 1 and an adjusted p-value < 0.01 were applied to determine statistical significance. DEGs belonging to the kinase gene family were prioritized in this study. The Venny 2.0 tool (https://bioinfogp.cnb.csic.es/tools/venny/index2.0.2.html) was used to intersect the kinase gene list from the KinHub database with the set of DEGs.
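The thresholding and intersection steps described above can be sketched as follows. This is a minimal Python illustration, not the limma or Venny workflow itself: it assumes a table of per-gene log2 fold changes and adjusted p-values is already available, and the gene names, values, and the helper `select_degs` are hypothetical.

```python
def select_degs(results, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Return {gene: direction} for genes passing both thresholds.

    `results` maps gene -> (log2 fold change, adjusted p-value).
    """
    degs = {}
    for gene, (log_fc, adj_p) in results.items():
        if abs(log_fc) >= lfc_cutoff and adj_p < padj_cutoff:
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs


# Hypothetical differential expression results (not data from this study).
results = {
    "KIN_A": (1.8, 0.0004),
    "KIN_B": (-1.3, 0.002),
    "GENE_C": (2.1, 0.0001),
    "KIN_D": (0.6, 0.0001),   # fails the fold change cutoff
    "KIN_E": (-1.5, 0.03),    # fails the adjusted p-value cutoff
}
# Hypothetical stand-in for the KinHub kinase gene list.
kinase_list = {"KIN_A", "KIN_B", "KIN_D", "KIN_E"}

degs = select_degs(results)
# Equivalent of the Venn intersection: DEGs that are also kinases.
deg_kinases = {g: d for g, d in degs.items() if g in kinase_list}
```

Here `deg_kinases` keeps only the genes that satisfy both criteria and appear in the kinase list, together with their direction of change.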
Validation of DEGs: Receiver operating characteristic (ROC) analysis was performed to assess the diagnostic value of gene expression differences between teratozoospermia and normozoospermia samples (12). To ensure robustness, ROC curves were generated using leave-one-out cross-validation (LOOCV) on the unified dataset, and the area under the ROC curve (AUC) was calculated for each candidate gene to evaluate its diagnostic potential. This statistical analysis was conducted using Prism software (version 9.1.0; GraphPad, US) (13).
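For a single gene, the AUC has a simple interpretation: it is the probability that a randomly chosen teratozoospermia sample has a higher expression value than a randomly chosen normozoospermia sample. The Python sketch below computes this quantity directly from pairwise comparisons. It is an illustration under that definition only and does not reproduce the Prism analysis or the cross-validation step; the expression values are hypothetical.

```python
def auc(cases, controls):
    """AUC as the fraction of case/control pairs in which the case value
    is higher than the control value (ties count as one half)."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical log2 expression values of one gene.
terato = [9.1, 8.7, 9.5, 8.2]
normo = [7.9, 8.4, 8.0, 8.8]

auc_up = auc(terato, normo)
# For a downregulated gene the raw value falls below 0.5, so the
# discriminatory power is usually reported as max(AUC, 1 - AUC).
auc_reported = max(auc_up, 1.0 - auc_up)
```

An AUC of 0.5 corresponds to no discrimination between the groups, and values approaching 1 indicate that expression of the gene separates the two groups almost perfectly.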
Classification, gene ontology, and pathway enrichment analyses: The list of DEG kinases was cross-referenced against the KinBase database (http://kinase.com/web/current/kinbase/), a curated repository of eukaryotic protein kinase sequences and classifications. Pathway and enrichment analyses were conducted to identify biological mechanisms associated with the differentially expressed kinase genes. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used for pathway analysis, while gene ontology (GO) analysis examined enriched biological process (BP), cellular component (CC), and molecular function (MF) terms. The GO terms were visualized using the adjusted p-value, the count of participating genes, and the Enrichr combined score. The Enrichr and SRplot tools were used to perform the pathway and enrichment analyses on the DEG kinases (14, 15).
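Over-representation analysis of this kind asks whether a gene list overlaps a pathway or GO term more than expected by chance. A minimal Python sketch of the underlying hypergeometric (one-sided Fisher exact) test is given below. It illustrates the general principle only and does not reproduce Enrichr's scoring, including the combined score; the background size, term size, and overlap in the example are assumed values, not results from this study.

```python
from math import comb


def overrepresentation_p(overlap, list_size, term_size, background):
    """P(X >= overlap) when `list_size` genes are drawn at random from
    `background` genes, of which `term_size` belong to the term."""
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Assumed example: 34 input genes, a 300-gene pathway, 5 genes shared,
# and a background of 20,000 genes.
p_value = overrepresentation_p(overlap=5, list_size=34,
                               term_size=300, background=20000)
```

In practice the p-values for all tested terms are then adjusted for multiple testing before a threshold is applied, and the choice of background gene set can change the result substantially.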
Results
Data collection and expression analysis: The overall analysis pipeline is summarized in Figure 1. The details of the three GEO datasets used in this study are provided in Table 1. A total of 536 human kinase genes were identified from the KinHub database. The integrated analysis of the three GEO datasets identified 1,292 DEGs between teratozoospermia (n=22) and normozoospermia (n=22) samples. By comparing the DEGs with the list of human kinase genes, 34 differentially expressed kinase genes were identified (10 upregulated and 24 downregulated) (Tables 2A and 2B). In Table 2, "LogFC" refers to the log2 fold change in gene expression, representing the ratio of expression levels between teratozoospermia and normozoospermia samples. A positive LogFC indicates upregulation of the gene in teratozoospermia, while a negative value indicates downregulation compared to normozoospermia. Among the differentially expressed kinase genes, ROR1 displayed the most significant upregulation (LogFC=2.89, adjusted p=0.00000000014), whereas STK39 showed the most significant downregulation (LogFC=-2.12, adjusted p=0.000000000018). Both kinases also demonstrated strong diagnostic value, with AUC values of 0.96 and 0.98 for ROR1 and STK39, respectively. These high absolute LogFC values and low p-values highlight ROR1 and STK39 as potential key drivers of kinase-mediated defects in sperm morphology, consistent with prior reports of their roles in cellular signaling pathways relevant to spermatogenesis.
ROC analysis: ROC analysis evaluated the accuracy of the selected DEG kinases in predicting diagnostic status. The expression levels of the 34 differentially expressed kinase genes showed significant diagnostic value, as indicated by their AUC values (Table 2). High AUC scores (>0.9) across most genes indicate robust discriminatory power, suggesting these kinases could serve as reliable transcriptomic markers for early teratozoospermia detection in clinical semen analysis (Figure 2).
Classification and validation of DEG kinases: Table 3 provides a detailed breakdown of the various groups and families represented within the set of identified DEG kinases according to the KinBase database. Specifically, the DEG kinases were found to span multiple major kinase groups, including the MAP kinase cascades, calcium/calmodulin dependent protein kinases, casein kinase 1, tyrosine kinase, and tyrosine kinase-like groups. Within these broader groups, the kinases were further classified into families. This broad representation underscores the involvement of diverse signaling cascades in teratozoospermia, where disruptions in MAP kinase or tyrosine kinase families may impair sperm motility and acrosome reaction, as evidenced in related infertility studies.
Gene ontology and KEGG pathway enrichment analysis: The Enrichr database identified enriched KEGG pathways and gene ontology terms associated with the differentially expressed kinase genes, using an adjusted p-value threshold of < 0.01. The top 10 most significant GO terms and enriched pathways, ranked by the number of genes involved, are presented as bubble plots for the enriched GO terms (BP, CC, and MF) and chord plots for the KEGG pathways, as shown in Figures 3 and 4, respectively. The significant enrichment of specific KEGG pathways and GO terms suggests that altered kinase activity may directly contribute to the intracellular signaling disruptions seen in teratozoospermia. Notably, enrichments in pathways such as MAPK signaling and GO terms related to cytoskeletal organization provide mechanistic insights into how kinase dysregulation could lead to abnormal sperm head and tail structures, linking transcriptomic changes to phenotypic outcomes.
Discussion
The main goal of this study was to identify kinases potentially involved in teratozoospermia by integrating multiple gene expression datasets. Overall, 34 differentially expressed kinases were identified, and these were linked to critical signaling pathways that regulate spermatogenesis and sperm function. Spermatogenesis is a complex process requiring the coordinated expression of numerous genes; disruptions in these genes can lead to morphological sperm abnormalities and male infertility (16-18). Phosphorylation serves as a key regulatory mechanism within signaling pathways that control transcription, cell cycle progression, and apoptosis (6, 19). Several kinases are known to play essential roles in sperm development and function. For instance, TSSK3 is involved in spermiogenesis and sperm tail formation, FYN kinase affects sperm head and acrosome shaping, and mutations in TSSK1B have been linked to asthenoteratozoospermia (20-23). Understanding how kinase-mediated signaling pathways influence sperm morphology could open avenues for novel diagnostic and personalized therapeutic strategies.
The 34 kinases identified in our study participate in major biological pathways, notably GnRH, Hedgehog, TGF-β, MAPK, and Wnt signaling. These pathways collaboratively regulate critical stages of spermatogenesis. For example, GnRH signaling via ERK1/2 is vital for gonadotrope differentiation, Hedgehog signaling controls germline development, and MAPK governs spermatogonial renewal and sperm maturation. Wnt signaling also contributes to post-transcriptional regulation of sperm maturation, with disruptions leading to infertility characterized by malformed and immotile sperm (24-30). Our findings enhance understanding of the complex signaling disruptions underlying teratozoospermia. By mapping dysregulated kinases to these pathways, a framework for future research is provided to explore the regulation of gene expression and functional consequences for sperm development. These insights may guide the development of targeted diagnostics or personalized treatments.
Despite new insights, this study has key limitations. The small sample size may limit statistical power and generalizability. Using public datasets and different microarray platforms introduces variability and potential confounding. mRNA levels may not reflect the actual protein function, requiring proteomic validation. Functional roles of kinases need experimental confirmation. Also, transcript data from testicular biopsies may not perfectly represent mature sperm function. Future larger, proteomic, and functional studies are needed to validate and build on these findings.
Conclusion
This study identified 34 differentially expressed kinases associated with teratozoospermia, implicating major signaling pathways such as GnRH, Hedgehog, TGF-β, MAPK, and Wnt in the pathogenesis of abnormal sperm morphology. While these findings provide promising candidates for understanding molecular mechanisms, their clinical utility remains to be validated through rigorous experimental and clinical studies. Future research should focus on validating our integrative transcriptomic findings by using immunohistochemistry, qPCR, and Western blot analyses in human testicular tissue samples. These methodologies will enable the assessment of protein expression, localization, and phosphorylation status, providing critical functional insights into kinase involvement in teratozoospermia and translating these insights into diagnostic or therapeutic applications.
Acknowledgement
The authors would like to express their sincere gratitude to the Department of Genetics at Royan Institute for Reproductive Biomedicine, Tehran, Iran and Headquarters for Development of Stem Cell Sciences and Technologies, Vice Presidency for Science and Technology, Presidential Administration, Tehran, Iran, and National Institute for Medical Research Development (NIMAD), Tehran, Iran for their support and contribution to this research. Their expertise and resources were instrumental in the successful completion of this study.
Conflict of Interest
The authors declare that they have no competing interests.