Spermatogenesis is the process, unique to males, that produces haploid germ cells from diploid progenitor cells. It includes two sequential meiotic divisions that convert one diploid spermatogonial cell into four haploid cells. Spermiogenesis is the specialized differentiation of haploid round spermatids into the highly specialized sperm cell, the spermatozoon. The function of the sperm is to deliver the paternal genome to the oocyte.
Identification of the protein molecules involved in sperm function, fertilization and early embryo development increases our knowledge of sperm biology, and can be applied in reproductive medicine and in the treatment of some inborn genetic diseases to generate healthier offspring. The importance and easy accessibility of sperm cells have favored the study of their composition and of the mechanisms involved in their differentiation and function (1 - 5). The sperm was one of the first cells whose protein content was studied: the pioneering work of Friedrich Miescher in 1874 led to the isolation and identification of protamine. Recently, mass spectrometry (MS)-based proteomics technology has further contributed to the identification of the proteins that make up spermatozoa (6 - 12).
In this short review we focus on the proteins of the spermatozoa identified by MS proteomics technology. As a methodological approach, we considered for inclusion all articles retrieved from a PubMed search with the keywords "human", "sperm", "spermatozoa" and "spermatozoon", combined with the keywords "proteome", "proteomics" or "mass spectrometry". We analyzed the collected human sperm proteome with the Database for Annotation, Visualization and Integrated Discovery (DAVID) software, focusing in particular on enriched biological themes, gene ontology (GO) terms and enriched functionally related gene groups (13).
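The keyword combination used for the literature search can be written out as a single Boolean query string. The sketch below is only an illustration: the function name `build_query` is hypothetical, and the grouping (the organism term AND any sperm term AND any method term) is an assumption about how the keywords were combined, since the exact query syntax is not stated here.

```python
def build_query(required, cell_terms, method_terms):
    """Join keyword groups into one Boolean search string.

    Assumption: terms inside a group are alternatives (OR) and the
    groups themselves must all match (AND).
    """
    def group(terms):
        # Quote multi-word terms so they are treated as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"

    parts = list(required) + [group(cell_terms), group(method_terms)]
    return " AND ".join(parts)


query = build_query(
    required=["human"],
    cell_terms=["sperm", "spermatozoa", "spermatozoon"],
    method_terms=["proteome", "proteomics", "mass spectrometry"],
)
print(query)
```

Writing the query down in this form makes the search reproducible and makes it easy to see which term combinations were and were not covered.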
Proteome definition: The proteome was originally defined as the protein complement of the genome. However, the definition has changed since the term was introduced by Wilkins et al. in 1995 (14). Today, the term 'proteome' has come to mean: "The proteome of an individual is defined by the sum and the time dynamics of all protein species occurring during the life-time of this individual". This definition covers the expression of each individual protein, its isoforms and its post-translational modifications (15).
Techniques used in human sperm proteome mapping: Several initial reports used MS proteomics technology to identify a limited number of human sperm proteins by two-dimensional gel electrophoresis (2-DE) coupled to MALDI-TOF-MS analysis (16 - 23). An extensive human sperm proteome analysis using 1D-SDS-PAGE combined with electrospray liquid chromatography tandem mass spectrometry (GeLC-MS/MS) identified 1,760 proteins (24). However, no protein list was published. The only far-reaching human sperm proteome analysis available to date is the work of Baker et al. (25), who used GeLC-MS/MS to map 1,056 unique proteins from human spermatozoa. Our literature review of the techniques used for mapping human spermatozoa showed that two studies had used GeLC-MS/MS and eight studies had used 2-DE. To the best of our knowledge, no other techniques, such as the multidimensional protein identification technique (MudPIT) or combined fractional diagonal chromatography (COFRADIC), have been used for proteome profiling of human spermatozoa. A more extensive human sperm proteome could be obtained by combining different MS proteomics techniques, since we have shown that different MS proteomics techniques each identify a unique set of proteins (26).
How many proteins are expressed in human sperm?: One of the big questions in proteome analysis has been how large the human proteome is. The near-complete sequencing of the human genome has yielded total gene estimates that, at first glance, seem surprisingly low: of the order of 30000 open reading frames (27, 28). However, when a gene is expressed it is subject to alternative splicing and its products to post-translational modification. It is estimated that each gene could produce 5 to 6 mRNAs by alternative splicing, and that each of these mRNA species is in turn translated into proteins that are processed in various ways, generating on the order of 8–10 different modified forms of each polypeptide chain. Thus, the human genome may potentially produce on the order of (30000×6×10) 1.8 million different protein species (29). Defining each and every one of these proteins is what global collaborations, such as the Human Proteome Organization (HUPO), are set to undertake.
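The estimate above is a simple product of three factors. As a quick check, the sketch below recomputes it from the figures quoted in the text; the variable names are illustrative only, and the lower bound (taking the smaller value of each range) is our own addition for comparison, not a figure from the cited source.

```python
# Figures quoted in the text (order-of-magnitude estimates).
open_reading_frames = 30_000
mrnas_per_gene = (5, 6)            # alternative splicing, low and high
modified_forms_per_chain = (8, 10) # post-translational processing, low and high

# Upper estimate, as given in the text: 30000 x 6 x 10.
upper = open_reading_frames * mrnas_per_gene[1] * modified_forms_per_chain[1]

# Lower estimate using the smaller value of each range (illustrative).
lower = open_reading_frames * mrnas_per_gene[0] * modified_forms_per_chain[0]

print(f"lower estimate: {lower:,} protein species")
print(f"upper estimate: {upper:,} protein species")
```

Even the lower product exceeds one million protein species, so the conclusion does not depend on which end of each range is used.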
The question of how many proteins the spermatozoon, the most highly differentiated and unique cell type in the human body, contains is often posed in the literature (25, 30). It is of course quite difficult to predict the size of the human spermatozoa proteome from the existing proteomics data, given the current limitations of MS proteomics technology (26, 31-33). However, Baker et al. (30) used the proteomics data currently available for the yeast proteome to predict that human spermatozoa contain 2000-2500 protein species. As Baker et al. also point out, this is much lower than the identified proteome of bovine sperm (~ 4000) (34). They argue, however, that the high number of proteins identified in the bovine sperm proteome is caused by false positive identifications (30).
Collected human sperm proteome analyzed by DAVID: The collected human sperm proteome was functionally categorized based on Gene Ontology (GO) annotation terms using the Database for Annotation, Visualization and Integrated Discovery (DAVID) program package (13, 35 - 37). For any gene or protein list, the DAVID tools are able to identify enriched biological themes, particularly GO terms, discover enriched functionally related gene groups, visualize genes or proteins on BioCarta and KEGG pathway maps, explore gene or protein names in batch, link gene-disease associations, etc. Approximately 1,300 proteins of the human sperm cell, the sum of those identified by the 2-DE and GeLC-MS/MS techniques, were analyzed with DAVID.
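Enrichment tools of this kind report, for each annotation term, the fraction of the submitted list carrying the term together with a p-value. As a rough illustration of what such a p-value measures, the sketch below computes a one-sided hypergeometric (Fisher exact) tail probability. This is a generic textbook test, not DAVID's exact procedure: DAVID's own score is described as a more conservative modification of the Fisher exact test, so its values will differ. The function name and all counts in the example are hypothetical and are not taken from this review.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """Probability of seeing at least k annotated proteins in a list of n,
    drawn without replacement from a background of N proteins of which
    K carry the annotation.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 for impossible draws, so no extra bounds check
    # is needed on the second factor.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts for illustration only: a 1,300-protein list, a
# 20,000-protein background, a pathway with 45 background members of
# which 39 appear in the list.
p = hypergeom_tail(k=39, n=1300, K=45, N=20000)
print(f"enrichment p-value (hypothetical counts): {p:.3g}")
```

A small p-value means that the observed overlap would be very unlikely if the list had been drawn at random from the background, which is why the choice of background population matters for interpreting any enrichment result.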
Biological function of human sperm proteome: Table 1 shows the ten most important categories returned by the DAVID biological function analysis. DAVID was only able to catalogue 793 of the submitted proteins. This means that the biological functions of about 500 proteins in the collected human sperm proteome are still unknown. As shown in Table 1, the largest functional group in the human sperm proteome consists of proteins involved in catabolic processes (16%), including proteins for the breakdown of carbon compounds with the liberation of energy used for sperm movement. DAVID also categorized glucose catabolic processes and oxidative phosphorylation, which are necessary for homeostasis. The table also contains proteins belonging to spermatogenesis (3.6%) and spermiogenesis (0.9%).
Cellular component of human sperm proteome: Table 2 shows the top ten DAVID outputs for the cellular localization of the collected human sperm proteome. The software was able to map 850 of the identified proteins. The remaining proteins submitted to DAVID, around 450, were categorized as having unknown localization.
Surprisingly, the most enriched group in the collected human sperm proteome is the cytoplasm (59%, 7.9E-48), although it is well known that the human sperm loses most of its cytoplasm during spermiogenesis. A large number of proteins were categorized as mitochondrial. This is not surprising, since the midpiece of the human sperm is rich in mitochondria. Additionally, enriched groups belonging to the tail of the human sperm were identified as cytoskeleton (12.6%, 1E-9) and flagellum (1.5%, 6.6E-8).
As shown in Table 2, no transmembrane proteins, which are important for the interaction between oocyte and sperm, were categorized from the collected human sperm proteome. This is probably caused by the MS proteomics techniques used for the proteome mapping of human sperm. It is well known that hydrophobic proteins, such as transmembrane proteins, rarely appear in gel-based techniques (26). Using gel-free techniques, such as MudPIT, should give deeper coverage of the human sperm proteome. However, MudPIT is not a straightforward technique and requires some expertise (38, 39).
Functional categorization of the collected human sperm proteome: The most statistically significant functional annotations returned by DAVID were the acetylated protein (36.7%, 2.2E-89) and phosphoprotein (47.4%, 1.8E-16) groups. To our knowledge, most of the post-translationally modified proteins identified so far were identified using techniques such as 2-DE and GeLC-MS/MS (26, 40). However, the exact function of this large number of post-translational modifications is unknown (personal communication with Baker M, author of the largest human sperm proteome published to date (25)).
Metabolic pathways enriched in the collected human sperm proteome: One of the functions of the DAVID software is to show enriched KEGG pathways. The most significantly enriched pathways in the collected human sperm proteome were proteasome (3%, 2E-22), fatty acid metabolism (1.6%, 3.3E-8), TCA cycle (1.4%, 5.8E-8), glycolysis/gluconeogenesis (1.9%, 7.7E-8) and pyruvate metabolism (1.4%, 1.9E-6). The enrichment of fatty acid and pyruvate metabolism is not surprising, since sperm are under hypoxic conditions.
Sperm: a silent cell?: One of the ongoing discussions in sperm cell biology is whether any protein synthesis takes place in sperm cells (30, 41). Martinez-Heredia et al. (21) identified transcription factor proteins when mapping the human sperm proteome by 2-DE in the pI range 5-8. Additionally, in our DAVID analysis of the collected human sperm proteome data, we were able to localize proteins to the eukaryotic translation elongation factor 1 complex (Table 2). However, these proteins need to be confirmed by Western blotting before it can be concluded that protein synthesis actually takes place in sperm cells.
Although the sperm was one of the first cells whose protein content was analyzed, the number of identified human sperm proteins is still limited compared to other samples, such as the brain proteome (7792 proteins) or the proteome of the human neuroblastoma cell line SH-SY5Y (3707 proteins) (42, 43). Deeper coverage of the human sperm proteome can be obtained using gel-free techniques such as MudPIT or COFRADIC (44, 45). It is well established today that gel-free techniques perform better than gel-based techniques in identifying basic, acidic and hydrophobic proteins (39, 46, 47). Chu et al. (48) used MudPIT technology to identify very basic proteins from the C. elegans sperm proteome that are impossible to identify by gel-based techniques. Additionally, it should be kept in mind that a proteome is much more complex than a genome. The absence of a particular protein from any MS proteomics list does not necessarily mean that it is absent from the spermatozoa of that species. Alternative explanations are that the proteomic coverage was incomplete, that the protein was present in too low abundance, or that the protein in question was missed by chance. Although the human sperm proteome is smaller and less complex than that of other cells, the functions of many of the identified human sperm proteins are still unknown at present. Immunolocalization can readily be used to obtain clues to their function by determining their location within the sperm and the expression pattern of the corresponding proteins. Knockouts, knockdowns and conditional knockdowns should also contribute to the identification of their function. As the sperm proteomes of different species become available, comparison of conserved proteins and domains should also provide important clues to the essential conserved functions and evolution of sperm proteins.
The authors declare no conflict of interest.