C-value Paradox / Enigma
- The term C-value refers to the amount of DNA contained within a haploid nucleus of a eukaryotic organism (e.g., the amount in a gamete, or one half the amount in a diploid somatic cell).
- In some cases (notably among diploid organisms), the terms C-value and genome size are used interchangeably; in polyploids, however, the C-value may represent two genomes contained within the same nucleus.
- Greilhuber et al. (2005) have suggested some new layers of terminology and associated abbreviations to clarify this issue, but these somewhat complex additions have yet to be used by other authors. C-values are reported in picograms.
Origin of the term –
- Many authors have incorrectly assumed that the “C” in “C-value” refers to “characteristic”, “content”, or “complement”.
- Even among authors who have attempted to trace the origin of the term, there has been some confusion because Hewson Swift did not define it explicitly when he coined it in 1950.
- In his original paper, Swift appeared to use the designation “1C value”, “2C value”, etc., in reference to “classes” of DNA content (e.g., Gregory 2001, 2002); however, Swift explained in personal correspondence to Prof. Michael D. Bennett in 1975 that “I am afraid the letter C stood for nothing more glamorous than ‘constant’, i.e., the amount of DNA that was characteristic of a particular genotype” (quoted in Bennett and Leitch 2005).
- This is in reference to the report in 1948 by Vendrely and Vendrely of a “remarkable constancy in the nuclear DNA content of all the cells in all the individuals within a given animal species” (translated from the original French).
- Swift’s study of this topic related specifically to variation (or lack thereof) among chromosome sets in different cell types within individuals, but his notation evolved into “C-value” in reference to the haploid DNA content of individual species and retains this usage today.
C-value paradox history –
- In 1948, Roger and Colette Vendrely reported a “remarkable constancy in the nuclear DNA content of all the cells in all the individuals within a given animal species”, which they took as evidence that DNA, rather than protein, was the substance of which genes are composed.
- The term C-value reflects this observed constancy.
- However, it was soon found that C-values (genome sizes) vary enormously among species and that this bears no relationship to the presumed number of genes (as reflected by the complexity of the organism).
- For example, the cells of some salamanders may contain 40 times more DNA than those of humans.
- Given that C-values were assumed to be constant because DNA is the stuff of genes, and yet bore no relationship to presumed gene number, this was understandably considered paradoxical; the term C-value paradox was used to describe this situation by C.A. Thomas, Jr. in 1971.
- The discovery of non-coding DNA in the early 1970s resolved the C-value paradox.
- It is no longer a mystery why genome size does not reflect gene number in eukaryotes: most eukaryotic (but not prokaryotic) DNA is non-coding and therefore does not consist of genes, and as such total DNA content is not determined by gene number in eukaryotes.
- The human genome, for example, comprises only about 1.5% protein-coding genes, with the other 98.5% being various types of non-coding DNA (especially transposable elements) (International Human Genome Sequencing Consortium 2001).
- It is unclear why some species have far more non-coding sequence than others of the same level of complexity. Non-coding DNA may have many functions yet to be discovered.
- Although it is now known that only a fraction of the genome consists of genes, the question of why genome size varies so widely among species remains unsolved.
- The term “C-value enigma” represents an update of the more common but outdated term “C-value paradox” (Thomas 1971), being ultimately derived from the term “C-value” (Swift 1950) in reference to haploid nuclear DNA contents.
- The term was coined by Canadian biologist Dr. T. Ryan Gregory of the University of Guelph in 2000/2001.
- In general terms, the C-value enigma relates to the issue of variation in the amount of non-coding DNA found within the genomes of different eukaryotes.
The C-value enigma, unlike the older C-value paradox, is explicitly defined as a series of independent but equally important component questions, including:
- What types of non-coding DNA are found in different eukaryotic genomes, and in what proportions?
- From where does this non-coding DNA come, and how is it spread and/or lost from genomes over time?
- What effects, or perhaps even functions, does this non-coding DNA have for chromosomes, nuclei, cells, and organisms?
- Why do some species exhibit remarkably streamlined chromosomes, while others possess massive amounts of non-coding DNA?
Variation among species –
- C-values vary enormously among species. In animals they range more than 3,300-fold, and in land plants they differ by a factor of about 1,000 (Bennett and Leitch 2005; Gregory 2005).
- Protist genomes have been reported to vary more than 300,000-fold in size, but the high end of this range (Amoeba) has been called into question.
- Variation in C-values bears no relationship to the complexity of the organism or the number of genes contained in its genome, an observation that was deemed wholly counterintuitive before the discovery of non-coding DNA and which became known as the C-value paradox as a result.
- However, although there is no longer any paradoxical aspect to the discrepancy between C-value and gene number, this term remains in common usage.
- For conceptual clarity, it has been suggested that the various puzzles that remain with regard to genome size variation are more accurately described as a complex but clearly defined puzzle known as the C-value enigma.
- C-values correlate with a range of features at the cell and organism levels, including cell size, cell division rate, and, depending on the taxon, body size, metabolic rate, developmental rate, organ complexity, geographical distribution, and/or extinction risk.
Calculating C-values –
- By using the data in Table 5.1, relative weights of nucleotide pairs can be calculated as follows: AT = 615.3830 and GC = 616.3711.
- Provided the ratio of AT to GC pairs is 1:1, the mean relative weight of one nucleotide pair is 615.8771 (±1%).
- The relative molecular weight may be converted to an absolute value by multiplying it by the atomic mass unit (1 u), which equals one-twelfth of the mass of a ¹²C atom, i.e., 1.660539 × 10⁻²⁷ kg. Consequently, the mean weight of one nucleotide pair would be 1.023 × 10⁻⁹ pg, and 1 pg of DNA would represent 0.978 × 10⁹ base pairs.
- The formulas for converting the number of nucleotide pairs (or base pairs) to picograms of DNA and vice-versa are:
- Genome size (bp) = (0.978 × 10⁹) × DNA content (pg)
- DNA content (pg) = genome size (bp) / (0.978 × 10⁹)
- 1 pg = 978 Mb
- The current estimates for human female and male diploid genome sizes are 6.406 × 10⁹ bp and 6.294 × 10⁹ bp, respectively.
- By using the conversion formulas given above, diploid human female and male nuclei in G1 phase of the cell cycle should contain 6.550 and 6.436 pg of DNA, respectively.
- The C-value paradox is the phenomenon that, frequently, C-values do not correlate with the evolutionary complexity of species; they are large in some small organisms.
- This is presumably because sizeable portions of the DNA do not code for proteins and either have regulatory or other functions or are functionless.
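The conversion between base pairs and picograms described under "Calculating C-values" can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function and constant names are hypothetical, the nucleotide-pair weights and the atomic mass unit are copied from the text above, and a 1:1 ratio of AT to GC pairs is assumed by default.

```python
# Values quoted in the text (relative weights of nucleotide pairs).
AT_PAIR_WEIGHT = 615.3830            # A-T pair, relative molecular weight
GC_PAIR_WEIGHT = 616.3711            # G-C pair, relative molecular weight
ATOMIC_MASS_UNIT_KG = 1.660539e-27   # 1 u in kilograms, as given in the text
PG_PER_KG = 1e15                     # 1 kg = 10^3 g = 10^15 pg

# Rounded conversion factor used in the text's formulas (bp per pg).
BP_PER_PG = 0.978e9


def mean_pair_weight_pg(at_fraction=0.5):
    """Mean mass of one nucleotide pair in pg.

    at_fraction is the assumed share of A-T pairs (0.5 means AT:GC = 1:1).
    """
    relative = at_fraction * AT_PAIR_WEIGHT + (1.0 - at_fraction) * GC_PAIR_WEIGHT
    return relative * ATOMIC_MASS_UNIT_KG * PG_PER_KG


def pg_to_bp(dna_pg, bp_per_pg=BP_PER_PG):
    """Genome size (bp) = (0.978 x 10^9) x DNA content (pg)."""
    return dna_pg * bp_per_pg


def bp_to_pg(genome_bp, bp_per_pg=BP_PER_PG):
    """DNA content (pg) = genome size (bp) / (0.978 x 10^9)."""
    return genome_bp / bp_per_pg


if __name__ == "__main__":
    # Mass of one nucleotide pair and the number of pairs in 1 pg of DNA.
    pair_pg = mean_pair_weight_pg()
    print(f"mean pair weight: {pair_pg:.3e} pg")
    print(f"pairs per pg:     {1.0 / pair_pg:.3e}")

    # Diploid human genome sizes quoted in the text.
    print(f"female 2C: {bp_to_pg(6.406e9):.3f} pg")
    print(f"male 2C:   {bp_to_pg(6.294e9):.3f} pg")
```

With the diploid genome sizes quoted above, `bp_to_pg` should reproduce the 6.550 pg and 6.436 pg figures. The conversion factor is left as a parameter so that a value based on a different AT:GC ratio can be substituted.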
|Table 5.1: Relative Molecular Weights of Nucleotides|
|Nucleotide||Chemical formula||Relative molecular weight|