Gene Regulation in Eukaryotes

In the absence of precise information about the mechanisms that regulate gene expression in eukaryotes, many models were proposed.
One of the more popular early models, known as the Britten-Davidson model or gene battery model, was proposed by R.J. Britten and E.H. Davidson in 1969.
Although widely accepted, it remains a theoretical model that lacks sound experimental proof.
The model predicts the presence of four types of sequences.

Producer gene –

It is comparable to a structural gene in prokaryotes.
It produces pre-mRNA, which after processing becomes mRNA.
Its expression is under the control of many receptor sites.

Receptor site (gene) –

It is comparable to the operator in a bacterial operon.
At least one such receptor site is assumed to be present adjacent to each producer gene.
A specific receptor site is activated when a specific activator RNA or an activator protein, a product of integrator gene, complexes with it.

Integrator gene –

An integrator gene is comparable to a regulator gene and is responsible for the synthesis of an activator RNA molecule, which may activate the receptor site without first being translated into protein.
At least one integrator gene is present adjacent to each sensor site.

Sensor site –

A sensor site regulates activity of an integrator gene which can be transcribed only when the sensor site is activated.
The sensor sites are also regulatory sequences that respond to external stimuli, e.g. hormones or temperature.
According to the Britten-Davidson model, specific sensor genes represent sequence-specific binding sites (similar to the CAP-cAMP binding site in E. coli) that respond to a specific signal.
When sensor genes receive the appropriate signals, they activate the transcription of the adjacent integrator genes.
The integrator gene products will then interact in a sequence-specific manner with receptor genes.
Britten and Davidson proposed that the integrator gene products are activator RNAs that interact directly with the receptor genes to trigger the transcription of the contiguous producer genes.

It is also proposed that receptor sites and integrator genes may be repeated a number of times so as to control the activity of a large number of genes in the same cell.
Repetition of receptor sites ensures that the same activator recognizes all of them, so that several enzymes of one metabolic pathway are synthesized simultaneously.
Transcription of the same gene may be needed in different developmental stages.
This is achieved by the multiplicity of receptor sites and integrator genes.
Each producer gene may have several receptor sites, each responding to one activator.
Thus, though a single activator can recognize several genes, different activators may activate the same gene at different times.
A set of structural genes controlled by one sensor site is termed a battery.
Sometimes when major changes are needed, it is necessary to activate several sets of genes.
If one sensor site is associated with several integrators, it may cause transcription of all of them simultaneously, thus causing transcription of several producer genes through their receptor sites.
The repetition of integrator genes and receptor sites is consistent with reports that eukaryotic cells contain substantial amounts of repeated DNA.
The most attractive feature of the Britten and Davidson model is that it provides a plausible reason for the observed pattern of interspersion of moderately repetitive DNA sequences and single copy DNA sequences.
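
The flow of control in the model (signal, sensor site, integrator gene, activator, receptor site, producer gene) can be illustrated with a small sketch. This is only a toy representation of the hypothesis, not a simulation of real gene regulation: the class names, the function `active_genes`, and all signal, activator and gene names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SensorSite:
    signal: str        # external stimulus the sensor responds to (hypothetical name)
    integrators: list  # activators made by the integrator genes next to this sensor


@dataclass
class ProducerGene:
    name: str
    receptors: set     # activators recognized by this gene's receptor sites


def active_genes(signals, sensors, producers):
    """Return the names of producer genes transcribed for a given set of signals."""
    # A sensor site that receives its signal switches on its integrator genes,
    # which produce their activators.
    activators = {a for s in sensors if s.signal in signals for a in s.integrators}
    # A producer gene is transcribed when at least one receptor site binds an activator.
    return sorted(g.name for g in producers if g.receptors & activators)


# Hypothetical example: two sensor sites, three producer genes.
sensors = [
    SensorSite("hormone_A", ["act1"]),
    SensorSite("hormone_B", ["act2", "act3"]),  # one sensor, several integrators
]
producers = [
    ProducerGene("enzyme1", {"act1"}),
    ProducerGene("enzyme2", {"act1", "act2"}),  # several receptor sites on one gene
    ProducerGene("enzyme3", {"act3"}),
]

print(active_genes({"hormone_A"}, sensors, producers))  # ['enzyme1', 'enzyme2']
print(active_genes({"hormone_B"}, sensors, producers))  # ['enzyme2', 'enzyme3']
```

In this sketch, `enzyme2` is switched on by either signal because it carries two receptor sites, while `hormone_B` activates a whole battery of genes because its sensor site is associated with two integrators.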

Direct evidence indicates that most structural genes are indeed single copy DNA sequences.
The adjacent moderately repetitive DNA sequences would contain the various kinds of regulator genes (sensor, integrator and receptor genes).
The latest estimates are that a human cell, a eukaryotic cell, contains 20,000–25,000 genes.

Some of these are expressed in all cells all the time. These so-called housekeeping genes are responsible for the routine metabolic functions (e.g. respiration) common to all cells.
Some are expressed as a cell enters a particular pathway of differentiation.
Some are expressed all the time in only those cells that have differentiated in a particular way. For example, a plasma cell expresses continuously the genes for the antibody it synthesizes.
Some are expressed only as conditions around and in the cell change. For example, the arrival of a hormone may turn on (or off) certain genes in that cell.

How is gene expression regulated?

There are several methods used by eukaryotes.

Altering the rate of transcription of the gene. This is the most important and widely used strategy and the one we shall examine here.
However, eukaryotes supplement transcriptional regulation with several other methods:
- Altering the rate at which RNA transcripts are processed while still within the nucleus.
- Altering the stability of mRNA molecules; that is, the rate at which they are degraded.
- Altering the efficiency with which the ribosomes translate the mRNA into a polypeptide.

Protein-coding genes have

exons whose sequence encodes the polypeptide;
introns that will be removed from the pre-mRNA before it is translated;
a transcription start site;
a promoter, made up of
- the basal or core promoter located within about 40 bp of the start site
- an “upstream” promoter, which may extend over as many as 200 bp farther upstream
enhancers;
silencers.

Adjacent genes (RNA-coding as well as protein-coding) are often separated by an insulator, which helps them avoid cross-talk between each other’s promoters and enhancers (and/or silencers).

Transcription start site

This is where a molecule of RNA polymerase II (pol II, also known as RNAP II) binds. Pol II is a complex of 12 different proteins (shown in the figure in yellow with small colored circles superimposed on it).
The start site is where transcription of the gene into RNA begins.

Figure 2.44 : Eukaryotic promoter with TFIID

The basal promoter

The basal promoter (Figure 2.44) contains a sequence of 7 bases (TATAAAA) called the TATA box. It is bound by a large complex of some 50 different proteins, including
Transcription Factor IID (TFIID), which is a complex of
- TATA-binding protein (TBP), which recognizes and binds to the TATA box
- 14 other protein factors, which bind to TBP and to each other but not to the DNA;
Transcription Factor IIB (TFIIB), which binds both the DNA and pol II.
The basal or core promoter is found in all protein-coding genes.
This is in sharp contrast to the upstream promoter whose structure and associated binding factors differ from gene to gene.
Although the figure is drawn as a straight line, the binding of transcription factors to each other probably draws the DNA of the promoter into a loop.
Many different genes and many different types of cells share the same transcription factors – not only those that bind at the basal promoter but even some of those that bind upstream (Figure 2.45).
What turns on a particular gene in a particular cell is probably the unique combination of promoter sites and the transcription factors that are chosen.
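
As an illustration of the basal promoter described above, the following sketch scans a sequence for the 7-base TATA box within about 40 bp upstream of a known start site. It is a deliberate simplification: real TATA boxes vary around this consensus, so an exact string match is only an assumption made for the example. The function name `find_tata_boxes`, its parameters, and the test sequence are all hypothetical.

```python
def find_tata_boxes(sequence, start_site, window=40, motif="TATAAAA"):
    """Return 0-based positions of `motif` lying entirely within `window` bp upstream of `start_site`."""
    sequence = sequence.upper()
    lo = max(0, start_site - window)
    region = sequence[lo:start_site]  # the stretch just upstream of the start site
    hits = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(lo + pos)         # convert back to coordinates of the full sequence
        pos = region.find(motif, pos + 1)
    return hits


# Hypothetical 60-bp sequence; transcription is assumed to start at index 50.
seq = "GC" * 10 + "TATAAAA" + "GC" * 11 + "A" + "CGTCGTCGTC"

print(find_tata_boxes(seq, start_site=50))  # [20]
```

Here the motif begins at index 20, that is, 30 bp upstream of the assumed start site, which falls inside the 40-bp window.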

Figure 2.45 : Eukaryotic promoter with Enhancer Binding Protein

An analogy

The rows of lock boxes in a bank provide a useful analogy.
To open any particular box in the room requires two keys:

your key, whose pattern of notches fits only the lock of the box assigned to you (= the upstream promoter), but which cannot unlock the box without
a key carried by a bank employee that can activate the unlocking mechanism of any box (= the basal promoter) but cannot by itself open any box.
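
The two-key idea can be written as a simple logical rule: a gene is transcribed only when the shared basal factors and the gene's own combination of upstream factors are all present. The sketch below is a minimal all-or-nothing illustration of that rule, which real promoters do not obey so strictly. TFIID and TFIIB come from the text; the function `is_transcribed` and the gene and factor names `gene_x`, `gene_y`, `TF_a`, `TF_b`, `TF_c` are invented.

```python
def is_transcribed(gene_sites, factors_present, basal_factors):
    """Return True only if both 'keys' are available in the cell."""
    has_basal = basal_factors <= factors_present  # the bank employee's key
    has_specific = gene_sites <= factors_present  # your own key
    return has_basal and has_specific


BASAL = {"TFIID", "TFIIB"}

# Hypothetical genes and the upstream factors each one requires.
genes = {
    "gene_x": {"TF_a", "TF_b"},
    "gene_y": {"TF_b", "TF_c"},
}

# A hypothetical cell that makes the basal factors plus TF_a and TF_b.
cell = BASAL | {"TF_a", "TF_b"}

print({name: is_transcribed(sites, cell, BASAL) for name, sites in genes.items()})
# {'gene_x': True, 'gene_y': False}
```

Both genes share `TF_b`, yet only `gene_x` is switched on, because only its full combination of factors is present in this cell.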

Note : Transcription factors represent only a small fraction of the proteins in a cell.

Hormones exert many of their effects by forming transcription factors. The complexes of hormones with their receptors represent one class of transcription factor.
Hormone “response elements”, to which the complex binds, are promoter sites.

Embryonic development requires the coordinated production and distribution of transcription factors.

Enhancers

Some transcription factors (“Enhancer-binding protein”) bind to regions of DNA that are thousands of base pairs away from the gene they control (Figure 2.46).
Binding increases the rate of transcription of the gene.
Enhancers can be located upstream, downstream, or even within the gene they control.
How does the binding of a protein to an enhancer regulate the transcription of a gene thousands of base pairs away?
One possibility is that enhancer-binding proteins, in addition to their DNA-binding site, have sites that bind to transcription factors (“TF”) assembled at the promoter of the gene.
This would draw the DNA into a loop (as shown in Figure 2.46).

Figure 2.46 : Some of the transcription factors that produce the segmented body plan in Drosophila. E2 and Sp1 type of Binding Proteins

Visual evidence

Michael R. Botchan (who kindly supplied these electron micrographs) and his colleagues have produced visual evidence of this model of enhancer action. They created an artificial DNA molecule with

several promoter sites for Sp1 about 300 bases from one end. Sp1 is a zinc-finger transcription factor that binds to the sequence 5′ GGGCGG 3′ found in the promoters of many genes, especially “housekeeping” genes.
several enhancer sites about 800 bases from the other end. These are bound by an enhancer-binding protein designated E2.
1860 base pairs of DNA between the two.
When these DNA molecules were added to a mixture of Sp1 and E2, the electron microscope showed that the DNA was drawn into loops with “tails” of approximately 300 and 800 base pairs.
At the neck of each loop were two distinguishable globs of material, one representing Sp1 (red), the other E2 (blue) molecules. (The two micrographs are identical; the lower one has been labeled to show the interpretation.)
Artificial DNA molecules lacking either the promoter sites or the enhancer sites, or with mutated versions of them, failed to form loops when mixed with the two proteins.
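
The tail lengths seen in the micrographs follow directly from where the two sets of sites sit on the molecule. The short calculation below is a sketch under the assumption that each cluster of sites can be treated as a single point, so the width of the clusters is ignored. The function `loop_geometry` is hypothetical.

```python
def loop_geometry(total_length, promoter_pos, enhancer_pos):
    """Return (tail_1, loop, tail_2) in bp when proteins bound at the two positions join."""
    first, second = sorted((promoter_pos, enhancer_pos))
    return first, second - first, total_length - second


# Figures from the experiment: Sp1 sites about 300 bases from one end,
# E2 sites about 800 bases from the other end, 1860 bp between them.
total = 300 + 1860 + 800  # approximate length of the artificial molecule

print(loop_geometry(total, promoter_pos=300, enhancer_pos=300 + 1860))
# (300, 1860, 800)
```

The 1860 bp between the sites form the loop, leaving free tails of about 300 and 800 bp, as observed.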

Silencers

Silencers are control regions of DNA that, like enhancers, may be located thousands of base pairs away from the gene they control.
However, when transcription factors bind to them, expression of the gene they control is repressed.

Insulators

A problem: As you can see above, enhancers can turn on promoters of genes located thousands of base pairs away.
What is to prevent an enhancer from inappropriately binding to and activating the promoter of some other gene in the same region of the chromosome?

One answer: an insulator.

Insulators are

stretches of DNA (as few as 42 base pairs may do the trick)
located between the
- enhancer(s) and promoter or
- silencer(s) and promoter of adjacent genes or clusters of adjacent genes.

Their function is to prevent a gene from being influenced by the activation (or repression) of its neighbors.

Example:

The enhancer for the promoter of the gene for the delta chain of the gamma/delta T-cell receptor for antigen (TCR) is located close to the promoter for the alpha chain of the alpha/beta TCR (on chromosome 14 in humans) (Figure 2.47).
A T cell must choose between one or the other. There is an insulator between the alpha gene promoter and the delta gene promoter that ensures that activation of one does not spread over to the other.

Figure 2.47 : Chromosome 14 showing δ and α gene segments with promoter and enhancer.

All insulators discovered so far in vertebrates work only when bound by a protein designated CTCF (“CCCTC binding factor”; named for a nucleotide sequence found in all insulators). CTCF has 11 zinc fingers.
Another example: In mammals (mice, humans, pigs), only the allele for insulin-like growth factor-2 (IGF2) inherited from one’s father is active; that inherited from the mother is not — a phenomenon called imprinting.
The mechanism: the mother’s allele has an insulator between the IGF2 promoter and enhancer. So does the father’s allele, but in his case, the insulator has been methylated. CTCF can no longer bind to the insulator, and so the enhancer is now free to turn on the father’s IGF2 promoter.
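
The imprinting mechanism just described reduces to a short chain of conditions, sketched below as a toy model. It assumes the only things that matter are whether the insulator is methylated and whether CTCF is available. The function `igf2_active` and its parameters are hypothetical.

```python
def igf2_active(insulator_methylated, ctcf_available=True):
    """Return True if the enhancer is free to turn on the IGF2 promoter."""
    # CTCF can bind the insulator only when the insulator is not methylated.
    ctcf_bound = ctcf_available and not insulator_methylated
    # The insulator works only when bound by CTCF.
    insulator_blocks_enhancer = ctcf_bound
    return not insulator_blocks_enhancer


# Methylation state of the insulator on each parental allele, as stated in the text.
alleles = {"maternal": False, "paternal": True}

for parent, methylated in alleles.items():
    print(parent, "IGF2 active:", igf2_active(methylated))
# maternal IGF2 active: False
# paternal IGF2 active: True
```
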
Many of the commercially important varieties of pigs have been bred to contain a gene that increases the ratio of skeletal muscle to fat.
This gene has been sequenced and turns out to be an allele of IGF2 that contains a single point mutation in one of its introns.
Pigs with this mutation produce higher levels of IGF2 mRNA in their skeletal muscles (but not in their liver).

This tells us that:

Mutations need not be in the protein-coding portion of a gene in order to affect the phenotype.
Mutations in non-coding portions of a gene can affect how that gene is regulated (here, a change in muscle but not in liver).