Multigene Families and their Evolution

The term “multigene family” refers to a group of genes from the same organism that encode proteins with similar sequences, either over their full lengths or only within a specific domain.
A multigene family is a set of similar genes that encodes a family of related proteins.
Multigene families are believed to have arisen by duplication and variation of a single ancestral gene.
Examples of multigene families include those that encode the actins, hemoglobins, immunoglobulins, tubulins, interferons, and histones.
DNA duplications that involve one or more genes generate gene pairs.
If both copies are maintained in subsequent generations then a multigene family will exist in the genome.

Because most duplications occur adjacent to the original copy, a subsequent duplication encompassing both paralogs may generate a family of four.
Chromosomal rearrangements can then disperse the members of a multigene family throughout the genome.
Dispersed members of the multigene family can still be recognized by sequence comparison.
The significance of recognizing multigene families is that the members may have related functions.
Genes that are identical or nearly identical in sequence and regulation can be considered to encode isoforms rather than members of a multigene family.
In addition, genes that were derived from a common ancestral gene but have diverged extensively may not be recognized as related.
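As a minimal sketch of how family members might be recognized by sequence comparison, the Python example below computes percent identity between sequences that are assumed to be already aligned, then groups genes whose identity meets a cutoff. The function names, the toy peptides, and the 70% cutoff are all hypothetical and chosen only for illustration; real analyses rely on proper alignment tools and statistically grounded similarity scores.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, gap-free columns in which both sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def group_into_families(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage grouping: genes are linked if identity >= threshold."""
    families = {name: {name} for name in seqs}
    for g1, g2 in combinations(seqs, 2):
        if percent_identity(seqs[g1], seqs[g2]) >= threshold:
            merged = families[g1] | families[g2]
            for name in merged:
                families[name] = merged
    unique: list[set[str]] = []
    for fam in families.values():
        if fam not in unique:
            unique.append(fam)
    return unique


# Hypothetical, pre-aligned toy peptides (not real gene products).
toy = {
    "geneA": "MKTLLVAGSD",
    "geneB": "MKTLIVAGSE",
    "geneC": "MRTLIVSGSE",
    "geneD": "QWPYHNCFRA",
}
families = group_into_families(toy, threshold=70.0)
```

In this toy data geneA and geneC fall below the cutoff when compared directly, yet they end up in the same group because each is similar to geneB. Raising the cutoff splits the group, which mirrors the point that extensively diverged relatives may go unrecognized.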
The term “super-family” is used to describe a group of proteins that share significant sequence similarity with one another but fall into clearly defined multigene families.
The individual multigene families are likely to have distinct functions that select for shared sequences differing from the global consensus sequence of the whole super-family.
The term “clan” is used for related protein families that share some properties but display no clear phylogenetic relationship with each other.
It covers cases of convergent evolution of proteins with similar functions but no convincing evidence of a common origin.
Comparative genomics has increasingly shown that most eukaryotic genes are derived from genes that were present in one form or another in the eukaryotic ancestor.
Subsequent gene loss or amplification led to the quantitative and qualitative differences observed among distant phyla.

(1) Origins and Evolution of the Formin Multigene Family that Is Involved in the Formation of Actin Filaments

In eukaryotes, the assembly and elongation of unbranched actin filaments is controlled by formins, which are long, multidomain proteins.
These proteins are important for dynamic cellular processes such as determination of cell shape, cell division, and cellular interaction.
Yet no comprehensive study of the origins and evolution of this gene family has been carried out.
We therefore performed extensive phylogenetic and motif analyses of the formin genes by examining 597 prokaryotic and 53 eukaryotic genomes.
Additionally, we used three-dimensional protein structure data in an effort to uncover distantly related sequences.
Our results suggest that the formin homology 2 (FH2) domain, which promotes the formation of actin filaments, is a eukaryotic innovation and apparently originated only once in eukaryotic evolution.
Despite the high degree of FH2 domain sequence divergence, the FH2 domains of most eukaryotic formins are predicted to assume the same fold and thus have similar functions.
The formin genes have experienced multiple taxon-specific duplications and followed the birth-and-death model of evolution.
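The birth-and-death model mentioned above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the analysis used in the study: each gene copy is treated independently, a time step is an abstract unit, and the function name, rates, and seed are hypothetical parameters chosen for illustration.

```python
import random


def simulate_birth_death(start_copies, steps, birth_rate, death_rate, seed=0):
    """Track gene-family copy number under a simple birth-and-death process.

    At every time step each copy is independently duplicated (birth),
    lost or inactivated (death), or left unchanged.
    """
    if birth_rate < 0 or death_rate < 0 or birth_rate + death_rate > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    rng = random.Random(seed)  # seeded so that a run is reproducible
    copies = start_copies
    history = [copies]
    for _ in range(steps):
        next_copies = 0
        for _ in range(copies):
            r = rng.random()
            if r < birth_rate:
                next_copies += 2  # the original copy plus its new paralog
            elif r < birth_rate + death_rate:
                next_copies += 0  # the copy is deleted or becomes nonfunctional
            else:
                next_copies += 1  # the copy persists unchanged
        copies = next_copies
        history.append(copies)
    return history


# Hypothetical parameters: 5 starting copies, 50 steps, equal birth and death rates.
history = simulate_birth_death(5, 50, birth_rate=0.05, death_rate=0.05, seed=1)
```

Running the function with different seeds for the same rates gives different copy-number trajectories, which is one way to picture how lineages that started from the same ancestral gene set can end up with different numbers of family members.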

Additionally, the formin genes experienced taxon-specific genomic rearrangements that led to the acquisition of unrelated protein domains.
The evolutionary diversification of formin genes apparently increased the number of molecules that interact with formins and consequently contributed to the development of a complex and precise actin assembly mechanism.
The diversity of formin types is probably related to the range of actin-based cellular processes that different cells or organisms require.
Our results indicate the importance of gene duplication and domain acquisition in the evolution of the eukaryotic cell and offer insights into how a complex system, such as the cytoskeleton, evolved.

(2) Molecular evolution of a multigene family in group A Streptococci

The emm genes are members of a gene family in group A streptococci (GAS) that encode antiphagocytic cell-surface proteins and/or immunoglobulin-binding proteins.
Previously sequenced genes in this family have been named “emm,” “fcrA,” “enn,” “arp,” “protH,” and “mrp”; herein they will be referred to as the “emm gene family.”
The genes in the emm family are located in a cluster occupying 3-6 kb between the genes mry and scpA on the chromosome of Streptococcus pyogenes.
Most GAS strains contain one to three tandemly arranged copies of emm-family genes in the cluster, but the alleles within the cluster vary among different strains.
Phylogenetic analysis of the conserved sequences at the 3′ end of these genes differentiates all known members of this family into four evolutionarily distinct emm subfamilies.
As a starting point to analyze how the different subfamilies are related evolutionarily, the structure of the emm chromosomal region was mapped in a number of diverse GAS strains by using subfamily-specific primers in the polymerase chain reaction.
Nine distinct chromosomal patterns of the genes in the emm gene cluster were found.
These nine chromosomal patterns support a model for the evolution of the emm gene family in which gene duplication followed by sequence divergence resulted in the generation of four major gene subfamilies in this locus.

(3) Molecular Evolution of the Cecropin Multigene Family in Drosophila: Functional Genes vs. Pseudogenes

Multigene families consist of genes that originated by gene duplication and have retained a certain degree of similarity.
The different members are often arranged in a compact cluster although they might be more or less dispersed in the genome, mostly due to chromosomal rearrangements subsequent to the gene duplications.
Members of a family can be functional or nonfunctional (pseudogenes). Functional members can be very similar, as the copies might have retained the same function and be redundant.
However, one of the copies may have acquired a new function and undergone a certain degree of differentiation, which would be best explained by the action of Darwinian selection (Ohta 1994).
On the other hand, pseudogenes can accumulate substitutions due to the lack of functional constraints.
Concerted evolution of the different copies of a gene, which is facilitated by their compact clustering, can restrict the functional differentiation as well as the loss of function of the copies (Walsh 1987).
Conversely, members of a family in which concerted evolution is weak or absent have a higher probability of becoming pseudogenes (Walsh 1995).
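The contrast between constrained functional copies and unconstrained pseudogenes can be made concrete by classifying codon differences as synonymous or nonsynonymous. The sketch below is a crude per-codon count and not a dN/dS estimate; it assumes the standard genetic code and in-frame, gap-free sequences of equal length. The function name and the three toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of codons that differ."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1  # same amino acid, so the protein is unchanged
        else:
            nonsynonymous += 1  # amino acid replaced or stop codon gained/lost
    return synonymous, nonsynonymous


# Hypothetical toy sequences (not real Cecropin genes).
ancestral = "ATGAAAGCTCTGGGT"        # M K A L G
functional_copy = "ATGAAGGCCCTGGGT"  # only silent changes
pseudogene_like = "ATGTAAGCTCCGGAT"  # premature stop plus two replacements

functional_counts = classify_codon_differences(ancestral, functional_copy)
pseudogene_counts = classify_codon_differences(ancestral, pseudogene_like)
```

In this toy case the functional copy differs from the ancestral sequence only by synonymous changes, whereas the pseudogene-like copy carries a premature stop codon and two amino acid replacements, the kind of pattern expected once functional constraints are relaxed.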
The cecropin multigene family of Drosophila melanogaster is a family with both functional genes and pseudogenes.
The functional genes of this family code for cecropins, which are antibacterial peptides involved in the insect humoral immune response (Kylsten et al. 1990; Tryselius et al. 1992).
In Drosophila this response is mediated by at least eight other kinds of peptides: defensin, attacin, diptericin, drosocin, metchnikowin, drosomycin, andropin, and lysozyme (Engstrom 1997; Hetru et al. 1997; Meister et al. 1997).
Together, the humoral response and the cellular response constitute the immune system of insects (Hultmark 1993).
In D. melanogaster the Cecropin region was cloned and sequenced by Kylsten et al. (1990) and by Tryselius et al. (1992). In a region of ~7 kb these authors detected four functional Cecropin genes (CecA1, CecA2, CecB and CecC) and two pseudogenes (Cec1 and Cec2).
All functional genes are expressed upon bacterial infection, mainly in the fat body, although at different times during development:
CecA1 and CecA2 are essentially expressed in larvae and adults while CecB and CecC are mainly expressed during the pupal stage (Hultmark 1993).