Phylogenetic aspects of serum albumin

Godin, R.E., Urry, L.A. & Ernst, S.G. (Alternative splicing of the Endo16 transcript produces differentially expressed mRNAs during sea urchin gastrulation. Dev. Biol. 179:148-159, 1996) have reported on the first albumin-like protein in an invertebrate. They sequenced a large, multidomain protein from the endodermal cells of the sea urchin gastrula stage, named Endo16 calcium-binding protein (NCBI Reference Sequence: NP_999684.1). It contains 13 cysteine pairs with 2 single cysteines arranged in a regular pattern between each pair, resembling the cysteine pattern found in the albumin family. The authors propose that this region of Endo16 acts as a ligand-binding protein during gastrulation. The protein has similarities, particularly in its disulfide pairings, to a mouse embryo osteogenic cell protein named Ecm1, extracellular matrix protein 1 (NCBI Reference Sequence: NP_001239582.1); both appear to bind calcium ions.

Albumin is present in lower vertebrates such as lungfish, amphibian species, reptilian species such as cobra and tuatara, salmonid species and lamprey, but not in eel, Antarctic toothfish, carp or cartilaginous fish (Metcalf, V.J., George, P.M. & Brennan, S.O., Lungfish albumin is more similar to tetrapod than to teleost albumins: purification and characterisation of albumin from the Australian lungfish, Neoceratodus forsteri. Comp. Biochem. Physiol. B 147:428-437, 2007 and references therein).

The albumin gene and superfamily

Human serum albumin is a member of the albumin superfamily, which also includes α-fetoprotein, vitamin D-binding protein (Gc-globulin) and afamin (α-albumin). All four are transport proteins, with albumin quantitatively the most important. The plasma concentrations of vitamin D-binding protein and afamin are only about 5 µM and 0.8 µM, respectively. α-Fetoprotein is an important plasma protein in the fetal state, but it is practically absent in healthy adults.

Recently, a fifth member of this gene family has been found. It is named the α-fetoprotein related gene, because it shows greatest similarity to this family member. However, the gene in humans and other primates contains multiple mutations which turn it into an inactive pseudogene (Naidu, S., Peterson, M.L. & Spear, B.T., Alpha-fetoprotein related gene (ARG): a new member of the albumin gene family that is no longer functional in primates. Gene 449: 95-102, 2010).

All the genes are single-copy genes, and the four active ones in the human are expressed in a codominant manner, i.e., both alleles are translated. The genes lie on chromosome 4, near the centromere on the long arm, at position 4q11-13. The genes for albumin (NCBI Reference Sequence: NC_000004.12 (73404255..73421412)), α-fetoprotein (NCBI Reference Sequence: NC_000004.12 (73436219..73455785)) and afamin (NCBI Reference Sequence: NC_000004.12 (73481740..73504001)) are tandemly arranged in the same transcriptional orientation; the inactive α-fetoprotein related gene is also oriented in the same way. In human, the distances between the genes for albumin–α-fetoprotein and for α-fetoprotein–afamin are 14.8 and 26.0 kilobase pairs, respectively. The gene for vitamin D-binding protein (NCBI Reference Sequence: NC_000004.12 (71741678..71805520)) is less tightly linked: it is located 1.6 megabase pairs upstream of the 5' end of the albumin gene and is in the opposite transcriptional orientation. The five genes have arisen from a common ancestor through a series of duplication events and are tightly linked in all species where this has been investigated. The albumin gene has 16,961 nucleotides from the putative “Cap” site to the first poly(A) addition site. It is split into 15 exons by 14 intervening sequences, which are symmetrically placed within the three domains of the albumin molecule; these domains are thought to have arisen by triplication of a single primordial domain (Minghetti, P.P., Ruffner, D.E., Kuang, W.J., Dennison, O.E., Hawkins, J.W., Beattie, W.G. & Dugaiczyk, A., Molecular structure of the human albumin gene is revealed by nucleotide sequence within q11-22 of chromosome 4. J. Biol. Chem. 261: 6747-6757, 1986).
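The intergenic distances quoted above can be checked against the listed coordinates. The following is a minimal sketch in Python. It assumes that each coordinate pair is given as (start, end) on NC_000004.12 and that a distance is measured from the end of one gene to the start of the next. The dictionary keys are the usual gene symbols, used here only as shorthand labels.

```python
# Coordinates on NC_000004.12 as (start, end), copied from the text above.
GENES = {
    "GC": (71741678, 71805520),   # vitamin D-binding protein
    "ALB": (73404255, 73421412),  # albumin
    "AFP": (73436219, 73455785),  # alpha-fetoprotein
    "AFM": (73481740, 73504001),  # afamin
}


def gap_bp(upstream: str, downstream: str) -> int:
    """Difference between the end of `upstream` and the start of `downstream`."""
    return GENES[downstream][0] - GENES[upstream][1]


for first, second in [("ALB", "AFP"), ("AFP", "AFM"), ("GC", "ALB")]:
    gap = gap_bp(first, second)
    print(f"{first} -> {second}: {gap} bp ({gap / 1000:.1f} kb)")
```

Under these assumptions the three gaps come to roughly 14.8 kb, 26.0 kb and 1.6 Mb, in line with the figures given in the text.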

The albumin mRNA (NCBI Reference Sequence: NM_000477.5) encodes a precursor protein (preproalbumin) of 609 amino acid residues. Cleavage of the signal peptide of 18 residues and the propeptide of six residues yields the mature protein of 585 residues.
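Because the precursor and the mature protein differ by 24 N-terminal residues, a residue can carry two different numbers depending on the numbering scheme. The sketch below illustrates the arithmetic. It assumes that precursor numbering starts at the first residue of preproalbumin and that mature numbering starts directly after the propeptide. The function names are illustrative only.

```python
PREPROALBUMIN_LEN = 609   # residues in the precursor, from the text
SIGNAL_PEPTIDE_LEN = 18   # residues removed with the signal peptide
PROPEPTIDE_LEN = 6        # residues removed with the propeptide

# Total number of residues cleaved off the N-terminus.
OFFSET = SIGNAL_PEPTIDE_LEN + PROPEPTIDE_LEN


def mature_length() -> int:
    """Length of the mature protein after both cleavages."""
    return PREPROALBUMIN_LEN - OFFSET


def to_mature(precursor_pos: int) -> int:
    """Convert a precursor residue number to mature-protein numbering."""
    if not OFFSET < precursor_pos <= PREPROALBUMIN_LEN:
        raise ValueError("position lies outside the mature protein")
    return precursor_pos - OFFSET


def to_precursor(mature_pos: int) -> int:
    """Convert a mature-protein residue number to precursor numbering."""
    if not 1 <= mature_pos <= mature_length():
        raise ValueError("position lies outside the mature protein")
    return mature_pos + OFFSET


print(mature_length())   # length of the mature protein
print(to_mature(138))    # precursor residue 138 in mature numbering
```

Under these assumptions, a position written in precursor numbering, such as p.Arg138, would correspond to residue 114 of the mature protein.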


Mutations in the albumin gene may result in the presence of two circulating forms of the protein (bisalbuminemia or alloalbuminemia, MIM # 103600) or in the virtual absence of the protein from the blood (analbuminemia, MIM # 103600).

1. Alloalbumins

At present, 73 different mutations, resulting in 71 distinct genetic variants of human serum albumin and proalbumin, have been molecularly characterized at the protein and/or gene level. Two alloalbumins, Redhill and South Pacific, have two independent substitutions in the same allele (see GENETIC VARIANTS OF HUMAN SERUM ALBUMIN).

The frequency of bisalbuminemia in the general population is probably about 1:1,000, but it can be much higher in isolated populations. Mutations are often due to hypermutable CpG dinucleotides, and in addition to single-amino acid substitutions, glycosylated variants and C-terminally modified alloalbumins have been found. Some mutants show altered stability in vivo and/or in vitro. High-affinity binding of Ni++ and Cu++ is blocked, or almost so, by amino acid changes at the N-terminus. In contrast, substitution of 66Leu and 218Arg leads to strong binding of triiodothyronine and L-thyroxine, respectively, resulting in two clinically important syndromes. Variants often have modified plasma half-lives and organ uptakes when studied in mice.

Because alloalbumins do not seem to be associated with disease, they can be used as markers of migration and provide a model for the study of neutral molecular evolution. They can also give valuable molecular information about albumin's binding sites, antioxidant and enzymatic properties, as well as stability. Mutants with increased affinity for endogenous or exogenous ligands could be therapeutically relevant as antidotes, both for in vivo and extracorporeal treatment. Variants with modified biodistribution could be used for drug targeting. In most cases, the desired function can be further elaborated by producing site-directed, recombinant mutants.

For more information, see the review of Kragh-Hansen, U., Minchiotti, L., Galliano, M. & Peters, T. Jr., Human serum albumin isoforms: genetic and molecular aspects and functional consequences. Biochim. Biophys. Acta 1830: 5405-5417, 2013.

2. Congenital analbuminemia

Congenital analbuminemia (CAA) is manifested by the presence of a very low amount of circulating serum albumin (see the ANALBUMINEMIA REGISTER), in the absence of hepatic dysfunction, renal or gastrointestinal losses, and redistribution into extravascular compartments.

The condition is very rare. Although the trait is readily detected by routine serum protein electrophoresis, only 66 cases have so far been reported world-wide; they are listed in the ANALBUMINEMIA REGISTER. The cases are numbered in chronological order by year of first published report. The prevalence of CAA is estimated at less than 1 in 1 million, apparently without gender or ethnic predilection. In the adult population, CAA is generally considered a benign condition, since the absence of the major blood protein is partially compensated for by an increase in serum globulin concentrations. Usually, analbuminemic individuals have few clinical symptoms of their condition other than mild oedema, fatigue, and, especially in adult females, lipodystrophy, with massive cellulite deposits on their thighs and buttocks (see ANALBUMINEMIA REGISTER). Low blood pressure, decreased proportion of extravascular albumin, strikingly prolonged albumin half-lives, and increased erythrocyte sedimentation rate were observed in several cases. The most common biochemical finding is a gross hyperlipidaemia, with a significant hypercholesterolemia. Most individuals lacking albumin can live fairly normal lives, including parenting of children, and their longevity appears not to be significantly affected. However, because follow-up data are lacking in most cases, it is hard to draw conclusions about the long-term outcome, and whether analbuminemic individuals are at risk for atherosclerotic complications remains an open question.

In contrast to the benign presentation of CAA after birth, the prenatal course appears less favourable. Placental edema and fetal death of siblings were frequently noted in the families of analbuminemic subjects, suggesting that albumin has a crucial role in fetal development. A recent study (Toye, J.M., Lemire, E.G. & Baerg, K.L., Perinatal and childhood morbidity and mortality in congenital analbuminemia. Paediatr. Child Health 17(6): e20-3, 2012) shows that CAA is also a risk factor during the perinatal and childhood periods, supporting the hypothesis that the rarity of the trait may be attributed to the fact that only a few analbuminemic individuals survive past the neonatal state.

3. Molecular genetics of congenital analbuminemia

Of the 66 cases reported in the REGISTER, 43 have so far been studied at the molecular level, allowing the identification of 24 different causative defects. The results are summarised in the TABLE OF ANALBUMINEMIA CAUSING MUTATIONS. In all cases except case #32 of the REGISTER, the mutation was found within the 14 coding exons of the albumin gene or their intron/exon junctions.

The results show that CAA is an autosomal recessive disorder caused by the inheritance of abnormal albumin alleles from both parents. In the heterozygous state the single normal allele is sufficient to produce about half the normal amount of albumin, and those individuals generally have albumin concentrations close to the lower limit of the normal range (about 30-35 g/L).

Of the 24 different mutations identified in analbuminemic subjects, 22 cause CAA in the homozygous state. They include 1 mutation in the start codon, 1 frame-shift/insertion, 6 frame-shift/deletions, 6 nonsense mutations, and 8 mutations affecting splicing. Compound heterozygosity for the remaining two molecular defects, a nonsense mutation and a splice site mutation with subsequent reading frame-shift, caused CAA in an Italian man (case #23 of the REGISTER). Thus, nonsense mutations, mutations affecting splicing, and frame-shift/deletions seem to be the most common causes of CAA.
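The counts in this paragraph can be tallied by mutation class, as in the following sketch. The class labels are shorthand chosen for illustration. Counting the compound heterozygous splice site mutation under "splicing" is an assumption made here, since the text also describes it as causing a frame-shift.

```python
from collections import Counter

# Mutations that cause CAA in the homozygous state, as listed in the text.
homozygous = Counter({
    "start codon": 1,
    "frame-shift/insertion": 1,
    "frame-shift/deletion": 6,
    "nonsense": 6,
    "splicing": 8,
})

# The two defects found together in the compound heterozygous case (#23).
# Assumption: the splice site mutation is counted under "splicing".
compound_heterozygous = Counter({"nonsense": 1, "splicing": 1})

all_mutations = homozygous + compound_heterozygous

print("homozygous:", sum(homozygous.values()))
print("total:", sum(all_mutations.values()))
for kind, count in all_mutations.most_common(3):
    print(f"{kind}: {count}")
```

With this grouping the three largest classes are mutations affecting splicing, nonsense mutations and frame-shift/deletions, which matches the conclusion drawn in the text.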

The vast majority of the causative molecular defects are unique, i.e. they have been found in only a single individual or in members of the same family. Exceptions are Bethesda, El Jadida, and especially Kayseri. Both the Bethesda and the El Jadida mutations were identified in two unrelated individuals. The c.412C>T Bethesda mutation lies in a CGA codon containing a hypermutable CpG dinucleotide site, suggesting that it may have occurred independently in the two subjects. To date, the AT deletion at nucleotide positions c.228–229 of analbuminemia Kayseri is by far the most frequent cause of CAA identified, having been found in 13 individuals belonging to geographically distant and apparently unrelated ethnic groups. It therefore accounts for about one third of the cases characterised at the molecular level. In addition, the frequency of this mutation seems to be significantly higher in restricted and minimally admixed population groups than that of CAA in the average population. Examples are two First Nation communities of Cree descent living in the northwestern central plains of Saskatchewan (Canada) (cases # 27, 28, 34, and 36 and footnote in the REGISTER) and a Slovak gypsy settlement (cases # 39-41, plus others currently under investigation by Dr. S. Rosipal, personal communication). Therefore, the Kayseri mutation probably accounts for about half of the known affected individuals world-wide.

The molecular defects are located in nine different exons (1, 3, 4, 5, 7, 8, 10, 11, and 12) and in six different introns (1, 3, 6, 10, 11, and 12), suggesting that CAA is the result of widely scattered random sequence variations. However, the increasing knowledge of the causative defects seems to bring to light the presence of regions in the albumin gene that are prone to mutations. Two of them appear to be localised in the intron 6-exon 7 junction (Vancouver and Seattle) and in the exon 11-intron 11 junction (Fondi, Tripoli, and Bartin). Other hypermutable regions seem to be the sequence c.228–230 of exon 3 (Kayseri and Amasya), the sequence c.1610–1615 near the 3’ end of exon 12 (Locust Valley and Safranbolu) and the CpG sequence at position c.412–413 in the codon CGA for p.Arg138 in exon 4. The mutation c.412C>T causes analbuminemia Bethesda, identified in 2 unrelated individuals, whereas the mutation c.412C>G produces albumin Yanomama-2 (p.Arg138Gly), present in polymorphic (>1%) frequency in an Amazonian Indian tribe (see GENETIC VARIANTS OF HUMAN SERUM ALBUMIN).
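The two substitutions at c.412 illustrate how a single position can give either a stop codon or an amino acid change. The sketch below works through this. It assumes that c.1 is the first base of the start codon, so that codon numbers follow directly from the coding position. Only the three codons needed are included from the standard genetic code, and the function names are illustrative.

```python
# Excerpt of the standard genetic code: only the codons used in this example.
CODON_TABLE = {"CGA": "Arg", "TGA": "Stop", "GGA": "Gly"}


def codon_number(c_pos: int) -> int:
    """1-based codon index of a 1-based coding position (c.1 = first base of ATG)."""
    return (c_pos - 1) // 3 + 1


def substitute(codon: str, c_pos: int, alt: str) -> str:
    """Return the codon after replacing the base that sits at coding position c_pos."""
    index = (c_pos - 1) % 3  # 0, 1 or 2: position of the base within its codon
    return codon[:index] + alt + codon[index + 1:]


reference = "CGA"  # codon for p.Arg138, as stated in the text
for alt in ("T", "G"):
    mutant = substitute(reference, 412, alt)
    print(
        f"c.412C>{alt}: codon {codon_number(412)} "
        f"{reference} ({CODON_TABLE[reference]}) -> {mutant} ({CODON_TABLE[mutant]})"
    )
```

Under this assumption, c.412 is the first base of codon 138, so C>T converts CGA into the stop codon TGA, whereas C>G converts it into GGA (Gly), consistent with p.Arg138Gly.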

No evidence has been found for the presence in serum of the putative protein products produced as a consequence of the twenty-two mutations. The length of the abnormal polypeptide chains, for the sixteen cases in which a prediction can be made, would range from 31 (Codogno) to 532 (Locust Valley) amino acids. 

For more information, see the following reviews:

Minchiotti, L., Galliano, M., Caridi, G., Kragh-Hansen, U. & Peters, T. Jr., Congenital analbuminaemia: molecular defects and biochemical and clinical aspects. Biochim. Biophys. Acta 1830: 5494-5502, 2013.

Minchiotti, L., Caridi, G., Campagnoli, M., Galliano, M., Kragh-Hansen, U. & Peters, T. Jr., Molecular Genetics of Analbuminaemia. In: eLS. John Wiley & Sons Ltd, Chichester, Jan. 2014.