Molecular Basis of Inheritance - Notes | Class 12 | Part 9: Human Genome Project (HGP)

Human Genome Project (HGP)

The entire DNA in the haploid set of chromosomes of an organism is called a Genome.
In the human genome, DNA is packed into 23 chromosomes (the haploid set).
The human genome contains about 3 × 10⁹ base pairs (bp).
The Human Genome Project (1990-2003) was the first mega project for sequencing the nucleotides and mapping all the genes in the human genome.
HGP was coordinated by the U.S. Department of Energy and the National Institutes of Health.

Goals of HGP:
Identify all the estimated genes in human DNA.
Determine the sequence of the 3 billion chemical base pairs that make up human DNA.
Store this information in databases.
Improve tools for data analysis.
Transfer related technologies to other sectors.
Address the ethical, legal, and social issues (ELSI) that may arise from the project.

2 major approaches:
- Expressed Sequence Tags (ESTs): Focused on identifying all the genes that are expressed as RNA.
- Sequence annotation: Sequencing the whole genome, including all coding and non-coding sequences, and later assigning functions to the different regions of the sequence.
Procedure of sequencing: Isolate the total DNA from a cell → Convert it into random fragments → Clone the fragments in a host (bacteria or yeast) using vectors (e.g., BAC & YAC) for amplification → Sequence the fragments using automated DNA sequencers (based on the method developed by Frederick Sanger) → Arrange the sequences based on overlapping regions → Align the sequences using computer programs.
BAC = Bacterial Artificial Chromosomes.
YAC = Yeast Artificial Chromosomes.
Sanger also developed a method for determining the sequence of amino acids in proteins.
DNA is converted to fragments as there are technical limitations in sequencing very long pieces of DNA.
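The fragment-and-reassemble idea in the procedure above can be sketched in a few lines of Python. This is only a toy illustration, not the software used in HGP: the 20-base `genome` string, the function names, and the fixed fragment size are all assumptions made for the example, and real fragments are random in length and position rather than evenly spaced.

```python
import random


def make_fragments(sequence, size=8, step=4):
    """Cut a sequence into overlapping fragments (fixed windows, for simplicity)."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments, min_len=3):
    """Greedy assembly: repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


genome = "ATGGCGTACCTTAGAGCTCA"      # hypothetical toy sequence, not real data
reads = make_fragments(genome)       # 4 fragments of 8 bases, each sharing 4 bases with the next
random.Random(0).shuffle(reads)      # the original order of fragments is unknown
contigs = assemble(reads)
print(contigs)                       # ['ATGGCGTACCTTAGAGCTCA']
```

The toy sequence was chosen so that every overlap is unambiguous. Real genomes contain many repeated sequences, which is one reason actual assembly needs far more sophisticated computer programs.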
HGP was closely associated with Bioinformatics.
Bioinformatics: Application of computer science and information technology to the field of biology & medicine.
Of the 24 chromosomes (22 autosomes plus X & Y), chromosome 1 was the last to be sequenced (May 2006).
The genomes of bacteria, yeast, Caenorhabditis elegans (a free-living non-pathogenic nematode), Drosophila, plants (rice & Arabidopsis), etc. have also been sequenced.

Salient features of the human genome:
The human genome contains 3164.7 million nucleotide bases.
Total number of genes = about 30,000.
The average gene consists of 3000 bases, but sizes vary. The largest known human gene (dystrophin, on the X chromosome) contains 2.4 million bases.
99.9% of nucleotide bases are the same in all people. Only a 0.1% difference (3 × 10⁶ bp) makes every individual unique.
Functions of over 50% of discovered genes are unknown.
Chromosome 1 has the most genes (2968) and the Y chromosome has the fewest (231).
Less than 2% of the genome codes for proteins.
A very large portion of the human genome is made of repeated (repetitive) sequences. These are stretches of DNA sequences that are repeated many times. They have no direct coding functions. They shed light on chromosome structure, dynamics, and evolution.
About 1.4 million locations have single-base DNA differences. They are called SNPs (single nucleotide polymorphisms, or ‘snips’). SNPs help in finding chromosomal locations of disease-associated sequences and in tracing human history.
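A few of the figures above can be cross-checked with simple arithmetic, and the idea of an SNP can be shown on a pair of short sequences. The sketch below uses only numbers quoted in these notes. The helper `snp_positions` and the two 8-base sequences are hypothetical, made up for illustration, and the helper assumes the sequences are already aligned and of equal length.

```python
# Figures quoted in the notes
GENOME_BASES = 3_164_700_000     # 3164.7 million nucleotide bases
APPROX_GENOME_BP = 3 * 10**9     # rounded genome size in base pairs
SNP_SITES = 1_400_000            # about 1.4 million SNP locations
AVERAGE_GENE_BASES = 3_000
DYSTROPHIN_BASES = 2_400_000

# 0.1% of 3 x 10^9 bp is one part in a thousand
differing_bp = APPROX_GENOME_BP // 1000
print(differing_bp)              # 3000000, i.e. 3 x 10^6 bp

# Average spacing between SNP locations
bases_per_snp = GENOME_BASES / SNP_SITES
print(bases_per_snp)             # 2260.5, roughly one SNP every 2,260 bases

# How much larger dystrophin is than an average gene
dystrophin_ratio = DYSTROPHIN_BASES // AVERAGE_GENE_BASES
print(dystrophin_ratio)          # 800


def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Two made-up sequences that differ at a single base (C vs T)
print(snp_positions("ATGCCGTA", "ATGCTGTA"))   # [4]
```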