Six-Frame Translation of the Human Genome



This page provides public access to our six-frame translation of the entire human genome. You are free to download these files. Descriptions of the files are given below.
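For readers unfamiliar with the term, the following is a minimal Python sketch of what a six-frame translation computes: the three reading frames of the forward strand plus the three of the reverse complement. It is not the C++ program distributed on this page. The function names, the use of the standard genetic code, and the choice to render any codon containing an ambiguous base as 'X' are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with 'N') become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map (strand, offset) to the translated peptide for all six frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translation("ATGGCCTAA").items():
        print(frame, peptide)
```

Stop codons appear as '*' in the output. How the distributed program delimits an ORF within each frame is not described on this page, so the sketch stops at the raw translation.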

Click on the links starting with Chromosome to download a gzipped FASTA file that contains all of the non-redundant open reading frames (ORFs) for the specified human chromosome.

The file nr6frame.tar.gz contains the C++ source code of the program used to generate the FASTA files. To run this program, you must first download the latest build of the human genome. We used a copy of Build 35 from the UCSC Genome Browser. A link to the genome file ("chromFa.zip") is available here.

The components of a single entry in our FASTA files are as follows:


  1. A unique number assigned to every ORF in the file.
  2. The number of ambiguous nucleotides (the number of 'N's) that were encountered within the coding region of this ORF.
  3. The strand the ORF was found on.
  4. The genomic coordinate of the first nucleotide of this ORF.
  5. The number of times this ORF sequence was encountered in this chromosome.
  6. The peptide sequence obtained from the translation.
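Item 5 implies that identical peptide sequences within a chromosome are collapsed into one entry with an occurrence count. A rough Python sketch of that idea follows. The input layout of (strand, start, peptide) tuples and the function name `collapse_duplicates` are hypothetical, and keeping the strand and coordinate of the first occurrence is an assumption, since this page does not state which occurrence is reported.

```python
def collapse_duplicates(orfs):
    """Collapse identical peptides into one record with an occurrence count.

    `orfs` is an iterable of (strand, start, peptide) tuples; this layout is
    hypothetical and chosen only for illustration.
    """
    unique = {}
    for strand, start, peptide in orfs:
        if peptide in unique:
            # Seen before: only the count changes.
            unique[peptide]["count"] += 1
        else:
            # First occurrence: remember where it was found (assumption).
            unique[peptide] = {"strand": strand, "start": start, "count": 1}
    return unique


if __name__ == "__main__":
    example = [("+", 10, "MAK"), ("-", 50, "MAK"), ("+", 99, "MLL")]
    for peptide, info in collapse_duplicates(example).items():
        print(peptide, info)
```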

In any given *.orf file, a single entry consists of two lines: the FASTA header line and the protein sequence line.
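Because every entry is exactly two lines, the files can be read with a very small parser. The Python sketch below yields (header, sequence) pairs from a gzipped file. The function names and the file path in the usage comment are placeholders, and the header is returned as a raw string because its exact field layout is not reproduced here. The only format assumption is the usual FASTA convention that a header line begins with '>'.

```python
import gzip


def read_orf_entries(handle):
    """Yield (header, sequence) pairs from two-line FASTA entries."""
    header = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
        elif header is not None:
            yield header, line
            header = None


def read_gzipped_orf_file(path):
    """Open a gzipped file in text mode and yield its entries."""
    with gzip.open(path, "rt") as handle:
        yield from read_orf_entries(handle)


# Usage (placeholder path, not an actual file name from this page):
# for header, peptide in read_gzipped_orf_file("downloaded_chromosome.orf.gz"):
#     print(header, len(peptide))
```

Reading through `gzip.open` avoids decompressing the whole file to disk first, which may help with the larger chromosomes.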

Chromosome 1
Chromosome 2
Chromosome 3
Chromosome 4
Chromosome 5
Chromosome 6
Chromosome 7
Chromosome 8
Chromosome 9
Chromosome 10
Chromosome 11
Chromosome 12
Chromosome 13
Chromosome 14
Chromosome 15
Chromosome 16
Chromosome 17
Chromosome 18
Chromosome 19
Chromosome 20
Chromosome 21
Chromosome 22
Chromosome X
Chromosome Y

CHECKSUM

nr6frame.tar.gz

If you have any questions, please contact Dr. David States.