This page provides public access to our six-frame translation of the entire human genome. All files listed here are free to download, and each is described below.
Each link beginning with "Chromosome" downloads a gzipped FASTA file containing all of the non-redundant open reading frames (ORFs) for that human chromosome.
The file nr6frame.tar.gz contains the C++ source code of the program used to generate the FASTA files. To run the program, you must also download the human genome sequence. We used Build 35 from the UCSC Genome Browser, distributed as the file chromFa.zip.
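A six-frame translation reads the DNA in all three codon offsets on the forward strand and all three on the reverse complement. The following Python sketch illustrates the idea; it is not the nr6frame code. It assumes the standard genetic code (NCBI table 1), labels frames +1 to -3 by convention, treats an ORF as a stop-to-stop stretch of amino acids, uses a hypothetical minimum length `min_len`, and interprets "non-redundant" as dropping exact duplicate peptides. The real program's ORF definition, cutoff, and redundancy rule may differ.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon. Codons containing N or other ambiguity codes
    become 'X'; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()  # UCSC sequence files are soft-masked (repeats in lowercase)
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def nonredundant_orfs(dna, min_len=20):
    """Stop-to-stop peptides of at least min_len residues (hypothetical cutoff),
    keeping only the first occurrence of each distinct peptide sequence."""
    seen = set()
    result = []
    for frame, protein in six_frame(dna).items():
        for peptide in protein.split("*"):
            if len(peptide) >= min_len and peptide not in seen:
                seen.add(peptide)
                result.append((frame, peptide))
    return result
```

For example, `six_frame("ATGGCCTAA")["+1"]` returns `"MA*"`. A whole chromosome would normally be processed with a streaming reader rather than loaded as one string, but the per-frame logic is the same.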
Below is an example entry from our FASTA files, with each of its components explained:
In any *.orf file, a single entry consists of two lines: the FASTA header line and the protein sequence line.
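Because every entry occupies exactly two lines, reading these files does not require a general multi-line FASTA parser. The minimal Python sketch below pairs each header with the line that follows it and reads the gzipped downloads directly through the standard `gzip` module. The header fields are not spelled out in this text, so the sketch returns the header unparsed; the file name in the usage note is hypothetical.

```python
import gzip


def read_orf_entries(lines):
    """Yield (header, sequence) pairs from lines laid out as two lines per entry."""
    lines = (line.rstrip("\r\n") for line in lines)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between entries
        if not header.startswith(">"):
            raise ValueError(f"expected a FASTA header, got: {header[:40]!r}")
        sequence = next(lines, None)  # the sequence is always the very next line
        if sequence is None:
            raise ValueError(f"header without a sequence line: {header!r}")
        yield header[1:], sequence


def read_orf_file(path):
    """Stream entries from a gzipped file without decompressing it to disk."""
    with gzip.open(path, "rt") as handle:
        yield from read_orf_entries(handle)
```

Usage, with a hypothetical file name: `for header, protein in read_orf_file("chr21.orf.gz"): ...`. Relying on the fixed two-line layout keeps the reader simple and makes a malformed file fail loudly instead of silently merging entries.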
Chromosome 1
Chromosome 2
Chromosome 3
Chromosome 4
Chromosome 5
Chromosome 6
Chromosome 7
Chromosome 8
Chromosome 9
Chromosome 10
Chromosome 11
Chromosome 12
Chromosome 13
Chromosome 14
Chromosome 15
Chromosome 16
Chromosome 17
Chromosome 18
Chromosome 19
Chromosome 20
Chromosome 21
Chromosome 22
Chromosome X
Chromosome Y
CHECKSUM
nr6frame.tar.gz
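This page does not state which algorithm the CHECKSUM file uses. The sketch below assumes it lists hexadecimal digests of the downloads and takes the algorithm name as a parameter; MD5 is only a placeholder default, not a claim about the actual file. Each file is hashed in chunks, so large chromosome files need not fit in memory.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hex digest of a file, read in chunks of chunk_size bytes.
    The default algorithm is an assumption; pass the one CHECKSUM actually uses."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```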
If you have any questions, please contact Dr. David States.