These online tools help you compare the DNA and protein sequences of different organisms.
From Protein to DNA code
- Enter a protein sequence. For this case, we are going to use collagen type 1, alpha 1 of Tyrannosaurus rex. Do not put the long XXXXXXXXX parts into the translator, since these are gaps.
- Press "translate to DNA". Do not change the settings.
- A translation will show up on the screen.
You may notice that some of the nucleotides are not the usual A, T, G or C letters found in most DNA sequences. This is because the DNA code is deduced from the amino acid sequence, and most amino acids can be encoded by more than one codon. For example, lysine is coded by both AAA and AAG. The translator cannot know which one was actually used, so it gives AAR. R can mean either A or G, while N can be any of the four bases. Below is the complete table of codes.
- A = adenine
- C = cytosine
- G = guanine
- T = thymine
- R = G A (purine)
- Y = T C (pyrimidine)
- K = G T (keto)
- M = A C (amino)
- S = G C (strong bonds)
- W = A T (weak bonds)
- B = G T C (all but A)
- D = G A T (all but C)
- H = A C T (all but G)
- V = G C A (all but T)
- N = A G C T (any)
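How a degenerate code such as AAR arises can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and a simple rule that merges all codons of an amino acid position by position; the online translator may work differently. The names `degenerate_codon` and `back_translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons that encode it.
CODONS = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(bases))

# Ambiguity codes from the table above, keyed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
        "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
    }.items()
}


def degenerate_codon(aa):
    """Merge all codons of one amino acid into a single ambiguous codon."""
    codons = CODONS[aa]
    return "".join(IUPAC[frozenset(c[i] for c in codons)] for i in range(3))


def back_translate(protein):
    """Back-translate a protein; letters outside the standard code (e.g. X) raise KeyError."""
    return "".join(degenerate_codon(aa) for aa in protein.upper())


print(back_translate("MK"))
```

With this rule, `back_translate("MK")` gives `ATGAAR`. For amino acids with six codons, such as leucine (`YTN`), the merged code also covers a few codons of other amino acids, which is one more reason the result is ambiguous.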
If you want less ambiguous results, use EMBOSS Backtranseq. Dinosaur proteins should be translated using the chicken (Gallus gallus) codon table. To upload sequences to EMBOSS Backtranseq, download the protein file from NCBI in FASTA format. If the protein sequence is not on NCBI, simply copy the amino acid code into a Notepad file.
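A FASTA file is plain text: a header line starting with `>` followed by the sequence itself. As a minimal sketch, assuming you already have the amino acid code as a string, the helper below produces such text, which can then be saved and uploaded. The function `to_fasta`, the name `example_protein` and the short sequence are made up for illustration and are not part of any tool mentioned here.

```python
def to_fasta(name, sequence, width=60):
    """Return FASTA text: a '>' header line, then the sequence wrapped at `width` letters."""
    # Remove spaces and line breaks that often sneak in when copying from a web page.
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# A made-up fragment, not a real collagen sequence.
print(to_fasta("example_protein", "gpmgps gprglp", width=6))
```

The printed text can be pasted into Notepad and saved as a plain text file.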
BLAST stands for Basic Local Alignment Search Tool.
See the BLAST article for more information.
Let me show you how to do it; you don't have to be a scientist to compare DNA.
- The Cretaceous weevil DNA sequence has accession number L08072 in GenBank.
- Go to blast.ncbi.nlm.nih.gov. On this website you can compare ANY DNA, RNA or protein sequence with ALL KNOWN sequences.
- We want to compare DNA sequences, so click on nucleotide blast.
- It says "Enter accession number(s), gi(s), or FASTA sequence(s)"
- Enter L08072 (the number in GenBank)
- We want to compare the weevil sequence with ALL known DNA sequences, so don't change the settings.
- Scroll down and hit BLAST.
- Alignments are made in a few seconds.
- Scroll down to Sequences producing significant alignments:
- Here is a list of all the matches found.
The first column gives the accession number and species of each matching DNA sequence. The Max ident column shows what percentage of the aligned positions are identical between that sequence and yours.
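To make the idea of an identity percentage concrete, here is a rough sketch in Python. It assumes two sequences that have already been aligned to the same length (gaps written as `-`) and simply counts matching columns; BLAST finds the alignment itself and its exact bookkeeping may differ. The function name `percent_identity` and the short sequences are invented for this example.

```python
def percent_identity(a, b):
    """Percentage of alignment columns in which both sequences have the same letter."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    # A column with a gap never counts as a match.
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Two made-up aligned fragments that differ at one position out of eight.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
```

Here seven of the eight columns match, so the function returns 87.5.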
Multiple sequence alignment
This tool puts the sequences of multiple organisms in rows beneath each other. It also makes sure that identical regions line up exactly beneath each other.
See Multiple sequence alignment for more information.
One tool to do this is Clustal Omega. This is how it works:
- Go to Clustal Omega
- STEP 1 - Enter your input sequences
- In our case we are going to compare proteins, so set it to PROTEIN.
- We are not going to use the large empty text box.
- We will upload a Notepad file containing the sequences.
- We will compare the dinosaur Collagen alpha-1(I) chain with that of some birds.
- Tyrannosaur sequence
- Click on FASTA at the top.
- Copy-paste everything from ">gi" to the bottom into a Notepad file.
- Move two lines down in Notepad.
- Also copy-paste the FASTA sequences of these organisms into this file:
- Save your Notepad file.
- Upload the Notepad file.
- Scroll down and hit SUBMIT
- Wait a moment
- Here you see that all sequences are aligned
- Click on SHOW COLORS to make the alignment easier to read.
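Once sequences are aligned in rows, finding the identical regions is a matter of comparing each column. The sketch below is not how Clustal Omega computes the alignment; it only assumes rows that are already aligned to equal length and marks every fully identical column with `*`, the symbol Clustal uses for fully conserved positions. The function name `conservation_line` and the three short rows are made up for illustration.

```python
def conservation_line(rows):
    """Return a string with '*' under every column where all rows share the same letter."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        # A column is conserved if it holds a single distinct letter that is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Three made-up aligned fragments, not real collagen data.
rows = ["GPMGPS", "GPMGPA", "GPQGPS"]
for row in rows:
    print(row)
print(conservation_line(rows))
```

Columns three and six differ between the rows, so they are left blank while the other four get a `*`.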