CODONOME
User Guide - October, 2012
Mac OS X and Windows versions
INDEX
INTRODUCTION
INSTALLATION
USE
1. Creation of a local RefSeq entries database using RefSeq_parser database table
1.1. Downloading and editing the RefSeq database
1.2. Importing and creating the local RefSeq entries database
1.3. Counting codons
2. Importing the expression value data
2.1. Obtainment codons expression values (codonome)
2.2. Collecting by aminoacyl-tRNA synthetase
2.3. Obtainment aminoacyl-tRNA synthetases expression values
GENERAL DEFINITIONS
5.1 File
5.2 Table
5.3 Record
5.4 Field
5.5 Layout
5.6 Browse Mode
5.7 Find Mode
5.8 Preview Mode
MENU AND COMMANDS
6.1 TRAM
6.2 File
6.3 Edit
6.4 View
6.5 Records
6.6 Scripts
6.7 Help
TROUBLESHOOTING
TECHNICAL NOTES
7.1 Software known limits
7.2 Bugs report
ACKNOWLEDGEMENTS
INTRODUCTION
CODONOME collects the expression value of each codon (called the "codonome") and of each aminoacyl-tRNA synthetase (aaRS). To do this, the software counts the total mRNA codon number of any organism, and imports and integrates any mRNA expression data source in tabulated text format.
This guide provides detailed documentation for the CODONOME software. It shows how to install the software, how to count mRNA codons and how to import expression data to study the codonome.
Download CODONOME for Mac OS X or for Windows from the following address:
http://apollo11.isto.unibo.it/software/
The minimum software requirements are:
Mac OS X 10.4.11 for PowerPC G4, G5 or Intel processors;
Windows XP Professional, Home Edition (Service Pack 3);
Windows Vista Ultimate, Business, Home (Service Pack 1);
Windows 7.
CODONOME is based on the FileMaker Pro 10 (FileMaker, Inc.) database management software (www.filemaker.com/index.html), and is released as a FileMaker Pro 10 template, along with a runtime application that runs the "FileMaker Pro" engine at the core of the software.
The runtime is freely distributed, in compliance with the license of the "FileMaker Pro 10 Advanced" developer package that was used to create the program.
Standard database commands (Find, Sort, Export records) are available within each layout of CODONOME (see the 'GENERAL DEFINITIONS' and 'MENU AND COMMANDS' sections in this Guide).
INSTALLATION
Once decompressed, CODONOME is ready to be used.
Please do not change the name of any file or folder of the CODONOME software.
You may download multiple copies of CODONOME and run them simultaneously, provided that each "CODONOME" folder is located in a separate directory.
USE
1 Creation of a local RefSeq entries database using RefSeq_parser database table
CODONOME software is designed to parse RefSeq entries to create a RefSeq database and to calculate the transcriptome (total mRNA) codon number of an organism. By combining these data with expression value data, you can estimate how many codons are actually used by one cell of a given tissue.
Moreover, you can study the consequences of that putative preferential use on aminoacyl-tRNA synthetase usage.
1.1 Downloading and editing the RefSeq database
Download the RefSeq text file of the desired species at:
ftp://ftp.ncbi.nih.gov/refseq/
(choose the "mRNA_Prot" folder, download the "gbff.gz" format file and decompress it as usual).
Then edit the downloaded file using the Unix commands "tr" and "awk".
The file must be placed in the same directory from which the commands are launched.
These commands are also included in Mac OS X and in most Unix-like systems, e.g. Linux.
Editing is performed using this instruction (typed as a single line):
tr -ds "\n" "[:space:]" < gbff.txt | awk '{gsub ("//LOCUS", "\rLOCUS"); print $0;}' | tr -d "//\n" > out.txt
where "gbff.txt" should actually be the name of the downloaded RefSeq file, and "out.txt" the name of the edited file produced as the output.
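For readers without "tr" and "awk" at hand (e.g. on Windows), the same editing can be sketched in Python. This is only an approximation written for this guide, not part of CODONOME: the function names flatten_refseq and convert_file are hypothetical, and the code assumes the whole file fits in memory. It follows the three stages of the instruction above: delete newlines and squeeze repeated whitespace, turn each "//LOCUS" boundary into a carriage return, then delete the remaining slashes.

```python
import re


def flatten_refseq(text: str) -> str:
    """Approximate the tr/awk pipeline: one RefSeq entry per CR-separated record."""
    # Stage 1 (tr -ds "\n" "[:space:]"): delete newlines, then squeeze
    # runs of the same whitespace character down to a single character.
    flat = text.replace("\n", "")
    flat = re.sub(r"([ \t\r\f\v])\1+", r"\1", flat)
    # Stage 2 (awk gsub): the "//" terminator followed by the next
    # "LOCUS" keyword becomes a carriage return, i.e. a record break.
    flat = flat.replace("//LOCUS", "\rLOCUS")
    # Stage 3 (tr -d "//\n"): delete every remaining slash and newline.
    return flat.replace("/", "").replace("\n", "")


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src (e.g. "gbff.txt"), write dst (e.g. "out.txt")."""
    with open(src, encoding="ascii", errors="replace", newline="") as fh:
        data = fh.read()
    # newline="" keeps the carriage returns exactly as produced.
    with open(dst, "w", encoding="ascii", errors="replace", newline="") as fh:
        fh.write(flatten_refseq(data))
```

Calling convert_file("gbff.txt", "out.txt") would then play the role of the instruction above. Note that, as in the original pipeline, all slashes are removed, including those that introduce feature qualifiers inside an entry.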
1.2 Importing and creation of the local RefSeq entries database
Open the CODONOME software and switch to the RefSeq_Parser table.
(To switch among different database tables, use the "Layout" menu at the upper left corner).
Choose the command "Import records" from the "File" menu.
Select the file to be imported, choosing "Tab-separated text" from the "Show" pop-up menu.
The software extracts and calculates the following information in specific calculated fields:
FIELD                       DESCRIPTION
"FASTA":                    the entry in FASTA format, including accession number and mRNA sequence;
"LOCUS":                    the entry accession number;
"bp":                       the length of the entry sequence (in bp);
"CDS_start":                the position of the entry-recorded translational start codon;
"CDS_start_Prokaryotes":    the position of the entry-recorded translational start codon (if the investigated organism is a prokaryote);
"CDS_end":                  the position of the entry-recorded codon just before the stop codon;
"CDS_end_Prokaryotes":      the position of the entry-recorded codon just before the stop codon (if the investigated organism is a prokaryote);
"UTR5'_length":             the length of the mRNA 5' UTR sequence;
"Seq":                      the mRNA sequence;
"Seq_UTR5'":                the mRNA 5' UTR sequence;
"CDS":                      the CDS sequence;
"CDS_Prokaryotes":          the CDS sequence (if the investigated organism is a prokaryote);
"Complement":               if the investigated organism is a prokaryote, shows "YES" if the gene is in complement (otherwise shows "NO");
"SYMBOLUM":                 the gene symbol;
"SYMBOLUM_Prokaryotes":     the gene symbol (if the investigated organism is a prokaryote).
The AAA_N, ACC_N, etc. fields are used in the next step.
1.3 Counting codons
The aim of this step is to obtain a RefSeq mRNA codon count. To do this, click on the 'Codon count' button (or 'Prokaryotes codon count' if your chosen organism is a prokaryote).
First, the software deletes all but the "NM" entries (i.e. it keeps only the protein-coding mRNA entries); then it proceeds with the codon count until you see the codon number in the AAA_N, AAC_N, etc. fields.
The software automatically enters into the Codonome table the gene symbol and the codon count of each mRNA, and calculates the codon count sum of the whole transcriptome and the per mil frequency of each codon.
(See section 2.1 for the description of the fields of the Codonome table).
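To make the quantities of this step concrete, here is a minimal Python sketch of a per-mRNA codon count, a transcriptome-wide sum and a per mil frequency. It is an illustration only and not the FileMaker script used by CODONOME: the function names are hypothetical, and it assumes each CDS is already extracted in frame, without the stop codon (as the "CDS_end" field suggests).

```python
from collections import Counter
from itertools import product

# The 64 codons in alphabetical order: AAA, AAC, ..., TTT.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def count_codons(cds: str) -> Counter:
    """Count the in-frame codons of one CDS; incomplete trailing bases are ignored."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    seen = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    # Keep only the 64 standard codons (triplets with ambiguous bases are dropped).
    return Counter({codon: seen.get(codon, 0) for codon in CODONS})


def transcriptome_counts(cds_by_gene: dict) -> Counter:
    """Sum the codon counts of every mRNA (gene symbol -> CDS string)."""
    total = Counter()
    for cds in cds_by_gene.values():
        total.update(count_codons(cds))
    return total


def per_mil(counts: dict) -> dict:
    """Per mil frequency of each codon: 1000 * count / sum of all counts."""
    grand_total = sum(counts.values())
    return {c: (1000.0 * n / grand_total if grand_total else 0.0)
            for c, n in counts.items()}
```

For example, per_mil(transcriptome_counts({"g1": "ATGAAA", "g2": "AAATTT"})) gives 500 per mil for AAA, because AAA accounts for two of the four counted codons.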
2 Importing the expression values data
CODONOME software is optimized to parse expression data, integrating them with the codon count to give the expression value of each codon, both for each mRNA and for the whole transcriptome.
First, you need an expression data text file with two columns separated by the tabulator character (TAB, ASCII 9): gene symbol and expression value.
[Column headers are not required]
[Gene symbol]    [Value]
PRY              119.8678872124652088
DRP2             48.5523996241932508
SERBP1           47.4984303755452469
...              ...
For example, you may first do your analysis with TRAM and then export the required information to a tabulated text file.
Name the file "expression.tab" and put it into the CODONOME folder.
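Before importing, it can be useful to check that a file really follows this two-column layout. The sketch below reads such a file into a Python dictionary; it is a hypothetical helper written for this guide, not part of CODONOME. Lines whose second column is not a number (for instance an optional header) are skipped, and if a gene symbol is repeated the last value is kept, which is an assumption of the sketch rather than documented CODONOME behaviour.

```python
import csv


def read_expression(handle) -> dict:
    """Read 'gene symbol <TAB> expression value' lines into a dict."""
    values = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or not row[0].strip():
            continue  # blank or malformed line
        try:
            values[row[0].strip()] = float(row[1])
        except ValueError:
            continue  # non-numeric second column, e.g. a header line
    return values
```

A typical call would be: with open("expression.tab", newline="") as fh: table = read_expression(fh).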
2.1 Obtainment of codons expression values (codonome)
In the Codonome table, CODONOME software gives the expression value of each codon, called "codonome" in the following steps.
From the Codonome table choose the 'Import expression values' button.
The software loads the expression value at the corresponding mRNA; then it automatically calculates the codonome count of each mRNA, the codonome count sum of the whole transcriptome and the per mil frequency of each codonome.
The fields in the Codonome table are:
FIELD                       DESCRIPTION
"Gene_Symbol":              the gene symbol;
"AAA_N":                    the count of each codon in each mRNA;
"Expression":               the expression value of each mRNA;
"AAA_N_E":                  the count of each codon of each mRNA multiplied by the expression value (the codonome count of each mRNA);
"Expression_Random":        another randomly chosen mRNA expression value;
"Random_Number":            a random number from 0 to 1 used instead of the expression value;
"Random_Number_Int":        a random number from 1 to 10^4 used instead of the expression value;
"Codon_Bias_Sum*":          the sum of each codon over the whole transcriptome;
"Tot_Codon_Bias_Sum*":      the sum of all codons of the whole transcriptome;
"Codon_Frequency*":         the per mil frequency of each codon;
"Media*":                   the mean value of each codon;
"Standard_Dev*":            the standard deviation value of each codon;
"Codonome_Bias_Sum*":       the sum of each codonome over the whole transcriptome;
"Tot_Codonome_Bias_Sum*":   the sum of all transcriptome codonomes;
"Codonome_Frequency*":      the per mil frequency of each codonome.
*: at the bottom of the page
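The relation between these fields can be sketched as follows: the codonome count of a codon in one mRNA is its codon count multiplied by the expression value of that mRNA (AAA_N_E), the per-codon sums over all mRNAs correspond to Codonome_Bias_Sum, their grand total to Tot_Codonome_Bias_Sum, and the per mil ratio to Codonome_Frequency. The Python function below is a hypothetical illustration of that arithmetic. It assumes that mRNAs without an expression value are simply left out, which may differ from what CODONOME does.

```python
def codonome(codon_counts: dict, expression: dict):
    """Weight codon counts by expression.

    codon_counts: gene symbol -> {codon: count}
    expression:   gene symbol -> expression value
    Returns (per-codon sums, grand total, per mil frequencies).
    """
    sums = {}
    for gene, counts in codon_counts.items():
        if gene not in expression:
            continue  # assumption: mRNAs without a value do not contribute
        for codon, n in counts.items():
            # Codonome count of this codon in this mRNA: count * expression.
            sums[codon] = sums.get(codon, 0.0) + n * expression[gene]
    grand_total = sum(sums.values())
    freq = {c: (1000.0 * v / grand_total if grand_total else 0.0)
            for c, v in sums.items()}
    return sums, grand_total, freq
```

With counts {"g1": {"AAA": 2, "ATG": 1}, "g2": {"AAA": 1, "ATG": 1}} and expression values {"g1": 10.0, "g2": 2.0}, the AAA codonome sum is 2*10 + 1*2 = 22 out of a total of 34.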
2.2 Collecting by aminoacyl-tRNA synthetase expression values
The Codon_Synthetases table is preset for Homo sapiens, Danio rerio, Caenorhabditis elegans, Saccharomyces cerevisiae and Escherichia coli and contains the following fields:
FIELD            DESCRIPTION
"Amino acid":    the three letter amino acid symbol;
"Codon":         the amino acid coding codon;
"aaRS_HUMAN":    the aminoacyl-tRNA synthetase gene symbol of Homo sapiens (for the stop codons the field is empty);
"aaRS_BRARE":    the aminoacyl-tRNA synthetase gene symbol of Danio rerio (for the stop codons the field is empty);
"aaRS_CAEEL":    the aminoacyl-tRNA synthetase gene symbol of Caenorhabditis elegans (for the stop codons the field is empty);
"aaRS_YEAST":    the aminoacyl-tRNA synthetase gene symbol of Saccharomyces cerevisiae (for the stop codons the field is empty);
"aaRS_ECOLI":    the aminoacyl-tRNA synthetase gene symbol of Escherichia coli (for the stop codons the field is empty);
"aaRS_Other":    if your organism has different gene symbols, you can type them here (for the stop codons the field is empty).
From bacteria to Homo sapiens, the tetrameric subunit organization of cytoplasmic phenylalanyl-tRNA synthetase is markedly conserved: in each organism there are two different genes, each coding for one of the two subunits (alpha and beta).
To simplify, in the Codon_Synthetases table (and also in the Synthetases table, see section 2.3) we use the incomplete gene symbol (e.g. FARS instead of FARSA or FARSB for Homo sapiens).
Then choose the 'Group' button from the Codon_Synthetases table. Write the organism's Latin name (e.g. "Homo sapiens") in the window that appears.
CODONOME software collects the codon and the codonome per mil frequencies by aminoacyl-tRNA synthetase, groups them per synthetase and switches to the Synthetases table.
(See section 2.3 for the description of the fields of the Synthetases table).
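The grouping performed by the 'Group' button amounts to summing the per mil frequencies of all codons served by the same synthetase. The Python sketch below illustrates this under stated assumptions: the mapping is a tiny excerpt invented for the example (the real Codon_Synthetases table covers all 64 codons), "FARS" is the simplified symbol used in this guide, "KARS" is assumed here as the human lysyl-tRNA synthetase symbol, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative excerpt only; stop codons have an empty synthetase field.
CODON_TO_AARS = {
    "TTT": "FARS", "TTC": "FARS",   # phenylalanine codons
    "AAA": "KARS", "AAG": "KARS",   # lysine codons (symbol assumed)
    "TAA": "",                      # stop codon
}


def group_by_synthetase(freq: dict, codon_to_aars: dict) -> dict:
    """Sum per mil frequencies (codon or codonome) per synthetase symbol."""
    sums = defaultdict(float)
    for codon, value in freq.items():
        aars = codon_to_aars.get(codon, "")
        if aars:  # skip stop codons and codons missing from the mapping
            sums[aars] += value
    return dict(sums)
```

Applied to codon frequencies the result corresponds to the "Codon_Sum" field, and applied to codonome frequencies to the "Codonome_Sum" field, of the Synthetases table.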
2.3 Obtainment of aminoacyl-tRNA synthetases expression values
CODONOME software also gives the expression value of each aminoacyl-tRNA synthetase.
Before importing expression values, you should search for the phenylalanyl-tRNA synthetase record and duplicate it (command "Duplicate Record" from the "Records" menu). In these two records, you should replace the gene symbol with the alpha subunit gene symbol in the first record and with the beta subunit gene symbol in the duplicated record (e.g. FARSA and FARSB instead of FARS in Homo sapiens). Please make no further modifications to the other fields.
You can import your expression value data file (the expression.tab file used in section 2.1) into the Synthetases table by choosing the 'Import expression values' button.
The software automatically loads the expression value at the corresponding aminoacyl-tRNA synthetase.
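The manual step above (duplicating the phenylalanyl-tRNA synthetase record, then matching each record to an expression value by gene symbol) can be pictured with the following Python sketch. It is only an illustration of the bookkeeping, with hypothetical function names and plain dictionaries standing in for Synthetases records; records without a matching symbol get no value (None), which is an assumption of the sketch.

```python
def split_phe_record(records, generic="FARS", alpha="FARSA", beta="FARSB"):
    """Replace the generic Phe-tRNA synthetase record with one record per subunit."""
    out = []
    for rec in records:
        if rec["Synthetase"] == generic:
            # Same field values, only the gene symbol differs.
            out.append({**rec, "Synthetase": alpha})
            out.append({**rec, "Synthetase": beta})
        else:
            out.append(dict(rec))
    return out


def attach_expression(records, expression):
    """Add an 'Expression' field looked up by gene symbol (None if absent)."""
    return [{**rec, "Expression": expression.get(rec["Synthetase"])}
            for rec in records]
```

For Homo sapiens this turns one "FARS" record into a "FARSA" and a "FARSB" record that share the same Codon_Sum and Codonome_Sum but receive their own expression values.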
The fields in the Synthetases table are:
FIELD              DESCRIPTION
"Synthetase":      the aminoacyl-tRNA synthetase gene symbol;
"Codon_Sum":       the sum of the codon per mil frequencies of each aminoacyl-tRNA synthetase;
"Codonome_Sum":    the sum of the codonome per mil frequencies of each aminoacyl-tRNA synthetase;
"Expression":      the expression value of each aminoacyl-tRNA synthetase.
For further statistical analysis, by choosing the 'Export data' button (in the Synthetases table) you can automatically export the Codon_Frequency and Codonome_Frequency fields (for each codon) to a file named "biascodonome.tab", and all the fields of the Synthetases table to a file named "biascodonomepersynthetase.tab".
These text files will appear in the CODONOME folder.
GENERAL DEFINITIONS
5.1 File
A set of database tables.
5.2 Table
A set of records referring to the same subject type (e.g., the 'Genes' table).
5.3 Record
One set of fields which represents one entry (i.e. containing all requested data for a subject, e.g. a gene probe).
The record browser is a small book icon at the top left of the window. You may also browse the records faster using the cursor at the right of the small book icon.
5.4 Field
The database unit containing a specific data type (e.g., 'Gene_name').
5.5 Layout
A particular graphical organization of the fields of a table.
A table can be visualized in more than one layout.
A layout may display fields from a table or related fields from other tables.
A file may show data within different layouts.
Visualization of a field is independent from the storage of the contained data.
Browsing among the layouts can be done by clicking on the 'Layout:' pop-up Menu at the upper left corner.
You may browse the database by clicking on the small book pages at the top left of the window, by using the cursor at the right of the small book icon, or by entering a record number and pressing the "Return" key.
The following information is constantly displayed in the window top bar (if not, select "Status Toolbar" from the "View" Menu):
Records: total number of Records in the table.
Found: total number of Records in the currently selected subset. Clicking on the green circular button will retrieve the complementary subset of currently omitted records.
Sorted: sorting status of the Records (Sorted/Unsorted).
The FileMaker Pro-based database may be used in three basic "modes": 'Browse', 'Find', and 'Preview'.
Switching among the different modes can be done from the 'View' Menu or from the pop-up Menu bar at the bottom left of the window.
5.6 Browse Mode
One way to use the database.
It allows entry, view, browse, sort, and manipulation of data.
It may be selected from:
the 'View' menu, or
the mode pop-up Menu bar, at the bottom left of the window.
In the 'Browse' mode, the record sets can be browsed by clicking on the small book icon (with the arrows to move 'back' and 'forward') in the upper left corner.
Browsing among the tables can be done by clicking on the 'Layout' pop-up Menu at the upper left corner.
5.7 Find Mode
An alternative mode to use the database.
It allows searching for specific content in the database fields, using any combination of criteria (see below for more details).
It may be selected from:
the 'View' menu, or
the mode pop-up Menu bar, at the bottom left of the window.
The user can fill in a blank form to search in specific fields.
In the "Find" mode, the small book icon in the upper left corner represents the different "requests" that are made for searching the database.
In FileMaker Pro 'Find' mode, the "AND" - "OR" - "NOT" operators may be implemented in this way:
"AND" by filling criteria in different fields located in the same "Request",
"OR" by generating additional requests (from the "Requests" Menu) in the same query,
"NOT" by generating additional requests (from the "Requests" Menu) and clicking on the "Omit" button (located in the window top bar).
The 'Operators' pop-up Menu appears when clicking on a field while pressing the 'ctrl' key; it allows querying for exact matches, duplicate values, ranges, wildcards and more.
Click on the 'Perform Find' button at the top of the window to start the query.
The result of the search is the subset of the entries matching the set search criteria.
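The way several requests combine into one found set can be modelled with a few lines of Python. This is a simplified mental model written for this guide, not FileMaker Pro code: criteria are compared by exact equality (real Find mode matching is more flexible), requests are applied in the order given, and the field "Organism" in the usage note is hypothetical.

```python
def perform_find(records, requests):
    """Simplified model of a multi-request find.

    records:  list of dicts (field name -> value)
    requests: list of (criteria, omit) pairs; criteria is a dict of
              field -> required value, omit marks a "NOT" request.
    """
    found = []
    for criteria, omit in requests:
        # "AND": every criterion of the same request must match.
        matches = [r for r in records
                   if all(r.get(f) == v for f, v in criteria.items())]
        if omit:
            # "NOT": an omit request removes its matches from the found set.
            found = [r for r in found if r not in matches]
        else:
            # "OR": an additional request adds its matches to the found set.
            found += [r for r in matches if r not in found]
    return found
```

For instance, a request for Organism = "Homo sapiens", a second request for Gene_Symbol = "SERBP1" and an omit request for Gene_Symbol = "DRP2" would return every human record plus SERBP1, minus DRP2.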
5.8 Preview Mode
An alternative way to use the database.
It visualizes a print preview of the found records.
It may be selected from:
the "View" menu, or
the pop-up Menu bar, at the bottom left of the window.
In the "Preview" mode, the user can obtain a print preview of the data in the current table.
Browsing among the tables can be done by clicking on the 'Layout:' pop-up Menu at the upper left corner.
MENU AND COMMANDS
6.1 "TRAM" Menu
About FileMaker Pro Runtime...
Information about the FileMaker Pro Runtime at the core of the software.
Preferences...
Standard preferences panel; cache memory size can be set up to 256 MB.
Hide TRAM
Hiding all TRAM windows.
Quit TRAM
Closing the program.
6.2 'File' Menu
File Options...
It is possible to set only the "Spelling" options.
Change Password...
There is no default password set.
Page setup...
Standard page set up command.
Print...
Standard print command. The appearance will match the layout currently displayed on the screen.
Import Records
This is the general "Import" function of FileMaker Pro.
Export Records...
Export command for the found records set in a given table. Records are exported in their current sorting mode. The user can select the fields to be exported, their relative order, and the separation character.
Save a Copy as...
Save a copy of the database: complete, compressed or as a clone (database structure with no record present).
6.3 'Edit' Menu
Undo
Standard "Undo" command.
Cut
Standard "Cut" text command.
Copy
Standard "Copy" text command.
Paste
Standard "Paste" text command.
Select all
Selection of all text present within a selected field (to select a field, click into the field).
Find/Replace
Utility for searching/replacing text strings within fields.
Note: Use 'Find' mode (from the 'View' Menu) for full search and selection of a record set.
Spelling
Utility for checking the spelling of text strings within fields.
Export Field Contents...
Utility to export the contents of the selected field to a file.
6.4 'View' Menu
Browse Mode
Switch to the 'Browse Mode' (see "General Definitions" above).
Find Mode
Switch to the 'Find Mode' (see "General Definitions" above).
Preview Mode
Switch to the 'Preview Mode' (see "General Definitions" above).
Go to layout
A possible way to switch between different layouts.
View as Form
A possible way to individually display the current record of a found set of records.
View as List
A possible way to display all the records of a found set in the form of a list.
View as Table
A possible way to display all the records of a found set in the form of a spreadsheet-like table.
Toolbars
To switch on/off the toolbars of the application: "Standard" and "Text Formatting".
Status Area
To switch on/off the "Status Area", the toolbar located at the top of the program window.
Text Ruler
To switch on/off the text ruler of the application.
Zoom in
Used to increase layout dimensions.
Zoom out
Used to decrease layout dimensions.
6.5 'Records' Menu
New Record
Creating a new empty record in the database. The new Record will be the last of the current record set.
Duplicate Record
Duplicating the current record in the database. The new Record will be the last of the current record set.
Delete Record...
Deleting the current record in the database.
Delete Found Records...
Deleting all currently found records in the database.
Go to Record
Moving to the selected record by number, previous or next.
Show All Records
Showing all the records in the database.
Show Omitted Only
Showing all the records in the database not included in the current 'found' set.
Omit Record
Removing the selected record from the current found set, without deleting it.
Omit Multiple...
Removing more than one record, selected by number, from the current found set, without deleting them.
Modify Last Find
Returning to the last performed search in order to edit it.
Saved Finds
Saving a set of search criteria.
Sort Records...
Sorting the current records set according to desired criteria.
Unsort
Displaying the current records set according to the order of creation of each record.
Replace Field Contents
Replacing the value of a field in all records of the found set with the value specified in the current record, or by calculation.
Relookup Field Contents...
This command executes a relookup of the value of a field by reading the matched value in a related table (the relationship has been established during database development using a 'key' field).
Revert Record...
Restoring the value of a field, discarding any change, before clicking out of that field.
6.6 'Scripts' Menu
About
This opens the 'About' window containing information about the TRAM software.
Guide
The page with the user Guide of the TRAM software (this Guide).
6.7 'Help' Menu
Search
Search the system 'Help' for the general commands.
TROUBLESHOOTING
Sometimes, power failure, hardware problems, or other factors can damage a FileMaker Pro database file.
When the runtime application discovers a damaged file, a dialog box appears, telling the user to contact the creator.
Even if the dialog box does not appear, files can exhibit erratic behaviour.
If you have FileMaker Pro or FileMaker Pro Advanced installed, you can recover the file using the 'Recover' command.
Otherwise, to recover a damaged file:
- On Mac OS X machines, press Command + Option (cmd-alt) while double-clicking the runtime application icon. Hold the keys down until you see the 'Open Damaged File' dialog box.
- On Windows machines, press Ctrl+Shift while double-clicking the runtime application icon. Hold the keys down until you see the 'Open Damaged File' dialog box.
During the recovery process, the runtime application:
1. Creates a new file;
2. Renames any damaged file by adding “Old” to the end of the file name;
3. Gives the repaired file the original name.
TECHNICAL NOTES
The minimum software requirements are:
Mac OS X 10.4.11 for PowerPC G4, G5 or Intel processors;
Windows XP Professional, Home Edition (Service Pack 3);
Windows Vista Ultimate, Business, Home (Service Pack 1);
Windows 7.
Other specifications may be found here.
The scripts at the core of TRAM software are "FileMaker Pro" scripts.
TRAM is composed of a 137 MB database engine ('TRAM') and of a template ('TRAM.TMA') with 37 data tables, 117 relationships among them and 434 script definitions.
Following set up including NCBI UniGene and UCSC EST localization data, the size of the 'TRAM.TMA' file becomes 3.6 GB, 2.0 GB and 742 MB for human, mouse and zebrafish, respectively.
Importing the 28 human microarray sample data files for the test of the biological model raised the 'TRAM.TMA' file size to 4.3 GB.
The time required to import and process a typical microarray data file is about 10 minutes.
Typical execution time is 1-2 hours for a 'Map' analysis and 5-10 minutes for a 'Cluster' analysis, depending on the number of analyzed samples, which also heavily affects the time required to refresh data when the type of data normalization is changed.
Large file size and relative slowness of data processing are mainly due to systematic indexing of all data contained in TRAM, with the advantage of very fast data browsing, navigation and search at the end of data import and processing, which may be run in batch mode.
We encourage any creative use, modification and noncommercial redistribution of TRAM, as long as the original paper is cited and, if the original program has been modified, a statement to that effect is provided.
7.1 Software known limits
Due to FileMaker Pro limits:
maximum TRAM file size is 8 terabytes (8192 gigabytes);
a text field can contain up to 2 GB of characters;
a number field can contain values of up to 800 digits.
Due to TRAM limits:
in order to generate consistent transcriptome maps, TRAM currently deletes all genes with ambiguous mapping, but this involves data loss for a few genes that are biologically present in different locations, i.e. genes common to the X and Y chromosomes (e.g., CSF2RA). We are working to fix this problem.
The limit of 25 chromosomes for a genome applies only to the display of synthetic maps with all chromosomes shown horizontally aligned; it does not apply to data import, standard visualization mode or data analysis.
7.2 Bugs report
Please report any suggestions, bugs or problems to:
Pierluigi Strippoli
[email protected]
ACKNOWLEDGEMENTS
Thanks to NCBI for the "Entrez" databases and to UCSC Genome Bioinformatics for the "UCSC Genome Browser".
Thanks to the FMPexperts List and FMForum for suggestions and tips about FileMaker Pro.