Tools overview

Here is a non-exhaustive list of the programs included in MyCGR, with the main features of each one.

Manipulating sequences

`mycgr_seq.x` is used to

- generate i.i.d. or Markovian sequences, from a law and a length given as parameters,
- compute the empirical frequencies of nucleotides in a sequence,
- cut a sequence into several sub-sequences of a given length.
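
As an illustration of these three operations (not the actual implementation of `mycgr_seq.x`, whose options are not described here), a minimal Python sketch could look like the following. The function names, the dictionary encoding of the law, and the restriction to an order-1 Markov chain are assumptions made for the example.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def iid_sequence(length, law, rng):
    """Draw `length` independent letters; `law` maps each letter to its probability."""
    letters = list(law)
    weights = [law[c] for c in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))

def markov_sequence(length, initial, transition, rng):
    """Order-1 Markov chain: `transition[a]` is the law of the letter following `a`."""
    if length == 0:
        return ""
    seq = [iid_sequence(1, initial, rng)]
    while len(seq) < length:
        seq.append(iid_sequence(1, transition[seq[-1]], rng))
    return "".join(seq)

def empirical_frequencies(seq):
    """Relative frequency of each nucleotide in `seq`."""
    counts = Counter(seq)
    return {c: counts[c] / len(seq) for c in ALPHABET}

def cut(seq, size):
    """Consecutive, non-overlapping pieces of `size` letters; an incomplete tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]
```

Passing an explicit `random.Random` instance as `rng` keeps the generated sequences reproducible from one run to the next.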

Computing the CGR

`mycgr_square.x` performs various computations on the points of the CGR in the square:

- compute and display the coordinates of the points in the CGR, from a given sequence,
- count the points of the CGR in various given zones, from a given sequence,
- compute various test statistics.
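
The first two items can be sketched in Python as follows. This uses the usual CGR construction, in which each new point is the midpoint between the previous point and the corner of the current nucleotide. The assignment of letters to corners, the starting point at the center of the square, and the rectangular zone encoding are assumptions for the example; the conventions actually used by `mycgr_square.x` are not stated here.

```python
# Assumed corner layout: the source does not say which letter goes to which corner.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR points of `seq` in the unit square, one point per letter."""
    x, y = start
    points = []
    for letter in seq:
        cx, cy = CORNERS[letter]
        # Move halfway from the current point toward the corner of the letter.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def count_in_zones(points, zones):
    """Count points per zone; `zones` maps a name to a half-open rectangle
    (xmin, ymin, xmax, ymax), an encoding chosen for this sketch."""
    counts = {name: 0 for name in zones}
    for x, y in points:
        for name, (x0, y0, x1, y1) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                counts[name] += 1
    return counts
```

For example, `cgr_points("AG")` first moves from the center to `(0.25, 0.25)` and then halfway toward the `G` corner.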

Various options modify the behavior of the program, for example to run the tests on
already existing sequences rather than regenerating sequences for each simulation.
This option makes it possible to compare the various tests by applying them to the same sequences.
One can also specify cache parameters in order to optimize computations
in some cases. The programs `mycgr_segment.x` and `mycgr_tetra.x` offer the same features for the points of the CGR on the segment and in the tetrahedron.

Distances

`mycgr_square_dist.x` computes distances between sequences by generalizing the
dinucleotide relative abundance profile to the CGR. Several options control the computation
of these distances; in particular, one can choose to use the absolute values or the
squares of the differences. The distances can also be grouped by species.
The results can be generated in the Graphviz, Newick [1], PHYLIP, and LaTeX formats.
Computations on several hundred long sequences, with partitions made of many
zones, take time. To address this, the program can run them in parallel on several
machines, with all the files placed on a shared file system (NFS for example).
The INRIA Rocquencourt cluster was used in this way to divide by 15 the time taken
by long computations. A database can also be used to store intermediate results more efficiently.
The programs `mycgr_segment_dist.x` and `mycgr_tetra_dist.x` offer the same features for the CGR on the segment and in the tetrahedron.
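
To make the choice between absolute values and squares concrete, here is a small Python sketch of a distance between two sequences based on their point counts per zone. The exact formula used by `mycgr_square_dist.x` is not given here; comparing plain zone frequencies and averaging over the zones is an assumption, as are the function names.

```python
def zone_frequencies(counts):
    """Turn per-zone point counts into relative frequencies."""
    total = sum(counts.values())
    return {zone: c / total for zone, c in counts.items()}

def profile_distance(counts_a, counts_b, squared=False):
    """Average, over the zones, of the absolute (or squared) frequency differences.
    Both arguments must use the same zone names. Illustrative formula only."""
    fa = zone_frequencies(counts_a)
    fb = zone_frequencies(counts_b)
    diffs = [fa[zone] - fb[zone] for zone in fa]
    terms = [d * d if squared else abs(d) for d in diffs]
    return sum(terms) / len(terms)
```

With counts `{"z1": 3, "z2": 1}` and `{"z1": 1, "z2": 3}`, the frequency differences are 0.5 and -0.5, so the absolute variant gives 0.5 and the squared variant gives 0.25.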

Drawing

`mycgr_square_draw.x` draws the CGR in the square. It reads the coordinates of the points to draw from its standard input
(these coordinates are generated by the program `mycgr_square.x`).
Various options make it possible to:

- display a grid of a given size,
- draw a partition defined in a given file,
- show the construction of points with arrows,
- hide coordinates and/or letters corresponding to the corners,
- display, instead of points, the frequencies of points in sub-squares of a given size. The color of each sub-square is darker when the frequency is higher.
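
The last option amounts to binning the points on a regular grid and mapping each frequency to a grey level. A minimal Python sketch, assuming an `n` by `n` grid over the unit square and a linear mapping to the PostScript `setgray` scale (0 is black, 1 is white); the actual color scale of `mycgr_square_draw.x` is not described here:

```python
def subsquare_frequencies(points, n):
    """Relative frequency of points in each cell of an n x n grid over the unit square.
    The result is indexed as grid[row][col], with row 0 at the bottom (y close to 0)."""
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        # min() keeps points lying exactly on the upper or right edge inside the grid.
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    total = len(points)
    if total == 0:
        return [[0.0] * n for _ in range(n)]
    return [[c / total for c in row] for row in grid]

def grey_level(freq, max_freq):
    """Linear grey level (assumed mapping): the most frequent cell is black, empty cells are white."""
    if max_freq <= 0:
        return 1.0
    return 1.0 - freq / max_freq
```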

The generated files are in PostScript or Encapsulated PostScript (EPS) format.

Tests of the structure of sequences

`mycgr_square_test_markov.x` is used to empirically evaluate the level and power of the tests of
Markovian structure of order m. As parameters, one can choose the number of experiments,
the partitions, the lengths and the types (i.i.d., Markovian, Markovian mixed) of the sequences
to test. These simulations can also be run with the Bonferroni method [2].
The results are generated in LaTeX format. Equivalent programs exist for the CGR on the segment
(`mycgr_segment_test_markov.x`)
and in the tetrahedron
(`mycgr_tetra_test_markov.x`).
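
As a reminder of what the Bonferroni method does, here is a short Python sketch. It assumes that each experiment yields one p-value per individual test (for instance one per partition) and applies the standard Bonferroni correction; how the MyCGR programs combine their tests is not detailed here, and the function names are made up for the example.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject the global null hypothesis if at least one of the m tests
    is significant at the corrected level alpha / m."""
    m = len(p_values)
    return any(p <= alpha / m for p in p_values)

def empirical_rejection_rate(experiments, alpha=0.05):
    """Fraction of experiments in which the null hypothesis is rejected.
    `experiments` is a list of p-value lists, one list per simulated sequence."""
    rejections = sum(bonferroni_reject(p_values, alpha) for p_values in experiments)
    return rejections / len(experiments)
```

When the sequences are simulated under the null hypothesis, this rate estimates the level of the test; when they are simulated under an alternative, it estimates the power.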

`mycgr_square_vn.x` empirically evaluates the level and power of the independence test,
by choosing the number of experiments, the partitions, the lengths and the type
(i.i.d., Markovian, Markovian mixed) of the sequences to test.
Equivalent programs exist for the CGR on the segment
(`mycgr_segment_vn.x`)
and in the tetrahedron
(`mycgr_tetra_vn.x`).

Construction of zones and partitions

`mycgr_square_zones.x` generates files of zones or partitions on the unit square.
The zones can correspond to words or to regular or random subdivisions of the square.
One can also generate random rectangles and circles. To build a partition,
it is also possible to define one of the zones as the complement of the others.
Finally, a partition can be defined by cutting the square regularly
or randomly into a large number of zones and then grouping them into N sets, possibly in
a non-equiprobable way. Equivalent programs exist for the CGR on the segment
(`mycgr_segment_zones.x`)
and in the tetrahedron
(`mycgr_tetra_zones.x`).
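
The following Python sketch illustrates three of these constructions on the square: the zone associated with a word, a regular subdivision, and the grouping of zones into sets. In the CGR, the points reached after reading a given word all lie in a sub-square of side 2^-k, where k is the length of the word. The corner layout, the rectangle encoding `(xmin, ymin, xmax, ymax)`, and the weighted random grouping are assumptions made for the example, not the file format or algorithm of `mycgr_square_zones.x`.

```python
import random

# Assumed corner layout: the source does not say which letter goes to which corner.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def word_zone(word):
    """Sub-square containing every CGR point whose sequence ends with `word`."""
    x0, y0, x1, y1 = 0.0, 0.0, 1.0, 1.0
    for letter in word:
        cx, cy = CORNERS[letter]
        # Apply the CGR half-way map of this letter to both corners of the zone.
        x0, y0 = (x0 + cx) / 2, (y0 + cy) / 2
        x1, y1 = (x1 + cx) / 2, (y1 + cy) / 2
    return (x0, y0, x1, y1)

def regular_partition(n):
    """Regular subdivision of the unit square into n x n zones, row by row."""
    step = 1.0 / n
    return [(i * step, j * step, (i + 1) * step, (j + 1) * step)
            for j in range(n) for i in range(n)]

def group_zones(zones, weights, rng):
    """Assign each zone at random to one of len(weights) sets; unequal weights
    make the sets non-equiprobable (one possible reading of that option)."""
    groups = [[] for _ in weights]
    for zone in zones:
        k = rng.choices(range(len(weights)), weights=weights)[0]
        groups[k].append(zone)
    return groups
```

For instance, `word_zone("AG")` returns `(0.5, 0.5, 0.75, 0.75)` with this corner layout.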

Coherent use, naming convention

Because of the many experiments made on various sizes of sequences, with various
zones and various methods to compute the distances,
some naming conventions for the files became necessary.
All the files are thus placed in a directory tree whose root
(`meta_root`)
is a compile-time parameter. The files are then organized in the following way:

- `meta_root/sequences` contains the original sequences.
- `meta_root/ size` contains the sequences of size

The program `mycgr_meta.x` is used to launch the other MyCGR tools with the correct options and
filenames, so that the naming conventions are respected. This simplifies the commands
needed to use the tools and places the result files in the correct directories.

`mycgr.x` is a graphical interface that gives access to the main features
of the other tools while respecting the naming conventions of the files:

- handling of original sequences and extraction of smaller sequences to use them in simulations,
- display of the CGR in the square for a given sequence, or a given partition file; one can also merge the two representations and then visualize the frequency of points in the zones of a partition. The user can save the resulting image in a file.
- handling of the files of zones defined for the segment, the square and the tetrahedron,
- browsing of the results already obtained and display of each file with the appropriate tool (which can be configured),
- computation of distances between species, for a given length of sequence and other parameters.
