Here is a nonexhaustive list of the programs included in MyCGR, with the main functionalities of each one.
mycgr_seq.x is used to
mycgr_square.x performs various computations on the points of the CGR in the square:
Various options modify the behavior of the program, for example to run the tests on already existing sequences rather than regenerating sequences for each simulation. This option makes it possible to compare the various tests by applying them to the same sequences. One can also specify cache parameters in order to optimize computations in some cases. The programs mycgr_segment.x and mycgr_tetra.x offer the same functionality for the points of the CGR on the segment and in the tetrahedron.
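As an illustration of what "the points of the CGR in the square" are, here is a minimal Python sketch of the standard chaos game construction: each letter moves the current point halfway toward the corner assigned to that letter. The function name cgr_square, the corner assignment and the starting point are assumptions made for this example; they are not taken from the MyCGR sources.

```python
# Hypothetical corner assignment, chosen for illustration only;
# MyCGR may map the nucleotides to the corners differently.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_square(sequence, start=(0.5, 0.5)):
    """Return the CGR points of `sequence` in the unit square.

    Starting from `start`, each letter moves the current point
    halfway toward the corner associated with that letter.
    """
    x, y = start
    points = []
    for letter in sequence.upper():
        cx, cy = CORNERS[letter]  # raises KeyError on an unknown letter
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_square("ACGT"):
        print(point)
```

The same idea carries over to the segment (two endpoints) and to the tetrahedron (four vertices in three dimensions) by changing the table of corners.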
mycgr_square_dist.x computes distances between sequences by generalizing the dinucleotide relative abundance profile to the CGR. Several options control the computation of these distances; in particular, one can choose to use the absolute values or the squares of the differences. The distances can also be grouped by species. The results can be generated in the Graphviz, Newick, PHYLIP and LaTeX formats. Computations on several hundred long sequences and for partitions composed of many zones take time. To address this, the program can run them in parallel on several machines, with all the files placed on a shared file system (NFS, for example). The cluster of INRIA Rocquencourt was used in this way to divide the time taken by long computations by 15. A database can also be used to store intermediate results more efficiently. The programs mycgr_segment_dist.x and mycgr_tetra_dist.x offer the same functionality for the CGR on the segment and in the tetrahedron.
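The choice between absolute values and squares of the differences can be sketched as follows in Python. This is only an illustration under stated assumptions: each sequence is summarized by one value per zone of the partition, and the distance is the plain sum of the per-zone differences. The function name profile_distance and the absence of any normalization are hypothetical; the exact formula used by mycgr_square_dist.x is not given here.

```python
def profile_distance(profile_a, profile_b, squared=False):
    """Distance between two per-zone profiles of equal length.

    With squared=False the absolute differences are summed;
    with squared=True the squared differences are summed.
    (Assumed form; no normalization is applied in this sketch.)
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be defined on the same partition")
    if squared:
        return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    print(profile_distance(p, q))                # absolute differences
    print(profile_distance(p, q, squared=True))  # squared differences
```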
mycgr_square_draw.x draws the CGR in the square. It reads the coordinates of the points to draw from its standard input (these coordinates are generated by the program mycgr_square.x). Various options make it possible to:
The generated files are in PostScript or Encapsulated PostScript format.
mycgr_square_test_markov.x is used to empirically evaluate the level and power of the tests of Markovian structure of order m. As parameters, one can choose the number of experiments, the partitions, and the lengths and types (i.i.d., Markovian, Markovian mixed) of the sequences to test. These simulations can also be run with the Bonferroni method. The results are generated in LaTeX format. Equivalent programs exist for the CGR on the segment (mycgr_segment_test_markov.x) and in the tetrahedron (mycgr_tetra_test_markov.x).
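For readers unfamiliar with the Bonferroni method, the following Python sketch shows the general principle and how an empirical level can be estimated from simulations. It assumes that several tests are combined (for instance one per partition) and that the p-values of each experiment are already available; the function names bonferroni_reject and empirical_rejection_rate are hypothetical, and the way MyCGR combines its tests is not described here.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject the global null hypothesis at overall level `alpha`.

    With k tests, each individual test is carried out at level
    alpha / k, and the global hypothesis is rejected as soon as
    one p-value falls at or below that threshold.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    threshold = alpha / len(p_values)
    return any(p <= threshold for p in p_values)


def empirical_rejection_rate(experiments, alpha=0.05):
    """Fraction of experiments in which the global null is rejected.

    `experiments` is a list of p-value lists, one per simulated
    sequence. On sequences simulated under the null hypothesis this
    estimates the level; under an alternative it estimates the power.
    """
    rejections = sum(bonferroni_reject(p, alpha) for p in experiments)
    return rejections / len(experiments)
```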
mycgr_square_vn.x empirically evaluates the level and power of the independence test. One chooses the number of experiments, the partitions, and the lengths and types (i.i.d., Markovian, Markovian mixed) of the sequences to test. Equivalent programs exist for the CGR on the segment (mycgr_segment_vn.x) and in the tetrahedron (mycgr_tetra_vn.x).
mycgr_square_zones.x generates files of zones or partitions on the unit square. The zones can correspond to words or to regular or random subdivisions of the square. One can also generate random rectangles and circles. To obtain a partition, one of the zones can be defined as the complement of the others. Finally, a partition can be defined by cutting the square, regularly or randomly, into a large number of zones and then grouping them into N sets, possibly in a nonequiprobable way. Equivalent programs exist for the CGR on the segment (mycgr_segment_zones.x) and in the tetrahedron (mycgr_tetra_zones.x).
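A regular subdivision of the unit square can be sketched as follows in Python: the square is cut into n x n equal cells, and each point is assigned to the cell that contains it. The function names zone_index and zone_counts and the row-major numbering of the zones are assumptions made for this example; the zone file format written by mycgr_square_zones.x is not reproduced here.

```python
def zone_index(point, n):
    """Index of the cell of a regular n x n grid that contains `point`.

    Cells are numbered row by row, starting from the cell that
    touches the origin. Points on the upper or right edge of the
    square are assigned to the last row or column.
    """
    x, y = point
    col = min(int(x * n), n - 1)
    row = min(int(y * n), n - 1)
    return row * n + col


def zone_counts(points, n):
    """Number of points falling in each of the n * n zones."""
    counts = [0] * (n * n)
    for point in points:
        counts[zone_index(point, n)] += 1
    return counts


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.9, 0.9), (0.8, 0.7)]
    print(zone_counts(sample, 2))
```

Grouping the cells into N sets, as described above, would then amount to mapping each cell index to a set number before counting.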
Because of the many experiments made on sequences of various sizes, with various zones and various methods to compute the distances, naming conventions for the files became necessary. All the files are therefore placed in a directory tree whose root is a compile-time parameter. The files are then organized in the following way:
meta_root/sequences contains the original sequences.
meta_root/size contains the sequences of size size extracted from the original sequences.
meta_root/cache contains the cache files.
meta_root/zones/segment contains the files of partitions on the segment.
meta_root/zones/square contains the files of partitions in the square.
meta_root/zones/tetra contains the files of partitions in the tetrahedron.
meta_root/results/size/dists contains the result files of the distance computations with sequences of size size. In this directory, the file names follow a naming convention that indicates the method used to compute the distances, the partition file used, whether the reverse complement was appended to the sequences, and whether the CGR was on the segment, in the square or in the tetrahedron.
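The directory layout listed above can be summarized by a small Python helper that builds the paths from the root and a sequence size. The function name layout and the dictionary keys are hypothetical, and the file names inside results/size/dists are deliberately left out because their exact convention is not specified here.

```python
from pathlib import PurePosixPath


def layout(meta_root, size):
    """Directories of the MyCGR tree for sequences of length `size`.

    `meta_root` stands for the root chosen at compile time; the
    keys of the returned dictionary are illustrative names only.
    """
    root = PurePosixPath(meta_root)
    return {
        "sequences": root / "sequences",
        "sized_sequences": root / str(size),
        "cache": root / "cache",
        "zones_segment": root / "zones" / "segment",
        "zones_square": root / "zones" / "square",
        "zones_tetra": root / "zones" / "tetra",
        "dists": root / "results" / str(size) / "dists",
    }


if __name__ == "__main__":
    # "/data/mycgr" is an example root, not a MyCGR default.
    for name, path in layout("/data/mycgr", 1000).items():
        print(name, path)
```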
The program mycgr_meta.x is used to launch the other MyCGR tools with the correct options and file names, so that the naming conventions are respected. This simplifies the commands needed to use the tools and places the result files in the correct directories.
mycgr.x is a graphical interface that gives access to the main functionality of the other tools while respecting the file naming conventions: