Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains

Guédon Yann, D'Aubenton-Carafa Yves, Thermes Claude. 2006. Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains. Journal of Mathematical Biology, 52 (3): pp. 343-372.

Journal article; journal article with impact factor
Published version - English
Access restricted to CIRAD agents
Use under authorization by the author or CIRAD.

HCERES journal list (humanities and social sciences): yes

HCERES journal theme(s) (humanities and social sciences): Psychology-ethology-ergonomics

Abstract: The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences. (Author's abstract)
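
Illustrative note: the classical notion of lumpability mentioned in the abstract can be made concrete with a small check. The sketch below is not the authors' method (their lumped processes use a probabilistic function within a two-level process); it only tests the standard strong lumpability condition, under which a partition is admissible when every state in a block has the same total probability of moving into each block. The transition matrix `P` and the function names are hypothetical and chosen purely for illustration.

```python
from itertools import product


def is_strongly_lumpable(P, partition, tol=1e-9):
    """Test the classical strong lumpability condition.

    P[a][b] is the probability of moving from nucleotide a to nucleotide b.
    partition is a list of blocks, each block being a string of states.
    """
    for block_from, block_to in product(partition, repeat=2):
        # Total probability of entering block_to from each state of block_from.
        masses = [sum(P[a][b] for b in block_to) for a in block_from]
        # The condition fails if these totals differ within a block.
        if max(masses) - min(masses) > tol:
            return False
    return True


def lumped_matrix(P, partition):
    """Transition matrix of the lumped chain (meaningful only if lumpable)."""
    return [
        [sum(P[block_from[0]][b] for b in block_to) for block_to in partition]
        for block_from in partition
    ]


# Hypothetical first-order transition matrix, invented for this example.
P = {
    "A": {"A": 0.4, "C": 0.1, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.5, "T": 0.1},
    "C": {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3},
    "T": {"A": 0.1, "C": 0.5, "G": 0.3, "T": 0.1},
}

purine_pyrimidine = ["AG", "CT"]  # {A, G} versus {C, T}
weak_strong = ["AT", "CG"]        # {A, T} versus {C, G}

print(is_strongly_lumpable(P, purine_pyrimidine))
print(is_strongly_lumpable(P, weak_strong))
print(lumped_matrix(P, purine_pyrimidine))
```

With this invented matrix, the purine/pyrimidine partition satisfies the condition while the weak/strong partition does not, which shows how a lumpability hypothesis singles out one grouping of nucleotides over another.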

Agrovoc keywords: DNA, nucleotide, analytical technique, mathematical model, intron

Additional keywords: Sequencing

Agris classification: L10 - Animal genetics and breeding
F30 - Plant genetics and breeding
U10 - Computer science, mathematics and statistics

Cirad strategic field: Outside strategic axes (2005-2013)

Authors and affiliations

  • Guédon Yann, CIRAD-AMIS-UMR AMAP (FRA)
  • D'Aubenton-Carafa Yves, CNRS (FRA)
  • Thermes Claude, CNRS (FRA)

Other links for this publication

Source: Cirad - Agritrop (
