Agritrop

Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains

Guédon Yann, D'Aubenton-Carafa Yves, Thermes Claude. 2006. Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains. Journal of Mathematical Biology, 52 (3): 343-372.

Journal article; Journal article with impact factor
Published version - English
Access restricted to Cirad staff
Use subject to authorization from the author or Cirad.
document_532373.pdf (344 kB)

HCERES journal list (humanities and social sciences): yes

HCERES journal theme(s) (humanities and social sciences): Psychology-ethology-ergonomics

Abstract: The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.
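
Illustrative note: the classical notion of lumpability mentioned in the abstract can be sketched in a few lines of Python. This is only a minimal sketch of the standard strong-lumpability condition (every state of a block must send the same total probability into each block); it is not the two-level lumped processes proposed by the authors. The function names, the transition matrix values, and the purine/pyrimidine partition are all assumptions chosen for illustration and are not taken from the article.

```python
from itertools import product


def is_strongly_lumpable(P, partition, tol=1e-9):
    """Check the classical strong-lumpability condition.

    P is a row-stochastic transition matrix (list of lists) and partition
    is a list of blocks, each block being a list of state indices.
    """
    for block_a, block_b in product(partition, repeat=2):
        # Total probability of moving into block_b from each state of block_a.
        masses = [sum(P[i][j] for j in block_b) for i in block_a]
        if max(masses) - min(masses) > tol:
            return False
    return True


def lumped_matrix(P, partition):
    """Transition matrix of the lumped chain.

    Only meaningful when is_strongly_lumpable(P, partition) is True, in
    which case any state of a block can serve as its representative.
    """
    return [
        [sum(P[block_a[0]][j] for j in block_b) for block_b in partition]
        for block_a in partition
    ]


# Hypothetical first-order transition matrix over (A, C, G, T).
# The values are made up so that the purine/pyrimidine grouping is lumpable.
P = [
    [0.30, 0.10, 0.40, 0.20],  # from A
    [0.15, 0.35, 0.05, 0.45],  # from C
    [0.50, 0.25, 0.20, 0.05],  # from G
    [0.10, 0.30, 0.10, 0.50],  # from T
]

purine_pyrimidine = [[0, 2], [1, 3]]  # {A, G} versus {C, T}
other_grouping = [[0, 1], [2, 3]]     # {A, C} versus {G, T}

if __name__ == "__main__":
    print(is_strongly_lumpable(P, purine_pyrimidine))
    print(is_strongly_lumpable(P, other_grouping))
    print(lumped_matrix(P, purine_pyrimidine))
```

With this made-up matrix, the purine/pyrimidine grouping satisfies the condition while the {A, C} versus {G, T} grouping does not, which is the kind of contrast between candidate partitions of the state space that the abstract refers to.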

Agrovoc keywords: DNA, nucleotide, analytical technique, mathematical model, intron

Additional keywords: Sequencing

Agris classification: L10 - Animal genetics and breeding
F30 - Plant genetics and breeding
U10 - Computer science, mathematics and statistics

Cirad strategic field: Outside priority lines (2005-2013)

Authors and affiliations

  • Guédon Yann, CIRAD-AMIS-UMR AMAP (FRA)
  • D'Aubenton-Carafa Yves, CNRS (FRA)
  • Thermes Claude, CNRS (FRA)

Other links for this publication

Source: Cirad - Agritrop (https://agritrop.cirad.fr/532373/)
