G. Nuel is a senior CNRS researcher at the Institute of Mathematics (INSMI), working in the Laboratory of Probability, Statistics and Modeling (LPSM, CNRS 8001) at Sorbonne Université. Since 2018, he has been the head of the Stochastics and Biology Group. Throughout his career, he has developed a genuine interest in biomedical applications of probability and statistics, building on his strong theoretical background in mathematics. He is an expert in computational statistics (simulations, the expectation-maximization algorithm, Markov chain Monte Carlo techniques, etc.) and in models with latent variables (Markov chains, hidden Markov models, Bayesian networks, etc.). He has a strong interest in applications to bioinformatics, statistical genetics, cancer epidemiology, tropical diseases, and clinical research.
Pr. Gregory Nuel
LPSM, CNRS 8001
Sorbonne University
Campus Pierre et Marie Curie
Office 16-26.122
This is our historical research theme. The objective is to develop statistical methods for sequence data in bioinformatics: DNA, RNA reads, protein sequences, etc. Most of the methods developed are based on Markov models (homogeneous or not).
In this context, we are interested in:
Main partners involved:
Grants: Sorbonne Paris Cité (2013-2016, €200K), CIMI (2017-2018, €13K).
We focus here on the genetic factors in important age-dependent diseases such as cancer, diabetes or rare genetic diseases. The challenge is to use state-of-the-art survival analysis methods in the context of genetic dependence in (possibly large) pedigrees.
In this context, we are interested in:
Main partners involved:
Grants: INSERM/IRESP DECURION (2013-2016, €120K), LNCC PhD grant (2013-2016, €100K), LNCC PhD grant (2018-2021, €100K), SATT IDF INNOV, SIRIC CURAMUS.
We focus here on the inference of causal relationships between genes in systems biology using both observational and interventional experiments. Our main model is based on Gaussian Bayesian networks.
In this context, we are interested in:
Main partners involved:
Grants: PhD grant ED386 (2014-2017, €90K), INRA (2015-2017, €40K), FRM (2018-2021, €40K)
In Genome-Wide Association Studies (GWAS), we study the relation between a phenotype (usually a binary trait, e.g. affected/non-affected by a disease) and high-density genotypes (e.g. 500,000 SNPs).
In this context, we are interested in:
Main partners involved:
Grant: ANR SAMOGWAS (2013-2017, €20K).
Malaria is a mosquito-transmitted tropical disease which kills more than 500,000 people each year, 90% of the victims being African children under the age of 5. By following cohorts of individuals in West Africa or Southeast Asia, we focus on many epidemiological aspects of the disease.
In this context, we are interested in:
Main partners involved:
Grants: ANR Tolimunpal (2012-2015, €20K), SCAC/IRD PhD grant (2013-2016, €50K), AMIES (2014-2015, €10K).
Ciguatera Fish Poisoning (CFP) is a foodborne illness caused by the presence of a neurotoxin (Ciguatera Toxin - CTX) in certain reef fishes. CTX is produced by microalgae such as Gambierdiscus toxicus and is present in many tropical sea environments (French Polynesia, the Caribbean).
In this context, we are interested in:
Main partners involved:
Grant: French Polynesia Territory (2016-2019, €50K).
This exciting new project with musicologists and bioacousticians aims to develop hidden Markov models for the analysis of audio signals. Applications range from ethnomusicology (Madagascar, India) to marine mammal communication (whales, dolphins).
In this context, we are interested in:
Main partners involved:
In 1977, Dempster, Laird and Rubin introduced the Expectation-Maximization algorithm, providing a general approach to likelihood maximization for any model with latent variables. The purpose of this course is to understand this incredibly useful algorithm and to be able to apply and implement it. We start by considering the Gaussian mixture model and introduce the EM algorithm as a natural extension of its stochastic form. We then consider various application examples where the EM algorithm applies: various mixtures, censoring, generalized linear models, linear mixed models. The second part of the course is dedicated to an individual project: each student chooses one of the 40,000 papers citing Dempster et al. (1977), then studies and implements the EM part of that paper.
Prerequisite: L3 degree in probability/statistics, basic programming skills.
Locations: Applied Maths M1, Univ. Paris Descartes, 2009-2012; Applied Maths M2, Univ. Paris Descartes, since 2012.
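As an illustration of the starting point of this course, here is a minimal sketch of EM for a univariate Gaussian mixture. It is not taken from the course material: the function names (`normal_pdf`, `em_gaussian_mixture`), the initial values and the small variance floor are all assumptions made for the example. The E-step computes the posterior probability of each latent class for each observation, and the M-step re-estimates weights, means and standard deviations from these probabilities.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(data, mus, sigmas, weights, n_iter=100):
    """Run n_iter EM iterations for a univariate Gaussian mixture.

    Returns (mus, sigmas, weights, loglik_trace), where loglik_trace[i]
    is the log-likelihood at the parameters used by iteration i.
    """
    k = len(mus)
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each observation.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        loglik_trace.append(loglik)
        # M-step: weighted maximum-likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Variance floor (an assumption of this sketch) to avoid degenerate components.
            sigmas[j] = math.sqrt(max(var, 1e-12))
    return mus, sigmas, weights, loglik_trace


# Usage on simulated data: two components centred at -2 and 3.
rng = random.Random(0)
sample = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
est_mus, est_sigmas, est_weights, trace = em_gaussian_mixture(
    sample, mus=[-1.0, 1.0], sigmas=[1.0, 1.0], weights=[0.5, 0.5], n_iter=50
)
print(est_mus, est_sigmas, est_weights)
```

The log-likelihood trace should never decrease from one iteration to the next, which makes it a convenient check when implementing EM for a new model.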
Bayesian networks are very general directed probabilistic graphical models which are widely used in both theoretical and applied contexts. Important particular cases include Markov and hidden Markov models. In this course, we focus on the algorithmic computation of joint and conditional distributions in Bayesian networks using the belief propagation algorithm. We first introduce the notions of Bayesian network, potential and evidence. Then we intuitively introduce junction trees by considering variable elimination. Finally, we formally define the concept of message propagation and apply it to various models. The whole course is illustrated with many practical examples (from constrained Markov chains to genetic pedigrees), and we also provide an educational R library for implementing and experimenting with Bayesian networks and message passing algorithms.
Prerequisite: L3 degree in probability/statistics, basic programming skills.
Locations: Biostatistics M2, Univ. Paris Descartes since 2010; Applied Maths M2 CIPMA (Cotonou, Benin), 2013; Probability M2, UPMC, since 2014.
A large proportion of the data of interest in bioinformatics are sequences (DNA, short reads, proteins, etc.). The purpose of the course is to provide basic knowledge on the statistical modeling of sequences and to introduce the notion of empirical significance of a bioinformatic observation. We start by recalling the notions of statistical control of an experiment, null hypothesis, and p-value. We then present independent and homogeneous Markov models for sequences, parameter training, and model selection. Finally, we apply our empirical approach to various typical bioinformatics problems: motifs (PSSM or regex), local score of one sequence, alignment.
Prerequisite: L3 degree in bioinformatics, basic programming skills.
Locations: Bioinformatics M1, Univ. Paris Diderot, since 2010; International Bioinformatics M2, UPMC, since 2013.
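The empirical approach described in this course can be sketched as follows: fit a homogeneous first-order Markov model to a sequence, simulate sequences under that null model, and report the fraction of simulations in which a motif occurs at least as often as observed. This is only an illustrative sketch, not the course material; the function names, the additive smoothing (`pseudo`), the DNA alphabet and the number of simulations are assumptions chosen for the example.

```python
import random
from collections import Counter


def fit_markov(seq, alphabet="ACGT", pseudo=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = Counter(zip(seq, seq[1:]))
    trans = {}
    for a in alphabet:
        row_total = sum(counts[(a, b)] + pseudo for b in alphabet)
        trans[a] = [(counts[(a, b)] + pseudo) / row_total for b in alphabet]
    return trans


def simulate(trans, length, first, rng, alphabet="ACGT"):
    """Simulate a sequence of the given length, starting from letter `first`."""
    seq = [first]
    for _ in range(length - 1):
        seq.append(rng.choices(alphabet, weights=trans[seq[-1]])[0])
    return "".join(seq)


def count_overlapping(seq, motif):
    """Number of (possibly overlapping) occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def empirical_pvalue(seq, motif, n_sim=1000, seed=0):
    """Monte Carlo p-value for over-representation of motif in seq."""
    rng = random.Random(seed)
    trans = fit_markov(seq)
    observed = count_overlapping(seq, motif)
    hits = sum(
        count_overlapping(simulate(trans, len(seq), seq[0], rng), motif) >= observed
        for _ in range(n_sim)
    )
    # Adding 1 to numerator and denominator keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

With `n_sim` simulations the smallest attainable p-value is `1 / (n_sim + 1)`, so very rare events need either many simulations or exact computations.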
Hidden Markov models (HMMs) are the simplest possible extension of mixture models: they relax the independence assumption on the latent classes by introducing a Markov dependency for the hidden process. HMMs are used in many applications: detection of homogeneous regions in bioinformatics, analysis of time series, linguistics, automatic audio transcription, etc. In this course, we use the Gaussian mixture as a base model, which is then extended to its corresponding HMM. We present the Forward/Backward algorithm (a particular case of belief propagation in Bayesian networks) and show how to use it to obtain various posterior distributions of interest. Finally, we briefly present EM-based training of HMMs. The whole course is illustrated with simulations and R examples.
Prerequisite: L3 degree in science, basic programming skills.
Locations: Spring Meeting in Probability (Tunis, Tunisia), 2015; Bioinformatics M2, Univ. Paris Diderot, since 2015.
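The Forward/Backward recursions mentioned in this course can be sketched for an HMM with Gaussian emissions as below. This is a minimal illustration and not the course's R code: the function names and the particular scaling scheme (normalizing the forward quantities at each step) are assumptions of the example. The function returns the posterior distribution of each hidden state given the whole sequence, together with the log-likelihood.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def forward_backward(obs, init, trans, mus, sigmas):
    """Scaled Forward/Backward for a Gaussian-emission HMM.

    obs: observations; init[j]: initial state probabilities;
    trans[i][j]: transition probabilities; mus, sigmas: emission parameters.
    Returns (post, loglik), where post[t][j] = P(S_t = j | all observations).
    """
    n, k = len(obs), len(init)
    emis = [[normal_pdf(x, mus[j], sigmas[j]) for j in range(k)] for x in obs]

    # Forward pass: alpha[t][j] = P(S_t = j | x_1..x_t),
    # scale[t] = P(x_t | x_1..x_{t-1}).
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            pred = list(init)
        else:
            pred = [sum(alpha[t - 1][i] * trans[i][j] for i in range(k)) for j in range(k)]
        unnorm = [pred[j] * emis[t][j] for j in range(k)]
        c = sum(unnorm)
        scale.append(c)
        alpha.append([u / c for u in unnorm])

    # Backward pass, rescaled with the same constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(
                trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k)
            ) / scale[t + 1]

    post = [[alpha[t][j] * beta[t][j] for j in range(k)] for t in range(n)]
    loglik = sum(math.log(c) for c in scale)
    return post, loglik
```

When every row of `trans` equals `init`, the hidden states are independent and the posteriors reduce to the usual mixture-model responsibilities, which is one way to see the HMM as an extension of the mixture model.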
Lucas Ducrot (PhD) Clinical Models for the Genetics of the Li-Fraumeni Syndrome (started in 2021)
In collaboration with P. Benusiglio (APHP, Sorbonne University, Paris, France)
Funding (3 years): ISCD, Sorbonne University
Bastien Chassagnol (PhD) Changepoint models for detecting Gene x Environment interactions in Cancer (started in 2020)
In collaboration with P.-H. Wuillemin (LIP6, Sorbonne University, Paris, France) and M. Guedj (Servier, France)
Funding (3 years): Servier
Modibo Diabate (postdoc) Changepoint models for detecting Gene x Environment interactions in Cancer (started in 2020)
In collaboration with O. Bouaziz (MAP5, University of Paris, France)
Funding (1 year): LNCC
Anis Mansouri (M2) Automatic construction and phasing of the haplotypes of the immunoglobulin G CH2-CH3/CHS heavy domains (2020)
In collaboration with C. Dechavanne (University of Paris, IRD, Paris, France)
Funding (6 months): IRD
François Gardavaud (PhD) Optimizing tomodensitometry sequences for chest imaging (since 2019)
In collaboration with F. Cornelis (Tenon Hospital, Sorbonne University, Paris, France)
Funding (5 years): APHP
Flaminia Zane (PhD) Understanding Gene Networks Deregulation in Pre-Death phase (started in 2018)
In collaboration with M. Rera (IBPS, Paris, France)
Funding (3 years): IBPS
Arthur Carcano (M2) Changepoint Models for Survival After Cancer Diagnosis (2018)
In collaboration with O. Bouaziz (MAP5, University Paris-Descartes, France)
Funding (3 months): ENS
Flaminia Zane (M2) Inference of Gene Regulation Networks and Application to the Ciguatera Fish Poisoning (2018)
Funding (6 months): Caristo-Pf
Cédric Cossou (M2) Automatic Music Transcription using a Key-Mode Hidden Markov Model (2017)
In collaboration with O. Adam (Institut d'Alembert, Paris, France) and D. Cazau (ENSTA-Bretagne)
Funding (6 months): Institut d'Alembert
Alexandra Lefebvre (M2) The Claus-Easton Model for Predicting Breast Cancer Risk from Family History (started in 2016)
In collaboration with A. de Pauw (Institut Curie, Paris, France)
Funding (6 months): Institut Curie
Vivien Goepp (PhD) Penalized Estimation of Piecewise Constant Hazard models in Survival Analysis and Applications (started in 2016)
In collaboration with O. Bouaziz (MAP5, University Paris-Descartes, France)
Funding (3 years): Université Paris-Descartes
Maltzahn Niklas (M1) Estimating the Frailty in Survival Analysis (2016)
In collaboration with O. Bouaziz (MAP5, University Paris Descartes, Paris, France)
Funding (3 months): Copenhagen University
Aldéric Fraslin (M1) Survival Analysis of Breast Cancer Incidence in the MGEN Cohort (2016)
In collaboration with O. Bouaziz (MAP5, University Paris Descartes, Paris, France)
Funding (3 months): MAP5
Malith Jayaweera (M2) Implementing a New Version of the PostCP R Package (2016)
In collaboration with G. Rigaill (UEVE-INRA, Evry, France)
Funding (4 months): Google Summer of Code
Xiaoqiang Wang (Postdoc) Applications of Hidden Markov Model in Structural Biology and Musical Transcription (2016)
Funding (3 months): Shandong University (Weihai, China)
Pascal Fieth (PhD) Inference of Causal Gaussian Bayesian Networks using Parallel Tempering (2016)
In collaboration with A. Hartmann (University of Oldenburg, Germany)
Funding (3 months): University of Oldenburg
Gilles Monneret (PhD) Development of Causal Models and Applications to the Inference of Genes Regulation Networks (started in 2014)
In collaboration with A. Rau and F. Jaffrezic (INRA, Jouy-en-Josas, France)
Funding (3 years): Université Pierre et Marie Curie
Ikram Allam (PhD) Développement d'un alphabet structural pour l'analyse des structures 3D des protéines (started in 2013)
In collaboration with A.-C. Camproux (Université Paris Diderot, Paris, France)
Funding (3 years): Sorbonne Paris-Cité (SA-Flex project)
Eric Adjakossa (PhD) Analyse longitudinale multivariée et application à des données immunologiques sur le paludisme (2013-2017)
In collaboration with M. N. Hounkonnou (CIPMA, Cotonou, Benin)
Funding (3 years): SCAC and Institut pour la Recherche et le Développement (IRD)
Tinhinan Belaribi (PhD) Prédiction du risque Cancer en fonction des antécédents familiaux (2013-2016)
In collaboration with D. Stoppa-Lyonnet (Institut Curie, Paris, France)
Funding (3 years): Ligue Nationale contre le Cancer (LNCC)
Eric Adjakossa (M2) Analyse longitudinale multivariée et application à des données immunologiques sur le paludisme (2013)
Funding (3 months): Internal
Tinhinan Belaribi (M2) Prédiction du risque Cancer en fonction des antécédents familiaux (2013)
Funding (6 months): Internal
Rémi Bancal (M2) Plan d'expérience multifactoriel et adaptatif pour l'inférence de réseaux de gènes par Knock-out (2012)
In collaboration with A. Rau and F. Jaffrezic (INRA, Jouy-en-Josas, France)
Funding (6 months): Internal
Yousri Slaoui (M2) Caractérisation Génétique d'une Population Africaine, Imputation et Malaria (2012)
Funding (6 months): Internal
Aurélien Amchin (M2) Alternative Null Hypothesis for the Significance of Pairwise Alignment (2011)
Funding (5 months): Internal
Vittorio Perduca (Postdoc) Study of Arithmetic of Propagation in Bayesian Networks (2011-2012)
Funding (1 year): Fondation Sciences Mathématiques de Paris (FSMP)
The Minh Luong (Postdoc) Characterization of Tumoral DNA using SNPs Microarrays (2011-2013)
Funding (2 years): Université Paris Descartes
Djénéba Thiam (PhD) Longitudinal Data Analysis and Application to Immunology Data in a Malaria Study (2010-2014)
In collaboration with A. Garcia (IRD, Paris, France)
Funding (3 years): Université Paris Descartes
Vittorio Perduca (M2) Paternity Test and Bayesian Networks (2010)
Funding (3 months): Internal
Djénéba Thiam (M2) Censoring in Longitudinal Data Analysis and Application to Immunology Data in a Malaria Study (2010)
Funding (5 months): Internal
Imen Hammami (PhD) Statistical properties of Parasite Density Estimators in Malaria and Field Applications (2009-2013)
In collaboration with A. Garcia (IRD, Paris, France)
Funding (4 years): Université Paris Descartes
Stefan Wolfsheimer (Postdoc) Posterior Distribution of Score-Based Alignments (2009-2010)
Funding (2 years): Université Paris Descartes
Ahmed Fourati (M2) Exact propagation in Bayesian networks and application to the detection of genotyping error in pedigree analysis (2009)
Funding (2 months): Internal
Qiu Jiqiong (M1) Loi jointes et conditionnelles de nombres d'occurrences de motifs dans des chaînes de Markov (2009)
Funding (4 months): Internal
Marine Jeanmougin (M2) Evaluation et comparaison des tests statistiques d’expression différentiel entre deux conditions ; application au cancer du sein et à l’analyse simultanée génome/transcriptome (2009)
In collaboration with M. Guedj (LNCC, Paris, France)
Funding (5 months): LNCC
Phuong Le (M1) Alignement et chaînes de Markov cachées: estimation de fonctions de score (2008)
Funding (4 months): Internal
Houcine Ben Boussada (M2) libhmm: une librairie C++ pour l'estimation de chaînes de Markov cachées (2008)
Funding (2 months): Internal
Aline Gauliard (M1) Recherche de motifs structuraux fonctionnels dans les familles de SCOP (2008)
Funding (3 months): Internal
Yousri Slaoui (Postdoc) Longitudinal Analysis of Malaria Parasite Density Data (2007-2008)
Funding (1 year): Université Evry Val d'Essonne
Loïc Yengo (M1) Estimation of the scoring function in pairwise alignment (2007)
Funding (3 months): Internal
Allal Houssani (M2) Chaînes de Markov cachées pour l'estimation de fonction de score (2006)
Funding (5 months): Internal
Alexandre Jacob (L3) Statistiques de Motifs dans des Séquences Biologiques Segmentées (2005)
Adrien Gaillard (M2) Minimisation multidimensionnelle sous contraintes et application aux grandes déviations de niveau 2 pour les occurrences de mots dans des chaînes de Markov (2005)
Funding (3 months): Internal
Nathanaelle Brasseur (M2) Calcul de la fonction de répartition d'une forme quadratique en variables normales par inversion numérique de la fonction caractéristique (2005)
Funding (3 months): Internal
David Gomes (L3) Etude du spectre des matrices markoviennes estimées sur des séquences d'ADN (2004)
Maxime Huvet (M2) Détection ab initio de motifs biologiques dans les génomes (2004)
Funding (6 months): Internal
Mickaël Guedj (M2) Déséquilibre de liaison et association à la maladie dans les études de SNPs cas-témoins à grande échelle (2004)
Funding (6 months): Internal
Sabrina Serin (M1) Etude des origines biologiques de la présence de courts inverses-complémentaires dans les séquences d'ADN (2003)
Funding (6 months): Internal
Maxime Huvet (L3) Influence de l'évolution dans l'étude des motifs de fréquences exceptionnelles dans les séquences d'ADN (2002)
Adrien Richard (M1) WWbar: un outil pour étudier les courts inverse-complémentaires dans les séquences d'ADN (2002)
Funding (6 months): Internal
Genetic, structural, and functional characterization of POLE polymerase proofreading variants allows cancer risk prediction (2020)
N. Hamzaoui, F. Alarcon, N. Leulliot, R. Guimbaud, B. Buecher, C. Colas, C. Corsini, G. Nuel, B. Terris, P. Laurent-Puig and others
Genetics in Medicine, 22 (9), 1533-1541.
Long-term consequences of one anastomosis gastric bypass on esogastric mucosa in a preclinical rat model (2020)
M. Siebert, L. Ribeiro-Parenti, N. D. Nguyen, M. Hourseau, B. Duchêne, L. Humbert, N. Jonckheere, G. Nuel, J.-M. Chevallier, H. Duboc, D. Rainteau, S. Msika, N. Kapel, A. Couvelard, A. Bado, M. Le Gall
Scientific Reports 10, 7393.
Associations between an IgG3 polymorphism in the binding domain for FcRn, transplacental transfer of malaria-specific IgG3, and protection against Plasmodium falciparum malaria during infancy: A birth cohort study in Benin (2017)
C. Dechavanne, S. Dechavanne, I. Sadissou, A. G. Lokossou, F. Alvarado, M. Dambrun, K. Moutairou, D. Courtin, G. Nuel, A. Garcia, F. Migot-Nabias, C. L. King
PLOS Medicine, 10.1371/journal.pmed.1002403.
A surrogate marker of piperaquine-resistant Plasmodium falciparum malaria: a phenotype–genotype association study (2016)
B. Witkowski, V. Duru, N. Khim, L. S. Ross, B. Saintpierre, J. Beghain, S. Chy, S. Kim, S. Ke, N. Kloeung, R. Eam, C. Khean, M. Ken, K. Loch, A. Bouillon, A. Domergue, L. Ma, C. Bouchier, R. Leang, R. Huy, G. Nuel, J.-C. Barale, E. Legrand, P. Ringwald, D. A. Fidock, O. Mercereau-Puijalon, F. Ariey, D. Ménard
The Lancet Infectious Diseases. DOI: 10.1016/S1473-3099(16)30415-7
Acquisition of natural humoral immunity to P. falciparum in early life in Benin: impact of clinical, environmental and host factors (2016)
C. Dechavanne, I. Sadissou, A. Bouraima, C. Ahouangninou, R. Amoussa, J. Milet, K. Moutairou, A. Massougbodji, M. Theisen, E. J. Remarque, D. Courtin, G. Nuel, F. Migot-Nabias, A. Garcia
Scientific Reports, 6:33961. DOI: 10.1038/srep33961
DECURION: a new model for predicting breast and ovarian cancer risks based on family history using French population incidences and Institut Curie’s database (2016)
T. Belaribi, D. Le Gal, O. Bouaziz, F. Alarcon, S. Eon-Marchais, N. Andrieu, M.-G. Dondon, A. de Pauw, D. Stoppa-Lyonnet, and G. Nuel
Poster, ASCO Annual Meeting, Chicago, USA.
A new statistical method for curve group analysis of longitudinal gene expression data illustrated for breast cancer in the NOWAC postgenome cohort as a proof of principle (2016)
E. Lund, L. Holden, H. Bøvelstad, S. Plancade, N. Mode, C.-C. Günther, G. Nuel, J.-C. Thalabard, M. Holden
BMC Medical Research Methodology 16:28. DOI: 10.1186/s12874-016-0129-z
An R package to simulate constrained phenotypes under a disease model H1. waffect (pronounced 'double-u affect' for 'weighted affectation') is a package to simulate phenotypic (case or control) datasets under a disease model H1 such that the total number of cases is constant across all the simulations. The package also makes it possible to generate phenotypes with more than two classes, so that the number of phenotypes belonging to each class is constant across all the simulations. waffect is used to assess empirically the statistical power of genome-wide association studies.
An R package to estimate posterior probabilities in change-point models using constrained HMMs. The functions are used for change-point problems, after an initial set of change-points within the data has already been obtained. The function postCP obtains estimates of the posterior probabilities of change-points and hidden states for each observation, and confidence intervals for the positions of the change-points. The function postCPsample obtains random samples of sets of change-points using the output of the postCP function.
SPatt (Statistic for Patterns) is a suite of C++ programs designed to compute p-values for pattern occurrences in text. Assuming the text is generated according to a Markov model, the p-value of a given observation is the probability of an outcome at least as extreme under that model. The lower the p-value, the more unlikely the observation. For example, these tools can be used to find patterns with unusual behaviour in DNA or protein sequences.