Scientific publications

[hal-05230510] Seed Inference in Interacting Microbial Communities Using Combinatorial Optimization

The behaviour of microorganisms and microbial communities can be abstracted by models combining a description of their metabolic capabilities as metabolic networks, and suitable computational or mathematical paradigms that further integrate simulation conditions. A major component of the latter is the composition of the environment or growth medium that can be referred to as seeds. Predicting the seeds from the metabolic network and an expected behaviour is an inverse problem that can be addressed with linear programming or logic paradigms such as Answer Set Programming (ASP). Here, we formalise seed prediction for microbial communities, taking into account that their members may interact positively through metabolite transfers, which may reduce the need for external seed metabolites. We address the problem with ASP and add a hybrid component ensuring the satisfiability of linear constraints. We explore the subset-minimality solving heuristic of the Clingo solver and develop two heuristics supporting priority of seeds over transfers. We present a proof of concept of seed inference in small-scale communities, and assess the scalability of the three heuristics at genome-scale. Overall, our work introduces a hybrid logic-linear model for seed inference in interacting microbial communities, and new heuristics for the exploration of the solution space with subset minimality optimisations.

ano.nymous@ccsd.cnrs.fr.invalid (Chabname Ghassemi Nedjad) 29 Aug 2025

https://inria.hal.science/hal-05230510v1
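
The inverse problem described in the entry above (hal-05230510) can be illustrated on a toy network. The following is a minimal sketch, not the authors' ASP encoding: it uses plain Python with brute-force enumeration instead of Clingo, ignores the linear constraints and the community transfers, and the names `scope` and `minimal_seed_sets` as well as the example reactions are hypothetical.

```python
from itertools import combinations

def scope(reactions, seeds):
    """Network expansion: all metabolites reachable from the seeds.

    `reactions` is a list of (substrates, products) tuples (assumed format).
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction fires once all of its substrates are available.
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available

def minimal_seed_sets(reactions, candidates, targets):
    """Enumerate subset-minimal seed sets by brute force (toy scale only)."""
    minimal = []
    # Increasing size guarantees that any smaller solution is found first.
    for size in range(len(candidates) + 1):
        for combo in combinations(sorted(candidates), size):
            chosen = set(combo)
            if any(m <= chosen for m in minimal):
                continue  # contains an already-found solution: not minimal
            if set(targets) <= scope(reactions, chosen):
                minimal.append(chosen)
    return minimal

# Hypothetical toy network: A + B -> C, D -> C, C -> T
reactions = [(("A", "B"), ("C",)), (("D",), ("C",)), (("C",), ("T",))]
solutions = minimal_seed_sets(reactions, {"A", "B", "D"}, {"T"})
```

On this toy network the target T can be produced either from D alone or from A and B together, so two subset-minimal seed sets exist. Exhaustive enumeration grows exponentially with the number of candidates, which is one reason a dedicated solver is preferable at genome scale.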

[hal-05348017] Measuring shade use of dairy cattle at pasture with an on-cow light sensor: a case study

Grazing cows preferentially access shade to shield against the sun. However, the conditions that provide cows with optimal shade access and use (e.g. no competition for access to shade) are still unknown. Continuous monitoring of shade use by grazing cattle could help to understand how and when cows use shade resources. The aim of this study was to validate a method based on a light sensor (HOBO Pendant MX2202) attached to the back (on the transverse processes of the lumbar vertebrae) of 7 dairy cows at pasture to continuously record their use of natural shade for research purposes. Live behavioral observations of shade use and cow posture were recorded in summer (June to September, between 9 am and 6 pm). Based on the behavioral observation data, we determined thresholds in lux to discriminate between cows in shade and cows in sun on a randomly generated training dataset representing 15 % of the initial dataset. This process was repeated 100 times, generating 100 thresholds and threshold performances. Data loss due to sensor loss or battery discharge was 9 %, which is acceptable. The thresholds ranged from 15,688 to 40,556 lx; sensitivity ranged from 92.0 % to 99.8 % and specificity ranged from 88.7 % to 99.9 %, showing that the performances were robust to threshold variation within this range. This study demonstrates that an efficient threshold to discriminate cows in shade from cows in the sun can be determined via a relatively short (about 12 h) series of live observations. As performances seem to be slightly lower for lying cows than for standing cows (mean false-positive rate is 7.4 % for lying cows versus 1.8 % for standing cows), future studies should consider posture (which can also be monitored continuously with other sensors such as accelerometers installed on the legs or on the neck collar of the cows).

ano.nymous@ccsd.cnrs.fr.invalid (Lydiane Aubé) 05 Nov 2025

https://hal.inrae.fr/hal-05348017v1
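
The repeated thresholding procedure summarised in the entry above (hal-05348017) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "shade" is taken as the positive class, the threshold is chosen by maximising sensitivity + specificity (the abstract does not state the criterion), and performance is evaluated on the readings left out of the training subset. All function names are hypothetical.

```python
import random

def sens_spec(lux, in_shade, threshold):
    """Classify a reading as 'shade' when lux < threshold.

    Returns (sensitivity, specificity); 0.0 is returned when a class is absent.
    """
    pairs = list(zip(lux, in_shade))
    tp = sum(1 for x, s in pairs if s and x < threshold)
    fn = sum(1 for x, s in pairs if s and x >= threshold)
    tn = sum(1 for x, s in pairs if not s and x >= threshold)
    fp = sum(1 for x, s in pairs if not s and x < threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

def best_threshold(lux, in_shade):
    """Pick the observed lux value maximising sensitivity + specificity (assumed criterion)."""
    return max(sorted(set(lux)), key=lambda t: sum(sens_spec(lux, in_shade, t)))

def repeated_thresholds(lux, in_shade, n_repeats=100, train_fraction=0.15, seed=0):
    """Repeat: draw a random training subset, fit a threshold, score it on the rest."""
    rng = random.Random(seed)
    indices = list(range(len(lux)))
    n_train = max(1, round(train_fraction * len(indices)))
    results = []
    for _ in range(n_repeats):
        train = rng.sample(indices, n_train)
        train_set = set(train)
        test = [i for i in indices if i not in train_set]
        t = best_threshold([lux[i] for i in train], [in_shade[i] for i in train])
        se, sp = sens_spec([lux[i] for i in test], [in_shade[i] for i in test], t)
        results.append((t, se, sp))
    return results
```

Calling `repeated_thresholds(lux, in_shade)` with paired sensor readings and observed labels yields 100 `(threshold, sensitivity, specificity)` tuples, whose ranges can then be compared in the way the abstract reports.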

[hal-05410799] Data Paper: HotPig, a behavioural dataset of pigs under heat stress

The widespread use of videos in modern indoor livestock facilities coupled with the availability of efficient and low-cost computer vision algorithms provides strong incentives for continuously monitoring farm animal behaviour. Deciphering how pigs behave when experiencing prolonged heat stress is particularly important for animal welfare, as it helps us to better understand how animals use various thermoregulation and heat dissipation mechanisms. Data were collected on 24 pigs that were video-monitored day and night under two contrasting conditions: thermoneutral (TN, 22 °C) and heat stress (HS, 32 °C). All pigs were housed individually and had free access to water and to an automatic feeder delivering pellets four times a day. After acquisition, videos were processed using YOLOv11, a real-time object detection algorithm that uses a convolutional neural network (CNN), to extract the following behavioural traits: drinking, willingness to eat, lying down, standing up, moving around, curiosity towards the littermate housed in the neighbouring pen, and contact between the two animals (cuddling). Traits were sampled once per minute (each minute corresponds to 150 frames processed) for a continuous period of 16 days, spanning the two thermal conditions (9 days on TN, 6 days on HS, 1 day back to TN). Consistency with the automatic electronic feeder’s data (also provided) was thoroughly checked. The dataset allows quantitative criteria to be analysed to decipher inter-individual differences in animal behaviour and their dynamic adaptation to heat stress. This dataset can be used to train machine learning methods for behaviour prediction from videos in conventional growing pigs.

ano.nymous@ccsd.cnrs.fr.invalid (Louis Bonneau de Beaufort) 11 Dec 2025

https://hal.inrae.fr/hal-05410799v1

[hal-05444004] Digital technologies in livestock farming: from measurement to behavioural assessment of the welfare of each animal

Animal welfare is a difficult notion to define because it refers to a complex phenomenon, intrinsically linked to the individual's perception of its environment. Since it cannot be measured directly, welfare is assessed by identifying and quantifying specific indicators. These indicators, whose variations are associated with different welfare states, must be combined according to the assessment context. Animal behaviour, recognised as one of the keys to welfare assessment, can change in response to variations in the farming environment, such as access to pasture, which influence both the animals' routine and the dynamics of their use of space. Analysing these behavioural changes makes it possible to define new indicators, facilitating the assessment of the positive or negative impact of these environmental changes on animal welfare. The integration of sensor technologies, mathematical models and artificial intelligence opens new perspectives for longitudinal monitoring of activities, spatial dynamics and other parameters of interest throughout the animals' life cycle. For example, supervised classification algorithms have made it possible to associate raw sensor data with behaviours of interest, while unsupervised algorithms are expected to reveal new indicators related to animal welfare. This article highlights the opportunities offered by emerging digital technologies. We focus on behavioural assessment and its crucial role in welfare assessment, presenting three case studies: 1) distinguishing problems related to health, heat stress and reproduction in dairy cows, 2) predicting lameness in dairy cows and 3) studying emotions in pigs.
Finally, we stress the importance of close interdisciplinary collaboration between ethologists, physiologists, mathematicians and computer scientists to foster the development of this emerging field, which we refer to as "digital ethology".

ano.nymous@ccsd.cnrs.fr.invalid (Masoomeh Taghipoor) 06 Jan 2026

https://hal.inrae.fr/hal-05444004v1

[hal-05419350] MetaNetMap: automatic mapping of metabolomic data onto metabolic networks

MetaNetMap is a Python tool dedicated to mapping metabolite information between metabolomic data and metabolic networks. The goal is to identify which metabolites from metabolomic data are present in one or more metabolic networks, in order to facilitate further modelling, given that the two sources are likely to use distinct identifiers.

ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 17 Dec 2025

https://inria.hal.science/hal-05419350v1

[hal-05435147] On Logic-based Self-Explainable Graph Neural Networks

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Alessio Ragno) 30 Dec 2025

https://hal.science/hal-05435147v1

[hal-05264391] Method: An accurate method for detecting drinking bouts in dairy cows based on reticulorumen temperature

This study evaluated the performances of three methods for detecting drinking bouts in dairy cows using reticulorumen temperature (RT): the 'FixT' method based on a fixed RT threshold, the 'Cow-dT' method based on a cow-day-specific RT threshold, and the 'FallST' method based on RT fall slope. We observed the drinking behaviours of 28 dairy cows equipped with reticulorumenal sensors over 96 h to create a reference dataset. A total of 730 drinking bouts were observed. We matched detected drinking bouts against observed drinking bouts to obtain the number of true-positives, false-negatives, and false-positives, and then calculated the detection performances of the three methods in terms of sensitivity (Se), positive predictive value (PPV), and F-score. The performances of the three RT-based methods (Se ≥ 90%, PPV > 96% and F-score ≥ 93%) were better than those from previous work using collar-attached accelerometers, but slightly lower than methods using drinking troughs connected to electronic identification systems or methods combining accelerometers with geomagnetic sensors or with ultra-wideband location. The FallST method showed slightly better performance (highest F-score) than the FixT and Cow-dT methods. The FallST method accurately detected drinking bouts lasting more than 30 s and at least 30 min apart, with a detection time accuracy of 10 min. The models using RT curve parameters failed to predict characteristics of the drinking bouts. In conclusion, the method developed here can accurately detect drinking bouts in dairy cows using RT, but without further characterisation of the drinking bouts (e.g. duration).

ano.nymous@ccsd.cnrs.fr.invalid (L. Aubé) 17 Sep 2025

https://hal.inrae.fr/hal-05264391v1
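
The evaluation described in the entry above (hal-05264391) relies on matching detected bouts to observed bouts and then computing Se, PPV and F-score. The sketch below shows one possible way to do this; the greedy nearest-neighbour matching rule, the 10-minute tolerance used as a matching window, and the use of the F1 form of the F-score are assumptions, and the function names are hypothetical.

```python
def match_bouts(observed, detected, tolerance_min=10.0):
    """Greedy one-to-one matching of detected to observed bout times (in minutes).

    Returns (true_positives, false_negatives, false_positives).
    """
    unmatched = sorted(detected)
    tp = 0
    for obs in sorted(observed):
        # Closest detection that has not been matched yet, if any.
        best = min(unmatched, key=lambda d: abs(d - obs), default=None)
        if best is not None and abs(best - obs) <= tolerance_min:
            unmatched.remove(best)
            tp += 1
    fn = len(observed) - tp   # observed bouts with no matching detection
    fp = len(unmatched)       # detections with no matching observed bout
    return tp, fn, fp

def detection_scores(tp, fn, fp):
    """Sensitivity, positive predictive value, and F-score (harmonic mean of the two)."""
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f_score = 2 * se * ppv / (se + ppv) if se + ppv else 0.0
    return se, ppv, f_score

# Hypothetical bout times in minutes since the start of observation.
tp, fn, fp = match_bouts(observed=[10, 100, 200], detected=[12, 105, 400])
se, ppv, f_score = detection_scores(tp, fn, fp)
```

In this toy example two of the three observed bouts are detected within the tolerance and one detection has no counterpart, so Se, PPV and F-score all equal 2/3.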

[hal-04997560] Data paper: A goat behaviour dataset combining labelled behaviours and accelerometer data for training Machine Learning detection models

This paper presents a dataset of accelerometer data and corresponding video-annotated behaviours from eight indoor dairy Alpine goats. Animals were equipped with 3D-accelerometers attached to their ears for 24 consecutive hours, recording at a frequency of 5 Hz. Video recordings for this period were also obtained. Activities associated with positional, feeding and social behaviours were annotated over two daylight periods, for a total of 11 hours per goat, by a trained observer, ensuring high precision and consistency. This dataset can be used independently or to complement an existing dataset for training supervised Machine Learning models for the detection of goat behaviour. It contributes to improving the robustness of such models by incorporating behavioural signals specific to indoor-housed goats.

ano.nymous@ccsd.cnrs.fr.invalid (Sarah Mauny) 19 Mar 2025

https://hal.inrae.fr/hal-04997560v1

[hal-05385353] WAIT4 – a research project combining digital technologies and AI to assess relevant welfare indicators for animals facing the challenges of the agroecological and climate transitions

The WAIT4 project harnesses the opportunities offered by digital technologies to measure different components of animal welfare in real time; it applies AI approaches to integrate the data thus collected, which are heterogeneous both in nature and in time scale. The objective is to define new indicators, and the relevant frequency at which to measure them, in order to identify variations in animal welfare. Different species (pigs, small and large ruminants), in conventional, organic or agropastoral systems and under contrasting climates, are addressed. The ambition is to detect early deviations in welfare and health in response to changes in practices and to climatic hazards. The project implements concerted actions involving French research institutes (INRAE, CEA, INRIA, INSA), and a dialogue with stakeholders, supported by the LIT Ouesterel, to facilitate the uptake and dissemination of the results. The WAIT4 project (2023-2027), coordinated by INRAE, is funded by France 2030 as part of the PEPR Agroécologie et Numérique.

ano.nymous@ccsd.cnrs.fr.invalid (Florence Gondret) 27 Nov 2025

https://hal.inrae.fr/hal-05385353v1

[hal-05380224] NINSAR Project: Defining Agroecological Routes Using Robots

The poster presents the doctoral research of Mohammad Naim, conducted within the French national project NINSAR (New ItiNerarieS for Agroecology using cooperative Robots), and outlines how the thesis contributes to this broader research programme. The NINSAR project, as framed in the poster title and structure, is positioned as a national effort to define agroecological routes using robotics, integrating technological innovation with ecological, social, and economic sustainability goals. Within this context, the thesis investigates how autonomous agricultural systems can be designed, evaluated, and adopted without compromising core agroecological principles. The thesis analyzes the transition from Agriculture 4.0 to Agriculture 5.0 through the thirteen agroecological principles defined by the High Level Panel of Experts, assessing how emerging robotic and data-driven systems can support more sustainable production models. It evaluates three major categories of robotic field operations (data collection, soil and crop management, and navigation/communication) and links them to four principle-level agroecological indicators, finding strong contributions to soil health and synergy and weaker support for recycling. The work also conducts an empirical study of French farmers using the Technology Acceptance Model 2, identifying perceived usefulness as the central predictor of adoption, complemented by ease of use and social influence. A complementary technical study clusters 71 agricultural robots into five functional categories, illustrating the increasing specialization of robotic platforms and cost differences between electric and endothermic systems. The thesis further extends to the economic and industrial dimension of the NINSAR project by engaging manufacturers through semi-structured interviews to construct business model canvases aimed at identifying viable pathways for scaling agroecological robots. 
Taken together, the poster shows that Naim’s thesis forms a core component of NINSAR by integrating agronomic, technological, social, and economic analyses to support the development of robotics aligned with agroecological transition goals.

ano.nymous@ccsd.cnrs.fr.invalid (Mohammad Naim) 24 Nov 2025

https://hal.science/hal-05380224v1

[hal-05368332] Modeling the emergent metabolic potential of soil microbiomes in Atacama landscapes

Background: Soil microbiomes harbor complex communities from which diverse ecological roles unfold, shaped by syntrophic interactions. Unraveling the mechanisms and consequences of such interactions and the underlying biochemical transformations remains challenging due to niche multidimensionality. The Atacama Desert is an extreme environment that includes unique combinations of stressful abiotic factors affecting microbial life. In particular, the Talabre Lejía transect is a natural laboratory for understanding microbiome composition, functioning, and adaptation. Results: We propose a computational framework for the simulation of the metabolic potential of microbiomes, as a proxy of how communities are prepared to respond to the environment. Through the coupling of taxonomic and functional profiling, community-wide and genome-resolved metabolic modeling, and regression analyses, we identify key metabolites and species from six contrasting soil samples across the Talabre Lejía transect. We highlight the functional redundancy of whole metagenomes, which act as a gene reservoir, from which site-specific adaptations emerge at the species level. We also link the physicochemistry from the puna and the lagoon samples to metabolic machineries that are likely crucial for sustaining microbial life in these unique environmental conditions. We further provide an abstraction of community composition and structure for each site that allowed us to describe microbiomes as resilient or sensitive to environmental shifts, through putative cooperation events. Conclusion: Our results show that the study of multi-scale metabolic potential, together with targeted modeling, contributes to elucidating the role of metabolism in the adaptation of microbial communities. Our framework was designed to handle non-model microorganisms, making it suitable for any (meta)genomic dataset that includes high-quality environmental data for enough samples.

ano.nymous@ccsd.cnrs.fr.invalid (Constanza M Andreani-Gerard) 17 Nov 2025

https://inria.hal.science/hal-05368332v1

[hal-05178193] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

Serving as a powerful proxy in remote sensing studies, spectral indices can generate meaningful environmental interpretation from either raw or atmospherically corrected spectral data, and characterise and quantify some important properties of various objects on Earth’s surface. However, while numerous spectral indices have been developed over time, from the launch of the first civilian satellites until now, some critical issues in their usage, such as comparability, remain scarcely studied, which may lead to incorrect, inconsistent, and unreliable results. In this study, we collected 471 spectral indices of various environment components (vegetation, water, and soil) that might be leveraged for soil studies, and traced their popularity in scientific publications over the past decades. The bibliometric analysis revealed a growing interest in and utilisation of spectral indices as Earth-observing satellite technology advanced. Based on both the literature and, for the sake of completeness and illustration, some targeted regional-scale case studies, we discuss the issues of naming confusion, comparability, applicability, accuracy trade-offs, and reproducibility of using spectral indices. Overall, this overview provides an extensive list of spectral indices, both soil indices and soil-related indices, that can be useful for characterising these environment components by remote sensing. It draws attention to some misuses and confusions that must be avoided to prevent scientific pitfalls. The comparisons between different spectral indices, sensors, and correction methods highlight the confusing effects that the misuse and non-standardised practices of the spectral indices useful for soil may have on soil property mapping and monitoring. Insights into the judicious and appropriate usage of spectral indices in the remote sensing of soil are provided.

ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 24 Jul 2025

https://hal.inrae.fr/hal-05178193v1

[hal-05340010] Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification

Deep learning models have emerged as a promising alternative to conventional approaches for plant disease identification, a critical challenge in agricultural production. However, the existing plant disease datasets are insufficient to address the complexities of real-world agricultural scenarios, such as multi-crop disease, unseen, few-shot, and domain-shift adaptation. Additionally, the lack of standardized evaluation protocols and benchmark datasets hinders the fair evaluation of models against these challenges. To bridge this gap, we introduce Deep-Plant-Disease, the largest and most diverse dataset with novel text data designed to enhance model generalization in multi-crop disease identification. We revisit and reformulate the task by establishing a standardized evaluation framework that supports consistent benchmarking and guides future research. Through experiments, we further validate the robustness and adaptability of models trained on our dataset, highlighting their effective transferability to real-world agricultural challenges.

ano.nymous@ccsd.cnrs.fr.invalid (Abel Yu Hao Chai) 31 Oct 2025

https://inria.hal.science/hal-05340010v1

[hal-05343366] Forest Cover in the Congo Basin: Consistency Evaluation of Seven Datasets

Tropical forests play an essential role in the carbon and water cycles of terrestrial ecosystems, but they are increasingly threatened by human activities and climate change. For places where ground observations are scarce, like in Equatorial Africa, remote sensing is a key source of information for monitoring the temporal and spatial dynamics of forests over large areas. Several Earth Observation-based global maps were developed in recent decades using different definitions of the land-use/land-cover (LULC) classes. While such products are widely used for monitoring land use and planning land management, the consistency of these LULC maps for the Congo Basin has never been analyzed and quantified at the ecosystem level. Here, we selected seven of the most-used global maps and analyzed their consistency over the Congo Basin. After reclassification into forest/non-forest masks and spatial resampling, we assessed the agreement and disagreement percentage across the different tropical ecoregions of Africa, from moist forest to miombo, including savanna. The datasets showed differences in forest area as a function of spatial resolution, with higher forest area levels at coarser resolutions (e.g., from 74.1% to 88.5% forest cover when upscaling the GLCLU from 30 m to 1 km over the Congo Basin). A higher agreement between the datasets was found for forest area over moist forest (between 88.18% and 99.38%) in comparison to savanna (32.82%-99.84%) and miombo (53.83%-99.7%). These discrepancies led to large differences in forest cover, varying from a net loss of 205,704 km² to a net gain of 50,726 km² over 2001-2019 depending on the dataset used. This study draws attention to the uncertainty associated with these products with regard to forests, particularly in regions of biological importance, such as the miombo and savanna regions, which remain poorly understood.
Indeed, the two major uncertainties affecting the quality of LULC products are related to the different spatial resolutions and biological definition of "forest" adopted by each product.

ano.nymous@ccsd.cnrs.fr.invalid (Solène Renaudineau) 03 Nov 2025

https://hal.science/hal-05343366v1
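
The agreement percentages reported in the entry above (hal-05343366) compare forest/non-forest masks cell by cell. The following minimal sketch shows that kind of computation on tiny hypothetical grids; it assumes the masks are already reclassified and resampled to a common grid (1 = forest, 0 = non-forest), and the function names are illustrative only.

```python
def forest_cover(mask):
    """Percentage of cells classified as forest in a binary mask (list of rows)."""
    cells = [c for row in mask for c in row]
    return 100.0 * sum(cells) / len(cells)

def pairwise_agreement(mask_a, mask_b):
    """Percentage of cells on which two co-registered masks give the same class."""
    pairs = [(a, b) for row_a, row_b in zip(mask_a, mask_b)
             for a, b in zip(row_a, row_b)]
    return 100.0 * sum(1 for a, b in pairs if a == b) / len(pairs)

def full_agreement(masks):
    """Percentage of cells on which all masks give the same class."""
    stacked = [cells for rows in zip(*masks) for cells in zip(*rows)]
    return 100.0 * sum(1 for cells in stacked if len(set(cells)) == 1) / len(stacked)

# Hypothetical 2 x 2 masks from three products.
product_a = [[1, 1], [0, 0]]
product_b = [[1, 0], [0, 0]]
product_c = [[1, 1], [1, 0]]
```

With these toy masks, `pairwise_agreement(product_a, product_b)` is 75.0 while `full_agreement([product_a, product_b, product_c])` drops to 50.0, which illustrates how agreement shrinks as more products are compared.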

[hal-05322783] Whole genome sequencing dataset for a Vitis vinifera diversity panel

Vitis vinifera is a significant agricultural species across continents and a genomic model for perennial crops. A diversity panel of 279 cultivars from the Vassal-Montpellier Grapevine Biological Resources Centre, which represents the diversity of the three main genetic pools of this species, has served as a foundation for genome-wide association studies using genotyping-by-sequencing approaches. Part of this panel (74 cultivars) has recently been sequenced at the whole-genome level. Here, we release whole-genome sequencing of the remaining 205 cultivars of the panel, using the short-read NovaSeq6000 S4 PE150 technology to achieve complete genomic coverage. To ensure consistency with prior analyses and confirm genetic identities, we performed variant calling and SNP comparison with previously published data. During this stage, we identified two mislabeled samples, which were excluded from the dataset, resulting in a final set of 72 samples from the public data. Additionally, nine representative cultivars spanning major genetic groups underwent long-read sequencing using PacBio Revio technology. All sequences have been deposited at the ENA under project PRJEB95058 for the short-read data and project PRJEB100755 for the long-read data. Variant data have been deposited in the publicly accessible GIGWA SNP database. This expanded genomic dataset establishes a comprehensive foundation for advanced genomic analyses in V. vinifera, including genome-wide association mapping, structural variant characterization, and genetic diversity assessment. The long-read sequences provide high-quality genomic resources for structural variation analysis and pangenome construction. The integration of short- and long-read sequencing technologies enhances the usefulness of this resource for understanding grapevine genomic architecture and supporting genetic improvement initiatives.

ano.nymous@ccsd.cnrs.fr.invalid (Gautier Sarah) 20 Oct 2025

https://hal.science/hal-05322783v1

[hal-05435107] Leveraging internal representations of GNNs with Shapley values

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Ataollah Kamal) 30 Dec 2025

https://hal.science/hal-05435107v1

[hal-05350945] Considering farmers’ needs in agroliving labs: a case study

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Mélanie Broin) 06 Nov 2025

https://hal.science/hal-05350945v1

[hal-05435121] Diffusion for Explainable Unsupervised Anomaly Detection

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Elouan Vincent) 30 Dec 2025

https://hal.science/hal-05435121v1

[hal-05318560] cMFA for multi-omics data integration in microbial community models

Understanding microbial community functions is challenging due to complex interactions and assembly mechanisms; however, advances in sequencing have enabled the collection of multi-omics data, including population counts and metabolomic or metatranscriptomic data. Our main objective is to develop a mathematical model capable of integrating time series of multi-omics data at a community scale. We introduce the community metabolic flux analysis (cMFA) method, which generalizes metabolic flux analysis (MFA), using time series data of experimentally measured production and consumption rates of metabolites and of microorganism growth. We aim to infer, for each member of the microbial community, the intracellular distribution of metabolic fluxes by solving the resulting inference problem. We evaluated the cMFA method on synthetic data from dynamic models of increasingly complex microbial communities, based on metabolic models of different mutants of Escherichia coli using dynamic flux balance analysis. Synthetic metatranscriptomic data were obtained from internal metabolic fluxes in the dynamic model. Different regularization terms were tested, including different levels of sparsity, for the selected penalty weight. To evaluate the robustness of the method, multiple benchmarks were tested. These included assessments of the robustness of the method to data noise, incomplete metatranscriptomic data, inaccurate prior knowledge of metabolic import rates, and larger microbial communities. We are currently working with real data, including data on denitrification and cheese production.

ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 16 Oct 2025

https://hal.science/hal-05318560v1

[hal-05304541] Generation of metabolomic-informed models of metabolism for microbial communities

The generation of genome-wide metabolic networks has become a routine analysis for individual organisms or communities. However, these automatically generated metabolic networks are incomplete because they are constructed based on the combination of gene annotation and reactions available in generic databases (MetaCyc, BiGG, ModelSEED...). These are oriented towards well-known organisms or model organisms and miss out on important functions of secondary metabolism. We propose to combine metabolomic data analysis, metabolic modelling and annotation mining to build high-quality models of microbial metabolism, with the long-term aim of better understanding microbial communities. In terms of application of the methods to plant microbial communities, we hope that the newly developed models will provide a better understanding of the process of microbial recruitment by the plant: metabolic functions involved, micro-organisms associated with these functions.

ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 08 Oct 2025

https://inria.hal.science/hal-05304541v1

[hal-05110984] Advancing agroecology and sustainability with agricultural robots at field level: A scoping review

Agricultural robots show a growing potential to improve resource management and reduce the environmental impacts of farming. However, the evaluation of robots’ contribution to support sustainable farming is still lacking. This study specifically reviewed the operationalization of four agroecological principles at the field level: recycling, soil health, biodiversity and synergy. To this aim, a scoping review was conducted on the Scopus database, with a query within titles, abstracts, and author keywords mentioning robots, and agroecology or sustainability. The body of literature was screened to include only open field robots. The resulting 78 documents were coded inductively on three macro areas: (1) academic background, (2) robot operations, (3) contribution to agroecology principles, whether explicitly or implicitly mentioned. The results highlight that robots operationalize agroecology principles through non-chemical and selective weeding to preserve diversity and soil health, lighter designs that reduce soil compaction, and advanced data collection systems to optimize resource use and synergy. Solar-powered robots represent early steps toward recycling, but this principle remains understudied. The discussion expands on the potential of robotics in other innovative approaches for sustainable agriculture, such as agroforestry, conservation agriculture, and novel farming system design. Key challenges include ensuring farmers are enabled to master data collection and management, as well as integrating high-tech robotics with low-tech solutions. These efforts are critical for leveraging agricultural robotics to advance agroecology and sustainability across diverse farming systems.

ano.nymous@ccsd.cnrs.fr.invalid (Mohammad Naim) 13 Jun 2025

https://hal.science/hal-05110984v1

[hal-05304536] Generation of metabolomic-informed models of metabolism for microbial communities

The generation of genome-wide metabolic networks has become a routine analysis for individual organisms or communities. However, these automatically generated metabolic networks are incomplete because they are constructed based on the combination of gene annotation and reactions available in generic databases (MetaCyc, BiGG, ModelSEED...). These are oriented towards well-known organisms or model organisms and miss out on important functions of secondary metabolism. We propose to combine metabolomic data analysis, metabolic modelling and annotation mining to build high-quality models of microbial metabolism, with the long-term aim of better understanding microbial communities. In terms of application of the methods to plant microbial communities, we hope that the newly developed models will provide a better understanding of the process of microbial recruitment by the plant: metabolic functions involved, micro-organisms associated with these functions.

ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 08 Oct 2025

https://inria.hal.science/hal-05304536v1

[hal-05283043] Assessing fruit tree vigor in peach and apple orchards through wood segmentation in ground-based RGB images

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Khac-Lan Nguyen) 25 Sep 2025

https://hal.science/hal-05283043v1

[hal-05340126] Accurate MAG reconstruction from complex soil microbiome through combined short- and HiFi long-reads metagenomics

Background: Advances in high-fidelity long-read (HiFi-LR) sequencing technologies offer unprecedented opportunities to uncover the microbial genomic diversity of complex environments, such as soils. While short-read (SR) sequencing has enabled broad insights at gene-level diversity, the inherently limited read length constrains the reconstruction of complete genomes. Conversely, HiFi-LR sequencing enhances the quality and completeness of metagenome-assembled genomes (MAGs), enabling higher-resolution taxonomic and functional annotation. However, the cost and relatively low throughput of HiFi-LR sequencing can limit genome recovery, particularly at the binning stage, where coverage depth is critical. Results: Here, we present a novel hybrid strategy that differs from classical hybrid assemblies, where SR and LR reads are jointly used at the assembly step. Instead, we use high-depth SR data to improve the binning of HiFi-LR contigs. Using both SR and HiFi-LR metagenomic data generated from a tunnel-cultivated soil sample, we demonstrate that SR-derived coverage information significantly improves the binning of HiFi-LR assemblies. This results in a substantial increase in the number and quality of recovered MAGs compared to using HiFi-LR data alone and an incomparable improvement compared to SR data alone. Conclusion: Our findings highlight the power of combining SR and LR in highly diverse environments, such as soil, not for hybrid assembly per se, but to enhance the downstream binning process. The combination of SR and LR data substantially improves the downstream binning process and overall genome recovery. Importantly, this approach underscores the potential of leveraging the vast amount of publicly available Illumina metagenomic datasets. Completing existing SR resources with PacBio HiFi sequencing can maximise assembly contiguity and binning accuracy using massive amounts of SR data already generated. 
This highlights a practical and forward-looking strategy for microbiome research, where novel LR technologies will bring new value to previous short-read efforts.

ano.nymous@ccsd.cnrs.fr.invalid (Carole Belliardo) 31 Oct 2025

https://inria.hal.science/hal-05340126v1

[hal-05301772] Long-term evolution of forest cover in the Pacific coast of Ecuador (1960–2019): a comparison of Land Use/Land Cover (LULC) remote sensing products

Ecosystem services provided by forests are increasingly threatened by anthropogenic and climatic disturbances. International initiatives to reduce greenhouse gas emissions from forest disturbances, such as Reducing Emissions from Deforestation and Degradation+ (REDD+), require robust quantifications of the dynamics and extent of Land Use/Land Cover (LULC). However, no study has yet presented a comparative synthesis of existing LULC products and long-term landscape evolution on the Pacific Slope and Coast of Ecuador (EPSC). In addition, previous studies on the evolution of forest cover in the EPSC were carried out on small regions and short time scales, and never covered the period before the 1990s. In this context, we conducted a long-term study of landscape dynamics at the scale of the EPSC over the last six decades (1960–2019). In addition, we propose a comparative synthesis of the main land use databases from remote sensing. To do this, we compared six LULC databases (HILDA+, ESA-CCI, MODIS, GLCLUC, TMF, GFC) derived from remote sensing, using the Ecuadorian Ministry of Environment and Water (MAATE) LULC dataset as a reference. This comparison was performed with confusion matrices, from which three metrics were calculated: Accuracy, F1-score and MCC. HILDA+ and TMF products showed the best agreement with the MAATE map (F1-score of 0.63 and 0.65, respectively). HILDA+ captured net forest cover losses better than TMF (65% vs 27% of the net losses recorded by MAATE). Of the six databases analysed, HILDA+ was identified as the product with the best correlation with the Ministry’s LULC maps. Therefore, HILDA+ was chosen to analyse deforestation since 1960 in the EPSC. The major limitation encountered using HILDA+ is its coarse spatial resolution of 1 km. Nevertheless, four deforestation phases were identified in the EPSC over 1960–2019. They reflect the historical, social, political, and climatic context of each ecosystem.
Over the entire period (1960-2019), forest cover decreased by 43.9%. Since the 1960s, tropical rainforest areas declined by a third. Dry and transitional tropical forests lost more than half their area.

ano.nymous@ccsd.cnrs.fr.invalid (Valentine Sollier) 14 Oct 2025

https://hal.science/hal-05301772v1
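
The three agreement metrics named in the abstract above (Accuracy, F1-score and MCC) can all be derived from a binary forest / non-forest confusion matrix. The following is a minimal sketch in Python, not the authors' code: the function name and the pixel counts are hypothetical, and the reduction to a two-class matrix is an assumption made for illustration.

```python
import math


def binary_agreement_metrics(tp, fp, fn, tn):
    """Accuracy, F1-score and MCC from a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of pixels, with the reference map taken as truth.
    Degenerate denominators fall back to 0.0 instead of raising.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts: "forest" is the positive class.
metrics = binary_agreement_metrics(tp=50, fp=10, fn=20, tn=120)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the matrix symmetrically, which makes it less sensitive to class imbalance between forest and non-forest pixels.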

[hal-05261543] cMFA for multi-omics data integration in microbial community models

Understanding microbial community functions is challenging due to complex interactions and assembly mechanisms; however, advances in sequencing have enabled the collection of multi-omics data, including population counts and metabolomic or metatranscriptomic data. Our main objective is to develop a mathematical model capable of integrating time series of multi-omics data at a community scale. We introduce the community metabolic flux analysis (cMFA) method, which generalizes metabolic flux analyses, using a list of time series data of experimentally measured production and consumption rates of metabolites and microorganism growth. We aim to infer, for each member of the microbial community, the intracellular distribution of metabolic fluxes. This is a high-dimensional constrained linear regression problem, informed by mass conservation constraints and metatranscriptomic data, encoded in the penalty term. The difficulty here is in accurately inferring latent internal rates from a few observations of exchange fluxes. We evaluated the cMFA method on synthetic data from dynamic models of increasingly complex microbial communities, based on metabolic models of different mutants of Escherichia coli using dynamic flux balance analysis (dFBA). Synthetic metatranscriptomic data were obtained from internal metabolic fluxes in the dynamic model. Different regularization terms were tested, including different levels of sparsity, for the selected penalty weight. To evaluate the robustness of the method, multiple benchmarks were tested. These included assessments of the robustness of the method to data noise, incomplete metatranscriptomic data and inaccurate prior knowledge of metabolic import rates, as well as an extension of the study to a larger microbial community. Currently, we are working with real data, including data on denitrification and cheese production.

ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 15 Sep 2025

https://hal.science/hal-05261543v1

[hal-05281103] Evaluating the potential of Sentinel-2 data to assess the coarse fragment cover of the soil surface within a Spanish vineyard

The presence of coarse fragments (CF) on the soil surface is a critical factor influencing the assessment of key soil properties such as hydraulic conductivity and C stocks, as well as erosion processes [1–3]. This study investigates the potential of Sentinel-2 (S2) data to estimate soil surface CF cover for an 82-ha trellis-trained vineyard (Burgos, Spain), with ~3 m inter-row spacing. CF cover (%) was estimated using the point-count method via SamplePoint [4], based on nadir photos taken at ~1 m above ground level, at 60 points repeatedly during three field campaigns. Based on two S2 time series (Jan 2023–Feb 2024 and Jan–Apr 2023 (vine dormancy)), six spectral indices computed within a 30 m buffer were clustered through hierarchical agglomerative clustering (HAC) and principal component analysis (PCA), which led to the selection of the Non-photosynthetic vegetation soil separation index (NSSI). Assessment of NSSI relevance relied on correlating NSSI values, extracted from S2 images closest to field campaign dates, with the average CF cover, with and without applying an NDVI threshold of 0.4. A Random Forest algorithm was then used to predict CF cover, with a 70% calibration / 30% validation split repeated over three random iterations. Two approaches were tested, with and without NDVI threshold: (1) S2 bands only, and (2) S2 bands + NSSI + NDVI. NSSI was moderately correlated with CF cover (R² = 0.47–0.60), and best correlated when the NDVI threshold was applied (R² = 0.48–0.77). Calibration performance was good across all models (R² > 0.6; RMSE < 16.75%; RPD > 1.62; RPIQ > 2.23), even though validation results were variable. NDVI thresholding alone did not improve validation, but adding NSSI + NDVI as predictors enhanced validation accuracy. The best performance was obtained by combining data from all campaigns using S2 bands + NSSI + NDVI without any NDVI threshold (R² = 0.42; RMSE = 17.53%; RPD = 1.55; RPIQ = 2.05).

ano.nymous@ccsd.cnrs.fr.invalid (Hayfa Zayani) 24 Sep 2025

https://hal.science/hal-05281103v1
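
The RMSE, RPD and RPIQ scores reported in the abstract above follow standard definitions: RPD is the standard deviation of the observed values divided by the RMSE, and RPIQ is their interquartile range divided by the RMSE. Below is a minimal Python sketch, not the authors' code. The data are made up, and the quartile convention (Python's default "exclusive" method) is an assumption; other conventions give slightly different RPIQ values.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Ratio of performance to deviation: sample SD of observations / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: (Q3 - Q1) / RMSE."""
    q1, _median, q3 = statistics.quantiles(observed, n=4)
    return (q3 - q1) / rmse(observed, predicted)


# Hypothetical CF cover values (%) and predictions that are off by 5 points.
observed = [10, 20, 30, 40, 50, 60, 70, 80]
predicted = [15, 15, 35, 35, 55, 55, 75, 75]
print(rmse(observed, predicted), rpd(observed, predicted), rpiq(observed, predicted))
```

RPIQ is often preferred for skewed soil variables because the interquartile range is less affected by extreme values than the standard deviation.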

[hal-05281203] Detection of soil management practices using Sentinel-1 time series: the challenges raised by the diversified management sequences in vineyards

Characterization of soil management strategies in the complex agroecosystems of vineyards is crucial to evaluate their impact on vineyard soil health, particularly in the context of increasing soil threats in semi-arid environments [1], [2], [3], [4]. This study evaluates the potential of Sentinel-1 (S1) radar time series to detect soil management practices in Spanish vineyards. Two trellis-trained vineyard plots (ca. 4 ha each) located in the Toledo province (Spain) were studied, each subjected to a distinct soil management practice: conventional tillage (TILL) and a cover cropping system (CC), respectively. A farmer survey was conducted to thoroughly document the sequence, timing, and spatial distribution of management operations carried out between October 2020 and August 2024. A methodology based on S1 radar signal change detection was applied to detect changes in soil surface roughness associated with these practices. The survey data served as a reference to evaluate the accuracy of the S1-derived detections. Results revealed a very high degree of variability in vineyard management practices, in terms of type, spatial distribution and frequency within these fields. Despite such diversified management sequences, satellite-based detection was effective, on average over more than 60% of the plot surface area, for tillage in both the TILL and CC plots, and for weed control and rolling in the CC plot only. Additionally, mechanical pruning was successfully detected in the TILL plot. Our further research will explore the integration of S1 radar data with S2 optical imagery to refine the detection and assessment of soil management practices in viticultural systems.

ano.nymous@ccsd.cnrs.fr.invalid (Hayfa Zayani) 24 Sep 2025

https://hal.science/hal-05281203v1

[hal-05260643] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 15 Sep 2025

https://hal.inrae.fr/hal-05260643v1

[hal-05285601] Improving the prediction of soil organic carbon content using field-acquired hyperspectral data by accounting for soil moisture and surface roughness

Soil surface conditions such as moisture, roughness, and vegetation complicate accurate Soil Organic Carbon (SOC) prediction by altering spectral reflectance. Most studies consider these factors separately and under controlled conditions. Soil roughness has rarely been included [1,2], and typically not alongside soil moisture, which has mostly been studied in laboratory settings [3]. Common methods to reduce moisture effects on spectra, such as external parameter orthogonalization (EPO) and direct standardization (DS), rely heavily on lab-based datasets [3]. To address this, we assessed the influence of soil moisture and surface roughness as co-variables in models predicting SOC content from reflectance spectra of bare Luvisols near Versailles, France. Spectral data were collected under natural light at 76 points, along with volumetric soil moisture (θ) and 7 roughness indicators from photogrammetry [4]. SOC was predicted using Partial Least Squares Regression (PLSR) and Random Forest (RF), with 4-fold cross-validation repeated 10 times. Six wavelength-selection (WS) strategies were tested: two from satellite simulations (EnMAP, Sentinel-2), two from model variable importance (PLSR, RF), one expert-based, and one using all wavelengths. Moisture and roughness were added individually. In-field spectra enabled reasonably accurate predictions, with RF outperforming PLSR (SOC RMSE: 1.6–1.8 g.kg⁻¹). WS methods improved accuracy only when co-variables were added. Moisture had little effect, while roughness improved prediction quality in most cases, especially shadow percentage for PLSR and the semivariogram sill parameter for RF. These results highlight the benefit of including surface roughness to improve large-scale SOC prediction from remote sensing.

ano.nymous@ccsd.cnrs.fr.invalid (Hugues Merlet) 26 Sep 2025

https://hal.science/hal-05285601v1

[hal-05285538] Spatial prediction of soil properties using Sentinel-2 temporal mosaics of non-vegetated soils in a semi-arid region: A comparative evaluation of Google Earth Engine and THEIA platforms in Sminja

This study investigates the potential of Sentinel-2 (S2) temporal mosaics (TM) of non-vegetated soils for enhancing soil property mapping in the semi-arid Sminja Plain, Tunisia (480 km²). Utilising data from 2019 to 2023 across all seasons (autumn, spring, summer, and winter), we generated TM through the Google Earth Engine (GEE) and THEIA platforms. This comparative evaluation highlighted the importance of platform selection and seasonal considerations in remote sensing-based soil property predictions. Non-vegetated soils were isolated using thresholds of NDVI < 0.35 and NBR2 < 0.09 to maximise non-vegetated soil extraction. Key soil properties analysed through a dataset of 215 sample locations regularly spread over the area included electrical conductivity (EC), soil organic carbon (SOC), pH, base saturation (BS), granulometric fractions, and soil moisture content. Random forest (RF) models with K-fold cross-validation assessed the predictive performance, evaluated using RMSE, RPD, and RPIQ metrics. Results indicate that both GEE and THEIA platforms effectively predicted (with THEIA having a very slight edge) most of the soil properties (SOC, CaCO₃, Ca, base saturation, granulometric fractions, and soil moisture content) with RPIQ values exceeding 1.7, while predictions for pH, EC, K, Na, and P₂O₅ were poorly reliable with RPIQ < 0.8. This pinpointed the limitations of the generated RF models for certain soil properties in such environments. Seasonal variations slightly influenced model accuracy, underscoring the importance of platform selection and temporal considerations in remote sensing-based soil property prediction. These findings offer valuable insights for sustainable land management and agricultural planning in semi-arid regions.

ano.nymous@ccsd.cnrs.fr.invalid (Mukhtar Adamu Abubakar) 26 Sep 2025

https://hal.science/hal-05285538v1
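
The bare-soil rule quoted in the abstract above (NDVI < 0.35 and NBR2 < 0.09) can be sketched per pixel as follows. This is an illustrative Python sketch, not the platforms' processing chain. It uses the usual normalized-difference definitions, NDVI = (NIR - Red) / (NIR + Red) and NBR2 = (SWIR1 - SWIR2) / (SWIR1 + SWIR2), and assumes the Sentinel-2 bands B4, B8, B11 and B12 as inputs. The function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both reflectances are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def is_non_vegetated(red, nir, swir1, swir2, ndvi_max=0.35, nbr2_max=0.09):
    """Apply the two thresholds from the abstract to one pixel.

    NDVI screens out green vegetation; NBR2 screens out dry vegetation
    residues and moist surfaces that NDVI alone tends to miss.
    """
    ndvi = normalized_difference(nir, red)
    nbr2 = normalized_difference(swir1, swir2)
    return ndvi < ndvi_max and nbr2 < nbr2_max


# Hypothetical surface reflectances (red, nir, swir1, swir2).
pixels = {
    "bare soil": (0.20, 0.28, 0.35, 0.32),
    "green crop": (0.05, 0.45, 0.25, 0.12),
    "crop residue": (0.20, 0.28, 0.35, 0.25),
}
mask = {name: is_non_vegetated(*bands) for name, bands in pixels.items()}
print(mask)
```

A temporal mosaic would then aggregate, for each pixel, only the acquisition dates on which this mask is true.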

[hal-05260506] Bare soil mosaicking optimisation for soil organic carbon prediction in Centre-Val de Loire

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 15 Sep 2025

https://hal.inrae.fr/hal-05260506v1

[hal-05285648] Spectral models learn the context, not the soil: rethinking SOC prediction from lab to drone measurements under field conditions

Predicting soil organic carbon (SOC) using spectral data remains a challenge in digital soil mapping, particularly under field-scale conditions where environmental factors (e.g., vegetation, moisture) can mask or distort soil reflectance [1]. These unstable conditions are a major obstacle to model generalization across space and time. In this study, we evaluated the ability of SOC prediction models, built from reflectance spectra from lab to field and drone measurements, to account for and generalize across varying environmental conditions, over a unique field plot structure located in Nouzilly (France). The experimental design consists of 3 replicates (block design) of 4 to 5 tillage practices (modality) within a single 11.25 ha field. Two sampling campaigns (Oct 2024, May 2025) provided SOC (0-5 cm) and spectral data from lab, field, and UAV platforms at 75 sampling points. Co-variables such as moisture content and soil surface roughness were also collected. To assess model generalizability to new spatial and temporal conditions, we applied several data-splitting strategies: random splits, leave-one-block-out, leave-one-modality-out, and time-based splits between the October and May datasets. Our results show that tillage modality alone induced significant SOC variability at the soil surface, with mean SOC ranging from 12.1 g/kg under conventional tillage to 16.7 g/kg under minimum tillage. Seasonal differences between October and May also contributed substantially to SOC variability, further complicating model generalization. In this context, co-variables related to soil roughness and moisture had no significant impact on improving model accuracy. Model performance was highly sensitive to data-splitting strategy. Random splits gave overly optimistic results (R² = 0.75, RPIQ = 2.7 for field spectra), whereas leave-one-modality-out failed to generalize to unseen tillage practices, with most models showing R² < 0.
Leave-one-block-out yielded reliable performance for laboratory spectra but failed for UAV and field data, especially under reduced environmental variability (e.g., in May or after NDVI-based filtering), with R² dropping from 0.72 to 0.28 for October UAV measurements. These findings suggest that models often rely on indirect or ephemeral environmental features rather than direct or intrinsic spectral behaviour of bare soil resulting in unstable performance and poor transferability across space and time, even for similar soils.

ano.nymous@ccsd.cnrs.fr.invalid (Hugues Merlet) 26 Sep 2025

https://hal.science/hal-05285648v1
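
The leave-one-block-out and leave-one-modality-out strategies mentioned in the abstract above are both cases of grouped splitting: every sample sharing a label (a block or a tillage modality) is held out together, so the model is always tested on a group it has never seen. Below is a minimal Python sketch with hypothetical labels, not the authors' pipeline.

```python
def leave_one_group_out(groups):
    """Yield (held_out_label, train_indices, test_indices) for each group.

    `groups` holds one label per sample, e.g. the block or the tillage
    modality of each sampling point.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical block labels for six sampling points.
blocks = ["A", "A", "B", "B", "C", "C"]
splits = list(leave_one_group_out(blocks))
for held_out, train, test in splits:
    print(held_out, train, test)
```

A random split, by contrast, lets samples from the same block or modality appear on both sides, which is one plausible reason for the optimistic scores the abstract reports for random splits.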

[hal-05265025] Enhanced Genome Assemblies of French-Bred Dactylis glomerata and Medicago sativa: Achieving High-Quality Tetraploid Genomes

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Marie Pegard) 17 Sep 2025

https://hal.inrae.fr/hal-05265025v1

[hal-05263786] Genetic and phenotypic diversity of lucerne (Medicago sativa) for optimising its role as a living mulch in agroecological systems

Lucerne (Medicago sativa), a drought-tolerant forage legume, is increasingly used in agroecological systems as a service plant, particularly in intercropping with annual crops such as cereals. As a perennial crop, lucerne forms a living mulch that potentially offers multiple benefits, including weed control, nitrogen enrichment, and support for reduced tillage practices, thus reducing energy consumption and preserving biodiversity. However, recent studies have highlighted a significant challenge: current lucerne varieties, selected for forage production, are overly competitive and negatively impact the productivity of interplanted crops such as wheat. This study aimed at analysing the genetic and phenotypic diversity within the M. sativa complex to identify traits that enhance lucerne's effectiveness as a living mulch, focusing on competition for light and nitrogen among lucerne, wheat, and weeds, and later their genetic determinism. Thirty diverse lucerne accessions, representing different subspecies, autumn dormancy levels, and plant architectures (ranging from prostrate to erect forms), were evaluated. In the first phase of the study, the effects of lucerne dormancy and growth habit on wheat dominance during early stages and weed abundance were assessed. Later in the season, at the wheat heading stage, the impact of lucerne height and lodging on wheat biomass and nitrogen status was evaluated. In the second phase, forty plants of each accession were genotyped using Genotyping-by-Sequencing (GBS) and phenotyped in a nursery for plant height, growth habit, and lodging susceptibility. Genetic variance was estimated using the REML model, and broad-sense heritability was calculated. Genome-Wide Association Studies (GWAS) with a Multi-Locus Mixed Model (MLMM) were used to identify candidate QTLs. The results suggest that lucerne varieties with slow growth, moderate height, and low lodging are most effective as living mulches. However, no variety in the panel exhibited all these desirable traits, requiring dedicated breeding programmes to create this ideotype. A total of 100K SNPs covering all eight lucerne chromosomes were identified. Genetic structure analysis revealed distinct separation between wild and cultivated forms, as well as between the falcata and sativa subspecies. A high heritability of the traits was observed, varying from 0.47 for lodging to 0.69 for growth habit. Variation around the correlations between traits suggested that it is possible to combine favourable traits in a single variety. QTLs were obtained; they could be used to speed up the genetic progress in breeding programmes. These findings provide a foundation for the genetic improvement of lucerne as a living mulch, contributing to effective and sustainable agricultural practices. Further analyses are ongoing to determine how to introduce molecular markers in the selection of lucerne varieties adapted to living mulch use.

ano.nymous@ccsd.cnrs.fr.invalid (Zineb El Ghazzal) 16 Sep 2025

https://hal.inrae.fr/hal-05263786v1

[hal-05263772] Uncovering the role of phenotypic traits in shaping the genetic structure of alfalfa

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Irving Arcia Ruiz) 16 Sep 2025

https://hal.inrae.fr/hal-05263772v1

[hal-05308809] Développement d’une unité d’anonymisation visuelle automatisée par détection humaine via IA

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Manon Gauthier) 10 Oct 2025

https://hal.inrae.fr/hal-05308809v1

[hal-05262597] How can digital technology use and innovation contribute to sustainable transformation of business models in the agri-food sector?

The expectations of digital technologies in sustainable agricultural development are considerable. However, applying these technologies in agri-food value chains can have downsides, which are still barely studied. The main objectives of this systematic literature review were to establish the state of the art of research on the use of digital technologies in business models contributing to sustainability in the agri-food sector, and to make recommendations for future research and management practice. A literature review is well suited to bringing concepts together, developing a theoretical framework and advancing knowledge. This review followed the commonly used PRISMA method. From this review, an overview of the factors of digitalisation in business models of agri-food value chains was drawn up. Key themes found in the literature were the effects of COVID-19 on digitalisation and business resilience, the sustainability of business models in an economic sense, and the importance of communication technologies in agri-food value chains. This paper argues that even though digital technologies can enhance social interaction, the human element can be lost in the process. Even if one business makes successful use of digital technologies, other actors in local and international value chains might not profit. The paper recommends that future research and management practice use a framework that applies a value co-creation and open innovation perspective to both the business model level and the interaction between (sustainable) business models in local and global food systems.

ano.nymous@ccsd.cnrs.fr.invalid (Laura Eline Slot) 16 Sep 2025

https://hal.inrae.fr/hal-05262597v1

[hal-05193281] Developing a microfluidic qPCR chip to quantify microbial taxa with a potential biocontrol activity against grapevine downy mildew

Grapevine downy mildew, caused by the oomycete Plasmopara viticola, is responsible for significant economic losses each year and for a large proportion of the fungicides used in viticulture. In order to limit the use of these chemical pesticides, which are incompatible with the development of sustainable viticulture, biocontrol solutions based on cultivated simplified communities of microorganisms (SimComs) are gradually emerging. In the present study, we designed several SimComs for the control of downy mildew, using a collection of microorganisms isolated from grapevine leaves by a culturomic approach. The SimComs were composed of bacteria, yeasts and filamentous fungi that were either described as having biocontrol activity against plant pathogens or abundant on grapevine leaves. We tested the hypothesis that including abundant species in the SimComs would help the microbial community colonize the leaves. Materials and methods: A quantitative PCR microfluidic chip (Fluidigm Biomark) was developed to monitor the establishment of SimComs on grapevine leaves. Microfluidic PCR allows a larger number of reactions than classical qPCR, making analyses quicker and cheaper per sample. It also has the advantage of providing absolute abundance data, whereas metabarcoding approaches only estimate the relative abundance of microbial taxa. Results: So far, specific primers for 34 microbial taxa (out of the 42 selected for inclusion in the SimComs) have been designed in single-copy housekeeping genes. We are currently sequencing the genomes of the remaining microbial taxa to complete the primer design. We applied the microfluidic chip to DNA samples extracted from grapevine leaf discs inoculated with SimComs and were able to detect most of the inoculated microorganisms, including some microbial taxa that significantly reduced the intensity of downy mildew symptoms under laboratory conditions.
The microfluidic chip was then applied to environmental DNA collected in the vineyard from spore sensors, in order to detect and quantify the targeted protective microorganisms. By doing so, we were able to detect the presence of several microorganisms, including some microbial taxa with proven biocontrol activity against plant pathogens such as Bacillus pumilus, Aureobasidium pullulans and Epicoccum nigrum. Conclusion: These preliminary results shed light on the potential of the microfluidic chip as a new molecular diagnostic tool to monitor specific microbial communities present in the field, either naturally or after SimCom inoculation.

ano.nymous@ccsd.cnrs.fr.invalid (Manon Chargy) 30 Jul 2025

https://hal.science/hal-05193281v1

[hal-05161584] Cross-Species Predictions of Chromatin Annotations using Neural Networks

A better knowledge of functional annotations of livestock species can be a lever to link genome to phenome. The genomes of most livestock species have already been sequenced. However, data describing gene regulation mechanisms and chromatin state are insufficient. In contrast, abundant human and mouse data allowed the training of powerful deep learning algorithms. Here, we propose to use 3 artificial neural networks (DeepBind, DeepSEA and Enformer), trained with human and mouse data, to predict annotations on the pig, cattle, chicken and European seabass genomes. The predictions are then compared with experimental data to evaluate the cross-species performance of the neural networks. First, human-trained neural network predictions performed on the mouse reference genome showed varying levels of accuracy depending on the experiment, with the highest performance for H3K4me3 (auPRC = 0.624). Second, the predictions on the pig, cattle and chicken genomes showed similar (lower mean auPRC = 0.385 ± 0.233) and better performances than those on the seabass genome (mean auPRC = 0.144 ± 0.096). Third, the evaluation of the impact of genomic features on the predictions highlighted better performances for CpG islands and 5'UTRs than for other features. Finally, the comparison of predictions between different pig breeds with high genetic diversity demonstrated that genetic variability does not affect the performance, but rather observations. To conclude, we showed that the 3 neural networks evaluated can be used to predict annotations on non-mammalian genomes with similar performances (chicken), but not on genomes of organisms that are phylogenetically too distant (seabass).

ano.nymous@ccsd.cnrs.fr.invalid (Noémien Maillard) 14 Jul 2025

https://hal.inrae.fr/hal-05161584v1

[hal-05235924] LIPH4SAS: The French nationally distributed research infrastructure for livestock phenotyping

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Jean Pierre Bidanel) 02 Sep 2025

https://hal.inrae.fr/hal-05235924v1

[hal-05344529] An advanced stochastic framework for the simulation of transgenerational hologenomic data

A holobiont is made up of a host organism together with its microbiota. In the context of animal breeding, the holobiont can be viewed as the single unit upon which selection operates. Therefore, integrating microbiota data into genomic prediction models may be a promising approach to improve predictions of phenotypic and genetic values. Nevertheless, there is a paucity of hologenomic transgenerational data to address this hypothesis, and to fill this gap we propose a new simulation framework. Our approach, RITHMS (an R Implementation of a Transgenerational Hologenomic Model-based Simulator), is an open-source package that builds upon simulated transgenerational genotypes from the MoBPS package and incorporates distinctive characteristics of the microbiota, notably vertical and horizontal transmission as well as modulation due to the environment and host genetics. In addition, RITHMS can account for a variety of selection strategies and is adaptable to different genetic architectures. We simulated transgenerational hologenomic data using RITHMS under a wide variety of scenarios, varying heritability, microbiability, and microbiota heritability. We found that the simulated data accurately preserved key characteristics across generations, notably microbial diversity metrics, and exhibited the expected behavior in terms of correlation between taxa, modulation of vertical and horizontal transmission, response to environmental effects and the evolution of phenotypic values depending on selection strategy. Our results support the relevance of our simulation framework and illustrate its possible use for building a selection index balancing genetic gain and microbial diversity. RITHMS is an advanced, flexible tool for generating transgenerational hologenomic data that incorporate the complex interplay between genetics, microbiota and environment.

ano.nymous@ccsd.cnrs.fr.invalid (Solène Pety) 03 Nov 2025

https://hal.science/hal-05344529v1

[hal-05161904] Generation of metabolomic-informed models of metabolism for microbial communities

The generation of genome-wide metabolic networks has become a routine analysis for individual organisms or communities. However, these automatically generated metabolic networks are incomplete because they are constructed from the combination of gene annotations and reactions available in generic databases (MetaCyc, BiGG, ModelSEED...). These databases are oriented towards well-known or model organisms and miss important functions of secondary metabolism. We propose to combine metabolomic data analysis, metabolic modelling and annotation mining to build high-quality models of microbial metabolism, with the long-term aim of better understanding microbial communities. In terms of application of the methods to plant microbial communities, we hope that the newly developed models will provide a better understanding of the process of microbial recruitment by the plant: metabolic functions involved, micro-organisms associated with these functions.

ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 15 Jul 2025

https://inria.hal.science/hal-05161904v1

[hal-05447268] Metage2Metabo-PostAViz: exploring and visualising the wealth of metabolic modelling predictions to compare microbial communities

The study of microorganisms’ metabolic functions and their interactions within microbial communities is of major interest in many application domains such as agriculture or health. Current computational approaches contribute to this objective through the reconstruction of metagenomes from sequenced microbiomes. Assembled genomes are the basis for metabolic network inference relying on gene-protein-reaction relationships. The metabolic potential of a microbial community can then be simulated with dedicated models, such as a Boolean approximation of metabolic activity in Metage2Metabo (M2M). When studying multiple microbial communities, for instance considering microbiome data for large cohorts of individuals, the amount of information related to the metabolic potential of each community and their members can be overwhelming. Analysing it in light of additional data (sample metadata, genome taxonomy…) further requires the implementation of statistical analysis, limiting the usability of the models. M2M-PostAViz is an interface-based application for the exploration and comparison of Metage2Metabo’s predicted metabolic potentials for many samples, in light of available metadata. An interactive interface enables exploring predictions, filtering and visualizing data. Users can focus analyses on the role of microbes or taxonomic groups of interest, or compare the producibility patterns of certain metabolites across samples. Scalability is ensured by dedicated development regarding data storage and exploration, allowing the comparison of more than 5,000 microbiome samples analysed with M2M. The application is implemented in Python and Shiny-Python, and relies on Parquet for data management. It generates customisable plots and statistical tests for data exploration. M2M-PostAViz facilitates the scalable exploration of metabolic potential in large microbiome cohorts, with emphasis on user-friendliness and analysis customisation.

ano.nymous@ccsd.cnrs.fr.invalid (Léonard Brindel) 07 Jan 2026

https://inria.hal.science/hal-05447268v1
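
The Boolean approximation of metabolic activity mentioned in the M2M-PostAViz abstract above can be illustrated with a generic network-expansion loop: a metabolite is considered producible if it is a seed or the product of a reaction whose reactants are all producible. The sketch below is only an illustration under that assumption, not the Metage2Metabo implementation; the function name `producible_scope`, the two toy members and the metabolite names are hypothetical.

```python
def producible_scope(reactions, seeds):
    """Return every metabolite reachable from the seeds.

    reactions: list of (reactants, products) pairs, each a set of names.
    seeds: iterable of metabolite names available in the growth medium.
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction fires once all of its reactants are producible.
            if reactants <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


# Hypothetical two-member community; metabolite names are placeholders.
member_a = [({"glucose"}, {"pyruvate"})]
member_b = [({"pyruvate", "ammonia"}, {"alanine"})]
seeds = {"glucose", "ammonia"}

scope_a = producible_scope(member_a, seeds)
scope_b = producible_scope(member_b, seeds)
scope_community = producible_scope(member_a + member_b, seeds)

# Metabolites that only become producible when the members are pooled.
added_value = scope_community - (scope_a | scope_b)
print(sorted(added_value))
```

Pooling the reactions of both members is one simple way to represent a community in which metabolites can be exchanged freely; comparing the pooled scope with the individual scopes is the kind of per-sample output that a tool comparing many communities would have to summarise.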

[hal-05163368] Statistical method inference cMFA for multi-omics data integration in microbial community models

Understanding microbial community functions is challenging due to complex interactions and assembly mechanisms; however, advances in sequencing have enabled the collection of multi-omics data, including population counts and metabolomic or metatranscriptomic data. Our main objective is to develop a mathematical model capable of integrating time series of multi-omics data at a community scale. We introduce the community metabolic flux analysis (cMFA) method, which generalizes metabolic flux analysis, using a list of time series data of experimentally measured production and consumption rates of metabolites and microorganism growth. We aim to infer, for each member of the microbial community, the intracellular distribution of metabolic fluxes. This is a high-dimensional constrained linear regression problem, informed by mass conservation constraints and metatranscriptomic data, encoded in the penalty term. The difficulty here is in accurately inferring latent internal rates from a few observations of exchange fluxes. We evaluated the cMFA method on synthetic data from dynamic models of increasingly complex microbial communities, based on metabolic models of different mutants of Escherichia coli using dynamic flux balance analysis (dFBA). Synthetic metatranscriptomic data were obtained from internal metabolic fluxes in the dynamic model. Different regularization terms were tested, including different levels of sparsity, for the selected penalty weight. To evaluate the robustness of the method, multiple benchmarks were tested. These included assessments of the robustness of the method to data noise, incomplete metatranscriptomic data, and inaccurate prior knowledge of metabolic import rates. Currently, we are working with real data and expanding the study to a larger microbial community.

ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 15 Jul 2025

https://inria.hal.science/hal-05163368v1
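
The cMFA abstract above describes a constrained linear regression in which mass conservation restricts the fluxes and prior information enters through a penalty term. The toy sketch below illustrates that general idea on a hypothetical five-reaction network with a single measured exchange rate. The quadratic penalty towards a prior flux vector, the network, and the names `NULL_BASIS` and `infer_fluxes` are assumptions made for illustration; the abstract does not specify the actual penalty or solver used by the authors.

```python
# Hypothetical toy network:
#   R1: -> A,  R2: A -> B,  R3: A -> C,  R4: B ->,  R5: C ->
# Steady state (mass conservation) forces v1 = v2 + v3, v4 = v2, v5 = v3,
# so every feasible flux vector is z1 * N1 + z2 * N2 with the basis below.
NULL_BASIS = [
    (1.0, 1.0, 0.0, 1.0, 0.0),  # all uptake routed through the B branch
    (1.0, 0.0, 1.0, 0.0, 1.0),  # all uptake routed through the C branch
]


def infer_fluxes(measured, prior, weight):
    """Minimise sum over measured j of (v_j - r_j)^2
    plus weight * sum over all j of (v_j - prior_j)^2,
    over steady-state flux vectors v = z1 * N1 + z2 * N2.

    measured: dict {reaction index: observed rate}
    prior: list of 5 prior flux values (stand-in for expression-derived data)
    weight: penalty weight, must be > 0
    """
    n = len(prior)
    n1, n2 = NULL_BASIS
    # Per-reaction weights and targets of the combined quadratic objective.
    w = [(1.0 if j in measured else 0.0) + weight for j in range(n)]
    t = [measured.get(j, 0.0) + weight * prior[j] for j in range(n)]
    # 2x2 normal equations A z = b, solved with Cramer's rule.
    a11 = sum(w[j] * n1[j] * n1[j] for j in range(n))
    a12 = sum(w[j] * n1[j] * n2[j] for j in range(n))
    a22 = sum(w[j] * n2[j] * n2[j] for j in range(n))
    b1 = sum(n1[j] * t[j] for j in range(n))
    b2 = sum(n2[j] * t[j] for j in range(n))
    det = a11 * a22 - a12 * a12
    z1 = (b1 * a22 - a12 * b2) / det
    z2 = (a11 * b2 - a12 * b1) / det
    return [z1 * n1[j] + z2 * n2[j] for j in range(n)]


# Only the uptake flux R1 is observed; the prior suggests a 7:3 branch split.
fluxes = infer_fluxes(measured={0: 10.0},
                      prior=[10.0, 7.0, 3.0, 7.0, 3.0],
                      weight=0.1)
print([round(v, 6) for v in fluxes])
```

In this toy case the single exchange measurement cannot distinguish the two branches, so the split between R2 and R3 is determined entirely by the penalty term, which mirrors the difficulty the abstract points out: latent internal rates have to be inferred from a few observations of exchange fluxes.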

[hal-05194387] Bread wheat pangenome Graphs

[...]

ano.nymous@ccsd.cnrs.fr.invalid (Pauline Lasserre) 31 Jul 2025

https://hal.inrae.fr/hal-05194387v1

[hal-05086840] Spatiotemporal modeling of host–pathogen interactions using level-set method

Phenotyping host-pathogen interactions is crucial for understanding infectious diseases in plants. Traditionally, this process has relied on visual assessments or manual measurements, which can be subjective and labor-intensive. Recent advances in image processing and mathematical modeling enable the precise and high-throughput phenotyping of plant symptoms. Among many challenges, considering local deformations of symptoms and host tissues is difficult in plant pathology. In this study, we address this question using a level-set method. We propose an innovative approach in plant pathology that allows one to reconstruct the continuous deformation of leaf and lesion contours from daily image sequences of inoculated leaves. We consider pea stipules inoculated by the fungal pathogen Peyronellaea pinodes as an example pathosystem. After extracting lesion and stipule contours from daily visible images, we use the level-set method to track their deformations within image sequences. The visual assessment of model adequacy, along with the Jaccard Index and relative error metrics, demonstrated strong overall performance. Results showed a gradual decrease in model accuracy over time for leaf contours, while lesion contours exhibited a higher relative error on the first targeted date. These findings highlight the robustness of our method while identifying specific challenges in early lesion detection. We finish by discussing the interest in this method based on partial differential equations for the study of host-pathogen interactions, especially the development of original phenotyping methods in plant pathology.

ano.nymous@ccsd.cnrs.fr.invalid (Sheila Rae Permanes) 02 Jun 2025

https://hal.inrae.fr/hal-05086840v2
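
The level-set abstract above evaluates reconstructed contours with the Jaccard Index and a relative error. As a minimal sketch of how such scores can be computed, the example below represents each region as a set of pixel coordinates. The abstract does not say which quantity the relative error is measured on, so applying it to region area is an assumption, and the function names and toy shapes are hypothetical.

```python
def jaccard_index(region_a, region_b):
    """Intersection over union of two pixel sets (1.0 means identical)."""
    a, b = set(region_a), set(region_b)
    union = a | b
    if not union:
        return 1.0  # two empty regions are treated as identical
    return len(a & b) / len(union)


def relative_area_error(predicted, observed):
    """Absolute area difference divided by the observed area."""
    if not observed:
        raise ValueError("observed region must not be empty")
    return abs(len(predicted) - len(observed)) / len(observed)


# Toy shapes: an observed 4x4 lesion and a prediction shifted by one pixel.
observed = {(x, y) for x in range(4) for y in range(4)}
predicted = {(x + 1, y) for x in range(4) for y in range(4)}

print(jaccard_index(predicted, observed))        # 12 shared / 20 total = 0.6
print(relative_area_error(predicted, observed))  # same area, so 0.0
```

The toy case shows why the two metrics are complementary: a shifted contour can have exactly the right area, and therefore zero relative area error, while the overlap-based Jaccard Index still reports the mismatch in position.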

[hal-05117608] The future of systems genetics in farm animal sciences

Farm animal species are under intense selection on relatively small population sizes. Genetic and genomic selection has provided remarkable genetic gains in the last century. Nevertheless, current methods aiming to link genome to phenome in such populations remain limited, notably due to the difficulty of identifying causal variants for complex traits. The diversity of species as well as breeds in livestock has diluted the number of genomic datasets available for each genome as compared to model organisms or human diseases. In this article we propose a systems genetics approach as an opportunity to go beyond current limits, taking advantage of novel computational developments allowing integration of omics datasets from different analyses across species. A major challenge is that systems genetics requires careful but efficient data and metadata management, as well as rigorous statistical methods and strategies for choosing which approach to use. Here, we highlight examples of the broad contribution systems genetics can bring to farm animal sciences, particularly across species, notably in the genome-to-phenome field within the larger scope of agricultural challenges including adaptation to environmental changes and animal welfare.

ano.nymous@ccsd.cnrs.fr.invalid (Guillaume Devailly) 17 Jun 2025

https://hal.science/hal-05117608v1

[hal-05105798] Inference Method cMFA for multi-omics data integration in microbial community models

Understanding microbial community functions is challenging due to complex interactions and assembly mechanisms. However, advances in sequencing technologies have enabled the collection of multi-omics data, including population counts and metabolomic or metatranscriptomic profiles. Our main objective is to develop a mathematical model capable of integrating time series of multi-omics data at the community scale. We introduce the community Metabolic Flux Analysis (cMFA) method: a biology-informed inference approach that generalizes classical Metabolic Flux Analysis. This high-dimensional analytical framework aims to estimate metabolic fluxes by integrating multi-omics data. Specifically, we aim to (i) quantify, for each member of the microbial community, their individual contributions to overall community dynamics based on external measurements of metabolite dynamics, and (ii) infer their intracellular distribution of metabolic fluxes. The difficulty here is in accurately inferring latent internal rates from a few observations of community-scale consumption and production rates for extracellular metabolites. We evaluated the cMFA method using synthetic data generated from dynamic models of microbial communities of increasing complexity using dynamic flux balance analysis, based on metabolic models of different Escherichia coli mutants. Synthetic metatranscriptomic data were obtained from internal metabolic fluxes simulated in the dynamic model. To assess the robustness of the method, we benchmarked its performance under varying levels of experimental noise.

ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 10 Jun 2025

https://hal.science/hal-05105798v1

[hal-05288241] Mapler: a pipeline for assessing assembly quality in taxonomically rich metagenomes sequenced with HiFi reads

Metagenome assembly seeks to reconstruct as many high-quality genomes as possible from sequencing data of microbial ecosystems. Despite technological advancements that facilitate assembly, such as HiFi long reads, the process remains challenging in complex environmental samples consisting of hundreds to thousands of populations. Mapler is a metagenome assembly and evaluation pipeline with a focus on evaluating the quality of HiFi long-read metagenome assemblies. It incorporates several state-of-the-art metrics, as well as novel metrics assessing the diversity that remains uncaptured by the assembly process. Mapler facilitates the comparison of assembly strategies and helps identify methodological bottlenecks that hinder genome reconstruction.

ano.nymous@ccsd.cnrs.fr.invalid (Nicolas Maurice) 03 Nov 2025

https://hal.science/hal-05288241v1

HAL: Latest publications

[hal-05230510] Seed Inference in Interacting Microbial Communities Using Combinatorial Optimization

[hal-05348017] Measuring shade use of dairy cattle at pasture with an on-cow light sensor: a case study

[hal-05410799] Data Paper: HotPig, a behavioural dataset of pigs under heat stress

[hal-05444004] Les technologies numériques en élevage : de la mesure à l’évaluation comportementale du bien-être de chaque animal

[hal-05419350] MetaNetMap: automatic mapping of metabolomic data onto metabolic networks

[hal-05435147] On Logic-based Self-Explainable Graph Neural Networks

[hal-05264391] Method: An accurate method for detecting drinking bouts in dairy cows based on reticulorumen temperature

[hal-04997560] Data paper: A goat behaviour dataset combining labelled behaviours and accelerometer data for training Machine Learning detection models

[hal-05385353] WAIT4 – un projet de recherche alliant technologies numériques et IA pour évaluer des indicateurs pertinents de bien-être pour des animaux confrontés aux défis des transitions agroécologique et climatique

[hal-05380224] NINSAR Project: Defining Agroecological Routes Using Robots

[hal-05368332] Modeling the emergent metabolic potential of soil microbiomes in Atacama landscapes

[hal-05178193] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

[hal-05340010] Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification

[hal-05343366] Forest Cover in the Congo Basin: Consistency Evaluation of Seven Datasets

[hal-05322783] Whole genome sequencing dataset for a Vitis vinifera diversity panel

[hal-05435107] Leveraging internal representations of GNNs with Shapley values

[hal-05350945] Considering farmers’ needs in agroliving labs : a case study

[hal-05435121] Diffusion for Explainable Unsupervised Anomaly Detection

[hal-05318560] cMFA for multi-omics data integration in microbial community models

[hal-05304541] Generation of metabolomic-informed models of metabolism for microbial communities

[hal-05110984] Advancing agroecology and sustainability with agricultural robots at field level: A scoping review

[hal-05304536] Generation of metabolomic-informed models of metabolism for microbial communities

[hal-05283043] Assessing fruit tree vigor in peach and apple orchards through wood segmentation in ground-based RGBimages

[hal-05340126] Accurate MAG reconstruction from complex soil microbiome through combined short- and HiFi long-reads metagenomics

[hal-05301772] Long-term evolution of forest cover in the Pacific coast of Ecuador (1960–2019): a comparison of Land Use/Land Cover (LULC) remote sensing products

[hal-05261543] cMFA for multi-omics data integration in microbial community models

[hal-05281103] Evaluating the potential of Sentinel-2 data to assess the coarse fragment cover of the soil surface within a Spanish vineyard

[hal-05281203] Detection of soil management practices using Sentinel-1 time series: the challenges raised by the diversified management sequences in vineyards

[hal-05260643] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

[hal-05285601] Improving the prediction of soil organic carbon content using field-acquired hyperspectral data by accounting for soil moisture and surface roughness

[hal-05285538] Spatial prediction of soil properties using Sentinel-2 temporal mosaics of non-vegetated soils in a semi-arid region: A comparative evaluation of Google Earth Engine and THEIA platforms in Sminja

[hal-05260506] Bare soil mosaicking optimisation for soil organic carbon prediction in Centre-Val de Loire

[hal-05285648] Spectral models learn the context, not the soil: rethinking soc prediction from lab to drone measurements under field conditions

[hal-05265025] Enhanced Genome Assemblies of French-Bred Dactylis glomerata and Medicago sativa: Achieving High-Quality Tetraploid Genomes

[hal-05263786] Genetic and phenotypic diversity of lucerne (Medicago sativa) for optimising its role as a living mulch in agroecological systems

[hal-05263772] Uncovering the role of phenotypic traits in shaping the genetic structure of alfalfa

[hal-05308809] Développement d’une unité d’anonymisation visuelle automatisée par détection humaine via IA

[hal-05262597] How can digital technology use and innovation contribute to sustainable transformation of business models in the agri-food sector?

[hal-05193281] Developing a microfluidic qPCR chip to quantify microbial taxa with a potential biocontrol activity against grapevine downy mildew

[hal-05161584] Cross-Species Predictions of Chromatin Annotations using Neural Networks

[hal-05235924] LIPH4SAS : The French nationnally distributed research infrastructure for livestock phenotyping

[hal-05344529] An advanced stochastic framework for the simulation of transgenerational hologenomic data

[hal-05161904] Generation of metabolomic-informed models of metabolism for microbial communities

[hal-05447268] Metage2Metabo-PostAViz: exploring and visualising the wealth of metabolic modelling predictions to compare microbial communities

[hal-05163368] Statistical method inference cMFA for multi-omics data integration in microbial community models

[hal-05194387] Bread wheat pangenome Graphs

[hal-05086840] Spatiotemporal modeling of host–pathogen interactions using level-set method

[hal-05117608] The future of systems genetics in farm animal sciences

[hal-05105798] Inference Method cMFA for multi-omics data integration in microbial community models

[hal-05288241] Mapler: a pipeline for assessing assembly quality in taxonomically rich metagenomes sequenced with HiFi reads