The ANALYST project has received funding from the European Union’s Horizon Europe Research and Innovation Programme under the Grant Agreement No 101138548


Jun 24, 2025
New Publication: Leveraging NLP to Accelerate Sustainable Innovation in Materials Science
We’re excited to announce a new open-access conference paper developed within the framework of the ANALYST Project, now published in Springer Nature’s Lecture Notes in Mechanical Engineering, as part of the proceedings from the European Symposium on Artificial Intelligence in Manufacturing (ESAIM 2024).
📘 Title: Enhancing Product Lifecycle Efficiency: Harnessing Natural Language Processing for Materials Insight and Optimization
🖋️ Authors: Inés Pérez Couñago, Lara Suárez Casabiell, Andrea Gregores-Coto, Christian Eike Precker, Santiago Muiños-Landin (AIMEN Technology Centre)
📅 Published: 22 March 2025
📚 Part of the book: Advances in Artificial Intelligence in Manufacturing II
🔗 DOI: https://doi.org/10.1007/978-3-031-86489-6_24
Research Overview
In today’s rapidly evolving industrial landscape, data plays a critical role in designing safer and more sustainable materials. However, accessing and interpreting this data, especially across the complex lifecycle of materials, remains a major challenge.
This study addresses that gap by exploring how Natural Language Processing (NLP) and Large Language Models (LLMs) can be used to streamline information retrieval across the chemical, environmental, health, social, and economic dimensions of materials, focusing specifically on the polyvinyl chloride (PVC) value chain.
The work presents an end-to-end pipeline that automates:
Data extraction from scientific literature, databases, and online content
Topic modeling using Latent Dirichlet Allocation (LDA) to organize and filter datasets
Retrieval and response generation with vector-based similarity search and LLMs (Llama2)
Performance evaluation using a mix of statistical and semantic scoring models
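The retrieval step can be illustrated with a minimal Python sketch of vector-based similarity search. It is not the authors' implementation. As a simplifying assumption, it replaces the learned XL-Instructor embeddings and the Chroma store with plain term-frequency vectors kept in a list, and it stops at assembling a prompt instead of calling Llama2. The function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the prompt template are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the top k, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a retrieval-augmented prompt (hypothetical template) for an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    # Toy chunks standing in for passages extracted from PDFs and web pages.
    chunks = [
        "PVC is produced by polymerising vinyl chloride monomer.",
        "Plasticisers such as phthalates are added to make PVC flexible.",
        "The symposium took place in 2024.",
    ]
    question = "Which plasticisers make PVC flexible?"
    hits = retrieve(question, chunks)
    print(build_prompt(question, [chunk for _, chunk in hits]))
```

Replacing the term-frequency `Counter` with dense embeddings, and the list with a vector store, changes how vectors are produced and indexed. The rank-by-cosine-similarity logic stays the same.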
Tools and Technical Highlights
The research combines several open-source tools, including:
LangChain for PDF parsing and embedding workflows
Chroma vector store for managing document embeddings locally
XL-Instructor and Llama2 for semantic search and answer generation
BLEU, METEOR, BERTScore, Prometheus and others for evaluating output accuracy
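BLEU, one of the statistical metrics listed above, compares the n-grams in a generated answer with those in a reference answer. The sketch below computes unsmoothed, single-reference sentence BLEU. It clips n-gram precisions, combines them with a geometric mean, and multiplies the result by a brevity penalty, using simple whitespace tokenization. The sketch only shows what the metric measures. This post does not give the paper's exact BLEU settings. Established packages such as sacreBLEU or NLTK also provide tokenization and smoothing options that this sketch leaves out. The helper names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference sentence BLEU with whitespace tokenization."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "rigid pvc is widely used for pipes and window frames"
    print(sentence_bleu("rigid pvc is widely used for pipes", reference))
```

Because the score counts only exact n-gram overlap, a correct paraphrase can score poorly. That gap is why semantic scorers such as BERTScore and Prometheus are listed alongside it.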
The case study on PVC demonstrated that filtering irrelevant or noisy data using LDA significantly improved both the accuracy and speed of information retrieval—cutting processing time by over 94% in some scenarios.
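The post does not explain how LDA topics were judged relevant, so the following is only a sketch under stated assumptions. It implements a small collapsed Gibbs sampler for LDA with the Python standard library. A real pipeline would more typically use a library such as gensim or scikit-learn. The sampler is followed by a hypothetical filtering rule: keep a document when its dominant topic has a user-supplied keyword among its top words. The names `lda_gibbs` and `filter_by_topic`, the hyperparameter defaults, and the keyword rule are all illustrative. In this sketch, documents that are filtered out are never embedded or searched, so the retrieval step has less text to process.

```python
import random
from collections import Counter


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA; returns (doc-topic mixtures, topic-word counts)."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a normalising constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions (each row sums to 1).
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


def filter_by_topic(docs, theta, topic_word, keywords: set[str], top_n: int = 5):
    """Hypothetical rule: keep docs whose dominant topic has a keyword in its top words."""
    relevant = {
        t for t, counts in enumerate(topic_word)
        if keywords & {w for w, c in counts.most_common(top_n) if c > 0}
    }
    return [doc for doc, mix in zip(docs, theta)
            if max(range(len(mix)), key=mix.__getitem__) in relevant]


if __name__ == "__main__":
    corpus = [
        "pvc polymer plasticiser pvc additive".split(),
        "pvc recycling polymer additive pvc".split(),
        "football match goal score team".split(),
        "team goal football league score".split(),
    ]
    theta, topic_word = lda_gibbs(corpus, n_topics=2, iters=100, seed=1)
    for doc in filter_by_topic(corpus, theta, topic_word, {"pvc"}):
        print(" ".join(doc))
```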
Implications and Future Outlook
This work contributes to the broader goals of the ANALYST project: enabling Safe and Sustainable by Design (SSbD) through digital innovation. By improving access to structured, relevant lifecycle data, this NLP-driven approach can support:
Smarter material design
Faster sustainability assessments
Better decision-making for researchers, regulators, and industry
The authors also highlight future directions, including the integration of neural topic modeling, enhanced evaluation metrics, and applications beyond the PVC sector.
👉 Read the full article (Open Access): https://zenodo.org/records/15321832
This publication is part of the ANALYST Project, funded by the European Union’s Horizon Europe Research and Innovation Programme under Grant Agreement No. 101138548.