Borehole descriptions: GEOBERTje

Bridging Geology and AI: GEOBERTje, a Domain-Specific Language Model for Flanders' Subsurface

Geologists and IT experts at VITO have joined forces to develop GEOBERTje, a Dutch-language large language model tailored to the subsurface of Flanders. Trained on thousands of lithological borehole descriptions from Databank Ondergrond Vlaanderen (DOV), the Flemish subsurface database, GEOBERTje automatically converts unstructured geological texts into structured lithology classifications.

Geological borehole descriptions contain rich, detailed information about the subsurface, but their unstructured, free-text format makes automated analysis difficult. To address this, we developed GEOBERTje: a domain-adapted large language model trained on Dutch-language borehole descriptions from Flanders (Belgium). GEOBERTje is designed to extract meaningful features from geological texts and represent them in a structured, vectorized format, which opens up new possibilities for data-driven geological research.

This open-source project is part of a broader effort to improve geological data processing through machine learning. We invite researchers, institutions, and practitioners to explore the model and code, share feedback and use cases, and collaborate on future developments.

Want to know more?

Would you like more information about GEOBERTje or other innovative projects related to the (shallow) subsurface? Contact Katrijn Dirix.

In-house VITO innovation: IT in geological research

This research was funded from VITO's own innovation budget and focuses on the shallow and deep subsurface. It is an example of how VITO invests in its data-driven research capabilities through advanced hardware and data systems. Together with close collaboration between geologists and IT experts, these form the basis for developing innovative digital tools. VITO builds such tools to make complex data efficiently accessible to partners such as the Flemish government. The tools can be used in projects, but can also inspire new lines of research.