The increasing adoption of automation and digitalization in biomanufacturing has led to a proliferation of high-frequency, multi-source sensor data from bioreactors and associated analytical devices. However, raw data from different equipment vendors are typically heterogeneous in format, structure, sampling frequency, and naming conventions, which poses a substantial barrier to automated data processing and cross-instrument integration. Without harmonization, sensor time series, off-gas measurements, and other process signals often remain locked in proprietary or inconsistent representations, and significant manual effort is needed to extract, align, and organize them into analysis-ready datasets. This fragmentation complicates time alignment, unit standardization, and variable renaming, and it hinders downstream workflows such as statistical analysis, predictive modelling, and digital-twin construction, all of which rely on coherent, machine-readable data.
raw2ready addresses these challenges with a modular, end-to-end Python framework that parses, structures, and merges raw bioprocess data into standardized, human- and machine-readable formats. The tool encapsulates device-specific parsing logic that converts vendor output into a common data layout, applies configurable translation tables to produce meaningful column headers, and aligns time series from disparate sources by nearest-timestamp matching. raw2ready also supports the calculation of derived metrics through user-defined formulas driven by experiment-specific constants, enabling the computation of composite variables that are crucial for modelling and digital-twin workflows. Beyond its command-line interface (CLI), raw2ready now features a graphical user interface (GUI) through which users can interactively select and rename columns, assign official units for measurements, and configure processing and calculation steps. This lowers the barrier for non-programmers while retaining the flexibility needed for automated, reproducible data harmonization in biomanufacturing.
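The three processing steps described above (header translation, nearest-timestamp alignment, and formula-based derived metrics) can be illustrated with a minimal pandas sketch. This is not raw2ready's actual API, which is not shown on this page: the function names, column names, the 30-second tolerance, and the example formula are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical translation table: vendor column name -> harmonized header.
TRANSLATION = {
    "Time": "timestamp",
    "DateTime": "timestamp",
    "pO2 [%]": "dissolved_oxygen_pct",
    "CO2_out": "offgas_co2_pct",
}


def harmonize(df, translation, time_col="timestamp"):
    """Rename vendor columns and return rows sorted by a parsed timestamp."""
    out = df.rename(columns=translation)
    out[time_col] = pd.to_datetime(out[time_col])
    return out.sort_values(time_col).reset_index(drop=True)


def align_nearest(left, right, time_col="timestamp", tolerance="30s"):
    """Attach to each left row the right row with the nearest timestamp.

    Rows with no match inside the tolerance window get NaN values.
    """
    return pd.merge_asof(
        left,
        right,
        on=time_col,
        direction="nearest",
        tolerance=pd.Timedelta(tolerance),
    )


def add_derived(df, name, formula, constants):
    """Evaluate a formula string over the columns plus scalar constants."""
    # Constants are broadcast as temporary columns so the formula can use them.
    values = df.assign(**constants).eval(formula)
    return df.assign(**{name: values})


# Made-up example data standing in for two vendor exports.
bioreactor_raw = pd.DataFrame({
    "Time": ["2026-02-05 10:00:00", "2026-02-05 10:01:00", "2026-02-05 10:02:00"],
    "pO2 [%]": [80.0, 75.0, 70.0],
})
offgas_raw = pd.DataFrame({
    "DateTime": ["2026-02-05 10:00:05", "2026-02-05 10:01:10", "2026-02-05 10:03:30"],
    "CO2_out": [1.0, 1.5, 2.0],
})

bioreactor = harmonize(bioreactor_raw, TRANSLATION)
offgas = harmonize(offgas_raw, TRANSLATION)
merged = align_nearest(bioreactor, offgas)

# Hypothetical derived metric: CO2 flow from off-gas fraction and total gas flow.
result = add_derived(
    merged,
    name="co2_flow_l_min",
    formula="offgas_co2_pct / 100 * gas_flow_l_min",
    constants={"gas_flow_l_min": 2.0},
)
print(result)
```

In this sketch the third bioreactor reading has no off-gas sample within 30 seconds, so its off-gas value and the derived metric stay empty instead of being filled with a distant measurement. Whether raw2ready applies a tolerance of this kind is an assumption here.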
Programme: Bioindustry 4.0
SEEK ID: https://ibisbahub.eu/projects/105
Public web page: https://gitlab.com/arc3972240/bioindustry/raw2ready
Organisms: No Organisms specified
IBISBA PALs: No PALs for this Project
Project created: 5th Feb 2026
https://orcid.org/0000-0002-5002-7901
