The Data Science and Data Management Unit (DSDMU) at REFOTDE ensures the high-quality, secure, and ethical management and use of data across all our research activities. We support the full data lifecycle, from data collection in the field to secure digital storage and analysis, while strictly adhering to international research standards.

Our Mission

To ensure the integrity, confidentiality, and accuracy of research data through advanced systems, ethical practices, and seamless collaboration. We also analyze the data to derive insights that inform a wide range of stakeholders, including policy-makers, partners, community members, the general public, and the scientific community. When other units have repetitive tasks whose automation requires advanced technical skills, the DSDMU can assist by writing small programs.

What We Do

  • Design and manage data in Electronic Data Capture (EDC) systems (REDCap, Medrio, etc.) with the help of our German collaborators.
  • Design and maintain local datasets for monitoring purposes, using key performance indicators.
  • Ensure secure data collection, entry, validation, and storage.
  • Maintain strict compliance with GCP (Good Clinical Practice) and GCDMP (Good Clinical Data Management Practices) standards.
  • Provide training to build data capacity across teams.
  • Coordinate with field and lab teams for integrated workflows.
  • Build and maintain dashboards to monitor work progress.
  • Apply statistical models and machine learning algorithms of varying complexity to regression and classification problems.
  • Prepare outputs (key metrics, tables, and visualizations) and contribute to the writing of periodic reports, scientific articles, posters, and other dissemination materials.

Our Workflow

1. Case Report Forms (CRFs):
Trained field and clinical staff first record data from source documents in standardized CRFs.
2. Electronic Data Capture (EDC) tools:
CRFs are reviewed and entered into EDC tools (mostly REDCap or Medrio) using laptops/tablets via secure internet connections.
3. Double Data Entry & Validation:
Two independent entries are compared automatically by the system to detect discrepancies, which are then resolved.
4. Quality Checks & Cleaning:
Routine data validation, discrepancy resolution, and query generation ensure clean datasets for analysis.
5. Secure Storage:
– CRFs are stored in locked, access-controlled cabinets for prolonged periods.
– REDCap & Medrio are password-protected, with user-specific access roles.
– Automated backups and encrypted servers ensure data protection.
6. Data Exploitation:
Authorized personnel access the data and work iteratively through data wrangling, Exploratory Data Analysis (EDA), chart preparation, dashboard preparation where needed, and advanced analysis.
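
The double data entry comparison described in step 3 can be illustrated with a minimal Python sketch. EDC tools such as REDCap perform this comparison internally, so the code below is only an illustration of the idea, not their implementation. The function name, record IDs, and field names are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def compare_entries(
    first: dict[str, Record], second: dict[str, Record]
) -> list[tuple[str, str, Any, Any]]:
    """Return (record_id, field, first_value, second_value) for every mismatch.

    A record or field present in only one entry is reported with None
    as the value on the side where it is missing.
    """
    discrepancies = []
    for record_id in sorted(first.keys() | second.keys()):
        rec_a = first.get(record_id, {})
        rec_b = second.get(record_id, {})
        for field in sorted(rec_a.keys() | rec_b.keys()):
            value_a = rec_a.get(field)
            value_b = rec_b.get(field)
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


# Hypothetical data: the second clerk transposed the digits of an age.
entry_1 = {"P001": {"age": 34, "sex": "F"}, "P002": {"age": 51, "sex": "M"}}
entry_2 = {"P001": {"age": 34, "sex": "F"}, "P002": {"age": 15, "sex": "M"}}

discrepancies = compare_entries(entry_1, entry_2)
for record_id, field, a, b in discrepancies:
    print(f"{record_id}.{field}: first entry={a!r}, second entry={b!r}")
```

Each reported mismatch would then be checked against the paper CRF and resolved, as described in steps 3 and 4.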

Compliance & Ethics

  • Follows Good Clinical Practice (GCP) and Good Clinical Data Management Practices (GCDMP).
  • Full audit trails ensure data traceability.
  • Participant confidentiality is preserved via anonymization and informed consent.
  • Aligned with local and international data protection laws.
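
As an illustration of the confidentiality point above, one common building block is to replace participant identifiers with keyed hashes and drop direct identifiers before analysis. This is pseudonymization, which is weaker than full anonymization, and it is shown here only as a sketch under assumptions, not as the unit's actual procedure. The function names, field names, and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(participant_id: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def strip_identifiers(
    record: dict,
    secret_key: bytes,
    id_field: str = "participant_id",
    direct_identifiers: tuple = ("name", "phone"),
) -> dict:
    """Drop direct identifiers and replace the ID with its pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned[id_field] = pseudonymize(str(record[id_field]), secret_key)
    return cleaned


# Placeholder key for illustration only; a real key must be kept secret
# and stored outside the analysis code.
SECRET_KEY = b"replace-with-a-secret-key"

record = {"participant_id": "P001", "name": "Jane Doe", "phone": "000", "age": 34}
cleaned = strip_identifiers(record, SECRET_KEY)
print(sorted(cleaned))  # remaining field names
```

Because the hash is keyed, the same participant always maps to the same pseudonym, so records can still be linked across datasets, while someone without the key cannot recompute the mapping from a list of known IDs.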

Training & Capacity Building

  • Regular workshops on EDC systems and data ethics.
  • Onboarding sessions for new staff.
  • In-house certifications on REDCap, Medrio, and data integrity.
  • Cross-team trainings to strengthen collaboration.
  • Training in statistical analysis.

System Security

  • Role-based access control and strong password policies.
  • Two-factor authentication (where applicable).
  • Daily backups to secure, encrypted servers.
  • Comprehensive disaster recovery protocols.
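
The idea behind role-based access control can be sketched in a few lines of Python. In practice the EDC systems enforce user roles themselves, so this is only a conceptual illustration. The role names and permissions below are hypothetical and do not describe the unit's actual configuration.

```python
# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "data_clerk": {"enter_data", "resolve_query"},
    "quality_monitor": {"view_data", "raise_query"},
    "data_analyst": {"view_data", "export_data"},
    "data_manager": {
        "view_data",
        "enter_data",
        "raise_query",
        "resolve_query",
        "export_data",
        "manage_users",
    },
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if the role explicitly grants the action.

    Unknown roles get an empty permission set, so access is denied by default.
    """
    return action in ROLE_PERMISSIONS.get(role, frozenset())


print(is_allowed("data_clerk", "enter_data"))   # True
print(is_allowed("data_clerk", "export_data"))  # False
print(is_allowed("visitor", "view_data"))       # False
```

Denying by default means that a new or misspelled role gains no access until someone grants it permissions explicitly.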

Team Structure

Our skilled and multidisciplinary team includes:

  • Principal Investigator (PI) – Oversees the unit's activities.
  • Data Manager – Oversees system operations and compliance.
  • Data Clerks – Handle data entry, review, and discrepancy resolution.
  • Database Developer – Designs custom REDCap/Medrio environments.
  • Quality Monitors – Conduct internal audits and data reviews.
  • Field Coordinators – Ensure proper data flow from collection to entry.
  • Data Analyst/Scientist – Ensures the most is extracted from collected data, in close collaboration with the PI and collaborators.

Software

  • EDC systems: REDCap, Medrio, etc.
  • Spreadsheets: Microsoft Excel, Google Sheets.
  • Statistical analysis: IBM SPSS Statistics, GraphPad Prism, R, Python.
  • Integrated Development Environments: RStudio, Positron, JupyterLab/Jupyter Notebook.
  • Document Processing and Publishing Tools: Microsoft Word, Google Docs, R Markdown, Quarto, LaTeX, R Shiny.
  • Version Control: Git and GitHub.

Major Projects

  • TAKeOFF 1 (completed)
  • MAPTB (completed)
  • FiMMIP (completed)
  • ESRiFAL (ongoing)
  • TAKEOFF2 (upcoming)
  • eWHORM (upcoming)

Integration Across Teams

The DSDMU works hand-in-hand with field and laboratory teams:

  • Real-time feedback and corrections.
  • Continuous monitoring and improvement of CRF accuracy.
  • Seamless lab result integration into the database.
  • Support for real-time data analysis and reporting.

Why It Matters

  • Accurate, secure, and ethically managed data is the foundation of impactful research. At REFOTDE, our Data Science and Data Management Unit ensures that every dataset is audit-ready, scientifically valid, and policy-relevant, and that it is analyzed and presented in the form best suited to its audience.