Narrativa® Dataset Voyager

AI Agents for SDTM and ADaM dataset generation

Within the Narrativa Navigator platform, the Dataset Voyager solution ingests raw clinical study data and transforms it into CDISC-compliant SDTM and ADaM datasets — ready for statistical analysis, Define.xml authoring, and regulatory submission to the FDA, PMDA, EMA, and other health authorities.

Talk to our team

SPOTLIGHT

In just a few minutes, use AI Agents to generate SDTM and ADaM datasets that conform to CDISC standards and pass regulatory validation.

AI Agents for SDTM and ADaM Automation

Standardized datasets are the foundation of every modern regulatory submission. The Study Data Tabulation Model (SDTM) organizes raw clinical data into a structured, reviewable format, while the Analysis Data Model (ADaM) prepares those datasets for statistical analysis and reviewer reproducibility. Traditionally, building both layers is a long, code-intensive process involving statistical programmers, biostatisticians, and data managers — with multiple rounds of mapping, derivation, validation, and quality control before a package is ready for the FDA, PMDA, or EMA.

Narrativa® Dataset Voyager streamlines this workflow using AI agents and the Narrativa Knowledge Graph. The platform interprets raw study data, maps it to the correct CDISC SDTM domains, derives ADaM analysis datasets, and produces the supporting submission artifacts — Define.xml, validation reports, and the Study Data Reviewer’s Guide — with minimal manual coding.

Capabilities

Automatically maps raw clinical data to SDTM domains based on the Annotated CRF (aCRF) and the CDISC SDTM Implementation Guide
Generates ADaM datasets directly from validated SDTM, including the subject-level ADSL, the occurrence-structured ADAE, and BDS datasets such as ADLB and ADVS
Applies standard ADaM derivations (baseline flags, change from baseline, treatment-emergent flags, analysis populations) with full traceability
Produces files in the SAS XPORT v5 (.xpt) transport format that the FDA requires for submission
Drafts Define.xml v2.1 metadata and the Study Data Reviewer’s Guide (SDRG) as part of the same workflow
Runs pre-submission validation to flag conformance issues before package delivery

Benefits

No SAS programming experience required to produce a submission-ready dataset package
CDISC SDTM and ADaM compliance built in by default
Drastically reduced validation and rework cycles between programmers, biostatisticians, and medical writers
Full end-to-end traceability from raw data to analysis value
Faster ADaM availability shortens Clinical Study Report (CSR) timelines and accelerates regulatory submission

Learn more: SDTM & ADaM Automation

The SDTM Generation Process

Raw to SDTM mapping. Source data is ingested and mapped into specific SDTM domains — for example, Demographics (DM), Adverse Events (AE), and Vital Signs (VS) — based on the Annotated Case Report Form (aCRF) that links each CRF field to its target SDTM variable.
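As a rough illustration of aCRF-driven mapping, the Python sketch below renames raw CRF fields to SDTM DM variables using a small lookup table. The raw field names (subject_id, sex_code, birth_date), the study identifier, and the helper map_to_dm are hypothetical and chosen for illustration only; they do not describe how Dataset Voyager works internally.

```python
# Hypothetical aCRF excerpt: raw CRF field -> target SDTM DM variable.
ACRF_DM_MAP = {
    "subject_id": "SUBJID",
    "sex_code": "SEX",
    "birth_date": "BRTHDTC",
}


def map_to_dm(raw_record, studyid):
    """Map one raw demographics record to a DM-style row (illustrative only)."""
    row = {"STUDYID": studyid, "DOMAIN": "DM"}
    for raw_field, sdtm_var in ACRF_DM_MAP.items():
        row[sdtm_var] = raw_record.get(raw_field, "")
    # USUBJID must be unique across the submission. Prefixing the subject ID
    # with the study ID is an assumed convention for this sketch.
    row["USUBJID"] = f"{studyid}-{row['SUBJID']}"
    return row


raw = {"subject_id": "1001", "sex_code": "F", "birth_date": "1980-04-12"}
dm_row = map_to_dm(raw, "STUDY01")
print(dm_row)
```

A real mapping also has to handle controlled terminology, ISO 8601 date conversion, and fields that fan out into several domains, none of which this sketch attempts.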

Standard guides. Programmers follow the CDISC SDTM Implementation Guide (SDTMIG) so that variables, controlled terminology, and dataset structures align with the expectations the FDA and PMDA publish in their respective technical conformance guides.

Key SDTM domain classes. SDTM organizes observation data into three general observation classes: Interventions (e.g., Concomitant Medications, Exposure), Events (e.g., Adverse Events, Medical History), and Findings (e.g., Laboratory Tests, ECG, Questionnaires). Special-purpose domains such as DM, CO, and SE, together with the trial design and relationship datasets, complete the model.
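The grouping above can be held in a simple lookup table. The sketch below covers only the domains named in this section, and the classify helper is a hypothetical name used for illustration.

```python
# Domain code -> class, limited to the domains mentioned in this section.
OBSERVATION_CLASS = {
    "CM": "Interventions",    # Concomitant Medications
    "EX": "Interventions",    # Exposure
    "AE": "Events",           # Adverse Events
    "MH": "Events",           # Medical History
    "LB": "Findings",         # Laboratory Tests
    "EG": "Findings",         # ECG
    "QS": "Findings",         # Questionnaires
    "DM": "Special-Purpose",  # Demographics
    "CO": "Special-Purpose",  # Comments
    "SE": "Special-Purpose",  # Subject Elements
}


def classify(domain):
    """Return the class for a two-letter domain code, or 'Unknown'."""
    return OBSERVATION_CLASS.get(domain.upper(), "Unknown")


for code in ("ae", "LB", "EX", "DM", "ZZ"):
    print(code.upper(), "->", classify(code))
```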

The ADaM Generation Process

ADaM takes the standardized SDTM data and organizes it specifically for statistical analysis, allowing reviewers to replicate the results.

Subject-Level Analysis Dataset (ADSL). Every study begins with ADSL, which contains exactly one record per subject and houses baseline characteristics, treatment assignments, and analysis population flags.
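A minimal sketch of the one-record-per-subject rule follows. It assumes a safety population defined as "received at least one non-zero dose", which is a common convention but is ultimately set by each study's statistical analysis plan. The build_adsl helper and the sample rows are hypothetical.

```python
def build_adsl(dm_rows, ex_rows):
    """Build ADSL-style rows, one per subject, from DM and EX rows."""
    # Subjects with at least one non-zero dose (assumed safety definition).
    dosed = {ex["USUBJID"] for ex in ex_rows if ex.get("EXDOSE", 0) > 0}
    adsl = []
    seen = set()
    for dm in dm_rows:
        usubjid = dm["USUBJID"]
        if usubjid in seen:
            # ADSL must never contain two records for the same subject.
            raise ValueError(f"duplicate subject in DM: {usubjid}")
        seen.add(usubjid)
        adsl.append({
            "USUBJID": usubjid,
            "AGE": dm["AGE"],
            "SEX": dm["SEX"],
            "TRT01P": dm["ARM"],  # planned treatment, taken from DM.ARM
            "SAFFL": "Y" if usubjid in dosed else "N",
        })
    return adsl


dm_rows = [
    {"USUBJID": "STUDY01-1001", "AGE": 44, "SEX": "F", "ARM": "Drug A"},
    {"USUBJID": "STUDY01-1002", "AGE": 61, "SEX": "M", "ARM": "Placebo"},
]
ex_rows = [{"USUBJID": "STUDY01-1001", "EXDOSE": 10}]
adsl = build_adsl(dm_rows, ex_rows)
for row in adsl:
    print(row)
```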

Basic Data Structure (BDS). BDS datasets, such as ADLB for laboratory data and ADVS for vital signs, hold parameter and analysis values in a long, one-record-per-observation format that supports flexible statistical reporting.
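To make the BDS layout concrete, the sketch below derives the baseline flag (ABLFL), baseline value (BASE), and change from baseline (CHG) for one subject and one parameter. It assumes baseline is the last non-missing value on or before study day 1; actual baseline rules vary by study. The function name and sample values are hypothetical.

```python
def add_baseline_and_change(records):
    """Add ABLFL, BASE, and CHG to BDS-style rows for one subject and parameter."""
    ordered = sorted(records, key=lambda r: r["ADY"])
    # Assumed rule: baseline is the last non-missing value with ADY <= 1.
    pre_dose = [r for r in ordered if r["ADY"] <= 1 and r["AVAL"] is not None]
    baseline = pre_dose[-1] if pre_dose else None
    out = []
    for r in ordered:
        row = dict(r)
        row["ABLFL"] = "Y" if r is baseline else ""
        row["BASE"] = baseline["AVAL"] if baseline else None
        is_post_baseline = (
            baseline is not None
            and r["ADY"] > baseline["ADY"]
            and r["AVAL"] is not None
        )
        # CHG is only populated for post-baseline records.
        row["CHG"] = r["AVAL"] - baseline["AVAL"] if is_post_baseline else None
        out.append(row)
    return out


records = [
    {"PARAMCD": "SYSBP", "ADY": -7, "AVAL": 120},
    {"PARAMCD": "SYSBP", "ADY": 1, "AVAL": 118},
    {"PARAMCD": "SYSBP", "ADY": 15, "AVAL": 125},
    {"PARAMCD": "SYSBP", "ADY": 29, "AVAL": 121},
]
advs = add_baseline_and_change(records)
for row in advs:
    print(row["ADY"], row["AVAL"], row["ABLFL"], row["BASE"], row["CHG"])
```

With these sample values, the day 1 record is flagged as baseline and the two later records carry changes of 7 and 3.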

Occurrence Data Structure (OCCDS). Datasets like ADAE (Adverse Events) follow the OCCDS structure, optimized for incidence and frequency analyses.
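The sketch below shows the kind of incidence count OCCDS is designed for. It uses a deliberately simplified treatment-emergent rule (the event starts on or after the treatment start date); real definitions often add a post-treatment window and are specified in the statistical analysis plan. The helper names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def flag_treatment_emergent(adae_rows):
    """Set TRTEMFL to 'Y' when the AE starts on or after treatment start."""
    for row in adae_rows:
        row["TRTEMFL"] = "Y" if row["ASTDT"] >= row["TRTSDT"] else ""
    return adae_rows


def subjects_per_term(adae_rows):
    """Count distinct subjects with a treatment-emergent AE per preferred term."""
    subjects = defaultdict(set)
    for row in adae_rows:
        if row["TRTEMFL"] == "Y":
            subjects[row["AEDECOD"]].add(row["USUBJID"])
    return {term: len(ids) for term, ids in subjects.items()}


adae = flag_treatment_emergent([
    {"USUBJID": "S1", "AEDECOD": "Headache",
     "ASTDT": date(2024, 1, 10), "TRTSDT": date(2024, 1, 5)},
    {"USUBJID": "S1", "AEDECOD": "Headache",
     "ASTDT": date(2024, 1, 20), "TRTSDT": date(2024, 1, 5)},
    {"USUBJID": "S2", "AEDECOD": "Headache",
     "ASTDT": date(2024, 1, 1), "TRTSDT": date(2024, 1, 3)},
    {"USUBJID": "S2", "AEDECOD": "Nausea",
     "ASTDT": date(2024, 1, 4), "TRTSDT": date(2024, 1, 3)},
])
incidence = subjects_per_term(adae)
print(incidence)
```

Subject S1 has two headache records but is counted once, which is the distinction between event frequency and subject incidence.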

Implementation standard. Programmers rely on the CDISC ADaM Implementation Guide to ensure derivations — change from baseline, treatment-emergent flags, study day calculations, analysis populations — are traceable, consistent, and reproducible by regulatory reviewers.
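Study day is a good example of a derivation that must be reproducible. Under the CDISC convention there is no day 0: the reference date is day 1 and the day before it is day -1. The sketch below implements that rule; the function name study_day is a hypothetical choice.

```python
from datetime import date


def study_day(event_date, reference_date):
    """Return the study day of event_date relative to reference_date (no day 0)."""
    delta = (event_date - reference_date).days
    # On or after the reference date: add 1 so the reference date is day 1.
    # Before the reference date: use the negative difference as-is.
    return delta + 1 if delta >= 0 else delta


first_dose = date(2024, 1, 5)
print(study_day(date(2024, 1, 5), first_dose))   # reference date itself
print(study_day(date(2024, 1, 4), first_dose))   # day before first dose
print(study_day(date(2024, 1, 15), first_dose))  # ten days after first dose
```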

Learn more: TLF Automation

What is the difference between SDTM and ADaM?

SDTM (Study Data Tabulation Model) organizes raw clinical data into a standardized, observation-level structure for regulatory submission. ADaM (Analysis Data Model) takes those standardized SDTM datasets and restructures them specifically for statistical analysis, adding derived variables, analysis flags, and traceability so that reviewers can reproduce every result in the Clinical Study Report.

Is SAS still mandatory for FDA submissions?

No. The FDA mandates the SAS XPORT v5 (.xpt) transport format for tabulation and analysis datasets, but it does not require sponsors to use SAS software to produce them. Datasets generated in R, Python, or any other language are acceptable as long as the final files conform to the required XPORT v5 format and pass CDISC validation. The FDA is also piloting Dataset-JSON as a modern alternative to XPORT v5.
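Whatever language produces the data, the XPORT v5 format imposes size limits that are worth checking before export: names of at most 8 characters, labels of at most 40 characters, and character values of at most 200 bytes. The sketch below checks only those three limits; it does not write an .xpt file, and the function name and sample data are hypothetical.

```python
MAX_NAME_LEN = 8      # dataset and variable names
MAX_LABEL_LEN = 40    # variable labels
MAX_CHAR_BYTES = 200  # character variable values


def check_xpt_v5_limits(dataset_name, labels, rows):
    """Return a list of XPORT v5 size-limit issues.

    labels maps variable name -> label; rows is a list of dicts.
    """
    issues = []
    if len(dataset_name) > MAX_NAME_LEN:
        issues.append(f"dataset name too long: {dataset_name}")
    for name, label in labels.items():
        if len(name) > MAX_NAME_LEN:
            issues.append(f"variable name too long: {name}")
        if len(label) > MAX_LABEL_LEN:
            issues.append(f"label too long for {name}")
    for index, row in enumerate(rows, start=1):
        for name, value in row.items():
            if isinstance(value, str) and len(value.encode("utf-8")) > MAX_CHAR_BYTES:
                issues.append(
                    f"row {index}: value of {name} exceeds {MAX_CHAR_BYTES} bytes"
                )
    return issues


labels = {
    "USUBJID": "Unique Subject Identifier",
    "AEDECODTXT": "Dictionary-Derived Term",  # 10 characters: too long
}
rows = [{"USUBJID": "STUDY01-1001", "AEDECODTXT": "Headache"}]
issues = check_xpt_v5_limits("AE", labels, rows)
print(issues)
```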

How does the R package fit in?

The package is open source and was developed under the pharmaverse initiative by Roche, GSK, and other sponsors. It provides modular, reusable functions for deriving ADaM datasets directly from SDTM, covering common derivations such as study day, treatment-emergent flags, and analysis populations. It is increasingly adopted as a vendor-neutral alternative, or complement, to traditional SAS macro libraries.

What does Pinnacle 21 validate?

Pinnacle 21 checks SDTM, ADaM, SEND, and Define.xml files against the published CDISC conformance rules and against the FDA and PMDA Study Data Technical Conformance Guides. It flags issues such as missing required variables, invalid controlled terminology, structural inconsistencies, and metadata mismatches between Define.xml and the datasets themselves.
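To give a feel for what such checks look like, the sketch below tests a DM dataset for two of the issue types listed above: missing required values and invalid controlled terminology. It is not Pinnacle 21's rule engine. The rule set, the abbreviated variable list, and the partial SEX codelist are assumptions made for illustration.

```python
# Illustrative subsets only, not the full SDTMIG or controlled terminology.
REQUIRED_DM_VARS = ["STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SEX"]
SEX_TERMS = {"M", "F", "U"}


def check_dm(rows):
    """Return (row number, variable, message) tuples for simple DM issues."""
    findings = []
    for index, row in enumerate(rows, start=1):
        for var in REQUIRED_DM_VARS:
            if not row.get(var):
                findings.append((index, var, "missing required value"))
        sex = row.get("SEX")
        if sex and sex not in SEX_TERMS:
            findings.append((index, "SEX", f"value not in codelist: {sex}"))
    return findings


dm = [
    {"STUDYID": "STUDY01", "DOMAIN": "DM", "USUBJID": "STUDY01-1001",
     "SUBJID": "1001", "SEX": "F"},
    {"STUDYID": "STUDY01", "DOMAIN": "DM", "USUBJID": "STUDY01-1002",
     "SUBJID": "", "SEX": "FEMALE"},
]
findings = check_dm(dm)
for finding in findings:
    print(finding)
```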

Can AI agents generate Define.xml and the SDRG automatically?

Yes. Narrativa® Dataset Voyager uses AI agents and the Narrativa Knowledge Graph to draft Define.xml v2.1 metadata and a first-pass Study Data Reviewer’s Guide as part of the same workflow that produces the SDTM and ADaM datasets, dramatically reducing the manual effort traditionally required to assemble a complete submission package.