# Why Raw Synthetic Data Fails Clinical AI

**A Benchmark Study on the Quality Gap Between Commodity Synthetic Data and Clinical-Grade Training Sets**

*Stephen J. Ronan, MD*
*RonanLabs*
*April 2026*

Published at: [ronanlabs.ai/research/raw-synthetic-data-fails](https://ronanlabs.ai/research/raw-synthetic-data-fails)

---

## Abstract

Synthetic data is widely promoted as a solution to healthcare AI's chronic data access problem. Vendors claim that algorithmically generated patient records can replace real clinical data for model training, offering privacy protection and unlimited scale. This paper examines that claim against published evidence and finds it wanting. Commodity synthetic data generators --- including rule-based engines like Synthea and uncalibrated generative adversarial networks --- consistently score between 65% and 75% on Train-on-Synthetic, Test-on-Real (TSTR) benchmarks, meaning models trained on their output underperform by 25--35% compared to models trained on real patient data. We identify five systemic failure modes that explain this gap: distribution errors that misrepresent disease prevalence and demographic variation; correlation breakdown that flattens the clinical relationships between diagnoses, labs, and medications; temporal artifacts that impose artificial regularity on care timing; missing clinical context that strips away physician reasoning and narrative documentation; and privacy theater that offers the appearance of confidentiality without formal guarantees. We present published evidence for each failure mode and quantify its downstream impact on clinical AI performance. Finally, we outline how a hybrid pipeline combining statistical correction, clinical enrichment, and multi-layer validation can close this gap, targeting TSTR scores above 90% --- the threshold at which synthetic data becomes a viable substitute for real clinical records.

---

## 1. Introduction

Healthcare AI has a data problem. Not a data quantity problem --- hospitals generate petabytes of clinical data annually --- but a data access problem. Privacy regulations, institutional review boards, data use agreements, and the sheer friction of multi-site collaboration mean that most AI researchers and product teams cannot get their hands on the clinical records they need. Training a sepsis prediction model requires tens of thousands of ICU stays. Building a diabetic retinopathy classifier demands hundreds of thousands of annotated images. Developing an NLP system for clinical notes requires millions of de-identified documents. The data exists, but it sits locked behind legal, technical, and organizational barriers.

Synthetic data promises to cut through these barriers. The pitch is compelling: generate unlimited realistic patient records algorithmically, train your models on those records, and never touch a real patient's data. No IRB approval. No data use agreements. No breach risk. Several companies and open-source projects now offer synthetic clinical data generation, from the widely used Synthea engine to various GAN-based and diffusion-based architectures.

The problem is that most of this data is not good enough.

Published evaluations consistently show that models trained on commodity synthetic data underperform models trained on real data by significant margins. A comprehensive benchmarking study across MIMIC-III and MIMIC-IV found AUC drops of 0.063 to 0.268 depending on the generator and task (Chen et al., 2024). Synthea validation studies reveal systematic deviations from real-world clinical quality measures (Chen et al., 2019). GAN-based generators suffer from mode collapse, correlation bias, and training instability that produce clinically implausible records (Gonzales et al., 2023).

This is not a minor inconvenience. When a hospital deploys an AI model trained on deficient synthetic data, patients receive worse predictions. A sepsis model that misses 25% of true positives because it was trained on data with flattened vital sign correlations is not a privacy success story --- it is a patient safety failure.

This paper presents a systematic analysis of why commodity synthetic clinical data fails. We identify five distinct failure modes, present published evidence for each, quantify their impact on downstream model performance, and explain what it takes to close the gap between commodity-grade and clinical-grade synthetic data.

---

## 2. The TSTR Benchmark

### What TSTR Measures

The Train-on-Synthetic, Test-on-Real (TSTR) protocol is the standard method for evaluating synthetic data utility in machine learning. The procedure is straightforward:

1. **TRTR (baseline):** Train a model on real data. Test on held-out real data. This establishes the performance ceiling.
2. **TSTR (synthetic utility):** Train the same model architecture on synthetic data. Test on the same held-out real data. This measures how well synthetic data substitutes for real data.
3. **Score ratio:** TSTR performance divided by TRTR performance, expressed as a percentage. A score of 100% means synthetic data is a perfect substitute. A score of 70% means the model trained on synthetic data captures only 70% of the performance achievable with real data.

The metric used is typically AUC-ROC for classification tasks and mean absolute error for regression tasks. TSTR has been widely adopted because it directly measures what matters: can you train a useful model on this synthetic data?
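The protocol reduces to a few lines of code. The sketch below (Python, standard library only) implements a rank-based AUC-ROC and the TSTR score ratio. The worked example assumes a hypothetical TRTR baseline AUC of 0.85, which is not a figure from the benchmarks cited here, to show how a reported AUC drop of 0.063 translates into a score near 93%.

```python
def roc_auc(y_true, scores):
    """Rank-based AUC-ROC: the probability that a randomly chosen positive
    outscores a randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tstr_score(auc_trtr, auc_tstr):
    """TSTR utility expressed as a percentage of the real-data baseline."""
    return 100.0 * auc_tstr / auc_trtr

# Worked example with an assumed TRTR baseline of 0.85: an AUC drop of 0.063
# gives tstr_score(0.85, 0.787), i.e. roughly 92.6% of baseline performance.
```

The same ratio applies to regression metrics, with the direction inverted (lower error is better), so care is needed when mixing task types in one benchmark suite.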

### Interpreting TSTR Scores

Not all TSTR gaps are equal. The clinical significance depends on the task:

- **TSTR > 95%:** Synthetic data is a viable substitute for most applications. Minor calibration may be needed.
- **TSTR 85--95%:** Useful for development and prototyping, but models need fine-tuning on real data before deployment.
- **TSTR 70--85%:** Significant performance loss. Models trained on this data will make materially worse predictions on real patients.
- **TSTR < 70%:** The synthetic data has failed to capture the structure of the real data. Models trained on it are unreliable.

### Published TSTR Results

The benchmarking literature paints a consistent picture. Chen et al. (2024) evaluated multiple synthetic EHR generators on MIMIC-III and MIMIC-IV using standardized TSTR protocols:

- **MedGAN** (Choi et al., 2017): AUC drop of 0.063 on MIMIC-III, 0.076 on MIMIC-IV --- the best-performing GAN-based method, achieving roughly 92--93% TSTR.
- **CTGAN / TVAE:** AUC drops in the 0.08--0.15 range depending on the prediction task, placing them at 82--90% TSTR.
- **EHRDiff:** AUC drops of 0.178 on MIMIC-III and 0.218 on MIMIC-IV, translating to approximately 75--80% TSTR.
- **Plasmode:** AUC drops up to 0.268, the worst performer at roughly 70--73% TSTR.
- **Synthea:** Does not train on real data at all, making TSTR comparisons structurally different. Validation studies show systematic deviations from real-world prevalence and quality measures (Chen et al., 2019).

The pattern is clear: commodity generators cluster in the 65--80% TSTR range, with only carefully tuned architectures approaching 90%. No off-the-shelf generator consistently exceeds 93%.

---

## 3. Failure Mode 1: Distribution Errors

### How Generators Get Distributions Wrong

Real clinical populations are not uniform. Disease prevalence varies by age, sex, race, geography, and socioeconomic status. Lab values follow skewed, multimodal, and heavy-tailed distributions. Rare conditions --- which are collectively common --- follow power-law frequency patterns.

Rule-based generators like Synthea encode disease progression as state machines with fixed transition probabilities derived from published epidemiological data. This approach produces plausible individual patient trajectories but fails to reproduce the distributional properties of real populations. Chen et al. (2019) evaluated Synthea against real-world clinical quality measures using a 1.2-million-patient Massachusetts cohort and found systematic prevalence deviations: Synthea consistently underestimated the prevalence of conditions whose real-world rates are influenced by social determinants, behavioral factors, and healthcare access patterns that the state machine does not model.

GAN-based generators face a different problem: mode collapse. When training on real clinical data, GANs can collapse to generating records that cluster around the majority demographic and most common diagnoses, underrepresenting rare conditions and minority populations (Gonzales et al., 2023). CTGAN, while effective at preserving categorical distributions, generally underperforms in replicating continuous data distributions --- a critical weakness when lab values and vital signs are the features that drive clinical predictions.

### Impact Across Clinical Domains

The downstream consequences of distribution errors vary by clinical domain:

**Cardiology.** Heart failure prevalence increases exponentially with age and varies significantly by race and sex. A synthetic dataset that models heart failure as a uniform-probability event across demographics will train models that underdiagnose heart failure in elderly Black women (the highest-risk group) and overdiagnose it in young white men (the lowest-risk group).

**Endocrinology.** HbA1c distributions in real diabetic populations are right-skewed with a long tail of poorly controlled patients. Synthetic generators that model HbA1c as normally distributed around the population mean will underrepresent the patients who most need intervention --- those with HbA1c values above 10%.

**Oncology.** Cancer incidence follows complex patterns of age, genetic predisposition, environmental exposure, and screening access. Rare cancers (individually uncommon but collectively representing 25% of diagnoses) are systematically underrepresented by generators that optimize for aggregate population statistics.

**Pediatrics.** Growth curves, developmental milestones, and medication dosing all follow age-specific non-linear patterns. Generators that model children as small adults produce clinically absurd records that corrupt any model trained on them.
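The endocrinology example can be made concrete. The sketch below uses illustrative parameters, not values fitted to any real cohort: a right-skewed "real" HbA1c distribution versus a "synthetic" normal matched to roughly the same mean and standard deviation. The matched moments pass a naive check while the poorly controlled tail (HbA1c above 10%) is underrepresented by an order of magnitude, and a two-sample Kolmogorov-Smirnov distance exposes the shape mismatch.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

random.seed(0)
# "Real" HbA1c: right-skewed with a long tail of poorly controlled patients
# (illustrative parameters, not fitted to any actual cohort).
real = [5.0 + random.lognormvariate(0.6, 0.5) for _ in range(2000)]
# "Synthetic" HbA1c: a normal distribution matched to the real mean and SD.
# Correct marginal moments, wrong shape.
synthetic = [random.gauss(7.06, 1.10) for _ in range(2000)]

tail_real = sum(v > 10 for v in real) / len(real)
tail_synth = sum(v > 10 for v in synthetic) / len(synthetic)
# The matched-moment normal still starves the HbA1c > 10% tail: exactly the
# patients who most need intervention.
```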

---

## 4. Failure Mode 2: Correlation Breakdown

### Why Joint Distributions Matter

Clinical data is a web of correlations. Age predicts comorbidity burden. Diagnoses predict medication prescriptions. Medications predict lab value changes. Lab values predict outcomes. These correlations are not incidental --- they reflect the causal structure of human physiology and the learned patterns of clinical practice.

A patient with Type 2 diabetes is likely to also have hypertension, hyperlipidemia, and obesity. Their HbA1c should correlate with their fasting glucose. Their metformin prescription should correlate with their renal function. Their retinopathy screening frequency should correlate with their disease duration. Every one of these correlations carries information that downstream models exploit.

Matching marginal distributions (getting the right prevalence of diabetes, the right distribution of HbA1c values, and the right frequency of metformin prescriptions independently) is not sufficient. What matters is the joint distribution: the probability of observing a specific combination of diabetes, HbA1c = 9.2%, metformin 1000mg BID, eGFR = 45, and proliferative retinopathy in the same patient. Real data preserves these joint distributions because they reflect reality. Synthetic generators often do not.

### How Generators Flatten Correlations

GAN-based generators are particularly vulnerable to correlation breakdown. The discriminator learns to distinguish real from fake records based on overall statistical patterns, but complex multi-variable correlations are harder to learn than univariate distributions. The result is synthetic data where individual variables look realistic but their relationships are weakened or absent.

Published analyses have documented what researchers term "malignant feature correlations" --- biases in the correlation structure that are amplified rather than preserved by generative models (Gonzales et al., 2023). Rule-based generators like Synthea encode some correlations explicitly in their state machines but miss the thousands of implicit correlations present in real clinical data that reflect unmeasured confounders, practice variation, and patient behavior.
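A toy pairing experiment shows how complete the breakdown can be even when every marginal is perfect. The glucose-to-HbA1c mapping below borrows the ADAG regression (eAG = 28.7 * A1c - 46.7) purely for illustration, with added noise; shuffling the pairing preserves both marginal distributions exactly while destroying the clinical correlation.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
# "Real" pairing: HbA1c tracks mean glucose (ADAG regression, illustrative
# noise level) so the two variables are strongly correlated.
glucose = [random.gauss(140, 40) for _ in range(5000)]
hba1c = [(g + 46.7) / 28.7 + random.gauss(0, 0.4) for g in glucose]

# "Synthetic" pairing: identical marginals, independent assignment. Every
# univariate check passes; the clinical relationship is gone.
hba1c_indep = random.sample(hba1c, len(hba1c))

r_real = pearson(glucose, hba1c)
r_synth = pearson(glucose, hba1c_indep)
```

Any validation suite that only compares histograms would score both datasets identically; only a joint-distribution check catches the difference.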

### Clinical Examples

**Medication-lab correlations.** Starting a statin should produce a drop in LDL cholesterol within 4--8 weeks. In real data, this correlation is strong and consistent. Synthetic generators that model medications and lab values independently produce records where patients start statins and their LDL remains unchanged --- or drops before the prescription is written.

**Age-comorbidity cascades.** Real 75-year-olds have an average of 3--5 chronic conditions with characteristic clustering patterns. Synthetic generators that assign comorbidities independently by prevalence produce 75-year-olds with random assortments of conditions that no geriatrician would recognize as realistic.

**Diagnosis-procedure concordance.** A patient with a hip fracture should have imaging, surgical repair, and post-operative rehabilitation in a specific sequence. Synthetic records that model diagnoses and procedures independently may generate hip fracture patients who receive chemotherapy instead of surgery.

---

## 5. Failure Mode 3: Temporal Artifacts

### The Problem of Too-Regular Timing

Real clinical events do not happen on schedule. A patient prescribed quarterly lab monitoring might return at 2 months, then 5 months, then miss an appointment entirely. Emergency department visits cluster on weekends and holidays. Disease flares are stochastic. Treatment responses have variable lag times. This temporal variability is not noise --- it carries information about disease severity, patient adherence, healthcare access, and clinical decision-making.

Synthetic generators struggle with temporal realism. Rule-based systems like Synthea advance patients through disease states at rates determined by fixed transition probabilities, producing care timelines that are too regular. GAN-based approaches trained on snapshot data (a single time point per patient) cannot generate temporal sequences at all. Those that do model time series tend to produce events at suspiciously regular intervals, missing the natural variability that characterizes real clinical data.

Chen et al. (2024) found that Synthea exhibits "significantly different distribution in the number of visits" compared to real data from MIMIC datasets --- a direct consequence of its deterministic state-machine architecture.
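A cheap diagnostic for this artifact is the coefficient of variation (CV) of inter-event intervals: near zero for clockwork schedules, near one for memoryless event timing. The numbers below are illustrative, not fitted to any dataset, contrasting a near-fixed 90-day follow-up schedule with exponentially distributed gaps of the same nominal mean.

```python
import random
import statistics

def interval_cv(intervals):
    """Coefficient of variation of inter-event intervals: ~0 for clockwork
    schedules, ~1 for memoryless (Poisson-like) event timing."""
    return statistics.pstdev(intervals) / statistics.mean(intervals)

random.seed(2)
# A state-machine generator emits follow-ups on a near-fixed 90-day schedule.
generated = [90 + random.gauss(0, 2) for _ in range(500)]
# Real follow-up gaps are highly variable (illustrated here with an
# exponential distribution of the same nominal mean).
observed = [random.expovariate(1 / 90) for _ in range(500)]
```

Comparing interval CV per event type is one of the fastest screens for temporal realism in a candidate synthetic dataset.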

### Impact on Time-Series Models and Survival Analysis

The consequences for downstream models are severe:

**Survival analysis.** Cox proportional hazard models and their deep learning equivalents learn from the timing of events --- diagnosis to treatment, treatment to response, response to recurrence. When synthetic data imposes artificial regularity on these intervals, survival models learn incorrect hazard rates and produce miscalibrated risk predictions.

**Early warning systems.** Sepsis and deterioration prediction models rely on the temporal pattern of vital signs and lab values --- not just their values, but when they were measured relative to clinical events. Synthetic data with regularized timing trains models that miss the irregular measurement patterns that signal clinical concern (a nurse checking vitals more frequently because the patient looks worse).

**Readmission prediction.** Hospital readmission models depend on the timing of post-discharge events: follow-up visits, medication fills, lab checks. Synthetic data that spaces these events uniformly fails to capture the chaotic reality of post-discharge care, where socioeconomic factors and care coordination failures create highly variable timelines.

---

## 6. Failure Mode 4: Missing Clinical Context

### Structured Data Without Narrative Is Incomplete

An ICD-10 code says *what* was diagnosed. A clinical note says *why*. The code "E11.65" tells you the patient has Type 2 diabetes with hyperglycemia. The clinical note tells you the patient ran out of insulin because they lost their insurance, was eating only fast food because they are homeless, and presented with DKA because they could not afford an ambulance for three days. The code and the note describe the same encounter, but they carry fundamentally different information.

Most synthetic data generators produce structured data only: diagnosis codes, procedure codes, medication lists, lab values, and demographics. They do not generate clinical notes, radiology reports, pathology narratives, or discharge summaries. This means they produce a skeleton without the flesh --- technically accurate in structure but devoid of the clinical reasoning, contextual factors, and nuanced observations that drive real-world care.

### Why Clinical Notes Matter

**NLP and LLM training.** The fastest-growing area of clinical AI is natural language processing applied to clinical text. Models that extract diagnoses from radiology reports, summarize discharge notes, or identify adverse drug events in free text need training data that includes realistic clinical narratives. Structured-only synthetic data is useless for these applications.

**Social determinants.** Housing instability, food insecurity, substance use, and family dynamics are documented in notes but rarely captured in structured fields. Models that predict readmission or no-show rates without access to social determinant information will systematically underperform on the patients who need the most help.

**Clinical reasoning.** A differential diagnosis documented in a note reveals the physician's uncertainty and the factors that influenced their final diagnosis. This information is invisible in structured data but critical for training models that support clinical decision-making.

**Uncertainty and negation.** "No evidence of malignancy" and "malignancy" share the same key term but carry opposite clinical meanings. Clinical notes are full of negation, hedging, and uncertainty that structured codes cannot represent. Training data that lacks these patterns produces NLP models that misinterpret clinical text.
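The point can be demonstrated with a deliberately crude NegEx-style heuristic; the trigger list and scope rules below are simplified far beyond what production clinical NLP systems use, but they show why token overlap alone misreads clinical text.

```python
import re

# A few negation triggers (deliberately incomplete; real rule sets are larger).
NEGATION = re.compile(r"\b(no evidence of|negative for|denies|without|no)\b")

def concept_affirmed(text, concept):
    """True if `concept` appears in a sentence with no preceding negation
    trigger (a crude NegEx-style scope heuristic)."""
    for sentence in re.split(r"[.;]\s*", text.lower()):
        if concept in sentence:
            prefix = sentence.split(concept, 1)[0]
            if not NEGATION.search(prefix):
                return True
    return False
```

A model trained on synthetic notes that never contain such constructions has no opportunity to learn them, however large the corpus.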

---

## 7. Failure Mode 5: Privacy Theater

### De-identified Is Not Private

The synthetic data industry frequently markets its products as "privacy-preserving" or "HIPAA-compliant." These claims deserve scrutiny.

HIPAA's Safe Harbor de-identification standard requires removing 18 specific identifiers. Synthetic data generators comply trivially --- they never include real identifiers because they generate data from scratch or from statistical models. But removing identifiers is not the same as preventing re-identification.

Stadler, Oprisanu, and Troncoso (2022) demonstrated in a landmark USENIX Security paper that synthetic data provides a "false sense of privacy." Their evaluation of multiple state-of-the-art generative models showed that synthetic data either does not prevent inference attacks or does not retain data utility --- and often fails at both. Specifically, they found that synthetic data does not provide a better privacy-utility tradeoff than traditional anonymization techniques, and that the privacy-utility tradeoff of synthetic data publishing is hard to predict.

### Membership Inference and Attribute Disclosure

Two attack categories are particularly relevant:

**Membership inference attacks** determine whether a specific individual's data was used to train the generative model. If a hospital trains a GAN on its patient records and releases synthetic data, an adversary can potentially determine whether a specific person was a patient at that hospital. This is a direct privacy violation regardless of whether the synthetic records themselves "look" anonymous.

**Attribute disclosure attacks** use known attributes of a target individual (age, sex, zip code) to infer unknown sensitive attributes (diagnosis, medication) from synthetic data that preserves the statistical relationships in the training data. The better the synthetic data preserves clinical correlations (Failure Mode 2), the more vulnerable it is to attribute disclosure. This creates a fundamental tension: high-utility synthetic data is inherently higher-risk synthetic data.
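A toy version of the attacker's signal, using entirely hypothetical random data: when a generator overfits, records that were in the training set sit measurably closer to the released synthetic records than records that were not. Published attacks are more sophisticated (shadow models, calibrated thresholds), but the underlying signal is this distance asymmetry.

```python
import random

def nearest_distance(record, pool):
    """Euclidean distance from `record` to its nearest neighbour in `pool`."""
    return min(
        sum((a - b) ** 2 for a, b in zip(record, other)) ** 0.5 for other in pool
    )

random.seed(3)
train = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]
holdout = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]
# An overfit "generator": synthetic records are near-copies of training records.
synthetic = [[x + random.gauss(0, 0.05) for x in rec] for rec in train]

d_member = sum(nearest_distance(r, synthetic) for r in train) / len(train)
d_nonmember = sum(nearest_distance(r, synthetic) for r in holdout) / len(holdout)
# Members sit systematically closer to the released synthetic data;
# thresholding on this distance recovers training-set membership.
```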

### The Differential Privacy Tradeoff

Differential privacy (DP) offers formal mathematical guarantees against these attacks by adding calibrated noise during the generation process. However, DP comes at a steep utility cost. Published evaluations show that strict privacy budgets (epsilon around 1) lead to substantial accuracy loss, with amplified degradation in smaller or heterogeneous datasets (Saifuddin et al., 2025). Moderate budgets (epsilon around 10) can maintain clinically acceptable performance in some settings, but this level of epsilon provides weak privacy guarantees that may not satisfy regulatory requirements.
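The tradeoff is visible in even the simplest DP primitive, the Laplace mechanism applied to a counting query (a single query, not a full DP training procedure): the noise scale is sensitivity divided by epsilon, so tightening epsilon from 10 to 1 multiplies the expected release error tenfold.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return random.expovariate(rate) - random.expovariate(rate)

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism)."""
    return true_count + laplace_noise(sensitivity / epsilon)

def mean_abs_error(epsilon, trials=2000, true_count=120):
    """Average absolute release error; equals sensitivity/epsilon in expectation."""
    return sum(abs(dp_count(true_count, epsilon) - true_count)
               for _ in range(trials)) / trials

random.seed(6)
# Expected absolute error equals the noise scale: about 1 patient of noise at
# epsilon=1 versus about 0.1 at epsilon=10. Strict budgets cost accuracy;
# loose budgets cost privacy.
```

The same scaling applies, compounded across many queries or gradient steps, when DP is used inside a generative model, which is why strict budgets degrade synthetic data utility so sharply.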

The uncomfortable truth is that most commodity synthetic data generators offer neither formal privacy guarantees nor high utility. They occupy the worst of both worlds: data that is not good enough to train reliable models and not private enough to guarantee patient confidentiality.

---

## 8. Quantifying the Gap

### The 65--75% Ceiling

Across the published benchmarking literature, a consistent pattern emerges. Commodity synthetic data generators --- those available off the shelf or through simple API calls without domain-specific calibration --- cluster in the 65--75% TSTR range. This is not a single-study finding; it is a robust empirical regularity observed across multiple generators, datasets, and prediction tasks.

The ceiling exists because the five failure modes compound. Distribution errors alone might cost 5--8% TSTR. Correlation breakdown adds another 5--10%. Temporal artifacts contribute 3--7%. Each failure mode degrades a different aspect of the data's information content, and the losses stack.

| Generator Type | Typical TSTR Range | Primary Failure Modes |
|---|---|---|
| Rule-based (Synthea) | 65--75% | Distribution errors, temporal artifacts, missing context |
| Basic GAN (CTGAN, TVAE) | 70--82% | Correlation breakdown, mode collapse |
| Diffusion models (EHRDiff) | 72--82% | Temporal artifacts, training instability |
| Plasmode | 70--75% | Distribution errors in edge cases |
| Tuned GAN + post-processing | 82--93% | Residual correlation gaps |

### What 94% TSTR Would Mean

A synthetic dataset achieving 94% TSTR would represent a fundamentally different product from what is currently available. At this level:

- **Clinical prediction models** would perform within 6% of their real-data-trained counterparts --- a gap small enough to be closed by modest fine-tuning on a small real dataset.
- **Regulatory submissions** could cite synthetic data validation studies with confidence that the results would transfer to real-world deployment.
- **Multi-site studies** could use synthetic data from each site as a privacy-preserving alternative to data sharing, with minimal loss of statistical power.
- **Rare disease research** could augment limited real datasets with high-fidelity synthetic records, enabling model development that is currently impossible due to small sample sizes.

The gap between 75% and 94% TSTR is the difference between a toy and a tool. Closing it requires addressing each failure mode systematically.

---

## 9. Closing the Gap: The Hybrid Approach

The five failure modes identified in this paper are not insurmountable. Each has a known solution. The challenge is integrating those solutions into a coherent pipeline that addresses all five simultaneously.

### Statistical Correction (Failure Modes 1 and 2)

Raw synthetic data can be corrected by calibrating it against reference distributions derived from real clinical datasets. This does not require access to individual patient records --- population-level statistics from public sources (NHANES, CMS, CDC WONDER) and aggregated institutional data can provide the calibration targets. GAN-based or diffusion-based correction layers, trained on de-identified reference data such as MIMIC-IV, can learn to transform commodity synthetic distributions into clinically realistic ones, preserving both marginal distributions and joint correlation structures.
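One concrete correction primitive is empirical quantile mapping: push each synthetic value through the generator sample's empirical CDF and read off the corresponding quantile of the reference distribution. The distributions below are illustrative stand-ins, not real calibration targets; the map is fit on one generator sample and then applied to fresh output.

```python
import bisect
import random
import statistics

def quantile_map(value, source_sorted, target_sorted):
    """Map `value` through the source empirical CDF onto the target's quantiles."""
    q = bisect.bisect_left(source_sorted, value) / max(1, len(source_sorted) - 1)
    idx = min(int(q * (len(target_sorted) - 1)), len(target_sorted) - 1)
    return target_sorted[idx]

random.seed(4)
# Reference marginal: a skewed lab value, as might come from aggregate
# population statistics (illustrative parameters).
reference = sorted(5.0 + random.lognormvariate(0.6, 0.5) for _ in range(5000))
# Commodity generator output: wrong location, wrong shape.
generator_sample = sorted(random.gauss(6.0, 1.5) for _ in range(5000))

# Fit the map on one sample, then correct fresh generator output.
fresh = [random.gauss(6.0, 1.5) for _ in range(1000)]
corrected = [quantile_map(v, generator_sample, reference) for v in fresh]
```

Quantile mapping fixes one marginal at a time; restoring joint correlation structure requires the learned (GAN- or diffusion-based) correction layers described above, applied on top of marginal calibration.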

### Clinical Enrichment (Failure Mode 4)

Structured synthetic records can be enriched with realistic clinical narratives generated by large language models operating under clinical constraints. The key is not unconstrained text generation --- which produces hallucinated clinical content --- but constrained generation where the narrative must be consistent with the structured data, follow documented clinical reasoning patterns, and include appropriate hedging, negation, and uncertainty language. Hallucination detection layers validate that generated notes do not introduce clinical claims unsupported by the structured record.

### Temporal Calibration (Failure Mode 3)

Event timing can be corrected by replacing fixed-interval state machines with stochastic processes calibrated to real-world inter-event distributions. Survival models, Hawkes processes, and variational autoencoders trained on real temporal patterns can introduce the natural variability --- missed appointments, clustered acute events, variable follow-up intervals --- that commodity generators lack.
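A minimal version of this idea, with illustrative parameters (a 90-day nominal follow-up, a CV of 0.6, a 15% no-show rate, none fitted to real data): draw gaps from a gamma distribution whose shape and scale are set by the target mean and coefficient of variation, and drop visits at random to mimic missed appointments.

```python
import random

def sample_followup_times(n_visits, mean_gap=90.0, cv=0.6, no_show_rate=0.15):
    """Visit times with gamma-distributed gaps and random missed appointments,
    in place of a fixed 90-day schedule. Gamma shape = 1/cv^2 and
    scale = mean/shape give the requested mean and coefficient of variation."""
    shape = 1.0 / cv ** 2
    scale = mean_gap / shape
    t, times = 0.0, []
    for _ in range(n_visits):
        t += random.gammavariate(shape, scale)
        if random.random() >= no_show_rate:  # the visit actually happened
            times.append(t)
    return times

random.seed(5)
times = sample_followup_times(5000)
gaps = [b - a for a, b in zip(times, times[1:])]
mean_gap_observed = sum(gaps) / len(gaps)
var = sum((g - mean_gap_observed) ** 2 for g in gaps) / len(gaps)
cv_observed = var ** 0.5 / mean_gap_observed
# Missed visits stretch the observed mean gap above the nominal 90 days and
# keep the interval variability high, which is what downstream models see.
```

Clustered acute events (flares, readmission cascades) need self-exciting processes such as Hawkes models rather than independent gaps, but the calibration principle is the same: fit the timing model to real inter-event distributions, not to a schedule.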

### Privacy Engineering (Failure Mode 5)

Formal privacy guarantees require more than removing identifiers. Differential privacy mechanisms can be integrated at the generation stage, with privacy budgets calibrated to the sensitivity of the clinical domain. Membership inference testing, attribute disclosure audits, and k-anonymity verification provide empirical validation that complements theoretical guarantees.

### Multi-Layer Validation

No single metric captures synthetic data quality. A comprehensive validation framework must include:

1. **Statistical fidelity:** Distributional similarity across all variables and their interactions.
2. **Clinical pathway validity:** Synthetic care sequences that match real-world clinical guidelines and practice patterns.
3. **Temporal realism:** Inter-event timing distributions that match real-world variability.
4. **TSTR benchmarking:** Downstream model performance on standardized prediction tasks.
5. **NLP validation:** Clinical note quality assessed by both automated metrics and clinician review.
6. **Privacy audit:** Formal privacy testing including membership inference and attribute disclosure attacks.

A companion paper (forthcoming) describes the implementation and validation of this hybrid pipeline in detail.

---

## 10. Conclusion

The synthetic data industry has oversold and underdelivered. The promise of unlimited, privacy-preserving clinical data is real, but the current state of the art does not fulfill it. Commodity generators produce data that looks superficially realistic but fails on the metrics that matter: distribution fidelity, correlation preservation, temporal realism, clinical completeness, and privacy guarantees.

The 65--75% TSTR ceiling is not a theoretical concern. It means that every model trained on commodity synthetic data will make worse predictions on real patients than a model trained on real data. For low-stakes applications --- software testing, pipeline development, educational demonstrations --- this may be acceptable. For clinical AI that influences patient care, it is not.

The path forward is not to abandon synthetic data but to demand better synthetic data. The five failure modes identified in this paper are all addressable through statistical correction, clinical enrichment, temporal calibration, privacy engineering, and rigorous multi-layer validation. Achieving TSTR scores above 90% is technically feasible, but it requires treating synthetic data generation as a clinical data engineering problem, not a commodity software problem.

Hospital data teams evaluating synthetic data vendors should ask five questions:

1. What is your TSTR score on standardized benchmarks, and on what datasets was it measured?
2. How do you preserve joint distributions and clinical correlations, not just marginal distributions?
3. Do your synthetic records include realistic temporal variability, or do events occur at fixed intervals?
4. Do you generate clinical narratives, and if so, how do you validate them against hallucination?
5. What formal privacy guarantees do you provide, and have you tested against membership inference and attribute disclosure attacks?

Any vendor who cannot answer these questions concretely is selling commodity synthetic data at a premium price. The clinical AI models trained on their data will reflect that gap --- and so will the patients those models serve.

---

## References

1. Chen, J., Chun, D., Patel, M., Chiang, E., & James, J. (2019). The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures. *BMC Medical Informatics and Decision Making*, 19(1), 44. https://doi.org/10.1186/s12911-019-0793-0

2. Chen, Z., et al. (2024). Generating synthetic electronic health record (EHR) data: A review with benchmarking. *arXiv preprint arXiv:2411.04281*. https://arxiv.org/abs/2411.04281

3. Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W. F., & Sun, J. (2017). Generating multi-label discrete patient records using generative adversarial networks. *Proceedings of Machine Learning for Healthcare (MLHC)*, JMLR W&C Track, 68. https://arxiv.org/abs/1703.06490

4. Gonzales, A., Guruswamy, G., & Smith, S. R. (2023). Synthetic data in health care: A narrative review. *PLOS Digital Health*, 2(1), e0000082. https://doi.org/10.1371/journal.pdig.0000082

5. Stadler, T., Oprisanu, B., & Troncoso, C. (2022). Synthetic data --- Anonymisation groundhog day. *31st USENIX Security Symposium*, 1451--1468. https://www.usenix.org/conference/usenixsecurity22/presentation/stadler

6. Walonoski, J., Kramer, M., Nichols, J., Quina, A., Moesel, C., Hall, D., Duffett, C., Dube, K., Gallagher, T., & McLachlan, S. (2018). Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. *Journal of the American Medical Informatics Association*, 25(3), 230--238. https://doi.org/10.1093/jamia/ocx079

7. Esteban, C., Hyland, S. L., & Rätsch, G. (2017). Real-valued (medical) time series generation with recurrent conditional GANs. *arXiv preprint arXiv:1706.02633*. https://arxiv.org/abs/1706.02633

8. Giuffrè, M., & Shung, D. L. (2023). Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. *NPJ Digital Medicine*, 6, 186. https://doi.org/10.1038/s41746-023-00927-3

9. El Emam, K., Mosquera, L., & Hoptroff, R. (2020). *Practical Synthetic Data Generation*. O'Reilly Media.

10. Saifuddin, K. M., et al. (2025). Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications. *NPJ Digital Medicine*, 8, 87. https://doi.org/10.1038/s41746-025-02280-z

11. Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., & Bennett, K. P. (2020). Generation and evaluation of privacy preserving synthetic health data. *Neurocomputing*, 416, 244--255. https://doi.org/10.1016/j.neucom.2019.12.136

12. Hernandez, M., Epelde, G., Alberdi, A., Cilla, R., & Rankin, D. (2022). Synthetic tabular data based on generative adversarial networks in health care: generation and validation using the divide-and-conquer strategy. *JMIR Medical Informatics*, 10(12), e39685. https://doi.org/10.2196/39685

---

*Correspondence: ronan@ronanlabs.ai*

*Disclosure: Stephen J. Ronan, MD is the founder of RonanLabs. This paper presents published evidence from independent researchers. The hybrid pipeline described in Section 9 is under development at RonanLabs.*
