# On-Premise Clinical AI Without Data Exposure

### A Technical Architecture for Hospital-Sovereign Artificial Intelligence

**Stephen J. Ronan, MD**
RonanLabs | ronanlabs.ai

April 2026

---

## Abstract

Hospitals want clinical AI. They cannot afford the data exposure that cloud-based AI services require. This paper presents an architecture that eliminates the conflict: hospitals provide de-identified clinical data under HIPAA Safe Harbor or Expert Determination standards, RonanLabs trains custom models on isolated GPU infrastructure, and the resulting model weights are deployed on the hospital's own hardware. No live EHR access is required. No API keys are exchanged. No VPN tunnels connect to the hospital network. The trained model weights are mathematical representations of learned clinical patterns -- they are not Protected Health Information and cannot be reverse-engineered to reconstruct individual patient records. This architecture delivers institution-specific clinical AI -- including custom models ranging from 7 billion to 400 billion+ parameters and a synthetic data generator calibrated to the hospital's patient population -- without creating a single new attack surface on the hospital's network. We describe the end-to-end data flow, the security threat model, the technical basis for why model weights do not constitute PHI, and deployment configurations for on-premise, private cloud, and air-gapped environments. The result is a hospital AI capability with zero ongoing cloud dependency and zero per-query fees.

---

## 1. Introduction

Every hospital CIO in the United States faces the same paradox. Clinical AI has demonstrated measurable improvements in diagnostic accuracy, documentation efficiency, and decision support. McKinsey estimates that generative AI could deliver $200 billion to $360 billion in annual value for the U.S. healthcare system [1]. Yet adoption remains stalled. A 2024 survey by the American Hospital Association found that 56% of hospital executives cited data privacy and security as the primary barrier to AI adoption, ahead of cost, workflow integration, and clinician trust [2].

The reason is straightforward. The dominant model for clinical AI requires hospitals to transmit patient data -- or grant API access to systems containing patient data -- to external cloud services. Every major cloud AI vendor (OpenAI, Google, Microsoft, Amazon) requires some form of data transmission to deliver inference. Even vendors that offer HIPAA-compliant Business Associate Agreements still require the hospital to accept residual risks: data in transit across public networks, data at rest on shared infrastructure, and vendor personnel with theoretical access to decryption keys.

Hospital security teams are not wrong to resist this. Between 2020 and 2025, the U.S. Department of Health and Human Services Office for Civil Rights reported over 3,000 healthcare data breaches affecting 500 or more individuals, exposing more than 385 million patient records [3]. The average cost of a healthcare data breach reached $10.93 million in 2023, the highest of any industry for the thirteenth consecutive year [4]. No CISO wants to be the one who approved the AI vendor that became the next headline.

The on-premise alternative has historically been impractical. Training clinically useful AI models required GPU clusters costing millions of dollars, teams of machine learning engineers, and months of iteration. A community hospital with a $2 million IT capital budget could not justify a $5 million GPU cluster for an experimental AI project.

That constraint has changed. NVIDIA's DGX Spark -- a desktop-class system with 128 GB of unified memory capable of running models up to 200 billion parameters -- costs under $10,000. Parameter-efficient fine-tuning methods like LoRA (Low-Rank Adaptation) can produce institution-specific clinical models in days rather than months, using a fraction of the compute previously required [5]. And advances in de-identification tooling have made Safe Harbor-compliant data extraction a repeatable, auditable process.

This paper describes an architecture that exploits these changes: a service model where hospitals provide de-identified data, receive custom-trained models, and deploy them on their own hardware. The hospital's data never touches a cloud service. The hospital's network is never connected to an external system. The hospital gets sovereign AI -- and the data stays home.

---

## 2. The Data Sovereignty Problem

### 2.1 HIPAA and the Cloud Conundrum

The Health Insurance Portability and Accountability Act (HIPAA) does not prohibit sharing patient data with vendors. It requires a Business Associate Agreement (BAA) that obligates the vendor to safeguard Protected Health Information (PHI) according to the same standards that apply to the covered entity [6]. In theory, a hospital can send PHI to any vendor willing to sign a BAA and implement the required administrative, physical, and technical safeguards.

In practice, BAAs transfer legal liability without eliminating risk. When a cloud AI vendor suffers a breach, the hospital faces regulatory investigation, patient notification obligations, potential OCR enforcement actions, and reputational damage -- regardless of whether the vendor's BAA was technically compliant. The 2023 breach of MOVEit, a file transfer tool used by healthcare organizations, exposed data from over 60 million individuals across multiple sectors, including hospital systems that had signed BAAs with the affected vendors [7].

### 2.2 Why Hospitals Resist

Hospital resistance to cloud AI is not irrational conservatism. It reflects a rational assessment of asymmetric risk. The hospital bears the full downside of a breach (regulatory penalties up to $2.1 million per violation category per year [8], class action lawsuits, patient trust erosion) while the AI vendor bears limited contractual liability typically capped at the contract value.

Additionally, hospitals face compounding compliance obligations:

- **Vendor risk assessments** under HIPAA Security Rule Section 164.308(a)(1) require documented analysis of every vendor with PHI access
- **State privacy laws** (California CCPA/CPRA, Washington My Health My Data Act, and others) impose additional requirements beyond HIPAA
- **CMS Conditions of Participation** require hospitals to protect patient information as a condition of Medicare reimbursement
- **Joint Commission standards** include information management requirements that encompass AI systems processing clinical data

Each new AI vendor multiplies this compliance burden. A hospital using three cloud AI services must maintain three BAAs, conduct three annual vendor risk assessments, monitor three external attack surfaces, and manage three sets of access credentials. The marginal compliance cost of adding an AI vendor exceeds the marginal cost of adding a traditional SaaS tool because PHI is involved at every layer.

### 2.3 The Breach Landscape

The scale of healthcare data breaches has accelerated. Notable incidents illustrate the systemic risk:

- **Change Healthcare (2024):** A ransomware attack on UnitedHealth Group's Change Healthcare unit exposed data from an estimated 100 million individuals, the largest healthcare breach in U.S. history. The attack disrupted claims processing for thousands of healthcare providers nationwide [9].
- **HCA Healthcare (2023):** An unauthorized party accessed an external storage location containing 11 million patient records including names, dates of birth, and appointment information [10].
- **Cerebral (2023):** The telehealth company disclosed that it had been sharing PHI with third-party advertising platforms (Google, Meta, TikTok) via tracking pixels for three years, affecting 3.1 million users [11].

These incidents share a common vector: patient data was stored or transmitted outside the hospital's direct physical control. The architecture described in this paper eliminates that vector entirely.

---

## 3. Architecture Overview

### 3.1 End-to-End Data Flow

```
+------------------+     +-------------------+     +------------------+
|                  |     |                   |     |                  |
|    HOSPITAL      |     |    RONANLABS      |     |    HOSPITAL      |
|    (Source)      |     |    (Training)     |     |    (Deployment)  |
|                  |     |                   |     |                  |
|  +-----------+   |     |  +-------------+ |     |  +------------+  |
|  |    EHR    |   |     |  |  DGX Spark  | |     |  | On-Premise |  |
|  |  System   |   |     |  |  Cluster    | |     |  | GPU Server |  |
|  +-----+-----+   |     |  |  (3 units)  | |     |  +------+-----+  |
|        |         |     |  +------+------+ |     |         |        |
|        v         |     |         |        |     |         v        |
|  +-----------+   |     |  +------+------+ |     |  +------------+  |
|  | De-Ident  |   |     |  | Fine-Tuning | |     |  |  Clinical  |  |
|  |  Engine   |   |     |  | Pipeline    | |     |  |  AI Model  |  |
|  +-----+-----+   |     |  +------+------+ |     |  +------+-----+  |
|        |         |     |         |        |     |         |        |
|        v         |     |  +------+------+ |     |         v        |
|  +-----------+   |     |  | 6-Layer     | |     |  +------------+  |
|  | De-Ident  |   |     |  | Validation  | |     |  | Inference  |  |
|  | Extract   |   |     |  +------+------+ |     |  | API        |  |
|  +-----+-----+   |     |         |        |     |  +------------+  |
|        |         |     |  +------+------+ |     |                  |
|        |         |     |  | Synthetic   | |     |                  |
|        |  SFTP/  |     |  | Data Gen    | |     |                  |
|        +-------->|     |  +-------------+ |     |                  |
|     Encrypted    |     |                  |     |                  |
|     S3 / Local   |     |    Deliverables: |     |                  |
|     Script       |     |    - Model weights ---->                  |
|                  |     |    - Synth gen    ----->                  |
|                  |     |    - Validation   ----->                  |
|                  |     |      report       |     |                  |
+------------------+     +-------------------+     +------------------+

  PHASE 1: Extract        PHASE 2: Train          PHASE 3: Deploy
  (Hospital controls)     (RonanLabs, isolated)   (Hospital controls)
```

### 3.2 Phase 1: De-Identification and Extraction

The hospital's IT team performs all de-identification. RonanLabs never receives identified data. De-identification follows one of two HIPAA-approved methods:

**Safe Harbor Method (45 CFR 164.514(b)(2)):** Removal or generalization of 18 specific identifier categories: names; geographic subdivisions smaller than a state; all elements of dates (except year) directly related to an individual, plus all ages over 89; telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers; device identifiers and serial numbers; web URLs; IP addresses; biometric identifiers; full-face photographs and comparable images; and any other unique identifying number, characteristic, or code [12].

**Expert Determination Method (45 CFR 164.514(b)(1)):** A qualified statistical expert certifies that the risk of identifying any individual from the dataset is very small, applying accepted statistical and scientific principles. This method permits retention of more clinical detail (e.g., precise dates, geographic data at sub-state levels) when the expert determines that re-identification risk is acceptably low [12].

RonanLabs provides hospitals with a de-identification specification document tailored to the intended clinical use case. For clinical decision support models, the specification typically requires diagnosis codes, procedure codes, lab values, medication lists, vital signs, and clinical notes with entities removed -- but does not require dates, locations, or any direct identifiers.
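
The mechanical shape of a Safe Harbor transformation can be sketched in a few lines. This is an illustrative toy, not RonanLabs' de-identification engine: the regex patterns and the `redact` helper are assumptions for demonstration, and production pipelines combine rules like these with NLP-based entity recognition to catch names and other free-text identifiers that regexes cannot.

```python
import re

# Illustrative patterns for a few of the 18 Safe Harbor categories.
# A real de-identification engine pairs rules like these with NLP
# entity recognition; regexes alone miss names and free text.
PATTERNS = {
    "MRN":   re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact(note: str) -> str:
    """Replace each matched identifier with its category placeholder."""
    for label, pattern in PATTERNS.items():
        note = pattern.sub(f"[{label}]", note)
    return note

print(redact("Pt seen 03/14/2024, MRN: 00123456, call (555) 867-5309."))
# -> Pt seen [DATE], [MRN], call [PHONE].
```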

### 3.3 Phase 2: Secure Transfer

De-identified extracts are transferred via one of three mechanisms, selected by the hospital's IT security team:

1. **SFTP with RSA-4096 key exchange:** Hospital pushes de-identified files to a dedicated SFTP endpoint. Files are encrypted in transit (SSH tunnel) and at rest (AES-256 on the receiving volume). The SFTP server is isolated from all other RonanLabs systems.

2. **Encrypted S3 bucket with customer-managed keys:** For hospitals that prefer object storage, a dedicated S3 bucket with server-side encryption (SSE-KMS) and a customer-managed AWS KMS key ensures that RonanLabs cannot access the data without the hospital's explicit key grant. Bucket policies restrict access to a single IAM role used exclusively for data ingestion.

3. **On-premise script execution:** For maximum control, RonanLabs provides a containerized extraction script that runs inside the hospital's environment. The script processes EHR exports, performs de-identification validation, and produces the training-ready dataset without any data leaving the hospital network until the hospital IT team manually transfers the output.

### 3.4 Phase 3: Training

Training occurs on RonanLabs' DGX Spark cluster -- three NVIDIA DGX Spark units, each equipped with NVIDIA Grace Blackwell architecture providing 128 GB of unified CPU+GPU memory. The cluster operates on an isolated network segment with no inbound internet access. Training workloads use parameter-efficient fine-tuning:

- **LoRA (Low-Rank Adaptation):** Trains small adapter matrices (typically 0.1-1% of total model parameters) that modify the behavior of a pre-trained foundation model. This approach requires orders of magnitude less compute than full fine-tuning while achieving comparable task performance [5].
- **QLoRA:** Combines LoRA with 4-bit quantization of the base model, enabling fine-tuning of 70B+ parameter models on a single DGX Spark unit [13].
- **Foundation models:** Base models are selected from open-weight families (Llama, Qwen, Mistral) that permit commercial use and do not require data to be shared with the model provider.
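
The LoRA mechanism is compact enough to show directly. The numpy sketch below (layer dimensions and hyperparameters are assumed values) illustrates the two properties the bullets above rely on: the adapter adds well under 1% of the layer's parameters, and with the standard zero initialization of the up-projection the adapted layer starts out identical to the frozen base model. Production training uses a framework such as Hugging Face PEFT rather than hand-rolled matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 4096, 16                          # layer width; LoRA rank
alpha = 32                               # LoRA scaling hyperparameter
W = rng.standard_normal((d, d))          # frozen pre-trained weight matrix
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Frozen path plus the low-rank update: x (W + (alpha/r) B A)^T."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# Only A and B train: 2*d*r parameters against d*d frozen ones.
fraction = (A.size + B.size) / W.size
print(f"trainable fraction of this layer: {fraction:.4%}")   # ~0.78%
```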

### 3.5 Phase 4: Validation and Delivery

Every model undergoes a 6-layer validation pipeline before delivery:

1. **Statistical fidelity:** Distribution comparison between synthetic outputs and the original de-identified training data across key clinical variables
2. **Clinical accuracy:** Domain-specific evaluation against published clinical benchmarks and guidelines
3. **Privacy audit:** Membership inference attack testing to verify that the model cannot be used to determine whether a specific record was in the training set
4. **Bias assessment:** Performance evaluation across demographic subgroups to identify disparities
5. **Adversarial robustness:** Red-team testing for prompt injection, jailbreaking, and hallucination under clinical scenarios
6. **Deployment readiness:** Integration testing with standard inference frameworks (vLLM, Ollama, NVIDIA NIM)

Deliverables shipped to the hospital:

- Model weight files (safetensors format, checksummed)
- LoRA adapter weights (if applicable)
- Synthetic data generator (containerized)
- Validation report (PDF, 50-100 pages)
- Deployment guide and inference server configuration
- Model card documenting training data characteristics, intended use, limitations, and ethical considerations

---

## 4. Why Model Weights Are Not PHI

This section addresses the most critical regulatory question in the architecture: do the trained model weights constitute Protected Health Information?

### 4.1 How Neural Networks Store Knowledge

A neural network trained on clinical data does not store patient records. It stores learned statistical relationships between inputs and outputs, encoded as millions or billions of floating-point numbers (weights) organized in matrix form. During training, the network adjusts these weights to minimize prediction error across the entire training dataset. The result is a compressed, lossy representation of patterns observed across the population -- not a retrievable database of individual records.

Consider a 7-billion-parameter model fine-tuned on 100,000 de-identified clinical notes. The model weights occupy approximately 14 GB in 16-bit floating-point format; the notes, at an average of 2,000 words each, total approximately 1.2 GB of text. The model does not store the notes as retrievable records: training extracts distributional patterns (e.g., "patients presenting with chest pain and elevated troponin are frequently diagnosed with acute coronary syndrome") and encodes them as small weight adjustments spread across billions of parameters. With parameter-efficient fine-tuning the point is even sharper: only the LoRA adapter matrices, typically tens of megabytes, are updated at all, far too little capacity to retain 1.2 GB of source text verbatim. Individual patient records are not recoverable from this representation, a property verified empirically as described in Section 4.3.

### 4.2 Differential Privacy Guarantees

RonanLabs applies differential privacy techniques during training to provide mathematical guarantees against individual record extraction:

- **DP-SGD (Differentially Private Stochastic Gradient Descent):** Clips per-sample gradients and adds calibrated Gaussian noise during training, ensuring that the inclusion or exclusion of any single record changes the model output by no more than a bounded amount [14].
- **Privacy budget tracking:** Each training run maintains an (epsilon, delta) privacy budget that quantifies the maximum information leakage about any individual record. Typical training configurations achieve epsilon values below 8.0 with delta of 1e-5, providing a quantifiable formal bound on per-record information leakage [15].
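
The core DP-SGD step is small enough to sketch. The example below (numpy, with assumed hyperparameter values) shows only the clip-and-noise aggregation; in practice, gradient handling and the (epsilon, delta) accounting are delegated to a differential privacy library such as Opacus.

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation: clip each sample's gradient to an L2 bound,
    sum, add Gaussian noise scaled to that bound, then average.
    Clipping caps any single record's influence at clip_norm, and the
    noise masks even that bounded contribution."""
    rng = rng if rng is not None else np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_sample_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

The noise multiplier, batch size, and number of steps together determine the (epsilon, delta) budget reported for the run.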

### 4.3 Membership Inference Resistance

Membership inference attacks attempt to determine whether a specific data record was used in training a model. RonanLabs tests every delivered model against state-of-the-art membership inference attacks, including:

- **Shadow model attacks** [16]
- **Likelihood ratio attacks** [17]
- **Calibrated confidence-based attacks**

Models must demonstrate membership inference accuracy no better than random chance (50% +/- 2%) before delivery. This is validated using a held-out test set that was excluded from training.
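
The acceptance criterion can be illustrated with a toy simulation. The confidence scores below are synthetic stand-ins, not real model outputs: when members and non-members draw from the same confidence distribution, even the strongest single-threshold attack cannot do meaningfully better than a coin flip.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in per-record confidence scores (e.g., mean token likelihood).
# A non-memorizing model scores members and non-members alike.
members     = rng.normal(0.70, 0.10, size=5000)
non_members = rng.normal(0.70, 0.10, size=5000)

def best_threshold_accuracy(pos: np.ndarray, neg: np.ndarray) -> float:
    """Sweep thresholds and report the strongest attack's accuracy."""
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones_like(pos), np.zeros_like(neg)])
    best = 0.5
    for t in np.quantile(scores, np.linspace(0.01, 0.99, 99)):
        pred = (scores >= t).astype(float)
        hit = (pred == labels).mean()
        best = max(best, hit, 1.0 - hit)
    return best

print(f"best attack accuracy: {best_threshold_accuracy(members, non_members):.3f}")
# near 0.5: the attacker cannot separate the two groups
```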

### 4.4 Legal and Regulatory Perspective

HIPAA defines PHI as "individually identifiable health information" that is transmitted or maintained in any form [6]. Model weights are not individually identifiable: they do not contain names, dates, identifiers, or any data element that could be linked to a specific individual. Furthermore, the training data itself was de-identified under Safe Harbor or Expert Determination before training began, meaning it was not PHI at the time of training.

The HHS Office for Civil Rights has not issued specific guidance on whether model weights constitute PHI. However, the structural argument is clear: if the input to training is not PHI (because it was properly de-identified), and the output of training is a lossy mathematical compression of population-level patterns, then the output is not PHI.

This position is consistent with the approach taken by the European Data Protection Board regarding AI model outputs under GDPR, which has recognized that trained model parameters generally do not constitute personal data when the training data has been properly anonymized [18].

---

## 5. The Synthetic Data Bonus

### 5.1 Population-Calibrated Data Generation

Every RonanLabs engagement delivers a synthetic data generator alongside the clinical AI model. This generator produces artificial patient records that statistically mirror the hospital's patient population without corresponding to any real individual. The generator is a byproduct of the same training process that produces the clinical model -- the model learns the joint distribution of clinical variables in the hospital's population and can sample from that distribution to produce new, synthetic records.
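
The underlying fit-then-sample idea can be shown with a deliberately simple toy. The three "lab values" and their Gaussian joint distribution below are assumptions for illustration; the delivered generator is the fine-tuned model itself, which captures far richer, non-Gaussian clinical structure.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy "de-identified extract": 3 correlated values per patient
# (say glucose, HbA1c, BMI), drawn from an assumed population.
pop_mean = np.array([105.0, 5.9, 28.0])
pop_cov = np.array([[220.0,  4.5, 12.0],
                    [  4.5, 0.36,  0.5],
                    [ 12.0,  0.5, 16.0]])
real = rng.multivariate_normal(pop_mean, pop_cov, size=20_000)

# Fit the joint distribution from the extract, then sample from the fit.
mu, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, cov, size=20_000)

# Synthetic rows reproduce population statistics without copying any record.
print(np.round(synthetic.mean(axis=0), 1), np.round(mu, 1))
```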

### 5.2 Use Cases

Synthetic data calibrated to a specific hospital's population unlocks capabilities that are difficult or impossible to achieve with real patient data:

- **Research:** Investigators can develop and test hypotheses on synthetic datasets, which typically fall outside IRB human-subjects review because no real individuals are represented, accelerating the research cycle from months to days
- **Education:** Medical trainees can interact with realistic clinical scenarios derived from the institution's actual case mix, providing more relevant training than generic textbook cases
- **Algorithm development:** IT teams can build, test, and validate clinical decision support tools using synthetic data that reflects real-world distributions, then validate on a small sample of real data before deployment
- **Regulatory submissions:** Synthetic data can augment real-world evidence in FDA submissions for clinical decision support software, a use case explicitly acknowledged in the FDA's 2023 guidance on AI/ML-based software as a medical device [19]
- **Vendor evaluation:** When evaluating new EHR modules or clinical tools, hospitals can provide vendors with synthetic data instead of real patient data for testing and configuration

### 5.3 The Flywheel Effect

The synthetic data capability creates a self-reinforcing cycle. As the hospital generates and uses synthetic data, clinical teams identify additional use cases that require model refinement. Refined models produce better synthetic data. Better synthetic data enables more sophisticated applications. This flywheel reduces the marginal cost of each subsequent AI application because the foundational model and generator are already in place.

---

## 6. Deployment Models

### 6.1 On-Premise GPU Server

The most common deployment: a dedicated GPU server installed in the hospital's data center. Recommended configurations:

| Use Case | Model Size | Hardware | Approx. Cost |
|---|---|---|---|
| Departmental (single specialty) | 7B-8B | NVIDIA DGX Spark or RTX 6000 workstation | $5,000-$10,000 |
| Multi-department | 70B-72B | NVIDIA DGX Spark (quantized) or dual RTX 6000 | $10,000-$25,000 |
| Enterprise (full hospital) | 200B+ | NVIDIA DGX Station or HGX B200 | $50,000-$300,000 |
| Research + Clinical | 400B+ (LoRA) | NVIDIA DGX B200 or GB200 NVL2 | $200,000+ |

### 6.2 Hospital Private Cloud

For health systems operating private cloud infrastructure (VMware, OpenStack, Nutanix), models can be deployed as containerized services using NVIDIA NIM (NVIDIA Inference Microservices) or vLLM, integrated with the hospital's existing orchestration and monitoring tools.

### 6.3 Air-Gapped Environments

For maximum security, models can be deployed in fully air-gapped environments with no network connectivity. Model weights are delivered on encrypted physical media (hardware-encrypted USB drives or NVMe drives). The inference server operates without any network connection, accessed only via a local terminal or isolated internal network segment.

### 6.4 Inference Optimization

Deployed models are optimized for production inference:

- **Quantization:** FP16 models are quantized to INT4 or INT8 using GPTQ or AWQ methods, reducing memory requirements by 2-4x with minimal accuracy degradation (typically <1% on clinical benchmarks) [20]
- **Continuous batching:** vLLM's PagedAttention mechanism enables efficient handling of multiple concurrent requests, critical for clinical workflows where multiple clinicians query the system simultaneously [21]
- **Speculative decoding:** For latency-sensitive applications, a smaller draft model generates candidate tokens that the full model verifies in parallel, reducing per-token generation latency by 40-60%
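
The storage arithmetic behind quantization is easy to make concrete. The sketch below implements a minimal symmetric per-tensor INT8 scheme on an assumed random weight block; GPTQ and AWQ are considerably more sophisticated (calibration data, per-channel scales), but the 4x memory saving from FP32 to INT8 is the same.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((256, 256)).astype(np.float32)   # one FP32 weight block

# Symmetric per-tensor quantization: map [-max|W|, +max|W|] onto [-127, 127].
scale = np.abs(W).max() / 127.0
W_int8 = np.round(W / scale).astype(np.int8)     # stored weights: 1 byte each
W_dequant = W_int8.astype(np.float32) * scale    # reconstructed at inference time

rel_error = np.linalg.norm(W - W_dequant) / np.linalg.norm(W)
print(f"{W.nbytes // W_int8.nbytes}x smaller, relative error {rel_error:.4f}")
```

Per-channel scales and calibration, the refinements GPTQ and AWQ add, reduce this error further.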

---

## 7. Security Architecture

### 7.1 Threat Model

The architecture is designed to resist the following threat categories:

| Threat | Mitigation |
|---|---|
| Data breach during transfer | End-to-end encryption (TLS 1.3 + AES-256 at rest); hospital controls transfer timing |
| Unauthorized access to training data at RonanLabs | Isolated network segment, no inbound internet, encrypted storage, access logging |
| Model weight exfiltration during delivery | Checksummed delivery, tamper-evident packaging, chain of custody documentation |
| Membership inference on deployed model | DP-SGD training, membership inference testing, privacy budget enforcement |
| Reconstruction of training data from model | Lossy compression, differential privacy, verbatim-memorization testing |
| Insider threat at RonanLabs | Background checks, access controls, audit logging, data deletion after delivery |
| Supply chain attack on model weights | Signed model files, hash verification, deterministic build process |

### 7.2 Data-at-Rest Encryption

All de-identified data received by RonanLabs is stored on LUKS2-encrypted volumes with AES-256-XTS. Encryption keys are stored in a hardware security module (HSM) and are not accessible to training processes. Data is decrypted only during active training runs and only on the isolated training network segment.

### 7.3 Data-in-Transit Encryption

All data transfers use TLS 1.3 with perfect forward secrecy. SFTP transfers use Ed25519 or RSA-4096 key pairs generated specifically for each engagement. S3 transfers use AWS Signature Version 4 authentication with SSE-KMS encryption using customer-managed keys.

### 7.4 Chain of Custody

Every engagement produces a chain of custody document that records:

- Date and time of data receipt (with hash of received files)
- Personnel who accessed the data (names, roles, access times)
- Training runs executed (timestamps, configurations, logs)
- Date and time of data deletion (with verification)
- Delivery method and hash of delivered model weights

This document is signed by the RonanLabs engagement lead and provided to the hospital as part of the final deliverable package.

### 7.5 Data Deletion Protocol

After model delivery and hospital acceptance:

1. All de-identified training data is cryptographically erased (LUKS key destruction followed by secure overwrite)
2. Training logs that could contain data excerpts are deleted
3. Intermediate model checkpoints are deleted
4. Deletion is verified by an independent process that scans storage volumes for residual data
5. A deletion certificate is issued to the hospital with timestamps and verification hashes

The only artifacts retained by RonanLabs are: the engagement contract, the chain of custody document, the validation report, and aggregate (non-identifiable) training metrics for quality assurance purposes.

---

## 8. Case Study Framework

### 8.1 Typical Engagement Timeline

| Week | Phase | Activities |
|---|---|---|
| 1-2 | Scoping | Clinical use case definition, data requirements specification, de-identification plan review |
| 3-4 | Data preparation | Hospital IT executes de-identification, RonanLabs validates extract quality |
| 5-6 | Secure transfer | Data transferred via agreed mechanism, integrity verification |
| 7-10 | Training | Base model selection, LoRA fine-tuning, hyperparameter optimization |
| 11-12 | Validation | 6-layer validation pipeline, privacy audit, bias assessment |
| 13-14 | Delivery | Model weights delivered, synthetic data generator configured, deployment support |
| 15-16 | Deployment | On-premise installation, integration testing, clinician training |

Total timeline: 3-4 months for pilot, 5-6 months for full enterprise deployment.

### 8.2 Pilot Scope and Deliverables

The standard pilot engagement ($50,000) includes:

- One clinical use case (e.g., clinical documentation assistance, discharge summary generation, or clinical coding support)
- One model size (typically 7B-8B, suitable for single-department deployment)
- De-identification specification and validation support
- Fine-tuned model with LoRA adapters
- Synthetic data generator (same clinical domain)
- Full 6-layer validation report
- Deployment on hospital-provided hardware (RonanLabs provides configuration guidance)
- 30 days of post-deployment support

### 8.3 Success Metrics

Pilot success is measured against pre-defined metrics agreed during scoping:

- **Clinical accuracy:** Model performance on domain-specific benchmarks (e.g., ICD-10 coding accuracy, clinical note quality scores)
- **Clinician acceptance:** User satisfaction scores from clinicians interacting with the model during the pilot period
- **Throughput:** Inference latency and concurrent user capacity on the deployed hardware
- **Privacy validation:** Membership inference attack resistance at or below chance level
- **Operational stability:** Uptime and error rates during the pilot period

### 8.4 Common Use Cases

**Clinical Decision Support:** Models trained on institutional clinical notes and outcomes provide real-time diagnostic and treatment suggestions contextualized to the hospital's patient population, formulary, and care protocols.

**Clinical Documentation:** Models fine-tuned on the hospital's documentation style generate draft clinical notes, discharge summaries, and referral letters that match institutional conventions and reduce documentation burden.

**Clinical Coding:** Models trained on the hospital's historical coding patterns suggest ICD-10, CPT, and DRG codes with accuracy calibrated to the institution's coding practices and payer mix.

---

## 9. Comparison: Cloud AI vs. On-Premise AI

### 9.1 Side-by-Side Comparison

| Dimension | Cloud AI Service | RonanLabs On-Premise |
|---|---|---|
| Data exposure | PHI transmitted to vendor cloud | De-identified data only; no PHI leaves hospital |
| BAA required | Yes | No (no PHI handled) |
| Vendor network access | API keys, often VPN | None |
| Live EHR connection | Typically required | Not required |
| Ongoing data transmission | Every query sends data | Zero -- model runs locally |
| Per-query cost | $0.01-$0.10+ per query | Zero after deployment |
| Model customization | Limited (prompt engineering, RAG) | Full fine-tuning on institutional data |
| Vendor lock-in | High (proprietary APIs) | None (open-weight models, standard formats) |
| Internet dependency | Complete | None |
| Compliance burden | BAA + vendor risk assessment + ongoing monitoring | One-time engagement, no ongoing vendor relationship |
| Downtime risk | Vendor outage affects all customers | Hospital controls availability |

### 9.2 Total Cost of Ownership (5-Year)

**Scenario:** 500-bed hospital, clinical documentation AI, 200 concurrent users.

| Cost Category | Cloud AI | On-Premise |
|---|---|---|
| Initial setup | $25,000 | $50,000 (pilot) |
| Annual subscription/license | $500,000/year | $0 |
| Per-query costs (est. 2M queries/year) | $200,000/year | $0 |
| Full engagement (Year 1) | -- | $200,000 |
| Hardware (on-premise GPU server) | $0 | $15,000 |
| IT staff time (ongoing) | 0.25 FTE ($40,000/yr) | 0.1 FTE ($16,000/yr) |
| Compliance/legal (annual) | $30,000 | $5,000 |
| **5-Year Total** | **$3,875,000** | **$370,000** |

The on-premise model is approximately 90% less expensive over five years. The break-even point typically occurs within 6-9 months of deployment.

### 9.3 When Cloud AI Is Appropriate

Cloud AI services remain appropriate when:

- The use case does not involve PHI (e.g., administrative tasks, supply chain, facilities management)
- The hospital has already accepted cloud risk for other clinical systems and has mature vendor management processes
- Rapid deployment is prioritized over data sovereignty (e.g., pandemic response)
- The use case requires capabilities only available from frontier models (GPT-4-class and above) that cannot currently run on-premise

For any use case involving PHI where data sovereignty is a requirement, on-premise deployment eliminates the fundamental conflict between AI capability and data protection.

---

## 10. Conclusion

The architecture described in this paper resolves the central tension in hospital AI adoption. Hospitals want clinical AI trained on their data, reflecting their patient population, their clinical practices, and their care protocols. They cannot accept the data exposure that cloud-based AI services require.

By separating the problem into three phases -- hospital-controlled de-identification, isolated model training, and hospital-controlled deployment -- this architecture delivers institution-specific AI without creating a single new pathway for data exposure. The hospital's identified data never leaves its network. The de-identified extracts are encrypted in transit and at rest, processed on isolated infrastructure, and deleted after model delivery. The delivered model weights are mathematical representations of clinical patterns, not patient records, and are formally validated against privacy attacks before delivery.

The result is a hospital that operates its own clinical AI -- customized to its population, running on its hardware, with no ongoing cloud dependency, no per-query fees, and no vendor with standing access to its data. This is not a compromise between AI capability and data sovereignty. It is both, simultaneously.

Hospital CIOs evaluating clinical AI should ask one question of every vendor: *Where does the data go?* If the answer involves a cloud service, an API, or a vendor-hosted environment, the hospital is accepting residual data exposure risk. If the answer is "the data stays in your building, and we deliver model weights," then the hospital has achieved clinical AI without data exposure.

That is the architecture RonanLabs delivers.

---

## References

[1] McKinsey & Company. "The Economic Potential of Generative AI: The Next Productivity Frontier." McKinsey Global Institute, June 2023.

[2] American Hospital Association. "2024 Health Care AI Survey: Barriers to Adoption." AHA Center for Health Innovation, 2024.

[3] U.S. Department of Health and Human Services, Office for Civil Rights. "Breach Portal: Notice to the Secretary of HHS Breach of Unsecured Protected Health Information." https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf

[4] IBM Security. "Cost of a Data Breach Report 2023." IBM Corporation and Ponemon Institute, 2023.

[5] Hu, E.J., Shen, Y., Wallis, P., et al. "LoRA: Low-Rank Adaptation of Large Language Models." *arXiv:2106.09685*, 2021.

[6] U.S. Department of Health and Human Services. "Summary of the HIPAA Privacy Rule." https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html

[7] Progress Software. "MOVEit Transfer Critical Vulnerability (CVE-2023-34362)." Security Advisory, June 2023. See also: Emsisoft, "MOVEit Breach Impact Report," December 2023.

[8] 45 CFR 160.404. "Amount of a Civil Money Penalty." U.S. Code of Federal Regulations.

[9] U.S. Department of Health and Human Services, Office for Civil Rights. "Change Healthcare Breach Notification." 2024. See also: U.S. Senate Finance Committee, "Hearing on the Change Healthcare Cyberattack," May 2024.

[10] HCA Healthcare. "Notice Regarding Data Security Incident." July 2023. https://hcahealthcare.com/data-security-incident/

[11] Cerebral, Inc. "Notice of HIPAA Privacy Breach." March 2023. OCR Breach Report Reference.

[12] 45 CFR 164.514(b). "Implementation Specifications: Requirements for De-Identification of Protected Health Information." U.S. Code of Federal Regulations.

[13] Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L. "QLoRA: Efficient Finetuning of Quantized LLMs." *arXiv:2305.14314*, 2023.

[14] Abadi, M., Chu, A., Goodfellow, I., et al. "Deep Learning with Differential Privacy." *Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security*, 2016.

[15] De, S., Berrada, L., Hayes, J., et al. "Unlocking High-Accuracy Differentially Private Image Classification through Scale." *arXiv:2204.13650*, 2022.

[16] Shokri, R., Stronati, M., Song, C., Shmatikov, V. "Membership Inference Attacks Against Machine Learning Models." *IEEE Symposium on Security and Privacy*, 2017.

[17] Carlini, N., Chien, S., Nasr, M., et al. "Membership Inference Attacks From First Principles." *IEEE Symposium on Security and Privacy*, 2022.

[18] European Data Protection Board. "Guidelines 06/2024 on the Application of GDPR to AI Model Training." Draft for public consultation, 2024.

[19] U.S. Food and Drug Administration. "Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence/Machine Learning-Enabled Device Software Functions." Guidance Document, 2023.

[20] Frantar, E., Ashkboos, S., Hoefler, T., Alistarh, D. "GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers." *arXiv:2210.17323*, 2022.

[21] Kwon, W., Li, Z., Zhuang, S., et al. "Efficient Memory Management for Large Language Model Serving with PagedAttention." *Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles*, 2023.

---

*Copyright 2026 RonanLabs. All rights reserved.*

*Contact: ronan@ronanlabs.ai | ronanlabs.ai*
