Title: Publicly Available Clinical BERT Embeddings

Authors: Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott

Published: Apr 6, 2019

Link: https://arxiv.org/abs/1904.03323

Summary:

  • MIT x Microsoft collaboration
  • Pre-trained on approximately 2 million notes from the MIMIC-III v1.4 database (Johnson et al., 2016)

Summary (Generated by Microsoft Copilot):

Introduction:

  • The paper introduces Clinical BERT models for clinical text, addressing the lack of publicly available pre-trained BERT models in this domain.

Challenges:

  • General BERT models are not optimized for clinical narratives, which have unique linguistic characteristics.

Methods:

  • Two BERT models were trained: one on all clinical notes and another specifically on discharge summaries using the MIMIC-III database.
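
The released checkpoints can be used like any ordinary BERT encoder. Below is a minimal sketch of extracting note-level embeddings with the Hugging Face transformers library; the hub model ID emilyalsentzer/Bio_ClinicalBERT and the example sentence are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: load a publicly released Clinical BERT checkpoint and
# mean-pool its last hidden layer into one embedding per note.
# Assumption: the weights are mirrored on the Hugging Face hub under the
# ID below; the paper distributes the checkpoints via its own links.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"  # assumed hub ID for the all-notes model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

note = "Patient admitted with chest pain; discharged on aspirin."  # toy example text
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# Average the token vectors over non-padding positions to get a single vector.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, 768]) for a BERT-Base sized encoder
```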

Novelties:

  • Release of domain-specific BERT models for clinical text, demonstrating improvements over general BERT and BioBERT.

Results:

  • The Clinical BERT models improved performance on three of the five clinical NLP tasks evaluated (MedNLI and the i2b2 2010 and 2012 tasks) but not on the two de-identification tasks (i2b2 2006 and 2014).

Performances:

  • Achieved state-of-the-art accuracy on MedNLI and improved performance on i2b2 2010 and 2012 tasks.
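
MedNLI is a sentence-pair inference task with three labels (entailment, contradiction, neutral), so fine-tuning reduces to standard BERT pair classification. The sketch below shows that setup; the model ID and the toy premise/hypothesis pair are illustrative assumptions, and the classification head would still need to be trained on the real MedNLI data, which requires credentialed PhysioNet access.

```python
# Hedged sketch of a MedNLI-style fine-tuning setup: BERT sentence-pair
# classification with three NLI labels. The head is freshly initialized here,
# so predictions are only meaningful after training on the actual dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"  # assumed hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

premise = "The patient denies shortness of breath."  # toy example pair
hypothesis = "The patient has no dyspnea."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, 3], one score per NLI label
print(logits.argmax(dim=-1).item())
```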

Limitations:

  • The models did not improve the de-identification tasks, likely because the i2b2 de-ID corpora replace PHI with synthetic surrogates, shifting their text distribution away from the MIMIC notes used for pre-training.

Discussion:

  • The study highlights the benefits of domain-specific embeddings and suggests further research with more advanced models and diverse datasets.