Title: Publicly Available Clinical BERT Embeddings

Authors: Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott

Published: Apr 6, 2019

Link: https://arxiv.org/abs/1904.03323

Summary:

  • MIT x Microsoft collaboration
  • Pre-trained on approximately 2 million notes from the MIMIC-III v1.4 database (Johnson et al., 2016)

Summary (Generated by Microsoft Copilot):

Introduction:

  • The paper introduces Clinical BERT models for clinical text, addressing the lack of publicly available pre-trained BERT models in this domain.

Challenges:

  • General BERT models are not optimized for clinical narratives, which have unique linguistic characteristics.

Methods:

  • Two BERT models were trained: one on all clinical notes and another specifically on discharge summaries using the MIMIC-III database.
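
The released checkpoints can be used like any ordinary BERT encoder. Below is a minimal sketch of extracting note-level embeddings with the Hugging Face transformers library; the hub model ID emilyalsentzer/Bio_ClinicalBERT and the example sentence are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: load a publicly released Clinical BERT checkpoint and
# mean-pool its last hidden layer into one embedding per note.
# Assumption: the weights are mirrored on the Hugging Face hub under the
# ID below; the paper distributes the checkpoints via its own links.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"  # assumed hub ID for the all-notes model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

note = "Patient admitted with chest pain; discharged on aspirin."  # toy example text
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# Average the token vectors over non-padding positions to get a single vector.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, 768]) for a BERT-Base sized encoder
```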

Novelties:

  • Release of domain-specific BERT models for clinical text, demonstrating improvements over general BERT and BioBERT.

Results:

  • The Clinical BERT models improved performance on three of the five clinical NLP tasks evaluated (MedNLI and the i2b2 2010 and 2012 tasks) but not on the two de-identification tasks (i2b2 2006 and 2014).

Performances:

  • Achieved state-of-the-art accuracy on MedNLI and improved performance on i2b2 2010 and 2012 tasks.
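
MedNLI is a sentence-pair inference task with three labels (entailment, contradiction, neutral), so fine-tuning reduces to standard BERT pair classification. The sketch below shows that setup; the model ID and the toy premise/hypothesis pair are illustrative assumptions, and the classification head would still need to be trained on the real MedNLI data, which requires credentialed PhysioNet access.

```python
# Hedged sketch of a MedNLI-style fine-tuning setup: BERT sentence-pair
# classification with three NLI labels. The head is freshly initialized here,
# so predictions are only meaningful after training on the actual dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"  # assumed hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

premise = "The patient denies shortness of breath."  # toy example pair
hypothesis = "The patient has no dyspnea."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, 3], one score per NLI label
print(logits.argmax(dim=-1).item())
```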

Limitations:

  • The models did not improve the de-identification tasks, likely because the i2b2 de-ID corpora replace PHI with synthetic surrogates, shifting their text distribution away from the MIMIC notes used for pre-training.

Discussion:

  • The study highlights the benefits of domain-specific embeddings and suggests further research with more advanced models and diverse datasets.