Publicly Available Clinical BERT Embeddings
Title: Publicly Available Clinical BERT Embeddings
Authors: Emily Alsentzer, John R. Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, Matthew B. A. McDermott
Published: Apr 6 2019
Link: https://arxiv.org/abs/1904.03323
Summary:
- Collaboration between MIT and Microsoft Research
- Models pre-trained on approximately 2 million notes from the MIMIC-III v1.4 database (Johnson et al., 2016)
Summary (Generated by Microsoft Copilot):
Introduction:
- The paper introduces Clinical BERT models for clinical text, addressing the lack of publicly available pre-trained BERT models in this domain.
Challenges:
- General-domain BERT models are not optimized for clinical narratives, whose vocabulary, abbreviations, and syntax differ substantially from both general text and non-clinical biomedical text.
Methods:
- Two BERT variants were pre-trained on the MIMIC-III database: one on all clinical note types and another specifically on discharge summaries, with initializations from both general BERT and BioBERT.
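The paper performed this continued pretraining with the original BERT codebase; the following is only a rough sketch of the same idea using the Hugging Face transformers/datasets libraries instead. The BioBERT checkpoint id and the local notes.txt file of extracted note text are assumptions (MIMIC-III itself requires credentialed access), not details from the paper.

```python
# Sketch of continued masked-LM pretraining on clinical notes with Hugging Face
# tooling (the paper used the original BERT code). "notes.txt" is a stand-in
# for locally extracted MIMIC-III note text, which requires credentialed access.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# BioBERT initialization, one of the starting points used in the paper
# (checkpoint id assumed from the Hugging Face Hub).
base = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Load raw note text and tokenize it into fixed-length chunks.
dataset = load_dataset("text", data_files={"train": "notes.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# Standard 15% masked-token objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="clinical-bert-continued",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```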
Novelties:
- Release of domain-specific BERT models for clinical text, demonstrating improvements over general BERT and BioBERT.
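The released checkpoints are available on the Hugging Face Hub (a note about current distribution, not stated in the paper itself). A minimal sketch of loading one and producing sentence embeddings, assuming the transformers library and the public model id emilyalsentzer/Bio_ClinicalBERT (the discharge-summary variant is published as emilyalsentzer/Bio_Discharge_Summary_BERT):

```python
# Minimal sketch: load the released Clinical BERT checkpoint and embed a note.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "emilyalsentzer/Bio_ClinicalBERT"  # assumed public checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

note = "Patient was discharged on metoprolol for atrial fibrillation."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into a single sentence embedding.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```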
Results:
- Clinical BERT models showed performance improvements on three of five clinical NLP tasks (MedNLI and the i2b2 2010 and 2012 challenges) but not on the i2b2 2006 and 2014 de-identification tasks.
Performances:
- Achieved state-of-the-art accuracy on MedNLI and improved exact F1 on the i2b2 2010 and 2012 named entity recognition tasks.
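As a rough illustration of how the released checkpoint could be applied to a MedNLI-style entailment task (sentence-pair classification with three labels), here is a non-authoritative sketch using the Hugging Face transformers API; the example premise/hypothesis pair and the label order are illustrative, and MedNLI itself requires credentialed PhysioNet access.

```python
# Sketch: three-way sentence-pair classification (MedNLI-style) on top of
# Clinical BERT. The classification head is randomly initialized here and
# would need fine-tuning on MedNLI before its predictions mean anything.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "emilyalsentzer/Bio_ClinicalBERT"  # assumed public checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

premise = "The patient denies chest pain or shortness of breath."
hypothesis = "The patient is experiencing chest pain."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # arbitrary until the head is fine-tuned

labels = ["entailment", "contradiction", "neutral"]  # illustrative ordering
print(labels[logits.argmax(-1).item()])
```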
Limitations:
- The models did not improve the i2b2 2006 and 2014 de-identification tasks, because de-ID corpora replace protected health information with synthetic masks, giving them a text distribution unlike the raw MIMIC notes used for pretraining.
Discussion:
- The study highlights the benefits of domain-specific embeddings and suggests further research with more advanced models and diverse datasets.