About

EHR

  • The most popular public EHR datasets are MIMIC and, more recently, eICU

FHIR

  • What is it
  • I chose this because it is a canonical format

Synthea

  • What is it + how-tos
  • I used datasets of varying sizes - all generated for Massachusetts (MA) for now
  • Data dictionary

Labels

  • Most models predict in-patient mortality, readmission, prolonged length of stay etc. because they are trained on hospitalization or ICU datasets
  • I chose to predict chronic conditions instead, for the following reasons
    • Synthea simulates all standard events over a patient's lifetime, not just hospitalizations
    • Chronic conditions account for the majority of US healthcare costs

Cleaning

  • First split by patient ids
  • So as to isolate other records based on pt ids

Other cleaning

  • Mostly standard stuff, standardizing column names etc.
  • Dropped Encounters

Code Tables

  • To create vocabularies

Identifying START and STOP

  • Observations is a little more complex than the rest
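
The patient-level split at the start of Cleaning can be sketched roughly as below. This is a minimal sketch, not the actual implementation: the function names (`split_patient_ids`, `filter_by_patients`), the column names (`id`, `patient`), the split fractions and the toy data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_patient_ids(patient_ids, valid_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle unique patient ids and split them into train/valid/test sets."""
    ids = sorted(set(patient_ids))
    rng = np.random.default_rng(seed)
    shuffled = [ids[i] for i in rng.permutation(len(ids))]
    n_valid = int(len(ids) * valid_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "valid": set(shuffled[:n_valid]),
        "test": set(shuffled[n_valid:n_valid + n_test]),
        "train": set(shuffled[n_valid + n_test:]),
    }


def filter_by_patients(df, ids, col="patient"):
    """Keep only the rows of a record table that belong to the given patients."""
    return df[df[col].isin(ids)].reset_index(drop=True)


# Toy data standing in for the patients and conditions tables (hypothetical).
patients = pd.DataFrame({"id": [f"p{i}" for i in range(10)]})
conditions = pd.DataFrame({
    "patient": ["p0", "p0", "p3", "p7", "p9"],
    "code": ["c1", "c2", "c1", "c3", "c2"],
})

splits = split_patient_ids(patients["id"])
train_conditions = filter_by_patients(conditions, splits["train"])
```

Splitting on patient ids first, and only then filtering the other tables, keeps all of one patient's records in a single split.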

Inserting Age

  • Years and months for now (given the nature of the Synthea dataset)
  • Age in hours or days is possible with more granular data - e.g. ICU or hospitalization
    • where we are trying to predict outcomes within, say, 24 or 48 hours of admission
    • Also as we'll see in upcoming posts, age allows some flexibility
      • Initially everything started at age 0
      • With a little change, I am now able to get any arbitrary age span - say month 24 to month 104 or 20 to 40 years
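
To make the age idea concrete, here is a minimal sketch of inserting age in months and years, and then selecting an arbitrary age span. The helper names (`insert_age`, `age_span`) and the column names (`id`, `birthdate`, `patient`, `start`) are assumptions for illustration, as is the choice to count only completed months.

```python
import pandas as pd


def insert_age(events, patients, date_col="start"):
    """Add age_months and age_years (completed units) to each event row."""
    birth = patients.set_index("id")["birthdate"]
    out = events.copy()
    dob = pd.to_datetime(out["patient"].map(birth))
    when = pd.to_datetime(out[date_col])
    months = (when.dt.year - dob.dt.year) * 12 + (when.dt.month - dob.dt.month)
    # Subtract one month if the day-of-month of birth has not been reached yet.
    months = months - (when.dt.day < dob.dt.day).astype(int)
    out["age_months"] = months
    out["age_years"] = months // 12
    return out


def age_span(events, start, stop, col="age_months"):
    """Keep events whose age falls within [start, stop], inclusive."""
    return events[events[col].between(start, stop)].reset_index(drop=True)


# Toy data (hypothetical).
patients = pd.DataFrame({"id": ["p1"], "birthdate": ["2000-01-15"]})
events = pd.DataFrame({
    "patient": ["p1", "p1", "p1"],
    "start": ["2000-03-01", "2002-01-14", "2002-01-15"],
})

aged = insert_age(events, patients)
window = age_span(aged, 24, 104)  # month 24 to month 104
```

With age stored as a plain integer column, choosing a different span is just a different filter, either in months or, with `col="age_years"`, in years.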

Extracting Labels

  • Extract them from the conditions df and put in patients df for ease of use later
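
A minimal sketch of the label extraction, assuming each label is matched on a condition description. The label map, the helper name `extract_labels`, the column names and the toy rows are all hypothetical; the real label list and matching rule may differ.

```python
import pandas as pd

# Hypothetical mapping from label column name to condition description.
LABEL_MAP = {
    "diabetes": "Diabetes",
    "hypertension": "Hypertension",
}


def extract_labels(patients, conditions, label_map):
    """Add one boolean column per label to a copy of the patients table."""
    out = patients.copy()
    for label, description in label_map.items():
        matches = conditions.loc[conditions["description"] == description, "patient"]
        out[label] = out["id"].isin(set(matches))
    return out


# Toy data (hypothetical).
patients = pd.DataFrame({"id": ["p1", "p2", "p3"]})
conditions = pd.DataFrame({
    "patient": ["p1", "p1", "p3", "p3"],
    "description": ["Diabetes", "Hypertension", "Hypertension", "Acute bronchitis"],
})

labeled = extract_labels(patients, conditions, LABEL_MAP)
```

Keeping the labels as columns on the patients table means each patient row carries its own targets, which is convenient when building datasets later.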

Creating Vocab

  • A note about EmbeddingBag and Embedding
    • The difference
    • The idea of representing a time period with EmbeddingBag (as described in the Google paper)
  • Implementation - Vocab classes
    • EhrVocab class
    • ObsVocab class - Observations vocab is special
    • Demographics vocab is different
    • Vocablist class
  • I tried to use the fastai Vocab, but it required quite a bit of customization, so I wrote my own along similar lines
  • Emb Matrix Dimensions - a convenience fn to get the embedding matrix dimensions, which are needed when creating the models
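
To illustrate the difference between Embedding and EmbeddingBag, here is a small PyTorch sketch. The toy vocabulary, the code names and the embedding size are made up for illustration and are not the EhrVocab or ObsVocab classes; summing (rather than averaging) the codes in a period is also an assumption. `nn.Embedding` returns one vector per code, while `nn.EmbeddingBag` pools all codes in a bag (here, a time period) into a single vector.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary; index 0 is reserved for unknown codes.
itoc = ["xxunk", "cond_a", "cond_b", "med_x", "obs_y"]
ctoi = {code: idx for idx, code in enumerate(itoc)}


def numericalize(codes):
    """Map codes to integer ids, falling back to the unknown id."""
    return [ctoi.get(code, 0) for code in codes]


torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=len(itoc), embedding_dim=4)
# Share the same weights so the two layers can be compared directly.
bag = nn.EmbeddingBag.from_pretrained(emb.weight.detach().clone(), mode="sum")

# Two time periods with a different number of codes each.
period_1 = ["cond_a", "med_x", "obs_y"]
period_2 = ["cond_b"]
flat = torch.tensor(numericalize(period_1 + period_2))
offsets = torch.tensor([0, len(period_1)])  # start index of each period in flat

per_code = emb(flat)             # one vector per code: shape (4, 4)
per_period = bag(flat, offsets)  # one vector per period: shape (2, 4)

# EmbeddingBag with mode="sum" equals summing the per-code embeddings.
manual = torch.stack([per_code[:3].sum(dim=0), per_code[3:].sum(dim=0)]).detach()
```

The appeal of EmbeddingBag here is that a period with any number of codes always yields a fixed-size vector, without padding.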