MedML: Patient and Data — Part 1

Data is a window for understanding your patient

Jan 12, 2021

Observation is at the core of a physician’s job. From the questions we ask when we first meet, to the electrolytes we collect from the patient — it’s all observation towards the goal of healing.

Data is a fancy word for ‘structured observations’, a way to assess a patient consistently and systematically. Machine learning (ML) is all about finding patterns, whether simple or complex, in data using modern computers.

Follow along with our MedML Workshop (video) and Interactive Notebook (https://bit.ly/3hhXqCB).

In this article we’ll give a brief introduction to data, the ecosystem it lives in, and where ML helps us as physicians.

Where does data come from? (System and Measurement)
What is our data telling us? (Inference)

Where does data come from?

Good data never comes from thin air — it always comes from something. Let’s call that your system, something you’re interested in because your patient has complaints there.

Importantly, you get to define the system based on what you care about right now: the heart, the lung, the heart+lungs, the social factors, emotional state, or even all of them together. When you’re collecting data, you’re collecting it to understand your system better.

If we’re good doctors, we can’t hold the system we’re interested in in our hands, so we have to find some other way to measure it. And, almost always, that process is messy.

Measurement is messy. We almost never measure just the system we care about; we also pick up interference, which we call “noise”.

Messiness comes in the form of noise and bias. As long as our measurements aren’t too messy though, we might be able to see through the noise and focus on what we care about…
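To make noise and bias concrete, here is a minimal sketch in Python. It assumes a simple additive picture (measurement = true value + bias + random noise), which is our simplification and not a claim about any real device. The heart rate, bias, and noise values are invented for illustration, and `measure` is a hypothetical helper.

```python
import random
import statistics


def measure(true_value, bias=0.0, noise_sd=1.0, rng=random):
    """Return one messy measurement: truth + systematic bias + random noise."""
    return true_value + bias + rng.gauss(0.0, noise_sd)


# All numbers below are made up for illustration.
rng = random.Random(0)          # fixed seed so the sketch is repeatable
true_heart_rate = 72.0          # the "system" value we cannot observe directly
readings = [
    measure(true_heart_rate, bias=3.0, noise_sd=5.0, rng=rng)
    for _ in range(1000)
]

mean_reading = statistics.mean(readings)
spread = statistics.stdev(readings)

# Averaging many readings shrinks the effect of noise,
# but the mean still sits near truth + bias, not near the truth.
print(f"true value:   {true_heart_rate:.1f}")
print(f"mean reading: {mean_reading:.1f}")
print(f"spread (sd):  {spread:.1f}")
```

In this toy setup, repeating the measurement helps us see through the noise, but no amount of repetition removes the bias. That is one reason it matters to know where the data came from.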

What is our data telling us?

Trying to go from our data to a true understanding of our system is called inference. Inference is the main goal of diagnosis in medicine: how do we figure out the disease our patient has from the signs+symptoms we measure?

For example, if I measure your heart rate and ECG as data, I might be trying to infer whether you’re having a heart attack — an important behavior between your heart, your blood, and your body getting oxygen!

Inference goes from data to understanding the system. Machine learning helps us find patterns — both simple and complex. We can even do it with lots of noise!

Inference is also important in science, but science uses experiments and large sample sizes to generate close-to-perfect data, which makes inference easier with simpler tools. That approach can be very slow and, more importantly, requires experiments that may be impossible or unethical in patients.

In medicine, we have to treat the patients where they are, not run a perfect experiment, so we need more sophisticated tools to achieve our inference.

Where ML comes in…

ML is a modern tool for inference that lets you find patterns in the data even if the measurement was far from perfect. It does this by leveraging computers to build and validate models. These models, like hypotheses, try to capture the patterns in our data that came from our system.
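As a rough illustration of a model acting like a hypothesis, the sketch below fits a straight line to noisy synthetic data using ordinary least squares. This is one of the simplest possible models, chosen by us for illustration and not the method used in the workshop. The age and blood pressure relationship, its numbers, and the helper `fit_line` are all invented.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "system": pressure rises by 0.5 per year of age from 100.
# These values are made up and are not clinical facts.
TRUE_SLOPE, TRUE_INTERCEPT = 0.5, 100.0

rng = random.Random(1)
ages = [rng.uniform(20, 80) for _ in range(500)]
# Each measurement is the system's pattern plus random noise.
pressures = [TRUE_INTERCEPT + TRUE_SLOPE * a + rng.gauss(0, 10) for a in ages]

slope, intercept = fit_line(ages, pressures)
print(f"estimated slope:     {slope:.2f} (pattern used to generate data: {TRUE_SLOPE})")
print(f"estimated intercept: {intercept:.1f} (pattern used to generate data: {TRUE_INTERCEPT})")
```

The fitted line is the model's guess at the pattern the system produced. With enough data, that guess can land close to the underlying pattern even though each individual measurement is noisy.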

Before we use ML on data to help patients, we have to have a clearer idea of where our data came from so we can use the right ML to focus on what we care about and ignore the noise and biases.

The full circle

We’re taught in med school “treat the patient, not the numbers”. This means acknowledging that data, even ‘big data’, can’t completely capture our patients. Our job is to look beyond the data to the systems generating it — as physicians we’re in a unique position to apply ML thoughtfully.

Of course, how do you know your pattern isn’t just an illusion? That’s where we can use techniques like training/testing splits to see how well our patterns hold. We’ll talk about that in Part 2.
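As a small preview, a training/testing split can be sketched as follows. This is a minimal example on synthetic data, assuming an 80/20 split and a straight-line model; the data, the split ratio, and the helpers `fit_line` and `mean_squared_error` are our own illustrative choices.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(predictions, targets):
    """Average squared gap between predictions and observed values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Synthetic data with a real pattern (y = 3x) hidden under noise.
rng = random.Random(2)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [3.0 * x + rng.gauss(0, 2.0) for x in xs]

# Shuffle the row indices, then hold out 20% that the model never sees.
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Fit only on the training rows.
slope, intercept = fit_line(train_x, train_y)
baseline = sum(train_y) / len(train_y)  # "no pattern" guess: always the mean

# Judge both on the held-out rows.
test_mse_model = mean_squared_error([slope * x + intercept for x in test_x], test_y)
test_mse_baseline = mean_squared_error([baseline] * len(test_y), test_y)

print(f"held-out error, fitted line: {test_mse_model:.1f}")
print(f"held-out error, baseline:    {test_mse_baseline:.1f}")
```

If the fitted pattern were an illusion, it would do no better than the baseline on data it had never seen. Here the data were built with a real pattern, so the line should beat the baseline on the held-out rows.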