Overview
Teaching: 15 min Exercises: 1 min
Questions
What is image processing and how is it useful in neuroimaging?
How are images represented in scientific computing?
Objectives
Explain the role of image processing in neuroimaging
Identify image data, and distinguish it from other data (e.g., tabular, time-series, etc.)
Extract image data from neuroimaging file formats with nibabel
Slice and stride through data arrays with numpy indexing operations
Visualize image data with Matplotlib
Image processing is a large and very general set of tools used across many research disciplines to analyze image data. Image processing algorithms are fundamental to neuroimaging, because much (if not all) of the data that we analyze in neuroimaging is image data.
What is image data? How is it different from other data, such as time-series, or tabular data?
For our purposes, image data is multi-dimensional homogeneous data in which spatial relationships matter. That is, neighboring pixels are treated differently from pixels in distant parts of the array: spatial contiguity is meaningful. Usually, image data has 2 or 3 dimensions: either all 3 spatial dimensions, or a 2D representation obtained as a projection from a specific viewpoint (think photographs) or as a cut through a 3D object (think slice). But image processing algorithms can also be used when there are more dimensions, and when the dimensions do not correspond to spatial dimensions (does anyone know a good example of this?).
What is a good example of pseudo-image data?
- In radio-astronomy, Fourier spectra are analyzed using image-processing methods
- In a project recently undertaken in the University of Washington eScience Institute Data Science Incubator Program, Siva Kasinathan used image-processing algorithms to characterize and reassemble sequences from human chromosome centromeres.
Note that these categories are also not mutually exclusive. For example, functional MRI data is image data, but is also time-series data at the same time.
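As a rough illustration of how one array can be both image data and time-series data, the sketch below uses a small synthetic 4D array as a stand-in for a functional MRI run. The array sizes and the variable names (`bold`, `volume_at_t3`, `voxel_series`) are made up for this example and do not come from a real dataset.

```python
import numpy as np

# Synthetic stand-in for an fMRI run: 4 x 5 x 6 voxels, 10 time points.
rng = np.random.default_rng(0)
bold = rng.normal(size=(4, 5, 6, 10))

# Image view: fixing the last (time) index gives one 3D volume.
volume_at_t3 = bold[..., 3]

# Time-series view: fixing the three spatial indices gives the signal
# of a single voxel across all time points.
voxel_series = bold[2, 1, 4, :]

print(volume_at_t3.shape)  # (4, 5, 6)
print(voxel_series.shape)  # (10,)
```

The same number can be reached through either view: `voxel_series[3]` and `volume_at_t3[2, 1, 4]` refer to the same element of `bold`.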
There are many different kinds of image processing operations. Here are a few common operations:
Because of their nature (homogeneous values, meaningful spatial dimensions), image data lend themselves easily to representation as arrays. Let’s demonstrate this with some data from the Human Connectome Project.
Downloading data from the Human Connectome Project
The Human Connectome Project provides high-quality functional, structural and diffusion MRI data. These can be accessed through the AWS Simple Storage Service, or “S3”. This allows us to download the data programmatically, through a Python library called boto3. For the following code to work, you need to have a file ‘~/.aws/credentials’ that includes a section:
[hcp]
AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXX
The keys are credentials that you can get from the Human Connectome Project.
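Before trying to download anything, it can help to confirm that the credentials file actually contains the expected section. The following is a minimal sketch using only the Python standard library. The helper name `has_profile` is hypothetical and is not part of boto3. The sketch checks only that the section and the two keys are present, not that the keys are valid.

```python
import configparser
import os


def has_profile(credentials_path, profile="hcp"):
    """Return True if the credentials file defines both keys for `profile`."""
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # files that do not exist are skipped silently
    if not parser.has_section(profile):
        return False
    # configparser lower-cases option names, so upper-case keys also match.
    required = ("aws_access_key_id", "aws_secret_access_key")
    return all(parser.has_option(profile, name) for name in required)


print(has_profile(os.path.expanduser("~/.aws/credentials")))
```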
In addition, you’ll need to install boto3. This can usually be done with the following command-line call:
pip install boto3
First, we download the data to our computer from S3 using boto3.
import boto3
boto3.setup_default_session(profile_name='hcp')
s3 = boto3.resource('s3')
bucket = s3.Bucket('hcp-openaccess')
subject = 991267 # We can replace that with other subject IDs!
bucket.download_file('HCP/%s/T1w/T1w_acpc_dc.nii.gz' % subject,
                     '%s-T1w_acpc_dc.nii.gz' % subject)
After running this code, we should have a T1-weighted MRI scan of this subject stored in the file 991267-T1w_acpc_dc.nii.gz. We can read this file into memory using the nibabel library.
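Because the download can be slow, one option is to skip it when the file is already on disk. The sketch below assumes a `bucket` object like the one created with `s3.Bucket('hcp-openaccess')`. The helper name `download_if_missing` is hypothetical, and the helper relies only on the bucket's `download_file` method.

```python
import os


def download_if_missing(bucket, key, fname):
    """Download `key` from `bucket` to `fname` unless `fname` already exists.

    `bucket` is assumed to be a boto3 Bucket resource; only its
    download_file(key, filename) method is used.
    Returns True if a download was performed, False otherwise.
    """
    if os.path.exists(fname):
        return False
    bucket.download_file(key, fname)
    return True


subject = 991267
key = f"HCP/{subject}/T1w/T1w_acpc_dc.nii.gz"
fname = f"{subject}-T1w_acpc_dc.nii.gz"
# With a `bucket` object available:
# download_if_missing(bucket, key, fname)
```

This check looks only at whether the file exists, so a partially downloaded file would still be treated as complete.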
Nibabel: accessing a cacophony of neuroimaging file formats
One of the challenges of data science in neuroimaging (and in other scientific fields) is the range of different file formats that are used to store data. Often these files will be opaque to a naive user, because the data is stored in a binary format that cannot be read without knowledge of how the data is organized on disk.
The nibabel library alleviates these difficulties through a careful implementation of a wide range of neuroimaging file formats. Wherever possible, the library presents a common interface to these different file formats, making it particularly easy to write code that works on data stored in any of them. To install it, you can use the following command-line call:
pip install nibabel
The nibabel API for reading data from a file has two steps. The first is to load the image object:
import nibabel as nib
T1w_img = nib.load('991267-T1w_acpc_dc.nii.gz')
Because nibabel loads the data “lazily”, the data hasn’t been read into memory yet, only some basic metadata stored in the file header. To access the data, we need to explicitly call the get_fdata method of the image object that we currently have in memory (older nibabel releases offered get_data for this, which has since been deprecated and removed):
T1w_data = T1w_img.get_fdata()
The data is stored in a numpy array. We can verify that by running:
type(T1w_data)
We can check some basic properties of this array by running:
T1w_data.shape
T1w_data.dtype
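Once the data is a numpy array, slices and strided subsets can be taken with ordinary indexing. The sketch below uses a small synthetic volume in place of `T1w_data`, with values chosen so that each element encodes its own index, which makes the result of each operation easy to check.

```python
import numpy as np

# Synthetic 3D volume: the value at (i, j, k) is i*100 + j*10 + k.
i, j, k = np.meshgrid(np.arange(4), np.arange(5), np.arange(6), indexing="ij")
vol = i * 100 + j * 10 + k

# Fixing one index yields a 2D slice through the volume.
slice_first_axis = vol[2, :, :]   # shape (5, 6)
slice_last_axis = vol[:, :, 3]    # shape (4, 5)

# A step in the slice ("striding") keeps every second voxel along each axis.
downsampled = vol[::2, ::2, ::2]  # shape (2, 3, 3)

print(slice_first_axis.shape, slice_last_axis.shape, downsampled.shape)
# Basic slicing returns a view on the same memory; no voxel values are copied.
print(np.shares_memory(vol, downsampled))
```

Because slices are views, modifying `downsampled` in place would also modify `vol`. Call `.copy()` on a slice when an independent array is needed.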
We can also visualize the data that was stored in the file using Matplotlib:
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1)
ax.matshow(T1w_data[:, :, T1w_data.shape[-1]//2])
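A single slice shows only one cut through the volume. As a sketch of one possible extension, the code below draws the middle slice along each of the three array axes side by side. It uses a synthetic volume (a bright ball in a dark box) in place of `T1w_data`, and it selects the non-interactive `Agg` backend so that it runs outside a notebook. Both choices are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic volume: voxels within 12 voxels of the center are set to 1.
grid = np.indices((32, 40, 48))
center = np.array([16, 20, 24]).reshape(3, 1, 1, 1)
vol = (np.sqrt(((grid - center) ** 2).sum(axis=0)) < 12).astype(float)

# Middle slice along each array axis.
mid = [s // 2 for s in vol.shape]
views = [vol[mid[0], :, :], vol[:, mid[1], :], vol[:, :, mid[2]]]

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for axis, (ax, view) in enumerate(zip(axes, views)):
    # Transpose so the first remaining array axis runs left to right.
    ax.imshow(view.T, cmap="gray", origin="lower")
    ax.set_title(f"middle slice, axis {axis}")
    ax.axis("off")
```

Which anatomical plane each array axis corresponds to depends on how the image is oriented, so the titles here refer to array axes only.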
The nibabel header
Explore the ‘T1w_img’ object. How would you extract information about the parameters used to collect data? What information is missing?
The nibabel header
Information about the acquisition can be accessed using the image header:
hdr = T1w_img.header
For example:
zooms = hdr.get_zooms()
will usually provide the dimensions of each voxel (how do we know the units?)
Some information might be missing from the file header (or not make sense). For example, try running:
hdr.get_n_slices()
Affine transforms
The nibabel image header also contains the affine transformation between the image’s voxel indices and a reference space (usually scanner coordinates in mm, with the origin at the scanner iso-center). For more information on how and why this information is used, you might want to refer to this excellent tutorial in the nibabel documentation.
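To make the role of the affine concrete, the sketch below applies a hypothetical 4 x 4 affine to a voxel index using plain numpy. The matrix values are invented for illustration. For a real image they would come from the image's `affine` attribute. The helper names `voxel_to_world` and `world_to_voxel` are also made up for this example.

```python
import numpy as np

# Hypothetical affine: 0.7 mm isotropic voxels, with the origin of the
# reference space located at voxel (130, 155, 130).
affine = np.array([
    [0.7, 0.0, 0.0, -91.0],
    [0.0, 0.7, 0.0, -108.5],
    [0.0, 0.0, 0.7, -91.0],
    [0.0, 0.0, 0.0, 1.0],
])


def voxel_to_world(affine, ijk):
    """Map a voxel index (i, j, k) to reference-space coordinates (x, y, z)."""
    homogeneous = np.append(np.asarray(ijk, dtype=float), 1.0)
    return (affine @ homogeneous)[:3]


def world_to_voxel(affine, xyz):
    """Map reference-space coordinates back to (possibly fractional) voxel indices."""
    homogeneous = np.append(np.asarray(xyz, dtype=float), 1.0)
    return (np.linalg.inv(affine) @ homogeneous)[:3]


xyz = voxel_to_world(affine, (130, 155, 130))
print(xyz)
print(world_to_voxel(affine, xyz))
```

The upper-left 3 x 3 block carries voxel size and orientation, and the last column carries the translation. nibabel provides `nibabel.affines.apply_affine` for the same forward mapping.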
Key Points
Image processing operations are a central part of neuroimaging.
Images are homogeneous arrays, where spatial relationships are important.
Many different operations can be performed on images, and processing pipelines can be built by combining them.
Images can be efficiently and usefully represented as arrays.
Arrays can be manipulated with numpy operations, and visualized using Matplotlib.