Questions concerning DCM Data

Hello. I am currently looking into the DICOM data that was included in the fastMRI dataset.
The paper does not go into detail about this data, mentioning only that it is there to provide a more diverse variety of images. I would like to ask for more information on the DICOM data, especially how it is organized.

First, are there any serious differences between batch 1 and batch 2?
Why does batch 1 have a folder named “knee_mri_clinical_seq” while batch 2 does not?
Should we just treat the two different batches as equivalent to the training and validation sets?

Second, what is the meaning of the file organization? The files are organized in weirdly named folders such as “1FB_1001820591____1FB,_3331562518”. These folders have subfolders with names like “study_2f43b031”, which in turn have subfolders with names such as “MR4_53c76c27”.
Each of these innermost folders contains DICOM images which obviously belong to a single volume.

However, on inspection with MATLAB, I have found that the different MR folders do not contain the same images acquired with different methods. The images look very different and also have different sizes. I cannot tell if they are the same knees acquired with different acquisition patterns or just the knees of different people.

My second question can be summarized as follows. What is the meaning of organizing seemingly unrelated files into single folders and what is the meaning of the names of these folders? The names of the innermost folders with names beginning with “MR” appear to indicate different acquisition methods but I am not at all sure of this.
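For anyone wanting to check the layout themselves, here is a minimal Python sketch that tallies the files in each innermost folder. It assumes the three-level layout described above (outer folder / “study_…” / “MR…” / files); the function names and the example root path are made up for illustration and are not part of fastMRI.

```python
from collections import defaultdict
from pathlib import Path


def summarize_series(root):
    """Count files per (outer folder, study folder, MR folder) under `root`.

    Assumes the layout root/<outer>/<study_*>/<MR*>/<file>; files at any
    other depth are ignored.
    """
    root = Path(root)
    counts = defaultdict(int)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) != 4:  # outer / study / series / file
            continue
        outer, study, series, _ = parts
        counts[(outer, study, series)] += 1
    return dict(counts)


def series_per_study(counts):
    """Group (MR folder, file count) pairs by (outer folder, study folder)."""
    grouped = defaultdict(list)
    for (outer, study, series), n in sorted(counts.items()):
        grouped[(outer, study)].append((series, n))
    return dict(grouped)


# Example usage (the root path is hypothetical):
# counts = summarize_series("knee_mri_clinical_seq")
# for key, series in series_per_study(counts).items():
#     print(key, series)
```

This only shows how many slices each MR folder holds. To tell whether the MR folders under one study are the same knee, the DICOM headers would also need to be read (for example the series description and the row/column counts) with a DICOM reader such as MATLAB's dicominfo or pydicom's dcmread. Whether those fields survived anonymization in this dataset is something I have not verified.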

Finally, is the DICOM data included in the “NYU dataset” for use in the challenge? In the submission form, there is a section where we must check “NYU data only”. Does using this DICOM data count as using NYU data only?

It’s complicated. The DICOMs are provided ‘as is’ for use as an auxiliary data set that has not been carefully curated. Analogously, large scrapes of images from the web have proved helpful in boosting performance on otherwise fully supervised image classification. Perhaps use of an unstructured, unlabeled data set such as the DICOMs could help similarly.

The DICOMs are included in the “NYU dataset”.

If you want further information please contact NYU directly at

Thank you. I will ask NYU for additional information.

I would like to ask one further question. Is the use of pre-trained models for part of the loss allowed?

For example, I had one model where a pre-trained VGG was used for part of the GAN loss. I did not use this model for the challenge but I would like to know if it is allowed.

You’re still allowed to make a submission using a model pre-trained on a different dataset. Just don’t tick the “NYU dataset” option in the submission form, and this will be reflected in the leaderboard.