Inquiry related to the size of the Mimic-CXR V2.0.0 dataset #1474
-
You can download a subset of records, e.g. just data for 100 subjects. I think clever use of edit: it wasn't
-
"MIMIC-IV-CXR is over 4.7 TB, almost entirely due to the size of the DICOMs. Users should strongly consider not downloading the data, and instead using it within Google Cloud Platform (GCP), which we support natively. GCP does not charge for data transfer within a region in GCP (see this page for more details about network charges)."

I have access to the MIMIC-IV-CXR dataset on Google Cloud Storage, but I am having trouble accessing the data. I am not sure how I can read directly from the dataset stored at https://console.cloud.google.com/storage/browser/mimic-cxr-2.0.0.physionet.org. I tried:

$ !gcloud storage ls gs://mimic-cxr-2.0.0.physionet.org/
$ !gsutil ls gs://mimic-cxr-2.0.0.physionet.org/
-
PhysioNet covers storage costs for datasets, but is unable to cover all compute/usage costs for the research community. We therefore use the Requester Pays option on Google Cloud. The error message you are seeing indicates that the Google Cloud Storage bucket you are trying to access is set up as a "Requester Pays" bucket. This means that the requester (in this case, you) must provide a billing project to be charged for the data access and egress fees. To fix this error, specify your billing project when using the gsutil command by adding the -u flag followed by your project ID. Here's how you can modify your command:
Replace |
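As a minimal sketch of the `-u` approach described above, the Python snippet below builds the `gsutil` listing command with a billing project attached, which is convenient when working from a notebook. The helper names `requester_pays_ls` and `run` are hypothetical, and `my-billing-project` is a placeholder for your own project ID. Actually running the command assumes `gsutil` is installed and authenticated.

```python
import subprocess

BUCKET = "mimic-cxr-2.0.0.physionet.org"  # bucket name taken from the thread


def requester_pays_ls(project_id, path=""):
    """Build a `gsutil ls` command that bills `project_id` via the -u flag."""
    if not project_id:
        raise ValueError("a billing project ID is required for Requester Pays")
    return ["gsutil", "-u", project_id, "ls", f"gs://{BUCKET}/{path}"]


def run(cmd):
    """Run the command and return its stdout (needs gsutil installed and authenticated)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# "my-billing-project" is a placeholder, not a real project ID.
cmd = requester_pays_ls("my-billing-project")
print(" ".join(cmd))
```

Calling `run(cmd)` would execute the listing and charge any access fees to the given project, so it is left out of the example on purpose.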
-
Hi, @alistairewj @tompollard, can I use the split CSV "mimic-cxr-2.0.0-split.csv" to download only the JPEG images that belong to the test set, using the wget command, from the MIMIC-CXR-JPG version? There are almost 5000 JPEG files. If it is possible, can you share the command?
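One possible way to approach this, sketched in Python under several assumptions: that the decompressed split CSV has the columns `dicom_id`, `study_id`, `subject_id` and `split`, that images are laid out as `files/pXX/pSUBJECT/sSTUDY/DICOM.jpg`, and that the base URL below matches the MIMIC-CXR-JPG version you have access to. The helper names `jpg_path` and `split_urls` are hypothetical, and the sample rows are made up. Please check all of these against the project page before relying on them.

```python
import csv
import io

# Assumed base URL for MIMIC-CXR-JPG; check the version on the PhysioNet project page.
BASE_URL = "https://physionet.org/files/mimic-cxr-jpg/2.0.0/"


def jpg_path(subject_id, study_id, dicom_id):
    """Relative path under the assumed files/pXX/pSUBJECT/sSTUDY/DICOM.jpg layout."""
    subject = str(subject_id)
    return f"files/p{subject[:2]}/p{subject}/s{study_id}/{dicom_id}.jpg"


def split_urls(csv_file, split="test"):
    """Return download URLs for every row of the split CSV in the given split."""
    reader = csv.DictReader(csv_file)
    return [
        BASE_URL + jpg_path(row["subject_id"], row["study_id"], row["dicom_id"])
        for row in reader
        if row["split"] == split
    ]


# Tiny made-up sample standing in for mimic-cxr-2.0.0-split.csv (IDs are not real).
sample = io.StringIO(
    "dicom_id,study_id,subject_id,split\n"
    "aaaa-1111,50000001,10000001,train\n"
    "bbbb-2222,50000002,11000002,test\n"
)
urls = split_urls(sample)
print("\n".join(urls))
```

With the real CSV opened in place of `sample`, the resulting URLs could be written one per line to a file (say `test_urls.txt`, a hypothetical name) and passed to wget with its `-i` option, together with `--user` and `--ask-password` for your PhysioNet credentials.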
-
I am working with the MIMIC-CXR dataset and have already gained access. The problem is its size: the dataset is 4.6 TB, which makes it difficult to download. Is there a way to use a subset of the data on a local machine? Or do you have any other suggestions? Thank you.