Error from copying a large file
I tried to copy a file of about 900 MB to my notebook, but I got the error message below.
Even after installing crcmod with the command "pip install --no-cache-dir -U crcmod", I got the same error message.
Do you know how I can fix this?
Thanks,
Seung Hoan Choi
Copying gs://fc-306d0fc4-2f1d-4ea1-ae29-ea5b8fe0cb22/genotype/freeze8_gds/freeze.8.chr21.pass_only.phased.gds...
==> NOTE: You are downloading one or more large file(s), which would run significantly faster if you enabled sliced object downloads. This feature is enabled by default but requires that compiled crcmod be installed (see "gsutil help crcmod").
CommandException: Downloading this composite object requires integrity checking with CRC32c, but your crcmod installation isn't using the module's C extension, so the hash computation will likely throttle download performance. For help installing the extension, please see "gsutil help crcmod". To download regardless of crcmod performance or to skip slow integrity checks, see the "check_hashes" option in your boto config file.
NOTE: It is strongly recommended that you not disable integrity checks. Doing so could allow data corruption to go undetected during uploading/downloading.
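For reference, reinstalling crcmod with pip alone is usually not enough on a fresh VM; the C extension only gets built if a compiler and the Python headers are already present. A rough sketch, assuming a Debian-based notebook VM (package names, and whether you need sudo or pip3, may differ on other images):

sudo apt-get install -y gcc python3-dev python3-setuptools   # build tools needed to compile the C extension
pip uninstall -y crcmod                                      # drop the pure-Python copy
pip install --no-cache-dir -U crcmod                         # reinstall so the C extension is compiled
gsutil version -l | grep crcmod                              # should now report "compiled crcmod: True"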
Comments
Hello Seung Hoan -
What is the configuration for your notebook cluster? I believe that by default it creates a cluster with 500 MB of storage. You might want to recreate the cluster with more storage.
My apologies, I was confounding disk size and storage size.
Hi Adelaide,
I found that I was using 500 GB of storage. I tried 1 TB, but that did not work either.
Thanks,
Hi Seung Hoan Choi - What task are you trying to complete in the notebook? I am wondering whether this large file could be processed to make it smaller.
Otherwise, it may be possible to resize the notebook cluster using the Swagger API, but I will have to check with the workbench team to see what the upper limit is for our notebooks.
Hi Seung Hoan Choi - I was wondering what values you see in your cluster setup?
Hi Adelaide,
I used 4 CPUs, a 1000 GB disk, and 15 GB of memory. It is not clear to me whether this is a memory issue.
Thanks,
Hello Seung Hoan -
Your download should work. Could you please post the command you were using to do the download?
Adelaide
Hi Adelaide
Below is the command and the response from the terminal.
Thanks,
Seung Hoan
jupyter-user@saturn-d498b7e3-cefa-43e6-ba42-e9cb768e44e4-m:~$ gsutil cp gs://fc-306d0fc4-2f1d-4ea1-ae29-ea5b8fe0cb22/genotype/freeze8_gds/freeze.8.chr21.pass_only.phased.gds .
Copying gs://fc-306d0fc4-2f1d-4ea1-ae29-ea5b8fe0cb22/genotype/freeze8_gds/freeze.8.chr21.pass_only.phased.gds...
==> NOTE: You are downloading one or more large file(s), which would
run significantly faster if you enabled sliced object downloads. This
feature is enabled by default but requires that compiled crcmod be
installed (see "gsutil help crcmod").
CommandException:
Downloading this composite object requires integrity checking with CRC32c,
but your crcmod installation isn't using the module's C extension, so the
hash computation will likely throttle download performance. For help
installing the extension, please see "gsutil help crcmod".
To download regardless of crcmod performance or to skip slow integrity
checks, see the "check_hashes" option in your boto config file.
NOTE: It is strongly recommended that you not disable integrity checks. Doing so
could allow data corruption to go undetected during uploading/downloading.
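If compiled crcmod cannot be installed, the message above points to the "check_hashes" option in the boto config (typically ~/.boto). A minimal sketch of that setting, keeping in mind the note in the output that disabling integrity checks is strongly discouraged:

[GSUtil]
# "if_fast_else_skip" skips the CRC32c check when only the slow pure-Python
# crcmod is available; the default behavior is "if_fast_else_fail".
check_hashes = if_fast_else_skip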
Hi Adelaide,
I tried a different file with a larger file size. It worked! I think there is something wrong with this particular file.
I will create a new version of this file and try it again.
Thanks for the help,
Seung Hoan
Great! I will close the ticket for now.