If you remember the days before Google Docs, when you could work for hours on a paper only to have it vanish if your computer shut down before you had saved it, you know firsthand the pain of losing work you thought was safe. Notebooks are wonderful for interactive data analysis, but there are a few quirks that can trip you up if you're not careful!
This article describes how to avoid two potential pitfalls when working in a Jupyter notebook:
- What and when you need to save so you don't lose parts of your analysis unintentionally
- How to avoid overwriting a collaborator's edits
Note: For a deeper dive into the back end of a Terra notebook and to understand why notebooks have these characteristics, see this article about key notebook components or this article about key notebook operations.
How to not lose output files
The key thing to understand is that files generated by the notebook are not automatically saved in the workspace. In particular, you will lose output data generated in a notebook if you delete or reconfigure a cluster without first explicitly saving your output to the workspace bucket.
You will not lose your data if you pause (stop) a cluster: the cluster goes away, but the persistent disk does not. In fact, when you reopen your notebook, the cluster is created more quickly because the disk does not need to be recreated. As an added bonus, you do not need to reinstall your software.
To avoid losing your data, make sure to explicitly save your outputs in the workspace bucket. You can find more information on how to do this within the notebook in this article.
Your notebooks and any data explicitly saved to your bucket remain in long-term storage in the workspace bucket. This means that if you do lose output data, you can rerun the notebook to regenerate it (though you will pay for the extra compute, of course).
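As an illustration of what explicitly saving an output can look like from inside a notebook, here is a minimal Python sketch. It rests on several assumptions that are not stated in this article: that the notebook environment exposes the bucket URL in an environment variable named `WORKSPACE_BUCKET`, and that the `gsutil` command-line tool is installed on the cluster. The function names and the `notebook-outputs` folder are hypothetical, chosen only for this example.

```python
import os
import subprocess


def build_copy_command(local_path, bucket, dest_prefix="notebook-outputs"):
    """Build the gsutil command that copies one local file into the bucket.

    `dest_prefix` is a hypothetical folder name used only in this sketch.
    """
    bucket = bucket.rstrip("/")  # tolerate a trailing slash on the bucket URL
    filename = os.path.basename(local_path)
    return ["gsutil", "cp", local_path, f"{bucket}/{dest_prefix}/{filename}"]


def save_to_workspace_bucket(local_path, dest_prefix="notebook-outputs"):
    """Copy a file from the cluster's disk to the workspace bucket."""
    # Assumption: WORKSPACE_BUCKET holds a URL such as gs://<bucket-name>.
    bucket = os.environ["WORKSPACE_BUCKET"]
    command = build_copy_command(local_path, bucket, dest_prefix)
    # check=True raises an error if the copy fails, so a failed save is not silent.
    subprocess.run(command, check=True)
```

In a notebook cell you might then call `save_to_workspace_bucket("results/summary.tsv")` right after writing the file, so the copy in the bucket survives even if the cluster is later deleted or reconfigured.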
How to not lose collaborator edits
If you are working on a notebook in a shared workspace at the same time as a colleague, there is a risk of overwriting each other's work. Although your work in your cluster is private, every time the notebook autosaves, it sends a copy to the shared workspace.
To avoid overwriting a colleague's edits, copy the notebook to a sandbox workspace or duplicate it with a unique name before editing.
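If you duplicate notebooks often, a consistent naming scheme helps keep each copy unique. The Python sketch below shows one possible convention that combines the original name, the editor, and a timestamp. The function name and the naming pattern are hypothetical, not something Terra requires or provides.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def unique_notebook_name(notebook_name, editor, when=None):
    """Return a copy name such as analysis-user2-20240102-030405.ipynb.

    The pattern <name>-<editor>-<timestamp> is only an illustrative convention.
    """
    when = when or datetime.now(timezone.utc)  # default to the current UTC time
    path = PurePosixPath(notebook_name)
    stamp = when.strftime("%Y%m%d-%H%M%S")
    # Keep the original extension so the copy still opens as a notebook.
    return f"{path.stem}-{editor}-{stamp}{path.suffix}"
```

For example, `unique_notebook_name("analysis.ipynb", "user2")` produces a name you can give your duplicate, so your autosaves go to your own copy rather than the shared original.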
The diagrams below illustrate what happens when two users edit the same notebook in a shared workspace:
User 1 saves the notebook they are working on in their cluster (or it autosaves):
User 2 saves the notebook they're working on (the same notebook), overwriting User 1's edits: