Investigate if JupyterLab doesn't cleanup temporary files #338

Open
tkilias opened this issue Oct 23, 2024 · 2 comments

@tkilias
Collaborator

tkilias commented Oct 23, 2024

Situation

  • We got a report that the disk of AI-Lab ran full and that there were many temporary directories (named tmp...) in /tmp
  • These directories contained some CSV files and some Jupyter files (possibly *.ipynb files; this is not completely clear)
  • The disk was 1 TB in size, and it was full after around 20 days of uptime with AI-Lab in use
  • The AI-Lab Version 2 was started as an EC2 instance from our AMI
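
To put the report in perspective, here is a rough back-of-the-envelope estimate of the growth rate that any reproduction would need to show. It assumes (these assumptions are not stated in the report) that the disk was close to empty at start, that 1 TB means 1000 GB, and that /tmp grew roughly linearly over the 20 days.

```python
# Rough growth-rate estimate; the inputs are the figures from the report,
# while linear growth from an empty disk is an assumption.
disk_size_gb = 1000   # 1 TB, taken as 1000 GB
uptime_days = 20      # approximate uptime from the report

gb_per_day = disk_size_gb / uptime_days
gb_per_hour = gb_per_day / 24

print(f"{gb_per_day:.0f} GB/day, {gb_per_hour:.1f} GB/hour")
# prints "50 GB/day, 2.1 GB/hour"
```

Under these assumptions, a leak responsible for the full disk should be visible within a single working session, so files of a few kilobytes would not explain it on their own.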

Tasks

  • Start Docker-AI-Lab
    1. Scenario
      • Check content and size of /tmp directory
      • Log in to AI-Lab multiple times
      • Check content and size of /tmp directory
    2. Scenario
      • Check content and size of /tmp directory
      • Start Docker-DB
      • Run scikit-learn notebooks
      • Check content and size of /tmp directory
    3. Scenario
      • Check content and size of /tmp directory
      • Upload files
      • Check content and size of /tmp directory
    4. Scenario
      • Check content and size of /tmp directory
      • Upload big files and close browser tab
      • Check content and size of /tmp directory
  • Repeat with an EC2 instance from our AMI if nothing comes up
  • Keep the EC2 instance running for several days if nothing has come up by then
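
Each scenario brackets an action with a "check content and size of /tmp" step. A minimal sketch of how that before/after check could be scripted is shown here; the helper names `snapshot` and `diff` are hypothetical and not part of AI-Lab, and the script only looks at top-level entries under the given directory.

```python
import os
from pathlib import Path


def snapshot(root="/tmp"):
    """Map each top-level entry under root to its total size in bytes."""
    sizes = {}
    for entry in Path(root).iterdir():
        total = 0
        try:
            if entry.is_dir() and not entry.is_symlink():
                # Sum the sizes of all regular files below this directory.
                for dirpath, _dirnames, filenames in os.walk(entry):
                    for name in filenames:
                        try:
                            total += os.lstat(os.path.join(dirpath, name)).st_size
                        except OSError:
                            pass  # file vanished or is unreadable; skip it
            else:
                total = entry.lstat().st_size
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        sizes[entry.name] = total
    return sizes


def diff(before, after):
    """Return entries that appeared, disappeared, or changed size."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed
```

Calling `snapshot()` before a scenario and again afterwards, then passing both results to `diff`, shows which entries the scenario left behind and how large they are.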
@tkilias tkilias added the bug Unwanted / harmful behavior label Oct 23, 2024
@ahsimb ahsimb self-assigned this Oct 23, 2024
@ahsimb
Collaborator

ahsimb commented Oct 24, 2024

I ran our notebooks while monitoring the content of the /tmp directory.
With regard to the files appearing in this directory, I noticed the following:

  • /tmp/itde-ssh-access.lock (0 bytes)
  • /tmp/tmpxxxxxxxx/_remote_module_non_scriptable.py (2,355 bytes)
  • /tmp/tmpxxxxxxxx/__pycache__/_remote_module_non_scriptable.cpython-38.pyc (1,466 bytes)

The first one is a lock used by the ITDE.
The last two result from the Transformer Extension running the following two calls when a model is uploaded using the CLI.

AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir, **kwargs)
AutoModel.from_pretrained(model_name, cache_dir=cache_dir, **kwargs)
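
The tmpxxxxxxxx directory names listed above match the default naming of Python's standard `tempfile` module ("tmp" followed by random characters). Whether the observed directories were actually created this way is an assumption. The sketch below only illustrates the general difference between a temporary directory that cleans itself up and one that does not.

```python
import os
import shutil
import tempfile

# mkdtemp() creates a directory with the default "tmp" prefix in the
# system temp location and never removes it on its own.
leaked = tempfile.mkdtemp()
print(os.path.basename(leaked))  # a name starting with "tmp"

# TemporaryDirectory, used as a context manager, removes the directory
# and its contents when the block exits.
with tempfile.TemporaryDirectory() as managed:
    with open(os.path.join(managed, "data.csv"), "w") as f:
        f.write("a,b\n1,2\n")

managed_exists = os.path.isdir(managed)  # False: cleaned up on exit
leaked_exists = os.path.isdir(leaked)    # True: still on disk

# A directory from mkdtemp() has to be removed explicitly.
shutil.rmtree(leaked)
```

A `TemporaryDirectory` that is held for the lifetime of a process is normally removed at interpreter exit, but not when the process is killed abruptly, so long-lived or force-terminated sessions are one place where such directories could accumulate.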

A larger file, circa 200K, appeared when a lengthy procedure was abandoned and the notebook was closed. At that point, the underlying Python session apparently saved its state in a temporary file. When the notebook was reopened, it reconnected to the same session; this was apparent because the notebook still could not produce any output. After the kernel was restarted, the notebook became operational again and the temporary file disappeared from /tmp.

Uploading a file into JupyterLab did not leave any trace in /tmp.

@ahsimb
Collaborator

ahsimb commented Oct 25, 2024

I also did a very quick experiment with the AMI edition of the AI Lab.
I logged in, configured the system to use the Docker-DB, started the Docker-DB, and created a schema. Then I wanted to run one of the scikit-learn notebooks. I began uploading the data, but at that point the inadequacy of the selected EC2 instance type (I chose the recommended t2.small) kicked in: JupyterLab became unresponsive. In the end, I had to abandon the experiment.
I was monitoring /tmp while the system remained operational and did not notice any leftover files apart from the already mentioned lock.
