Difference between revisions of "How to work with large number of small files"

From RCSWiki

Revision as of 18:30, 4 October 2022

Background

Mounting Zip archives with fuse-zip

Please note that mounting an archive is a system-level operation and must be completed properly to keep the compute nodes on ARC healthy. If you mount an archive on ARC, it must be properly unmounted afterwards.

If you mount an archive in your job script, you have to unmount it in the same job script and actually wait until the operation is complete.

The possible complication is that the unmount operation takes time and the command performs it in the background; that is, it returns control to the script before the unmount has completed. Once the job script finishes, SLURM kills all processes that belong to the job's user, including the unmount process, which leaves the mount point occupied.

# Get brief help.
$ fuse-zip --help
....

# Mount the archive as a file system.
$ fuse-zip -r archive.zip mountpoint

# Use the data inside
$ ls mountpoint/
$ ... 

# Unmount the archive.
$ fusermount -u mountpoint
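
The "unmount and actually wait" requirement described above can be sketched in Python for jobs that drive their work from a Python script. This is a minimal sketch, not an official ARC recipe: the helper names `wait_until_unmounted` and `unmount_and_wait`, the 60-second timeout, and the polling interval are all assumptions made for illustration. It relies only on `fusermount -u` from the commands above and on the standard-library call `os.path.ismount`.

```python
import os
import subprocess
import time


def wait_until_unmounted(mountpoint, timeout=60.0, poll_interval=0.5):
    """Poll until `mountpoint` is no longer a mount point.

    Returns True once the directory is released, False if `timeout`
    seconds pass first. Both defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while os.path.ismount(mountpoint):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def unmount_and_wait(mountpoint, timeout=60.0):
    """Hypothetical helper: run `fusermount -u`, then block until done."""
    # fusermount may return before the unmount has actually finished,
    # so the result is checked by polling instead of trusting the exit.
    subprocess.run(["fusermount", "-u", mountpoint], check=True)
    return wait_until_unmounted(mountpoint, timeout=timeout)
```

Calling `unmount_and_wait("mountpoint")` as the last step of the job keeps the script alive until the mount point is free; a `False` result means the timeout expired and the mount point may still be occupied.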

/dev/shm

From Python

Working with files inside a TAR archive

  • tarfile module:
https://www.askpython.com/python-modules/tarfile-module
https://stackoverflow.com/questions/27220376/python-read-file-within-tar-archive
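
As a brief illustration of what the links above cover, the following sketch reads files directly from a TAR archive with the standard `tarfile` module, without unpacking the archive to disk first. The function names `list_members` and `read_member` are hypothetical and chosen only for this example.

```python
import tarfile


def list_members(archive_path):
    """Return the names of the regular files stored in the archive."""
    # Mode "r" lets tarfile detect compression (gzip, bz2, xz) itself.
    with tarfile.open(archive_path, "r") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def read_member(archive_path, member_name):
    """Return the bytes of one file inside the archive."""
    with tarfile.open(archive_path, "r") as tar:
        # extractfile() raises KeyError for an unknown name and returns
        # None for members that are not regular files or links.
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        with extracted:
            return extracted.read()
```

This way a job opens a single large archive instead of many small files; each call above reopens the archive, so code that reads many members would likely keep one `tarfile.open` handle for the whole loop.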

Links

How-Tos