How to work with a large number of small files
Background
Mounting Zip archives with fuse-zip
Please note that mounting an archive is a system-level operation, and it must be completed properly to keep the compute nodes on ARC healthy. If you mount an archive on ARC, you must unmount it properly afterwards.
If you mount an archive in your job script, you have to unmount it in the same job script and wait until the unmount operation is actually complete.
The complication is that the unmount operation takes time and the command performs it in the background; that is, it returns control to the script before the unmount has completed. Once the job script finishes, SLURM kills all processes that belong to the job's user, including the unmount process, which leaves the mount point occupied.
# Get a brief help.
$ fuse-zip --help
....
# Mount the archive as a file system.
$ fuse-zip -r archive.zip mountpoint
# Use the data inside
$ ls mountpoint/
$ ...
# Unmount the archive.
$ fusermount -u mountpoint
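Because the unmount returns before it has finished, a job needs some way to wait for it. The following is a minimal Python sketch of one possible approach, not an ARC-provided tool: it calls `fusermount -u` and then polls until the directory is no longer reported as a mount point. The function names `wait_until_unmounted` and `unmount_and_wait`, the timeout, and the polling interval are hypothetical, and treating the mount-point status as a sufficient completion signal is an assumption.

```python
import os
import subprocess
import time


def wait_until_unmounted(mountpoint, timeout=300.0, poll_interval=1.0):
    """Block until `mountpoint` is no longer a mount point.

    Returns True once the mount point is released, or False if
    `timeout` seconds pass first. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    while os.path.ismount(mountpoint):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def unmount_and_wait(mountpoint, timeout=300.0):
    """Run `fusermount -u` and wait until the mount point is free.

    Assumption: polling the mount point is an adequate signal that
    the unmount has completed.
    """
    subprocess.run(["fusermount", "-u", mountpoint], check=True)
    return wait_until_unmounted(mountpoint, timeout=timeout)
```

A job would call `unmount_and_wait("mountpoint")` as its last step and treat a `False` result as an error, so that the script does not exit while the archive is still mounted.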
/dev/shm
From Python
Working with files inside a TAR archive
The tarfile module:
- https://www.askpython.com/python-modules/tarfile-module
- https://stackoverflow.com/questions/27220376/python-read-file-within-tar-archive
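As a minimal sketch of the approach these links describe, the standard-library `tarfile` module can list and read individual files inside an archive without unpacking it to disk. The helper names `list_members` and `read_member` are hypothetical and chosen only for illustration.

```python
import tarfile


def list_members(archive_path):
    """Return the names of the regular files stored in the archive."""
    # Mode "r:*" lets tarfile detect the compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def read_member(archive_path, member_name):
    """Return the bytes of one file inside the archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        # extractfile() raises KeyError for an unknown name and
        # returns None for members that are not regular files.
        fileobj = tar.extractfile(member_name)
        if fileobj is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        with fileobj:
            return fileobj.read()
```

For many reads from the same archive, it is usually cheaper to open the archive once and call `extractfile` repeatedly than to reopen it for every file, since each call to `tarfile.open` has to scan the archive again.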