Pyratefacts (read py-artifacts) is a very simple library. There are alternatives, but they are usually tied to some proprietary product or do only half of the job.
Everybody sometimes needs to download a file, check that it is intact, usually uncompress it, place it somewhere, and clean up the downloaded files. We all do this kind of stuff. This approach can be used to postpone some build steps, to add extra resources to your application or library that not everybody will use, or to reduce container image sizes in a container hub repo. Pyratefacts is a Python library that can be easily integrated into a Python application or used in a script for setting things up.
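For comparison, here is a rough sketch of those manual steps (download, check, uncompress, clean up) using only the Python standard library. It is not how pyratefacts is implemented; the function name `fetch_verify_extract` and its parameters are made up for illustration, and it assumes the file is a zip archive with a known SHA-256 digest.

```python
import hashlib
import os
import urllib.request
import zipfile


def fetch_verify_extract(url, archive_path, expected_sha256, extract_dir):
    """Download a zip archive, check its SHA-256 digest, extract it, then delete the archive.

    Hypothetical helper for illustration only; not part of pyratefacts.
    """
    # Download: stream the response to disk in chunks, hashing as we go
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
        for chunk in iter(lambda: response.read(64 * 1024), b""):
            out.write(chunk)
            digest.update(chunk)

    # Check: compare hex digests case-insensitively
    if digest.hexdigest().lower() != expected_sha256.lower():
        os.remove(archive_path)
        raise ValueError("digest mismatch for " + url)

    # Uncompress and place the contents in the target directory
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(extract_dir)

    # Clean up the downloaded archive
    os.remove(archive_path)
    return extract_dir
```

Even this small version has to handle the mismatch case and the cleanup, which is the kind of repetitive work a small library can take over.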
pip install pyratefacts
Create a JSON file with the list of artifacts you wish to download under the $“artifacts”$ entry. Optionally, you can specify a $“datafile”$, which is a file that records whether each artifact has already been downloaded. This is very useful if you are going to embed pyratefacts in another application or library and want to track which artifacts are available locally.
{
  "artifacts": [
    {
      "name": "SomeLargeFileOverInternet",
      "origin": "http",
      "url": "http://a_strange_place_in_the_internet/a_very_big_file.zip",
      "hash_type": "sha256",
      "digest": "5B3BD68C4F0639446C6E6C2E2B0018BAB3D3FFE387081A6030C9EF505115B433",
      "destiny": "/tmp/a_very_big_file.zip",
      "uncompress": true,
      "uncompress_dir": "/tmp/my_uncompressed_files_folder/",
      "teardown": "clear_destiny"
    }
  ],
  "datafile": "/tmp/datafile.pkl"
}
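The $“digest”$ value has to come from somewhere. One way to get it, sketched here with the standard library only, is to hash a trusted local copy of the file and write the result into the config. The helper names `sha256_of_file` and `update_digest` are hypothetical and not part of pyratefacts. The digest is written in uppercase to match the example above; whether the library also accepts lowercase is not something I can confirm.

```python
import hashlib
import json


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the uppercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def update_digest(config_path, artifact_name, local_copy):
    """Set hash_type and digest for one named artifact in an existing config file.

    Hypothetical helper: it only edits the JSON file and never calls pyratefacts.
    """
    with open(config_path) as handle:
        config = json.load(handle)

    for artifact in config["artifacts"]:
        if artifact["name"] == artifact_name:
            artifact["hash_type"] = "sha256"
            artifact["digest"] = sha256_of_file(local_copy)
            break
    else:
        # No artifact with that name was found in the config
        raise KeyError(artifact_name)

    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=2)
    return config
```

Reading in chunks keeps memory use flat, which matters for the very large files this kind of config is usually written for.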
To download your artifacts you will need a $pyratefacts.Manager$ object. There are two ways to download artifacts:
- Download each one by the name specified in your $artifacts.json$
- Download them all
import pyratefacts

artifacts_json_path = '/tmp/artifacts.json'
manager = pyratefacts.Manager(artifacts_json_path)

# download one file by artifact name
manager.prepare('SomeLargeFileOverInternet')

# download all files in artifacts.json
manager.prepare_all()

# store the current state of the files as viewed by the manager
manager.save_datafile()


# query whether an artifact is already available locally
def is_available(manager, name):
    for art in manager.artifact_list:
        if art.name == name:
            return art.available
    # no artifact with that name is listed
    return False


print(is_available(manager, 'SomeLargeFileOverInternet'))
print('done')
If the datafile already exists when artifacts.json is loaded, the manager retrieves the last saved state.
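Since the saved state tells you what is already available, a small wrapper can prepare only what is missing. This is a sketch that relies solely on the calls and attributes shown in the example above ($artifact_list$, $name$, $available$, $prepare$, $save_datafile$). The function `prepare_missing` is my own hypothetical helper, and I am assuming that $prepare$ updates $available$. It may well be that $prepare$ already skips available artifacts on its own, in which case the check is redundant but harmless.

```python
def prepare_missing(manager):
    """Prepare only artifacts not yet marked available, then save the state.

    `manager` is expected to behave like the pyratefacts.Manager used above.
    Returns the names of the artifacts that were prepared in this call.
    """
    prepared = []
    for art in manager.artifact_list:
        if not art.available:
            manager.prepare(art.name)
            prepared.append(art.name)

    # Persist the updated state so the next run can pick it up
    manager.save_datafile()
    return prepared
```

With the manager from the example above, calling `prepare_missing(manager)` at application startup would fetch whatever is not there yet and record the result in the datafile.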
Pyratefacts is available in my GitHub repository: https://github.com/fsan/pyratefacts
Feel free to contribute.
That’s it. Pretty simple, I guess. There is plenty of room for improvement, such as supporting S3, Blob, and other remote storage systems. If you have any suggestions or find any problems, open an issue in my GitHub repo for this project or send a pull request. See you!