I am likely to suspend further work on this tool.
For PB-scale data onboarding onto the Filecoin network, please consider using the Singularity tool instead. Moving forward, Singularity should be the optimal client-side data preparation tool.
Utility to package files for Filecoin deals. Before data is moved into storage, it performs file encryption, large file splitting, and CAR file generation. After data is retrieved from Filecoin, it performs CAR file extraction, large file reassembly, and file decryption.
Client-side tool for packaging large, potentially proprietary data sets for the Filecoin network. The objective is to reduce friction for data movement across both the online and offline deal paths.
- Standardizes the packaging toolset for large proprietary data set scenarios.
- Removes lower-level undifferentiated heavy lifting that the Filecoin ecosystem can reuse in multiple contexts, such as a "data storage broker", a "data storage concentrator" (e.g. Estuary), a Sneakernet provider, or a data client using the DIY offline path.
Packing:
- Split large files
- Encrypt files (RSA-AES asymmetric)
- Pack into CAR set
- Parallelism
Unpacking:
- Unpack from CAR set
- Decrypt files
- Join split files into original file
- Restore files into the filesystem.
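The splitting and reassembly steps are not specified in detail here. As a minimal sketch, assuming plain fixed-size splitting into consecutive byte ranges (the `--filemaxsize` option suggests a per-part size cap), splitting and rejoining could look like the following. The function names and the `.partNNNNN` naming scheme are hypothetical, not Packer's actual implementation.

```python
import os
import shutil
import tempfile
from pathlib import Path


def split_file(src: Path, out_dir: Path, max_size: int, buf_size: int = 1 << 20) -> list[Path]:
    """Split src into consecutive parts of at most max_size bytes, streaming in buf_size blocks."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with src.open("rb") as f:
        while True:
            block = f.read(min(buf_size, max_size))
            if not block:  # end of file: no more parts to write
                break
            part = out_dir / f"{src.name}.part{len(parts):05d}"  # hypothetical naming scheme
            with part.open("wb") as out:
                written = 0
                while block:
                    out.write(block)
                    written += len(block)
                    if written >= max_size:  # this part is full
                        break
                    block = f.read(min(buf_size, max_size - written))
            parts.append(part)
    return parts


def join_files(parts: list[Path], dest: Path) -> None:
    """Concatenate parts, in order, back into the original file."""
    with dest.open("wb") as out:
        for part in parts:
            with part.open("rb") as f:
                shutil.copyfileobj(f, out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "big.bin"
        src.write_bytes(os.urandom(2500))
        parts = split_file(src, root / "parts", max_size=1000)
        join_files(parts, root / "restored.bin")
        print(len(parts), (root / "restored.bin").read_bytes() == src.read_bytes())  # 3 True
```

Streaming in `buf_size` blocks keeps memory bounded even at the 1 GiB default part size; rejoining simply concatenates the parts in order.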
Currently supported filesystems:
- POSIX NFS / DASD file system.
Future/Backlog options to support additional sources, particularly cloud object storage:
- Amazon S3, and S3-compatible cloud object storage.
- Azure Blob Storage
- Google Cloud Storage
- Alibaba Cloud Object Storage Service
- Huawei Cloud Object Storage Service
Cryptographic methods:
- RSA AES CBC (keypair)
Cryptographic methods TODO:
- RSA AES CBC (symmetric)
- GnuPG
Initial testing suggests a Packer packing rate of approximately 120 GiB/hr (2,880 GiB/day, about 2.8 TiB/day) on a single AWS EC2 r5d.2xlarge instance (8 vCPU, 64GB memory, 1x 300 GiB NVMe), with EFS input of 100GB files, output to EBS, and RSA-AES encryption.
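For capacity planning, the measured rate can be extrapolated with simple arithmetic. This sketch assumes throughput stays constant and scales linearly with data volume on the same instance type and storage setup, which a single test does not establish; the helper names are hypothetical.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4
PIB = 1024 ** 5


def tib_per_day(rate_gib_per_hour: float) -> float:
    """Convert an hourly GiB rate into TiB per 24-hour day."""
    return rate_gib_per_hour * 24 * GIB / TIB


def packing_hours(data_bytes: int, rate_gib_per_hour: float = 120.0) -> float:
    """Estimated hours to pack data_bytes on one instance, assuming a constant rate."""
    return data_bytes / (rate_gib_per_hour * GIB)


if __name__ == "__main__":
    print(tib_per_day(120))  # 2880 GiB/day -> 2.8125
    hours = packing_hours(PIB)
    print(f"1 PiB on one instance: {hours:.0f} h (~{hours / 24:.0f} days)")
```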
If you are more interested in automating CAR set generation from AWS EFS or AWS S3, see the quickstart docs. As a convenience, a quick-start is available to launch an EC2 appliance, configure Packer, execute a packing job, and host the packed CAR set on a web server.
To get started,
- Launch the quick-start stack into your AWS console using the CloudFormation template
- Convenience: Launch the quick-start into AWS Singapore Region
usage: python packer.py [-h] [-p | -u] [-s SOURCE] [-t TMP] [-o OUTPUT] [-k KEY] [-b BINSIZE] [-m FILEMAXSIZE] [-j JOBS]
Filecoin filesystem packager/unpackager
options:
-h, --help show this help message and exit
-p, --pack Pack mode
-u, --unpack Unpack mode
-s SOURCE, --source SOURCE
In Pack mode, the path to the original source data. In Unpack mode, the path containing CAR files.
-t TMP, --tmp TMP Path to temporary staging directory. Currently, required temp size > 1x of source data size.
-o OUTPUT, --output OUTPUT
Path to write output of packaged or unpackaged content.
-k KEY, --key KEY RSA Cryptographic Key or Certificate
-b BINSIZE, --binsize BINSIZE
[optional] Bin size bytes (default: 32000000000)
-m FILEMAXSIZE, --filemaxsize FILEMAXSIZE
[optional] File max size bytes (default: 1073741824)
-j JOBS, --jobs JOBS [optional] Job concurrency suggestion (default: 1)
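The help text does not say how files are grouped into bins of `BINSIZE` bytes. Assuming files larger than `FILEMAXSIZE` are first split into `FILEMAXSIZE`-sized pieces and the pieces are then grouped with a greedy first-fit-decreasing heuristic (a common bin-packing approach, not necessarily what Packer does), the grouping could be sketched as follows. All function names are hypothetical.

```python
# Defaults taken from the --binsize and --filemaxsize help text.
BINSIZE = 32_000_000_000
FILEMAXSIZE = 1_073_741_824

Item = tuple[str, int]  # (name, size in bytes)


def pieces(name: str, size: int, max_size: int = FILEMAXSIZE) -> list[Item]:
    """Split one file size into pieces no larger than max_size (assumed behaviour)."""
    if size <= max_size:
        return [(name, size)]
    count = -(-size // max_size)  # ceiling division without floats
    return [(f"{name}.part{i}", min(max_size, size - i * max_size)) for i in range(count)]


def first_fit_decreasing(files: dict[str, int], bin_size: int = BINSIZE,
                         max_size: int = FILEMAXSIZE) -> list[list[Item]]:
    """Greedily place pieces, largest first, into the first bin with room."""
    if max_size > bin_size:
        raise ValueError("max_size must not exceed bin_size")
    items = [p for name, size in files.items() for p in pieces(name, size, max_size)]
    items.sort(key=lambda item: item[1], reverse=True)
    bins: list[list[Item]] = []
    free: list[int] = []  # remaining capacity of each bin
    for item in items:
        for i, space in enumerate(free):
            if item[1] <= space:
                bins[i].append(item)
                free[i] -= item[1]
                break
        else:  # no existing bin had room: open a new one
            bins.append([item])
            free.append(bin_size - item[1])
    return bins


if __name__ == "__main__":
    bins = first_fit_decreasing({"a.bin": 100_000_000_000})
    print(len(bins), [len(b) for b in bins])
```

With the defaults, a single 100 GB file yields 94 pieces that fit into 4 bins, since at most 29 full 1 GiB pieces fit under the 32,000,000,000-byte bin size.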
Dependencies:
- Linux or macOS (tested on Ubuntu 20 and macOS)
- Python 3.10+, pip
- Node.js 16+
- ipfs-car
- rsync
- openssl
- stream-commp
Refer to the CloudFormation YAML file for Ubuntu install commands.
Clone this repo.
make init_testdata
make test
Packer currently uses RSA AES encryption. Bring your own keys, or generate a key pair (explained below).
Users are responsible for following key management best practices; please store your keys securely.
Interactive:
openssl req -x509 -nodes -days 36500 -newkey rsa:2048 -keyout private_key.pem -out certificate.pem
Non-interactive:
openssl req -x509 -nodes -days 36500 -newkey rsa:2048 -keyout private_key.pem -out certificate.pem -subj "/C=ZZ/O=protocol.ai/OU=outercore/CN=packer"
TODO:
- AWS Packer AMI with CloudFormation template using IAM instance profile for EFS use-case, on-prem NFS via DX use-case, S3 use-case.
- Manifest file containing file-car-CommPCID mappings.
- Preserve file mtime in Manifest file. (Prereq for future incremental backup use-case?)
- Toggle encryption on/off.
- S3 support (probably rclone).
- Additional cryptographic methods: RSA-AES symmetric; GnuPG
- Filename/dirname obfuscation/encryption. (current implementation preserves cleartext path names in the CAR)
- Metrics for job progress.
- Compression.
See issues.
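The manifest backlog items could take many forms. As an illustration only, assuming a JSON file with one entry per source file, an entry might record the relative path, size, mtime, the CAR file it landed in, and that CAR's CommP CID. Packer does not currently emit such a file, and every field name here is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def manifest_entry(path: Path, root: Path, car_name: str, commp_cid: Optional[str] = None) -> dict:
    """Describe one source file: relative path, size, mtime, and where it was packed."""
    st = path.stat()
    return {
        "path": path.relative_to(root).as_posix(),
        "size": st.st_size,
        "mtime": st.st_mtime,    # kept here because CAR files discard POSIX mtime
        "car": car_name,         # hypothetical: name of the CAR file holding this file
        "commp_cid": commp_cid,  # hypothetical: CommP piece CID of that CAR, if computed
    }


def write_manifest(entries: list, dest: Path) -> None:
    dest.write_text(json.dumps(entries, indent=2))


def read_manifest(src: Path) -> list:
    return json.loads(src.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        f = root / "data" / "a.txt"
        f.parent.mkdir()
        f.write_text("hello")
        os.utime(f, (1_600_000_000, 1_600_000_000))
        write_manifest([manifest_entry(f, root, "bin-0.car")], root / "manifest.json")
        print(read_manifest(root / "manifest.json"))
```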
- Path names are stored in clear text in CAR files, although individual files are encrypted. The user is responsible for ensuring that source filesystem path names are hidden (e.g. by preprocessing all data into TAR files), obfuscated, or otherwise free of private information.
- POSIX metadata (e.g. mtime) is discarded, since CAR files do not preserve file metadata. The workaround is for the user to preprocess all data into TAR files (the same workaround as for path name privacy).
- Packer and the Packer Appliance do NOT push data to Filecoin Storage Providers or to gateways such as Estuary. Packer does NOT prescribe or constrain the downstream data transfer process; flexibility and pluggability are design tenets of Packer.
- Packer and the Packer Appliance do NOT make Filecoin deals. Packer's primary purpose is packaging files into Content ARchives (CAR). Deals require the Lotus client and can be executed by a downstream process. Packer does NOT prescribe or constrain the downstream dealmaking process.
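The TAR workaround for path name privacy and mtime preservation can be done with Python's standard `tarfile` module before running Packer. This is a minimal sketch, assuming one TAR per top-level directory; the function names and the `bundle-0001.tar` file name are hypothetical. Because the TAR file's own name is still stored in clear text in the CAR, a non-descriptive name is used.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(src_dir: Path, tar_path: Path) -> Path:
    """Pack src_dir into one TAR; member path names and mtimes live inside it."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=src_dir.name)  # adds directories recursively by default
    return tar_path


def restore_bundle(tar_path: Path, dest_dir: Path) -> None:
    """Extract a bundle, restoring path names and mtimes under dest_dir."""
    with tarfile.open(tar_path, "r") as tar:
        if hasattr(tarfile, "data_filter"):  # safer extraction where this Python supports it
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "project" / "sub"
        src.mkdir(parents=True)
        (src / "a.txt").write_text("hello")
        os.utime(src / "a.txt", (1_600_000_000, 1_600_000_000))
        bundle = bundle_directory(root / "project", root / "bundle-0001.tar")
        restore_bundle(bundle, root / "restored")
        print((root / "restored" / "project" / "sub" / "a.txt").stat().st_mtime)
```

`tarfile` records each member's path name and mtime inside the archive, so both are covered by Packer's per-file encryption and come back on extraction; the `data` extraction filter, where available, rejects unsafe member paths.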
This project is licensed under the terms of the MIT license.