From a5f74f1b1618863b8489bd6fede8222cb9e6d400 Mon Sep 17 00:00:00 2001
From: Marc Vertes
Date: Thu, 3 Oct 2024 22:31:22 +0200
Subject: add unflatenc and unchunkify

---
 README.md | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 50604d0..68d9a2b 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 Incremental encrypted backup system
 
-## Current design
+## design v0
 
 1. cksum original (sha256)
 2. compress (gzip)
@@ -17,17 +17,45 @@ Good:
 - chunks are named from their compressed/crypted hmac.
 
 Problems:
-- the salt (or iv in aes) must be set to 0. Weak encryption.
+- the salt (or iv in aes) must be static, to make the encryption
+  idempotent, otherwise no dedup. Weak encryption.
 - dedup occurs only for append only files. The same chunk content
   will lead to a different hmac if located at a different offset.
 
-To fix:
+## design v1
+
 - chunk before compression
-- name chunks from cksum of uncompressed/unencrypted data.
+- name chunks from checksum of uncompressed/unencrypted data (invariant => allows dedup).
 - then compress and encrypt (in this order).
 
 Chunk encryption can use randomized cipher, but a hmac must be added at end of
 file (before encrypt) to check integrity without having to decrypt/decompress.
+This is achieved through aes-gcm.
+
+Problems:
+- possible collisions of chunks with same name (same content) but encrypted
+  with a foreign key (different user), which would cause a user to download a
+  block which they could not decrypt.
+
+## design v2
+
+Each user has a fixed unique id: random 96 bits (12 bytes). This id is added
+to the content of each block / file prior to computing the invariant checksum
+but is not transmitted (no storage overhead).
+
+This avoids collisions between blocks with the same original content from different
+users. Dedup should only happen within the same user space, as one cannot decrypt
+a block from another user.
+
+Problems:
+- in this design, and all previous ones, there is no way to discard data in an
+  archive. For example, tarsnap does not allow data to be deleted.
+
+## Roadmap
+
+discarded:
+- encode checksums in base64 instead of hex. Wrong idea: incompatible with case
+  insensitive filesystems (macOS).
 
 ## What tarsnap is doing
 
@@ -37,7 +65,6 @@ file (before encrypt) to check integrity without having to decrypt/decompress.
 4. compress chunk (deflate)
 5. encrypt chunk (rsa2048) + HMAC
 
-
 ## References
 
 - tarsnap: https://www.tarsnap.com https://github.com/tarsnap/tarsnap
-- 
cgit v1.2.3
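
Note on the patch: the chunk naming described in design v1 and v2 can be sketched as follows. This is only an illustration, not the project's code. The function names (`chunk_name`, `split`, `backup`, `restore`), the fixed-size chunking, the tiny `CHUNK_SIZE`, and prepending the user id to the chunk are all assumptions; the README only says the id is "added to the content". Encryption is left out because the Python standard library has no AES-GCM; it would be applied to the compressed chunk before storing it.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the demo is easy to follow


def chunk_name(user_id, chunk):
    """Name a chunk from the SHA-256 of (user id + plaintext chunk), in hex.

    The name depends only on the user and the uncompressed, unencrypted
    content, so it is invariant across offsets and across backups.
    """
    if len(user_id) != 12:
        raise ValueError("user id must be 12 bytes (96 bits)")
    # Assumption: the id is prepended. It is hashed but never stored.
    return hashlib.sha256(user_id + chunk).hexdigest()


def split(data, size=CHUNK_SIZE):
    """Cut data into fixed-size chunks (assumed chunking strategy)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(user_id, data, store):
    """Chunk first, name from plaintext, then compress. Returns chunk names."""
    names = []
    for chunk in split(data):
        name = chunk_name(user_id, chunk)
        if name not in store:  # dedup: an already known chunk is not stored again
            # A randomized cipher (e.g. AES-GCM) would wrap this value.
            store[name] = zlib.compress(chunk)
        names.append(name)
    return names


def restore(names, store):
    """Rebuild the original data from the ordered list of chunk names."""
    return b"".join(zlib.decompress(store[n]) for n in names)


if __name__ == "__main__":
    alice = bytes(range(12))
    store = {}
    names = backup(alice, b"aaaabbbbaaaa", store)
    # The first and last chunks have the same content, so they share a name.
    print(names[0] == names[2], len(store))
    print(restore(names, store))
```

Because the name is computed before compression and encryption, the same content at a different offset maps to the same name, which is the dedup property design v0 lacked. Mixing in the per-user id keeps two users with identical content from sharing a name.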