diff options
| author | Marc Vertes <mvertes@free.fr> | 2024-10-03 22:31:22 +0200 |
|---|---|---|
| committer | Marc Vertes <mvertes@free.fr> | 2024-10-03 22:31:22 +0200 |
| commit | a5f74f1b1618863b8489bd6fede8222cb9e6d400 (patch) | |
| tree | e19d599f8eb8cf9398934228d7eda47efa08e63e /README.md | |
| parent | 282149e530d1d19fc9903b0a688de5b794540f48 (diff) | |
add unflatenc and unchunkify
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 37 |
1 files changed, 32 insertions, 5 deletions
@@ -2,7 +2,7 @@ Incremental encrypted backup system -## Current design +## design v0 1. cksum original (sha256) 2. compress (gzip) @@ -17,17 +17,45 @@ Good: - chunks are named from their compressed/crypted hmac. Problems: -- the salt (or iv in aes) must be set to 0. Weak encryption. +- the salt (or iv in aes) must be static, to make the encryption + idempotent, otherwise no dedup. Weak encryption. - dedup occurs only for append only files. The same chunk content will lead to a different hmac if located at a different offset. -To fix: +## design v1 + - chunk before compression -- name chunks from cksum of uncompressed/unencrypted data. +- name chunks from checksum of uncompressed/unencrypted data (invariant => allow dedup). - then compress and encrypt (in this order). Chunk encryption can use randomized cipher, but a hmac must be added at end of file (before encrypt) to check integrity without having to decrypt/decompress. +This is achieved through aes-gcm. + +Problems: +- possible collisions of chunks with same name (same content) but encrypted + with a foreign key (different user), which would a user to download a block + which he could not decrypt. + +## design v2 + +Each user has a fixed unique id: random 96 bits (12 bytes). This id is added +to the content of each block / file prior to compute the invariant checksum +but is not transmitted (no storage overhead). + +It allows to avoid collisions between same original content blocks in different +users. Dedup should only happen in the same user space, as one can not decrypt +a block from another user. + +Problems: +- in this design, and all previous ones, there is no way to disgard data in an + archive. For example, tarsnap does not allow to suppress data. + +## Roadmap + +disgarded: +- encode checksums in base64 instead of hex. Wrong idea: incompatible with case + insensitive filesystems (macos). ## What tarsnap is doing @@ -37,7 +65,6 @@ file (before encrypt) to check integrity without having to decrypt/decompress. 4. compress chunk (deflate) 5. encrypt chunk (rsa2048) + HMAC - ## References - tarsnap: https://www.tarsnap.com https://github.com/tarsnap/tarsnap |
