[personal profile] mjg59
First off - nothing I'm going to talk about in this post is novel or overly surprising, I just haven't found a clear writeup of it before. I'm not criticising any design decisions or claiming this is an important issue, just raising something that people might otherwise be unaware of.

With that out of the way: Automatic deduplication of data is a feature of modern filesystems like zfs and btrfs. It takes two forms - inline, where the filesystem detects that data being written to disk is identical to data that already exists on disk and simply references the existing copy rather than writing a second one, and offline, where tooling retroactively identifies duplicated data and removes the duplicate copies (zfs supports inline deduplication, btrfs currently supports only offline). In a world where disks end up with multiple copies of cloud or container images, deduplication can free up significant amounts of disk space.

What's the security implication? The problem is that deduplication doesn't recognise ownership - if two users have copies of the same file, only one copy of the file will be stored[1]. So, if user a stores a file, the amount of free space will decrease. If user b stores another copy of the same file, the amount of free space will remain the same. If user b is able to check how much free space is available, user b can determine whether the file already exists.
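To make that concrete, here's a rough sketch of what user b's check could look like. It's an illustration under assumptions rather than a working exploit: the helper names and the one-half threshold are made up for this example, and on a real system the reported free space may be updated lazily and will be noisy if anything else is writing to the filesystem.

```python
import os
import shutil
import tempfile


def free_bytes(directory):
    """Return the free space reported for the filesystem holding `directory`."""
    return shutil.disk_usage(directory).free


def space_consumed_by_write(directory, data, probe=free_bytes):
    """Write `data` to a temporary file in `directory` and return how far the
    reported free space dropped. The file is removed again afterwards."""
    before = probe(directory)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
        after = probe(directory)
    finally:
        os.remove(path)
    return before - after


def probably_already_stored(directory, data, probe=free_bytes):
    """Guess that a copy already exists if the write consumed less than half
    of its own size. The threshold is an arbitrary assumption for this sketch."""
    return space_consumed_by_write(directory, data, probe) < len(data) // 2
```

The `probe` parameter only exists so the measurement can be swapped out (for a fake in tests, or for a filesystem-specific accounting tool); the default simply asks the operating system how much space is free.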

This doesn't seem like a huge deal in most cases, but it is a violation of expected behaviour (if user b doesn't have permission to read user a's files, user b shouldn't be able to determine whether user a has a specific file). But we can come up with some convoluted cases where it becomes more relevant, such as law enforcement gaining unprivileged access to a system and then being able to demonstrate that a specific file already exists on that system. Perhaps more interestingly, it's been demonstrated that free space isn't the only sidechannel exposed by deduplication - deduplication has an impact on access timing, and can be used to infer the existence of data across virtual machine boundaries.

As I said, this is almost certainly not something that matters in most real world scenarios. But with so much discussion of CPU sidechannels over the past couple of years, it's interesting to think about what other features also end up leaking information in ways that may not be obvious.

(Edit to add: deduplication isn't enabled on zfs by default and has to be explicitly triggered on btrfs, so this only affects you if you've turned it on)

[1] Deduplication is usually done at the block level rather than the file level, but given zfs's support for variable sized blocks, identical files should be deduplicated even if they're smaller than the maximum record size

Tahoe LAFS information confirmation attack

Date: 2020-07-28 01:31 am (UTC)
From: [personal profile] jsgf

This came up in Tahoe's use of convergent encryption, which allows you to confirm some missing information. For example, if you know someone has a PDF template of a form which you know everything about except an SSN field, you can generate forms with all the SSNs and look for collisions, thereby confirming their SSN.

In this case, you could perform the same attack but look for dedups.
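A minimal sketch of that confirmation loop, under some loud assumptions: the form is a plain byte template with a single `{SSN}` placeholder, SHA-256 stands in for whatever content-derived identifier the storage system uses, and `is_stored` is a hypothetical oracle (a collision check in the convergent encryption case, a dedup observation in the filesystem case). None of these names come from Tahoe itself.

```python
import hashlib


def render_form(template, ssn):
    """Fill the single unknown field of a hypothetical byte template."""
    return template.replace(b"{SSN}", ssn.encode("ascii"))


def content_id(data):
    """Stand-in for a content-derived identifier; SHA-256 is assumed here."""
    return hashlib.sha256(data).hexdigest()


def confirm_field(template, candidates, is_stored):
    """Return the first candidate whose rendered form the oracle reports as
    already stored, or None if no candidate matches."""
    for candidate in candidates:
        if is_stored(content_id(render_form(template, candidate))):
            return candidate
    return None
```

The attacker never reads the victim's file. They only need to enumerate the unknown field and ask a yes/no question about each guess, which is why a small search space like an SSN is the worrying case.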

Edited (typos) Date: 2020-07-28 01:32 am (UTC)

Re: Tahoe LAFS information confirmation attack

Date: 2020-08-16 04:25 am (UTC)
From: (Anonymous)
It's fractionally worse than that: if a file extends across multiple de-duplication blocks and only some part of it changes, then you can confirm its existence without having to figure out the changing bit, by detecting the de-duplication of one of the unchanging blocks.

So, using your example, if the SSN is in the last few bytes of a PDF containing a couple of big images that make it large enough to span multiple de-duplication blocks, you can infer its existence by detecting the de-duplication of the first block of the PDF.
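As a rough illustration of why that works, the sketch below splits two byte strings into fixed-size blocks and reports which positions hold identical blocks. The block size and the use of SHA-256 are assumptions made for the example; a real filesystem picks its own block boundaries and checksum.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def block_digests(data, block_size=BLOCK_SIZE):
    """Split `data` into fixed-size blocks and return one SHA-256 digest each."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def shared_blocks(a, b, block_size=BLOCK_SIZE):
    """Return the indices of blocks that are identical at the same position
    in both byte strings."""
    return [
        index
        for index, (x, y) in enumerate(
            zip(block_digests(a, block_size), block_digests(b, block_size))
        )
        if x == y
    ]
```

Two forms that differ only in their final bytes share every block except the last, so observing dedup on any of the leading blocks is enough to confirm that some version of the form is present.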

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.
