Plain Binary: Detection of filesystem patterns in encrypted containers

As part of my latest journey into VeraCrypt, I noticed a few characteristics of containers whose volumes have been modified. The tests here were done with VeraCrypt 1.22 on Windows using default settings, but the findings apply equally well to Linux. In order to spot these characteristics, you need at least 2 different samples of the container to analyze. With only 1 sample, you will most likely not be able to observe them. Of course, if you are provided with a password to the container you can normally just decrypt it and get the data. But you may not have access to any hidden volume in there, and might still be wondering if a hidden volume could be present. By using this method, you may be able to do the following with a little more confidence, even without the decryption key:

Say that a hidden volume is likely to be present in that container.
Say that a hidden volume is not likely to be found in that container.
Say that a chunk of random looking data might be an encrypted container with a filesystem such as NTFS or FAT.
Provide a point in time when the content within the encrypted container was modified, even though timestamps within the encrypted volume have been tampered with.
Say that the mode of operation used in the encryption process seems to be XTS and not CBC (or vice versa).

After the last round of playing with bitmaps in http://plainbinary.blogspot.com/2017/08/more-playing-with-veracrypt-and-hybrid.html it quickly became evident that a visual representation of the data in a bmp could come in handy. After some more playing and looking at diffs of containers before and after content was modified, it suddenly got a whole lot more interesting. At this point it is appropriate to mention that I started searching for previous research on the topic, and found a few references. First and foremost, the documentation itself has a warning. See the first bullet point at https://www.veracrypt.fr/en/Security%20Requirements%20for%20Hidden%20Volumes.html which briefly explains the security issue with exposing more than 1 sample of the container from different points in time. And some more significant references relating to this:

Find a second version of the container

Finding a second version may not be easy, or possible at all. But some places where one can be found are:

Backups or older copies.
Leftovers, such as in unallocated space on the filesystem.
Volume shadow copies.

Create the image

For this post I prepared a tool called MakeImage that will transform any piece of binary data into a bmp that can be visually inspected in a proper graphics viewer. Grab the tool at https://github.com/jschicht/MakeImage . The tool lets you specify the width of the image. It is hardcoded to use ARGB 32 bits per pixel in the bmp. What we will do is create a bitmap representing the disk, and we need each line of the bitmap to be one sector wide. Since each pixel is 4 bytes, the width must be 512/4=128 pixels (X). Now we have 1 sector from disk per line in the image. For the bytes to display properly in the image, the bmp is constructed such that the lines over Y are flipped upside down, whereas the bytes per line on X are preserved in original order. The resulting image may thus be quite large (in height), and not all programs will display it properly. This is an example of what we will be dealing with:
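To make the layout concrete, here is a minimal Python sketch of the same idea. It is not the MakeImage source, just my reading of the description above: 32 bits per pixel, 128 pixels per line, and the lines written in reverse order because a bmp with a positive height stores its bottom line first. The function name `make_bmp` and the zero padding of an incomplete last line are assumptions for illustration.

```python
import struct

SECTOR_SIZE = 512
BYTES_PER_PIXEL = 4  # 32 bits per pixel, as described in the post


def make_bmp(data: bytes, width_px: int = SECTOR_SIZE // BYTES_PER_PIXEL) -> bytes:
    """Return a 32 bpp BMP where each image line holds width_px * 4 input bytes."""
    row_bytes = width_px * BYTES_PER_PIXEL
    # Assumption: pad the tail with zeros so the last line is complete.
    if len(data) % row_bytes:
        data += b"\x00" * (row_bytes - len(data) % row_bytes)
    height = len(data) // row_bytes
    rows = [data[i:i + row_bytes] for i in range(0, len(data), row_bytes)]
    # A positive height means the file stores the bottom line first, so the
    # lines are written in reverse to keep sector 0 at the top (Y = 0).
    pixels = b"".join(reversed(rows))
    header_size = 14 + 40  # BITMAPFILEHEADER + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", header_size + len(pixels), 0, 0, header_size)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width_px, height,  # header size, width, height (bottom-up)
        1, 32, 0,              # planes, bits per pixel, no compression
        len(pixels), 2835, 2835, 0, 0,
    )
    return file_header + info_header + pixels
```

Writing the result with `open("container.bmp", "wb").write(make_bmp(open("container.hc", "rb").read()))` would give one sector per line; both file names are placeholders.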

That sample is just 4 MB in size and well suited for visual inspection. Not much can be deduced from that file when looking at it with no earlier version to compare with. It just looks like random data.

Produce a diff

Now we will mount (decrypt) the volume and store a file in there. Then unmount the volume and let MakeImage create another bitmap that we can work on. Next, let us compare the images and produce a diff. A nice, easy and free tool for this diffing task is the compare component from ImageMagick. No need to reinvent the wheel here. Download the package from https://www.imagemagick.org/download/binaries/ImageMagick-7.0.6-7-portable-Q16-x64.zip and use a command like this:

compare image1.bmp image2.bmp diff.bmp

The resulting output file diff.bmp will have bright red pixels wherever the 2 images differ.

Identification of filesystem pattern

This is in itself not overly impressive, but merely a nice way of letting us visually analyze the difference. If we were to look at a bunch of numbers, it is sometimes hard to see properly what is there. Especially when dealing with a diff of encrypted bytes, where we don't have concrete things like size and timestamp to relate bytes to. Instead, we are left with the possibility of spotting a vague pattern. The identification of such a pattern could be solved programmatically, but that will not be covered here, at least not this time. So what kind of pattern is there to spot? Let's first briefly go through how granular the diff can be. Since our pixels are built of 4 bytes, we can't be more granular in the visual inspection than saying that at least 1 of these 4 bytes has changed. It turns out that VeraCrypt in the default setting will encrypt blocks of 0x10 bytes. That makes perfect sense as the tool reports a block size of 128 bits:

That is of course expected, see https://www.veracrypt.fr/en/Encryption%20Algorithms.html . Therefore our 4 bytes per pixel restriction is not the determining factor for how specific we can be. Said differently, when looking at the diff of the encrypted container, we will see blocks of 4 red pixels (16 bytes) where data has changed. If there was a need to be more granular with the diff, we could have used 1 byte per pixel and a width of 512 pixels (X).
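The same diff can also be computed directly on the container bytes instead of on the images. The following is a minimal sketch under the assumptions stated above (512 byte sectors, 16 byte cipher blocks, 4 bytes per pixel). The function name `changed_pixel_spans` is made up for this example.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 16       # cipher block size (128 bits), per the post
BYTES_PER_PIXEL = 4   # 32 bits per pixel


def changed_pixel_spans(old: bytes, new: bytes):
    """Return (y, x_first, x_last) for every 16-byte block that differs."""
    if len(old) != len(new):
        raise ValueError("both samples must have the same size")
    spans = []
    for offset in range(0, len(old), BLOCK_SIZE):
        if old[offset:offset + BLOCK_SIZE] != new[offset:offset + BLOCK_SIZE]:
            y = offset // SECTOR_SIZE                          # one sector per line
            x_first = (offset % SECTOR_SIZE) // BYTES_PER_PIXEL
            x_last = x_first + BLOCK_SIZE // BYTES_PER_PIXEL - 1
            spans.append((y, x_first, x_last))
    return spans
```

A single changed byte at offset 0x28 of the second sector is reported as `(1, 8, 11)`, that is 4 red pixels starting at X = 8 on line Y = 1.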

FAT

We will investigate first the encrypted diff and then the unencrypted diff, and observe the similarity. This cropped image is from the encrypted diff taken before and after a file was copied into the FAT volume:

The first chunk of red pixels is seen at Y pixel 258 (pixels start at 0). That would mean sector number 259 of the container. Remember the VeraCrypt specification that says the standard header is from offset 0 - 0x10000 and the header for the hidden volume from offset 0x10000 - 0x20000. So the 2 headers occupy sectors 1 - 256 (Y pixels 0 - 255), which means that for the unencrypted volume the first 2 sectors are untouched, and the first modification is within the third sector. Let us take a closer look at the image above. Here is a crop from Y pixel 256 (and zoomed in 400%), which should be the start of the unencrypted volume:
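The arithmetic above can be written down as a small helper. This is only a sketch of the calculation for lines in the data area of the standard volume. The name `volume_sector` is hypothetical, and the backup headers at the end of the container are not handled.

```python
SECTOR_SIZE = 512
HEADER_AREA = 0x20000                      # standard + hidden volume header, per the post
HEADER_ROWS = HEADER_AREA // SECTOR_SIZE   # 256 image lines


def volume_sector(y: int) -> int:
    """Zero-based sector inside the decrypted volume for container image line y."""
    if y < HEADER_ROWS:
        raise ValueError("line lies inside the header area")
    return y - HEADER_ROWS
```

For Y pixel 258 this gives zero-based sector 2, which is the third sector of the unencrypted volume.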

The next thing I did was to produce dd images of both the unencrypted volumes, produce bitmaps of them and then generate a diff. This is the diff of the same region, but cropped from Y pixel 0, since there is no VeraCrypt header in the decrypted volumes;

We can immediately spot the clear similarity. Now let's take a look at what changed on the FAT volume specifically. The short version is that the first 2 chunks are for FAT 1 and FAT 2, while the third is for the root directory. It seems the location of these 3 pieces, as well as the distance between them, depends heavily on the size of the volume, the cluster size and the FAT version. It may not be possible to conclude on volume size by just looking at the diff. But we may be able to identify FAT 1 and FAT 2 based on their identical change, on FAT 1 being close to the volume start, and on their distance apart (in sectors) being a multiple of 2. Here's some text applied to the encrypted diff.

Even though changes on FAT may not be very clear in pattern, it could still be determined based on elimination.
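As a rough illustration of the FAT 1 and FAT 2 observation, the sketch below looks for pairs of changed sectors that show the same set of changed 16 byte blocks and sit an even number of sectors apart. This is only my heuristic reading of the description above, not a tested detector, and both function names are made up.

```python
from itertools import combinations

SECTOR_SIZE = 512
BLOCK_SIZE = 16  # cipher block size, per the post


def sector_masks(old: bytes, new: bytes) -> dict:
    """Map sector number -> bitmask of the 16-byte blocks that changed in it."""
    masks = {}
    for offset in range(0, min(len(old), len(new)), BLOCK_SIZE):
        if old[offset:offset + BLOCK_SIZE] != new[offset:offset + BLOCK_SIZE]:
            sector = offset // SECTOR_SIZE
            bit = (offset % SECTOR_SIZE) // BLOCK_SIZE
            masks[sector] = masks.get(sector, 0) | (1 << bit)
    return masks


def fat_copy_candidates(masks: dict) -> list:
    """Pairs of sectors with identical change masks an even distance apart."""
    return [
        (a, b)
        for a, b in combinations(sorted(masks), 2)
        if masks[a] == masks[b] and (b - a) % 2 == 0
    ]
```

On a real container the candidate list would still need to be narrowed down by hand, for example by keeping only pairs close to the volume start.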

Physical location of standard vs hidden volume

Now let's take a step back and create a container with both a standard and a hidden volume. Same size of 4 MB for the standard (outer) volume, and 500 KB for the hidden volume within the standard volume. Then I put a file into both volumes and made a diff. This image is scaled down so it fits within a reasonable window, and serves to illustrate the relative location of the standard vs hidden volume within the container:

No need to show the diff for the standard volume as it is almost identical to the diffs presented above. We will instead look at the diff for the hidden volume. Here's a crop of the hidden part zoomed in;

It clearly has some similarity to the standard volume observation. The main difference is that the hidden volume changes were found towards the end of the container. That's just how it needs to be as it resides within unallocated randomized data, and the further out in the file the hidden volume is placed, the less likely it is to be overwritten. Based on the pattern found and the fact that the changes were found towards the end of the container, it is likely a hidden volume. If the standard volume has been unlocked and the changes map to unallocated space within the standard volume, it is very likely a hidden volume.

Identification of pattern for NTFS

The FAT filesystem is less complex than NTFS, and will show much less activity and fewer diff changes than NTFS. It might therefore be easier to spot an NTFS pattern than a FAT pattern. But that's up to the judging eye. I myself have more experience with NTFS analysis and will try to show some clear patterns that can be explained. The images are generated from volumes that have either a single file copied or a single file modified. But even then, the chunks of red are significant. There are more changes to NTFS than are shown in the images below. Take a look at this cropped image part, which is zoomed in 500%:

As you can see there are some patterns to spot. Some of these are very specific to NTFS. Here's some text applied to the same image part. Let me try to explain the image.

First a brief explanation of the abbreviations;

LSN: The $LogFile sequence number in the MFT record header. It reflects the last FS transaction that concerned a given file.
USA: Update sequence array. This is the array of bytes that is checked and replaced during fixups. Heavily used.
SI-TS: $STANDARD_INFORMATION timestamps. These timestamps are 4 x 8 bytes. That is why the red chunk is wider than the others. Few of these timestamps are updated regularly.
USN: Update sequence number found in $STANDARD_INFORMATION. Used with UsnJrnl. Think of it as LSN with $LogFile. Updated regularly when UsnJrnl is active.
FIXUPS: The 2 bytes at end of sector that are replaced with the value found in USA. Heavily used.

The strongest indicator is probably the column of red on the right side. These are the fixups that NTFS uses for internal integrity checks in various metafiles, including MFT records and INDX's. This 2 byte field is located at the end of each sector. The fixups are regenerated on every remount of the volume, which is why we see them so clearly.

INDX

The first chunk is an INDX. We can deduce that since the height is 8 pixels, and INDX's are 8 sectors in size (4096 bytes) with fixups on each sector. The USA is normally located at offset 0x28 in INDX. Since changes show up as whole 16 byte cipher blocks, the block holding the USA starts at offset 0x20, which explains why we have 8 unmodified pixels (32 bytes) before it.
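The fixup column lends itself to a simple check: the last 2 bytes of a sector fall inside the last 16 byte block of that sector, so we can look for runs of consecutive sectors where that block changed. Below is a minimal sketch with a made-up function name. Reading a run of 8 as an INDX candidate is my interpretation of the above, and adjacent records will merge into longer runs.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 16  # cipher block size, per the post


def fixup_runs(old: bytes, new: bytes) -> list:
    """Return (first_sector, length) for runs of sectors whose final block changed."""
    runs = []
    start = None
    sectors = min(len(old), len(new)) // SECTOR_SIZE
    # One extra iteration with hit == False flushes a run that ends at EOF.
    for s in range(sectors + 1):
        tail = (s + 1) * SECTOR_SIZE - BLOCK_SIZE  # offset of the last block
        hit = s < sectors and old[tail:tail + BLOCK_SIZE] != new[tail:tail + BLOCK_SIZE]
        if hit and start is None:
            start = s
        elif not hit and start is not None:
            runs.append((start, s - start))
            start = None
    return runs
```

A run of length 8 would match the INDX size of 8 sectors, while a run of length 2 would match a 1024 byte MFT record.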

$MFT

The presence of $MFT is also fairly easy to spot. The MFT record has a somewhat fixed structure. Of course not all attributes are constant, but the header is always present while $STANDARD_INFORMATION is almost always present. The fixups are also always present. MFT records can be of size 1024 or 4096 bytes, though 1024 is the most commonly seen. For 1024 byte records it means every second sector will most of the time have few changes. This is because attributes tend to fit within the first 512 bytes of the record for most regular files. This pattern is also very clear, with changes on every second line.

$LogFile

This file is also fairly easy to spot as it is heavily used on Windows.

The RSTR section of the $LogFile will not have many modifications, except for some values in the header and the restart area as well as the fixups. The RSTR is 2 times 0x1000 bytes, followed by a bunch of RCRD's. The RCRD content is updated frequently, which is why you see all the red. All RCRD's have fixups that you can vaguely spot in the large red section.

$TxfLog.blf

Turned out this one had an easy to spot pattern worth mentioning as well;

This feature is rarely used by applications, so I would assume that pattern to stay like that. However, if transactions were used, then it would look different. It would still contain fixups though, so the pattern might be more like $LogFile but with less red. It can also be turned off completely during format, in which case it would not be present.

Password changes

What if we changed the password for the encrypted volumes? Turns out the byte changes are reflected identically for both standard and hidden volumes. There will be a complete red line like this;

So the complete sector changed. For a given volume type we will see 2 such lines, though at different offsets.

For the standard volume the offsets (Y pixels) are:

0x0 (Y = 0)
EOF - 0x20000 (Y = image height - 256)

For the hidden volume the offsets are:

0x10000 (Y = 128)
EOF - 0x10000 (Y = image height - 128)
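The byte offsets listed above translate into image lines by dividing by the sector size. The sketch below does that calculation and then checks which of the 4 header sectors changed completely between two samples. The function names and labels are made up for this example, and "changed completely" is taken to mean that every 16 byte block of the sector differs.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 16  # cipher block size, per the post


def header_rows(container_size: int) -> dict:
    """Image line (sector number) of each header sector, from the offsets in the post."""
    return {
        "standard": 0x0 // SECTOR_SIZE,
        "hidden": 0x10000 // SECTOR_SIZE,
        "standard backup": (container_size - 0x20000) // SECTOR_SIZE,
        "hidden backup": (container_size - 0x10000) // SECTOR_SIZE,
    }


def changed_headers(old: bytes, new: bytes) -> list:
    """Names of header sectors in which every 16-byte block differs."""
    hits = []
    for name, row in header_rows(len(old)).items():
        base = row * SECTOR_SIZE
        if all(
            old[base + o:base + o + BLOCK_SIZE] != new[base + o:base + o + BLOCK_SIZE]
            for o in range(0, SECTOR_SIZE, BLOCK_SIZE)
        ):
            hits.append(name)
    return hits
```

For a 4 MB container (8192 lines) this places the two backup headers on lines 7936 and 8064.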

Other container types

At this point I got curious about what other types of containers would leave behind in a diff. The only other encryption solutions I tried for the test were BestCrypt, ProxyCrypt and BitLocker. All of them (going with default settings) showed similar behaviour to VeraCrypt on the diffs, with good patterns to spot. So, in the end nothing revolutionary, just block ciphers doing encryption on fixed size blocks.

Changing Mode of Operation to CBC

This post is already way too long, but I'll throw in a brief comparison of XTS mode (which is covered in the examples above) vs CBC mode. In theory it should destroy some of the pattern, as the chaining process causes each plaintext block to be XORed with the previous block's ciphertext before it is encrypted. The CBC test was done with BestCrypt. Surprisingly the NTFS pattern is still identifiable, but in a less distinct way. Here is what $TxfLog.blf now looks like:

As can be seen, it is still familiar. Now, take a look at this image covering an INDX and $MFT;

Still, the pattern is somewhat visible. Interestingly, this could mean that when analyzing such diffs, it could be possible to identify which mode of operation was used in the encryption process, at least if the choice is between XTS and CBC. I could be off track with this one, but I thought it was an interesting enough observation to share anyway.
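One way this idea could be tested is sketched below. It rests on assumptions I have not verified for BestCrypt: that the CBC chain restarts at every 512 byte sector, so a changed plaintext block changes every ciphertext block after it in the same sector, while in XTS each 16 byte block changes on its own. The function name and the three labels are made up.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 16  # cipher block size, per the post
BLOCKS = SECTOR_SIZE // BLOCK_SIZE


def classify_sectors(old: bytes, new: bytes) -> dict:
    """Count changed sectors as 'gapped', 'suffix' or 'ambiguous'."""
    counts = {"gapped": 0, "suffix": 0, "ambiguous": 0}
    for base in range(0, min(len(old), len(new)) - SECTOR_SIZE + 1, SECTOR_SIZE):
        flags = [
            old[base + i * BLOCK_SIZE:base + (i + 1) * BLOCK_SIZE]
            != new[base + i * BLOCK_SIZE:base + (i + 1) * BLOCK_SIZE]
            for i in range(BLOCKS)
        ]
        if not any(flags):
            continue
        first = flags.index(True)
        if not all(flags[first:]):
            counts["gapped"] += 1      # unchanged block after a change: fits XTS only
        elif 0 < first < BLOCKS - 1:
            counts["suffix"] += 1      # change runs to the sector end: fits chaining
        else:
            counts["ambiguous"] += 1   # whole sector, or only the last block
    return counts
```

Under these assumptions, many "gapped" sectors would point towards XTS, while a diff with no "gapped" sectors and many "suffix" sectors would point towards CBC.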


Saturday, August 19, 2017
