Scanning the Alexa Top 1M for .DS_Store files

Some readers may remember our analysis of .git folders in the Alexa Top 1M. With our tools we were able to discover and retrieve (hidden) directories and files, even when directory listing was disabled. We have now developed a similar approach for uncovering hidden files, this time with the help of .DS_Store files. In this blog post we share the methodology, the resulting security implications, and our results from scanning the Alexa Top 1M, including how we could have obtained sensitive files from several websites.

What is a “.DS_Store” file?

Let's have a look at what exactly a .DS_Store file is. Some people may have seen one after handing a USB drive to an Apple-using colleague: afterwards the drive will most likely contain at least one (hidden) file named .DS_Store. The name stands for “Desktop Services Store”, and the file contains meta information about a directory’s files and display options. On macOS, the “Finder” creates these files automatically. As on other *NIX-like operating systems, the file name is prefixed with a dot to hide it from normal users. Unfortunately, the file format is not open but proprietary, as it was developed by Apple, and it is therefore usually only used on Apple devices. In the 2006 blog post “The origins of .DS_Store”, one of its inventors discusses how he would have changed the name and notes that the file is far more widespread on the file system than intended: instead of being created the first time a directory is viewed, it was supposed to be created only when the directory’s (display) options changed. To this day, you will find the file all over your hard drive if you are a macOS user, and you might think that it is just a useless leftover.

.DS_Store file in a text editor

Some readers might know or have noticed that the file is not human-readable but consists of binary data. Because a detailed explanation of the file’s structure and format would be too much for this blog post, Sebastian published it on his 0day.work blog.
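Without going into the full format, a quick sanity check on a downloaded file is possible. The following is a minimal sketch in Python, assuming the file starts with the bytes 00 00 00 01 followed by the ASCII string "Bud1", as .DS_Store files commonly do. The function name is ours and purely illustrative.

```python
# Leading bytes commonly found at the start of a .DS_Store file
# (assumption: a 4-byte prefix followed by the "Bud1" marker).
DS_STORE_MAGIC = b"\x00\x00\x00\x01Bud1"


def looks_like_ds_store(blob: bytes) -> bool:
    """Return True if the data starts with the usual .DS_Store header bytes."""
    return blob.startswith(DS_STORE_MAGIC)
```

Such a check is no replacement for a real parser, but it can help to tell an actual .DS_Store file apart from, say, an HTML error page.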

How can this file become an issue?

A .DS_Store file can become a (security) issue when it leaves the local file system and falls into the hands of others, for example when a website is uploaded from a development system to a server on the internet. If an attacker can obtain such a file from a webserver that does not block the request, it could help her learn about other (hidden) files on the webserver. We were curious to see whether this issue arises in the real world and did some research, which we describe in the next sections.

We used the well-known Alexa Top 1M list of the most visited websites on the internet. Should we find the file on those websites, they are most likely prone to an information leak. Sebastian’s [library to parse .DS_Store files](https://github.com/gehaxelt/ds_store/blob/master/ds_store.go), written in Go, integrated well with the scanning tool that we had used for previous research. During the last hacker congress (34C3) in Leipzig, we used the fast internet connection to scan the list within the four days of the event.

The methodology

The tool does the following:

  • Send an HTTP GET request to http://domain.tld/.DS_Store
  • Parse the file and extract the file names
  • If the recursive mode is enabled: Check whether any of the file names is a directory and whether another .DS_Store file is accessible there
  • For all obtained file paths: Send an HTTP HEAD request to the resulting URL to check whether the file is accessible
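The steps above can be sketched roughly as follows. This is not our Go tool, only an illustrative Python outline: the function scan and its parameters are hypothetical, and the three callables (fetch, parse_names, head_status) are placeholders the caller has to supply, for example a downloader, a .DS_Store parser and a HEAD-request helper.

```python
from collections import deque
from urllib.parse import quote


def scan(base_url, fetch, parse_names, head_status, max_depth=3):
    """Walk .DS_Store files breadth-first and return {url: status}.

    fetch(url)        -> bytes of the .DS_Store file, or None if unavailable
    parse_names(blob) -> iterable of file names found in the file
    head_status(url)  -> HTTP status code of a HEAD request, or None
    (all three are assumptions of this sketch, supplied by the caller)
    """
    base_url = base_url.rstrip("/")
    results = {}
    seen = {""}
    queue = deque([("", 0)])  # (directory relative to base_url, depth)
    while queue:
        directory, depth = queue.popleft()
        blob = fetch(f"{base_url}/{directory}.DS_Store")
        if blob is None:
            continue
        for name in sorted(set(parse_names(blob))):
            path = directory + quote(name)
            url = f"{base_url}/{path}"
            results[url] = head_status(url)
            # Recursive mode: treat every name as a possible directory and
            # try to fetch another .DS_Store file below it later on.
            subdir = path + "/"
            if depth < max_depth and subdir not in seen:
                seen.add(subdir)
                queue.append((subdir, depth + 1))
    return results
```

The max_depth limit is an addition of this sketch to keep the recursion bounded.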

For our analysis, we ran the tool with the recursive mode enabled, because the interesting and sensitive files are usually not in the document root. Luckily, there is often another .DS_Store file in a subfolder that allows a deeper insight. However, parsing the .DS_Store files only yields file names; it does not tell us whether those files were also uploaded from the local system to the server. To find out which files are still accessible and potentially downloadable from the server, we used the following simple test: send an HTTP HEAD request to the URL where the file is expected to be. In response to a HEAD request, a webserver replies with the headers only and omits the response body. This method might not be the most reliable one, because some webservers block HEAD requests or send an “OK” (200) status code even for “Not found” (404) errors. On the other hand, nobody can claim that we accessed any (potentially) sensitive information on their server. When we received a status code of 200, we assumed that the file exists and can be downloaded.
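The HEAD-based check could look like the following minimal Python sketch. It uses only the standard library; the function name head_status and the injectable opener parameter are our own choices for illustration and testing, not part of the scanning tool.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send a HEAD request and return the status code, or None on failure.

    Note: urlopen follows redirects, so the code of the final response
    is reported.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions but still carry a code.
        return err.code
    except OSError:
        # DNS errors, refused connections, timeouts (URLError is an OSError).
        return None
```

A result of 200 is then treated as "the file probably exists", with the caveat mentioned above that some servers answer 200 even for missing files.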

The results

Our tool was verbose and we gathered the data in a huge logfile. Of the 1M domains on the list, about 10 000 exposed a .DS_Store file. We were disappointed at first, because we had expected more sites to be affected, but it turned out that even this small dataset gave interesting insights. Furthermore, the parser is probably not 100% bug-free or compatible with all .DS_Store files, which might be another reason for missing some websites. In the end, the logfile contained 1185671 URLs (the number exceeds 1M because of the recursion), which we discuss now.

The HTTP response codes are distributed as follows (only the top 5):

diagram with the distribution of the http status codes

The majority of the discovered files seemed to be accessible. For more than 21 000 URLs we failed to get a status code, and the 403, 404 or 500 status codes indicate that the files were likely not accessible. Furthermore, the accessible files were not distributed evenly across the websites. We observed several big websites that have a .DS_Store file in almost all of their subdirectories. In those cases the developers apparently forgot to remove the files, which could give an attacker a deep insight into the webserver’s folder and file structure.

Domain names are masked for security reasons

  80957 domain1.tld
  67754 domain2.tld
  55143 domain3.tld
  19688 domain4.tld
  19520 domain5.tld
  18989 domain6.tld
  12525 domain7.tld
  12058 domain8.tld
  11521 domain9.tld
  11463 domain10.tld

Another interesting observation is the distribution among the top level domains. Here’s an excerpt from the top 25:

Distribution of different TLDs

Now that we have shown which domains are affected and to what extent, we can go on and look into other details, for example the file names that we obtained using this method. As explained earlier, the following numbers are based on a returned status code of 200, so there might be more (interesting) files that an attacker could download if she simply tried. The top 10 file endings are:

 256715 .jpg
  75177 .png
  64835 .php
  42422 .html
  39691 .gif
  23683 .htm
  16397 .pdf
   9736 .js
   9346 .txt
   6886 .css

If you look at the full list of 1500 entries, you will spot file endings that are more likely to pose a security risk and could contain sensitive information:

Selection of favorite data types and their count

661 .bak
569 .gz
549 .doc
464 .db
343 .csv
266 .eml
248 .log
240 .old
202 .docx
186 .inc
162 .config
129 .cfg
123 .sql
123 .sh
105 .htaccess
 55 .git
 35 .LOG
 23 .orig
 22 .tgz
 21 .pem
 18 .out
 16 .conf
 16 .cfs
 10 .php_old
 10 .php_
 10 .key
  8 .back
  6 .backup
  5 .bkp
  4 .php_bak
  3 .htpasswd
  2 .core
  2 .bash_history

For example, the following files fall into the “.bak” category:

  9 index.php.bak
  2 wp-config.php.bak
  2 php.ini.bak
  2 db.bak
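Tallies like the ones in this section can be produced with a few lines of code. The following Python sketch is an assumption about how such a count could be done, not the script we used: it takes the text after the last dot of the file name as the ending and keeps the case as-is, which is consistent with entries like ".htaccess" and with ".log" and ".LOG" being listed separately.

```python
from collections import Counter
from urllib.parse import urlsplit


def file_ending(url):
    """Return the ending after the last dot of the file name, or None."""
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    if "." not in name:
        return None
    ending = name.rsplit(".", 1)[-1]
    return "." + ending if ending else None


def top_endings(urls, n=10):
    """Count file endings over all URLs and return the n most common."""
    counts = Counter(
        ending for ending in map(file_ending, urls) if ending is not None
    )
    return counts.most_common(n)
```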

Some of those files are likely to be easily downloadable, because no authentication was in place and they existed not only in the developer’s local environment, but also on the server. With the technique described above, we noticed several file endings and file names that indicated a leak of sensitive data. As far as it was possible for us, we tried to notify the affected parties. Most of the contacted administrators fixed the issue by removing the files from the webserver, but others didn’t seem to bother. Hopefully this blog post will help to raise awareness of the issue.

More interesting facts

It is very interesting to take a look at the history of this issue, because there have been several minor “fixes” - but most of them still do not address the core problem. The first discussions about the issue took place in 2001 - today (17 years later) this file still leads to security problems.

In the past, Apple had to stop the creation of .DS_Store files on shared network drives due to a huge number of customer complaints. Furthermore, they published a support article that describes how to deactivate this functionality. As mentioned in the beginning: Even one of the inventors of “.DS_Store” was not that happy about the chosen name and called its vast distribution “an unfortunate bug”. Adobe also discusses “.DS_Store” in the Dreamweaver FAQ and recommends creating a cronjob that deletes the .DS_Store files periodically. We believe this “solution” is better than nothing, but it still does not solve the core problem - there’s still a timespan in which these files are downloadable for an attacker. About two years ago a researcher found a .DS_Store file on Twitter’s website and got a bug bounty reward of $560. He was able to unveil a license key, a wifi certificate and a CA certificate.

Countermeasures

To avoid information leaks of this kind, we recommend following an important, general rule: Never upload data to the webserver’s document root that should not be (somehow) accessible. The rule might seem obvious, but our colleague Hanno Böck showed at the 34C3 that “wget” is often enough to obtain sensitive information, up to and including full datasets. His research shows that there’s still a lot of room for improvement.

If you want to check if the discussed files can be found on your Linux-server, you can use the following commands:

cd /var/www # or wherever your document root is
find . -type f -iname "*.DS_Store*"

The command will search through all folders in /var/www for files with “.DS_Store” in their name and print them on the console. If any files were found and you did not explicitly put them there, you should consider appending -delete to the above command to delete all the found files.

Removing the files is a first step, but better security can be achieved by blocking the access to files with that file name. Here’s how to configure the two most common webservers:

Apache

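Add the following lines to your httpd.conf to block access to the file. The snippet uses the Apache 2.2 access syntax; on Apache 2.4 and later, the equivalent is to replace the two inner lines with "Require all denied":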

<Files ~ "\.DS_Store$">
    Order allow,deny
    Deny from all
</Files>

Nginx

Add the following lines to your server blocks:

location ~ \.DS_Store$ {
      deny all;
}
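Once one of these rules is in place, it is worth checking that it actually takes effect. The following Python sketch is one possible way to do so, assuming a blocked request is answered with 403 or 404; the function name and the opener parameter are ours, chosen for illustration and testing.

```python
import urllib.error
import urllib.request


def ds_store_is_blocked(base_url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a request for /.DS_Store is answered with 403 or 404."""
    url = base_url.rstrip("/") + "/.DS_Store"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            # A successful answer means the file is still being served.
            return False
    except urllib.error.HTTPError as err:
        # Other error codes (e.g. 500) are not counted as a clean block.
        return err.code in (403, 404)
```

Connection errors are deliberately not caught here, so an unreachable server is not mistaken for a protected one.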

In addition to that, you should check that:

  • those files are not committed to your VCS (e.g. git/svn/etc) and then pulled on your server.
  • those files are excluded or removed prior to an rsync/(s)ftp or other file transfer to the server.
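Both checks can be supported by a small script that runs before a commit or a deployment. The following Python sketch is one possible approach, not something we ship: the function name is hypothetical, and it simply lists every file named exactly ".DS_Store" below a given directory.

```python
import os


def find_ds_store(root):
    """Return the sorted paths of all .DS_Store files below root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename == ".DS_Store":
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

A deployment script could abort when the returned list is not empty, so the files never reach the server in the first place.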

Proof of Concept

We have built a small demo website that lets you upload .DS_Store files and parse out the file names online, to help you understand what information may have leaked somewhere.

Disclaimer: Please use it only for educational purposes or to test your own files. Any malicious use is prohibited!

There is a small FAQ, but feel free to send us an email if you have further questions about this tool.

Further research

The .DS_Store files are sometimes also included in ZIP files that were created on macOS. Furthermore, there are files/directories like “.Trash”, “desktop.ini” or “Thumbs.db” that might also contain pointers to file names. Our parser focused on the extraction of file names, but apparently other information is stored in a .DS_Store file as well, e.g. comments. This information might help to increase the attack surface.
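As a starting point for the ZIP case, the following Python sketch lists the .DS_Store entries inside an archive using the standard zipfile module. It was not part of our scan; the function name is illustrative only.

```python
import zipfile


def ds_store_entries(archive):
    """Return the names of all .DS_Store entries in a ZIP archive.

    archive may be a path or a file-like object, as accepted by
    zipfile.ZipFile.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            name
            for name in zf.namelist()
            if name.rsplit("/", 1)[-1] == ".DS_Store"
        ]
```

The matching entries could then be read with ZipFile.read and handed to a .DS_Store parser.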

The Team of Internetwache.org