2022-10-15

CVE-2007-4559: Directory traversal vulnerability

First, I want to make clear that I do not speak on behalf of the Python development team or in any other official capacity. I stepped down as the tarfile maintainer in 2019, so this statement is purely personal.

In my opinion, the claims that there is a security vulnerability in the tarfile module that has been ignored for 15 years are somewhat exaggerated and taken out of context.

However, I recognize that programmers may unwittingly run into problems when they use the tarfile module with archives from external sources, because they are not aware of the possible security implications. That is why there is a prominent warning in the official module documentation.

To be clear, it is completely safe to access tar archives from dubious origins with the tarfile module unless all of the following criteria are met:

  1. There are no security measures in place to make sure the archive is safe to extract, e.g. inspection, verification of a cryptographic signature, or sandboxing.
  2. The archive is extracted to the filesystem in large parts or in its entirety without specific selection of members. (With tarfile it is possible to access members as file-like objects without them being extracted to the filesystem.)
  3. The archive is extracted to the filesystem with unrestricted user privileges.
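As an illustration of the "inspection" mentioned in the first criterion, here is a minimal sketch of how an archive could be checked before anything is extracted. The helper names unsafe_members and build_demo_archive are hypothetical and not part of tarfile, and the policy (flag every member that resolves outside the destination directory, and every member that is not a regular file or directory) is only an assumption about what a cautious application might want. It is written with POSIX paths in mind and is not a complete defence.

```python
import io
import os
import tarfile


def unsafe_members(tar, dest):
    """Return the names of members a cautious caller may want to reject.

    Hypothetical helper: flags members whose resolved target lies outside
    dest, and members that are neither regular files nor directories
    (symlinks, hard links, devices, FIFOs).
    """
    dest = os.path.realpath(dest)
    flagged = []
    for member in tar.getmembers():
        # An absolute member name replaces dest in os.path.join(), and
        # ".." components are resolved by realpath().
        target = os.path.realpath(os.path.join(dest, member.name))
        inside = os.path.commonpath([dest, target]) == dest
        if not inside or not (member.isfile() or member.isdir()):
            flagged.append(member.name)
    return flagged


def build_demo_archive():
    """Build a small in-memory archive with one harmless and one hostile name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("data/ok.txt", "../escape.txt"):
            payload = b"demo"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=build_demo_archive()) as tar:
        print(unsafe_members(tar, "/tmp/dest"))
```

An application could refuse to extract an archive for which this list is not empty. Nothing is written to the filesystem during the check.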

Context

The Trellix Story

This blog post by Kasimir Schulz of Trellix shows a good example, not of a security vulnerability in tarfile, but of very flawed application design.

As far as I understand it, the blog post illustrates a vulnerability in the Spyder IDE, which uses the tar format for sharing and transferring the state of variables between different projects and people. The tarfile module is used to read these tar archives.

However, this feature is implemented very poorly:

  1. It is not a good idea to feed data or code into your program from sources that are not trustworthy without prior inspection or other sufficient security measures.
  2. In the example code, tarfile is used to extract the entire archive to the filesystem although only one file from the archive is actually needed. It would be much safer, more efficient and less error-prone to use tarfile's extractfile() method to read the file's data directly from the tar archive without extracting anything to the filesystem.
  3. The extracted file contains a set of pickled Python objects which are read into memory. The documentation for the pickle module explicitly states that it is not secure to unpickle data from sources you do not trust. Even if you take tarfile out of the equation, Spyder IDE is still vulnerable to attacks via pickle.
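The second point can be sketched as follows: a single member is read straight from the archive with extractfile(), and nothing is written to the filesystem. The helper names read_member and build_demo_archive, as well as the member name state.bin, are hypothetical and chosen only for this illustration. This is not the Spyder code.

```python
import io
import tarfile


def read_member(archive, member_name):
    """Return the bytes of one regular-file member without extracting it.

    Hypothetical helper: archive is a seekable binary file object.
    """
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        member = tar.getmember(member_name)  # raises KeyError if absent
        if not member.isfile():
            # Reject links, directories and special files.
            raise ValueError(f"{member_name!r} is not a regular file")
        with tar.extractfile(member) as fh:
            return fh.read()


def build_demo_archive():
    """Build a small in-memory archive containing a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello"
        info = tarfile.TarInfo("state.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    data = read_member(build_demo_archive(), "state.bin")
    print(len(data), "bytes read")
```

Reading the member this way only avoids the filesystem. The bytes that come back are still untrusted input, so the third point about pickle applies unchanged.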

Therefore, this blog post does not show a security vulnerability in the tarfile module but instead in the Spyder IDE. Both the tarfile and the pickle modules are used in ways they are not supposed to be used and that are strongly discouraged in the documentation.

It is peculiar that this example is used to attribute a security vulnerability to the tarfile module but not to the pickle module.

Conclusion

I understand the community's desire to fix this issue one way or another. Unfortunately, fixing these kinds of issues is not trivial and requires a lot of effort.

After dismissing the first bug report in 2007, I proposed a patch for discussion in the Python bug tracker in 2014. At that time, it seemed to me that this was not the way most people wanted the problem to be fixed. The discussion died down almost immediately, so there was no clear vote and the patch was never implemented.

In 2018, the discussion about the patch was resumed, but due to time constraints I was no longer able to participate. I had increasing difficulty fulfilling my role as tarfile maintainer. Therefore, in 2019 I gave up my position as maintainer.

References

  1. CVE-2007-4559: Directory traversal vulnerability
  2. Trellix Blog Post: Tarfile: Exploiting the World With a 15-Year-Old Vulnerability
  3. Python-Dev Mailing-List: tarfile and directory traversal vulnerability
  4. Python Issue 45385: tarfile insecure pathname extraction (2007)
  5. Python Issue 65308: tarfile: Traversal attack vulnerability (2014)
  6. Python Issue 73974: tarfile: Add absolute_path option to tarfile, disabled by default (2017)