Subversion Windows Deduplication Bug

Windows Server 2012 introduced a valuable Data Deduplication feature that reduces physical storage needs on volumes where multiple copies of identical data reside. Unfortunately, the implementation is visible to client programs in the form of “reparse points”. Older uses of reparse points include a symbolic link feature similar to that found on Unix and Linux systems. Because symbolic links may require special treatment, shared libraries like the Apache Portable Runtime (APR) detect most reparse points as symbolic links, even when they are really just Windows deduplicated (storage-optimized) files. The popular version control system “Subversion” relies on the APR library to handle files in a user’s working copy. As a result, Subversion treats deduplicated files on Windows as symbolic links, which receive special treatment either as text files containing the symlink path (symlinks created on Unix/Linux) or as unsupported entities (symlinks created on Windows). Neither matches the behavior actually needed, which is for Subversion to ignore the reparse point and treat the file as a normal file (not a symlink).
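To make the distinction concrete, here is a minimal Python sketch of how a client could tell a deduplicated file apart from a real symbolic link by looking at the reparse tag rather than only at the reparse-point attribute. This is not how APR or Subversion is implemented; the function names are hypothetical, the constant values are the ones defined in the Windows SDK headers, and reading `st_reparse_tag` assumes Python 3.8 or later running on Windows.

```python
import os

# Windows constants (values as defined in the Windows SDK headers).
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction / directory mount point
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # true symbolic link
IO_REPARSE_TAG_DEDUP = 0x80000013        # Data Deduplication optimized file


def classify_reparse(attributes, reparse_tag):
    """Classify a file from its attribute bits and reparse tag (hypothetical helper)."""
    if not attributes & FILE_ATTRIBUTE_REPARSE_POINT:
        return "regular"
    if reparse_tag == IO_REPARSE_TAG_SYMLINK:
        return "symlink"
    if reparse_tag == IO_REPARSE_TAG_MOUNT_POINT:
        return "junction"
    if reparse_tag == IO_REPARSE_TAG_DEDUP:
        return "deduplicated"
    return "other-reparse-point"


def classify_path(path):
    """Classify a path on disk without following links.

    On platforms where os.lstat() does not expose the Windows-only fields,
    the getattr() defaults make every file report as "regular".
    """
    st = os.lstat(path)
    attributes = getattr(st, "st_file_attributes", 0)
    tag = getattr(st, "st_reparse_tag", 0)
    return classify_reparse(attributes, tag)
```

A check that stops at the `FILE_ATTRIBUTE_REPARSE_POINT` bit cannot separate the "symlink" and "deduplicated" cases, which is the confusion described above.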

The following related issues remain open as of 4 Jan 2017. Additional discussion of Subversion symlink behavior is posted on the Subversion FAQ.

  • NTFS Reparse Points are treated as [Unix/Linux] APR_LNK, only correct for junction/dir link, APR Bug # 47630.
    • Resolving the APR bug might give Subversion an acceptable fix that permits Windows deduplication of a working copy.
  • Add support for Windows symlinks (junction points), SVN-3570 (issues.apache.org)
    • While unrelated to the deduplication problem, this bug does mention the same “Symbolic links are not supported on this platform” error that users will see if deduplication reparse points are detected in the working copy during a commit / check-in.

Until a fix is available, users should avoid storing any Subversion working copy on a Windows deduplicated volume. If files in a working copy become deduplicated, the resulting reparse points may lead to corrupted properties in the repository. Specifically, the “svn:special” property will be set to “*”, marking the file as a symbolic link. Other users who check out or update these marked files may receive partial (corrupted) files in their working copies, apparently because the SVN client assumes the file is a text representation of a symbolic link; the files may be truncated as a result.

Removing the svn:special property from the affected files in the repository and checking out a fresh copy should resolve the issue, but the corrupt working copy should be abandoned (delete it after backing up any changed or uncommitted files). To avoid checking in a corrupt file, I recommend modifying the file properties with svnmucc or another tool, such as the TortoiseSVN Repository Browser, that can change properties directly against the repository URL without relying on a working copy. To view all file properties in a repository, use a command like “svn proplist …”
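As a rough illustration of the svnmucc approach, the Python sketch below builds a single svnmucc invocation that deletes svn:special from a list of file URLs in one commit. It is only a sketch under stated assumptions: the function names and the example URLs are hypothetical, the list of affected URLs is assumed to have been collected beforehand, and the code relies on svnmucc's `propdel NAME TARGET` action and `-m` option. Files that are genuine symbolic links must be left out of the list, since they need to keep the property.

```python
import subprocess


def build_propdel_command(file_urls, message):
    """Build an svnmucc argument list that deletes svn:special from each URL.

    All propdel actions are passed to one svnmucc run, so they land in a
    single commit. (Hypothetical helper, not part of Subversion.)
    """
    if not file_urls:
        raise ValueError("no file URLs given")
    command = ["svnmucc", "-m", message]
    for url in file_urls:
        command += ["propdel", "svn:special", url]
    return command


def remove_special(file_urls, message, dry_run=True):
    """Return the command; run it against the repository only if dry_run is False."""
    command = build_propdel_command(file_urls, message)
    if not dry_run:
        # Requires svnmucc on PATH and commit access to the repository.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical repository URLs, for illustration only.
    urls = [
        "https://svn.example.com/repo/trunk/docs/a.txt",
        "https://svn.example.com/repo/trunk/docs/b.txt",
    ]
    print(" ".join(remove_special(urls, "Remove bogus svn:special")))
```

Defaulting to a dry run makes it possible to review the exact command before anything is committed to the repository.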

To resolve these problems, I have created PowerShell scripts to assist in performing the following maintenance tasks:

  • detect and remove deduplication on working copies (the working copy location must also be added to the exclusion list in the Windows deduplication settings).
  • detect and remove svn:special property from files directly from repository using svnmucc
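The detection half of the first task can be sketched in Python as follows. This is not the PowerShell script mentioned above, only an illustrative stand-in: the function names are hypothetical, the constants are the Windows SDK values for the reparse-point attribute and the deduplication reparse tag, and reading `st_reparse_tag` assumes Python 3.8 or later on Windows. The `lstat` parameter exists only so the function can be exercised without a deduplicated volume.

```python
import os

# Windows constants (values as defined in the Windows SDK headers).
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
IO_REPARSE_TAG_DEDUP = 0x80000013


def is_deduplicated(st):
    """Return True if a stat result describes a Data Deduplication reparse point."""
    attributes = getattr(st, "st_file_attributes", 0)
    tag = getattr(st, "st_reparse_tag", 0)
    is_reparse = bool(attributes & FILE_ATTRIBUTE_REPARSE_POINT)
    return is_reparse and tag == IO_REPARSE_TAG_DEDUP


def find_deduplicated_files(root, lstat=os.lstat):
    """Walk a working copy and return a sorted list of deduplicated files.

    The administrative .svn directory is scanned as well, because the
    files stored there can be optimized just like the project files.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_deduplicated(lstat(path)):
                hits.append(path)
    return sorted(hits)
```

The sketch only reports files. Removing the optimization and excluding the folder would still be done with the Windows deduplication tooling; as far as I know, the `Expand-DedupFile` and `Set-DedupVolume -ExcludeFolder` cmdlets cover those two steps.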

Unfortunately, I don’t have time to clean up and post sample code at the moment, but I hope to return in the future to add the PowerShell sample scripts.

About notesbytom

Keeping technology notes on WordPress.com to free up my mind to solve new problems rather than figuring out the same ones repeatedly :-).