Looking for Software: Copy/Move/Make Files Sparse

raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Post by raccoon »

I am looking for a piece of software that can copy or move a Sparse File from one drive to another drive, and by the same token, will also convert a normal file into a Sparse File on that same drive.

Primer

Sparse Files are a feature of NTFS that allows an incompletely written file, such as a download in progress, to consume less disk space than it would on simpler file systems like FAT32 and exFAT. Any filesystem behaves this way when the incomplete file is written sequentially, but NTFS also allows non-sequential writes anywhere within the file without first having to allocate the space with zero-padding. On NTFS, a Sparse File lets you write chunks at any offset within the file, and the file consumes only as much disk space as the data that has actually been written. Consider how BitTorrent downloads a file in random chunks, for example.

Microsoft Windows allows Sparse Files to be renamed or moved from one directory to another on the same drive, but if you attempt to copy the file, or move it to another drive, a regular non-Sparse file is created that consumes the full disk allocation. If you want to move a folder with many Sparse Files from old Disk X to new Disk Y, those files will consume a lot more disk space, and there is no way around this.

My Request

I need a piece of software that can scan through the contents of a file, locate large sequences of 0x00, and then write a duplicate of this file while SEEKing (jumping) over those 0x00 sequences, thus "compressing" the normal file into a sparse file. The software should be able to do this with a file you wish to ultimately copy or move using a basic Source and Destination model. A move operation is equivalent to a Copy-then-Delete operation.

The software should let the user specify the minimum 0x00 sequence length to consider making Sparse, e.g. 512 bytes, 4096 bytes, or 1048576 bytes.
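
To illustrate the scanning step described above, here is a minimal C sketch. It is not taken from any existing tool, and the names `zero_run` and `find_zero_runs` are hypothetical. It only reports where the qualifying 0x00 runs sit in a buffer already held in memory; a real tool would still have to stream the file and deal with runs that cross buffer boundaries.

```c
#include <stddef.h>

/* One run of 0x00 bytes inside a buffer: [start, start + length). */
typedef struct {
    size_t start;
    size_t length;
} zero_run;

/*
 * Scan buf[0..len) and record every run of 0x00 bytes that is at least
 * min_run bytes long. At most max_runs entries are stored in out, but the
 * return value always counts all qualifying runs.
 */
size_t find_zero_runs(const unsigned char *buf, size_t len, size_t min_run,
                      zero_run *out, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        size_t start;

        if (buf[i] != 0) {
            i++;
            continue;
        }
        /* Found a zero byte: walk to the end of this run. */
        start = i;
        while (i < len && buf[i] == 0)
            i++;
        if (i - start >= min_run) {
            if (count < max_runs) {
                out[count].start = start;
                out[count].length = i - start;
            }
            count++;
        }
    }
    return count;
}
```

Calling it with `min_run` set to 512, 4096 or 1048576 would correspond to the thresholds mentioned above.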

I presently cannot find any such software that does this.

Why?

You can reasonably store 4 terabytes of incomplete sparse files on your 1 terabyte SSD (high speed) scratch disk. This is what I do. It is too costly to purchase a 4 terabyte SSD scratch disk, and it is impossible to move old projects from one scratch disk to another during hardware upgrades.

Who else would use this, though?

Sparse files can free up space on any NTFS disk. Any file can be, in theory, converted into a sparse file and remain fully functional without any performance hits. In fact, sparse files are faster to read into memory than normal files, given that the zero-fill does not have to be physically read from the source file... only implied. Normal users can try scanning for files that contain large 0x00 segments, and then have those files converted into sparse files to free up disk space.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Looking for Software: Copy/Move/Make Files Sparse

Post by NotNull »

Directory Opus does support copying sparse files (as sparse files, of course), but I have never used it.

See also this thread and, more specifically, @therube's contribution.

A (very) quick web search turned up this (and the link in the aforementioned thread).
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Looking for Software: Copy/Move/Make Files Sparse

Post by raccoon »

So a friend wrote a small C program, and gave me his code, that will artificially "Sparsify" a file by scanning for large blocks of 0x00 (null) bytes and writing a sparse copy of that file, after which I can delete the original bulky non-sparse file.

I would like to share the code with void, and ask for built-in Sparsify functionality. This could perhaps appear in the Advanced Copy dialog.

I have a few bucks kicking around in my PayPal and my Amazon account. How much of a bounty to make this happen?

[spoiler]

/*
 * sparsify.c - Copy a file, making the new file sparse
 * Copyright (c) 2020, D.C. van Moolenbroek, XISE <dc@xise.nl>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote products
 *    derived from this software without specific prior written permission.
 * 
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

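// The code passes LPWSTR strings to TCHAR APIs, so it needs a Unicode build
#ifndef UNICODE
#define UNICODE
#endif
#define WIN32_LEAN_AND_MEAN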
#include <windows.h>
#include <winioctl.h>
#include <shellapi.h>

// Old SDK, need to define these
#define FSCTL_SET_SPARSE CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 49, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
#define FSCTL_SET_ZERO_DATA CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 50, METHOD_BUFFERED, FILE_WRITE_DATA)

#define BLOCK_SIZE 65536

void wprintf(TCHAR *fmt, ...) {
  TCHAR buf[1024];
  va_list va;
  DWORD written;

  va_start(va, fmt);
  wvsprintf(buf, fmt, va);
  buf[sizeof(buf)/sizeof(buf[0]) - 1] = 0;
  va_end(va);

  WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buf, lstrlen(buf), &written,
   NULL);
}

int WINAPI WinMain(HINSTANCE hInstance,	HINSTANCE hPrev, LPSTR lpCmdLine,
 int nCmdShow) {
  HANDLE in_handle, out_handle;
  LPWSTR *argv, in_file, out_file;
  char *buf;
  DWORD i, last_bytes_read, bytes_read, bytes_written, flags;
  int argc, failed;

  argv = CommandLineToArgvW(GetCommandLineW(), &argc);
  if (argv == NULL || argc < 3) {
    wprintf(TEXT("Usage: sparsify.exe <infile> <outfile>\n"));
    return 1;
  }

  in_file = argv[1];
  out_file = argv[2];

  buf = LocalAlloc(0, BLOCK_SIZE);

  if (buf == NULL) {
    wprintf(TEXT("Please buy more RAM\n"));
    return 1;
  }

  in_handle = CreateFile(in_file, GENERIC_READ, FILE_SHARE_READ, NULL,
    OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

  if (in_handle == INVALID_HANDLE_VALUE) {
    wprintf(TEXT("Unable to open input file '%s': %u\n"), in_file, GetLastError());
    LocalFree(buf);
    LocalFree(argv);
    return 1;
  }

  out_handle = CreateFile(out_file, GENERIC_WRITE, FILE_SHARE_READ, NULL,
    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

  if (out_handle == INVALID_HANDLE_VALUE) {
    wprintf(TEXT("Unable to open output file '%s': %u\n"), out_file, GetLastError());
    CloseHandle(in_handle);
    LocalFree(buf);
    LocalFree(argv);
    return 1;
  }

  failed = 0;
  last_bytes_read = BLOCK_SIZE;

  if (!DeviceIoControl(out_handle, FSCTL_SET_SPARSE, NULL, 0, NULL, 0,
      &bytes_read, NULL)) {
    wprintf(TEXT("Unable to make file sparse: %u\n"), GetLastError());
    failed = 1;
  }

  while (!failed) {
    if (!ReadFile(in_handle, buf, BLOCK_SIZE, &bytes_read, NULL)) {
      wprintf(TEXT("Read failure: %u\n"), GetLastError());
      failed = 1;
      break;
    }

    if (bytes_read == 0)
      break; // EOF

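    // A short read is only valid for the final block, so fail if the
    // previous read was short and more data still arrived.
    if (last_bytes_read != BLOCK_SIZE) {
      wprintf(TEXT("Short read!?\n"));
      failed = 1;
      break;
    }
    last_bytes_read = bytes_read;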

    for (i = 0; i < bytes_read; i++)
      if (buf[i] != '\0')
        break;

    if (i == bytes_read) { // all-zero block: leave a hole in the output
      // The output is already flagged sparse, so seeking past the block
      // lets NTFS leave the range unallocated. SetFilePointerEx takes a
      // 64-bit distance, so the seek also works beyond the 4 GB mark.
      LARGE_INTEGER skip;
      skip.QuadPart = bytes_read;

      if (!SetFilePointerEx(out_handle, skip, NULL, FILE_CURRENT)) {
        wprintf(TEXT("Seek failure: %u\n"), GetLastError());
        failed = 1;
        break;
      }
    } else { // regular block
      if (!WriteFile(out_handle, buf, bytes_read, &bytes_written, NULL)) {
        wprintf(TEXT("Write failure: %u\n"), GetLastError());
        failed = 1;
        break;
      }

      if (bytes_written != bytes_read) {
        wprintf(TEXT("Short write!?\n"));
        failed = 1;
        break;
      }

    }
  }

  SetEndOfFile(out_handle);

  CloseHandle(out_handle);
  CloseHandle(in_handle);

  if (failed)
    DeleteFile(out_file);

  LocalFree(buf);
  LocalFree(argv);

  return failed;
}
[/spoiler]

CopyStream Sparse written by someone else might do a more efficient job of it. VC++ source available.

http://www.flexhex.com/docs/articles/sp ... l#download
therube
Posts: 4967
Joined: Thu Sep 03, 2009 6:48 pm

Re: Looking for Software: Copy/Move/Make Files Sparse

Post by therube »

I'm not clear: do the holes (sparse zeros) actually end up being $00, or could they be any data at all that is simply not allocated to the file?


(I was sort of looking for a null finder to find potentially corrupt [media or otherwise] files. Thinking that if I read a file, & found a large swath of nulls, that could be an indicator of "corruptness". But then as I was thinking about it, a file could have space reserved for say a tag, & that could be, oh say 4096 bytes of null, so then how large does a "large swath" need to be to properly potentially point out "corruptness". And then, what if a file is in fact corrupt, but the corrupt areas are not nulls but rather simply "random" data.)
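
A null finder along those lines could start from something like the following C sketch. It is only an assumption about how such a tool might work, and `null_scan`, `null_scan_init` and `null_scan_feed` are hypothetical names. It tracks the longest run of null bytes across a file that is fed in chunks, and leaves the question of how long a "large swath" has to be entirely to the caller.

```c
#include <stddef.h>

/* Running state for a scan that may be fed a file in several chunks. */
typedef struct {
    unsigned long long current;  /* length of the zero run still open */
    unsigned long long longest;  /* longest zero run seen so far */
} null_scan;

void null_scan_init(null_scan *s)
{
    s->current = 0;
    s->longest = 0;
}

/* Feed the next chunk; runs that cross a chunk boundary keep counting. */
void null_scan_feed(null_scan *s, const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0) {
            s->current++;
            if (s->current > s->longest)
                s->longest = s->current;
        } else {
            s->current = 0;
        }
    }
}
```

After the last chunk, `longest` can be compared against whatever threshold seems suspicious for the file type. As noted above, it says nothing about corruption that shows up as random data.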
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: Looking for Software: Copy/Move/Make Files Sparse

Post by raccoon »

So, it's a little simpler than that. There is a layer of abstraction between the NTFS subsystem and the Windows File System. On the physical disk and within the NTFS filesystem, what you have is a sort of JUMP instruction that says "skip sector addresses 0x012f125 through 0x012f888". There is no data, only an actual programmatic instruction for the NTFS subsystem to interpret. When the Windows File System requests anything from those sectors, it gets back literal strings of zeros. The zeros aren't stored on the disk; they are generated out of thin air. It doesn't read any weird unallocated data from the disk.

Which is why Sparse Files are so difficult to manipulate without A) low level forensic disk tools while the drive is unmounted, or B) synthetic RE-SPARSIFYING of the file by scanning for long null-strings and asking Windows to flag them as "sparse" (basically just don't write to those sectors, just skip over them, and the actual file size on disk won't inflate) while writing out a new file copy of it.
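
The "skip over them while writing out a new copy" idea in option B can be sketched in portable C. This is a simplified stand-in, not the Windows program posted earlier: it uses plain stdio, the name `copy_skipping_zero_blocks` and the 4096-byte block size are made up for the example, and whether the skipped ranges really end up unallocated depends on the filesystem (on NTFS the output would also need the sparse flag set first). The relative `fseek` calls may also hit the limits of `long` on very large files.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_BLOCK 4096

/*
 * Copy in to out, seeking over blocks that are entirely 0x00 instead of
 * writing them. Returns 0 on success, -1 on any I/O error.
 */
int copy_skipping_zero_blocks(FILE *in, FILE *out)
{
    static const unsigned char zeros[SKETCH_BLOCK];
    unsigned char buf[SKETCH_BLOCK];
    size_t n;
    int pending_hole = 0;  /* nonzero while the output ends in a skipped block */

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (memcmp(buf, zeros, n) == 0) {
            /* All-zero block: move the write position without writing. */
            if (fseek(out, (long)n, SEEK_CUR) != 0)
                return -1;
            pending_hole = 1;
        } else {
            if (fwrite(buf, 1, n, out) != n)
                return -1;
            pending_hole = 0;
        }
    }
    if (ferror(in))
        return -1;

    /* A seek alone does not extend the file, so if the copy ends in a
     * hole, write its final byte explicitly to fix the length. */
    if (pending_hole) {
        if (fseek(out, -1L, SEEK_CUR) != 0 || fputc(0, out) == EOF)
            return -1;
    }
    return fflush(out) == 0 ? 0 : -1;
}
```

Reading the copy back returns zeros for every skipped range, so the result is byte-for-byte identical to the source even though those blocks were never written.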
therube
Posts: 4967
Joined: Thu Sep 03, 2009 6:48 pm

Re: Looking for Software: Copy/Move/Make Files Sparse

Post by therube »

Fast File Copy (FFC, a command-line copy utility), while not a file manager, has options for both sparse files & ADS.