* An rsync-style approach can probably be implemented by the worker node
(WN) if the schema supports it (later).
Why would we want this capability? Probably because we don't want to
restart a transfer of a 1TB directory with 1000s of files. Commons VFS
doesn't really provide a way of checking which files have been successfully
transferred to the destination if the FileObject a client provided is a
directory. To make sure we are not copying everything again, we'll
need to do it manually from the WN end. A quick and easy implementation,
according to Pauline, is to traverse the directory tree and, for each file,
check whether it exists on the other end and compare the file sizes.
I know this might be an issue later on, because doing this query and compare
for each file, especially when there are 1000s of them, might not be the
best approach to supporting resumption of directory copies.
* Russell Sim mentioned that some protocols that we plan to use might
already support file resumption. He gave an example of one big file (10TB
for example) being transferred when the process fails in the middle of the
copy. We probably would not want to restart the transfer from the beginning.
If there's already 5TB at the destination, HTTP and FTP have ways of
continuing the copy from the last byte of the file at the destination
(HTTP range requests, the FTP REST command).
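
The two ideas above (skip files that are already complete, resume files that
are partially copied) can be sketched roughly as follows. This is only an
illustration in Python, not the worker node's implementation: it assumes both
trees are reachable as local paths, whereas the real destination would be
queried through Commons VFS. The names `plan_transfer` and
`http_range_header` are hypothetical. A size-only check also cannot tell
whether the bytes already at the destination are intact.

```python
import os


def plan_transfer(src_root, dst_root):
    """Decide, per source file, whether to skip, resume, or copy it.

    Returns a list of (relative_path, action, offset) tuples, where
    offset is the byte position to continue from (0 unless resuming).
    Assumption: both roots are local directories.
    """
    plan = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            src_size = os.path.getsize(src)
            if not os.path.exists(dst):
                plan.append((rel, "copy", 0))      # nothing there yet
                continue
            dst_size = os.path.getsize(dst)
            if dst_size == src_size:
                plan.append((rel, "skip", 0))      # same size: assume done
            elif dst_size < src_size:
                plan.append((rel, "resume", dst_size))  # partial copy
            else:
                plan.append((rel, "copy", 0))      # destination larger: redo
    return plan


def http_range_header(offset):
    """Header asking an HTTP server for the bytes from offset onwards."""
    return {"Range": "bytes=%d-" % offset}
```

For a file marked "resume", the offset is the size of the partial file at the
destination, so a 10TB file with 5TB already copied would continue from the
5TB mark instead of byte 0.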
Some useful rsync info (iRODS irsync in particular)...
The command compares the checksum values and file sizes of the source and
target files to determine whether synchronization is needed. Therefore, the
command will run faster if the checksum value for the specific iRODS file,
no matter whether it is a source or target, already exists and is
registered with iCAT. This can be achieved by using the -k or -K options of
the iput command at the time of ingestion, or by using the ichksum command
after the data have already been ingested into iRODS. If the -s option is
used, only the file size (instead of the size and checksum value) is
used for determining whether synchronization is needed. This mode gives
a faster operation but the result is less accurate.
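
The decision described above might look something like the sketch below. It
is a Python illustration of the size-plus-checksum idea on local files, not
irsync's actual code: `needs_sync`, `file_checksum`, and the
`known_checksums` dictionary (standing in for checksums already registered in
a catalogue) are hypothetical names, and SHA-256 is an assumption made for
the example.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 hex digest by reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_sync(src, dst, size_only=False, known_checksums=None):
    """Return True if dst should be (re)synchronized from src.

    size_only=True compares sizes only: faster, but less accurate.
    known_checksums maps path -> digest for files whose checksum is
    already recorded, so they do not have to be read again.
    """
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True
    if size_only:
        return False  # sizes match and we were asked not to look further
    cache = known_checksums if known_checksums is not None else {}
    src_sum = cache.get(src) or file_checksum(src)
    dst_sum = cache.get(dst) or file_checksum(dst)
    return src_sum != dst_sum
```

Two files of equal size but different content show the trade-off: with
`size_only=True` they are treated as already synchronized, while the default
mode reads (or looks up) both checksums and reports that a sync is needed.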
Original issue reported on code.google.com by
gerson...@gmail.com on 6 Nov 2009 at 4:35