17 changes: 9 additions & 8 deletions docs/explanations/manifest_csv.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Manifest CSV

The manifest is a CSV file with file locations and metadata used to bulk upload and download files in Synapse. It is the standard manifest format used by `Project.sync_from_synapse`, `Project.sync_to_synapse`, `Folder.sync_from_synapse`, `Folder.sync_to_synapse`, the Synapse UI download cart, and the `synapse get-download-list` CLI command.
The manifest is a CSV file with file locations and metadata used to bulk upload and download files in Synapse. It is the standard manifest format used by `Project.sync_from_synapse`, `Project.sync_to_synapse`, `Folder.sync_from_synapse`, `Folder.sync_to_synapse`, `Project.generate_sync_manifest`, `Folder.generate_sync_manifest`, the Synapse UI download cart, and the `synapse get-download-list` CLI command.

!!! note
This CSV manifest replaces the legacy TSV manifest produced by `synapseutils.syncFromSynapse`. The `syncFromSynapse` and `syncToSynapse` utility functions are deprecated and will be removed in v5.0.0. Use `Project.sync_from_synapse` / `Folder.sync_from_synapse` and `Project.sync_to_synapse` / `Folder.sync_to_synapse` instead. See the [legacy TSV manifest documentation](manifest_tsv.md) for details on the old format.
@@ -81,8 +81,6 @@ This is an annotation with 3 values:
|-----------------|----------|--------------|
| /path/file1.txt | syn1243 | "[a,b,c]" |
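As a rough illustration of the multi-value cell shown in the table above, the sketch below builds the bracketed string and lets Python's `csv` module apply the quoting. The helper names `format_multi_value` and `manifest_row` are hypothetical and are not part of `synapseclient`. How values that themselves contain commas or brackets should be escaped is not covered here.

```python
import csv
import io


def format_multi_value(values):
    """Join several annotation values into one bracketed cell, e.g. [a,b,c]."""
    return "[" + ",".join(str(value) for value in values) + "]"


def manifest_row(path, parent_id, values):
    """Render one CSV row; the csv module quotes the cell because it contains commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([path, parent_id, format_multi_value(values)])
    return buffer.getvalue()


print(manifest_row("/path/file1.txt", "syn1243", ["a", "b", "c"]), end="")
# prints: /path/file1.txt,syn1243,"[a,b,c]"
```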



### Dates in the manifest file

Dates within the manifest file are always written in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format, in UTC and without milliseconds. For example: `2023-12-20T16:55:08Z`.
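
If you are preparing date values yourself, the format described above can be produced with the standard library. This is a minimal sketch, not the client's own code: `to_manifest_date` is a hypothetical helper, and treating naive datetimes as UTC is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone


def to_manifest_date(dt: datetime) -> str:
    """Format a datetime as ISO 8601 in UTC without milliseconds."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is taken to already be in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    utc = dt.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


# A timestamp with an offset of -05:00 is converted to UTC before formatting.
eastern = timezone(timedelta(hours=-5))
print(to_manifest_date(datetime(2023, 12, 20, 11, 55, 8, 123456, tzinfo=eastern)))
# prints: 2023-12-20T16:55:08Z
```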
@@ -93,11 +91,12 @@ Dates can be written in other formats specified in ISO 8601 and they will be rec

The CSV manifest format is shared across multiple tools:

| Source | Filename |
|----------------------------------------------------------|---------------------------------|
| `Project.sync_from_synapse` / `Folder.sync_from_synapse` | manifest.csv |
| Synapse UI download cart | manifest.csv |
| CLI `synapse get-download-list` | `manifest_<timestamp>.csv` |
| Source | Filename |
|----------------------------------------------------------------------|---------------------------------|
| `Project.sync_from_synapse` / `Folder.sync_from_synapse` | manifest.csv |
| `Project.generate_sync_manifest` / `Folder.generate_sync_manifest` | user-specified `manifest_path` |
| Synapse UI download cart | manifest.csv |
| CLI `synapse get-download-list` | `manifest_<timestamp>.csv` |

A manifest generated by any of these sources can be used as input to `sync_to_synapse`, provided the `path` column is present with valid local file paths. Manifests from the Synapse UI do not include a `path` column by default, so users must add it before uploading.
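
One way to add the missing `path` column is sketched below using only the standard library. It assumes each row carries the file name in a column called `name` and that the files sit in a single local directory. Both are assumptions for illustration, so check the columns in your own manifest. `add_path_column` is a hypothetical helper, not a `synapseclient` function.

```python
import csv
import os


def add_path_column(manifest_in, manifest_out, download_dir, name_column="name"):
    """Write a copy of manifest_in that has a `path` column with absolute file paths.

    Assumes `name_column` (hypothetical default: "name") holds each file's name
    and that every file lives directly inside `download_dir`.
    """
    with open(manifest_in, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])

    if "path" not in fieldnames:
        fieldnames.insert(0, "path")

    for row in rows:
        # Keep any path that is already filled in; otherwise derive one.
        if not row.get("path"):
            row["path"] = os.path.abspath(os.path.join(download_dir, row[name_column]))

    with open(manifest_out, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `add_path_column("manifest.csv", "manifest_with_paths.csv", "./downloads")` would write a new manifest that could then be passed to `sync_to_synapse`, provided the derived paths point at real files.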

@@ -113,7 +112,9 @@ A manifest generated by any of these sources can be used as input to `sync_to_sy

- [Project.sync_from_synapse][synapseclient.models.Project.sync_from_synapse]
- [Project.sync_to_synapse][synapseclient.models.Project.sync_to_synapse]
- [Project.generate_sync_manifest][synapseclient.models.Project.generate_sync_manifest]
- [Folder.sync_from_synapse][synapseclient.models.Folder.sync_from_synapse]
- [Folder.sync_to_synapse][synapseclient.models.Folder.sync_to_synapse]
- [Folder.generate_sync_manifest][synapseclient.models.Folder.generate_sync_manifest]
- [Manifest TSV (legacy)](manifest_tsv.md)
- [Managing custom metadata at scale](https://help.synapse.org/docs/Managing-Custom-Metadata-at-Scale.2004254976.html#ManagingCustomMetadataatScale-BatchUploadFileswithAnnotations)
1 change: 1 addition & 0 deletions docs/reference/experimental/async/folder.md
@@ -17,6 +17,7 @@ at your own risk.
- walk_async
- sync_from_synapse_async
- sync_to_synapse_async
- generate_sync_manifest_async
- flatten_file_list
- map_directory_to_all_contained_files
- get_permissions_async
1 change: 1 addition & 0 deletions docs/reference/experimental/async/project.md
@@ -16,6 +16,7 @@ at your own risk.
- walk_async
- sync_from_synapse_async
- sync_to_synapse_async
- generate_sync_manifest_async
- flatten_file_list
- map_directory_to_all_contained_files
- get_permissions_async
1 change: 1 addition & 0 deletions docs/reference/experimental/sync/folder.md
@@ -28,6 +28,7 @@ at your own risk.
- walk
- sync_from_synapse
- sync_to_synapse
- generate_sync_manifest
- flatten_file_list
- map_directory_to_all_contained_files
- get_permissions
1 change: 1 addition & 0 deletions docs/reference/experimental/sync/project.md
@@ -27,6 +27,7 @@ at your own risk.
- walk
- sync_from_synapse
- sync_to_synapse
- generate_sync_manifest
- flatten_file_list
- map_directory_to_all_contained_files
- get_permissions
28 changes: 12 additions & 16 deletions docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py
@@ -2,6 +2,7 @@
Here is where you'll find the code for the uploading data in bulk tutorial.
"""

# --8<-- [start:imports_and_constants]
import pandas as pd

import synapseclient
@@ -14,34 +15,26 @@
DIRECTORY_FOR_MY_PROJECT = "test_folder" # This should exist with your files in it
PATH_TO_MANIFEST_FILE = "test_manifest.csv" # This doesn't need to exist yet
SYNAPSE_PROJECT_ID = "" # Put your Synapse project ID here. This is the project where you want to upload your data.
project = Project(id=SYNAPSE_PROJECT_ID)
# --8<-- [end:imports_and_constants]

# TODO switch to using new version of synapseutils/sync.py.generate_sync_manifest
# https://sagebionetworks.jira.com/browse/SYNPY-1809

# --8<-- [start:generate_manifest]
# Step 2: Create a manifest CSV file with the paths to the files and their parent folders
# Note: When this command is run it will re-create your directory structure within
# Synapse. Be aware of this before running this command.
# If folders with the exact names already exist in Synapse, those folders will be used.


# old function generates a TSV
from synapseutils import generate_sync_manifest

generate_sync_manifest(
syn=syn,
project.generate_sync_manifest(
directory_path=DIRECTORY_FOR_MY_PROJECT,
parent_id=SYNAPSE_PROJECT_ID,
manifest_path=PATH_TO_MANIFEST_FILE,
)
# reformat the manifest file to work with sync_to_synapse
manifest_df = pd.read_csv(PATH_TO_MANIFEST_FILE, sep="\t")
manifest_df.rename(columns={"parent": "parentId"}, inplace=True)
manifest_df.to_csv(PATH_TO_MANIFEST_FILE, index=False)
# --8<-- [end:generate_manifest]

# --8<-- [start:sync_to_synapse]
# Step 3: After generating the manifest file, we can upload the data in bulk
project = Project(id=SYNAPSE_PROJECT_ID)
project.sync_to_synapse(manifest_path=PATH_TO_MANIFEST_FILE, send_messages=False)
# --8<-- [end:sync_to_synapse]

# --8<-- [start:add_annotation]
# Step 4: Let's add an annotation to our manifest file
# Pandas is a powerful data manipulation library in Python. Although it is not required
# for this tutorial, it is used here to demonstrate how you can manipulate the manifest
@@ -57,7 +50,9 @@
df.to_csv(PATH_TO_MANIFEST_FILE, index=False)

project.sync_to_synapse(manifest_path=PATH_TO_MANIFEST_FILE, send_messages=False)
# --8<-- [end:add_annotation]

# --8<-- [start:add_provenance]
# Step 5: Let's create an Activity/Provenance
# First let's find the row in the CSV we want to update. This code finds the row number
# that we would like to update.
@@ -86,3 +81,4 @@
df.to_csv(PATH_TO_MANIFEST_FILE, index=False)

project.sync_to_synapse(manifest_path=PATH_TO_MANIFEST_FILE, send_messages=False)
# --8<-- [end:add_provenance]
22 changes: 11 additions & 11 deletions docs/tutorials/python/upload_data_in_bulk.md
@@ -57,18 +57,18 @@ tools to open and manipulate CSV files.

First, let's set up some constants we'll use in this script:
```python
{!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=5-16}
--8<-- "docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py:imports_and_constants"
```

## 2. Create a manifest CSV file to upload data in bulk

We use `synapseutils.generate_sync_manifest` to walk our local directory and produce a
manifest that maps each file to the correct parent folder in Synapse (creating folders
as needed). The output is a TSV with a `parent` column, so we convert it to CSV and
rename `parent` to `parentId` for use with `sync_to_synapse`.
We call `Project.generate_sync_manifest` on the project we want to mirror into.
It walks our local directory and produces a CSV manifest that maps each file to
the correct parent folder in Synapse (creating folders as needed). The output
is ready to hand directly to `sync_to_synapse`.

```python
{!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=20-39}
--8<-- "docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py:generate_manifest"
```

<details class="example">
@@ -89,7 +89,7 @@ path,parentId

## 3. Upload the data in bulk
```python
{!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=41-43}
--8<-- "docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py:sync_to_synapse"
```


@@ -115,7 +115,7 @@ you are not comfortable with pandas you may use any tool that can open and manip
CSV files such as Excel or Google Sheets.

```python
{!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=45-59}
--8<-- "docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py:add_annotation"
```

Now that you have uploaded and annotated your files you'll be able to inspect your data
@@ -137,7 +137,7 @@ Synapse. Additionally we'll link off to a sample URL that describes a process th
may have executed to generate the file.

```python
{!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!lines=61-88}
--8<-- "docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py:add_provenance"
```

After running this code we may again inspect the Synapse web UI. In this screenshot I've
@@ -156,14 +156,14 @@ navigated to the Files tab and selected the file that we added a Provenance reco
<summary>Click to show me</summary>

```python
{!docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py!}
--8<-- "docs/tutorials/python/tutorial_scripts/upload_data_in_bulk.py"
```
</details>

## References used in this tutorial

- [syn.login][synapseclient.Synapse.login]
- [synapseutils.generate_sync_manifest][]
- [Project.generate_sync_manifest][synapseclient.models.mixins.StorableContainer.generate_sync_manifest]
- [Project.sync_to_synapse][synapseclient.models.mixins.StorableContainer.sync_to_synapse]
- [Manifest CSV format](../../explanations/manifest_csv.md)
- [Activity/Provenance](../../explanations/domain_models_of_synapse.md#activityprovenance)
75 changes: 75 additions & 0 deletions synapseclient/models/mixins/storable_container.py
@@ -50,6 +50,7 @@
)
from synapseclient.models.services.manifest import (
generate_manifest_csv,
generate_sync_manifest,
read_manifest_for_upload,
upload_sync_files,
)
@@ -702,6 +703,80 @@ async def main():
progress_bar.close()
return uploaded_files

@otel_trace_method(
method_to_trace_name=lambda self, **kwargs: f"{self.__class__.__name__}_generate_sync_manifest: {self.id}"
)
async def generate_sync_manifest_async(
self: Self,
directory_path: str,
manifest_path: str,
*,
synapse_client: Synapse | None = None,
) -> None:
"""Walk a local directory, mirror its folder hierarchy under this
container in Synapse, and write a CSV manifest ready for
[sync_to_synapse_async][synapseclient.models.mixins.StorableContainer.sync_to_synapse_async].

The manifest has two columns: path (absolute, symlink-resolved) and
parentId (the Synapse ID of the file's containing folder). Existing
Synapse folders with matching names and parents are reused. Directory
symlinks inside directory_path are not followed; file symlinks record
the symlink path and upload the target's contents. Zero-byte files are
skipped with a warning — Synapse rejects empty files. I/O errors during
walk are logged and skipped; an empty source directory produces a
warning and a header-only manifest.

Arguments:
directory_path: Path to the local directory to be pushed to
Synapse.
manifest_path: Path where the generated manifest CSV will be
written.
synapse_client: If not passed in and caching was not disabled by
Synapse.allow_client_caching(False) this will use the last
created instance from the Synapse class constructor.

Raises:
ValueError: If this container's id is None, or if directory_path
does not exist or is not a directory, or if this container's
id exists in Synapse but is not a Folder or Project.
SynapseHTTPError: If this container's id does not exist in
Synapse.

Example: Generate a manifest and upload the files
Mirror ./my_data under a Synapse project and then upload it.

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import Project

async def main():
    syn = Synapse()
    syn.login()

    project = Project(id="syn12345")
    await project.generate_sync_manifest_async(
        directory_path="./my_data",
        manifest_path="manifest.csv",
    )
    await project.sync_to_synapse_async(manifest_path="manifest.csv")

asyncio.run(main())
```
"""
if self.id is None:
raise ValueError(
f"Cannot generate a sync manifest for a {type(self).__name__}"
" that has not been stored in Synapse. Set id on this"
" container (or store it) first."
)
await generate_sync_manifest(
directory_path=directory_path,
parent_id=self.id,
manifest_path=manifest_path,
synapse_client=synapse_client,
)

def flatten_file_list(self) -> List["File"]:
"""
Recursively loop over all of the already retrieved files and folders and return
56 changes: 56 additions & 0 deletions synapseclient/models/protocols/storable_container_protocol.py
@@ -300,3 +300,59 @@ def sync_to_synapse(
```
"""
return []

def generate_sync_manifest(
self: Self,
directory_path: str,
manifest_path: str,
*,
synapse_client: Synapse | None = None,
) -> None:
"""Walk a local directory, mirror its folder hierarchy under this
container in Synapse, and write a CSV manifest ready for
[sync_to_synapse][synapseclient.models.mixins.StorableContainer.sync_to_synapse].

The manifest has two columns: path (absolute, symlink-resolved) and
parentId (the Synapse ID of the file's containing folder). Existing
Synapse folders with matching names and parents are reused. Directory
symlinks inside directory_path are not followed; file symlinks record
the symlink path and upload the target's contents. Zero-byte files are
skipped with a warning — Synapse rejects empty files. I/O errors during
walk are logged and skipped; an empty source directory produces a
warning and a header-only manifest.

Arguments:
directory_path: Path to the local directory to be pushed to
Synapse.
manifest_path: Path where the generated manifest CSV will be
written.
synapse_client: If not passed in and caching was not disabled by
Synapse.allow_client_caching(False) this will use the last
created instance from the Synapse class constructor.

Raises:
ValueError: If this container's id is None, or if directory_path
does not exist or is not a directory, or if this container's
id exists in Synapse but is not a Folder or Project.
SynapseHTTPError: If this container's id does not exist in
Synapse.

Example: Generate a manifest and upload the files
Mirror ./my_data under a Synapse project and then upload it.

```python
from synapseclient import Synapse
from synapseclient.models import Project

syn = Synapse()
syn.login()

project = Project(id="syn12345")
project.generate_sync_manifest(
    directory_path="./my_data",
    manifest_path="manifest.csv",
)
project.sync_to_synapse(manifest_path="manifest.csv")
```
"""
return None
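
The local half of the behaviour described in the `generate_sync_manifest` docstring (skipping zero-byte files, not following directory symlinks, writing a header-only manifest for an empty directory) might look roughly like the sketch below. It is not the library's implementation. `write_manifest_sketch` and its `folder_ids` argument are hypothetical: the real method creates or reuses Synapse folders itself, whereas this sketch expects the caller to supply a mapping from each relative directory (`"."` for the root) to a Synapse ID. It also records `os.path.abspath` paths rather than reproducing the docstring's exact symlink handling.

```python
import csv
import os


def write_manifest_sketch(directory_path, manifest_path, folder_ids):
    """Walk directory_path and write a two-column (path, parentId) CSV manifest.

    folder_ids is a hypothetical mapping: relative directory -> Synapse ID.
    Returns the number of file rows written.
    """
    if not os.path.isdir(directory_path):
        raise ValueError(f"{directory_path} does not exist or is not a directory")

    rows = []
    # followlinks=False means symlinked directories are listed but not descended into.
    for dirpath, _dirnames, filenames in os.walk(directory_path, followlinks=False):
        rel_dir = os.path.relpath(dirpath, directory_path)
        for filename in sorted(filenames):
            file_path = os.path.join(dirpath, filename)
            try:
                size = os.path.getsize(file_path)
            except OSError:
                continue  # unreadable entry: skip it and keep walking
            if size == 0:
                continue  # zero-byte files are left out of the manifest
            rows.append(
                {"path": os.path.abspath(file_path), "parentId": folder_ids[rel_dir]}
            )

    # An empty source directory still yields a manifest containing only the header.
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "parentId"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Taking the folder mapping as an argument keeps the sketch runnable without a Synapse login; in the real method the container's own `id` plays the role of the root entry.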