30 changes: 2 additions & 28 deletions modules/querying/pages/data-types.adoc
@@ -591,39 +591,13 @@ A `FILE` object is a sequential data storage object, associated with a text file
When referring to a `FILE` object, we always capitalize the word `FILE` to distinguish it from ordinary files.
====

See xref:declaration-and-assignment-statements.adoc#_file_objects[FILE Object Declaration] to declare a `FILE` object.

=== Local disk file
When a `FILE` object is declared, associated with a particular text file, any existing content in the text file will be erased.
During the execution of the query, content written to the `FILE` will be appended to the `FILE`.
When the query where the `FILE` was declared finishes running, the `FILE` contents are saved to the text file.

=== S3 object
==== Define an S3 `FILE` object
The path should start with `s3://`, followed by the bucket name and the S3 path, e.g., `s3://bucket-name/queryoutput/output.csv`. During the execution of the query, content is uploaded to the S3 bucket. An S3 object cannot be modified or appended to: if an S3 object with the same path already exists, it is overwritten.

==== Set S3 connection credentials
The S3 credentials can be set as GSQL session parameters, so they persist for a user for a full session.
[source,gsql]
----
set s3_aws_access_key_id = <AWS_KEY_ID>;
set s3_aws_secret_access_key = <AWS_ACCESS_KEY>;
----

These session parameters should be set within the GSQL Editor to enable read/write access to the specified S3 bucket for query results. Replace `<AWS_KEY_ID>` and `<AWS_ACCESS_KEY>` with your actual AWS credentials.

==== Output
Since S3 is a shared storage system, multiple nodes in a cluster can upload to the same S3 bucket. To handle potential conflicts and ensure unique output files, the S3 path can include a suffix based on the instance name, such as `\_GPE_{PartitionId}_{ReplicaId}`. For distributed queries, additional suffixes will be used to differentiate between the manager and worker roles on the same GPE. Specifically, suffixes like `_coordinator` and `_worker` will be added, where `_coordinator` refers to the worker manager and `_worker` refers to the worker node.

==== Error code
For S3 bucket connection errors, refer to error code `GSQL-5301`.

[NOTE]
====
A `FILE` object can be passed as a parameter to another query.
When a query receives a `FILE` object for a file on the local machine as a parameter, it can append data to that `FILE`, as can every other query that receives the same `FILE` object as a parameter.
However, an S3 bucket `FILE` object cannot be appended to.
When you write to an S3 path, any existing object will be overwritten.
====

== Query parameter types

A query can have one or more input parameters having any of the following types:
@@ -396,7 +396,7 @@ Therefore, `S` must be declared as an ANY-type vertex set variable.
[#_file_objects]
=== `FILE` objects

A `FILE` object is a sequential text storage object, associated with a text file on the local machine.
A `FILE` object is a sequential text storage object, associated with a text file on the local machine or with an S3 bucket.

.EBNF for FILE object declaration
[source,ebnf]
@@ -419,6 +419,34 @@ See xref:querying:output-statements-and-file-objects.adoc#_file_println_statemen
include::appendix:example$work_net/declaration_file_object_query.gsql[]
----

=== S3 object
==== Define an S3 `FILE` object
The path should start with `s3://`, followed by the bucket name and the S3 path, e.g., `s3://bucket-name/queryoutput/output.csv`. During the execution of the query, content is uploaded to the S3 bucket. An S3 object cannot be modified or appended to: if an S3 object with the same path already exists, it is overwritten.

==== Set S3 connection credentials
The S3 credentials can be set as GSQL session parameters, so they persist for a user for a full session.
[source,gsql]
----
set s3_aws_access_key_id = <AWS_KEY_ID>;
set s3_aws_secret_access_key = <AWS_ACCESS_KEY>;
----

These session parameters should be set within the GSQL Editor to enable read/write access to the specified S3 bucket for query results. Replace `<AWS_KEY_ID>` and `<AWS_ACCESS_KEY>` with your actual AWS credentials.
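
The following is a minimal sketch of how the session parameters and an S3 `FILE` declaration might be combined. The query name `s3_output_example`, the graph `Work_Net` with a `Person` vertex type, and the bucket path are illustrative assumptions, and the placeholders must be replaced with real credentials before running.

[source,gsql]
----
set s3_aws_access_key_id = <AWS_KEY_ID>;
set s3_aws_secret_access_key = <AWS_ACCESS_KEY>;

CREATE QUERY s3_output_example() FOR GRAPH Work_Net {
    // Hypothetical bucket and path: an existing object at this path is overwritten
    FILE f ("s3://bucket-name/queryoutput/output.csv");

    // Write a header line, then one line per Person vertex
    f.println("id");
    people = {Person.*};
    result = SELECT v FROM people:v
             ACCUM f.println(v.id);
}
----

Because the credentials are session parameters, they need to be set again in each new GSQL session before a query like this one is run.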

==== Output
Since S3 is a shared storage system, multiple nodes in a cluster can upload to the same S3 bucket. To handle potential conflicts and ensure unique output files, the S3 path can include a suffix based on the instance name, such as `\_GPE_{PartitionId}_{ReplicaId}`. For distributed queries, additional suffixes will be used to differentiate between the manager and worker roles on the same GPE. Specifically, suffixes like `_coordinator` and `_worker` will be added, where `_coordinator` refers to the worker manager and `_worker` refers to the worker node.

==== Error code
For S3 bucket connection errors, refer to error code `GSQL-5301`.

[NOTE]
====
A `FILE` object can be passed as a parameter to another query.
When a query receives a `FILE` object for a file on the local machine as a parameter, it can append data to that `FILE`, as can every other query that receives the same `FILE` object as a parameter.
However, an S3 bucket `FILE` object cannot be appended to.
When you write to an S3 path, any existing object will be overwritten.
====
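
As a sketch of the local-file case described in the note, the two queries below share one `FILE` object. The query names, the `Work_Net` graph, and the file path are assumptions chosen for illustration.

[source,gsql]
----
CREATE QUERY file_param_sub(FILE f, STRING label) FOR GRAPH Work_Net {
    // Appends to the FILE that the calling query declared
    f.println(label, "from sub-query");
}

CREATE QUERY file_param_main() FOR GRAPH Work_Net {
    // Declaring the FILE erases any existing content in the local text file
    FILE f ("/home/tigergraph/file_param_example.txt");
    f.println("main", "header");
    file_param_sub(f, "sub");
    f.println("main", "footer");
}
----

Under these assumptions, the lines written by `file_param_sub` land in the same text file as the lines written by `file_param_main`. The same pattern would not accumulate output for an `s3://` path, since an S3 object cannot be appended to.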

== Assignment and Accumulate Statements

Assignment statements are used to set or update the value of a variable after it has been declared.
@@ -438,7 +438,7 @@ GSQL > RUN QUERY print_example_v2("person1")
=== Printing CSV to a FILE Object

Instead of printing output in JSON format, output can be written to a `FILE` object in comma-separated values (CSV) format by appending the keyword `TO_CSV` followed by the `FILE` object name to the `PRINT` statement:
The `FILE` object can be a local file or an S3 bucket storage object, allowing flexibility in how and where the output is stored.
The `FILE` object can be a local file or an S3 bucket storage object, allowing flexibility in how and where the output is stored. See xref:declaration-and-assignment-statements.adoc#_file_objects[FILE Object Declaration] to declare a `FILE` object.

[source,gsql]
----