GCS Get Blobs

Description

Use this tool to retrieve blobs from Google Cloud Storage and save them as local files.

This tool can retrieve a single blob or multiple blobs, using filters to determine which ones to retrieve. The tool verifies each file against an MD5 hash to detect data corruption during the transfer. When retrieving blobs in subdirectories, the tool recreates the directory structure locally.
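
The integrity check can be reproduced outside the tool for troubleshooting. The following Python sketch is not the tool's implementation. It assumes the expected checksum is the base64-encoded MD5 digest that Google Cloud Storage reports for a blob, and the helper names local_md5_base64 and is_intact are hypothetical.

```python
import base64
import hashlib
from pathlib import Path


def local_md5_base64(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the base64-encoded MD5 digest of a local file."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def is_intact(path: Path, expected_md5_base64: str) -> bool:
    """Compare the local digest with the digest reported for the blob."""
    return local_md5_base64(path) == expected_md5_base64
```

A mismatch between the two digests would indicate that the local file differs from the stored blob.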

To use this tool, define a Google Cloud Storage source from which to retrieve blobs and a local target folder in which to save them. You can also define the target path manually using tool parameters.
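
As an illustration of how a blob's path could map onto the local target folder, consider the following Python sketch. The function name local_path_for is hypothetical, and the rule that the source directory is stripped before the remaining subdirectories are recreated is an assumption, not documented behavior.

```python
from pathlib import Path, PurePosixPath


def local_path_for(blob_name: str, source_dir: str, target_dir: Path) -> Path:
    """Map a blob name to a local path under target_dir, keeping subdirectories."""
    relative = PurePosixPath(blob_name)
    if source_dir:
        # Assumption: only the structure below the source directory is recreated.
        relative = relative.relative_to(PurePosixPath(source_dir))
    return target_dir.joinpath(*relative.parts)
```

Under these assumptions, a blob named tmp/a/b.txt retrieved from the source directory tmp would land in the subfolder a of the target folder.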

Usage

  1. Add the process action TOOL GCS Get Blobs from the Process Palette, under the Tools section.

  2. Select a source and target:

    • Drag and drop one of the following Google Cloud Storage metadata nodes onto the <SOURCE> field of the tool:

      • Storage

      • Bucket

      • Folder

    • Optionally, drag and drop the following File metadata node onto the <TARGET> field of the tool:

      • Folder

  3. If you did not drag and drop a folder node as a target, define the target folder in the tool’s parameters.

  4. Set other tool parameters as needed.

The tool inherits parameters from the metadata node you drag onto it.

Parameters

Each entry below gives the parameter name, its default value where one exists, and its description.

XPath Expression For Source

$SOURCE

A valid XPath expression referencing a Bucket to use as a source location. The expression can return a storage, bucket, or folder node from a Google Cloud Storage metadata object.

Source Bucket Name

Manual entry of the source bucket name.

You can omit this parameter if XPath Expression For Source returns a valid reference to a bucket or one of its children.

Source Directory Path

Manual entry of the source directory. You can omit this parameter if XPath Expression For Source returns a valid reference to a directory or one of its children, or if the bucket itself is the root directory.

For better performance, use a directory as the source, or set this parameter for any static subdirectories. For example, specify this:

  • Source Directory Path → tmp

  • Source Blob Includes → *.txt

instead of this:

  • Source Directory Path → <empty>

  • Source Blob Includes → tmp/*.txt
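
One plausible reason for this recommendation is that a directory acts as a name prefix, so fewer blobs have to be examined before the masks are evaluated. The Python sketch below illustrates the idea on an in-memory list of blob names. The function blobs_under and the sample names are hypothetical, and the sketch says nothing about how the tool lists blobs internally.

```python
def blobs_under(names: list[str], directory: str) -> list[str]:
    """Return the blob names located under directory, relative to it."""
    if not directory:
        return list(names)
    prefix = directory.rstrip("/") + "/"
    # Only names starting with the prefix remain candidates for the masks.
    return [name[len(prefix):] for name in names if name.startswith(prefix)]


names = ["tmp/a.txt", "tmp/b.csv", "logs/2024/app.log", "data/raw/c.txt"]
print(blobs_under(names, "tmp"))    # ['a.txt', 'b.csv']
print(len(blobs_under(names, "")))  # 4
```

With the directory set to tmp, a mask such as *.txt is evaluated against two names instead of four.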

Source Blob Includes

A list of blobs to include in the operation, as a semicolon-separated list of blob masks. An empty value matches all blobs.

When the source is a directory, or if you set the Source Directory Path, the blob mask evaluates inside this directory.

The following wildcard characters are supported:

  • ? → Matches one character in a segment of the blob’s path.

  • * → Matches zero or more characters in a segment of the blob’s path.

  • ** → Matches zero or more segments of the blob’s path.

Examples:

  • to retrieve XML and JSON blobs in the current directory: *.xml;*.json

  • to retrieve XML blobs in any test subdirectory: **/test/*.xml

When this parameter is set, the tool ignores the Source Blob Name parameter.
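
The wildcard rules for blob masks can be illustrated with a small matcher. This Python sketch is an interpretation of the rules as described, not the tool's own code: it assumes that segments are separated by /, that ** followed by / stands for zero or more whole segments, and that a mask must match the entire path. The names mask_to_regex and matches are hypothetical.

```python
import re


def mask_to_regex(mask: str) -> re.Pattern:
    """Translate one blob mask into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(mask):
        if mask.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole segments
            i += 3
        elif mask.startswith("**", i):
            parts.append(r".*")  # trailing **: anything, across segments
            i += 2
        elif mask[i] == "*":
            parts.append(r"[^/]*")  # zero or more characters in one segment
            i += 1
        elif mask[i] == "?":
            parts.append(r"[^/]")  # exactly one character in one segment
            i += 1
        else:
            parts.append(re.escape(mask[i]))
            i += 1
    return re.compile("".join(parts))


def matches(path: str, masks: str) -> bool:
    """Return True if path matches any mask in a semicolon-separated list."""
    selected = [m.strip() for m in masks.split(";") if m.strip()]
    if not selected:
        return True  # an empty value matches all blobs
    return any(mask_to_regex(m).fullmatch(path) for m in selected)
```

Under these assumptions, *.xml;*.json selects a.xml but not sub/a.xml, while **/test/*.xml selects both test/a.xml and x/y/test/a.xml.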

Source Blob Excludes

A list of blobs to exclude from the operation, as a semicolon-separated list of blob masks. An empty value excludes no blobs.

When the source is a directory, or if you set the Source Directory Path, the blob mask evaluates inside this directory.

The following wildcard characters are supported:

  • ? → Matches one character in a segment of the blob’s path.

  • * → Matches zero or more characters in a segment of the blob’s path.

  • ** → Matches zero or more segments of the blob’s path.

Examples:

  • to ignore XML and JSON blobs in the current directory: *.xml;*.json

  • to ignore XML blobs in any test subdirectory: **/test/*.xml

When this parameter is set, the tool ignores the Source Blob Name parameter.
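
How include and exclude masks might combine can be sketched as follows. This Python example rests on several assumptions that the description does not confirm: a blob is kept when it matches at least one include mask and no exclude mask, an empty include list keeps everything, and an empty exclude list rejects nothing. It also uses fnmatchcase from the Python standard library as a stand-in matcher; unlike the masks described here, fnmatchcase lets * cross / boundaries, so the sample uses flat names only. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def split_masks(value: str) -> list[str]:
    """Split a semicolon-separated mask list, dropping empty entries."""
    return [mask.strip() for mask in value.split(";") if mask.strip()]


def select_blobs(paths: list[str], includes: str, excludes: str) -> list[str]:
    """Keep paths that match an include mask and no exclude mask."""
    include_masks = split_masks(includes)
    exclude_masks = split_masks(excludes)
    selected = []
    for path in paths:
        # Assumption: an empty include list selects every blob.
        included = not include_masks or any(
            fnmatchcase(path, mask) for mask in include_masks
        )
        # Assumption: an empty exclude list rejects nothing; excludes win.
        excluded = any(fnmatchcase(path, mask) for mask in exclude_masks)
        if included and not excluded:
            selected.append(path)
    return selected


paths = ["a.xml", "b.json", "c.txt", "draft.xml"]
print(select_blobs(paths, "*.xml;*.json", "draft.*"))  # ['a.xml', 'b.json']
```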

Source Metadata

One or more key-value pairs to filter blobs based on their metadata in Google Cloud Storage. The tool only processes source blobs that match these values. You can set this parameter in the form of Java properties.

For instance:

metadata1=value1
metadata2=value2
#comment
metadata3=value3

For information about metadata in Google Cloud Storage, see the official documentation.
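
The metadata filter can be pictured with the following Python sketch. It parses only the subset of the Java properties format shown in the example (key=value lines and # comments) and leaves out escapes, line continuations, and alternative separators. It also assumes that a blob must carry every listed pair to be processed. The function names parse_properties and metadata_matches are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def metadata_matches(blob_metadata: dict[str, str], wanted: dict[str, str]) -> bool:
    """Return True if the blob carries every wanted key with the same value."""
    # Assumption: all pairs must match; extra metadata on the blob is ignored.
    return all(blob_metadata.get(key) == value for key, value in wanted.items())
```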

Source Blob Name

Full path of a blob to use as the source. Use this parameter when you want to perform an operation on a single blob.

When this parameter is set, the tool ignores the Source Directory Path, Source Blob Includes, Source Blob Excludes, and Source Metadata parameters.

XPath Expression For Target

$TARGET

A valid XPath expression referencing a directory to use as a target location. The expression can return a file or folder node from a File metadata object.

This parameter is ignored if you set the Target Folder Path parameter.

Target Folder Path

Manual entry of the target directory.

You can omit this parameter if XPath Expression For Target returns a valid reference to a directory or one of its children.

Target File Name

Full path of a target file to save to. Use this parameter when you want to retrieve a single blob.

This parameter only works if you also set the Source Blob Name.

Existing Files Behavior

overwrite

Controls the tool’s behavior if files already exist at the target location. The possible values are:

  • overwrite → The tool overwrites existing target files.

  • ignore → The tool leaves existing target files untouched and raises no error.

  • throwError → The tool leaves existing target files untouched and throws an error.
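
The three values of Existing Files Behavior can be modeled with a short Python sketch. This is an illustration of the documented semantics, not the tool's code: the function write_target is hypothetical, and mapping throwError to Python's FileExistsError is an assumption made for the example.

```python
from pathlib import Path


def write_target(path: Path, data: bytes, behavior: str = "overwrite") -> bool:
    """Write data to path according to behavior; return True if written."""
    if behavior not in ("overwrite", "ignore", "throwError"):
        raise ValueError(f"unknown behavior: {behavior}")
    if path.exists():
        if behavior == "ignore":
            return False  # leave the existing file untouched, no error
        if behavior == "throwError":
            raise FileExistsError(f"target already exists: {path}")
    # Create missing parent folders, then write (or overwrite) the file.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True
```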