fix: use ambient tokio runtime in ParquetObjectReader::spawn (#8231)#10168

Open
sandy-sachin7 wants to merge 1 commit into
apache:mainfrom
sandy-sachin7:fix/parquet-object-reader-tokio-spawn

@sandy-sachin7

Which issue does this PR close?

Closes #8231.

Rationale for this change

When ParquetObjectReader is constructed without an explicit runtime (i.e., the reader is passed to ParquetRecordBatchStreamBuilder::new(reader) without a runtime handle having been configured on it), the spawn method falls through to the None branch and runs async closures inline. For object store implementations (S3, HTTP, and GCS via the object_store crate), this breaks connection pooling and DNS resolution, because the underlying reqwest client relies on tokio::spawn to propagate the runtime context.

The result is a panic: "there is no reactor running, must be called from the context of a Tokio 1.x runtime"

What changes are included in this PR?

ParquetObjectReader::spawn now checks Handle::try_current() when self.runtime is None, so it discovers an ambient tokio runtime if one exists. It falls back to inline execution only when no tokio runtime context is available at all.

Before:

match &self.runtime {
    Some(handle) => { handle.spawn(...) }
    None => { /* inline — breaks S3/HTTP */ }
}

After:

let handle = self.runtime.clone().or_else(|| Handle::try_current().ok());
match handle {
    Some(handle) => { handle.spawn(...) }
    None => { /* inline — only when no runtime at all */ }
}

Are these changes tested?

1166 parquet lib tests pass. No new tests were added, because the existing test infrastructure doesn't exercise the spawn() method with async object store implementations (those are integration-level tests).

Are there any user-facing changes?

ParquetObjectReader will now correctly discover a tokio runtime when used inside #[tokio::main] or any other tokio runtime context, even without an explicit runtime being configured.

…8231)

When ParquetObjectReader is constructed without an explicit runtime,
spawn() would run async closures inline — this breaks object store
implementations (S3, HTTP) that rely on tokio::spawn for connection
pooling and DNS resolution. The fix tries Handle::try_current() to
discover an ambient tokio runtime before falling back to inline
execution.

Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Unable to create ParquetRecordBatchStreamBuilder due to runtime issues
