HADOOP-19795: ABFS. GetPathStatus Optimization on OpenFileForRead #8212

Open

manika137 wants to merge 17 commits into apache:trunk from ABFSDriver:HADOOP-19795_gpsOpt

Conversation

@manika137 (Contributor)

Description of PR

JIRA: https://issues.apache.org/jira/browse/HADOOP-19795

We currently make a getPathStatus call during file open for read. This call is used primarily to fetch the file's metadata properties before the actual read begins.
We are now introducing an optional, config-driven read flow that skips the getPathStatus call during open and instead derives the required metadata from the read response itself.
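
For illustration, a minimal sketch (not the actual patch; the helper class and header parsing below are assumptions based on standard HTTP and ABFS response headers) of how the metadata usually returned by getPathStatus can be recovered from the first read response:

import java.util.Map;

final class FirstReadMetadataSketch {

  // Hypothetical holder for the metadata openFileForRead would otherwise get from GetPathStatus.
  static final class PathMetadata {
    final long contentLength;
    final String eTag;
    final boolean isDirectory;

    PathMetadata(long contentLength, String eTag, boolean isDirectory) {
      this.contentLength = contentLength;
      this.eTag = eTag;
      this.isDirectory = isDirectory;
    }
  }

  static PathMetadata fromReadResponse(Map<String, String> headers) {
    // "x-ms-resource-type: directory" means the path cannot be read as a file.
    boolean isDirectory = "directory".equalsIgnoreCase(headers.get("x-ms-resource-type"));

    // Content-Range is "bytes <start>-<end>/<total>"; the total follows the slash,
    // so even a partial (ranged) read reveals the full file length.
    String contentRange = headers.get("Content-Range");
    long contentLength = 0L;
    if (contentRange != null && !contentRange.isEmpty()) {
      contentLength = Long.parseLong(contentRange.split("/")[1]);
    }

    String eTag = headers.get("ETag");
    return new PathMetadata(contentLength, eTag, isDirectory);
  }
}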

How was this patch tested?

New tests were added and the full test suite was run. Results are added in the comments below.

@manika137 (Contributor, Author)

Test Results

============================================================
HNS-OAuth-DFS

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 908, Failures: 0, Errors: 0, Skipped: 222
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 8
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 23

============================================================
HNS-SharedKey-DFS

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 911, Failures: 0, Errors: 0, Skipped: 168
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 8
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 10

============================================================
AppendBlob-HNS-OAuth-DFS

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 908, Failures: 0, Errors: 0, Skipped: 233
[WARNING] Tests run: 135, Failures: 0, Errors: 0, Skipped: 9
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-SharedKey-Blob

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 758, Failures: 0, Errors: 0, Skipped: 155
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 11

============================================================
NonHNS-OAuth-Blob

[ERROR] Tests run: 250, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 755, Failures: 0, Errors: 0, Skipped: 156
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 24

============================================================
AppendBlob-NonHNS-OAuth-Blob

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 750, Failures: 0, Errors: 0, Skipped: 202
[WARNING] Tests run: 135, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 24

============================================================
HNS-Oauth-DFS-IngressBlob

[WARNING] Tests run: 250, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 782, Failures: 0, Errors: 0, Skipped: 231
[WARNING] Tests run: 158, Failures: 0, Errors: 0, Skipped: 8
[WARNING] Tests run: 271, Failures: 0, Errors: 0, Skipped: 23

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 5m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 54m 33s trunk passed
+1 💚 compile 1m 6s trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 4s trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
+1 💚 checkstyle 0m 54s trunk passed
+1 💚 mvnsite 1m 12s trunk passed
-1 ❌ javadoc 1m 2s /branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in trunk failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 58s /branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in trunk failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.
+1 💚 spotbugs 1m 46s trunk passed
+1 💚 shadedclient 33m 55s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 29s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 38s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 38s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 23s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 16 new + 2 unchanged - 0 fixed = 18 total (was 2)
+1 💚 mvnsite 0m 42s the patch passed
-1 ❌ javadoc 0m 30s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 30s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.
-1 ❌ spotbugs 1m 28s /new-spotbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 32m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 22s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
146m 41s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.contentLength; locked 80% of time. Unsynchronized access at AbfsInputStream.java:[line 629]
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/1/artifact/out/Dockerfile
GITHUB PR #8212
JIRA Issue HADOOP-19795
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux a1c89b292bca 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e13fd56
Default Java Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/1/testReport/
Max. process+thread count 570 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/1/console
versions git=2.25.1 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT
|| ((VersionedFileStatus) fileStatus).getEncryptionContext()
!= null)) {
getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT
Contributor:

The additional whitespace-only changes can be reverted.

contextEncryptionAdapter = new ContextProviderEncryptionAdapter(
getClient().getEncryptionContextProvider(), getRelativePath(path),
fileEncryptionContext.getBytes(StandardCharsets.UTF_8));
getClient().getEncryptionContextProvider(), getRelativePath(path),
Contributor:

same as above

encryptionContext.getBytes(StandardCharsets.UTF_8));
}
} else {
if (parseIsDirectory(resourceType)) {
Contributor:

This can be moved to the common part, since it is checked in both cases.

Contributor Author:

Taken

* - restrictGpsOnOpenFile config is enabled with null FileStatus and encryptionType not as ENCRYPTION_CONTEXT
* In this case, we don't need to call GetPathStatus API.
*/
else {
Contributor:

Won't this lead to going ahead and opening the stream without checks? Do we fail later in this case?

Contributor Author:

Yes, with this config the checks happen after the first read. If the read fails there, we throw the appropriate exception.
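
As a rough sketch (the method and constant names here are assumptions, and the surrounding AbfsInputStream helpers are taken as given), the deferred check on the first read maps service errors back onto the exceptions openFileForRead would otherwise have thrown:

private int firstReadWithDeferredChecks(byte[] b, int off, int len) throws IOException {
  try {
    return readRemote(b, off, len);  // assumed existing remote-read path
  } catch (AbfsRestOperationException ere) {
    if (ere.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
      // Existence was never verified at open time; surface it as FileNotFoundException now.
      throw new FileNotFoundException(ere.getMessage());
    }
    if (ere.getErrorMessage() != null
        && ere.getErrorMessage().contains(ERR_READ_ON_DIRECTORY)) {  // assumed error constant
      throw directoryReadException();  // the path turned out to be a directory
    }
    throw new IOException(ere);  // anything else: propagate the original error
  }
}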

INVALID_APPEND_OPERATION("InvalidAppendOperation", HttpURLConnection.HTTP_CONFLICT, null),
UNAUTHORIZED_BLOB_OVERWRITE("UnauthorizedBlobOverwrite", HttpURLConnection.HTTP_FORBIDDEN,
"This request is not authorized to perform blob overwrites."),
INVALID_RANGE("InvalidRange", 416,
Contributor:

416 should come from a constant defined in HttpURLConnection class

Contributor Author:

We don't have a 416 constant defined in HttpURLConnection.

Contributor:

We can define it in our constants class then.

Contributor:

Yes, let's define it in AbfsHttpConstants.
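
A sketch of the suggested change (the constant name below is an assumption, not text from the patch):

// In AbfsHttpConstants: java.net.HttpURLConnection has no field for 416,
// so the project would carry its own constant.
public static final int HTTP_RANGE_NOT_SATISFIABLE = 416;

// The AzureServiceErrorCode entry would then reference the constant instead of the literal:
// INVALID_RANGE("InvalidRange", AbfsHttpConstants.HTTP_RANGE_NOT_SATISFIABLE, ...),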

// Reset Read Type back to normal and set again based on code flow.
getTracingContext().setReadType(ReadType.NORMAL_READ);
if (shouldAlwaysReadBufferSize()) {
if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {
Contributor:

nit: space after if

// Reset Read Type back to normal and set again based on code flow.
getTracingContext().setReadType(ReadType.NORMAL_READ);
if (shouldAlwaysReadBufferSize()) {
if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {
Contributor:

add a comment for this condition as well

Contributor Author:

Added

private final int footerReadSize; // default buffer size to read when reading footer
private final int readAheadQueueDepth; // initialized in constructor
private final String eTag; // eTag of the path when InputStream are created
private String eTag; // eTag of the path when InputStream are created
Contributor:

nit: InputStream is created

}
}

String getRelativePath(final Path path) {
Contributor:

javadoc missing

Contributor Author:

Added

tracingContext,
contextEncryptionAdapter).getResult();

String resourceType =
@anmolanmol1234 (Contributor), Jan 29, 2026:

Use the client.checkIsDir method, which handles the null case as well.

Contributor Author:

Nice, taken

}
contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE).
split(AbfsHttpConstants.FORWARD_SLASH)[1]);
eTag = op.getResult().getResponseHeader("ETag");
Contributor:

Use the extractEtagHeader method.

Contributor Author:

Taken

if (Objects.equals(resourceType, DIRECTORY)) {
throw directoryReadException();
}
contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE).
Contributor:

Same here: use the extractContentLen method.

Contributor Author:

We're using the Content-Range header here instead of Content-Length to extract the actual file length, since Content-Length would only reflect the length of a partial read. Added a new method for it.

if (ere.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
throw new FileNotFoundException(ere.getMessage());
int status = ere.getStatusCode();
if(ere.getErrorMessage().contains(readOnDirectoryErrorMsg)){
Contributor:

nit: space after if


} catch (AzureBlobFileSystemException gpsEx) {
AbfsRestOperationException gpsEre = (AbfsRestOperationException) gpsEx;
if(gpsEre.getErrorMessage().contains(readOnDirectoryErrorMsg)){
Contributor:

nit: space after if

}

// Default: propagate original error
throw new IOException(ex);
Contributor:

This should be done only once, on line 715.

Contributor Author:

Taken

This is required since contentLength is not available yet to determine prefetch block size.
*/
bytesRead = readInternal(getFCursor(), getBuffer(), 0, getBufferSize(), false);
if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {
Contributor:

nit: space after if

@steveloughran (Contributor)

  • Document that it means that if the file isn't found, there isn't a failure until the first read.
  • Make sure that openFile() builder code does this too if no status (or the wrong status type) is passed in, and do it automatically. That filesystem api spec says "existence checks may be delayed".

@manika137 (Contributor, Author)

manika137 commented Mar 24, 2026

  • Document that it means that if the file isn't found, there isn't a failure until the first read.
  • Make sure that openFile() builder code does this too if no status (or the wrong status type) is passed in, and do it automatically. That filesystem api spec says "existence checks may be delayed".

Sure. Will ensure correct behaviour for all edge cases, including file-not-found with a wrong FileStatus type or with no FileStatus, and will document the expectation.
The config here is false by default.
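
For reference, a caller-side sketch of the openFile() builder usage being discussed (not code from this patch): with lazy evaluation, a missing file may only fail when the future completes or on the first read.

import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class OpenFileSketch {
  static void readFirstByte(FileSystem fs, Path path, FileStatus status) throws Exception {
    CompletableFuture<FSDataInputStream> future = fs.openFile(path)
        .withFileStatus(status)  // optional hint; may be absent or of an unexpected type
        .build();
    try (FSDataInputStream in = future.get()) {
      in.read();  // with deferred existence checks, FileNotFoundException can surface here
    }
  }
}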

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 10s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 52m 45s trunk passed
+1 💚 compile 1m 7s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 8s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 9s trunk passed
+1 💚 mvnsite 1m 16s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 3s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 56s trunk passed
+1 💚 shadedclient 37m 53s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 31s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 42s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 42s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 0m 42s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 29s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 12 new + 9 unchanged - 0 fixed = 21 total (was 9)
+1 💚 mvnsite 0m 47s the patch passed
-1 ❌ javadoc 0m 32s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt hadoop-azure in the patch failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌ javadoc 0m 32s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt hadoop-azure in the patch failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
-1 ❌ spotbugs 1m 25s /new-spotbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 33m 15s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 10s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
144m 34s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.contentLength; locked 69% of time. Unsynchronized access at AbfsInputStream.java:[line 648]
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/2/artifact/out/Dockerfile
GITHUB PR #8212
JIRA Issue HADOOP-19795
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 3dcff4da3d04 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b173e07
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/2/testReport/
Max. process+thread count 633 (vs. ulimit of 10000)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/2/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

if (getBCursor() == getLimit()) {
// If EOF, then return -1
if (getFCursor() >= getContentLength()) {
if (!(shouldRestrictGpsOnOpenFile() && isFirstRead()) && getFCursor() >= getContentLength()) {
Contributor:

Should we decouple !(shouldRestrictGpsOnOpenFile() && isFirstRead()) from the other conditions?

Currently, if both shouldRestrictGpsOnOpenFile() and isFirstRead() are true, the entire expression evaluates to false. This prevents the function from returning -1 even if getFCursor() >= getContentLength() is true, allowing the execution to proceed incorrectly.

Contributor Author:

As discussed, for first reads we have other validation in place

AbfsRestOperation statusOp = null;
try {
// Check if the file already exists by calling GetPathStatus
//TODO: GBP
Contributor:

Remove the TODO if it has been taken care of.

// If the overwrite flag is true, we must verify whether an empty directory exists at the specified path.
// However, if overwrite is false, we can skip this validation and proceed with blob creation,
// which will fail with a conflict error if a file or directory already exists at the path.
//todo: GBP
Contributor:

Same as above

String resourceType = result.getResponseHeader(
HttpHeaderConfigurations.X_MS_RESOURCE_TYPE);
return StringUtils.equalsIgnoreCase(resourceType, DIRECTORY);
return resourceType != null && StringUtils.equalsIgnoreCase(resourceType, DIRECTORY);
Contributor:

This check is not needed here as equalsIgnoreCase internally checks if resourceType is null or not.

Contributor Author:

As discussed, an NPE results if it is not present.

}
}

private long extractContentLength(AbfsHttpOperation op) {
Contributor:

Javadoc is missing.

Contributor Author:

Added

/** ABFS instance to be held by the input stream to avoid GC close. */
private final BackReference fsBackRef;
private final ReadBufferManager readBufferManager;
private final String readOnDirectoryErrorMsg = "Read operation not permitted on a directory.";
Contributor:

We should define this error string in the AbfsErrors file.

Contributor Author:

Taken

if (getBCursor() == getLimit()) {
// If EOF, then return -1
if (getFCursor() >= getContentLength()) {
if (!(shouldRestrictGpsOnOpenFile() && isFirstRead()) && getFCursor() >= getContentLength()) {
Contributor:

Check this.

Contributor Author:

As discussed, for first reads we have other validation in place

return new AbfsRestOperationException(
AzureServiceErrorCode.PATH_NOT_FOUND.getStatusCode(),
AzureServiceErrorCode.PATH_NOT_FOUND.getErrorCode(),
"openFileForRead must be used with files and not directories",
Contributor:

Define this error string in the AbfsErrors file.

Contributor Author:

Taken

getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT
|| ((VersionedFileStatus) fileStatus).getEncryptionContext()
!= null)) {
|| ((VersionedFileStatus) fileStatus).getEncryptionContext()
Contributor:

This change can be reverted.

String.format(
"Filestatus path [%s] does not match with given path [%s]",
fileStatus.getPath(), path));
"Filestatus path [%s] does not match with given path [%s]",
Contributor:

This change can be reverted.

* @param path the {@link Path} to convert; must not be null
* @return the relative path as a {@link String}; never null
*/
String getRelativePath(final Path path) {
Contributor:

The store class already has this method; can we not define it in a common place?

Contributor Author:

Made the method static

private long extractContentLength(AbfsHttpOperation op) {
// We need to use content range header instead of content length to take care of partial reads
String contentRange = op.getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE);
if (!StringUtils.isEmpty(contentRange)) {
Contributor:

Can the contentRange header be null?

Contributor Author:

If it is null, the isEmpty method takes care of it and returns true.

if (!StringUtils.isEmpty(contentRange)) {
contentLength = Long.parseLong(contentRange.split(AbfsHttpConstants.FORWARD_SLASH)[1]);
}
else {
Contributor:

Better to initialize contentLength to 0 and return it as is, instead of using the else branch.

Contributor Author:

Taken
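
Combining the suggestions in this thread, a sketch of what extractContentLength could look like (names follow the diff fragments above; the surrounding class members are assumed):

private long extractContentLength(AbfsHttpOperation op) {
  long contentLength = 0L;
  // Content-Range is "bytes <start>-<end>/<total>"; the total follows the slash,
  // so a partial (ranged) read still yields the full object length.
  String contentRange = op.getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE);
  if (!StringUtils.isEmpty(contentRange)) {
    contentLength = Long.parseLong(contentRange.split(AbfsHttpConstants.FORWARD_SLASH)[1]);
  }
  return contentLength;
}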

throw ere;
}
boolean isHnsEnabled = client.getIsNamespaceEnabled();

Contributor:

The extra blank lines can be removed.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 20m 4s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 56s trunk passed
+1 💚 compile 0m 59s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 0s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 0m 54s trunk passed
+1 💚 mvnsite 1m 6s trunk passed
+1 💚 javadoc 0m 57s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 0m 55s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 31s trunk passed
+1 💚 shadedclient 33m 37s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 11s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 0m 35s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 24s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 2 new + 9 unchanged - 0 fixed = 11 total (was 9)
+1 💚 mvnsite 0m 37s the patch passed
-1 ❌ javadoc 0m 29s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt hadoop-azure in the patch failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt hadoop-azure in the patch failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
+1 💚 spotbugs 1m 17s the patch passed
+1 💚 shadedclient 33m 18s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 9s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
151m 12s
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/3/artifact/out/Dockerfile
GITHUB PR #8212
JIRA Issue HADOOP-19795
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux d94ac6822c2f 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4c184d8
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/3/testReport/
Max. process+thread count 589 (vs. ulimit of 10000)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/3/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021 (Contributor) left a comment:

We should add some integration tests as well, IMO.
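
A rough sketch of such an integration test (the config key constant is hypothetical, and the test is assumed to sit in a class extending the existing ABFS integration test base with its usual imports): a normal read should work with the optimization enabled, and a missing file should only fail on the first read.

@Test
public void testReadWithGetPathStatusSkippedOnOpen() throws Exception {
  Configuration conf = getRawConfiguration();
  // Hypothetical config key constant for the new flag; the flag is false by default.
  conf.setBoolean(FS_AZURE_RESTRICT_GPS_ON_OPEN_FILE, true);
  try (FileSystem fs = FileSystem.newInstance(conf)) {
    Path file = path("testGpsSkippedOnOpen");
    ContractTestUtils.createFile(fs, file, true, "hello".getBytes(StandardCharsets.UTF_8));

    try (FSDataInputStream in = fs.open(file)) {
      // Metadata (length, ETag, resource type) is derived from this first read.
      assertEquals('h', in.read());
    }

    // With the existence check deferred, open() succeeds and the failure
    // surfaces on the first read of a missing path.
    try (FSDataInputStream in = fs.open(path("missing-file"))) {
      LambdaTestUtils.intercept(FileNotFoundException.class, () -> in.read());
    }
  }
}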

INVALID_APPEND_OPERATION("InvalidAppendOperation", HttpURLConnection.HTTP_CONFLICT, null),
UNAUTHORIZED_BLOB_OVERWRITE("UnauthorizedBlobOverwrite", HttpURLConnection.HTTP_FORBIDDEN,
"This request is not authorized to perform blob overwrites."),
INVALID_RANGE("InvalidRange", 416,
Contributor:

Yes, let's define it in AbfsHttpConstants.

if (shouldAlwaysReadBufferSize()) {

// If restrictGpsOnOpenFile config is enabled, skip prefetch for the first read since contentLength
// is not available yet to determine prefetch block size.
Contributor:

The comment seems misleading. The prefetch block size is configurable; this should maybe say "yet to determine how much to prefetch".

Contributor Author:

Modified the comment

*
* @return an {@link AbfsRestOperationException} indicating the operation is not permitted on a directory
*/
private IOException directoryReadException() {
Contributor:

Should this be an UnsupportedOperationException wrapping the server-returned exception?

Contributor Author:

Taken
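
A sketch of the suggested shape after this review comment (the parameter and the error-string constant are assumptions):

// Wrap the server-returned error so callers see an UnsupportedOperationException
// for reads attempted on a directory.
private UnsupportedOperationException directoryReadException(AbfsRestOperationException cause) {
  return new UnsupportedOperationException(ERR_READ_ON_DIRECTORY, cause);  // assumed constant in AbfsErrors
}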

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 43s trunk passed
+1 💚 compile 1m 1s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 0s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 0m 53s trunk passed
+1 💚 mvnsite 1m 4s trunk passed
+1 💚 javadoc 0m 59s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 0m 56s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 35s trunk passed
+1 💚 shadedclient 33m 36s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 10s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 0m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 25s the patch passed
+1 💚 mvnsite 0m 38s the patch passed
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt hadoop-azure in the patch failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt hadoop-azure in the patch failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
+1 💚 spotbugs 1m 17s the patch passed
+1 💚 shadedclient 33m 8s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 9s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
131m 26s
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/4/artifact/out/Dockerfile
GITHUB PR #8212
JIRA Issue HADOOP-19795
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux f545ce2af26e 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 63f7ac1
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/4/testReport/
Max. process+thread count 573 (vs. ulimit of 10000)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8212/4/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.
