
Fix CTAS for non-hdfs storages, also fixes multi storage cases #256

Open · wants to merge 5 commits into base: main
Conversation

@vikrambohra (Collaborator) commented Nov 19, 2024

Summary

This PR introduces the following changes:

  1. Fix CTAS for non-HDFS storage types
    While extracting the UUID from a snapshot, the code constructs the database path without the endpoint (scheme) when checking whether it is a prefix of the manifestList that is part of the snapshot.

Example
ManifestList (from snapshot): s3://bucket-name/database/table-uuid/file.avro
Database prefix: bucket-name/database (not a prefix of the above)

Fix: Strip the endpoint (scheme) from the manifest list by resolving the correct storage from the tableLocation

After the fix
ManifestList stripped: bucket-name/database/table-uuid/file.avro
Database prefix: bucket-name/database (is a prefix of the above)

  2. Fix the multiple-storage scenario
    The code assumes that the storage is always the cluster default. This fails when the default is a storage without a scheme (HDFS) and the db.table is stored in a storage with a scheme (S3, BlobFS).

Fix: Resolve the correct storage by extracting the tableLocation from the table properties and checking the scheme (endpoint).

  3. Adds a method to the Storage interface to check whether the tableLocation exists.
  4. Adds a method to StorageClient to check whether a specified path exists.
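The scheme-stripping step described in item 1 can be sketched with plain `java.net.URI`; the helper name below is hypothetical, not the PR's actual code:

```java
import java.net.URI;

public class SchemeUtil {
    // Hypothetical helper: strips the scheme (endpoint) from a storage path so it
    // can be compared against a scheme-less database prefix. Paths without a
    // scheme (HDFS/local) are returned unchanged.
    public static String stripScheme(String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            return path; // already scheme-less, e.g. /data/openhouse/db
        }
        // s3://bucket-name/database/... -> bucket-name/database/...
        return uri.getAuthority() + uri.getPath();
    }
}
```

With this, the stripped manifestList location starts with the database prefix again, so the UUID extraction's prefix check passes for S3 and BlobFS locations as well as HDFS ones.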

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.
  1. Updated TableUUIDGeneratorTest
  2. Added TableUUIDGeneratorMultiStorageTest

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@HotSushi (Collaborator) left a comment:

Agree with Storage API changes. One feedback on not introducing iceberg at rest layer.

public boolean pathExists(String path) {
try {
return fs.exists(new Path(path));
} catch (IOException e) {
Collaborator:

does fs.exists throw an IOException for non-existing paths? That's weird.

Collaborator:

Looks like it returns true or false: https://github.com/apache/hadoop/blob/cd2cffe73f909a106ba47653acf525220f2665cf/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1860

But could throw an IOException if getFileStatus() throws an IOException.

I am wondering if it could throw an IOException in scenarios like HDFS is unavailable but the file exists?

storageManager, dbIdFromProps, tblIdFromProps, tableUUIDProperty);
if (TableType.REPLICA_TABLE != tableType && !doesPathExist(previousPath)) {
log.error("Previous tableLocation: {} doesn't exist", previousPath);
Storage storage = catalog.resolveStorage(TableIdentifier.of(dbIdFromProps, tblIdFromProps));
Collaborator:

resolveStorage will use selectStorage if the entry doesn't exist in HTS. That code is not guaranteed to be idempotent during migration; let's avoid using selectStorage again in the consecutive COMMIT/PutSnapshot call.

Can we instead use tableLocation and derive the storage from there? i.e.:
if s3://a/b/c -> S3Storage
if abs://a/b/c -> ABSStorage
if a/b/c -> HDFSStorage
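The scheme-to-storage mapping suggested above could look roughly like this; all names here are illustrative, not the actual OpenHouse classes:

```java
import java.net.URI;

public class StorageResolver {
    enum StorageType { S3, ABS, HDFS }

    // Sketch of the suggested approach: derive the storage type purely from the
    // tableLocation's scheme instead of calling selectStorage again, so repeated
    // COMMIT/PutSnapshot calls stay idempotent.
    public static StorageType fromLocation(String tableLocation) {
        String scheme = URI.create(tableLocation).getScheme();
        if (scheme == null) {
            return StorageType.HDFS; // scheme-less paths are assumed to be HDFS
        }
        switch (scheme) {
            case "s3":  return StorageType.S3;
            case "abs": return StorageType.ABS;
            default:
                throw new IllegalArgumentException("Unknown scheme: " + scheme);
        }
    }
}
```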

Collaborator (Author):

Done, please check. There is a catch though: local and HDFS storage have the same paths.


@Override
public boolean pathExists(String path) {
// TODO: Support pathExists on ADLS

Collaborator (Author):

Can we do this as part of separate PR? DLFC is not initialized: #148

Collaborator:

Sure

Collaborator:

nit: should we then throw a not-implemented exception instead?

Collaborator (Author):

UnsupportedOperationException is what we usually throw in such cases. I've rephrased the exception message to mention it's not implemented yet.

@@ -55,4 +56,11 @@ public S3Client getNativeClient() {
public StorageType.Type getStorageType() {
return S3_TYPE;
}

@Override
public boolean pathExists(String path) {
Collaborator:

What is the intended usage of this existence check?

  1. Do you want to check whether the bucket exists or not?
  2. Or, do you want to check whether any blob for the table has been created yet or not?

The two checks above are very different. For 1 you should check the existence of the bucket. For 2 you should check the existence of objects with a prefix, the way you are doing it now.

I would imagine that you intend to check that the bucket or root location has been created, hence 1 and not 2. Is that right?

Collaborator (Author):

The intention is to check whether the directory exists, so it is check 2.
The current code passes the absolute path up to the table directory:
-> blobfs://bucket_name/data/openhouse/db/t-uuid (blobfs)
-> s3://bucket_name/data/openhouse/db/t-uuid (s3)
-> /data/openhouse/db/t-uuid (hdfs)

Collaborator (Author):

This is no longer valid. See #256 (comment)

@vikrambohra (Author) commented Nov 22, 2024

@HotSushi @jainlavina Addressed the comments. Some changes in the latest commit:

  1. Removed tableLocationExists() from the Storage API; we don't need to construct the table location since we fetch it from the table properties.
  2. Changed pathExists in the StorageClient API to fileExists to be explicit about the check, since we now check the absolute path of the metadata.json file (table location) instead of only the path up to the table directory.

* @param path absolute path to a file including scheme
* @return true if path exists else false
*/
boolean fileExists(String path);
Collaborator:

Suggested change
boolean fileExists(String path);
boolean exists(Path path);

* @return true if path exists else false
*/
@Override
public boolean fileExists(String path) {
Collaborator:

this could be a default impl and go into BaseStorageClient

Collaborator (Author):

Let's keep BaseStorageClient FileSystem-agnostic.

* scheme. Scheme is not prefix for local and hdfs storage. See:
* https://github.com/linkedin/openhouse/issues/121
*
* @param path absolute path to a file including scheme
Collaborator:

So this comment is not always true, because the scheme may not have been passed along for HDFS files?
In that case, maybe clarify that the path may or may not have a scheme specified. If the scheme is not specified, it is assumed to be HDFS; for all other storage types, the scheme must be specified.

Collaborator (Author):

Updated

public Storage getStorageFromPath(String path) {
for (Storage storage : storages) {
if (storage.isConfigured()) {
if (StorageType.LOCAL.equals(storage.getType())) {
Collaborator:

Shouldn't this be a fallback only if the path does not start with any other configured endpoint?
What if the path is an S3 storage path, but local storage is also configured for some other tables?
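A minimal sketch of the ordering this comment is asking for, with hypothetical names rather than OpenHouse's real API: every scheme-bearing endpoint is tried first, and the scheme-less storage is only a last-resort fallback:

```java
import java.util.Map;

public class PathStorageMatcher {
    // Illustrative sketch: endpointToType maps a configured endpoint prefix
    // (e.g. "s3://bucket") to a storage type name. The scheme-less LOCAL
    // storage is chosen only when no configured endpoint is a prefix of path.
    public static String resolve(String path, Map<String, String> endpointToType) {
        for (Map.Entry<String, String> entry : endpointToType.entrySet()) {
            if (!entry.getKey().isEmpty() && path.startsWith(entry.getKey())) {
                return entry.getValue(); // e.g. "s3://bucket" matched -> "S3"
            }
        }
        return "LOCAL"; // fallback only after all endpoints were checked
    }
}
```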


@@ -48,4 +49,20 @@ public FileSystem getNativeClient() {
public StorageType.Type getStorageType() {
return HDFS_TYPE;
}

/**
* Checks if the path exists on the backend storage. Scheme is not prefix for local and hdfs
Collaborator:

nit: I am having a hard time parsing the language "scheme is not prefix". Would you please consider rephrasing it to something like "Scheme is not specified in the path for local and hdfs storage"?

public boolean fileExists(String path) {
try {
HeadObjectRequest headObjectRequest =
HeadObjectRequest.builder().bucket(getRootPrefix()).key(path).build();
Collaborator:

Did you check the HeadObject API spec to confirm that passing the full path, including the scheme and bucket name, as the key will work? It usually expects the object key without the bucket name.
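For reference, a full s3:// location can be split into the separate bucket and key that HeadObject expects; this parsing helper is a sketch, not the PR's code:

```java
import java.net.URI;

public class S3PathParser {
    // Sketch: split "s3://bucket-name/db/t-uuid/metadata.json" into
    // { "bucket-name", "db/t-uuid/metadata.json" }. S3 object keys carry
    // neither the scheme, the bucket name, nor a leading slash.
    public static String[] parse(String s3Location) {
        URI uri = URI.create(s3Location);
        if (!"s3".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not an s3:// location: " + s3Location);
        }
        String bucket = uri.getAuthority();
        String key = uri.getPath().replaceFirst("^/", "");
        return new String[] {bucket, key};
    }
}
```

With the AWS SDK v2 builder shown in the snippet above, the two parts would then go into `.bucket(bucket)` and `.key(key)` separately.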
