-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDAP-20884: Multi push operation #15385
Conversation
8481aa5
to
2b6800b
Compare
bc7ba31
to
86c5121
Compare
9834df2
to
a64a412
Compare
cdap-app-fabric/src/main/java/io/cdap/cdap/internal/app/sourcecontrol/PushAppsRequest.java
Outdated
Show resolved
Hide resolved
...in/java/io/cdap/cdap/sourcecontrol/operationrunner/InMemorySourceControlOperationRunner.java
Show resolved
Hide resolved
...l/src/main/java/io/cdap/cdap/sourcecontrol/operationrunner/SourceControlOperationRunner.java
Outdated
Show resolved
Hide resolved
appVersions.add(new AppVersion(detail.getName(), detail.getAppVersion())); | ||
} catch (IOException | NotFoundException e) { | ||
throw new SourceControlException( | ||
String.format("Failed to fetch details for app %s", appRef)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always retain the original exception as the cause when rethrowing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
cdap-source-control/src/main/java/io/cdap/cdap/sourcecontrol/RepositoryManager.java
Outdated
Show resolved
Hide resolved
@@ -227,8 +228,8 @@ public static void validateConfig(final SecureStore secureStore, | |||
* @throws SourceControlException when failed to get the fileHash before | |||
* push | |||
*/ | |||
public String commitAndPush(final CommitMeta commitMeta, | |||
final Path fileChanged) | |||
public Map<Path, String> commitAndPush(final CommitMeta commitMeta, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define a class for the result instead of Map<Path, String> to better represent what the result is and easier to include more information in future.
Map<Path, String> fileHashes = new HashMap<>(); | ||
for (Path fileChanged : filesChanged) { | ||
try { | ||
String fileHash = getFileHash(fileChanged, commit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like should have a batch getFileHashes
method to do it more efficiently by doing a single TreeWalk instead of scanning it multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. Added a new method getFileHashes to get the hashes using a single TreeWalk.
repositoryManager.cloneRemote(); | ||
} catch (GitAPIException | IOException e) { | ||
throw new GitOperationException(String.format("Failed to clone remote repository: %s", | ||
e.getMessage()), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically when wrapping an exception, you don't use the e.getMessage() to form the wrapping message, otherwise it just result it duplicated message in the stacktrace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
9db1dba
to
b970e5f
Compare
// update git metadata for the pushed application | ||
applicationManager.updateSourceControlMeta(namespaceId, getUpdateMetaRequest(responses)); | ||
} catch (NotFoundException | BadRequestException | IOException | SourceControlException e) { | ||
throw new OperationException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a problem in the existing PullAppsOperation too, but OperationException should take another exception as the cause. As it is now, the root cause will be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the OperationException class constructor to take the cause too. Updated the usage in PullAppsOperation.
try (TreeWalk walk = new TreeWalk(git.getRepository())) { | ||
walk.addTree(commit.getTree()); | ||
walk.setRecursive(true); | ||
if (relativePaths.size() >= 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: change to > 1
for consistency with rest of the project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
String fileHash = walk.getObjectId(0).getName(); | ||
Path filePath = Paths.get(walk.getPathString()); | ||
if (fileHash == null) { | ||
throw new SourceControlException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this actually happen? If so, when? Would be good to add a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception should never happen, as the getName method on jgit.lib.ObjectId always returns a non-null string. Removed this check (unnecessary).
@@ -417,6 +422,39 @@ private String getFileHash(final Path relativePath, final RevCommit commit) | |||
} | |||
} | |||
|
|||
private Map<Path, String> getFileHashes(Set<Path> relativePaths, RevCommit commit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a Path in the input doesn't actually exist, what is the expected behavior? Does it throw an exception? Is it just not returned in the Map? In the Map but with empty file hash? Should add a javadoc describing the behavior, and also when each type of exception is thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a Path in the input does not actually exist, then it's not included in the Map at all. Only the paths that are found in the filtered TreeWalk are added to the Map.
This method does not throw any exception in this case. This case is handled in the method that uses it (i.e. the commitAndPush method). If the returned map has different number of entries than expected (based on the input), then a SourceControlException is thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add this information as a javadoc for the method.
walk.addTree(commit.getTree()); | ||
walk.setRecursive(true); | ||
if (relativePaths.size() >= 2) { | ||
walk.setFilter(OrTreeFilter.create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the git library we're using, but I feel like this would likely be implemented in an inefficient way, where it does N comparisons for each element in the walk.
It would be more efficient and simpler (don't need the if-else) to create a PathFilter that just checks if the path is in the input set with Set.contains().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the library implementation. Yes, it does N comparisons for each element in the walk.
Updated the method to filter the paths during the tree traversal based on their existence in the set of relative paths.
throws NoChangesToPushException { | ||
private void cloneAndCreateBaseDir(RepositoryManager repositoryManager) { | ||
try { | ||
repositoryManager.cloneRemote(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this clone into the base directory? Confused why we need to create the directory later if it was already cloned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repository root directory (this gets cloned) and the scm base directory are different. The scm base directory is a directory inside the repository root, configurable per namespace. This base directory may not exist if the user is pushing apps from the namespace for the first time. That's why we need to create it, if it does not exist already.
) { | ||
cloneAndCreateBaseDir(repositoryManager); | ||
|
||
LOG.info("Pushing application configs for : {}", pushRequest.getApps()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be a really long log line, it's better to just log the number of apps that will be pushed. If we want to log each one, have a separate log statement within the loop below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) throws NoChangesToPushException { | ||
// get relative paths for apps to be pushed | ||
// we pass a set as we should have list of unique paths. | ||
Map<AppVersion, Path> appRelativePaths = appsToPush.stream().collect(Collectors.toMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this method, we're creating this Map, another gitFileHashes Map, and a new HashSet that's passed into commitAndPush(). That's a lot of extra passes through the data, a bit of extra memory, and extra mapping complexity that makes the method harder to understand. If the code needs to do a bunch of converting back and forth of collections, it's an indication that the APIs probably need improvement.
For example, this could all be done in one pass if we change commitAndPush() to look like:
public <T extends Supplier<Path>> void commitAndPush(Iterator<T> commitFiles, BiConsumer<T, String> consumer);
and call it something like:
private static class PushFile implements Supplier<Path> {
private final AppVersion appVersion;
private final Path path;
public Path get() {
return path;
}
}
Iterator<PushFile> iter = Iterators.transform(appsToPush, appToPush -> new PushFile(appToPush, getFilePath(...));
repositoryManager.commitAndPush(iter, (pushFile, fileHash) -> response.add(new PushAppResponse(...)));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
writeAppDetail(repositoryManager, pushRequest.getApp()); | ||
|
||
// it should never be empty as in case of any error we will get an exception | ||
return commitAndPush( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can change the entire method to call multiPush() and return the first response instead of just this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing the push
method to use multiPush
internally does not simplify the implementation, as these 2 methods are logically different (by design).
In case of single application push
method, the request object (of type PushAppOperationRequest
) already contains the ApplicationDetail
of the app to push.
However, the multiPush
method takes a request containing a set of app names, and an ApplicationManager
. The applicationManager
fetches the application details when necessary.
43c5371
to
a5359b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
responses = scmOpRunner.multiPush(pushReq, applicationManager); | ||
context.updateOperationResources(getResources(namespaceId, responses)); | ||
} catch (SourceControlException | NoChangesToPushException e) { | ||
throw new OperationException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should pass e as the cause. When wrapping an exception we will basically always want to set the cause if there is one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -417,6 +422,39 @@ private String getFileHash(final Path relativePath, final RevCommit commit) | |||
} | |||
} | |||
|
|||
private Map<Path, String> getFileHashes(Set<Path> relativePaths, RevCommit commit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add this information as a javadoc for the method.
while (walk.next()) { | ||
if (walk.isSubtree()) { | ||
walk.enterSubtree(); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a little more readable to reduce one layer of nesting like:
if (walk.isSubtree()) {
...
continue;
}
Path filePath...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* operations | ||
*/ | ||
//TODO: CDAP-20371, Add retry logic here in case the head at remote moved while we are doing push | ||
private <T extends Supplier<Path>> void commitAndPush( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous comment was about changing the RepositoryManager.commitAndPush() method to take the Iterator and BiConsumer. That way this method doesn't need to create the extra collections like the current pathMapping Map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RepositoryManager.commitAndPush
needs to construct a set of paths from the iterator anyway, as the set is necessary to efficiently filter the files in the TreeWalk. Also, it makes sense for the RepositoryManager.commitAndPush
to return a Map of filepaths to hashes (as it's one operation, that adds multiple files to a commit and pushes) instead of taking a consumer to pass the results one by one. Therefore, updating this method did not improve readability much.
However, the suggested change made perfect sense for the commitAndPush
method in the runner, as it simplified the implementation of this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comment was mostly about avoiding all of these extra collections, as they add time, memory, and complexity. I don't think it's actually good that commitAndPush returns a Map, as it almost always forces the caller to eventually transform it into some other type of collection.
If we want it to return information about what it committed instead of passing in a consumer, a better approach would be for it to return a Collection<O>
and to take a BiFunction<T, String, O>
, where O
would be a PushAppResponse
in this class. I think the current implementation of commitAndPush would make that difficult, but see the other comment I made about getting the file hash right when adding the file instead of walking through the tree after committing.
a5359b7
to
7239fe1
Compare
@@ -240,24 +241,24 @@ public String commitAndPush(final CommitMeta commitMeta, | |||
"No changes have been made for the applications to push."); | |||
} | |||
|
|||
git.add().addFilepattern(fileChanged.toString()).call(); | |||
for (Path fileChanged : filesChanged) { | |||
git.add().addFilepattern(fileChanged.toString()).call(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to get the file hash from whatever this returns? I see there is a DirCache.getEntry().getObjectId(), but not sure if that's the same as what's happening in getFileHashes.
If so it would be better than doing the tree walk later, and would allow passing in an iterator of paths instead of the Set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked in test and looks like the hashes are the same. We can skip the tree walk. Thanks for the suggestion @albertshau
@GnsP Please do a reverify before making the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the commitandpush
take a BiFunction<T, String, O>
and return Collection<O>
. Updated all the test accordingly
* operations | ||
*/ | ||
//TODO: CDAP-20371, Add retry logic here in case the head at remote moved while we are doing push | ||
private <T extends Supplier<Path>> void commitAndPush( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comment was mostly about avoiding all of these extra collections, as they add time, memory, and complexity. I don't think it's actually good that commitAndPush returns a Map, as it almost always forces the caller to eventually transform it into some other type of collection.
If we want it to return information about what it committed instead of passing in a consumer, a better approach would be for it to return a Collection<O>
and to take a BiFunction<T, String, O>
, where O
would be a PushAppResponse
in this class. I think the current implementation of commitAndPush would make that difficult, but see the other comment I made about getting the file hash right when adding the file instead of walking through the tree after committing.
7239fe1
to
cae5edd
Compare
public String commitAndPush(final CommitMeta commitMeta, | ||
final Path fileChanged) | ||
public <T, S extends Supplier<Path>> Collection<T> commitAndPush(CommitMeta commitMeta, Collection<S> filesChanged, | ||
BiFunction<S, String, T> hashConsumer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing doc for hashConsumer
and the @return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment, otherwise lgtm
throw new GitOperationException( | ||
String.format("Failed to get fileHash for %s", fileChanged), | ||
e); | ||
List<T> output = new LinkedList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should use an ArrayList. It basically always performs better than a LinkedList, even if it needs to get resized underneath. This is especially true when we know the size beforehand (like we do here) and can pass it in to the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
5052709
to
8ec7c41
Compare
) { | ||
cloneAndCreateBaseDir(repositoryManager); | ||
|
||
LOG.info("Pushing application configs for {} apps.", pushRequest.getApps().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
List<PushFile> filesToPush = new LinkedList<>(); | ||
|
||
for (String appToPush : pushRequest.getApps()) { | ||
LOG.info("Pushing application configs for : {}.", appToPush); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trace log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
ApplicationReference appRef = new ApplicationReference(pushRequest.getNamespaceId(), | ||
appToPush); | ||
try { | ||
ApplicationDetail detail = appManager.get(appRef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop would be very slow? It is getting one app at a time from the worker remotely? We should add a batch method to the manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this operation uses only the LocalApplicationManager
, which gets the ApplicationDetail
in memory. The RemoteApplicationManager
will be used here, only when the corresponding LongRunningOperationRunner is implemented. @samdgupi can provide more information on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s why the implementation here is not safe. We should have a batch fetch method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK We currently have no API/ which can fetch details by list of ids. We had plan to do this optimization after initial release of the feature. I will add this a TODO and take it up if possible as part of current release.
8ec7c41
to
21bfc91
Compare
🍒 CDAP 20884 multi push operation (#15385)
Implement Push LRO