Merge pull request #61 from groboclown/branch-support-poc
twarit-waikar authored Aug 21, 2023
2 parents 3ee4824 + 1bb7979 commit fbd4cc3
Showing 29 changed files with 1,672 additions and 256 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -10,3 +10,6 @@ helix-core-api/
# Depot conversion results
clones/
*.log

# Testing failures
core
30 changes: 29 additions & 1 deletion README.md
@@ -15,7 +15,7 @@ This tool solves some of the most impactful scaling and performance limitations

## Performance

Please be aware that this tool is fast enough to instantaneously generate a tremendous amount of load on your Perforce server (more than 150K requests in a few seconds if running with a couple hundred network threads). Since p4-fusion will continue generating load within the limits set using the runtime arguments, it needs careful monitoring to ensure that your Perforce server does not get impacted.

However, having no rate limits and running this tool with several hundred network threads (or more if possible) is the ideal case for achieving maximum speed in the conversion process.

@@ -44,6 +44,12 @@ These execution times are expected to scale as expected with larger depots (mill
--lookAhead [Required]
How many CLs in the future, at most, shall we keep downloaded by the time it is to commit them?

--branch [Optional]
A branch to migrate under the depot path. May be specified more than once. If at least one is given and the noMerge option is false, then the Git repository will include merges between branches in the history. You may use the formatting 'depot/path:git-alias', separating the Perforce branch sub-path from the git alias name by a ':'; if the depot path contains a ':', then you must provide the git branch alias.

--noMerge [Optional, Default is false]
When false and at least one branch is given, the Git history will include merges between branches. If this is true, then the Git history will not contain any merges, except for an artificial empty commit added at the root, which acts as a common source to make later merges easier.

--maxChanges [Optional, Default is -1]
Specify the max number of changelists which should be processed in a single run. -1 signifies unlimited range.

@@ -75,6 +81,28 @@ These execution times are expected to scale as expected with larger depots (mill
Specify which P4USER to use. Please ensure that the user is logged in.
```
## Notes On Branches
When at least one branch argument exists, the tool will enable branching mode.
Branching mode currently supports only very simple branch layouts. Each branch must have the form `//common/depot/path/branch-name`. The common depot path is given as the `--path` argument, and each `--branch` argument specifies one branch name to inspect. A branch name must be a directory name immediately after the path (it takes the place of the `...`).
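
The layout rule above can be illustrated with a minimal sketch. The helper name `splitDepotFile` and its signature are hypothetical and are not part of p4-fusion; it assumes the base path is the `--path` value with the trailing `...` removed, and it checks branches in the order given.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not p4-fusion API): splits a depot file into
// {branch name, path inside the branch}. Returns two empty strings when the
// file is not under the base path or not inside any listed branch.
std::pair<std::string, std::string> splitDepotFile(
    const std::string& basePath, // assumed form: "//common/depot/path/"
    const std::vector<std::string>& branches, // the --branch values
    const std::string& depotFile)
{
    assert(!basePath.empty() && basePath.back() == '/');

    // The file must live under the common depot path.
    if (depotFile.compare(0, basePath.size(), basePath) != 0)
    {
        return { "", "" };
    }
    const std::string relative = depotFile.substr(basePath.size());

    // The first branch whose name is followed by '/' and a non-empty path wins.
    for (const std::string& branch : branches)
    {
        if (relative.size() > branch.size() + 1
            && relative[branch.size()] == '/'
            && relative.compare(0, branch.size(), branch) == 0)
        {
            return { branch, relative.substr(branch.size() + 1) };
        }
    }
    return { "", "" };
}
```

With `--path //common/depot/path/...` and `--branch dev`, the file `//common/depot/path/dev/src/a.cc` would map to branch `dev` and relative path `src/a.cc` under this sketch.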
In branching mode, the generated Git repository will be initially populated with a zero-content commit. This allows branches to later be merged without needing the `--allow-unrelated-histories` flag in Git. All branches will have this in their history.
If a Perforce changelist contains an integration-like action (move, integrate, copy, etc.) from another branch listed in a `--branch` argument, then the tool will mark the Git commit with the integration as having two parents: the current branch and the source branch. If a changelist contains integrations into one branch from multiple other branches, they are put into separate commits, each with just one source branch. If a changelist contains integrations into multiple branches, then each one of those is also its own commit.
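
As a rough illustration of how one changelist can fan out into several commits, the sketch below groups file changes by their (source branch, target branch) pair. The `FileChange` and `CommitGroup` types and the `groupChangelist` function are hypothetical names for this example only. The tool's own grouping lives in `branch_set.cc` and uses a string key rather than the pair key assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the p4-fusion data model.
struct FileChange
{
    std::string targetBranch;
    std::string sourceBranch; // empty when not integrated from another listed branch
    std::string path;
};

struct CommitGroup
{
    std::string sourceBranch;
    std::string targetBranch;
    std::vector<std::string> paths;
    bool isMerge() const { return !sourceBranch.empty(); }
};

// One group (one Git commit) per distinct (source, target) pair, in first-seen order.
std::vector<CommitGroup> groupChangelist(const std::vector<FileChange>& changes)
{
    std::vector<CommitGroup> groups;
    std::map<std::pair<std::string, std::string>, std::size_t> indexByPair;
    for (const FileChange& change : changes)
    {
        // An integration within a single branch is not a merge in the Git sense.
        const std::string source = change.sourceBranch == change.targetBranch ? "" : change.sourceBranch;
        const auto key = std::make_pair(source, change.targetBranch);
        auto found = indexByPair.find(key);
        if (found == indexByPair.end())
        {
            found = indexByPair.emplace(key, groups.size()).first;
            groups.push_back({ source, change.targetBranch, {} });
        }
        assert(found->second < groups.size());
        groups[found->second].paths.push_back(change.path);
    }
    return groups;
}
```

Under these assumptions, a changelist that integrates into `main` from both `dev` and `rel` and also edits a file directly in `main` yields three groups: two merge commits and one ordinary commit.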
Because Perforce integration isn't a 1-to-1 mapping onto Git merge, there can be situations where having the tool mark a commit as a merge, but not bringing over all the changes, leads to later merge logic not picking up every changed file correctly. To avoid this situation, passing `--noMerge true` ensures the branches share only the single zero-content root commit, so any merge done after the migration will force a full file tree inspection.
If the Perforce tree contains sub-branches, such as `//base/tree/sub` being a sub-branch of `//base/tree`, then you can use the arguments `--path //base/... --branch tree/sub:tree-sub --branch tree`. The ordering is important here: provide the deeper paths first to have them take priority over the others. Because Git creates branches with '/' characters as implicit directories, you must provide the Git branch alias to prevent Git reporting an error where the branch "tree" can't be created because it is already a directory, or "tree/sub" can't be created because "tree" isn't a directory.
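
The `depot/path:git-alias` form can be sketched as below. The function names `parseBranchArg` and `stripSlashes` are hypothetical and written only for this example. The sketch assumes the alias is whatever follows the last ':' and that surrounding '/' characters are trimmed from both parts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: trims every leading and trailing '/' from a copy of s.
std::string stripSlashes(std::string s)
{
    while (!s.empty() && s.front() == '/')
    {
        s.erase(0, 1);
    }
    while (!s.empty() && s.back() == '/')
    {
        s.pop_back();
    }
    return s;
}

// Hypothetical helper: returns {Perforce branch sub-path, Git branch alias}.
// Without a ':' the whole argument serves as both path and alias.
std::pair<std::string, std::string> parseBranchArg(const std::string& arg)
{
    assert(!arg.empty()); // this sketch does not handle empty names

    std::string path = arg;
    std::string alias = arg;

    // Only the LAST ':' separates path from alias, so a path may itself
    // contain ':' as long as an alias is supplied after it.
    const std::size_t pos = arg.rfind(':');
    if (pos != std::string::npos && pos > 0)
    {
        path = arg.substr(0, pos);
        alias = arg.substr(pos + 1);
    }
    return { stripSlashes(path), stripSlashes(alias) };
}
```

For the arguments in the paragraph above, `tree/sub:tree-sub` would give path `tree/sub` with alias `tree-sub`, and `tree` would give `tree` for both.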
## Checking Results
In order to test the validity of the logic, we need to run the program over a Perforce depot and compare each changelist against the corresponding Git commit SHA, to ensure the files match up.
The provided script [validate-migration.sh](validate-migration.sh) runs through every generated Git commit, and ensures the file state exactly matches the state of the Perforce depot.
Because of the extra effort the script performs, expect it to take orders of magnitude longer than the original p4-fusion execution.
## Build
0. Pre-requisites
281 changes: 281 additions & 0 deletions p4-fusion/branch_set.cc
@@ -0,0 +1,281 @@
/*
 * Copyright (c) 2022 Salesforce, Inc.
 * All rights reserved.
 * SPDX-License-Identifier: BSD-3-Clause
 * For full license text, see the LICENSE.txt file in the repo root or https://opensource.org/licenses/BSD-3-Clause
 */
#include "branch_set.h"
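#include <array>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>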

static const std::string EMPTY_STRING = "";
static const std::array<std::string, 2> INVALID_BRANCH_PATH { EMPTY_STRING, EMPTY_STRING };

std::vector<std::string> BranchedFileGroup::GetRelativeFileNames()
{
    std::vector<std::string> ret;
    for (auto& fileData : files)
    {
        ret.push_back(fileData.GetRelativePath());
    }
    return ret;
}

ChangedFileGroups::ChangedFileGroups()
    : totalFileCount(0)
{
}

ChangedFileGroups::ChangedFileGroups(std::vector<BranchedFileGroup>& groups, int totalFileCount)
    : totalFileCount(totalFileCount)
{
    // Takes ownership of the caller's groups; the caller's vector is left moved-from.
    branchedFileGroups = std::move(groups);
}

void ChangedFileGroups::Clear()
{
    for (auto& fileGroup : branchedFileGroups)
    {
        for (auto& file : fileGroup.files)
        {
            file.Clear();
        }
        fileGroup.files.clear();
        fileGroup.sourceBranch.clear();
        fileGroup.targetBranch.clear();
    }
    totalFileCount = 0;
}

Branch::Branch(const std::string& branch, const std::string& alias)
    : depotBranchPath(branch)
    , gitAlias(alias)
{
    if (depotBranchPath.empty())
    {
        throw std::invalid_argument("branch name is empty");
    }
    if (gitAlias.empty())
    {
        throw std::invalid_argument("branch alias is empty");
    }
}

std::array<std::string, 2> Branch::SplitBranchPath(const std::string& relativeDepotPath) const
{
    if (
        // The relative depot branch, to match this branch path, must start with the
        // branch path + "/". The "StartsWith" is put at the end of the 'and' checks,
        // because it takes the longest.
        relativeDepotPath.size() > depotBranchPath.size()
        && relativeDepotPath[depotBranchPath.size()] == '/'
        && STDHelpers::StartsWith(relativeDepotPath, depotBranchPath))
    {
        return { gitAlias, relativeDepotPath.substr(depotBranchPath.size() + 1) };
    }
    return { "", "" };
}

Branch createBranchFromPath(const std::string& depotBranchPath)
{
    std::string branchPath = depotBranchPath;
    std::string alias = depotBranchPath;

    // The formatting using a ':' to separate the branch path from the git alias MUST be
    // the last ':' in the string. This allows the command to work with branch paths that contain
    // a ':' character, as long as the git alias does NOT contain a ':', and it implies that the git
    // alias MUST be given.
    const size_t pos = depotBranchPath.rfind(':');

    // rfind returns npos when there is no ':'. A ':' at index 0 would leave an empty
    // branch path, so in both cases the whole string is used as path and alias.
    if (pos != std::string::npos && pos > 0)
    {
        branchPath.erase(pos);
        alias.erase(0, pos + 1);
    }

    STDHelpers::StripSurrounding(branchPath, '/');
    STDHelpers::StripSurrounding(alias, '/');
    return Branch(branchPath, alias);
}

std::vector<Branch> createBranchesFromPaths(const std::vector<std::string>& branches)
{
    std::vector<Branch> parsed;
    for (auto& branch : branches)
    {
        parsed.push_back(createBranchFromPath(branch));
    }
    return parsed;
}

BranchSet::BranchSet(std::vector<std::string>& clientViewMapping, const std::string& baseDepotPath, const std::vector<std::string>& branches, const bool includeBinaries)
    : m_branches(createBranchesFromPaths(branches))
    , m_includeBinaries(includeBinaries)
{
    m_view.InsertTranslationMapping(clientViewMapping);
    if (STDHelpers::EndsWith(baseDepotPath, "/..."))
    {
        // Keep the final '/'.
        m_basePath = baseDepotPath.substr(0, baseDepotPath.size() - 3);
    }
    // Check for an empty path first: calling back() on an empty string is undefined behavior.
    else if (baseDepotPath.empty() || baseDepotPath.back() != '/')
    {
        throw std::invalid_argument("Bad base depot path format: " + baseDepotPath);
    }
    else
    {
        m_basePath = baseDepotPath;
    }
}

std::array<std::string, 2> BranchSet::splitBranchPath(const std::string& relativeDepotPath) const
{
    // Check if the relative depot path starts with any of the branches.
    // This checks the branches in their stored order, which can mean that having a branch
    // order like "//a/b/c" and "//a/b" will only work if the sub-branches are listed first.
    // To do this properly, the stored branches should be scanned based on their length - longest
    // first, but that's extra processing and code for a use case that is rare and has a manual
    // work around (list branches in a specific order).
    for (auto& branch : m_branches)
    {
        auto split = branch.SplitBranchPath(relativeDepotPath);
        if (!split[0].empty() && !split[1].empty())
        {
            return split;
        }
    }
    return { "", "" };
}

std::string BranchSet::stripBasePath(const std::string& depotPath) const
{
    if (STDHelpers::StartsWith(depotPath, m_basePath))
    {
        // m_basePath keeps its trailing '/', so the result has no leading '/'.
        return depotPath.substr(m_basePath.size());
    }
    return EMPTY_STRING;
}

struct branchIntegrationMap
{
    std::vector<BranchedFileGroup> branchGroups;
    std::unordered_map<std::string, int> branchIndicies;
    int fileCount = 0;

    void addMerge(const std::string& sourceBranch, const std::string& targetBranch, const FileData& rev);
    void addTarget(const std::string& targetBranch, const FileData& rev);

    // note: not const, because it cleans out the branchGroups.
    std::unique_ptr<ChangedFileGroups> createChangedFileGroups() { return std::unique_ptr<ChangedFileGroups>(new ChangedFileGroups(branchGroups, fileCount)); };
};

void branchIntegrationMap::addTarget(const std::string& targetBranch, const FileData& fileData)
{
    // A plain (non-merge) change is stored as a merge with an empty source branch.
    addMerge(EMPTY_STRING, targetBranch, fileData);
}

void branchIntegrationMap::addMerge(const std::string& sourceBranch, const std::string& targetBranch, const FileData& fileData)
{
    // Need to store this in the integration map, using "src/tgt" as the
    // key. Because stream names can't have a '/' in them, this creates a unique key.
    // source might be empty, and that's okay.
    const std::string mapKey = sourceBranch + "/" + targetBranch;
    const auto entry = branchIndicies.find(mapKey);
    if (entry == branchIndicies.end())
    {
        const int index = branchGroups.size();
        branchIndicies.insert(std::make_pair(mapKey, index));
        branchGroups.push_back(BranchedFileGroup());
        BranchedFileGroup& bfg = branchGroups[index];
        bfg.sourceBranch = sourceBranch;
        bfg.targetBranch = targetBranch;
        bfg.hasSource = !sourceBranch.empty();
        bfg.files.push_back(fileData);
    }
    else
    {
        branchGroups.at(entry->second).files.push_back(fileData);
    }
    fileCount++;
}

// Post condition: all returned FileData (e.g. filtered for git commit) have the relativePath set.
std::unique_ptr<ChangedFileGroups> BranchSet::ParseAffectedFiles(const std::vector<FileData>& cl) const
{
    branchIntegrationMap branchMap;
    for (auto& clFileData : cl)
    {
        FileData fileData(clFileData);

        // First, filter out files we don't want.
        const std::string& depotFile = fileData.GetDepotFile();
        if (
            // depot file should always be present.
            // The left side of the client view is the depot side.
            !m_view.IsInLeft(depotFile)
            || (!m_includeBinaries && fileData.IsBinary())
            || STDHelpers::Contains(depotFile, "/.git/") // To avoid adding .git/ files in the Perforce history if any
            || STDHelpers::EndsWith(depotFile, "/.git") // To avoid adding a .git submodule file in the Perforce history if any
        )
        {
            continue;
        }
        std::string relativeDepotPath = stripBasePath(depotFile);
        if (relativeDepotPath.empty())
        {
            // Not under the depot path. Shouldn't happen due to the way we
            // scan for files, but...
            continue;
        }

        // If we have branches, then possibly sort the file into a branch group.
        if (HasMergeableBranch())
        {
            // [0] == branch name, [1] == relative path in the branch.
            std::array<std::string, 2> branchPath = splitBranchPath(relativeDepotPath);
            if (
                branchPath[0].empty()
                || branchPath[1].empty())
            {
                // not a valid branch file. skip it.
                continue;
            }

            // It's a valid destination to a branch.
            // Make sure the relative path is set.
            fileData.SetRelativePath(branchPath[1]);

            bool needsHandling = true;
            if (fileData.IsIntegrated())
            {
                // Only add the integration if the source is from a branch we care about.
                // [0] == branch name, [1] == relative path in the branch.
                std::array<std::string, 2> fromBranchPath = splitBranchPath(stripBasePath(fileData.GetFromDepotFile()));
                if (
                    !fromBranchPath[0].empty()
                    && !fromBranchPath[1].empty()

                    // Can't have source and target be pointing to the same branch; that's not
                    // a branch operation in the Git sense.
                    && fromBranchPath[0] != branchPath[0])
                {
                    // This is a valid integrate from a known source to a known target branch.
                    branchMap.addMerge(fromBranchPath[0], branchPath[0], fileData);
                    needsHandling = false;
                }
            }
            if (needsHandling)
            {
                // Either not a valid integrate, or a normal operation.
                branchMap.addTarget(branchPath[0], fileData);
            }
        }
        else
        {
            // It's a non-branching setup.
            // Make sure the relative path is set.
            fileData.SetRelativePath(relativeDepotPath);
            branchMap.addTarget(EMPTY_STRING, fileData);
        }
    }
    return branchMap.createChangedFileGroups();
}