-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new RFC for monorepo #31
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
- Feature Name: monorepo | ||
- Start Date: 2021-03-30 | ||
- RFC PR: [amundsen-io/rfcs#0030](/~https://github.com/amundsen-io/rfcs/pull/30) | ||
- Amundsen Issue: [amundsen-io/amundsen#0000](/~https://github.com/amundsen-io/amundsen/issues/0000) (leave this empty for now) | ||
|
||
# Move to Monorepo | ||
|
||
## Summary | ||
|
||
At the moment, Amnundsen project is divided into 6 repositories: | ||
- amundsen | ||
- amundsenfrontendlibrary | ||
- amundsenmetadatalibrary | ||
- amundsensearchlibrary | ||
- amundsencommon | ||
- amudnsendatabuilder | ||
|
||
This RFC proposes merging all of these repositories into a single repo. | ||
|
||
There will be no change to the built artifacts. Users who use PyPi packages or Docker images will have no change to their deployments. Users with forks or submodule based customization will have to make minor changes to their CI/CD pipelines to pull from the correct directories, but the required changes will be minor. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Smooth change management, nice ! However, with current structure, when someone is using the submodule approach can freely choose versions (even refs) for each product separately (I can use frontend in different version than metadata and search). This change somehow limits that capability. It's worth to explicitly mention it in drawbacks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, good point, added |
||
|
||
|
||
## Motivation | ||
|
||
The submodule architecture creates the following issues and is making it hard for companies to adopt Amundsen, and to customize it accordingly. It also causes substantial developer toil in the form of having to create multiple small PRs to merge one | ||
|
||
1. Dependency Management: Python packages are pretty much the same for each proxy, which makes it hard to sync across all the above repositories. As a result most of the time, each proxy is running its own version of the dependency. | ||
A few examples: | ||
|
||
| Package | amundsenmetadata | amundsenfrontend | | ||
| -------------- |:------------------:| -----------------:| | ||
| amundsen-common | >=0.8.1 | ==0.6.0 | | ||
| flake8 | ==3.5.0 | ==3.8.4 | | ||
| flake8-tidy-imports | ==1.1.0 | ==4.2.1 | | ||
|
||
and many more... This change will put all the packages in just one place, making practically to keep dependencies similar accross verisons, simply because it will be possible to open a single PR that updates multiple packages. Practically speaking this will improve the status quo slightly. | ||
|
||
1. Development Efforts: Having multiple repositories makes it really hard to implement a feature. Implementation and testing require efforts to synchronize and then code reviews, and finally, all the PRs across multiple repositories need to land in master at a certain time or at the same time. This change results in faster development, one PR to fix a bug or a feature, no dependency hell as we are facing today, hence attract more contributors. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just a question - do we have any hard data (like results of questionnaires) to back the validity of those issues ? or is it more the result of interacting with users and observation ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't -- we did the call for feedback in December or January from community members but only got a handful of detailed responses. This is from our interactions with many community members (both longer-time installs) as well as people new to the project. Both groups are impacted, but in different ways. |
||
|
||
1. Customization & Deployment: One of the frequently mentioned topics re Amundsen is the complexity of the architecture, and the efforts required to customize Amundsen. Having multiple repositories results in multiple components to install, customize, manage and fix. If you have forks, there is more work to apply your changes on a per-repo basis. Having a single repository to work from will reduce toil. | ||
|
||
1. Code Duplication: From requirements to config files, models, helper functions, exceptions handling, CI/CD pipelines, Docker files, and other repository management like licenses, PR/Issues templates, etc., is pretty much the same in all these microservices. Change in one repository deviates that repository from others at the moment. This change will move all the above into a single repository making it easy to change anything without keeping track of multiple repositories, and redundant code. | ||
|
||
|
||
### Contrib/incubating packages | ||
|
||
From time-to-time, the Amundsen project creates third party packages to allow substantial new additions to be developed by non-maintainers. The monorepo per se doesn't preclude us having incubator repositories, they can be handled similarly as today. | ||
|
||
We currently have two approaches for proxy development: | ||
|
||
1. 'old' proxies like neo4j, elasticsearch or atlas are kept directly in amundsenmetadata and amundsensearch respectively | ||
1. 'new' proxies like rds or gremlin are separate git repositories. | ||
|
||
Our intention in the monorepo would be to keep incubating repos separate only until they are stable, upon which they should be merged into the main repo. Any officially-supported proxies should get CR by maintainers, not exist outside of that process. | ||
|
||
|
||
## Transition Path | ||
|
||
The transition will be done on a new branch on the `amundsen` repo, named `main`. We will move all code there and leave `master` as the default branch in GitHub until the migration is complete (plus a waiting period). | ||
|
||
### Phase 1: | ||
We create a new branch on the `amundsen` repo, named `main`. All work that follows in this description will occur in this branch. | ||
|
||
### Phase 2: | ||
Raw code import and CI. We import all of the code as a snapshot into the repository. | ||
|
||
At this point, we migrate the CI/CD pipeline, and automated release mechanisms, from the consituent repositories to the main repository. They will publish packages using a `-beta` postfix. | ||
|
||
We will merge the codeowners files at the same time here. | ||
|
||
### Phase 3: | ||
We freeze code in all the submodules: no new PRs in. Existing PRs will have 1 week to land. We will assist authors in getting nearly-mergable PRs landed. For PRs that don't make it, we will provide instructions to port their patches to the new repo, if they choose to re-submit. | ||
|
||
### Phase 4: | ||
|
||
We re-copy all code from the submodules into the main repository using [/~https://github.com/shopsys/monorepo-tools](/~https://github.com/shopsys/monorepo-tools) to preserve git commit history for each of the branches. | ||
|
||
PRs against the main repo will now be allowed from the community. This means there was a 1 week period where no new PRs were accepted to the project. This is necessary for us to merge out the old remaining PRs, and to avoid any annoying merge conflicts once those land. | ||
|
||
The old repositories will now be frozen to new contributions from non-maintainers (maintainers can still create critical hotfixes if absolutely necessary). There will be a deprecation notice added to the top of the README for each subrepo. | ||
|
||
### Phase 4: | ||
We mark the `main` branch on GitHub as the main branch, so that new pulls use the new project. We update the documentation pipeline to build from the Main branch, so that the new documentation goes onto the website. | ||
|
||
|
||
### Phase 5: | ||
After 1 year, we will remove all of the sub-repos, as well as the `master` branch on the Amundsen repository. | ||
|
||
## How We Communicate This | ||
|
||
- We'll promote this RFC frequently on Amundsen Slack and other social media channels, to get feedback from the community. | ||
- During the community meetings, update on the progress and phases. | ||
- We'll write a Blog/Tutorial about this change on how it's recommended to do customizations (we have been workshopping it here https://medium.com/stemma/amundsen-deployment-best-practices-740a1800518e and would like to bring it upstream once it's better tested with the monorepo). | ||
|
||
## Drawbacks | ||
|
||
There is a transition cost both internally, and for certain users who do customizations. | ||
|
||
Currently, it is possible for implementors who do code-level customization to keep different versions of the various microservices. This will make that challenging or impossible within a single repo. | ||
|
||
|
||
## Alternatives | ||
|
||
We've discussed alternatives that also bundle together service-level | ||
|
||
|
||
## Unresolved questions | ||
|
||
None, yet | ||
|
||
## Future possibilities | ||
|
||
It may be desirable to make architectural changes after this code structure change is done, for example consolidating components or changing how shared dependencies are managed. However, we'd like to keep that out-of-scope for this RFC and keep those discussions separate (unless if it impacts this portion). | ||
|
||
We will also have the option to use [pip constraint files](https://pip.pypa.io/en/stable/user_guide/#constraints-files), which would likely help with the dependency issues we've encountered. This could be considered deeper and implemented as a PR after the transition occurs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to add amundsengremlin to the list as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention for
amundsengremlin
to get merged into one of the proxies, or is the long-term intention to leave it as a separate package? I'm not super familiar with the code split thereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a point that could have more attention in this rfc. We currently have two approaches for proxy development:
Maybe we should unify the approach ? afaik the idea with having proxies in different repositories made contributing easier - you could have a team that would be maintainers only for proxy repo, not core amundsen repos. @feng-tao would be more knowledgeable here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make more extensive use of codeowners files for each package, so we keep the ownership clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed some language to consider the incubating repos. Open to feedback!
My thought is the separate packages should only be used for incubating, unsupported packages. Once it it officially supported by the project, it should require CR and RFCs from approved maintainers. Otherwise, we will have officially-supported parts of the project that aren't necessarily governed by our normal processes (RFC and CR). If the outside contributor is still making meaningful contributions after the package is landed, we should consider suggesting them as a maintainer (in which case @Golodhros's suggestion of stronger codeowners would make sense -- it might make sense to have that new maintainer start as specialized to the specific area they were previously working on, and if they want to contribute outside of that, we add that on later via codeowner additions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently neptune and RDS backend is introduced as part of RFC process. Though I think it is easier for maintain with different repos. But I wouldn't call them incubating repos. This is same as Atlas which the Lyft engineering team built the whole integration with neo4j while atlas is maintained by @bolkedebruin ING team. Both are officially supported backend in Amundsen for long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense.
Can you elaborate on what is made easier here? I'd like to capture the benefits of those packages without the overhead of actually having separate repos, if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC here is to propose mono repo with different packages. For RDS, gremlin based, it seemed to be easier to maintain them separately given the people who developed those are not part of the core committer team if that makes sense. It will help to iterate those features faster without asking existing committer to signoff.
But once the repos/features are mature, we should bring them into same repo each of which to be a new package.