Generate package set on-the-fly
Instead of a library containing a manually written build-depends
corresponding to a stackage snapshot, we now have an executable that
queries stackage directly, and then uses the response to generate
the desired cabal file. The executable then builds that project.

The executable also includes the ability to split the package set into
smaller groups, where each group is built sequentially. This allows
for scenarios where building the entire set at once is not feasible,
at the cost of performance.

We also add 'postgresql-libpq' to linux/osx (requires postgres dep),
and 'hfsevents' to osx.
tbidne authored and Bodigrim committed Oct 25, 2024
1 parent 0cb9711 commit be67e91
Showing 42 changed files with 3,409 additions and 2,731 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*.golden -text
59 changes: 59 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,59 @@
name: ci
on:
  push:
    branches:
      - master

  pull_request:
    branches:
      - master

  workflow_dispatch:
jobs:
  cabal:
    strategy:
      fail-fast: false
      matrix:
        os:
          - "macos-latest"
          - "ubuntu-latest"
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: haskell-actions/setup@v2
        with:
          ghc-version: "9.8.2"
      - name: Configure
        run: |
          cabal configure --enable-tests --ghc-options -Werror
      - name: Build executable
        run: cabal build clc-stackage

      - name: Unit Tests
        id: unit
        run: cabal test unit

      - name: Print unit failures
        if: ${{ failure() && steps.unit.conclusion == 'failure' }}
        run: |
          cd test/unit/goldens
          for f in $(ls); do
            echo "$f"
            cat "$f"
          done
      - name: Functional Tests
        id: functional
        run: cabal test functional

      - name: Print functional failures
        if: ${{ failure() && steps.functional.conclusion == 'failure' }}
        run: |
          cd test/functional/goldens
          for f in $(ls); do
            echo "$f"
            cat "$f"
          done
7 changes: 6 additions & 1 deletion .gitignore
@@ -1 +1,6 @@
/dist-newstyle
/bin
/dist-newstyle
/generated/cabal.project.local
/generated/dist-newstyle
/generated/generated.cabal
/output
96 changes: 80 additions & 16 deletions README.md
@@ -2,7 +2,7 @@

## How to?

This is a meta-package to facilitate impact assessment for [CLC proposals](/~https://github.com/haskell/core-libraries-committee). The package `clc-stackage.cabal` lists almost entire Stackage as `build-depends`, so that `cabal build` transitively compiles them all.
This is a meta-package to facilitate impact assessment for [CLC proposals](/~https://github.com/haskell/core-libraries-committee).

An impact assessment is due when

@@ -13,32 +13,96 @@ An impact assessment is due when
The procedure is as follows:

1. Rebase the changes mandated by your proposal atop the `ghc-9.8` branch.

2. Compile a patched GHC, say, `~/ghc/_build/stage1/bin/ghc`.
3. `git clone /~https://github.com/Bodigrim/clc-stackage`, then `cd clc-stackage`.
4. Run `cabal build -w ~/ghc/_build/stage1/bin/ghc --keep-going` and wait for a long time.
   * On a recent Macbook Air it takes around 12 hours, YMMV.
   * You can interrupt `cabal` at any time and rerun again later.
   * Consider setting `--jobs` to retain free CPU cores for other tasks.
   * Full build requires roughly 7 Gb of free disk space.
5. If any packages fail to compile:
   * copy them locally using `cabal unpack`,
   * patch to conform with your proposal,
   * link them from the `packages` section of `cabal.project`,
   * return to Step 4.
6. When everything finally builds, get back to CLC with a list of packages affected and patches required.

3. `git clone /~https://github.com/haskell/clc-stackage`, then `cd clc-stackage`.

4. Build the exe: `cabal install clc-stackage --installdir=./bin`.

> :warning: **Warning:** Use a normal downloaded GHC for this step, **not** your custom built one. Why? Using the custom GHC can force a build of many dependencies you'd otherwise get for free e.g. `vector`.
5. Uncomment and modify the `with-compiler` line in [generated/cabal.project](generated/cabal.project) e.g.

```
with-compiler: /home/ghc/_build/stage1/bin/ghc
```
6. Run `./bin/clc-stackage` and wait for a long time. See [below](#the-clc-stackage-exe) for more details.
   * On a recent Macbook Air it takes around 12 hours, YMMV.
   * You can interrupt `cabal` at any time and rerun again later.
   * Consider setting `--jobs` to retain free CPU cores for other tasks.
   * Full build requires roughly 7 Gb of free disk space.

   To get an idea of the current progress, we can run the following commands
   on the log file:

   ```sh
   # prints completed / total packages in this group
   $ grep -Eo 'Completed|^ -' output/logs/current-build/stdout.log | sort -r | uniq -c | awk '{print $1}'
   110
   182
   # combine with watch
   $ watch -n 10 "grep -Eo 'Completed|^ -' output/logs/current-build/stdout.log | sort -r | uniq -c | awk '{print \$1}'"
   ```
7. If any packages fail to compile:
   * copy them locally using `cabal unpack`,
   * patch to conform with your proposal,
   * link them from the `packages` section of `cabal.project`,
   * return to Step 6.
8. When everything finally builds, get back to CLC with a list of packages affected and patches required.
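The progress check from step 6 can be wrapped in a small helper that also prints a percentage. This is only a sketch: the function name `build_progress` is hypothetical, and it assumes, as the `grep` pattern in step 6 implies, that log lines starting with ` -` list the packages planned for the current group and that each `Completed` marks one finished package.

```sh
# Hypothetical helper (not part of clc-stackage): summarise build progress
# from a cabal log as "completed / total (pct%)".
build_progress() {
  log="${1:-output/logs/current-build/stdout.log}"
  if [ ! -f "$log" ]; then
    echo "no log file at $log" >&2
    return 1
  fi
  # Assumption: lines beginning with " -" are the packages planned for this group.
  total=$(grep -c '^ -' "$log" || true)
  # Assumption: every "Completed" occurrence marks one finished package.
  completed=$({ grep -o 'Completed' "$log" || true; } | wc -l | tr -d ' ')
  if [ "$total" -eq 0 ]; then
    echo "0 / 0"
    return 0
  fi
  # Integer percentage, rounded down.
  echo "$completed / $total ($(( completed * 100 / total ))%)"
}
```

Called without arguments it reads `output/logs/current-build/stdout.log`; it can be combined with `watch` in the same way as the one-liner in step 6.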
### The clc-stackage exe
Previously, this project was just a single (massive) cabal file that had to be manually updated. Usage was fairly simple: `cabal build clc-stackage --keep-going` to build the project, `--keep-going` so that as many packages as possible are built.
This has been updated so that `clc-stackage` is now an executable that will automatically generate the desired cabal file based on the results of querying stackage directly. This streamlines updates, provides a more flexible build process, and potentially has prettier output (with `--batch` arg):
![demo](example_output.png)
In particular, the `clc-stackage` exe allows for splitting the entire package set into subset groups of size `N` with the `--batch N` option. Each group is then built sequentially. Not only can this be useful for situations where building the entire package set in one go is infeasible, but it also provides a "cache" functionality, that allows us to interrupt the program at any point (e.g. `CTRL-C`), and pick up where we left off. For example:
```
$ ./bin/clc-stackage --batch 100
```
This will split the entire downloaded package set into groups of size 100. Each time a group finishes (success or failure), stdout/err will be updated, and then the next group will start. If the group failed to build and we have `--write-logs save-failures` (the default), then the logs and error output will be in `./output/logs/<pkg>/`, where `<pkg>` is the name of the first package in the group.
See `./bin/clc-stackage --help` for more info.
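To make the grouping concrete, here is a conceptual sketch of what `--batch N` does to a list of package names. It is not the tool's implementation (which lives in the Haskell sources); the function name `batch_packages` and the five package names are placeholders chosen for illustration.

```sh
# Illustrative only: read package names (one per line) from stdin and
# print them in groups of at most N names, one group per output line.
batch_packages() {
  n="$1"
  xargs -n "$n" echo
}

# Conceptual driver: handle each group in turn. The real tool generates a
# cabal file for the group and builds it; here we only print the group.
printf '%s\n' aeson bytestring containers text vector |
  batch_packages 2 |
  while read -r group; do
    echo "group: $group"
  done
```

With a batch size of 2, the five names fall into three groups, the last of which holds the single leftover package.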
#### Optimal performance
On the one hand, splitting the entire package set into `--batch` groups makes the output easier to understand and offers a nice workflow for interrupting/restarting the build. On the other hand, there is a question of what the best value of `N` is for `--batch N`, with respect to performance.
In general, the smaller `N` is, the worse the performance. There are several reasons for this:
- The smaller `N` is, the more `cabal build` processes, which adds overhead.
- Larger groups contain more packages, which increases the chances for concurrency gains.
Thus for optimal performance, you want to take the largest group possible, with the upper limit being no `--batch` argument at all, as that puts all packages into the same group.
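The first point can be quantified: a set of `total` packages split with `--batch N` yields `ceil(total / N)` groups, and so that many sequential `cabal build` runs. The sketch below computes this; `num_groups` is a hypothetical helper and the total of 3000 packages is a made-up figure for illustration, not the size of any actual snapshot.

```sh
# Hypothetical helper: number of groups for a package set of size $1
# split into batches of size $2, i.e. ceil(total / n) in integer arithmetic.
num_groups() {
  total="$1"
  n="$2"
  echo $(( (total + n - 1) / n ))
}

# 3000 is an assumed package count, used only to show the trend.
for n in 50 100 500; do
  echo "--batch $n -> $(num_groups 3000 "$n") groups"
done
```

Smaller batch sizes multiply the number of `cabal build` invocations, which is the overhead described above.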
> [!TIP]
>
> Additionally, the `./output/cache.json` file can be manipulated directly. For example, if you want to try building only `foo`, ensure `foo` is the only entry in the json file's `untested` field.
## Getting dependencies via `nix`
For Linux based systems, there's a provided `flake.nix` and `shell.nix` to get a nix shell
with an approximation of the required dependencies (cabal itself, C libs) to build `clc-stackage`.
Note that it is not actively maintained, so it may require some tweaking to get working, and conversely, it may have some redundant dependencies.
## Misc
* Your custom GHC will need to be on the PATH to build the `stack` library i.e.
* Your custom GHC will need to be on the PATH to build the `stack` library e.g.
```
export PATH=/path/to/custom/ghc/stage1/bin/:$PATH
export PATH=/home/ghc/_build/stage1/bin/:$PATH
```
Nix users can uncomment (and modify) this line in the `flake.nix`.
Nix users can uncomment (and modify) this line in the `flake.nix`.
27 changes: 27 additions & 0 deletions app/Main.hs
@@ -0,0 +1,27 @@
module Main (main) where

import CLC.Stackage.Runner qualified as Runner
import CLC.Stackage.Utils.Logging qualified as Logging
import Data.Text qualified as T
import Data.Time.LocalTime qualified as Local
import System.Console.Terminal.Size qualified as TermSize
import System.IO (hPutStrLn, stderr)

main :: IO ()
main = do
  mWidth <- (fmap . fmap) TermSize.width TermSize.size

  case mWidth of
    Just w -> Runner.run $ mkLogger w
    Nothing -> do
      let hLogger = mkLogger 80
      Logging.putTimeInfoStr hLogger False "Failed detecting terminal width"
      Runner.run hLogger
  where
    mkLogger w =
      Logging.MkHandle
        { Logging.getLocalTime = Local.zonedTimeToLocalTime <$> Local.getZonedTime,
          Logging.logStrErrLn = hPutStrLn stderr . T.unpack,
          Logging.logStrLn = putStrLn . T.unpack,
          Logging.terminalWidth = w
        }
60 changes: 20 additions & 40 deletions cabal.project
@@ -1,44 +1,24 @@
index-state: 2024-03-27T00:32:46Z
index-state: 2024-10-11T23:26:13Z

packages: .

constraints:
  al < 0,
  alsa-pcm < 0,
  alsa-seq < 0,
  ALUT < 0,
  btrfs < 0,
  fft < 0,
  flac < 0,
  glpk-headers < 0,
  hmatrix-gsl < 0,
  hopenssl < 0,
  hpqtypes < 0,
  hsdns < 0,
  hsndfile < 0,
  HsOpenSSL < 0,
  hw-kafka-client < 0,
  jack < 0,
  lame < 0,
  lapack-ffi < 0,
  lmdb < 0,
  magic < 0,
  mysql < 0,
  nfc < 0,
  pcre-light < 0,
  postgresql-libpq < 0,
  primecount < 0,
  pthread < 0,
  pulse-simple < 0,
  rdtsc < 0,
  regex-pcre < 0,
  re2 < 0,
  text-icu < 0,
program-options
  ghc-options:
    -Wall -Wcompat
    -Widentities
    -Wincomplete-record-updates
    -Wincomplete-uni-patterns
    -Wmissing-deriving-strategies
    -Wmissing-export-lists
    -Wmissing-exported-signatures
    -Wmissing-home-modules
    -Wmissing-import-lists
    -Wpartial-fields
    -Wprepositive-qualified-module
    -Wredundant-constraints
    -Wunused-binds
    -Wunused-packages
    -Wunused-type-patterns
    -Wno-unticked-promoted-constructors

allow-newer:
  aura:bytestring,
  aura:time

constraints: hlint +ghc-lib
constraints: ghc-lib-parser-ex -auto
constraints: stylish-haskell +ghc-lib
optimization: 2