Compiler Engineer Job
andychu edited this page Mar 31, 2022 · 84 revisions
The Oil project needs a compiler engineer with experience in C++ and garbage collection to help "finish" the project! We're funding it through a mix of grants and donations, and I encourage discussions about compensation. (More information is expected in April 2022.)
- What is it?
- Oil is a new Unix shell. It's our upgrade path from bash to a better shell and runtime! It's also for Python and JavaScript users who avoid shell.
- What do we need done?
- Write a 4K-8K line compiler in Python, and a 3K-10K line garbage-collected runtime in C++. (There is a lot of working code, some of which may need to be rewritten. We can talk about it.)
- There is nothing fancy here -- it's very much a job in need of solid engineering!
- Why are you doing it this way? What progress have you made?
- TODO: See Oil is Being Implemented "Middle Out" for a plot of progress.
- How do I apply?
- For now, send mail to andy@oilshell.org introducing yourself, your interest, and your experience. We can chat on https://oilshell.zulipchat.com/ and then have a video call.
- I may want to do some kind of "paid interview" which involves making a failing test pass in Oil. Details to come on the blog.
- How long does it last?
- At least 3 months. It depends on funding, but I could easily imagine 12 - 24 months of work.
I made an HTML page that lists the code you'll be working with and working near: https://www.oilshell.org/release/latest/pub/metrics.wwz/line-counts/for-translation.html.
Note the line counts are quite small. This is not a 100K line project; it's more like 10K lines. (The big components are inputs and outputs to the compiler, not code we need to write.)
In order of importance:
- Hard-won C++ experience and knowledge
- Generating correct C++ code with a translator (i.e. C++ that works with all compilers)
- Debugging it, analyzing its performance, and optimizing it
- Comfortable using standard tools like gdb / CLion, ASAN, etc.
- Understanding of Garbage Collection
- We have a working garbage collector, but I found this to be one of the most difficult parts of the project!
- Test-driven and terminal-based workflow (on some kind of Unix)
- The job is very metrics-driven; the idea is to "make more tests pass". I've found that this strategy enables a lot of creativity and productivity!
- Type systems, and the relationship between types and garbage collection.
- It's likely that we want to write our own type checker rather than relying on MyPy.
- If you understand this Mozilla blog post, that's a good sign: Clawing Our Way Back to Precision (2013)
- Python
- Most of the code is written in Python. However, I think this can be learned on the job, whereas the C++ parts can't.
General attributes desired:
- You should consider yourself a "finisher". You should be able to prioritize work and not get lost in micro-optimization (although there are plenty of opportunities for such skills on the project). This is not a research project; the goal is to make a production quality shell.
- You should have good communication skills, and be able to explain your work. (We encourage applicants in any country; however English is used for all docs and communication.)
- Bonus: if you can write nice blog posts. I frequently do this, e.g. with posts tagged #project-updates, and I find it helps me organize work and attract new contributors.
- Generally speaking, you should be excited about the high level goals of the Oil project. The blog should not be boring to you :-)
- If you think our C++ is ugly! That means you have ideas on how to make it better. What exists is a proof of concept, designed to show the strategy will work and can perform well. There are many improvements that can be made. If you are convinced a complete rewrite is necessary, then please make the case that it's feasible, justified by a survey of the code.
- If you enjoy debugging C++ code! And then writing tests to make sure the bug never comes back.
- If you like using ASAN, profilers, and other such tools (uftrace). Maybe you have a nice debugger configuration.
- (note: Oil has a GDB pretty printer for ASDL data structures)
- If you can read the existing code in oil-native! If not, the job probably isn't a good fit.
- If you understand how Rust is influenced by C++ (positively and negatively) and ML, that's a good sign. In a similar way, Oil is written with algebraic data types at the core, but we also want it to be efficient.
- Understanding the Mozilla blog post above -- or better, pointing to even more relevant references!
- This post is relevant since we also have a precise collector. I didn't find that many documents describing such issues in real-world, deployed language projects. Our GC is also meant to be 100% portable C++.