Support a whitelist of crates ... somehow #4
Why just a whitelist?
@tbu- It mostly centers on the fact that a crate of unknown provenance can do just about anything crazy in a build script. Thinking about it now, though, anything a malicious user could do in a build script, they could also do in the main code... perhaps whatever sandboxing we have just needs to protect us in both cases.
I thought the whitelisting was for performance reasons. One could build a Docker image with all the whitelisted crates already built, so using these crates would just link them in without triggering a rebuild. A whitelist also saves us the trouble of generating a new Cargo.toml with just the right crates; instead, we could reuse a single Cargo.toml containing the whitelisted dependencies. On the other hand, it would be cool to be able to use just about any crate in the playground, not just the whitelisted ones.
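The single shared Cargo.toml idea can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the playground's actual tooling: `render_manifest` is a hypothetical helper, the package metadata mirrors the manifest posted later in this thread, and the two entries are taken from that same top-100 list. Using a `BTreeMap` keeps the entries sorted, so an unchanged whitelist always renders to the identical file.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: render one shared Cargo.toml from a whitelist of
/// (crate name -> version requirement) pairs.
fn render_manifest(whitelist: &BTreeMap<&str, &str>) -> String {
    let mut toml = String::from(
        "[package]\nname = \"playground\"\nversion = \"0.0.1\"\nauthors = [\"The Rust Playground\"]\n\n[dependencies]\n",
    );
    // BTreeMap iterates in sorted key order, so the output is deterministic.
    for (name, version) in whitelist {
        toml.push_str(&format!("{} = \"{}\"\n", name, version));
    }
    toml
}

fn main() {
    let mut whitelist = BTreeMap::new();
    whitelist.insert("rand", "0.3.14");
    whitelist.insert("regex", "0.1.71");
    print!("{}", render_manifest(&whitelist));
}
```

Building this manifest once into the image means every request compiles against the same pre-built dependencies.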
Oh, right, thank you for reminding me. Additionally, at compilation and run time we don't have network access, so fetching arbitrary crates on demand is kind of a non-starter. My hope is to generate a list of the top-N most popular crates, then add them to a Cargo.toml along with all of their dependencies (if they have to be installed anyway, we might as well allow people to use them). Any pointers or suggestions on how to generate such a list would be appreciated!
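Generating that list breaks down into two steps: rank crates by popularity and keep the top N, then add everything those crates depend on. The sketch below assumes download counts and a dependency graph are already available as plain data (obtaining them, e.g. from crates.io, is out of scope here); the function names are hypothetical, and the crate names and counts in `main` are made-up placeholders.

```rust
use std::collections::{BTreeSet, HashMap};

/// Keep the `n` most-downloaded crates; ties are broken by name so the
/// result is stable from run to run.
fn top_n<'a>(downloads: &[(&'a str, u64)], n: usize) -> Vec<&'a str> {
    let mut ranked = downloads.to_vec();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked.into_iter().take(n).map(|(name, _)| name).collect()
}

/// Expand `roots` to include every transitive dependency, using an
/// explicit stack (depth-first) and a set to skip crates already seen.
fn with_dependencies<'a>(
    roots: &[&'a str],
    deps: &HashMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = roots.to_vec();
    while let Some(krate) = stack.pop() {
        // `insert` returns false if the crate was already visited.
        if seen.insert(krate) {
            if let Some(children) = deps.get(krate) {
                stack.extend(children.iter().copied());
            }
        }
    }
    seen
}

fn main() {
    // Hypothetical names and counts, for illustration only.
    let downloads = [("crate_a", 900), ("crate_b", 500), ("crate_c", 100)];
    let mut deps = HashMap::new();
    deps.insert("crate_a", vec!["crate_d"]);
    deps.insert("crate_d", vec!["crate_e"]);
    let top = top_n(&downloads, 2);
    // prints {"crate_a", "crate_b", "crate_d", "crate_e"}
    println!("{:?}", with_dependencies(&top, &deps));
}
```

The visited set also makes the traversal safe if the dependency data ever contains a cycle.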
I definitely think this needs to happen. Crates aren't just nice-to-haves; a lot of critical functionality lives in crates, like JSON encoding/decoding, URL manipulation, logging, etc. Even without an internet connection, these packages are sometimes required to reproduce issues, and without access to at least some external crates, demonstrating things like JSON decoding is impossible.
Compiling the top 100 crates takes about 35 minutes (debug and release modes). Doing that for stable, beta, and nightly is going to eat up a huge chunk of the time that EC2 hands out for free...
How about stable? Most examples out there are tailored to run on stable anyway.
@ArtemGr, do you mean only supporting crates on the stable channel? I'd expect a lot of pushback, because all the "cool" things happen in nightly (especially things like serde and friends).
That kind of pushback might be solved by supporting serde_codegen (via build.rs) in the playground. Just my two cents. I imagine all kinds of setups are possible. What I'm saying is that supporting crates on the stable channel is still better than not supporting them at all.
Thinking about it:
If we estimate conservatively that there is one stable release every 6 weeks, one beta release a week, and a nightly build every other day, then rebuilding the crates really won't use too much CPU.
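Putting numbers on this with the 35-minutes-per-toolchain figure from above: a six-week window contains 1 stable, 6 beta, and 21 nightly builds, or 28 builds in total, which is 980 minutes (about 16 hours) every six weeks, or roughly 23 minutes per day. A small Rust sketch of the same arithmetic, assuming every channel takes the same 35 minutes and builds run one at a time:

```rust
/// Figures from this thread: ~35 minutes per toolchain (debug + release).
const MINUTES_PER_BUILD: u32 = 35;
/// One six-week stable cycle.
const WINDOW_DAYS: u32 = 42;

/// Builds in one window: 1 stable, 1 beta per week, 1 nightly every other day.
fn builds_per_window() -> u32 {
    let stable = 1;
    let beta = WINDOW_DAYS / 7; // 6
    let nightly = WINDOW_DAYS / 2; // 21
    stable + beta + nightly
}

fn main() {
    let builds = builds_per_window();
    let minutes = builds * MINUTES_PER_BUILD;
    let per_day = f64::from(minutes) / f64::from(WINDOW_DAYS);
    // prints "28 builds, 980 minutes per six weeks, about 23 minutes per day"
    println!(
        "{} builds, {} minutes per six weeks, about {:.0} minutes per day",
        builds, minutes, per_day
    );
}
```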
What I'm a bit more worried about now is the user experience. With the top 100+ crates, compiling "hello world" takes about 2 seconds (presumably due to the linking overhead), which is just about double the 1 second of calling …
It's possible we may want to offer both modes, with crates downplayed a bit, in order to give a good impression to first-time users.
For reference, here are the top 100 crates:
[dependencies]
advapi32-sys = "0.2.0"
aho-corasick = "0.5.2"
ansi_term = "0.7.5"
aster = "0.20.0"
bitflags = "0.7.0"
byteorder = "0.5.3"
cfg-if = "0.1.0"
chrono = "0.2.22"
clap = "2.9.2"
color_quant = "1.0.0"
cookie = "0.2.5"
crossbeam = "0.2.9"
deque = "0.3.1"
docopt = "0.6.81"
enum_primitive = "0.1.0"
env_logger = "0.3.3"
flate2 = "0.2.14"
gcc = "0.3.28"
gdi32-sys = "0.2.0"
getopts = "0.2.14"
gif = "0.9.0"
gl_generator = "0.5.2"
glob = "0.2.11"
hpack = "0.3.0"
httparse = "1.1.2"
hyper = "0.9.9"
idna = "0.1.0"
image = "0.10.1"
inflate = "0.1.1"
itertools = "0.4.16"
itoa = "0.1.1"
jpeg-decoder = "0.1.5"
kernel32-sys = "0.2.2"
khronos_api = "1.0.0"
language-tags = "0.2.2"
lazy_static = "0.2.1"
libc = "0.2.13"
libz-sys = "1.0.4"
log = "0.3.6"
lzw = "0.10.0"
matches = "0.1.2"
memchr = "0.1.11"
mime = "0.2.1"
miniz-sys = "0.1.7"
net2 = "0.2.24"
nix = "0.6.0"
nom = "1.2.3"
num = "0.1.32"
num-bigint = "0.1.32"
num-complex = "0.1.32"
num-integer = "0.1.32"
num-iter = "0.1.32"
num-rational = "0.1.32"
num-traits = "0.1.32"
num_cpus = "0.2.13"
openssl = "0.7.14"
openssl-sys = "0.7.14"
openssl-sys-extras = "0.7.14"
openssl-verify = "0.1.0"
phf = "0.7.15"
phf_generator = "0.7.15"
phf_shared = "0.7.15"
pkg-config = "0.3.8"
png = "0.5.1"
quasi = "0.14.0"
quasi_codegen = "0.14.0"
rand = "0.3.14"
rayon = "0.4.0"
regex = "0.1.71"
regex-syntax = "0.3.3"
rust-crypto = "0.2.36"
rustc-serialize = "0.3.19"
rustc_version = "0.1.7"
semver = "0.2.3"
serde = "0.7.12"
serde_codegen = "0.7.12"
serde_codegen_internals = "0.2.0"
serde_json = "0.7.4"
shared_library = "0.1.4"
slab = "0.2.0"
solicit = "0.4.4"
strsim = "0.4.1"
syntex = "0.37.0"
syntex_errors = "0.37.0"
syntex_pos = "0.37.0"
syntex_syntax = "0.37.0"
tempdir = "0.3.4"
tempfile = "2.1.4"
term = "0.4.4"
term_size = "0.1.0"
thread-id = "2.0.0"
thread_local = "0.2.6"
time = "0.1.35"
toml = "0.1.30"
traitobject = "0.0.3"
typeable = "0.1.2"
unicase = "1.4.0"
unicode-bidi = "0.2.3"
unicode-normalization = "0.1.2"
unicode-width = "0.1.3"
unicode-xid = "0.0.3"
unsafe-any = "0.4.1"
url = "1.1.1"
user32-sys = "0.2.0"
utf8-ranges = "0.1.3"
uuid = "0.2.2"
vec_map = "0.6.0"
void = "1.0.2"
winapi = "0.2.7"
winapi-build = "0.1.1"
ws2_32-sys = "0.2.1"
xml-rs = "0.3.4"
[package]
authors = ["The Rust Playground"]
name = "playground"
version = "0.0.1"

Any strong objections to any of those? Any obvious missing pieces?
I've switched http://play.integer32.com/ to using cargo, and there's just a single crate (rand) available at the moment.
Cool!
A small update on the times: running …
Hmm, I thought the extra time was because of all the linking, but maybe somehow just one crate isn't being built ahead of time:
What's strange is that I recall seeing this output before; it was … I've opened rust-lang/cargo#2874.
Some raw timing numbers with a single crate (rand) and then the top 100. This is compiling the following program, which uses rand:

    extern crate rand;

    use rand::Rng;

    fn main() {
        let mut rng = rand::thread_rng();
        println!("{}", rng.gen::<u8>());
    }

That's 1.477 seconds on average before and 1.573 after, so using a bunch of crates would add about 100 ms. Maybe that's acceptable.
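Averages like these come from repeated runs. One way to collect them is a small harness that times a closure several times and reports the mean. This is a sketch, not the script used in this thread: it assumes it runs from inside the playground project with `cargo` on the `PATH`, it rewrites `src/main.rs` with its own contents so that Cargo should see a newer modification time and recompile, and the run count of 5 is arbitrary.

```rust
use std::fs;
use std::process::Command;
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the mean wall-clock duration.
fn average_duration<F: FnMut()>(runs: u32, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let start = Instant::now();
    for _ in 0..runs {
        f();
    }
    start.elapsed() / runs
}

fn main() {
    let main_rs = "src/main.rs";
    let avg = average_duration(5, || {
        // Rewrite main.rs with identical contents to bump its mtime,
        // so Cargo treats the crate as changed and recompiles it.
        let source = fs::read(main_rs).expect("failed to read src/main.rs");
        fs::write(main_rs, &source).expect("failed to rewrite src/main.rs");
        let status = Command::new("cargo")
            .args(&["build", "--quiet"])
            .status()
            .expect("failed to spawn cargo");
        assert!(status.success(), "cargo build failed");
    });
    println!("average: {:.3} s", avg.as_secs_f64());
}
```

Running it once with only rand in Cargo.toml and once with the full list gives a before/after pair comparable to the numbers above.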
Mumble, mumble. Actually, running on EC2, there's a much more pronounced speed loss. Running an arbitrary program gets me 2.75 s with crates and 1.13 s without. 😠
OK, maybe that's just because I exhausted my CPU credits compiling all those crates. Let's let it sit overnight and build up some buffer... 🙏
Why not use a simple heuristic and use the Cargo.toml with crates only when there is an "extern crate " line in the source code? When there's no "extern crate ", switch to a different Cargo.toml, one that links no crates.
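A minimal sketch of that heuristic, assuming two pre-built project directories, each with its own Cargo.toml (Cargo looks for a manifest named Cargo.toml, so two directories are simpler than two differently named files); the directory names `crates-project` and `bare-project` are hypothetical. It relies on programs spelling out `extern crate`, as 2015-edition Rust requires. A plain substring check also catches `#[macro_use] extern crate log;`, and a false positive (say, "extern crate " inside a comment) only sends the program down the slower path, which still compiles.

```rust
/// Pick which pre-built project to compile in: the one whose Cargo.toml lists
/// the whitelisted crates, or a bare one with no dependencies.
/// Directory names are hypothetical.
fn choose_project(source: &str) -> &'static str {
    // Substring check, not a parser.
    if source.contains("extern crate ") {
        "crates-project"
    } else {
        "bare-project"
    }
}

fn main() {
    let hello = "fn main() { println!(\"Hello, world!\"); }";
    let with_rand = "extern crate rand;\nfn main() {}";
    println!("{}", choose_project(hello)); // bare-project
    println!("{}", choose_project(with_rand)); // crates-project
}
```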
If you write a tool with a real parser to find crate names, then you could mark all of the crates as optional in the Cargo.toml and enable exactly those that are used.
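Cargo gives each optional dependency (for example `rand = { version = "0.3.14", optional = true }`) an implicit feature of the same name, so enabling exactly the crates a program uses comes down to building the right `--features` argument. A sketch, assuming every whitelisted crate is declared with `optional = true` and that the set of used crates has already been extracted from the source; `cargo_args` is a hypothetical helper, and crates outside the whitelist are simply dropped.

```rust
use std::collections::BTreeSet;

/// Build `cargo build` arguments that enable only the used, whitelisted crates.
fn cargo_args<'a>(used: &BTreeSet<&'a str>, whitelist: &BTreeSet<&'a str>) -> Vec<String> {
    // Only whitelisted crates can be enabled; anything else is ignored here.
    let enabled: Vec<&str> = used.intersection(whitelist).copied().collect();
    let mut args = vec!["build".to_string()];
    if !enabled.is_empty() {
        // Cargo accepts a space- or comma-separated feature list.
        args.push("--features".to_string());
        args.push(enabled.join(" "));
    }
    args
}

fn main() {
    let whitelist: BTreeSet<&str> = ["rand", "regex", "serde_json"].iter().copied().collect();
    let used: BTreeSet<&str> = ["rand", "regex"].iter().copied().collect();
    // prints ["build", "--features", "rand regex"]
    println!("{:?}", cargo_args(&used, &whitelist));
}
```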
Yup, these are all good ideas, but they require far more work. The current sandbox essentially works by:
Any additional processing has to go somewhere. If it lives in the server, then we have to make the container more flexible and configurable, perhaps by writing out a … If it lives in the container, then it becomes trickier to rely on any language other than what happens to live in there by default. It might mean writing it in Perl! 😱 There is one tool that I know of that parses Rust code fairly well: the compiler! I wonder if this is the kind of thing that would be welcomed as an actual PR there. Maybe I should open an issue over there to begin the discussion.
Now that some credits have built up, we can see the difference before and after adding the crates. The averages are 0.681 and 1.02 seconds, respectively. So it's definitely slower, but I feel that 1 second is a reasonable time for the improved functionality and not terrible for user response time.
Oh, this approach also has the difficulty of reverse-mapping the dash-to-underscore transformation of crate names...
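Source code names a crate with dashes replaced by underscores (`extern crate rustc_serialize;` for the `rustc-serialize` package), and from `foo_bar` alone there's no telling whether the package is `foo-bar` or `foo_bar`. Since the whitelist is known ahead of time, one way around this is to precompute the mapping from it. The sketch below pairs that lookup table with a crude `extern crate` scan (a stand-in for a real parser); both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Map the name used in source (`rustc_serialize`) back to the package
/// name in Cargo.toml (`rustc-serialize`), built from the known whitelist.
fn lib_to_package<'a>(whitelist: &[&'a str]) -> HashMap<String, &'a str> {
    whitelist
        .iter()
        .map(|pkg| (pkg.replace('-', "_"), *pkg))
        .collect()
}

/// Crude scan for `extern crate NAME` (not a real parser):
/// take whatever follows each occurrence up to a `;` or whitespace.
fn extern_crates(source: &str) -> Vec<&str> {
    source
        .split("extern crate ")
        .skip(1)
        .filter_map(|rest| {
            let name = rest.split(|c: char| c == ';' || c.is_whitespace()).next()?;
            if name.is_empty() { None } else { Some(name) }
        })
        .collect()
}

fn main() {
    let map = lib_to_package(&["rustc-serialize", "rand", "serde_json"]);
    let source = "extern crate rustc_serialize;\nextern crate rand;\n";
    for name in extern_crates(source) {
        // prints `rustc_serialize -> Some("rustc-serialize")`, then `rand -> Some("rand")`
        println!("{} -> {:?}", name, map.get(name));
    }
}
```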
If I create a project on my laptop, add the top 100 crates, compile them, and then touch main.rs:
And running …
If I then remove all the crates and touch main.rs again:
And a second time:
The time to run the compiler is 286 ms when there aren't any crates and 248 ms when there are crates; this suggests that changing the compiler would not save any time. It appears that the majority of the time is spent inside Cargo itself, so manipulating the Cargo.toml would be the effective solution.
Closed in #21