Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support a whitelist of crates ... somehow #4

Closed
shepmaster opened this issue Jun 28, 2016 · 26 comments
Closed

Support a whitelist of crates ... somehow #4

shepmaster opened this issue Jun 28, 2016 · 26 comments
Labels
enhancement Something new the playground could do

Comments

@shepmaster
Copy link
Member

No description provided.

@shepmaster shepmaster added the enhancement Something new the playground could do label Jun 28, 2016
@tbu-
Copy link

tbu- commented Jun 29, 2016

Why just a whitelist?

@shepmaster
Copy link
Member Author

@tbu- mostly centering around the fact that a crate of unknown provenance can do just about anything crazy in a build script. Thinking about it now though, anything a malicious user could do in a build script, they could do in the main code... perhaps whatever sandboxing we have would just need to protect us in both cases...

@ArtemGr
Copy link

ArtemGr commented Jun 29, 2016

I thought the whitelisting was for performance reasons. One could build a Docker image with all the whitelisted crates already built, so using these crates would just link them in, not triggering a rebuild.

Whitelist also saves us the trouble of generating a new Cargo.toml with just the right crates. Instead we might reuse a single Cargo.toml containing the whitelisted dependencies.

On the other hand, it might be cool to be able to use just about any crate in the playground instead of just the whitelisted.

@shepmaster
Copy link
Member Author

I thought the whitelisting was for performance reasons

Oh, right, thank you for reminding me. Additionally, at compilation / program run time, we don't have network access, so that's kind of a non-starter.

My hope is to generate a list of the top-N most popular crates, then add them to a Cargo.toml as well as all of the dependencies (if they have to be installed, we might as well allow people to use them).

Any pointers or suggestions on how to generate such a list would be appreciated!

@jess-sol
Copy link

I definitely think this needs to happen. Crates aren't just nice-to-haves, a lot of critical functionality is put into crates, like JSON encoding/decoding, URL manipulation, logging, etc. Even without an internet connection these packages are sometimes required to reproduce issues, not having access to at least some external crates also makes demonstrating things (like JSON decoding) impossible.

@shepmaster
Copy link
Member Author

Compiling the top 100 crates takes about 35 minutes (debug and release modes). Doing that for stable/beta/nightly is going to eat up a huge chunk of the time that EC2 hands out for free...

@ArtemGr
Copy link

ArtemGr commented Jul 4, 2016

How about stable? Most examples out there are tailored to run on stable anyway.

@shepmaster
Copy link
Member Author

@ArtemGr do you mean only supporting crates in the stable channel? I'd expect to get a lot of pushback because all the "cool" things happen in nightly (especially things like serde and friends)

@ArtemGr
Copy link

ArtemGr commented Jul 4, 2016

That kind of pushback might be solved by supporting the serde_codegen (build.rs) in playground.

Just my two cents. I imagine all kind of setups are possible. What I'm saying is that supporting crates on a stable channel is still better than not being able to support them at all.

@shepmaster
Copy link
Member Author

Thinking about it:

  • Stable releases are every 6 weeks
  • There were 4 betas for 1.10.0
  • Nightly builds aren't actually nightly, but closer to every 2 or 3 days.
  • EC2 free tier offers "enough hours to run continuously each month"

If we measure conservatively and say that there is one stable release every 6 weeks, 1 beta release a week, and a nightly build every other day, then it really won't be using too much CPU.

@shepmaster
Copy link
Member Author

shepmaster commented Jul 9, 2016

What I'm a bit more worried about is user experience time now. With the 100+ top crates, compiling "hello world" takes about 2 seconds (presumably due to the linking overhead), which is just about double the 1 second of calling rustc with no crates.

It's possible we may want to offer both modes, with crates downplayed a bit, in order to give good impressions to first-time users.

@shepmaster
Copy link
Member Author

For reference, here are the top 100 crates:

[dependencies]
advapi32-sys = "0.2.0"
aho-corasick = "0.5.2"
ansi_term = "0.7.5"
aster = "0.20.0"
bitflags = "0.7.0"
byteorder = "0.5.3"
cfg-if = "0.1.0"
chrono = "0.2.22"
clap = "2.9.2"
color_quant = "1.0.0"
cookie = "0.2.5"
crossbeam = "0.2.9"
deque = "0.3.1"
docopt = "0.6.81"
enum_primitive = "0.1.0"
env_logger = "0.3.3"
flate2 = "0.2.14"
gcc = "0.3.28"
gdi32-sys = "0.2.0"
getopts = "0.2.14"
gif = "0.9.0"
gl_generator = "0.5.2"
glob = "0.2.11"
hpack = "0.3.0"
httparse = "1.1.2"
hyper = "0.9.9"
idna = "0.1.0"
image = "0.10.1"
inflate = "0.1.1"
itertools = "0.4.16"
itoa = "0.1.1"
jpeg-decoder = "0.1.5"
kernel32-sys = "0.2.2"
khronos_api = "1.0.0"
language-tags = "0.2.2"
lazy_static = "0.2.1"
libc = "0.2.13"
libz-sys = "1.0.4"
log = "0.3.6"
lzw = "0.10.0"
matches = "0.1.2"
memchr = "0.1.11"
mime = "0.2.1"
miniz-sys = "0.1.7"
net2 = "0.2.24"
nix = "0.6.0"
nom = "1.2.3"
num = "0.1.32"
num-bigint = "0.1.32"
num-complex = "0.1.32"
num-integer = "0.1.32"
num-iter = "0.1.32"
num-rational = "0.1.32"
num-traits = "0.1.32"
num_cpus = "0.2.13"
openssl = "0.7.14"
openssl-sys = "0.7.14"
openssl-sys-extras = "0.7.14"
openssl-verify = "0.1.0"
phf = "0.7.15"
phf_generator = "0.7.15"
phf_shared = "0.7.15"
pkg-config = "0.3.8"
png = "0.5.1"
quasi = "0.14.0"
quasi_codegen = "0.14.0"
rand = "0.3.14"
rayon = "0.4.0"
regex = "0.1.71"
regex-syntax = "0.3.3"
rust-crypto = "0.2.36"
rustc-serialize = "0.3.19"
rustc_version = "0.1.7"
semver = "0.2.3"
serde = "0.7.12"
serde_codegen = "0.7.12"
serde_codegen_internals = "0.2.0"
serde_json = "0.7.4"
shared_library = "0.1.4"
slab = "0.2.0"
solicit = "0.4.4"
strsim = "0.4.1"
syntex = "0.37.0"
syntex_errors = "0.37.0"
syntex_pos = "0.37.0"
syntex_syntax = "0.37.0"
tempdir = "0.3.4"
tempfile = "2.1.4"
term = "0.4.4"
term_size = "0.1.0"
thread-id = "2.0.0"
thread_local = "0.2.6"
time = "0.1.35"
toml = "0.1.30"
traitobject = "0.0.3"
typeable = "0.1.2"
unicase = "1.4.0"
unicode-bidi = "0.2.3"
unicode-normalization = "0.1.2"
unicode-width = "0.1.3"
unicode-xid = "0.0.3"
unsafe-any = "0.4.1"
url = "1.1.1"
user32-sys = "0.2.0"
utf8-ranges = "0.1.3"
uuid = "0.2.2"
vec_map = "0.6.0"
void = "1.0.2"
winapi = "0.2.7"
winapi-build = "0.1.1"
ws2_32-sys = "0.2.1"
xml-rs = "0.3.4"

[package]
authors = ["The Rust Playground"]
name = "playground"
version = "0.0.1"

Any strong objections to any of those? Any obvious missing pieces?

@shepmaster
Copy link
Member Author

I've switched http://play.integer32.com/ to using cargo, and there's just a single crate (rand) available at the moment.

@ArtemGr
Copy link

ArtemGr commented Jul 9, 2016

Cool!
One can test it with.

extern crate rand;
use rand::Rng;
fn main() {println! ("Hello, random world number {}!", rand::thread_rng().gen::<u32>());}

@shepmaster
Copy link
Member Author

shepmaster commented Jul 14, 2016

A small update on the times

Running build.sh which builds the stable, beta, and nightly channels with the top 100 crates in debug and release mode, plus the clippy and rustfmt containers took:

real    74m23.071s
user     0m 0.740s
sys      0m 0.497s

@shepmaster
Copy link
Member Author

shepmaster commented Jul 14, 2016

Hmm, I thought the extra time was because of all the linking, but maybe somehow just one crate isn't being built ahead of time:

# time cargo build
   Compiling tempfile v2.1.4
   Compiling playground v0.0.1 (file:///playground)

real    0m2.330s
user    0m2.040s
sys     0m0.210s

# touch src/main.rs
# time cargo build
   Compiling playground v0.0.1 (file:///playground)

real    0m0.849s
user    0m0.670s
sys     0m0.110s

What's strange is that I recall seeing this output before - it was tempfile then as well.

I've opened rust-lang/cargo#2874

@shepmaster
Copy link
Member Author

Some raw timing numbers with a single crate (rand) and then the top 100. This is compiling a program that uses rand and the numbers are those reported by Chrome's network pane:

Before

2.80
1.87
1.28
1.25
1.31
1.25
1.23
1.24
1.24
1.30

After

2.89
1.42
1.43
1.41
1.43
1.43
1.48
1.41
1.41
1.42

That's 1.477 seconds on average before and 1.573 after. Using a bunch of crates would add about 100 ms. Maybe that's acceptable.


extern crate rand;

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    println!("{}", rng.gen::<u8>());
}

@shepmaster
Copy link
Member Author

Mumble, mumble. Actually running on EC2 there's a much more pronounced speed loss. Running an arbitrary program gets me 2.75 s with crates and 1.13 s without. 😠

@shepmaster
Copy link
Member Author

Ok, maybe that's just because I exhausted my CPU credits compiling all those crates. Let's let it sit overnight and build up some buffer... 🙏

@ArtemGr
Copy link

ArtemGr commented Jul 15, 2016

Why not do a simple heuristic and use the crate toml only when there is an "extern crate " line in the source code?

When there's no "extern crate " - switch to using a different toml file, one that links no crates.

@cuviper
Copy link
Member

cuviper commented Jul 15, 2016

If you write a tool with a real parser to find crate names, then you could mark all of the crates optional in the toml, and just enable exactly those that are used.

@shepmaster
Copy link
Member Author

Why not do a simple heuristic and use the crate toml only when there is an "extern crate " line in the source code?

When there's no "extern crate " - switch to using a different toml file, one that links no crates.

If you write a tool with a real parser to find crate names, then you could mark all of the crates optional in the toml, and just enable exactly those that are used.

Yup, these are all good ideas but require far more work. The current sandbox essentially works by:

  1. The server code writes the user source file to a tempfile.
  2. That tempfile is mounted into a fresh docker container as src/main.rs.
  3. cargo run is called in that container.

Any additional processing has to go somewhere. If it lives in the server, then we have to make the container more flexible and configurable; perhaps by to writing out a Cargo.toml and mounting that in. That would mean that we'd have some amount of synchronization

If it lives in the container, then it becomes trickier to rely on any given language other than what happens to live in there by default. It might mean writing it in Perl! 😱

There is one tool that I know of that parses Rust code fairly well; the compiler! I wonder if this is the kind of thing that would be welcomed as an actual PR there. Maybe I should open an issue over there to begin discussion.

@shepmaster
Copy link
Member Author

Now that some credits have built up, we can see the difference:

Before

0.806   
0.664   
0.669   
0.658   
0.666   
0.666   
0.661   
0.680   
0.668   
0.669

After

1.07
0.974
0.994
1.04
1.07
1.04
1.01
1.01
0.977
0.976

The averages are 0.681 and 1.02 seconds, respectively. So it's definitely slower, but I feel that 1 second is a reasonable time for the improved functionality and not terrible for user response time.

@shepmaster
Copy link
Member Author

find crate names, then you could mark all of the crates optional in the toml

Oh, this also has the difficulty of reverse-mapping the dash-to-underscore transformation...

@shepmaster
Copy link
Member Author

If I create a project on my laptop, add the top 100 crates, compile them, then touch main.rs, cargo build takes:

real    0m0.674s
user    0m0.483s
sys     0m0.175s

And running cargo build a second time:

real    0m0.426s
user    0m0.312s
sys     0m0.108s

If I then remove all the crates and touch main.rs again:

real    0m0.413s
user    0m0.284s
sys     0m0.113s

And a second time:

real    0m0.127s
user    0m0.075s
sys     0m0.044s

The time to run the compiler is 286ms when there aren't any crates and 248ms when there are crates; this seems to indicate that making any changes to the compiler would not be useful to save time. It appears that the majority of the time would be inside Cargo itself, so manipulating the Cargo.toml would be the effective solution.

@shepmaster
Copy link
Member Author

Closed in #21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Something new the playground could do
Projects
None yet
Development

No branches or pull requests

5 participants