Support a whitelist of crates ... somehow #4

shepmaster · 2016-06-28T21:05:10Z

No description provided.

tbu- · 2016-06-29T00:00:06Z

Why just a whitelist?

shepmaster · 2016-06-29T04:56:00Z

@tbu- mostly centering around the fact that a crate of unknown provenance can do just about anything crazy in a build script. Thinking about it now though, anything a malicious user could do in a build script, they could do in the main code... perhaps whatever sandboxing we have would just need to protect us in both cases...

ArtemGr · 2016-06-29T11:25:59Z

I thought the whitelisting was for performance reasons. One could build a Docker image with all the whitelisted crates already built, so using these crates would just link them in, not triggering a rebuild.

Whitelist also saves us the trouble of generating a new Cargo.toml with just the right crates. Instead we might reuse a single Cargo.toml containing the whitelisted dependencies.

On the other hand, it might be cool to be able to use just about any crate in the playground instead of just the whitelisted.

shepmaster · 2016-06-30T04:42:55Z

I thought the whitelisting was for performance reasons

Oh, right, thank you for reminding me. Additionally, at compilation / program run time, we don't have network access, so that's kind of a non-starter.

My hope is to generate a list of the top-N most popular crates, then add them to a Cargo.toml as well as all of the dependencies (if they have to be installed, we might as well allow people to use them).

Any pointers or suggestions on how to generate such a list would be appreciated!

jess-sol · 2016-06-30T20:58:58Z

I definitely think this needs to happen. Crates aren't just nice-to-haves, a lot of critical functionality is put into crates, like JSON encoding/decoding, URL manipulation, logging, etc. Even without an internet connection these packages are sometimes required to reproduce issues, not having access to at least some external crates also makes demonstrating things (like JSON decoding) impossible.

shepmaster · 2016-07-04T00:35:43Z

Compiling the top 100 crates takes about 35 minutes (debug and release modes). Doing that for stable/beta/nightly is going to eat up a huge chunk of the time that EC2 hands out for free...

ArtemGr · 2016-07-04T00:37:43Z

How about stable? Most examples out there are tailored to run on stable anyway.

shepmaster · 2016-07-04T00:40:43Z

@ArtemGr do you mean only supporting crates in the stable channel? I'd expect to get a lot of pushback because all the "cool" things happen in nightly (especially things like serde and friends)

ArtemGr · 2016-07-04T00:47:56Z

That kind of pushback might be solved by supporting the serde_codegen (build.rs) in playground.

Just my two cents. I imagine all kind of setups are possible. What I'm saying is that supporting crates on a stable channel is still better than not being able to support them at all.

shepmaster · 2016-07-09T13:56:40Z

Thinking about it:

Stable releases are every 6 weeks
There were 4 betas for 1.10.0
Nightly builds aren't actually nightly, but closer to every 2 or 3 days.
EC2 free tier offers "enough hours to run continuously each month"

If we measure conservatively and say that there is one stable release every 6 weeks, 1 beta release a week, and a nightly build every other day, then it really won't be using too much CPU.

shepmaster · 2016-07-09T13:58:35Z

What I'm a bit more worried about is user experience time now. With the 100+ top crates, compiling "hello world" takes about 2 seconds (presumably due to the linking overhead), which is just about double the 1 second of calling rustc with no crates.

It's possible we may want to offer both modes, with crates downplayed a bit, in order to give good impressions to first-time users.

shepmaster · 2016-07-09T15:46:29Z

For reference, here are the top 100 crates:

[dependencies]
advapi32-sys = "0.2.0"
aho-corasick = "0.5.2"
ansi_term = "0.7.5"
aster = "0.20.0"
bitflags = "0.7.0"
byteorder = "0.5.3"
cfg-if = "0.1.0"
chrono = "0.2.22"
clap = "2.9.2"
color_quant = "1.0.0"
cookie = "0.2.5"
crossbeam = "0.2.9"
deque = "0.3.1"
docopt = "0.6.81"
enum_primitive = "0.1.0"
env_logger = "0.3.3"
flate2 = "0.2.14"
gcc = "0.3.28"
gdi32-sys = "0.2.0"
getopts = "0.2.14"
gif = "0.9.0"
gl_generator = "0.5.2"
glob = "0.2.11"
hpack = "0.3.0"
httparse = "1.1.2"
hyper = "0.9.9"
idna = "0.1.0"
image = "0.10.1"
inflate = "0.1.1"
itertools = "0.4.16"
itoa = "0.1.1"
jpeg-decoder = "0.1.5"
kernel32-sys = "0.2.2"
khronos_api = "1.0.0"
language-tags = "0.2.2"
lazy_static = "0.2.1"
libc = "0.2.13"
libz-sys = "1.0.4"
log = "0.3.6"
lzw = "0.10.0"
matches = "0.1.2"
memchr = "0.1.11"
mime = "0.2.1"
miniz-sys = "0.1.7"
net2 = "0.2.24"
nix = "0.6.0"
nom = "1.2.3"
num = "0.1.32"
num-bigint = "0.1.32"
num-complex = "0.1.32"
num-integer = "0.1.32"
num-iter = "0.1.32"
num-rational = "0.1.32"
num-traits = "0.1.32"
num_cpus = "0.2.13"
openssl = "0.7.14"
openssl-sys = "0.7.14"
openssl-sys-extras = "0.7.14"
openssl-verify = "0.1.0"
phf = "0.7.15"
phf_generator = "0.7.15"
phf_shared = "0.7.15"
pkg-config = "0.3.8"
png = "0.5.1"
quasi = "0.14.0"
quasi_codegen = "0.14.0"
rand = "0.3.14"
rayon = "0.4.0"
regex = "0.1.71"
regex-syntax = "0.3.3"
rust-crypto = "0.2.36"
rustc-serialize = "0.3.19"
rustc_version = "0.1.7"
semver = "0.2.3"
serde = "0.7.12"
serde_codegen = "0.7.12"
serde_codegen_internals = "0.2.0"
serde_json = "0.7.4"
shared_library = "0.1.4"
slab = "0.2.0"
solicit = "0.4.4"
strsim = "0.4.1"
syntex = "0.37.0"
syntex_errors = "0.37.0"
syntex_pos = "0.37.0"
syntex_syntax = "0.37.0"
tempdir = "0.3.4"
tempfile = "2.1.4"
term = "0.4.4"
term_size = "0.1.0"
thread-id = "2.0.0"
thread_local = "0.2.6"
time = "0.1.35"
toml = "0.1.30"
traitobject = "0.0.3"
typeable = "0.1.2"
unicase = "1.4.0"
unicode-bidi = "0.2.3"
unicode-normalization = "0.1.2"
unicode-width = "0.1.3"
unicode-xid = "0.0.3"
unsafe-any = "0.4.1"
url = "1.1.1"
user32-sys = "0.2.0"
utf8-ranges = "0.1.3"
uuid = "0.2.2"
vec_map = "0.6.0"
void = "1.0.2"
winapi = "0.2.7"
winapi-build = "0.1.1"
ws2_32-sys = "0.2.1"
xml-rs = "0.3.4"

[package]
authors = ["The Rust Playground"]
name = "playground"
version = "0.0.1"

Any strong objections to any of those? Any obvious missing pieces?

shepmaster · 2016-07-09T21:24:38Z

I've switched http://play.integer32.com/ to using cargo, and there's just a single crate (rand) available at the moment.

ArtemGr · 2016-07-09T22:41:17Z

Cool!
One can test it with.

extern crate rand;
use rand::Rng;
fn main() {println! ("Hello, random world number {}!", rand::thread_rng().gen::<u32>());}

shepmaster · 2016-07-14T12:03:53Z

A small update on the times

Running build.sh which builds the stable, beta, and nightly channels with the top 100 crates in debug and release mode, plus the clippy and rustfmt containers took:

real    74m23.071s
user     0m 0.740s
sys      0m 0.497s

shepmaster · 2016-07-14T12:48:13Z

Hmm, I thought the extra time was because of all the linking, but maybe somehow just one crate isn't being built ahead of time:

# time cargo build
   Compiling tempfile v2.1.4
   Compiling playground v0.0.1 (file:///playground)

real    0m2.330s
user    0m2.040s
sys     0m0.210s

# touch src/main.rs
# time cargo build
   Compiling playground v0.0.1 (file:///playground)

real    0m0.849s
user    0m0.670s
sys     0m0.110s

What's strange is that I recall seeing this output before - it was tempfile then as well.

I've opened rust-lang/cargo#2874

shepmaster · 2016-07-14T16:09:56Z

Some raw timing numbers with a single crate (rand) and then the top 100. This is compiling a program that uses rand and the numbers are those reported by Chrome's network pane:

Before

2.80
1.87
1.28
1.25
1.31
1.25
1.23
1.24
1.24
1.30

After

2.89
1.42
1.43
1.41
1.43
1.43
1.48
1.41
1.41
1.42

That's 1.477 seconds on average before and 1.573 after. Using a bunch of crates would add about 100 ms. Maybe that's acceptable.

extern crate rand;

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    println!("{}", rng.gen::<u8>());
}

shepmaster · 2016-07-15T03:02:04Z

Mumble, mumble. Actually running on EC2 there's a much more pronounced speed loss. Running an arbitrary program gets me 2.75 s with crates and 1.13 s without. 😠

shepmaster · 2016-07-15T03:20:47Z

Ok, maybe that's just because I exhausted my CPU credits compiling all those crates. Let's let it sit overnight and build up some buffer... 🙏

ArtemGr · 2016-07-15T04:56:04Z

Why not do a simple heuristic and use the crate toml only when there is an "extern crate " line in the source code?

When there's no "extern crate " - switch to using a different toml file, one that links no crates.

cuviper · 2016-07-15T05:16:02Z

If you write a tool with a real parser to find crate names, then you could mark all of the crates optional in the toml, and just enable exactly those that are used.

shepmaster · 2016-07-15T12:51:01Z

Why not do a simple heuristic and use the crate toml only when there is an "extern crate " line in the source code?

When there's no "extern crate " - switch to using a different toml file, one that links no crates.

If you write a tool with a real parser to find crate names, then you could mark all of the crates optional in the toml, and just enable exactly those that are used.

Yup, these are all good ideas but require far more work. The current sandbox essentially works by:

The server code writes the user source file to a tempfile.
That tempfile is mounted into a fresh docker container as src/main.rs.
cargo run is called in that container.

Any additional processing has to go somewhere. If it lives in the server, then we have to make the container more flexible and configurable; perhaps by to writing out a Cargo.toml and mounting that in. That would mean that we'd have some amount of synchronization

If it lives in the container, then it becomes trickier to rely on any given language other than what happens to live in there by default. It might mean writing it in Perl! 😱

There is one tool that I know of that parses Rust code fairly well; the compiler! I wonder if this is the kind of thing that would be welcomed as an actual PR there. Maybe I should open an issue over there to begin discussion.

shepmaster · 2016-07-15T12:54:49Z

Now that some credits have built up, we can see the difference:

Before

After

The averages are 0.681 and 1.02 seconds, respectively. So it's definitely slower, but I feel that 1 second is a reasonable time for the improved functionality and not terrible for user response time.

shepmaster · 2016-07-15T13:47:30Z

find crate names, then you could mark all of the crates optional in the toml

Oh, this also has the difficulty of reverse-mapping the dash-to-underscore transformation...

shepmaster · 2016-07-15T13:56:50Z

If I create a project on my laptop, add the top 100 crates, compile them, then touch main.rs, cargo build takes:

real    0m0.674s
user    0m0.483s
sys     0m0.175s

And running cargo build a second time:

real    0m0.426s
user    0m0.312s
sys     0m0.108s

If I then remove all the crates and touch main.rs again:

real    0m0.413s
user    0m0.284s
sys     0m0.113s

And a second time:

real    0m0.127s
user    0m0.075s
sys     0m0.044s

The time to run the compiler is 286ms when there aren't any crates and 248ms when there are crates; this seems to indicate that making any changes to the compiler would not be useful to save time. It appears that the majority of the time would be inside Cargo itself, so manipulating the Cargo.toml would be the effective solution.

shepmaster · 2016-07-20T02:57:24Z

Closed in #21

shepmaster added the enhancement Something new the playground could do label Jun 28, 2016

shepmaster closed this as completed Jul 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support a whitelist of crates ... somehow #4

Support a whitelist of crates ... somehow #4

shepmaster commented Jun 28, 2016

tbu- commented Jun 29, 2016

shepmaster commented Jun 29, 2016

ArtemGr commented Jun 29, 2016

shepmaster commented Jun 30, 2016

jess-sol commented Jun 30, 2016

shepmaster commented Jul 4, 2016

ArtemGr commented Jul 4, 2016

shepmaster commented Jul 4, 2016

ArtemGr commented Jul 4, 2016

shepmaster commented Jul 9, 2016

shepmaster commented Jul 9, 2016 •

edited

Loading

shepmaster commented Jul 9, 2016

shepmaster commented Jul 9, 2016

ArtemGr commented Jul 9, 2016

shepmaster commented Jul 14, 2016 •

edited

Loading

shepmaster commented Jul 14, 2016 •

edited

Loading

shepmaster commented Jul 14, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

ArtemGr commented Jul 15, 2016

cuviper commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 20, 2016

Support a whitelist of crates ... somehow #4

Support a whitelist of crates ... somehow #4

Comments

shepmaster commented Jun 28, 2016

tbu- commented Jun 29, 2016

shepmaster commented Jun 29, 2016

ArtemGr commented Jun 29, 2016

shepmaster commented Jun 30, 2016

jess-sol commented Jun 30, 2016

shepmaster commented Jul 4, 2016

ArtemGr commented Jul 4, 2016

shepmaster commented Jul 4, 2016

ArtemGr commented Jul 4, 2016

shepmaster commented Jul 9, 2016

shepmaster commented Jul 9, 2016 • edited Loading

shepmaster commented Jul 9, 2016

shepmaster commented Jul 9, 2016

ArtemGr commented Jul 9, 2016

shepmaster commented Jul 14, 2016 • edited Loading

shepmaster commented Jul 14, 2016 • edited Loading

shepmaster commented Jul 14, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

ArtemGr commented Jul 15, 2016

cuviper commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 15, 2016

shepmaster commented Jul 20, 2016

shepmaster commented Jul 9, 2016 •

edited

Loading

shepmaster commented Jul 14, 2016 •

edited

Loading

shepmaster commented Jul 14, 2016 •

edited

Loading