Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error messages in abstractarray.jl #11797

Merged
merged 1 commit into from
Jul 14, 2015

Conversation

ScottPJones
Copy link
Contributor

This change improves the error messages for AbstractArray, and also (until one of the genius Julians fixes it, apparently quite soon) improves the coverage information.

@ScottPJones ScottPJones changed the title Address #11792 in abstractarray.jl Improve error messages in abstractarray.jl Jun 21, 2015
@ScottPJones
Copy link
Contributor Author

@mbauman @carnaval @timholy I know you weren't fans of #11792 (which I didn't like either, but felt driven to it by a desire for better coverage information), could you please look at this PR on it's own merits?
@kshyatt Imitation is the greatest form of flattery... You've done so much to improve error messages and coverage, I had thought I'd try my hand at it as well. Does this look good to you?

Also, I wanted to also see about adding more testing, because the coverage results I've been getting have been rather dismal. I'm not sure if it is because simply more tests are needed, or because the instructions for getting accurate coverage information require deleting sys.dylib, but AbstractArray lives in inference.dylib/.ji.

If you wonder why I picked abstractarray.jl, I think it is a great piece of enabling infrastructure.

@mbauman
Copy link
Member

mbauman commented Jun 21, 2015

I agree some of those lines are ridiculously long (e.g., /~https://github.com/JuliaLang/julia/pull/11797/files#diff-2264bb51acec4e7e2219a3cb1c733651L65), and restructuring is a definite gain in readability. And I like that you've added arguments to some BoundsError()s.

But:

I really don't like that there is a change in semantics in some places. (x | y | z) && error() is different from if x; error(); end; if y; error(); end; if z; error(); end;. This file is very performance sensitive, and any little change like that could result in a difference between getting inlined or not (which is huge). Or it could simply change the performance slightly. Either way, I really don't like doing this blindly.

I don't like all the replacements from throw(BoundsError(…)) to throw_boundserror, assuming they were done blindly. Just like I hope we'll eventually get multi-line code coverage up and running, it also looks hopeful that there will be a solution to the GC frame issues in throw. So I really don't want to be using throw_boundserror more than we need to, and I especially don't want to go blindly replacing their usages without profiling or closer inspection. There could easily be other things in the functions that are causing a GC frame in any case, which means that the change isn't necessary.

I also find some of these rearrangements much less readable. checkbounds(…) && checkbounds(…) || error() is a very common idiom throughout this file. I find it's very nice to be able to quickly identify a guard line at the top of a function and see that it's checking bounds. I find the alternative much more taxing — my mind takes a few seconds to parse out what case is getting selected by !(x && y):

if !(checkbounds(…) && checkbounds(…))
   error()
end

There are other very short lines that were very readable before and now are harder for me to parse. I'm sure I would get used to it eventually, but if we're judging this independently from code coverage changes I don't see the purpose.

Finally, I frankly don't like spending time on this. I know you want everything in Julia to be done in the best way possible, but my time here is donated. Discussions like this are taxing and, to my mind, relatively unproductive. There are much better uses of all of our time than surface cosmetics and vetting blind changes.

@ScottPJones
Copy link
Contributor Author

The splitting up of some of the tests was in order to be able to get good error messages.
Almost all of the BoundsError before did not give any arguments at all.
The throw_boundserror change was simply because I felt it was better to be safe than sorry... Is there downside to using throw_boundserror? If there really will be a solution soon, then no problem, I'll put the back.
About the short lines converted to ifs, that's just from what a number of people have told me, that they find if easier to read... you are the first who's come out to say they preferred it the short way.
I hadn't seen any objections to @kshyatt's changes that did exactly that, so I'm surprised that I get them,
for exactly the same type of changes.
In particular, the case with checkbounds you mentioned is exactly the sort of case that seems very difficult to parse with just the short circuit idiom (when there are more than one short-circuit operator,
I'd always prefer the if, for just a single one, if the condition is short, I've accustomed myself to the idiom)
I'm sorry, but I didn't think this was an issue of surface cosmetics... the only reason I looked at this was because 1) AbstractArray seems like something very fundamental, which deserves full coverage testing. 2) I wanted to improve the error messages, which I think is useful.
I can put the ifs back on one line, since it looks like the coverage issue will be fixed.
I'm very sorry that you think this is unproductive. I'm sorry I asked you to look at it.

@mbauman
Copy link
Member

mbauman commented Jun 21, 2015

Perhaps I misunderstood your request to view this PR on its own merits. And I'm quite certain I'm having trouble distancing this from #11792.

When I looked through the diff, I saw lots of surface syntax changes, changes in semantics, and what looked to be blind usage of a quirky performance trick. Just like I said in #11792, if this is accompanied by an active push to improve testing coverage, I would definitely view this in a different light (I don't mean the same PR, just ongoing work). Or if this were just the improvements in error messages, again, I'd view that differently.

@ScottPJones
Copy link
Contributor Author

I've been running the coverage tests on this, since I started (with the problem of the results may not be good because this is part of inference.dylib and I can't delete that like sys.dylib), and I wanted to start with this because this seems like such an important piece of work. I take all your points about not wanting stuff that might adversely affect performance, and if the GC issue is going to be resolved soon, like the coverage issue, then it just boils down to adding more information to the errors, adding more tests,
and seeing that they show up on the coverage... For me, it was a way of trying to grok some interesting work, and the last thing I wanted to do was to waste any of your time... on the contrary, I'd hoped that it would save you some time checking coverage and adding tests... I left the 3 line ifs, simply because of people saying they preferred it that way, and not seeing any objections to Kate's nice work. I'll take that out if you prefer (and definitely back out the throw_boundserror and semantic changes... I wonder though, if it were done with the same check as before, but then separate checks it that is true, to display different bounds error information, would be OK? Could that affect performance adversely?
Remember, I opened #11792 only because people had told me that line coverage was all that could be done, but that's rapidly been proven incorrect, much to my pleasure.

@ScottPJones
Copy link
Contributor Author

This now only has the improved error messages, and a couple of lines with added whitespace where it was over the 92 character limit.

@ScottPJones
Copy link
Contributor Author

Bump: @mbauman just wondering if you'd have a chance to look at this, thanks!

@mbauman
Copy link
Member

mbauman commented Jul 7, 2015

Yes, this looks much better. I've been meaning to get to running a few cursory performance checks, but I think this should be good to go once I get the chance. Sorry I haven't gotten to it yet.

throw(ArgumentError("source and destination must have same size"))
length(ir_dest) != length(ir_src) &&
throw(ArgumentError("source ($(length(ir_src))) and destination ($(length(ir_dest))) must have same size"))
throw(ArgumentError("source ($(length(jr_src))) and destination ($(length(jr_dest))) must have same size"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message seems wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What seems wrong to you? It is the same as the comparision, in this case, ir_dest compared to ir_src, and jr_dest compared to jr_src.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the error message would be sth like

ArgumentError: source (2) and destination (3) must have same size

but 2 and 3 are not the source or destination.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a common pattern is to write source and destination must have the same size (got X and Y)

@ScottPJones
Copy link
Contributor Author

NP! Thanks!

@@ -676,7 +696,7 @@ function hcat{T}(A::AbstractVecOrMat{T}...)
for j = 1:nargs
Aj = A[j]
if size(Aj, 1) != nrows
throw(ArgumentError("number of rows must match"))
throw(ArgumentError("number of rows of A[$j] ($(size(Aj,1))) must match $nrows"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to avoid the A in the error message since it doesn't mean anything to the user of the function. Printing the object is probably too much, just print out the index of the argument is probably ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I think you can say number of rows of arrays must match (got $(map(x->size(x, 1), A))).

@nalimilan
Copy link
Member

Great idea!

@ScottPJones
Copy link
Contributor Author

I hope I've addressed all (but one) of the great review comments, thanks very much @nalimilan and @yuyichao for your time!
The only thing that I don't know how to completely address is the message for hvcat(nbc::Integer, as...).
I tried to improve it along the lines of other review comments, but I don't know how to display all the lengths (and it doesn't seem to be checking that, at least as far as I can tell).

@@ -62,7 +62,11 @@ ndims{T<:AbstractArray}(::Type{T}) = ndims(super(T))
length(t::AbstractArray) = prod(size(t))::Int
endof(a::AbstractArray) = length(a)
first(a::AbstractArray) = a[1]
first(a) = (s = start(a); done(a, s) ? throw(ArgumentError("collection must be non-empty")) : next(a, s)[1])
function first(arr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather pedantic, I know, but I'd prefer the argument here to remain as a and not arr. This definition isn't just for arrays — it's defined for all iterables. Naming it arr strongly implies that it's just for arrays. If you really want to steer away from the one-letter name, itr would be better.

Similarly, s is commonly used as the iterable's state throughout Julia. str reads as string to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that'd be great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fixed now, you're right.
To me, the a also strongly implied it was for arrays, so I do think itr is clearer,
and given that I've spent most all of my time looking at string handle code, where s reads as string and not state, you can see why I like a little clearer names.
(I know you know exactly what you meant, but the idea is to make things clearer for somebody new to this part of the code)

end
st = start(src)
for j = 1:(soffs-1)
done(src, st) && throw(BoundsError())
done(src, st) && throw(BoundsError(src, j))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure about this. In this method, src is a general iterable and not an abstract array (there are later specializations for abstract array sources). Are there other places where we use bounds errors for iterables (that aren't arrays) terminating earlier than expected? If not, a more accurate message might be something along the lines of: throw(ArgumentError("source has fewer elements than required, expected at least $soffs, got $(j-1)")).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll fix those up after lunch! Thanks!

@@ -690,7 +710,7 @@ function hcat{T}(A::AbstractVecOrMat{T}...)
for j = 1:nargs
Aj = A[j]
if size(Aj, 1) != nrows
throw(ArgumentError("number of rows must match"))
throw(ArgumentError("number of rows of arrays must match (got $(map(x->size(x,1), A)))"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "number of rows in each array must match (…)"?

@mbauman
Copy link
Member

mbauman commented Jul 9, 2015

Besides a few very minor comments, this looks good. No perf regressions that I can measure, either. Thanks, Scott!

@ScottPJones ScottPJones force-pushed the spj/cover1 branch 2 times, most recently from 6442a56 to e833fb9 Compare July 10, 2015 13:18
@ScottPJones
Copy link
Contributor Author

@mbauman All done (took a while to handle the breakage in the tests from returning ArgumentError instead of BoundsError, esp. doing it the way @tkelman wanted to check that it was the specific error thrown) (I hope the way I did it is OK, I couldn't figure out anything better yet)

@mbauman
Copy link
Member

mbauman commented Jul 10, 2015

The error type changing based upon the type of the source does seem unfortunate, especially in the context of that test file. It is technically more accurate, but I wonder if it'll just be a headache for folks to handle (e.g., if you know that the source might not have enough elements and want to catch the error, this will make your life very difficult).

throw(BoundsError())
dmax > length(dest) && throw(BoundsError(dest, dmax))
doffs < 1 && throw(BoundsError(dest, doffs))
throw(BoundsError(src, soffs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your tests will be a little more sane if this also throws an ArgumentError. But I'm not totally convinced that this is the way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be consistent. Either we should change this to be an ArgumentError here, too, or we should revert the other back to a BoundsError. I don't really have strong feelings either way, but we gotta do one or the other.

@ScottPJones
Copy link
Contributor Author

Ok, this looks clean now, thanks so much for all your time reviewing!!!

@ScottPJones
Copy link
Contributor Author

I've rebased this (because of the other tests being added for abstractarrays), so now it can again me merged. Thanks again for the reviewing!

@mbauman
Copy link
Member

mbauman commented Jul 13, 2015

We still need to resolve #11797 (comment) before this can be merged.

@ScottPJones
Copy link
Contributor Author

Ah, sorry, I missed that one (away disconnecting with family this weekend!).
I think that the ArgumentError is more general, I'll change it that way, and fix the tests to match.

@ScottPJones
Copy link
Contributor Author

I've been digging this, and unfortunately, it's a bit of a rathole. The copy! functions in general can return either BoundsError or ArgumentError (and do so currently, that's not new with this PR).
BoundsError is unfortunately rather limited when it comes to dealing with returning a message for an iterable, which is why I preferred ArgumentError.
Any particular consideration for which ones I should change?
Thanks, Scott.

@mbauman
Copy link
Member

mbauman commented Jul 13, 2015

It's not that bad, is it? I think it's just lines 242 and 276. All the other bounds errors are called on an AbstractArray (dest), which is actually indexable. I don't mind having the function throw two different types of errors, but those pseudo-out-of-bounds errors from the iterable should be the same type. The point is that in your tests you shouldn't need to change error types depending upon the index value (instead just test if the source is an abstract array).

@ScottPJones
Copy link
Contributor Author

Ok, thanks, I'll take care of that when I get home

@ScottPJones
Copy link
Contributor Author

All updated, rebased, tests simpler (and passed), anything else?

@mbauman
Copy link
Member

mbauman commented Jul 14, 2015

LGTM. Squash it, and then this should be good to merge.

Updated messages based on review

Update variable names per Matt

Updated tests to handle ArgumentError instead of BoundsError when passed iterable

More specific exceptions for test_throws

Fix dangling right parenthesis

Update non-abstract array error messages and tests
@ScottPJones
Copy link
Contributor Author

Done!

mbauman added a commit that referenced this pull request Jul 14, 2015
Improve error messages in abstractarray.jl
@mbauman mbauman merged commit cecee20 into JuliaLang:master Jul 14, 2015
@ScottPJones ScottPJones deleted the spj/cover1 branch July 14, 2015 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants