-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use generic java.nio.files interfaces in FileSystemStorage #21
Comments
Is this enhancement being worked on, or is it up for grabs? |
Another advantage of using |
@MorrisLaw , Working with files is completely isolated inside Storage, so it should not be hard to write an alternative version, making it a good candidate for contributing |
Also see #9 and the last comment about jimfs, might be worth looking into |
@MorrisLaw I wrote this ticket, but didn't start anything yet. So from what I can see, it's indeed up for grabs. While Additionally, the download itself can use java.nio as well to take the full power of multithreading. |
This is a very good point. While network, message encoders/decoders and storage already use NIO channels and buffers, between those the data is transferred as a binary array (encapsulated in This can be done in parallel with #8 (buffering several subsequent blocks before flushing them to disk) |
@ogregoire @atomashpolskiy Ok guys, I'm going to attempt to tackle this one. |
@ogregoire @atomashpolskiy Just updating everyone on the progress. I'm running into issues with the buildClient(int port) class in examples and tests are failing. Here are some of the changes so far. Any input on the branch I'm working on is also welcome. /~https://github.com/MorrisLaw/bt/tree/performance/file-storage |
@MorrisLaw , I've looked through your changes. I appreciate your effort, but there is a couple of issues with your code. I understand that you're a beginner, so it's fine. Let me give you some tips:
// bt/cli/CliClient.java:119
Files.getFileStore((Path) options); // => java.lang.ClassCastException: bt.cli.Options cannot be cast to java.nio.file.Path You might want to skim through the corresponding JLS Chapter 5. Conversions and Promotions, specifically section 5.1.6. Narrowing Reference Conversion. Keep in mind that in most cases casting is bad and probably means that you're doing something wrong.
// bt/data/file/FileSystemStorageUnit.java:53
if (Files.createFile(file) == null) { // => always false
...
} If you look inside these methods, you'll see that they always return the first argument (
// bt/data/file/FileSystemStorageUnit.java:185
public long size() {
...
return (long)Files.size(file); // => casting is redundant, Files.size(Path) already returns `long`
...
}
/*
* @since 1.0
*/
// bt/data/file/FileSystemStorage.java:62
public FileSystemStorage(File rootDirectory) { // => can't just change the parameter type to java.nio.file.Path
...
} This one is especially bad: // bt/data/StorageUnit.java:42
ByteBuffer readBlock(long offset, int length); // => wrapping a byte array into buffer does not make much sense Of course the API is not limited to methods only; it includes class and package names, thrown exceptions, behavior that is promised in the javadoc, even side effects. So in most cases it's preferable to only add new API without changing/removing the old one (if old API is obsolete, it should be marked as deprecated and noticed in upgrade notes, so that the client code can migrate safely before it's finally removed).
To sum up: I suggest that you discard your changes and start over, rebasing onto the latest commit in master. Keeping the above comments in mind, you need to do just the following:
|
Thank you for the thorough response !
…On Aug 5, 2017 8:50 AM, "Andrei Tomashpolskiy" ***@***.***> wrote:
@MorrisLaw </~https://github.com/morrislaw> , I've looked through your
changes. I appreciate your effort, but there is a couple of issues with
your code. I understand that you're a beginner, so it's fine. Let me give
you some tips:
1. You can't cast an object reference to some arbitrary type, just to
the same type or a supertype (plus a few more specific cases), otherwise
you'll get a ClassCastException in runtime. E.g.
// bt/cli/CliClient.java:119Files.getFileStore((Path) options); // => java.lang.ClassCastException: bt.cli.Options cannot be cast to java.nio.file.Path
You might want to skim through the corresponding JLS Chapter 5.
Conversions and Promotions
<https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html>,
specifically section *5.1.6. Narrowing Reference Conversion*.
*Keep in mind that in most cases casting is bad and probably means that
you're doing something wrong.*
1. Some nullity checks in FileSystemStorageUnit don't make sense, e.g.
// bt/data/file/FileSystemStorageUnit.java:53if (Files.createFile(file) == null) { // => always false
...
}
If you look inside these methods, you'll see that they always return the
first argument (Path to file) or throw an exception.
1. Another issue with casting:
// bt/data/file/FileSystemStorageUnit.java:185public long size() {
...
return (long)Files.size(file); // => casting is redundant, Files.size(Path) already returns `long`
...
}
1. As a general rule you shouldn't (and per this issue don't need to)
change public API. Some client code may depend on certain
methods/constructors having specific signatures, and changing the signature
means breaking the client code, e.g.
/* * @SInCE 1.0 */
// bt/data/file/FileSystemStorage.java:62
public FileSystemStorage(File rootDirectory) { // => can't just change the parameter type to java.nio.file.Path
...
}
This one is especially bad:
// bt/data/StorageUnit.java:42ByteBuffer readBlock(long offset, int length); // => wrapping a byte array into buffer does not make much sense
Of course the API is not limited to methods only; it includes class and
package names, thrown exceptions, behavior that is promised in the javadoc,
even side effects. So in *most* cases it's preferable to only *add* new
API without changing/removing the old one (if old API is obsolete, it
should be marked as deprecated and noticed in upgrade notes, so that the
client code can migrate safely before it's finally removed).
1. dependency-reduced-pom.xml is generated by maven-shade-plugin and
should not be checked into git repository. It's my fault that I haven't
added the corresponding rule to .gitignore, I've updated it in add949a
<add949a>
.
*To sum up*: I suggest that you discard your changes and start over,
rebasing onto the latest commit in master. Keeping the above comments in
mind, you need to do just the following:
- add a new constructor FileSystemStorage(Path). Old constructor
FileSystemStorage(File) should delegate object creation to the new
constructor by calling this(rootDirectory.toPath())
- update FileSystemStorageUnit internals to use java.nio.files
interfaces
- verify that all tests pass
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21 (comment)>,
or mute the thread
</~https://github.com/notifications/unsubscribe-auth/AG0LT5QX5ZgaiYiDpxzSYCTp6zPNJThCks5sVGT7gaJpZM4Orqid>
.
|
By the way, it turns out |
There is no need to use Here's how to use nio's RAF:
You can get a Finally regarding the interface(s), indeed, you need to keep the existing, but you can create another method next to it with a
Will become:
Then, of course, you need to rewrite all the code that use If needed, I can work on this issue, but not before next week. Ideally the only |
We do want to get rid of raf.seek(offset);
raf.getChannel().write(buffer);
Valid point, but we were discussing constructors ¯_(ツ)_/¯ |
Yeah, but I was mistaken (it's been a long time I worked with io in general), there is no need for a |
@ogregoire , makes sense now, but this behavior is not always beneficial, e.g. when streaming media I want the data to be written to disk immediately. So it should be explicitly configurable by the user (either by some flag or better by instantiating a different implementation class). Common parts can be extracted into an abstract supertype |
You can simply use |
I get that but my point was that whether flush is eventual or immediate should be conditional and based on user's preference and/or current mode of operation. There is also a problem of how much memory can/should be used. We can't just map all data into memory, so there should be some centralized management of what is loaded and what is discarded at each given moment. Initially I was thinking about buffering data at the I/O worker level, which has (indirect) access to the storage and can decorate it as needed. Additionally it has the statistics of when/how often each piece is requested by peers, so it can act like a full-fledged cache, re-using the same mapped buffers. |
I guess there must be some misunderstanding here: it's entirely possible to have a storage that works as you expect using nio and channels and mappedbytebuffer. The thing is that currently you always go through a byte[], which is far from efficient. The idea is to get rid of this temporary buffer by using the buffers we get from the MappedByteBuffer. Regarding memory, it's a mappedbytebuffer, so it's memory that's directly mapped to the drive. No system calls. I don't see why it should be a user-preference: the user will want it to be written to disk, according to the best way to handle the disk. This is what is done by default. Know what? I'll fork, put my changes in, and show them so if you like that, let's merge, if not, maybe use it as a base? But as I said, I can't start before Monday at the earliest. Oh, and I can't guarantee I'll be able to answer anymore today or tomorrow. |
@ogregoire If you already have a solution maybe you can complete the ticket Monday. Otherwise as far as @atomashpolskiy original suggestion, itseemed like something I can achieve sometime this weekend. But reading through these comments, it looks like it's going to be more involved. Is this still the case: Or is there a different issue or just different opinions on how to fix the same issue ? |
There might be a misunderstanding, because
I'm not sure where this comes from; there's a
I completely understand your idea. I'm just trying to argue, that:
I might be communicating not clearly enough though, excuse me if so, because English is not my first language :)
Absolutely. Whatever we do with regards to intermediate buffering will be built upon the initial refactoring of |
Wow, this whole topic is a hell of a lot of words. I personally am going to go and read something about mapped memory, because it might be that I just misunderstand the concept and how it works. Cheers |
Here's another crack at the enhancement. Is this acceptable ? |
Actually some tests failed, so I'm looking into that now. I only tested bt-core at first, then when running all of them it failed the build.
|
Ahaha nvm, it looks like I ran
|
Still working on errors, it appears that I'm failing the CI tests. |
@MorrisLaw Just a small note from a very superficial review: you used |
Thank you for spotting that. I will refactor the code.
…On Aug 6, 2017 6:55 PM, "Olivier Grégoire" ***@***.***> wrote:
@MorrisLaw </~https://github.com/morrislaw> Just a small note from a very
superficial review: you used Files.createDirectory(parent) where you
should actually use Files.createDirectories(parent). The difference
between the two is that if you call the first method with a/b/c when only
a exists, you'll get an error because b doesn't exist, where with the
second method, you'll get c and all its missing parents created without
troubles.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21 (comment)>,
or mute the thread
</~https://github.com/notifications/unsubscribe-auth/AG0LT8HQVr65b8AtWe4AZMzidAnn5w3aks5sVkRdgaJpZM4Orqid>
.
|
Okay, I reviewed the code, and added several comments (see list with a comment icon). I still do think it's better to use |
@ogregoire , I'm genuinely interested, what you have in mind, and I'm all for experimentation, as long as it's not the default behavior. |
Well, basically there's currently a design flaw. It's a subtle one, so what you did was actually good, but in case of using So I would suggest using such an interface (comments removed for brievity):
Obviously I deleted the Okay, now why in the world would I make the read operation return a Well, if we keep the same The important part of
This means that if we use the byte buffers that we get from Concretely, this means that we will provide directly the mapped file-region to the network buffer. Meaning that we skip Java's heap entirely by not copying |
Ok, we definitely need more details here. As already mentioned in this topic, FileChannel class has a map() method, that is full of black magic and trickery (e.g. trying to gracefully handle OutOfMemory errors by asking JVM to perform garbage collection to trigger deallocation of previously allocated native memory buffers... you get the idea), but really boils down to a native method and mmap system call. What The The actual type of the returned object is Considering Java, there are two peculiarities to keep in mind when working with
Here are the implications:
Therefore, while it may be beneficial to naively memory-map all files in some cases, more specifically:
I don't think it's as simple as you want it to be in general case. Brief write-up on copy vs mmap by Linus Torvalds. |
With regards to this:
You are being overly optimistic about the code :) There are several layers between storage and network, and some of them currently rely on passing each block as |
I'm aware of that and I'm not saying the change is trivial or even simple. What I highlighted is only the root for all work. You have several concerns and I'm aware of them. I was aware of them before you raised them because I had them in the past. as well You're saying we're providing Switching to nio is no trivial case and must be done application-wide. There are a lot of changes and perhaps it might be necessary to rethink some ways the library works at its core. It might be good for a further major version, and not for minor one. |
It won't be a problem to add methods that work with buffers rather than byte arrays. I think it was kind of an unconscious decision to use
I like your attitude! No joking :)
As I've said previously, networking already uses NIO... The problem is that working with storage (reads and writes) is asynchronous in respect to networking. In short, current design is based on the following:
I agree that piping data between |
Most importantly it provides a mapping overlay, that relieves us from thinking about files and instead provides a view of torrent's data in the form of a contiguous byte array. |
Sorry, that's a deformation from French in which "we" can be both "us" and the undefined pronoun (for French-speaking guys, I'm speaking about "on", not "nous"). Regarding the rest, I discovered your repo through your HN post last week, I was impressed and wanted to contribute. I haven't had the time to understand all the design yet. I know how the BitTorrent protocol works in the big lines (b-encoding, distribution of chunks, etc). But I most certainly haven't delved in it like you did to develop this project. However I have some experience with nio, though it's not natural for me, I still prefer good ol' Input/OutputStream. So indeed I might be naive on some topics, but I hope to help in the best of my knowledge. |
@ogregoire , man, come on, I'm glad that you're contributing, and I'm very thankful. I immensely enjoy our conversation and under no circumstances would like to seem "smarter" or whatever. In fact I'm sure as hell that you are much more knowledgeable than me, and I hope you will continue to share your knowledge. I just wanted this conversation to pave a road from A (where we are) to B (where we'd like to be), so I thought it would be beneficial to start a more in-depth discussion. And I totally imply that I might be wrong in what I'm saying, so you're welcome to correct and provide your own arguments! Peace! |
Hmm, I guess we weren't on the same page: I'm still in ;) I'll try to work on a prototype where I introduce those changes gradually, if that's okay. Probably in my own branch or something for quite a while. We'll see. |
Sure! I also have some other global tasks in mind, which would require research and quite possibly re-thinking approach and APIs. I think you could do a great job on these, so if you're interested, I'll try to dump my thoughts in a couple of tickets |
The API as shown now still uses the outdated
java.io.File
API. Since this is a Java-8 only library, it should support the more recentjava.nio
package for better client write-to-disk performances.The text was updated successfully, but these errors were encountered: