Simplifications and some fun stuff for the MNIST Gluon tutorial #13094
Conversation
@kohr-h Thanks for your contribution! @mxnet-label-bot [pr-work-in-progress, Gluon]
Thanks for the much needed update @kohr-h. Can I bother you to update the data loading to be pure Gluon rather than using the old DataIter API? That way there is no need to reset the iterator between epochs, and each batch comes out of the iterator as a (data, label) tuple rather than as batch.data and batch.label:
train_dataset = mx.gluon.data.vision.MNIST(train=True).transform_first(mx.gluon.data.vision.transforms.ToTensor())
test_dataset = mx.gluon.data.vision.MNIST(train=False).transform_first(mx.gluon.data.vision.transforms.ToTensor())
train_data = mx.gluon.data.DataLoader(train_dataset, shuffle=True, last_batch='rollover', batch_size=batch_size, num_workers=2)
test_data = mx.gluon.data.DataLoader(test_dataset, shuffle=False, last_batch='rollover', batch_size=batch_size, num_workers=2)
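To illustrate the DataLoader contract described above without requiring MXNet, here is a toy stand-in (all names here are hypothetical, not real MXNet API): each epoch gets a fresh iterator, so no reset() call is needed, and every batch arrives as a (data, label) tuple instead of an object with .data and .label attributes.

```python
# Toy stand-in for the Gluon DataLoader iteration contract (not real MXNet).
class ToyDataLoader:
    def __init__(self, dataset, batch_size):
        self.dataset = dataset          # list of (sample, label) pairs
        self.batch_size = batch_size

    def __iter__(self):
        # A new generator is created per epoch, which is why no reset()
        # between epochs is required.
        for i in range(0, len(self.dataset), self.batch_size):
            chunk = self.dataset[i:i + self.batch_size]
            data = [x for x, _ in chunk]
            label = [y for _, y in chunk]
            yield data, label           # batch is a (data, label) tuple

dataset = [([0.1], 0), ([0.2], 1), ([0.3], 2), ([0.4], 3)]
loader = ToyDataLoader(dataset, batch_size=2)
for epoch in range(2):                  # second epoch works without a reset
    for data, label in loader:
        pass
```

The real MXNet loaders above follow the same pattern: just iterate them directly in the epoch loop.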
No problem. I've started doing that, more to come.
Thanks for your contribution @kohr-h
Thanks for the reminder @kalyc. I've got too many balls in the air. Will get this done ASAP.
I'm done with the rewording and restructuring of the example. Note that I also made the CNN part more basic by moving from a custom … Ready for review.
A lot of nit-picking on the formatting, but overall, once these are addressed, I think we should be good to merge. Great work on the wording and explanations. Thanks for the contribution 👍
- Apply hybrid blocks
- Move outputs outside of code blocks and mark for notebooks to ignore
- Remove images, use external link
- Fix a few formulations
I think I addressed all the comments from the review.
@sandeep-krishnamurthy @stu1130 for review
Looks great. Thank you.
One question: why is there an activation function in the last layer?
No real reason. I think I didn't change the original logic, but I can check again. If a different last activation is more reasonable for this tutorial, I can put it in, no problem.
Sigmoid activation is a better fit here to output the probabilities of the 10 digits.
@sandeep-krishnamurthy I switched both nets to sigmoid activation. Note, though, that they perform worse with the same settings. I've tested a few learning rates, and the current ones improve things a bit. But maybe it's okay for the scope of the tutorial that the training isn't perfectly fine-tuned.
Okay, now we're in the right ballpark with the results. Thanks @ThomasDelteil for the catch. |
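For context on why the last-layer activations ended up being removed again (see the final commit message below): Gluon's SoftmaxCrossEntropyLoss normalizes the raw output scores with a softmax internally, so squashing them through an extra sigmoid first is superfluous. A minimal pure-Python illustration of the two functions (a sketch for intuition, not the MXNet implementation):

```python
import math

def sigmoid(x):
    # Squashes a single score into (0, 1); per-class values don't sum to 1.
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]         # a proper distribution over classes

logits = [2.0, -1.0, 0.5]
probs = softmax(logits)
# softmax already yields a normalized probability distribution over the
# classes, so the network can emit raw logits and let the loss handle it.
```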
…icense file" (#13558)
* Revert "Chi_square_check for discrete distribution fix (#13543)" This reverts commit cf6e8cb.
* Revert "Updated docs for randint operator (#13541)" This reverts commit e0ff3c3.
* Revert "Simplifications and some fun stuff for the MNIST Gluon tutorial (#13094)" This reverts commit 8bbac82.
* Revert "Fix #13521 (#13537)" This reverts commit f6b4665.
* Revert "Add a retry to qemu_provision (#13551)" This reverts commit f6f8401.
* Revert "[MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (#13531)" This reverts commit bd8e0f8.
* Revert "[MXNET-1249] Fix Object Detector Performance with GPU (#13522)" This reverts commit 1c8972c.
* Revert "Fixing a 404 in the ubuntu setup doc (#13542)" This reverts commit cb0db29.
* Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (#13478)" This reverts commit 40db619.
…he#13094)
* Simplify mnist Gluon tutorial and add mislabelled sample plotting
* Add mnist Gluon tutorial images
* Gluon MNIST tutorial: Use modern Gluon constructs, fix some wordings
* [Gluon] Move to data loaders and improve wording in MNIST tutorial
* Fix broken links
* Fix spelling of mislabeled
* Final rewordings and code simplifications
* Fix things according to review: apply hybrid blocks; move outputs outside of code blocks and mark for notebooks to ignore; remove images, use external link; fix a few formulations
* Change activations to sigmoid in MNIST tutorial
* Remove superfluous last layer activations in MNIST tutorial
* upstream/master: (54 commits)
  Add notes about debug with libstdc++ symbols (apache#13533)
  add cpp example inception to nightly test (apache#13534)
  Fix exception handling api doc (apache#13519)
  fix link for gluon model zoo (apache#13583)
  ONNX import/export: Size (apache#13112)
  Update MXNetTutorialTemplate.ipynb (apache#13568)
  fix the situation where idx didn't align with rec (apache#13550)
  Fix use-before-assignment in convert_dot (apache#13511)
  License update (apache#13565)
  Update version to v1.5.0 including clojure package (apache#13566)
  Fix flaky test test_random:test_randint_generator (apache#13498)
  Add workspace cleaning after job finished (apache#13490)
  Adding test for softmaxoutput (apache#13116)
  apache#13441 [Clojure] Add Spec Validations for the Random namespace (apache#13523)
  Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file" (apache#13558)
  Chi_square_check for discrete distribution fix (apache#13543)
  Updated docs for randint operator (apache#13541)
  Simplifications and some fun stuff for the MNIST Gluon tutorial (apache#13094)
  Fix apache#13521 (apache#13537)
  Add a retry to qemu_provision (apache#13551)
  ...
Description
This PR removes some of the (IMO) unnecessary complexity from the tutorial (one of the first that new users click on) by always using a single GPU or a single CPU. This makes the MLP training code much more straightforward (no data splitting, no inner loop).
I also think that there was too much staccato of obvious comments in that part, while the truly non-obvious parts were left uncommented (e.g., what is ag.record()?). I tried to straighten that part up a bit, although I think some comments are still a bit on the obvious side. Furthermore, I added a small section where some of the mislabeled (apparently it's one "l", need to fix that) images are plotted together with the predicted and true labels. I think that's more fun than just looking at an accuracy score.
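The new mislabeled-sample section boils down to comparing predicted and true labels and collecting the indices where they disagree. A minimal sketch of that selection step with made-up labels (plain Python, no MXNet; the values are purely illustrative):

```python
# Hypothetical predictions and ground-truth labels for a handful of test images.
preds  = [7, 2, 1, 0, 4, 1, 4, 9]
labels = [7, 2, 6, 0, 4, 1, 9, 9]

# Indices where the model disagrees with the ground truth; in the tutorial
# these indices select the images that get plotted together with their
# predicted and true labels.
mislabeled = [i for i, (p, t) in enumerate(zip(preds, labels)) if p != t]
print(mislabeled)  # → [2, 6]
```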
I'm not finished with the whole tutorial yet, so it's WIP, but I'd like to know what you think before I put more work into this PR.
Note: I've included the 2 new images in the same directory as the tutorial for convenience; they would obviously need to go somewhere else.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.