Releases: AI-Hypercomputer/JetStream
v0.3
Key Changes
- Observability improvements in JetStream Server (Prometheus metrics)
- TensorBoard support for remote access
- Engine API update for TTFT and TPOT measurements
- Hugging Face tokenizer support
- Copybara G3 support
- Threading optimizations
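For readers unfamiliar with the two latency metrics named above, here is a minimal sketch of how TTFT (time to first token) and TPOT (time per output token) are commonly derived from per-token arrival timestamps. It is an illustration only: `RequestTiming`, `ttft`, and `tpot` are hypothetical names, not part of the JetStream Engine API, and the exact definitions JetStream uses may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
  """Hypothetical record of one request's timestamps, in seconds."""
  request_sent: float       # when the request was issued
  token_times: List[float]  # arrival time of each output token, in order


def ttft(timing: RequestTiming) -> float:
  """Time to first token: delay between sending and the first output token."""
  if not timing.token_times:
    raise ValueError("no output tokens were recorded")
  return timing.token_times[0] - timing.request_sent


def tpot(timing: RequestTiming) -> float:
  """Time per output token: mean gap between tokens after the first one."""
  n = len(timing.token_times)
  if n < 2:
    return 0.0  # assumption: report 0 when there is no inter-token gap to average
  return (timing.token_times[-1] - timing.token_times[0]) / (n - 1)


example = RequestTiming(request_sent=0.0, token_times=[0.5, 0.6, 0.7, 0.8])
print(f"TTFT={ttft(example):.3f}s TPOT={tpot(example):.3f}s")
```

Excluding the first token from TPOT keeps prefill latency (captured by TTFT) separate from steady-state decode latency.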
What's Changed
- Add tensorboard plugin dep for remote access by @JoeZijunZhou in #97
- Update benchmark config for xlml automation by @morgandu in #96
- Minor fix by @morgandu in #98
- Add ssh port forward support for profile readme by @FanhaiLu1 in #99
- Add inference sampling utils in JetStream by @JoeZijunZhou in #100
- Add profiling server for proxy backend by @zhihaoshan-google in #101
- Change `jetstream_slots_available_percentage` to `jetstream_slots_used_percentage` by @Bslabe123 in #102
- Bump urllib3 from 2.2.0 to 2.2.2 in the pip group across 1 directory by @dependabot in #104
- Added `jetstream_transfer_backlog_size` and `jetstream_generate_backlog_size` metrics by @Bslabe123 in #103
- Update docs for benchmark warmup mode by @JoeZijunZhou in #106
- Update docs with metrics observation instructions by @Bslabe123 in #107
- Prefill return first token by @jwyang-google in #105
- change the detokenization thread to return the actual eos token. by @jwyang-google in #108
- Add loadgen in dev image by @morgandu in #109
- Bump certifi from 2024.2.2 to 2024.7.4 in the pip group by @dependabot in #110
- Bump zipp from 3.17.0 to 3.19.1 in the pip group by @dependabot in #111
- Model warmup support with AOT and endpoint for JetStream by @vivianrwu in #92
- Cleanup orchestrator proto by @JoeZijunZhou in #112
- Update images for mlperf by @morgandu in #113
- image fix by @morgandu in #114
- del prefill_result & update dev image by @morgandu in #116
- Fix benchmark script for saving benchmark result by @lsy323 in #117
- Add `jetstream_server_startup_latency` metric by @Bslabe123 in #118
- Add http server to JetStream by @JoeZijunZhou in #115
- Free engine resource for the slot after finished one request decoding by @FanhaiLu1 in #119
- Add `jetstream_request_success_count` metric by @Bslabe123 in #124
- Request input/output size metrics by @Bslabe123 in #123
- Makefile by @Bslabe123 in #125
- Various request time metrics by @Bslabe123 in #121
- Standalone JetStream removes pinned deps by @JoeZijunZhou in #129
- Update deps file by @JoeZijunZhou in #130
- Manual model warmup to resolve AOT model warmup performance degradation by @vivianrwu in #126
- Update JetStream instructions by @yeandy in #132
- Add an optional parameter for sampling in prefill / sample. by @qihqi in #133
- remove excessive logs in production run by changing from DEBUG to INFO by @jwyang-google in #134
- Change the default message for requester.py and remove mlperf 4.1 install for proxy version support. by @zhihaoshan-google in #136
- Change previewutilities -> pathwaysutils by @vivianrwu in #138
- Add option to use hf tokenizer by @RissyRan in #147
- Rename third_party folder to Avoid Copybara g3 Errors by @jyj0w0 in #148
- add separate prefill detokenization thread by @zhihaoshan-google in #152
- Revert the change created by copybara by @jyj0w0 in #156
New Contributors
- @lsy323 made their first contribution in #117
- @RissyRan made their first contribution in #147
- @jyj0w0 made their first contribution in #148
Full Changelog: v0.2.2...v0.3
v0.2.2
Key Changes
- Enable observability in JetStream Server (Prometheus metrics)
- Enable JAX profiler support on single-host JetStream Server
- Support both text and token ids I/O for JetStream Decode API
- Add health check API
- Support MLPerf evaluation
- Enable JetStream Server E2E tests
- Increase unit test coverage (>=96%)
What's Changed
- Accuracy eval mlperf by @jwyang-google in #76
- Add metadata metrics by @yeandy in #77
- Fix pad_tokens function description by @FanhaiLu1 in #80
- Prometheus Metrics by @Bslabe123 in #71
- Update JetStream grpc proto to support I/O with text and token ids by @JoeZijunZhou in #78
- Update benchmark script to easily test llama-3 by @bhavya01 in #83
- Unit test coverage cleanup by @JoeZijunZhou in #81
- Allow tokenizer to customize stop_tokens by @qihqi in #84
- Decode Batch Percentage Metrics/Improved Scraping by @Bslabe123 in #82
- Bump requests from 2.31.0 to 2.32.0 in the pip group across 1 directory by @dependabot in #86
- Add profiling support and update docs by @JoeZijunZhou in #85
- Add ray disaggregated serving support by @FanhaiLu1 in #87
- Ensure server warmup before benchmark by @JoeZijunZhou in #91
- Add healthcheck support for JetStream by @vivianrwu in #90
- Add JetStream E2E test CI by @JoeZijunZhou in #89
- Release v0.2.2 by @JoeZijunZhou in #95
New Contributors
- @jwyang-google made their first contribution in #76
- @Bslabe123 made their first contribution in #71
- @vivianrwu made their first contribution in #90
Full Changelog: v0.2.1...v0.2.2
v0.2.1
Key Changes
- Support Llama3 tokenizer
- JetStream Tokenizer refactor
- Disaggregation preparation work
What's Changed
- add sample_idx in InputRequest for debugging by @morgandu in #32
- Update README.md with user guides by @JoeZijunZhou in #34
- Update README.md with PT user guide by @JoeZijunZhou in #35
- Reorganize unit tests and update CICD by @JoeZijunZhou in #37
- Add badges for JetStream by @JoeZijunZhou in #38
- Bump idna from 3.6 to 3.7 by @dependabot in #39
- Reformat benchmark metrics by @yeandy in #42
- Update server host default value by @JoeZijunZhou in #43
- Refactor readme by @FanhaiLu1 in #41
- Add missing Documentation by @FanhaiLu1 in #47
- Update README.md to fix broken link by @charbull in #50
- Add np padded token support by @FanhaiLu1 in #49
- Format token utils and test by @FanhaiLu1 in #51
- Align Tokenizer in JetStream by @JoeZijunZhou in #40
- Do nothing for nd array in copy_to_host_async by @FanhaiLu1 in #52
- Add jax_padding support driver and server lib by @FanhaiLu1 in #54
- Update maxtext user guide by @JoeZijunZhou in #56
- Fix benchmark script type issue by @JoeZijunZhou in #59
- Fix requester flag default value by @JoeZijunZhou in #60
- Fix float division by zero in benchmark by @FanhaiLu1 in #62
- Register IFRT proxy backend when proxy is defined in the jax_platforms by @zhihaoshan-google in #63
- Add an abstract class for Tokenizer by @bhavya01 in #53
- refactor slice_to_num_chips to adapt to Cloud config by @zhihaoshan-google in #65
- Support llama3 tokenizer by @bhavya01 in #67
- Prerequisite work for supporting disaggregation: by @zhihaoshan-google in #68
- Create `__init__.py` in Jetstream/third_party by @bhavya01 in #69
- Add tokenize_and_pad function to backward compatible by @FanhaiLu1 in #70
- Release v0.2.1 by @JoeZijunZhou in #72
- Bump tqdm from 4.66.1 to 4.66.3 in the pip group across 1 directory by @dependabot in #73
- Release v0.2.1 with docs update by @JoeZijunZhou in #74
New Contributors
- @dependabot made their first contribution in #39
- @yeandy made their first contribution in #42
- @charbull made their first contribution in #50
- @zhihaoshan-google made their first contribution in #63
- @bhavya01 made their first contribution in #53
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Major Changes
- Support JetStream MaxText inference on Cloud TPU VM
- Support JetStream PyTorch inference on Cloud TPU VM
- Support Continuous Batching with interleaved mode in JetStream
- Support online serving benchmarking
What's Changed
- Add unit tests CI github action by @JoeZijunZhou in #1
- Refine thread in orchestrator by @JoeZijunZhou in #2
- Optimize maximum threads to saturate decoding capacity by @JoeZijunZhou in #3
- Add benchmarks maximum threads config by @JoeZijunZhou in #4
- First support necessary for MaxText by @rwitten in #5
- Support gracefully stopping orchestrator and server by @JoeZijunZhou in #6
- Save request outputs and add eval accuracy support by @FanhaiLu1 in #8
- Use parameter based num as inference request max output length by @FanhaiLu1 in #10
- Fix output token drop issue by @JoeZijunZhou in #9
- Add option to warm up by @qihqi in #11
- Replace token_list with generated_text in saved outputs by @FanhaiLu1 in #12
- Refine requester util by @JoeZijunZhou in #15
- Adds filtering for sharegpt based on conversation starter. by @patemotter in #17
- Allows more requests than available data. by @patemotter in #19
- Fix starvation with async server and interleaving optimization by @JoeZijunZhou in #13
- Add Token util unit test by @FanhaiLu1 in #20
- Fix llama2 decode bug in tokenizer by @FanhaiLu1 in #22
- Fix whitespace replacement bug by @FanhaiLu1 in #24
- Update benchmark to run openorca dataset by @morgandu in #21
- Add model ckpt conversion and AQT scripts for JetStream MaxText Serving by @JoeZijunZhou in #23
- Refactor to sample before tokenize by @morgandu in #26
- Update ckpt conversion scripts by @JoeZijunZhou in #25
- move tokenizer model to third party llama2 by @FanhaiLu1 in #27
- Support JetStream MaxText user guide by @JoeZijunZhou in #28
- Enable pylint linter and pyink formatter by @JoeZijunZhou in #29
- Update README by @JoeZijunZhou in #30
- Release v0.2.0 by @JoeZijunZhou in #31
New Contributors
- @JoeZijunZhou made their first contribution in #1
- @rwitten made their first contribution in #5
- @FanhaiLu1 made their first contribution in #8
- @qihqi made their first contribution in #11
- @patemotter made their first contribution in #17
- @morgandu made their first contribution in #21
Full Changelog: https://github.com/google/JetStream/commits/v0.2.0