
Quick-start test code snippet OOMs #36

Open
LiuZhihhxx opened this issue Jul 28, 2023 · 2 comments

Comments

@LiuZhihhxx

Running the quick-start test code from readme.md:

from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "LinkSoul/Chinese-Llama-2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("用中文回答,When is the best time to visit Beijing, and do you have any suggestions for me?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)

At model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda(), memory usage spikes to 100% and the process is then killed, with the following output:

You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
[2023-07-28 10:55:34,419] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

What could be the cause?
(Fine-tuning with /~https://github.com/lvwerra/trl runs without any problems and GPU memory can be nearly fully utilized, so CUDA itself should be fine.)


541wsy commented Jul 31, 2023

I'm having the same problem.


rufeng-h commented Aug 9, 2023

Has this been solved? I can load the model successfully, but it gets stuck at inference.
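A hedged sanity check, reusing the names from the snippet quoted above (model, tokenizer, prompt, streamer): with max_new_tokens=4096, greedy decoding on a 7B fp16 model can simply take a very long time, so "stuck" may just be a slow generation rather than a hang. Capping the output length and streaming tokens shows whether decoding is progressing at all:

# Assumption: "stuck at inference" may be slow decoding rather than a true hang.
generate_ids = model.generate(
    tokenizer(prompt, return_tensors='pt').input_ids.cuda(),
    max_new_tokens=64,   # small cap to verify tokens are actually being produced
    streamer=streamer,   # prints tokens incrementally as they are generated
)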
