Sample selection when trying to train a Model #527
-
Hi, sorry to bother you, but I have a question.

**Description:** In Figure 5.9, the demo uses a sample length of 6 and first selects tokens 1-6 for training.

**Problem:** This means tokens 4, 5, and 6 can never see tokens 8 and 9, and when we train the model, we use this method for a whole epoch.

**A solution I've heard of:** Some people use a random starting index to select the context.

**What I want to know:** When you actually train a model in industry, how do you construct the data samples?

Thanks and best regards. (Sorry, I don't know how to include the figure here, so I just give the token numbers.) Finally, I would like to say that this is the best book I have encountered while studying LLMs!
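For anyone reading along, here is a minimal sketch of the fixed-stride sampling pattern the question describes (the token IDs, window size, and variable names are illustrative, not the book's exact code). With stride equal to the window length, each window becomes an independent sample, so tokens in one window never attend to tokens in the next:

```python
# Stand-in token IDs 1..14; in practice these would come from a tokenizer.
tokens = list(range(1, 15))
max_length, stride = 6, 6  # non-overlapping windows, as in the question

# Each sample is an (input, target) pair where the target is the input
# shifted right by one token (next-token prediction).
samples = [
    (tokens[i:i + max_length], tokens[i + 1:i + max_length + 1])
    for i in range(0, len(tokens) - max_length, stride)
]

for inp, tgt in samples:
    print(inp, "->", tgt)
# Tokens 4, 5, 6 only ever appear in the first sample, and tokens 8, 9
# only in the second, so they are never in the same context window.
```

This reproduces the situation from Figure 5.9: the window boundaries are fixed, so the same token pairs are separated on every pass over the data.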
-
That's a really good observation and question.
Yes, you are correct. Nowadays, some companies also add a long-context pre-training stage at the end of the pre-training cycle, where the model is specifically fed whole, long-context documents, e.g., with >100k tokens.
Since it's common to train for only 1 epoch, though, this wouldn't really address the issue. Usually, the random starting index is used more to keep things simple compared to a full data loader, rather than to fix this particular problem.
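To make the random-starting-index idea concrete, here is a hedged sketch (the function name and parameters are my own, not from the book): instead of always chunking from position 0, the whole window grid is shifted by a random offset, e.g., once per epoch, so the boundaries fall between different token pairs on different passes.

```python
import random

def make_samples(tokens, max_length, stride, rng):
    """Chunk tokens into (input, target) pairs, starting from a random
    offset so that window boundaries vary between calls/epochs."""
    offset = rng.randrange(stride)  # fresh offset, e.g., drawn once per epoch
    return [
        (tokens[i:i + max_length], tokens[i + 1:i + max_length + 1])
        for i in range(offset, len(tokens) - max_length, stride)
    ]

tokens = list(range(1, 101))  # stand-in token IDs
rng = random.Random(0)
epoch_samples = make_samples(tokens, max_length=6, stride=6, rng=rng)
```

Note that with single-epoch training this only changes *which* boundaries you get, not the fact that boundaries exist, which matches the point above that it doesn't fully address the issue.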
-
Thank you very much for your reply; it helps me a lot.