
Sample selection when trying to train a Model #527

Answered by rasbt
azraelxuemo asked this question in Q&A

That's a really good observation and question.

This creates a problem: tokens 4, 5, 6 can never see tokens 8 and 9.
The same issue exists when the context length increases to 1024: tokens in the first batch can never see tokens in the second batch.

Since this is the method we use to train the model for an epoch, I think it may cause some problems.
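The chunking behavior the question describes can be sketched as follows. This is a minimal illustration (with made-up token IDs) of a non-overlapping sliding-window dataloader, i.e., stride equal to the context length:

```python
# Hypothetical token IDs; stride == max_length, so chunks don't overlap.
token_ids = list(range(12))  # tokens 0..11
max_length = 4
stride = 4

chunks = [
    token_ids[i : i + max_length]
    for i in range(0, len(token_ids) - max_length + 1, stride)
]
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
# Each chunk is an independent training sample, so causal attention
# inside chunk [4, 5, 6, 7] can never see tokens 8 and 9 in the next chunk.
```

Because each chunk becomes its own training sample, no attention is ever computed across a chunk boundary.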

Yes, you are correct. Nowadays, some companies also have a long-context pre-training stage at the end of the pre-training cycle, where the model is specifically fed whole, long-context documents, e.g., with >100k tokens.

And I heard that some people use a random starting index to select the context.
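The random-starting-index idea mentioned above can be sketched like this. This is a hypothetical illustration, not the exact implementation anyone uses: shifting the token stream by a random offset each epoch moves the chunk boundaries, so different token pairs end up in the same context window across epochs:

```python
import random

def epoch_chunks(tokens, max_length, rng):
    """Chunk a token stream, shifted by a random per-epoch offset."""
    offset = rng.randrange(max_length)  # fresh offset each epoch
    usable = tokens[offset:]
    return [
        usable[i : i + max_length]
        for i in range(0, len(usable) - max_length + 1, max_length)
    ]

rng = random.Random(0)  # seeded here only for reproducibility
print(epoch_chunks(list(range(12)), 4, rng))
```

With a nonzero offset, tokens that previously sat on opposite sides of a chunk boundary (e.g., 7 and 8) can land in the same chunk.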

Since it's common …
