My hypothesis is that when ProstT5 has to translate a protein sequence longer than roughly 500 amino acids, it very often turns the 1D amino-acid sequence into a 3Di string made up almost entirely of 'd' predictions, which is nonsensical. When I shortened a sequence to 480 amino acids, the 3Di representation looked much more plausible.
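To make "almost all 'd' predictions" measurable instead of judging it by eye, a small helper can report what fraction of a predicted 3Di string is 'd' and how long the longest uninterrupted 'd' run is. This is a minimal sketch: the function name `degeneracy_stats` is hypothetical and not part of translate.py, and it assumes the 3Di prediction is available as a plain lower-case string.

```python
import re


def degeneracy_stats(three_di: str, state: str = "d") -> dict:
    """Summarise how much of a predicted 3Di string collapses into one state.

    three_di: 3Di string as printed by translate.py (assumed lower-case letters).
    state:    the 3Di letter to track; 'd' dominates the long outputs here.
    """
    seq = three_di.strip().lower()
    if not seq:
        return {"length": 0, "fraction": 0.0, "longest_run": 0}
    # Fraction of all positions that were predicted as `state`.
    fraction = seq.count(state) / len(seq)
    # Collect every maximal run of `state` and keep the longest one.
    runs = re.findall(f"{re.escape(state)}+", seq)
    longest = max((len(r) for r in runs), default=0)
    return {"length": len(seq), "fraction": fraction, "longest_run": longest}


print(degeneracy_stats("ddpvvvvddddd"))
```

Running it on the 2940-, 900- and 480-residue outputs below would turn the visual impression into numbers that can be compared across lengths.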
python translate.py -i ./large_5.faa -o out --half 1 --is_3Di 0
Using device: cuda:0
Result directory already exists! - Watch out to not overwriting existing results!
is_3Di is False. (0=expect input to be AA, 1= input is 3Di
##########################
Loading model from: Rostlab/ProstT5
##########################
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
##########################
Input is 3Di: False . Sequence below should be lower-case if input is 3Di.
Example sequence: >GUT_GENOME107594_03343
MDNDKRKKVYDVLTSKTGYKDSFDDFNKFMDDGAARRKVYDVLREQTGYRDSFEDFDKFMSSSHQGEMPVPIVPMRPTEPDLSFSNPSAGKFVQNGLDANEELLNDVKAEPVKDGMGRSYFPVKPQKVMEDEIRREYQVEPIESQINNAIVANEEAIRRIEQGKTEDYDRKAEEHPFLHALGGFARQHEGRVPDVNPSNDKEALRNLYAERNKLEEAKKVIEASRTDGILSNISAGVKDAVTDMGFYDFGMTEMSDNSRLLGIKAKLDNKQPLTEDEQKLLNAAALAQGVQSSHQDRISPWYTAGQTTANMAPFMAEMLVNPAAGLGKAAQKAVTKKVSGLVGKSLTRKMLPKLSRVAGDVAGASVMTGTTGIMRTVADAEGRMLGDVNSSIRDGEIVGNGFSGGLDAGEASAKAFGANTIENWSEMLGEYFAPALRGMGMVADKGMRKMGLGRVSDFISDINSTSLARGIDDFLEKTQWNGPIGEIAEEEAGIIANSYITGDNKLSDLTDPRLQLDIVLGVGLFGGFVSGMKTIGYRSPSKIAEKDLKRAEKNASSMFDNWEDIRLEIENTDEEQLPDTLNSIIGHAAGNDNAKKAAIVDYAYSLQKWRGVNAAKLKQTVENPAEAQNLVENEENGTGVEAIPEFDKVSVYRNFKRAERKVAQALPNTPIEKIEDVTDVDKFAADNGLNEEQKSAVTDYMAAKEPYSIYQSDVEARKEVVKSNAREQAVKDAERTSNPDTGFITQVKRKFSETPVYLVGGNLSFGDDGLLDRSNSTETVYYLDENGKRLPAPAEDFDSIVSQSSKEELVANAEAQATADFDAQENESLASPEILPPGIGETVSLDGGTYVIEGADNDNPGNFSALKLNTDGEIEVTPGLSEQISLSPDEYYEAKETELWKNDGIQPEQPVGKVDEVVAPALVQGVDKMPEISAENTSGPEMEDNKVSEETPEQRLQKVVDSLPKKKDGNIDYKALTPQQRFDYTSAAESPEVAIEDLKSDVAVKNEELEKINARLEKAIGGERVELRDTIRSKKKELDELTAFFQSVVPEQPDTSVNEEQPSVPEDVRTDEDYVEWVADNSDDAEEVLGAYSVAKELASHEQTLKPWQRELLGRKVSTSSFVRFGDRNHITGTLAKGWLRKDGEEIDSIAQELTANGVDVSEQDIVDFILDNPSNRVSEISDTMRSLSSRFSEIATKETGIPVGGPESNTGKLYIQLKEANKKIDELTDEQKADMRNALIADMDASDVQRSGDYYESLADYAEQYDRFRDEMNAEEADEAVIRQMEAESPTLYHGGFTADELDDIYSQIENDNGTERQTEDSGETQPALSGDEIEQREEPGIPDAVGAEDSESEGQDSGVVPDIEEIQDNTLDNEDESLSLHLNSKEEENGTISESVPQGERREETPQNGSLEEATDRLRERSRANEEARSRSGKTLSFQEKLAEEARETEAYAKEKGSWIPMSKVFDLGASGPSGNEADTYISQDNHIYKVNNLMNSKGILPLLERVALHNAIFPSSQYELTGFTGFEGGNVYPVLRQRYVPNATLSSPEEIDSYMRSLGFKQTGEAAYSNGDVVISDLRPRNVLKDTDGDLYVVDADFKKEDAVSFEASPISPGENVLDYAERISREKEMHDVRQSVDTNPTDAQKEAGNYKKGHIRLDGYDITIENPKGSERSGTDAKGGKWSVTMNNDYGYIRGTQGVDGDHIDVFLSDDPTTGNVYVIDQVKEDGSFDEHKVMYGFGSALAAKRAYLSNYSKGWNGLGKITQVSKDEFRKWVNSSRRKTKPFAEYKSVKMESDVRTDRQGNPVDADGKLIIEGNRLVTDKRYAELLERMRKKLGGQMNMGVDPEILAIGTEMAVYHIEKGARKFAEYAKAMIADLGDVIRPYLKSFYNGARDLPEMQELAKNMDSYNDVSSFDVVNFDKVIPDVINGIATMAEEKEIKRQADVANAAIKKVRSKNKKKNNNVSLPLGDLFNQNIEEYGKEQRKESDSGSEGNQG
TNGQLGEGAWEEDRKSGLQGETGSVSGRDGADADRGGRVHGVSVGRQSSVKRNRNNYSFGDSHIDVPSGDVAKLKANVAAIRTLKEIEESGLPATDEQKAILAKYSGWGGLSNALNDEKYNARKSYYGADKNWNEKYLPYYEQLIELLTPEEFRSAVQSTTTSHYTPETVIRSMWDIAGRIGVKGGDVSEPAMGIGRIIGLMPDETSSRSRISGYEIDSLSGRISKALYPDANIKVQGYETEFFPQSKDLVITNVPFGKQAPYDKALEKTLKKQMKGAYNLHNYFIAKSLLELKEGGIGIFVTSSATMDGASSRFREFASSGGFDLVGAIRLPNDAFQKDAGTSVTADVLVFRRRKSGEKPNEINFISTTQIGEGNYQENGETRTKPIMVNEYFASHPEMMLGEMMTAYDAGSGGLYSGASQTLKAKPGMDLQKALDAAVKKLTENVNIGIENADSRLENTEKEQTTLKNGTLSVKDGKVYVAMNGVLEEIAVKDKFVYSGKTRKTADAVNEYNELKSTLRELISEEQKKGGNPEPLRKKLNEQYDGFVGKYGTLNRNKALDDVFDEDFEHNLPLSLETVRRVPSPTGKSMVYEVEKGKGILDKRVNYPVEEPTKADSVKDAINISRSYKGNIDIPYIARLTGKGEEEVTEEMLRDGSAYRDPLTGTLVDRATYLSGNVKSKLEDARAMAENDPAFDKNVADLEKVQPETIRFGDISYRLGTPWIPAQYINEFAENVLGISGVDVTYMPSLNEFVVGKHARISDFEKSGAIGTDRVGAIDLFAYAINQRKPKIYDEHTEYGPSGSIKVRTVNEAETQAAAEKIMEISDKFIEYIDGRKGIHRELERIYNDKYNNYVLKKYELPSFSHMEKDEDGKEKMVTHYPNSNTSISMREHQARAIQRSIEGSTLLAHQVGTGKTFTMITTAMEMRRLGLARKPMIV
##########################
Average sequence length: 2940 measured over 1 sequences
Parameters used for generation: {'do_sample': True, 'num_beams': 3, 'top_p': 0.95, 'temperature': 1.2, 'top_k': 6, 'repetition_penalty': 1.2}
Example generation for 0_GUT_GENOME107594_03343:
seqs[batch_idx]
ddpvvvvvvvvvvcvppvddddpvvvvvvvpdpvvvvvvvvvvcvppvddddpvvvvvvvvppddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
Translating 1 proteins with an avg. length of 2940 took 2.7[m] (164.3[s/protein])
Writing results ...
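The log line "Sequence below should be lower-case if input is 3Di" reflects how ProstT5 tells its two alphabets apart. Based on our reading of the ProstT5 model card (an assumption; translate.py's own preprocessing may differ in detail), amino-acid input is upper-case with rare residues (U, Z, O, B) mapped to X, 3Di input is lower-case, residues are space-separated, and a prefix token (`<AA2fold>` or `<fold2AA>`) selects the translation direction. The hypothetical helper below only builds that input string; it does not load or call the model.

```python
import re


def prepare_prostt5_input(seq: str, is_3di: bool) -> str:
    """Build the tokenizer input for one sequence (sketch, not translate.py itself).

    Assumed conventions: AA upper-case, 3Di lower-case, one space between
    residues, and a direction prefix token in front.
    """
    seq = seq.strip()
    if is_3di:
        prefix = "<fold2AA>"  # 3Di -> amino acids
        tokens = seq.lower()
    else:
        prefix = "<AA2fold>"  # amino acids -> 3Di (the direction used in this issue)
        # Map rare/ambiguous amino acids to X.
        tokens = re.sub(r"[UZOB]", "X", seq.upper())
    return prefix + " " + " ".join(tokens)


print(prepare_prostt5_input("MDNDKRKK", is_3di=False))
# <AA2fold> M D N D K R K K
```

Because every residue becomes its own token, a 2940-residue protein also means a 2940-step generation on the decoder side, which is worth keeping in mind when comparing the runs.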
python translate.py -i ./large_5.faa -o out --half 1 --is_3Di 0
Using device: cuda:0
Result directory already exists! - Watch out to not overwriting existing results!
is_3Di is False. (0=expect input to be AA, 1= input is 3Di
##########################
Loading model from: Rostlab/ProstT5
##########################
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
##########################
Input is 3Di: False . Sequence below should be lower-case if input is 3Di.
Example sequence: >GUT_GENOME107594_03343
MDNDKRKKVYDVLTSKTGYKDSFDDFNKFMDDGAARRKVYDVLREQTGYRDSFEDFDKFMSSSHQGEMPVPIVPMRPTEPDLSFSNPSAGKFVQNGLDANEELLNDVKAEPVKDGMGRSYFPVKPQKVMEDEIRREYQVEPIESQINNAIVANEEAIRRIEQGKTEDYDRKAEEHPFLHALGGFARQHEGRVPDVNPSNDKEALRNLYAERNKLEEAKKVIEASRTDGILSNISAGVKDAVTDMGFYDFGMTEMSDNSRLLGIKAKLDNKQPLTEDEQKLLNAAALAQGVQSSHQDRISPWYTAGQTTANMAPFMAEMLVNPAAGLGKAAQKAVTKKVSGLVGKSLTRKMLPKLSRVAGDVAGASVMTGTTGIMRTVADAEGRMLGDVNSSIRDGEIVGNGFSGGLDAGEASAKAFGANTIENWSEMLGEYFAPALRGMGMVADKGMRKMGLGRVSDFISDINSTSLARGIDDFLEKTQWNGPIGEIAEEEAGIIANSYITGDNKLSDLTDPRLQLDIVLGVGLFGGFVSGMKTIGYRSPSKIAEKDLKRAEKNASSMFDNWEDIRLEIENTDEEQLPDTLNSIIGHAAGNDNAKKAAIVDYAYSLQKWRGVNAAKLKQTVENPAEAQNLVENEENGTGVEAIPEFDKVSVYRNFKRAERKVAQALPNTPIEKIEDVTDVDKFAADNGLNEEQKSAVTDYMAAKEPYSIYQSDVEARKEVVKSNAREQAVKDAERTSNPDTGFITQVKRKFSETPVYLVGGNLSFGDDGLLDRSNSTETVYYLDENGKRLPAPAEDFDSIVSQSSKEELVANAEAQATADFDAQENESLASPEILPPGIGETVSLDGGTYVIEGADNDNPGNFSALKLNTDGEIEVTPGLSEQISLSPDEYYEAKETELW
##########################
Average sequence length: 900 measured over 1 sequences
Parameters used for generation: {'do_sample': True, 'num_beams': 3, 'top_p': 0.95, 'temperature': 1.2, 'top_k': 6, 'repetition_penalty': 1.2}
Example generation for 0_GUT_GENOME107594_03343:
seqs[batch_idx]
ddpvvlvvvvvvvcvppvddddsvvvvvcvvdpvslvvvvvvvcvvpvddddsvvvvvvvvvvpdddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddplvvlvvllvvlvvvlvvlvvvvvvvvvvvvpdddddddddddddddddddddddpvvsvvvnvvsvvvsvvsvvvsvvvvvvvpddlvvllvlllvvlcvvvvvddpppvvvvlvvllvvlvvcvvvvhdddpvsvvsvvvvvvvvvvvvvvvvpddpssvvsnvvnvcvvvvvvcvvdvppdddpvvvvvvvvvvvvpddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
Translating 1 proteins with an avg. length of 900 took 0.6[m] (33.7[s/protein])
Writing results ...
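The two timing lines (164.3 s for 2940 residues, 33.7 s for 900 residues) give a rough sense of how cost grows with length. Assuming time per protein scales as t ≈ c·L^k (an assumption; with only two data points and sampling over 3 beams this is a crude estimate), the exponent k can be solved directly:

```python
import math

# (length in residues, seconds per protein) as reported by translate.py above.
runs = [(900, 33.7), (2940, 164.3)]

(l1, t1), (l2, t2) = runs
# Under the assumed power law t = c * L**k, the ratio of two runs gives k.
k = math.log(t2 / t1) / math.log(l2 / l1)
print(f"length ratio {l2 / l1:.2f}, time ratio {t2 / t1:.2f}, exponent k = {k:.2f}")
```

With these numbers k comes out at roughly 1.3, i.e. runtime grows somewhat faster than linearly with sequence length.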
python translate.py -i ./large_5.faa -o out --half 1 --is_3Di 0
Using device: cuda:0
Result directory already exists! - Watch out to not overwriting existing results!
is_3Di is False. (0=expect input to be AA, 1= input is 3Di
##########################
Loading model from: Rostlab/ProstT5
##########################
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
##########################
Input is 3Di: False . Sequence below should be lower-case if input is 3Di.
Example sequence: >GUT_GENOME107594_03343
MDNDKRKKVYDVLTSKTGYKDSFDDFNKFMDDGAARRKVYDVLREQTGYRDSFEDFDKFMSSSHQGEMPVPIVPMRPTEPDLSFSNPSAGKFVQNGLDANEELLNDVKAEPVKDGMGRSYFPVKPQKVMEDEIRREYQVEPIESQINNAIVANEEAIRRIEQGKTEDYDRKAEEHPFLHALGGFARQHEGRVPDVNPSNDKEALRNLYAERNKLEEAKKVIEASRTDGILSNISAGVKDAVTDMGFYDFGMTEMSDNSRLLGIKAKLDNKQPLTEDEQKLLNAAALAQGVQSSHQDRISPWYTAGQTTANMAPFMAEMLVNPAAGLGKAAQKAVTKKVSGLVGKSLTRKMLPKLSRVAGDVAGASVMTGTTGIMRTVADAEGRMLGDVNSSIRDGEIVGNGFSGGLDAGEASAKAFGANTIENWSEMLGEYFAPALRGMGMVADKGMRKMGLGRVSDFISDINSTSLARGIDDFLEKTQW
##########################
Average sequence length: 480 measured over 1 sequences
Parameters used for generation: {'do_sample': True, 'num_beams': 3, 'top_p': 0.95, 'temperature': 1.2, 'top_k': 6, 'repetition_penalty': 1.2}
Example generation for 0_GUT_GENOME107594_03343:
seqs[batch_idx]
ddpvvlvvvvvvvcvvpvddddsvvvvvcvvdpvslvvvvvvvcvvpvddddsvvvvvvvvvvpdddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddpvvvvvvvvvvppppdllvvlvvllvvlvvvlvvlvvvvvvvvvvvcvvcvpvvvvvvvvvvpppddppppvvvsvvvnvvsvvvsvlsvllsvlvvvlvpddlvvllvlllvllcvvvvvddpcllvllpdpllvvlvvcvvvvhdddpvsvssvssnvssvvsvvvvvvsddpssvvsnvcnvcvvvvvvcvvcvppppppvvvvvvvvvvvvpddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
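Since the 480-residue prefix gives a more varied 3Di string than the full 2940-residue protein, one possible workaround is to translate overlapping windows of at most 480 residues and stitch the results. This is a sketch under stated assumptions: `translate` is a hypothetical callable wrapping the model that returns exactly one 3Di state per residue, and the 40-residue overlap is an arbitrary choice. In each overlap, the first half comes from the earlier window and the second half from the later one, so no residue is dropped or duplicated.

```python
from typing import Callable, List, Tuple


def split_windows(seq: str, window: int = 480, overlap: int = 40) -> List[Tuple[int, int]]:
    """Return (start, end) spans covering seq; consecutive spans share `overlap` residues."""
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    spans = []
    start = 0
    while True:
        end = min(start + window, len(seq))
        spans.append((start, end))
        if end == len(seq):
            break
        start += step
    return spans


def translate_in_windows(seq: str, translate: Callable[[str], str],
                         window: int = 480, overlap: int = 40) -> str:
    """Translate each window and stitch the per-residue predictions back together."""
    spans = split_windows(seq, window, overlap)
    pieces = []
    for i, (start, end) in enumerate(spans):
        pred = translate(seq[start:end])
        if len(pred) != end - start:
            raise ValueError("translate must return one 3Di state per residue")
        # Drop the first half of the overlap at a left boundary (not for the first
        # window) and the second half at a right boundary (not for the last window).
        left = 0 if i == 0 else overlap // 2
        right = len(pred) if i == len(spans) - 1 else len(pred) - (overlap - overlap // 2)
        pieces.append(pred[left:right])
    return "".join(pieces)


# str.lower stands in for the model so that the stitching itself can be checked.
protein = "ACDEFGHIKLMNPQRSTVWY" * 50  # 1000 residues
print(split_windows(protein))
print(translate_in_windows(protein, str.lower) == protein.lower())
```

Windows lose long-range context, and even the 480-residue output above still contains long 'd' stretches, so whether windowed predictions are genuinely better would need checking, for example against an experimental or predicted structure of the same protein.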