---
license: mit
tags:
- biology
- protein
---
# PLTNUM-ESM2-HeLa
PLTNUM is a protein language model trained to predict protein half-lives from their sequences.
This model was built on [facebook/esm2_t33_650M_UR50D](https://huggingface.co/facebook/esm2_t33_650M_UR50D) and trained on a protein half-life dataset of the HeLa human cell line ([paper link](https://pubmed.ncbi.nlm.nih.gov/29414762/)).

### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/sagawatatsuya/PLTNUM
- **Paper:**
- **Demo:** https://huggingface.co/spaces/sagawa/PLTNUM

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from torch import sigmoid
import torch.nn as nn
from transformers import AutoModel, AutoConfig, PreTrainedModel, AutoTokenizer


class PLTNUM_PreTrainedModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config):
        super(PLTNUM_PreTrainedModel, self).__init__(config)
        # Load the backbone protein language model (ESM2) from the checkpoint's config.
        self.model = AutoModel.from_pretrained(self.config._name_or_path)

        # Two dropout branches; their head outputs are averaged in forward().
        self.fc_dropout1 = nn.Dropout(0.8)
        self.fc_dropout2 = nn.Dropout(0.4)
        self.fc = nn.Linear(self.config.hidden_size, 1)
        self._init_weights(self.fc)

    def _init_weights(self, module):
        # Initialize the prediction head using the backbone's initializer range.
        if isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, mean=0.0, std=self.config.initializer_range)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)
        elif isinstance(module, nn.Embedding):
            nn.init.normal_(module.weight, mean=0.0, std=self.config.initializer_range)
            if module.padding_idx is not None:
                nn.init.constant_(module.weight[module.padding_idx], 0.0)
        elif isinstance(module, nn.LayerNorm):
            nn.init.constant_(module.bias, 0)
            nn.init.constant_(module.weight, 1.0)

    def forward(self, inputs):
        outputs = self.model(**inputs)
        # Sequence-level representation: hidden state of the first (<cls>) token.
        last_hidden_state = outputs.last_hidden_state[:, 0]
        # Average the two dropout branches into a single logit per sequence.
        output = (
            self.fc(self.fc_dropout1(last_hidden_state))
            + self.fc(self.fc_dropout2(last_hidden_state))
        ) / 2
        return output

    def create_embedding(self, inputs):
        # Return the <cls> representation as a fixed-length sequence embedding.
        outputs = self.model(**inputs)
        last_hidden_state = outputs.last_hidden_state[:, 0]
        return last_hidden_state


model = PLTNUM_PreTrainedModel.from_pretrained("sagawa/PLTNUM-ESM2-HeLa")
tokenizer = AutoTokenizer.from_pretrained("sagawa/PLTNUM-ESM2-HeLa")

seq = "MSGRGKQGGKARAKAKTRSSRAGLQFPVGRVHRLLRKGNYSERVGAGAPVYLAAVLEYLTAEILELAGNAARDNKKTRIIPRHLQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLPKKTESHHKPKGK"
input = tokenizer(
    [seq],
    add_special_tokens=True,
    max_length=512,
    padding="max_length",
    truncation=True,
    return_offsets_mapping=False,
    return_attention_mask=True,
    return_tensors="pt",
)
# The model returns a logit; apply a sigmoid to obtain the predicted probability.
print(sigmoid(model(input)))
```
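
The `create_embedding` method defined above returns the backbone's representation of the first (`<cls>`) token, so the same checkpoint can also be used as a fixed-length protein sequence encoder. Below is a minimal sketch of how this might look, reusing `model`, `tokenizer`, and `seq` from the example above; the `eval()`/`torch.no_grad()` inference setup and the printed shape are illustrative assumptions, not part of the original example.

```python
import torch

# Minimal sketch: extract a fixed-length embedding for one sequence via create_embedding.
# Assumes `model`, `tokenizer`, and `seq` are already defined as in the example above.
model.eval()  # make sure dropout is disabled during inference
with torch.no_grad():
    inputs = tokenizer(
        [seq],
        add_special_tokens=True,
        max_length=512,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    embedding = model.create_embedding(inputs)

print(embedding.shape)  # e.g. torch.Size([1, 1280]) for the ESM2-650M backbone
```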

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and BibTeX information for that should go in this section. -->