WALS + RoBERTa: Setting Top Layers, Top-k, and Top Hyperparameters (Guide)
import torch
from transformers import RobertaModel, RobertaTokenizer

model = RobertaModel.from_pretrained("roberta-base", output_hidden_states=True)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Encode text and collect all hidden states
input_ids = tokenizer("example item description", return_tensors="pt").input_ids
outputs = model(input_ids)
hidden_states = outputs.hidden_states  # Tuple of 13 tensors (embedding output + 12 transformer layers)

# Take the top 4 layers (the last 4 entries of the tuple, i.e. transformer layers 9-12 for roberta-base)
top_layer_embeddings = torch.stack(hidden_states[-4:]).mean(dim=0)
Need to dive deeper? Experiment with the code snippets provided, and don't forget to share your results with the NLP community.
To align RoBERTa item representations with the WALS latent space, project them down to the factorization's latent dimension:

import torch.nn as nn
from transformers import RobertaModel

class RobertaWALSProjector(nn.Module):
    def __init__(self, roberta_dim=768, latent_dim=200):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        # Linear head mapping the 768-dim RoBERTa output to the WALS latent dimension
        self.projection = nn.Linear(roberta_dim, latent_dim)

    def forward(self, input_ids):
        # Use the pooled sequence representation as the item embedding
        roberta_out = self.roberta(input_ids).pooler_output
        return self.projection(roberta_out)
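A quick usage sketch, assuming item descriptions are available as plain strings (the texts and variable names below are illustrative, not from the original guide):

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
projector = RobertaWALSProjector(latent_dim=200)

# Hypothetical item descriptions; replace with your own catalog text
item_texts = ["wireless noise-cancelling headphones", "stainless steel water bottle"]
batch = tokenizer(item_texts, padding=True, truncation=True, return_tensors="pt")

# Attention mask omitted to match the class's forward signature above
item_embeddings = projector(batch.input_ids)  # shape: (num_items, 200)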
Use a weighted sum of the top 4 layers rather than the final layer only; this preserves both syntactic (lower-layer) and semantic (upper-layer) information.
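One way to implement the weighted sum is an ELMo-style scalar mix with learnable, softmax-normalized layer weights. This is a sketch under that assumption, since the text does not specify how the weights are chosen:

import torch
import torch.nn as nn

class TopLayerMix(nn.Module):
    """Combine the top N hidden states with learnable softmax-normalized scalar weights."""
    def __init__(self, num_layers=4):
        super().__init__()
        self.num_layers = num_layers
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: tuple of 13 tensors from output_hidden_states=True
        top = torch.stack(hidden_states[-self.num_layers:])  # (num_layers, batch, seq, 768)
        w = torch.softmax(self.weights, dim=0)                # (num_layers,)
        return (w[:, None, None, None] * top).sum(dim=0)      # (batch, seq, 768)

# Example: mixed = TopLayerMix()(outputs.hidden_states)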
3.2 Setting the Top-k for WALS Predictions
WALS produces a score for every (user, item) pair, but in production you only return the top-k items, and the way you set this interacts with the RoBERTa embeddings. When setting top-k, compute the similarity between the WALS user factors and the projected RoBERTa item embeddings; the predictions are the items with the highest dot products, as in the sketch below.
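A minimal sketch of that retrieval step, assuming user_factors holds the trained WALS user factors and item_embeddings holds the projected RoBERTa item embeddings (names are illustrative):

import torch

def top_k_items(user_factors, item_embeddings, k=10):
    # user_factors: (num_users, latent_dim) WALS user factors
    # item_embeddings: (num_items, latent_dim) projected RoBERTa item embeddings
    scores = user_factors @ item_embeddings.T  # (num_users, num_items) dot-product scores
    top_scores, top_indices = torch.topk(scores, k, dim=1)
    return top_scores, top_indices

# Example: top_scores, top_indices = top_k_items(user_factors, item_embeddings, k=10)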
3.3 Setting the Top Hyperparameters (The SOTA Configuration)
To get top performance on benchmarks like Amazon Reviews or MovieLens with WALS + RoBERTa, use these hyperparameters:
