
Bilingual Language Model for Next Token Prediction

Overview

This project builds a neural network language model for next-token prediction in two languages: English and Hebrew. The model uses an LSTM (Long Short-Term Memory) architecture, a type of recurrent neural network (RNN), and predicts the next word in a sequence based on the training data provided. Model quality is evaluated with the perplexity metric.

The final model and checkpoints are provided, along with training history including perplexity and loss values.

Model Architecture

  • Embedding Layer: Converts tokenized words into dense vector representations.
  • LSTM Layer: Consists of 128 units to capture long-term dependencies in the sequence data.
  • Dense Output Layer: Outputs a probability distribution over the vocabulary to predict the next word.
  • Total Vocabulary Size: The combined English and Hebrew vocabulary contains [total_words] tokens.
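
A minimal Keras sketch consistent with the points above (not the exact implementation: the embedding dimension and the placeholder values for total_words and max_sequence_len are assumptions, since they are not stated in this card):

    # Sketch of the described architecture; embedding_dim and the placeholder sizes are assumptions.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    total_words = 10000       # placeholder: actual vocabulary size of the combined corpus
    max_sequence_len = 20     # placeholder: length of the padded input sequences
    embedding_dim = 100       # assumption: the embedding dimension is not specified in this card

    model = Sequential([
        # Embedding layer: token indices -> dense vectors
        Embedding(total_words, embedding_dim),
        # LSTM layer with 128 units to capture long-range dependencies
        LSTM(128),
        # Dense softmax output: probability distribution over the vocabulary
        Dense(total_words, activation="softmax"),
    ])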

Dataset

The model is trained using a combination of English and Hebrew text datasets. The input sequences are tokenized and padded to ensure consistent input length for training the model.
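
A rough sketch of that preprocessing step, assuming the standard Keras Tokenizer and pad_sequences utilities and an n-gram style sequence construction (the corpus variables english_lines and hebrew_lines are hypothetical):

    # Hypothetical preprocessing sketch: tokenize the combined corpus and pad the n-gram sequences.
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    corpus = english_lines + hebrew_lines   # hypothetical: lists of sentences in each language

    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(corpus)
    total_words = len(tokenizer.word_index) + 1

    # Build n-gram input sequences: for each sentence, every prefix of length >= 2.
    input_sequences = []
    for line in corpus:
        token_list = tokenizer.texts_to_sequences([line])[0]
        for i in range(2, len(token_list) + 1):
            input_sequences.append(token_list[:i])

    # Pad all sequences to the same length so they can be batched.
    max_sequence_len = max(len(seq) for seq in input_sequences)
    input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_len, padding="pre")

    # The last token of each padded sequence is the prediction target.
    X, y = input_sequences[:, :-1], input_sequences[:, -1]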

Training

The model was trained with the following parameters:

  • Optimizer: Adam
  • Loss Function: Categorical Crossentropy
  • Batch Size: 64
  • Epochs: 20
  • Validation Split: 20%
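
A compile-and-fit sketch using exactly these parameters (X, y, total_words, and model are carried over from the earlier sketches):

    # Training sketch with the parameters listed above.
    from tensorflow.keras.utils import to_categorical

    # Categorical crossentropy expects one-hot targets over the vocabulary.
    y_onehot = to_categorical(y, num_classes=total_words)

    model.compile(optimizer="adam", loss="categorical_crossentropy")

    history = model.fit(
        X, y_onehot,
        batch_size=64,
        epochs=20,
        validation_split=0.2,
    )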

Evaluation Metric: Perplexity

Perplexity is used to measure the model's performance, with lower perplexity indicating better generalization to unseen data. The final perplexity scores are:

  • Final Training Perplexity: [Final Training Perplexity]
  • Final Validation Perplexity: [Final Validation Perplexity]
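
Perplexity is the exponential of the cross-entropy loss, so it can be read off the training history directly. A small sketch of that conversion, using the history object from the training sketch above:

    import numpy as np

    # Perplexity = exp(cross-entropy loss), since Keras crossentropy uses the natural log.
    train_perplexity = np.exp(history.history["loss"][-1])
    val_perplexity = np.exp(history.history["val_loss"][-1])
    print(f"Final training perplexity:   {train_perplexity:.2f}")
    print(f"Final validation perplexity: {val_perplexity:.2f}")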

Checkpoints

A checkpoint mechanism is used to save the model at its best-performing stage based on validation loss. The best model checkpoint (best_model.keras) is included, which can be loaded for inference.
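
As a rough illustration, assuming the standard Keras ModelCheckpoint callback monitoring validation loss (the exact callback configuration is not stated in this card), the saving and reloading steps might look like:

    from tensorflow.keras.callbacks import ModelCheckpoint
    from tensorflow.keras.models import load_model

    # Keep only the best model (lowest validation loss) seen during training.
    checkpoint = ModelCheckpoint(
        "best_model.keras",
        monitor="val_loss",
        save_best_only=True,
    )
    # Pass the callback to model.fit(..., callbacks=[checkpoint]) during training.

    # Later, reload the best checkpoint for inference.
    best_model = load_model("best_model.keras")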

Results

The model handles next-token prediction for both English and Hebrew; the final training and validation perplexity values above, together with the training history shipped alongside the checkpoints, quantify its performance.

How to Use

To use this model, follow these steps:

  1. Clone the repository:
    git clone https://huggingface.co/username/model-name
    cd model-name
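
   The remaining steps are not spelled out in this card. A hedged inference sketch, assuming the tokenizer and max_sequence_len from the preprocessing sketch above are available, might look like:

    import numpy as np
    from tensorflow.keras.models import load_model
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Load the best checkpoint shipped with this repository.
    model = load_model("best_model.keras")

    def predict_next_word(seed_text, tokenizer, max_sequence_len):
        """Predict the most likely next token for a seed phrase (assumes the training tokenizer)."""
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding="pre")
        predicted_id = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
        # Map the predicted index back to a word.
        return tokenizer.index_word.get(predicted_id, "")

    # Example call with a hypothetical seed phrase:
    # print(predict_next_word("the quick brown", tokenizer, max_sequence_len))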
    