
Attention layers are integral to the Transformer architecture. The paper introducing the Transformer was titled “Attention Is All You Need,” highlighting the importance of attention layers.
Function of attention layers: These layers direct the model to focus on specific words in a sentence, while downplaying the importance of others.
Contextual meaning: The meaning of a word depends not only on the word itself but also on its context, which includes other words around it.
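To make this concrete, here is a minimal sketch (not from the original paper's codebase) of scaled dot-product attention, the core operation inside an attention layer; the names and toy dimensions are illustrative:
import torch
import torch.nn.functional as F
def scaled_dot_product_attention(q, k, v):
    # scores: how relevant each word (key) is to each word (query)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    weights = F.softmax(scores, dim=-1)  # attention weights sum to 1 per query
    return weights @ v                   # weighted mix of the value vectors
x = torch.randn(3, 4)                        # 3 tokens, 4-dim toy representations
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # torch.Size([3, 4])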
The Transformer architecture was initially designed for translation.
Decoder models use only the decoder part of a Transformer model.
Attention mechanism: At each stage, the attention layers can only access the words that are positioned before the current word in the sentence.
These models are often referred to as auto-regressive models.
Pretraining: Typically focuses on predicting the next word in a sentence.
Best suited for: Tasks involving text generation.
Examples of decoder models: GPT, GPT-2, GPT-3.

Step 1: Input Embedding
Converts words into numerical vectors:
Text: "The cat eats"
↓
Tokens: [The] [cat] [eats]
↓
Each becomes a dense vector (e.g., 768 numbers)
[The] → [0.23, -0.45, 0.67, ..., 0.12]
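A minimal sketch of this lookup in PyTorch (the token IDs below are made up for illustration):
import torch
import torch.nn as nn
vocab_size, d_model = 30522, 768       # e.g., BERT's vocabulary and hidden size
embedding = nn.Embedding(vocab_size, d_model)
token_ids = torch.tensor([[101, 4937, 20323]])  # hypothetical IDs for three tokens
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 3, 768]): one 768-dim vector per token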
Step 2: Positional Encoding
Adds position information since transformers don’t inherently understand word order:
Position 0: Gets encoding vector
Position 1: Gets different encoding vector
Position 2: Gets another different encoding vector
Final = Word Embedding + Position Encoding
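A sketch of the sinusoidal encoding from the original Transformer paper (GPT and BERT instead learn position embeddings, but the idea of adding a position-dependent vector is the same):
import torch
def sinusoidal_positions(num_positions, d_model):
    pos = torch.arange(num_positions).unsqueeze(1).float()
    i = torch.arange(0, d_model, 2).float()
    angles = pos / 10000 ** (i / d_model)
    enc = torch.zeros(num_positions, d_model)
    enc[:, 0::2] = torch.sin(angles)   # even dimensions get sine
    enc[:, 1::2] = torch.cos(angles)   # odd dimensions get cosine
    return enc
pos_enc = sinusoidal_positions(3, 768)
# final input = word embedding + position encoding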
Step 3: Transformer Block (repeated N times)
3.1 - Masked Multi-Head Attention
The Causal Mask:
When processing "eats" at position 2:
Can see: [The] [cat] [eats]
Cannot see: [the] [mouse] (future tokens)
Attention mask matrix:
The cat eats the mouse
The [ ✓ ✗ ✗ ✗ ✗ ]
cat [ ✓ ✓ ✗ ✗ ✗ ]
eats [ ✓ ✓ ✓ ✗ ✗ ]
the [ ✓ ✓ ✓ ✓ ✗ ]
mouse [ ✓ ✓ ✓ ✓ ✓ ]
(✓ = can see, ✗ = blocked)
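In code, this mask is typically a lower-triangular matrix applied to the attention scores before the softmax; a minimal sketch:
import torch
seq_len = 5  # "The cat eats the mouse"
mask = torch.tril(torch.ones(seq_len, seq_len))        # 1 = can see, 0 = blocked
scores = torch.randn(seq_len, seq_len)                 # raw attention scores
scores = scores.masked_fill(mask == 0, float("-inf"))  # blocked positions get -inf
weights = torch.softmax(scores, dim=-1)                # -inf becomes probability 0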
Why “Multi-Head”? Having multiple attention heads (e.g., 12) allows the model to focus on different aspects simultaneously:
- Head 1: Subject-verb relationships
- Head 2: Semantic meaning
- Head 3: Long-range dependencies
- etc.
3.2 - Add & Norm
Two important techniques:
- Residual Connection: Adds the input back to the output (prevents information loss in deep networks)
- Layer Normalization: Stabilizes the numbers to prevent them from getting too large or small
Think of it as: “Keep the original information and just add the new insights”
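A sketch of how the two pieces combine in code (the dimension is illustrative):
import torch.nn as nn
norm = nn.LayerNorm(768)
def add_and_norm(x, sublayer_output):
    # residual connection keeps the original signal; LayerNorm stabilizes it
    return norm(x + sublayer_output)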
3.3 - Feed Forward Network
A simple neural network applied to each position independently:
- Expands the representation (768 → 3072 dimensions)
- Applies non-linear transformation
- Compresses back (3072 → 768 dimensions)
This allows the model to process the attended information and extract higher-level features.
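A sketch of this position-wise block with the dimensions above:
import torch.nn as nn
ffn = nn.Sequential(
    nn.Linear(768, 3072),  # expand
    nn.GELU(),             # non-linear transformation (GPT-2 and BERT use GELU)
    nn.Linear(3072, 768),  # compress back
)
# the same ffn is applied independently to the vector at each position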
Step 4: Stacking Layers
These blocks repeat many times:
- GPT-2: 12-48 layers
- GPT-3: 96 layers
Each layer refines understanding:
- Early layers: Grammar, syntax, word relationships
- Middle layers: Meaning, context, semantic relationships
- Late layers: Abstract reasoning, global context
Step 5: Output Prediction
The final layer produces probabilities for the next word:
Current text: "The cat eats"
Probability for next word:
[the] : 0.001
[a] : 0.089
[fish] : 0.156
[mice] : 0.234 ← Most likely
[quickly]: 0.078
...
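You can inspect such a distribution directly with a pretrained decoder model; a sketch using GPT-2 (the numbers above are illustrative, so the actual top tokens will differ):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The cat eats", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (batch, seq_len, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(tokenizer.decode(int(idx)), float(p))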
Encoder models use only the encoder part of a Transformer model.
Attention mechanism: At each stage, the attention layers can access all the words in the sentence.
These models are characterized by bi-directional attention and are often referred to as auto-encoding models. They process text bidirectionally, seeing both past and future words, and are designed for understanding rather than generation.
Pretraining: Typically involves corrupting a sentence (e.g., by masking random words) and tasking the model with reconstructing the original sentence.
Best suited for: Tasks requiring a full understanding of the sentence, such as sentence classification, named entity recognition, and extractive question answering.
Examples of encoder models: BERT, DistilBERT, RoBERTa.

Step 1: Input Embedding
Similar to GPT, but adds special tokens:
Text: "The cat eats the mouse"
↓
Tokens: [CLS] [The] [cat] [eats] [the] [mouse] [SEP]
[CLS] = Classification token (for sentence-level tasks)
[SEP] = Separator (for sentence pairs)
Step 2: Positional Encoding
Same as GPT - adds position information.
Step 3: Transformer Block
3.1 - Multi-Head Attention (NO MASK)
KEY DIFFERENCE: Bidirectional attention
When processing "cat" at position 2:
Can see: [CLS] [The] [cat] [eats] [the] [mouse] [SEP]
↑ ↑ ↑ ↑ ↑ ↑ ↑
ALL tokens visible (past AND future)
Attention matrix (no masking):
CLS The cat eats the mouse SEP
CLS [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ]
The [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ]
cat [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ] ← Can see everything!
eats [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ]
the [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ]
mouse [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ]
SEP [ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ]
Example attention for “eats”:
Can attend to:
- "cat" (subject before): 0.35
- "mouse" (object after): 0.40 ← Can see future!
- "the" (determiner after): 0.15
- others: 0.10
The model understands full context from both directions, making it better at understanding what words mean in context.
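This bidirectional context is exactly what BERT's masked-word pretraining exploits; a quick sketch with the fill-mask pipeline:
from transformers import pipeline
unmasker = pipeline("fill-mask", model="bert-base-uncased")
# BERT uses the words on BOTH sides of [MASK] to reconstruct it
for pred in unmasker("The cat [MASK] the mouse."):
    print(pred["token_str"], round(pred["score"], 3))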
3.2 - Add & Norm
Same as GPT - residual connections and normalization.
3.3 - Feed Forward
Same as GPT - expand, transform, compress.
Step 4: Stacking Layers
- BERT-base: 12 layers
- BERT-large: 24 layers
Each layer builds deeper understanding of the full sentence context.
Step 5: Task-Specific Output
No autoregressive generation!
BERT produces a rich representation for each token:
[CLS] → vector ← Used for sentence classification
[The] → vector ← Used for word-level tasks
[cat] → vector ← Used for named entity recognition
[eats] → vector
...
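A sketch of extracting these per-token vectors with the bare encoder (no task head):
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("The cat eats the mouse", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
hidden = outputs.last_hidden_state  # (1, num_tokens, 768): one vector per token
cls_vector = hidden[:, 0]           # the [CLS] vector, used for sentence-level tasks
print(hidden.shape, cls_vector.shape)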
Encoder-decoder models (also known as sequence-to-sequence models) use both parts of the Transformer architecture.
Attention mechanism: The encoder's attention layers can access all the words in the input sentence, while the decoder's attention layers can only access the words positioned before a given word.
Best suited for: Tasks that involve generating new sentences based on a given input, such as summarization, translation, and generative question answering.
Examples of encoder-decoder models: T5, BART, Marian.
Here is a first small example:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.optim import AdamW
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
sequences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "This course is amazing!",
]
batch = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")
batch["labels"] = torch.tensor([1, 1]) # Set labels, here both sequence are labelled as 1
optimizer = AdamW(model.parameters())
loss = model(**batch).loss
loss.backward()
optimizer.step()
The Hub contains not only models but also multiple datasets in lots of different languages (https://huggingface.co/datasets).
MRPC dataset: This is one of the 10 datasets composing the GLUE benchmark, an academic benchmark used to measure the performance of ML models across 10 different text classification tasks.
from datasets import load_dataset
raw_datasets = load_dataset("glue", "mrpc")
raw_datasets
## DatasetDict({
## train: Dataset({
## features: ['sentence1', 'sentence2', 'label', 'idx'],
## num_rows: 3668
## })
## validation: Dataset({
## features: ['sentence1', 'sentence2', 'label', 'idx'],
## num_rows: 408
## })
## test: Dataset({
## features: ['sentence1', 'sentence2', 'label', 'idx'],
## num_rows: 1725
## })
## })
raw_train_dataset = raw_datasets["train"]
raw_train_dataset[0]
## {'sentence1': 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .', 'sentence2': 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .', 'label': 1, 'idx': 0}
raw_train_dataset.features
## {'sentence1': Value('string'), 'sentence2': Value('string'), 'label': ClassLabel(names=['not_equivalent', 'equivalent']), 'idx': Value('int32')}
raw_datasets["train"]["sentence1"][:3]
## ['Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .', "Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion .", 'They had published an advertisement on the Internet on June 10 , offering the cargo for sale , he added .']
To preprocess the dataset, we need to convert the text to numbers the model can make sense of. This is done with a tokenizer. We can feed the tokenizer one sentence or a list of sentences, so we can directly tokenize all the first sentences and all the second sentences of each pair like this:
from transformers import AutoTokenizer
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenized_sentences_1 = tokenizer(list(raw_datasets["train"]["sentence1"]))
tokenized_sentences_2 = tokenizer(list(raw_datasets["train"]["sentence2"]))
inputs = tokenizer("This is the first sentence.", "This is the second one.")
inputs
## {'input_ids': [101, 2023, 2003, 1996, 2034, 6251, 1012, 102, 2023, 2003, 1996, 2117, 2028, 1012, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
tokenizer.convert_ids_to_tokens(inputs["input_ids"])
## ['[CLS]', 'this', 'is', 'the', 'first', 'sentence', '.', '[SEP]', 'this', 'is', 'the', 'second', 'one', '.', '[SEP]']
The tokens of [CLS] sentence1 [SEP] have a token type ID of 0, while the tokens of sentence2 [SEP] have a token type ID of 1. We can pass both sentence lists to the tokenizer at once:
tokenized_dataset = tokenizer(
    list(raw_datasets["train"]["sentence1"]),
    list(raw_datasets["train"]["sentence2"]),
    padding=True,
    truncation=True,
)
This works well, but it returns a plain dictionary (with the keys input_ids, attention_mask, and token_type_ids, and values as lists of lists). A more flexible option is the Dataset.map() method: map() applies a function to each element in the dataset, enabling customized tokenization functions.
def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
tokenize_function( raw_datasets["train"][0])
## {'input_ids': [101, 2572, 3217, 5831, 5496, 2010, 2567, 1010, 3183, 2002, 2170, 1000, 1996, 7409, 1000, 1010, 1997, 9969, 4487, 23809, 3436, 2010, 3350, 1012, 102, 7727, 2000, 2032, 2004, 2069, 1000, 1996, 7409, 1000, 1010, 2572, 3217, 5831, 5496, 2010, 2567, 1997, 9969, 4487, 23809, 3436, 2010, 3350, 1012, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
The function returns a dictionary with the keys input_ids, attention_mask, and token_type_ids. Passing batched=True in the map() call enhances tokenization speed by processing multiple samples at once, and the Datasets library adds new fields to each dataset based on the keys in the returned dictionary, for efficient preprocessing.
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
tokenized_datasets
## DatasetDict({
## train: Dataset({
## features: ['sentence1', 'sentence2', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
## num_rows: 3668
## })
## validation: Dataset({
## features: ['sentence1', 'sentence2', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
## num_rows: 408
## })
## test: Dataset({
## features: ['sentence1', 'sentence2', 'label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
## num_rows: 1725
## })
## })
The same preprocessing workflow can be applied to a custom dataset, for example one built from local files:
import pandas as pd
# load the raw texts and the novelty scores, then merge them on the document id
df = pd.read_csv('Data/text1995.csv')
data_wang = pd.read_json('Data/1995.json')
data_wang['score'] = data_wang['items_wang_3_restricted50'].apply(lambda x: x['score']['novelty'])
df = pd.merge(df[['id', 'text']], data_wang[['id', 'score']], on='id', how='inner')
# keep every positively-scored document as the positive class (label 1)
positive_score_subset = df[df['score'] > 0].copy()
num_positive = positive_score_subset.shape[0]
positive_score_subset['score'] = 1
# sample an equally sized negative class (label 0) from the zero-scored documents
zero_score_subset = df[df['score'] == 0].sample(n=num_positive, random_state=42)
balanced_df = pd.concat([positive_score_subset, zero_score_subset])
# shuffle the rows and make sure the label is an integer
balanced_df = balanced_df.sample(frac=1, random_state=24).reset_index(drop=True)
balanced_df['score'] = balanced_df['score'].astype(int)
from sklearn.model_selection import train_test_split
balanced_df = balanced_df[['id', 'score', 'text']].set_index('id')
# 60/20/20 train/validation/test split
train_df, temp_df = train_test_split(balanced_df, test_size=0.4, random_state=42)
valid_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)
from datasets import Dataset, DatasetDict, load_dataset
train_dataset = Dataset.from_pandas(train_df)
valid_dataset = Dataset.from_pandas(valid_df)
test_dataset = Dataset.from_pandas(test_df)
datasets = DatasetDict({
    'train': train_dataset,
    'validation': valid_dataset,
    'test': test_dataset,
})
datasets.save_to_disk('Data/test_novelty')
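A quick sanity check on the result (this assumes the dataframes built above):
print(balanced_df['score'].value_counts())         # the two classes should be the same size
print(len(train_df), len(valid_df), len(test_df))  # roughly a 60/20/20 split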
When batching, sequences of different lengths must be padded to the same length. It is more efficient to pad each batch only to the length of its longest sequence (dynamic padding) than to pad the whole dataset to one global length; the Transformers library provides DataCollatorWithPadding for this purpose.
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
samples = tokenized_datasets["train"][:8]
samples = {k: v for k, v in samples.items() if k not in ["idx", "sentence1", "sentence2"]}
[len(x) for x in samples["input_ids"]]
## [50, 59, 47, 67, 59, 50, 62, 32]
Checking the tensor shapes produced by the data_collator confirms proper application of dynamic padding for the batch:
batch = data_collator(samples)
{k: v.shape for k, v in batch.items()}
## {'input_ids': torch.Size([8, 67]), 'token_type_ids': torch.Size([8, 67]), 'attention_mask': torch.Size([8, 67]), 'labels': torch.Size([8])}
Trainer.train() on a CPU is very slow; a GPU is recommended.
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding
raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
The first step is to define the training configuration in a TrainingArguments class.
from transformers import TrainingArguments
training_args = TrainingArguments("test-trainer0")
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
The Trainer is built from:
- the model to fine-tune;
- training_args: settings and configurations for the training process;
- the training and validation datasets;
- data_collator: a function to collate batches of data;
- tokenizer: to process text inputs.
from transformers import Trainer
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
## <string>:2: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
Note that no evaluation happens during training here, because evaluation_strategy was not set to "steps" (evaluate every eval_steps) or "epoch" (evaluate at the end of each epoch).
compute_metrics() function:
Goal: Build a compute_metrics() function to use during model training.
Requirements: it takes an EvalPrediction object (a named tuple with a predictions field and a label_ids field) and returns a dictionary mapping metric names to values.
Usage: First, use Trainer.predict() to generate model predictions.
predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)
## (408, 2) (408,)
The output of predict() also contains a metrics field; once compute_metrics() is defined and passed to the Trainer, that field also includes the metrics returned by compute_metrics(). For now, let's compute accuracy and F1 by hand:
import numpy as np
preds = np.argmax(predictions.predictions, axis=-1)
import evaluate
metric = evaluate.load("glue", "mrpc")
metric.compute(predictions=preds, references=predictions.label_ids)
## {'accuracy': 0.6838235294117647, 'f1': 0.8122270742358079}
print("Preds:", preds[:10])
## Preds: [1 1 1 1 1 1 1 1 1 1]
print("Labels:", predictions.label_ids[:10])
## Labels: [1 0 0 1 0 1 0 1 1 1]
print("Preds type:", type(preds[0]))
## Preds type: <class 'numpy.int64'>
print("Labels type:", type(predictions.label_ids[0]))
## Labels type: <class 'numpy.int64'>
print("Unique values in preds:", np.unique(preds))
## Unique values in preds: [1]
print("Unique values in labels:", np.unique(predictions.label_ids))
## Unique values in labels: [0 1]
def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
# use Apple's MPS backend if available, otherwise fall back to the CPU
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
training_args = TrainingArguments("test-trainer", eval_strategy="epoch")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
model.to(device)
## BertForSequenceClassification(
## (bert): BertModel(
## (embeddings): BertEmbeddings(
## (word_embeddings): Embedding(30522, 768, padding_idx=0)
## (position_embeddings): Embedding(512, 768)
## (token_type_embeddings): Embedding(2, 768)
## (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
## (dropout): Dropout(p=0.1, inplace=False)
## )
## (encoder): BertEncoder(
## (layer): ModuleList(
## (0-11): 12 x BertLayer(
## (attention): BertAttention(
## (self): BertSdpaSelfAttention(
## (query): Linear(in_features=768, out_features=768, bias=True)
## (key): Linear(in_features=768, out_features=768, bias=True)
## (value): Linear(in_features=768, out_features=768, bias=True)
## (dropout): Dropout(p=0.1, inplace=False)
## )
## (output): BertSelfOutput(
## (dense): Linear(in_features=768, out_features=768, bias=True)
## (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
## (dropout): Dropout(p=0.1, inplace=False)
## )
## )
## (intermediate): BertIntermediate(
## (dense): Linear(in_features=768, out_features=3072, bias=True)
## (intermediate_act_fn): GELUActivation()
## )
## (output): BertOutput(
## (dense): Linear(in_features=3072, out_features=768, bias=True)
## (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
## (dropout): Dropout(p=0.1, inplace=False)
## )
## )
## )
## )
## (pooler): BertPooler(
## (dense): Linear(in_features=768, out_features=768, bias=True)
## (activation): Tanh()
## )
## )
## (dropout): Dropout(p=0.1, inplace=False)
## (classifier): Linear(in_features=768, out_features=2, bias=True)
## )
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
## <string>:2: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
This time the TrainingArguments object is created with its evaluation strategy parameter set to "epoch", so the model is evaluated at the end of each epoch.
trainer.train()
## 0%| | 0/1377 [00:00<?, ?it/s]/Users/peltouz/Documents/GitHub/M2-Py-DS2E/hf/lib/python3.13/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
## warnings.warn(warn_msg)
## 0%| | 1/1377 [00:00<04:01, 5.69it/s] 0%| | 2/1377 [00:00<03:37, 6.33it/s] 0%| | 3/1377 [00:00<03:09, 7.26it/s] 0%| | 4/1377 [00:00<02:56, 7.76it/s] 0%| | 5/1377 [00:00<02:56, 7.78it/s] 0%| | 6/1377 [00:00<02:52, 7.94it/s] 1%| | 8/1377 [00:01<02:36, 8.72it/s] 1%| | 9/1377 [00:01<02:37, 8.68it/s] 1%| | 10/1377 [00:01<02:39, 8.57it/s] 1%| | 11/1377 [00:01<02:34, 8.83it/s] 1%| | 12/1377 [00:01<02:32, 8.93it/s] 1%|1 | 14/1377 [00:01<02:28, 9.17it/s] 1%|1 | 15/1377 [00:01<02:29, 9.11it/s] 1%|1 | 16/1377 [00:01<02:31, 9.00it/s] 1%|1 | 17/1377 [00:02<02:29, 9.07it/s] 1%|1 | 18/1377 [00:02<02:28, 9.14it/s] 1%|1 | 19/1377 [00:02<02:33, 8.87it/s] 1%|1 | 20/1377 [00:02<02:31, 8.93it/s] 2%|1 | 21/1377 [00:02<02:39, 8.50it/s] 2%|1 | 22/1377 [00:02<02:48, 8.07it/s] 2%|1 | 23/1377 [00:02<02:41, 8.37it/s] 2%|1 | 24/1377 [00:02<02:51, 7.89it/s] 2%|1 | 25/1377 [00:02<02:50, 7.93it/s] 2%|1 | 26/1377 [00:03<02:42, 8.29it/s] 2%|1 | 27/1377 [00:03<02:43, 8.23it/s] 2%|2 | 28/1377 [00:03<02:47, 8.06it/s] 2%|2 | 29/1377 [00:03<02:42, 8.32it/s] 2%|2 | 30/1377 [00:03<02:48, 7.99it/s] 2%|2 | 31/1377 [00:03<02:42, 8.27it/s] 2%|2 | 32/1377 [00:03<02:43, 8.23it/s] 2%|2 | 33/1377 [00:03<02:38, 8.46it/s] 2%|2 | 34/1377 [00:04<02:34, 8.68it/s] 3%|2 | 35/1377 [00:04<02:32, 8.82it/s] 3%|2 | 36/1377 [00:04<02:35, 8.61it/s] 3%|2 | 37/1377 [00:04<02:31, 8.87it/s] 3%|2 | 38/1377 [00:04<02:30, 8.92it/s] 3%|2 | 39/1377 [00:04<02:30, 8.90it/s] 3%|2 | 41/1377 [00:04<02:22, 9.40it/s] 3%|3 | 42/1377 [00:04<02:26, 9.11it/s] 3%|3 | 43/1377 [00:05<02:33, 8.69it/s] 3%|3 | 44/1377 [00:05<02:33, 8.67it/s] 3%|3 | 45/1377 [00:05<02:30, 8.86it/s] 3%|3 | 46/1377 [00:05<02:28, 8.95it/s] 3%|3 | 48/1377 [00:05<02:21, 9.40it/s] 4%|3 | 49/1377 [00:05<02:24, 9.17it/s] 4%|3 | 50/1377 [00:05<02:26, 9.08it/s] 4%|3 | 51/1377 [00:05<02:22, 9.31it/s] 4%|3 | 52/1377 [00:06<02:22, 9.27it/s] 4%|3 | 53/1377 [00:06<02:20, 9.43it/s] 4%|3 | 54/1377 [00:06<02:21, 9.36it/s] 4%|4 | 56/1377 [00:06<02:17, 9.59it/s] 4%|4 | 57/1377 [00:06<02:20, 9.36it/s] 4%|4 | 58/1377 [00:06<02:23, 9.19it/s] 4%|4 | 59/1377 [00:06<02:23, 9.16it/s] 4%|4 | 61/1377 [00:06<02:19, 9.41it/s] 5%|4 | 62/1377 [00:07<02:24, 9.13it/s] 5%|4 | 63/1377 [00:07<02:24, 9.12it/s] 5%|4 | 64/1377 [00:07<02:22, 9.18it/s] 5%|4 | 65/1377 [00:07<02:24, 9.07it/s] 5%|4 | 66/1377 [00:07<02:22, 9.18it/s] 5%|4 | 67/1377 [00:07<02:22, 9.22it/s] 5%|4 | 68/1377 [00:07<02:20, 9.29it/s] 5%|5 | 69/1377 [00:07<02:20, 9.28it/s] 5%|5 | 70/1377 [00:07<02:18, 9.45it/s] 5%|5 | 72/1377 [00:08<02:15, 9.61it/s] 5%|5 | 74/1377 [00:08<02:13, 9.78it/s] 5%|5 | 75/1377 [00:08<02:17, 9.50it/s] 6%|5 | 76/1377 [00:08<02:17, 9.48it/s] 6%|5 | 77/1377 [00:08<02:16, 9.52it/s] 6%|5 | 78/1377 [00:08<02:17, 9.48it/s] 6%|5 | 79/1377 [00:08<02:17, 9.45it/s] 6%|5 | 81/1377 [00:09<02:14, 9.67it/s] 6%|5 | 82/1377 [00:09<02:16, 9.46it/s] 6%|6 | 83/1377 [00:09<02:16, 9.46it/s] 6%|6 | 84/1377 [00:09<02:15, 9.53it/s] 6%|6 | 85/1377 [00:09<02:14, 9.57it/s] 6%|6 | 86/1377 [00:09<02:14, 9.60it/s] 6%|6 | 87/1377 [00:09<02:31, 8.49it/s] 6%|6 | 88/1377 [00:09<02:33, 8.40it/s] 6%|6 | 89/1377 [00:10<02:32, 8.43it/s] 7%|6 | 90/1377 [00:10<02:26, 8.81it/s] 7%|6 | 91/1377 [00:10<02:21, 9.09it/s] 7%|6 | 93/1377 [00:10<02:15, 9.47it/s] 7%|6 | 94/1377 [00:10<02:14, 9.50it/s] 7%|6 | 96/1377 [00:10<02:11, 9.76it/s] 7%|7 | 97/1377 [00:10<02:11, 9.77it/s] 7%|7 | 98/1377 [00:10<02:13, 9.60it/s] 7%|7 | 99/1377 [00:11<02:12, 9.62it/s] 7%|7 | 100/1377 [00:11<02:12, 9.60it/s] 7%|7 | 101/1377 [00:11<02:12, 9.61it/s] 7%|7 | 103/1377 [00:11<02:10, 9.73it/s] 8%|7 | 104/1377 
[00:11<02:24, 8.84it/s] 8%|7 | 105/1377 [00:11<02:22, 8.94it/s] 8%|7 | 106/1377 [00:11<02:20, 9.02it/s] 8%|7 | 107/1377 [00:11<02:21, 9.00it/s] 8%|7 | 109/1377 [00:12<02:11, 9.63it/s] 8%|7 | 110/1377 [00:12<02:12, 9.54it/s] 8%|8 | 111/1377 [00:12<02:12, 9.53it/s] 8%|8 | 112/1377 [00:12<02:12, 9.55it/s] 8%|8 | 113/1377 [00:12<02:13, 9.50it/s] 8%|8 | 114/1377 [00:12<02:14, 9.41it/s] 8%|8 | 115/1377 [00:12<02:16, 9.27it/s] 8%|8 | 116/1377 [00:12<02:15, 9.32it/s] 8%|8 | 117/1377 [00:12<02:13, 9.45it/s] 9%|8 | 118/1377 [00:13<02:13, 9.46it/s] 9%|8 | 119/1377 [00:13<02:12, 9.46it/s] 9%|8 | 120/1377 [00:13<02:16, 9.19it/s] 9%|8 | 121/1377 [00:13<02:15, 9.24it/s] 9%|8 | 122/1377 [00:13<02:15, 9.23it/s] 9%|8 | 123/1377 [00:13<02:22, 8.79it/s] 9%|9 | 124/1377 [00:13<02:21, 8.84it/s] 9%|9 | 125/1377 [00:13<02:24, 8.68it/s] 9%|9 | 127/1377 [00:14<02:18, 9.03it/s] 9%|9 | 128/1377 [00:14<02:28, 8.40it/s] 9%|9 | 129/1377 [00:14<02:26, 8.53it/s] 9%|9 | 130/1377 [00:14<02:22, 8.74it/s] 10%|9 | 131/1377 [00:14<02:28, 8.39it/s] 10%|9 | 132/1377 [00:14<02:36, 7.97it/s] 10%|9 | 133/1377 [00:14<02:29, 8.31it/s] 10%|9 | 134/1377 [00:14<02:26, 8.50it/s] 10%|9 | 135/1377 [00:15<02:21, 8.78it/s] 10%|9 | 136/1377 [00:15<02:16, 9.07it/s] 10%|9 | 137/1377 [00:15<02:15, 9.12it/s] 10%|# | 139/1377 [00:15<02:11, 9.39it/s] 10%|# | 140/1377 [00:15<02:17, 8.99it/s] 10%|# | 141/1377 [00:15<02:20, 8.79it/s] 10%|# | 142/1377 [00:15<02:16, 9.05it/s] 10%|# | 143/1377 [00:15<02:16, 9.03it/s] 11%|# | 145/1377 [00:16<02:11, 9.34it/s] 11%|# | 146/1377 [00:16<02:11, 9.39it/s] 11%|# | 147/1377 [00:16<02:10, 9.44it/s] 11%|# | 149/1377 [00:16<02:10, 9.41it/s] 11%|# | 150/1377 [00:16<02:10, 9.40it/s] 11%|# | 151/1377 [00:16<02:11, 9.32it/s] 11%|#1 | 152/1377 [00:16<02:13, 9.20it/s] 11%|#1 | 153/1377 [00:16<02:14, 9.09it/s] 11%|#1 | 154/1377 [00:17<02:12, 9.24it/s] 11%|#1 | 155/1377 [00:17<02:19, 8.78it/s] 11%|#1 | 156/1377 [00:17<02:14, 9.09it/s] 11%|#1 | 157/1377 [00:17<02:15, 9.02it/s] 11%|#1 | 158/1377 [00:17<02:14, 9.05it/s] 12%|#1 | 160/1377 [00:17<02:05, 9.73it/s] 12%|#1 | 162/1377 [00:17<02:04, 9.75it/s] 12%|#1 | 163/1377 [00:18<02:04, 9.74it/s] 12%|#1 | 164/1377 [00:18<02:04, 9.77it/s] 12%|#1 | 165/1377 [00:18<02:05, 9.66it/s] 12%|#2 | 166/1377 [00:18<02:07, 9.47it/s] 12%|#2 | 167/1377 [00:18<02:08, 9.38it/s] 12%|#2 | 168/1377 [00:18<02:06, 9.52it/s] 12%|#2 | 169/1377 [00:18<02:07, 9.51it/s] 12%|#2 | 170/1377 [00:18<02:06, 9.53it/s] 12%|#2 | 171/1377 [00:18<02:08, 9.35it/s] 12%|#2 | 172/1377 [00:18<02:06, 9.49it/s] 13%|#2 | 174/1377 [00:19<02:01, 9.92it/s] 13%|#2 | 176/1377 [00:19<02:00, 9.96it/s] 13%|#2 | 177/1377 [00:19<02:01, 9.86it/s] 13%|#2 | 178/1377 [00:19<02:02, 9.78it/s] 13%|#2 | 179/1377 [00:19<02:03, 9.69it/s] 13%|#3 | 180/1377 [00:19<02:04, 9.58it/s] 13%|#3 | 181/1377 [00:19<02:04, 9.59it/s] 13%|#3 | 183/1377 [00:20<02:06, 9.46it/s] 13%|#3 | 184/1377 [00:20<02:06, 9.45it/s] 13%|#3 | 185/1377 [00:20<02:05, 9.47it/s] 14%|#3 | 186/1377 [00:20<02:05, 9.47it/s] 14%|#3 | 187/1377 [00:20<02:04, 9.57it/s] 14%|#3 | 188/1377 [00:20<02:04, 9.56it/s] 14%|#3 | 189/1377 [00:20<02:05, 9.44it/s] 14%|#3 | 190/1377 [00:20<02:06, 9.39it/s] 14%|#3 | 191/1377 [00:20<02:05, 9.45it/s] 14%|#3 | 192/1377 [00:21<02:21, 8.38it/s] 14%|#4 | 193/1377 [00:21<02:15, 8.76it/s] 14%|#4 | 194/1377 [00:21<02:15, 8.76it/s] 14%|#4 | 195/1377 [00:21<02:15, 8.74it/s] 14%|#4 | 196/1377 [00:21<02:11, 8.98it/s] 14%|#4 | 197/1377 [00:21<02:08, 9.16it/s] 14%|#4 | 199/1377 [00:21<02:04, 9.43it/s] 15%|#4 | 200/1377 [00:21<02:05, 9.41it/s] 15%|#4 | 202/1377 
[00:22<02:02, 9.58it/s] 15%|#4 | 203/1377 [00:22<02:02, 9.57it/s] 15%|#4 | 204/1377 [00:22<02:03, 9.49it/s] 15%|#4 | 206/1377 [00:22<02:00, 9.74it/s] 15%|#5 | 207/1377 [00:22<02:02, 9.55it/s] 15%|#5 | 208/1377 [00:22<02:03, 9.47it/s] 15%|#5 | 209/1377 [00:22<02:05, 9.33it/s] 15%|#5 | 210/1377 [00:23<02:05, 9.27it/s] 15%|#5 | 211/1377 [00:23<02:06, 9.19it/s] 15%|#5 | 212/1377 [00:23<02:09, 8.97it/s] 16%|#5 | 214/1377 [00:23<02:02, 9.46it/s] 16%|#5 | 215/1377 [00:23<02:03, 9.41it/s] 16%|#5 | 216/1377 [00:23<02:03, 9.40it/s] 16%|#5 | 217/1377 [00:23<02:03, 9.37it/s] 16%|#5 | 219/1377 [00:23<02:00, 9.60it/s] 16%|#5 | 220/1377 [00:24<02:04, 9.27it/s] 16%|#6 | 222/1377 [00:24<02:01, 9.50it/s] 16%|#6 | 223/1377 [00:24<02:02, 9.44it/s] 16%|#6 | 224/1377 [00:24<02:01, 9.45it/s] 16%|#6 | 226/1377 [00:24<01:55, 9.93it/s] 16%|#6 | 227/1377 [00:24<01:57, 9.79it/s] 17%|#6 | 228/1377 [00:24<02:03, 9.27it/s] 17%|#6 | 230/1377 [00:25<02:01, 9.47it/s] 17%|#6 | 231/1377 [00:25<02:00, 9.49it/s] 17%|#6 | 232/1377 [00:25<02:00, 9.47it/s] 17%|#6 | 233/1377 [00:25<02:01, 9.45it/s] 17%|#6 | 234/1377 [00:25<02:04, 9.20it/s] 17%|#7 | 235/1377 [00:25<02:03, 9.22it/s] 17%|#7 | 236/1377 [00:25<02:02, 9.30it/s] 17%|#7 | 238/1377 [00:25<01:58, 9.59it/s] 17%|#7 | 239/1377 [00:26<01:58, 9.57it/s] 17%|#7 | 240/1377 [00:26<01:58, 9.57it/s] 18%|#7 | 241/1377 [00:26<02:08, 8.83it/s] 18%|#7 | 242/1377 [00:26<02:08, 8.82it/s] 18%|#7 | 243/1377 [00:26<02:08, 8.86it/s] 18%|#7 | 244/1377 [00:26<02:08, 8.82it/s] 18%|#7 | 246/1377 [00:26<02:01, 9.30it/s] 18%|#7 | 247/1377 [00:26<02:01, 9.29it/s] 18%|#8 | 248/1377 [00:27<02:02, 9.24it/s] 18%|#8 | 249/1377 [00:27<02:02, 9.20it/s] 18%|#8 | 250/1377 [00:27<02:01, 9.27it/s] 18%|#8 | 251/1377 [00:27<02:00, 9.34it/s] 18%|#8 | 252/1377 [00:27<02:03, 9.12it/s] 18%|#8 | 253/1377 [00:27<02:02, 9.19it/s] 18%|#8 | 254/1377 [00:27<02:03, 9.12it/s] 19%|#8 | 255/1377 [00:27<02:02, 9.14it/s] 19%|#8 | 257/1377 [00:28<01:58, 9.46it/s] 19%|#8 | 258/1377 [00:28<01:58, 9.47it/s] 19%|#8 | 259/1377 [00:28<01:57, 9.52it/s] 19%|#8 | 260/1377 [00:28<01:55, 9.64it/s] 19%|#8 | 261/1377 [00:28<01:58, 9.45it/s] 19%|#9 | 262/1377 [00:28<01:58, 9.40it/s] 19%|#9 | 263/1377 [00:28<01:59, 9.35it/s] 19%|#9 | 264/1377 [00:28<01:58, 9.41it/s] 19%|#9 | 265/1377 [00:28<02:02, 9.09it/s] 19%|#9 | 266/1377 [00:29<02:00, 9.26it/s] 19%|#9 | 267/1377 [00:29<01:59, 9.32it/s] 19%|#9 | 268/1377 [00:29<02:00, 9.21it/s] 20%|#9 | 269/1377 [00:29<01:59, 9.24it/s] 20%|#9 | 270/1377 [00:29<02:00, 9.22it/s] 20%|#9 | 271/1377 [00:29<02:01, 9.12it/s] 20%|#9 | 272/1377 [00:29<02:00, 9.18it/s] 20%|#9 | 274/1377 [00:29<01:56, 9.45it/s] 20%|#9 | 275/1377 [00:29<01:56, 9.47it/s] 20%|## | 276/1377 [00:30<01:56, 9.45it/s] 20%|## | 277/1377 [00:30<01:56, 9.43it/s] 20%|## | 278/1377 [00:30<01:56, 9.40it/s] 20%|## | 279/1377 [00:30<01:57, 9.31it/s] 20%|## | 280/1377 [00:30<01:58, 9.24it/s] 20%|## | 281/1377 [00:30<01:57, 9.31it/s] 21%|## | 283/1377 [00:30<01:56, 9.36it/s] 21%|## | 284/1377 [00:30<02:00, 9.10it/s] 21%|## | 286/1377 [00:31<01:55, 9.42it/s] 21%|## | 287/1377 [00:31<01:55, 9.44it/s] 21%|## | 288/1377 [00:31<01:54, 9.47it/s] 21%|## | 289/1377 [00:31<01:55, 9.42it/s] 21%|##1 | 290/1377 [00:31<01:55, 9.40it/s] 21%|##1 | 291/1377 [00:31<01:54, 9.46it/s] 21%|##1 | 292/1377 [00:31<01:54, 9.49it/s] 21%|##1 | 293/1377 [00:31<01:54, 9.48it/s] 21%|##1 | 294/1377 [00:32<01:59, 9.08it/s] 21%|##1 | 296/1377 [00:32<01:54, 9.47it/s] 22%|##1 | 297/1377 [00:32<01:53, 9.52it/s] 22%|##1 | 299/1377 [00:32<01:51, 9.66it/s] 22%|##1 | 300/1377 [00:32<01:52, 
9.60it/s] 22%|##1 | 301/1377 [00:32<02:04, 8.67it/s] 22%|##1 | 302/1377 [00:32<02:02, 8.81it/s] 22%|##2 | 303/1377 [00:33<02:05, 8.57it/s] 22%|##2 | 304/1377 [00:33<02:00, 8.91it/s] 22%|##2 | 305/1377 [00:33<01:57, 9.13it/s] 22%|##2 | 306/1377 [00:33<01:55, 9.31it/s] 22%|##2 | 307/1377 [00:33<01:53, 9.42it/s] 22%|##2 | 308/1377 [00:33<01:52, 9.50it/s] 22%|##2 | 309/1377 [00:33<01:53, 9.37it/s] 23%|##2 | 311/1377 [00:33<01:47, 9.89it/s] 23%|##2 | 312/1377 [00:33<01:48, 9.82it/s] 23%|##2 | 313/1377 [00:34<01:48, 9.82it/s] 23%|##2 | 314/1377 [00:34<01:50, 9.59it/s] 23%|##2 | 315/1377 [00:34<01:52, 9.48it/s] 23%|##2 | 316/1377 [00:34<01:56, 9.14it/s] 23%|##3 | 317/1377 [00:34<01:54, 9.25it/s] 23%|##3 | 318/1377 [00:34<01:53, 9.37it/s] 23%|##3 | 319/1377 [00:34<02:07, 8.33it/s] 23%|##3 | 320/1377 [00:34<02:02, 8.62it/s] 23%|##3 | 321/1377 [00:34<01:59, 8.82it/s] 23%|##3 | 322/1377 [00:35<02:01, 8.72it/s] 23%|##3 | 323/1377 [00:35<01:59, 8.83it/s] 24%|##3 | 325/1377 [00:35<01:53, 9.24it/s] 24%|##3 | 326/1377 [00:35<01:54, 9.16it/s] 24%|##3 | 327/1377 [00:35<01:56, 9.00it/s] 24%|##3 | 328/1377 [00:35<01:56, 9.02it/s] 24%|##3 | 329/1377 [00:35<01:55, 9.04it/s] 24%|##3 | 330/1377 [00:35<01:53, 9.21it/s] 24%|##4 | 332/1377 [00:36<01:50, 9.44it/s] 24%|##4 | 333/1377 [00:36<01:50, 9.41it/s] 24%|##4 | 334/1377 [00:36<01:51, 9.39it/s] 24%|##4 | 335/1377 [00:36<01:58, 8.81it/s] 24%|##4 | 336/1377 [00:36<02:00, 8.65it/s] 24%|##4 | 337/1377 [00:36<01:57, 8.84it/s] 25%|##4 | 338/1377 [00:36<01:56, 8.94it/s] 25%|##4 | 339/1377 [00:36<01:57, 8.80it/s] 25%|##4 | 340/1377 [00:37<01:55, 8.97it/s] 25%|##4 | 342/1377 [00:37<01:49, 9.48it/s] 25%|##4 | 343/1377 [00:37<01:49, 9.42it/s] 25%|##4 | 344/1377 [00:37<01:49, 9.43it/s] 25%|##5 | 345/1377 [00:37<01:52, 9.16it/s] 25%|##5 | 346/1377 [00:37<01:51, 9.26it/s] 25%|##5 | 347/1377 [00:37<01:53, 9.08it/s] 25%|##5 | 348/1377 [00:37<01:51, 9.19it/s] 25%|##5 | 349/1377 [00:38<01:54, 8.96it/s] 25%|##5 | 350/1377 [00:38<01:54, 8.94it/s] 25%|##5 | 351/1377 [00:38<01:53, 9.02it/s] 26%|##5 | 352/1377 [00:38<01:52, 9.12it/s] 26%|##5 | 353/1377 [00:38<01:51, 9.18it/s] 26%|##5 | 355/1377 [00:38<01:48, 9.43it/s] 26%|##5 | 356/1377 [00:38<01:48, 9.43it/s] 26%|##5 | 357/1377 [00:38<01:47, 9.48it/s] 26%|##5 | 358/1377 [00:38<01:49, 9.34it/s] 26%|##6 | 359/1377 [00:39<01:48, 9.36it/s] 26%|##6 | 361/1377 [00:39<01:45, 9.64it/s] 26%|##6 | 362/1377 [00:39<01:45, 9.65it/s] 26%|##6 | 364/1377 [00:39<01:43, 9.82it/s] 27%|##6 | 365/1377 [00:39<01:43, 9.75it/s] 27%|##6 | 366/1377 [00:39<01:44, 9.65it/s] 27%|##6 | 367/1377 [00:39<01:45, 9.62it/s] 27%|##6 | 368/1377 [00:40<01:44, 9.61it/s] 27%|##6 | 369/1377 [00:40<01:44, 9.63it/s] 27%|##6 | 370/1377 [00:40<01:45, 9.50it/s] 27%|##6 | 371/1377 [00:40<01:45, 9.53it/s] 27%|##7 | 372/1377 [00:40<01:46, 9.42it/s] 27%|##7 | 373/1377 [00:40<01:47, 9.37it/s] 27%|##7 | 374/1377 [00:40<01:46, 9.41it/s] 27%|##7 | 375/1377 [00:40<01:50, 9.04it/s] 27%|##7 | 376/1377 [00:40<01:49, 9.16it/s] 27%|##7 | 377/1377 [00:40<01:49, 9.17it/s] 27%|##7 | 378/1377 [00:41<01:47, 9.25it/s] 28%|##7 | 379/1377 [00:41<01:50, 8.99it/s] 28%|##7 | 380/1377 [00:41<01:53, 8.79it/s] 28%|##7 | 381/1377 [00:41<01:51, 8.96it/s] 28%|##7 | 382/1377 [00:41<01:53, 8.79it/s] 28%|##7 | 383/1377 [00:41<01:50, 8.96it/s] 28%|##7 | 384/1377 [00:41<01:50, 8.95it/s] 28%|##7 | 385/1377 [00:41<01:50, 8.96it/s] 28%|##8 | 386/1377 [00:41<01:49, 9.04it/s] 28%|##8 | 387/1377 [00:42<01:47, 9.24it/s] 28%|##8 | 388/1377 [00:42<01:46, 9.32it/s] 28%|##8 | 389/1377 [00:42<01:45, 9.37it/s] 28%|##8 | 390/1377 
[00:42<01:45, 9.37it/s] 28%|##8 | 391/1377 [00:42<01:47, 9.14it/s] 28%|##8 | 392/1377 [00:42<01:49, 9.03it/s] 29%|##8 | 394/1377 [00:42<01:47, 9.18it/s] 29%|##8 | 395/1377 [00:42<01:45, 9.31it/s] 29%|##8 | 396/1377 [00:43<01:46, 9.24it/s] 29%|##8 | 397/1377 [00:43<01:44, 9.35it/s] 29%|##8 | 398/1377 [00:43<01:44, 9.36it/s] 29%|##9 | 400/1377 [00:43<01:43, 9.47it/s] 29%|##9 | 401/1377 [00:43<01:42, 9.49it/s] 29%|##9 | 402/1377 [00:43<01:44, 9.37it/s] 29%|##9 | 403/1377 [00:43<01:44, 9.32it/s] 29%|##9 | 405/1377 [00:44<01:39, 9.79it/s] 29%|##9 | 406/1377 [00:44<01:41, 9.61it/s] 30%|##9 | 407/1377 [00:44<01:41, 9.59it/s] 30%|##9 | 408/1377 [00:44<01:41, 9.51it/s] 30%|##9 | 410/1377 [00:44<01:38, 9.77it/s] 30%|##9 | 411/1377 [00:44<01:39, 9.71it/s] 30%|##9 | 413/1377 [00:44<01:40, 9.58it/s] 30%|### | 414/1377 [00:44<01:40, 9.61it/s] 30%|### | 415/1377 [00:45<01:39, 9.65it/s] 30%|### | 416/1377 [00:45<01:39, 9.63it/s] 30%|### | 417/1377 [00:45<01:41, 9.48it/s] 30%|### | 418/1377 [00:45<01:43, 9.24it/s] 30%|### | 419/1377 [00:45<01:43, 9.28it/s] 31%|### | 420/1377 [00:45<01:42, 9.31it/s] 31%|### | 421/1377 [00:45<01:41, 9.44it/s] 31%|### | 422/1377 [00:45<01:40, 9.53it/s] 31%|### | 423/1377 [00:45<01:40, 9.52it/s] 31%|### | 425/1377 [00:46<01:38, 9.65it/s] 31%|### | 426/1377 [00:46<01:48, 8.77it/s] 31%|###1 | 427/1377 [00:46<01:45, 8.99it/s] 31%|###1 | 428/1377 [00:46<01:45, 9.02it/s] 31%|###1 | 429/1377 [00:46<01:47, 8.84it/s] 31%|###1 | 430/1377 [00:46<01:44, 9.06it/s] 31%|###1 | 431/1377 [00:46<01:44, 9.08it/s] 31%|###1 | 432/1377 [00:46<01:44, 9.02it/s] 31%|###1 | 433/1377 [00:47<01:47, 8.81it/s] 32%|###1 | 434/1377 [00:47<01:44, 9.06it/s] 32%|###1 | 436/1377 [00:47<01:42, 9.16it/s] 32%|###1 | 437/1377 [00:47<01:44, 9.03it/s] 32%|###1 | 439/1377 [00:47<01:42, 9.18it/s] 32%|###2 | 441/1377 [00:47<01:39, 9.39it/s] 32%|###2 | 442/1377 [00:47<01:40, 9.30it/s] 32%|###2 | 443/1377 [00:48<01:39, 9.35it/s] 32%|###2 | 444/1377 [00:48<01:39, 9.39it/s] 32%|###2 | 445/1377 [00:48<01:39, 9.37it/s] 32%|###2 | 447/1377 [00:48<01:36, 9.60it/s] 33%|###2 | 448/1377 [00:48<01:38, 9.47it/s] 33%|###2 | 450/1377 [00:48<01:36, 9.59it/s] 33%|###2 | 451/1377 [00:48<01:36, 9.56it/s] 33%|###2 | 452/1377 [00:49<01:38, 9.44it/s] 33%|###2 | 453/1377 [00:49<01:38, 9.40it/s] 33%|###2 | 454/1377 [00:49<01:37, 9.43it/s] 33%|###3 | 456/1377 [00:49<01:35, 9.66it/s] 33%|###3 | 457/1377 [00:49<01:35, 9.64it/s] 33%|###3 | 459/1377 [00:49<01:40, 9.11it/s]
## 0%| | 0/51 [00:00<?, ?it/s][A
## 10%|9 | 5/51 [00:00<00:01, 44.93it/s][A
## 20%|#9 | 10/51 [00:00<00:01, 38.12it/s][A
## 27%|##7 | 14/51 [00:00<00:00, 37.36it/s][A
## 35%|###5 | 18/51 [00:00<00:00, 35.23it/s][A
## 43%|####3 | 22/51 [00:00<00:00, 35.73it/s][A
## 51%|##### | 26/51 [00:00<00:00, 36.43it/s][A
## 59%|#####8 | 30/51 [00:00<00:00, 36.07it/s][A
## 67%|######6 | 34/51 [00:00<00:00, 36.88it/s][A
## 75%|#######4 | 38/51 [00:01<00:00, 36.80it/s][A
## 82%|########2 | 42/51 [00:01<00:00, 36.84it/s][A
## 90%|######### | 46/51 [00:01<00:00, 36.56it/s][A
## 98%|#########8| 50/51 [00:01<00:00, 36.57it/s][A
## [A{'eval_loss': 0.41917771100997925, 'eval_accuracy': 0.8382352941176471, 'eval_f1': 0.8885135135135135, 'eval_runtime': 2.2577, 'eval_samples_per_second': 180.719, 'eval_steps_per_second': 22.59, 'epoch': 1.0}
## 33%|###3 | 459/1377 [00:52<01:40, 9.11it/s]
## 100%|##########| 51/51 [00:02<00:00, 36.57it/s][A
## [A/Users/peltouz/Documents/GitHub/M2-Py-DS2E/hf/lib/python3.13/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
## warnings.warn(warn_msg)
## 33%|###3 | 460/1377 [00:52<09:22, 1.63it/s] 33%|###3 | 461/1377 [00:52<07:31, 2.03it/s] 34%|###3 | 463/1377 [00:52<05:05, 3.00it/s] 34%|###3 | 464/1377 [00:52<04:19, 3.51it/s] 34%|###3 | 465/1377 [00:52<03:39, 4.15it/s] 34%|###3 | 466/1377 [00:52<03:08, 4.84it/s] 34%|###3 | 467/1377 [00:52<02:44, 5.53it/s] 34%|###4 | 469/1377 [00:53<02:12, 6.85it/s] 34%|###4 | 471/1377 [00:53<01:56, 7.78it/s] 34%|###4 | 472/1377 [00:53<01:52, 8.07it/s] 34%|###4 | 474/1377 [00:53<01:45, 8.56it/s] 34%|###4 | 475/1377 [00:53<01:44, 8.67it/s] 35%|###4 | 476/1377 [00:53<01:41, 8.84it/s] 35%|###4 | 478/1377 [00:54<01:39, 9.02it/s] 35%|###4 | 479/1377 [00:54<01:39, 9.06it/s] 35%|###4 | 480/1377 [00:54<01:37, 9.16it/s] 35%|###4 | 481/1377 [00:54<01:38, 9.13it/s] 35%|###5 | 482/1377 [00:54<01:42, 8.74it/s] 35%|###5 | 483/1377 [00:54<01:40, 8.90it/s] 35%|###5 | 485/1377 [00:54<01:38, 9.07it/s] 35%|###5 | 486/1377 [00:54<01:37, 9.16it/s] 35%|###5 | 487/1377 [00:55<01:36, 9.20it/s] 35%|###5 | 488/1377 [00:55<01:35, 9.34it/s] 36%|###5 | 489/1377 [00:55<01:34, 9.40it/s] 36%|###5 | 491/1377 [00:55<01:30, 9.81it/s] 36%|###5 | 493/1377 [00:55<01:29, 9.83it/s] 36%|###5 | 494/1377 [00:55<01:30, 9.77it/s] 36%|###5 | 495/1377 [00:55<01:33, 9.40it/s] 36%|###6 | 496/1377 [00:55<01:35, 9.26it/s] 36%|###6 | 497/1377 [00:56<01:35, 9.25it/s] 36%|###6 | 498/1377 [00:56<01:33, 9.38it/s] 36%|###6 | 499/1377 [00:56<01:34, 9.27it/s] 36%|###6 | 500/1377 [00:56<01:33, 9.36it/s] {'loss': 0.5506, 'grad_norm': 18.350160598754883, 'learning_rate': 3.1880900508351494e-05, 'epoch': 1.09}
## 36%|###6 | 500/1377 [00:56<01:33, 9.36it/s] 36%|###6 | 501/1377 [00:57<05:00, 2.91it/s] 36%|###6 | 502/1377 [00:57<04:02, 3.60it/s] 37%|###6 | 503/1377 [00:57<03:18, 4.41it/s] 37%|###6 | 505/1377 [00:57<02:30, 5.81it/s] 37%|###6 | 506/1377 [00:57<02:16, 6.39it/s] 37%|###6 | 507/1377 [00:57<02:04, 7.01it/s] 37%|###6 | 509/1377 [00:58<01:48, 8.04it/s] 37%|###7 | 511/1377 [00:58<01:41, 8.55it/s] 37%|###7 | 512/1377 [00:58<01:40, 8.63it/s] 37%|###7 | 514/1377 [00:58<01:34, 9.09it/s] 37%|###7 | 515/1377 [00:58<01:34, 9.12it/s] 37%|###7 | 516/1377 [00:58<01:34, 9.08it/s] 38%|###7 | 517/1377 [00:59<01:34, 9.13it/s] 38%|###7 | 518/1377 [00:59<01:33, 9.22it/s] 38%|###7 | 519/1377 [00:59<01:33, 9.18it/s] 38%|###7 | 520/1377 [00:59<01:32, 9.24it/s] 38%|###7 | 521/1377 [00:59<01:32, 9.30it/s] 38%|###7 | 522/1377 [00:59<01:32, 9.29it/s] 38%|###7 | 523/1377 [00:59<01:31, 9.33it/s] 38%|###8 | 524/1377 [00:59<01:30, 9.42it/s] 38%|###8 | 525/1377 [00:59<01:37, 8.76it/s] 38%|###8 | 527/1377 [01:00<01:32, 9.17it/s] 38%|###8 | 528/1377 [01:00<01:31, 9.28it/s] 38%|###8 | 529/1377 [01:00<01:30, 9.35it/s] 38%|###8 | 530/1377 [01:00<01:31, 9.29it/s] 39%|###8 | 531/1377 [01:00<01:33, 9.00it/s] 39%|###8 | 532/1377 [01:00<01:31, 9.23it/s] 39%|###8 | 534/1377 [01:00<01:25, 9.84it/s] 39%|###8 | 535/1377 [01:00<01:27, 9.63it/s] 39%|###8 | 536/1377 [01:01<01:28, 9.47it/s] 39%|###9 | 538/1377 [01:01<01:23, 10.02it/s] 39%|###9 | 539/1377 [01:01<01:27, 9.62it/s] 39%|###9 | 540/1377 [01:01<01:27, 9.59it/s] 39%|###9 | 542/1377 [01:01<01:24, 9.92it/s] 39%|###9 | 543/1377 [01:01<01:25, 9.79it/s] 40%|###9 | 544/1377 [01:01<01:26, 9.63it/s] 40%|###9 | 545/1377 [01:01<01:26, 9.66it/s] 40%|###9 | 546/1377 [01:02<01:25, 9.68it/s] 40%|###9 | 547/1377 [01:02<01:26, 9.64it/s] 40%|###9 | 548/1377 [01:02<01:27, 9.42it/s] 40%|###9 | 549/1377 [01:02<01:28, 9.36it/s] 40%|###9 | 550/1377 [01:02<01:28, 9.35it/s] 40%|#### | 551/1377 [01:02<01:28, 9.39it/s] 40%|#### | 552/1377 [01:02<01:27, 9.46it/s] 40%|#### | 553/1377 [01:02<01:26, 9.57it/s] 40%|#### | 554/1377 [01:02<01:28, 9.26it/s] 40%|#### | 555/1377 [01:03<01:33, 8.75it/s] 40%|#### | 556/1377 [01:03<01:33, 8.78it/s] 40%|#### | 557/1377 [01:03<01:38, 8.30it/s] 41%|#### | 558/1377 [01:03<01:42, 8.02it/s] 41%|#### | 559/1377 [01:03<01:40, 8.14it/s] 41%|#### | 560/1377 [01:03<01:36, 8.45it/s] 41%|#### | 561/1377 [01:03<01:32, 8.78it/s] 41%|#### | 562/1377 [01:03<01:32, 8.84it/s] 41%|#### | 563/1377 [01:04<01:31, 8.94it/s] 41%|#### | 564/1377 [01:04<01:29, 9.07it/s] 41%|####1 | 565/1377 [01:04<01:28, 9.12it/s] 41%|####1 | 566/1377 [01:04<01:27, 9.24it/s] 41%|####1 | 567/1377 [01:04<01:27, 9.30it/s] 41%|####1 | 568/1377 [01:04<01:26, 9.35it/s] 41%|####1 | 569/1377 [01:04<01:26, 9.35it/s] 41%|####1 | 570/1377 [01:04<01:25, 9.39it/s] 41%|####1 | 571/1377 [01:04<01:25, 9.47it/s] 42%|####1 | 572/1377 [01:04<01:24, 9.50it/s] 42%|####1 | 573/1377 [01:05<01:24, 9.56it/s] 42%|####1 | 574/1377 [01:05<01:27, 9.16it/s] 42%|####1 | 575/1377 [01:05<01:26, 9.26it/s] 42%|####1 | 576/1377 [01:05<01:26, 9.30it/s] 42%|####1 | 577/1377 [01:05<01:29, 8.97it/s] 42%|####1 | 578/1377 [01:05<01:27, 9.15it/s] 42%|####2 | 579/1377 [01:05<01:26, 9.28it/s] 42%|####2 | 580/1377 [01:05<01:26, 9.16it/s] 42%|####2 | 581/1377 [01:05<01:26, 9.19it/s] 42%|####2 | 582/1377 [01:06<01:27, 9.12it/s] 42%|####2 | 583/1377 [01:06<01:26, 9.19it/s] 42%|####2 | 584/1377 [01:06<01:26, 9.19it/s] 42%|####2 | 585/1377 [01:06<01:24, 9.35it/s] 43%|####2 | 586/1377 [01:06<01:26, 9.10it/s] 43%|####2 | 587/1377 [01:06<01:27, 9.05it/s] 
43%|####2 | 588/1377 [01:06<01:25, 9.22it/s] 43%|####2 | 589/1377 [01:06<01:24, 9.33it/s] 43%|####2 | 590/1377 [01:06<01:24, 9.37it/s] 43%|####2 | 591/1377 [01:07<01:24, 9.31it/s] 43%|####2 | 592/1377 [01:07<01:23, 9.43it/s] 43%|####3 | 594/1377 [01:07<01:20, 9.67it/s] 43%|####3 | 595/1377 [01:07<01:22, 9.48it/s] 43%|####3 | 596/1377 [01:07<01:23, 9.41it/s] 43%|####3 | 598/1377 [01:07<01:21, 9.61it/s] 44%|####3 | 599/1377 [01:07<01:22, 9.49it/s] 44%|####3 | 600/1377 [01:07<01:21, 9.52it/s] 44%|####3 | 601/1377 [01:08<01:24, 9.20it/s] 44%|####3 | 603/1377 [01:08<01:23, 9.22it/s] 44%|####3 | 604/1377 [01:08<01:24, 9.20it/s] 44%|####3 | 605/1377 [01:08<01:24, 9.16it/s] 44%|####4 | 606/1377 [01:08<01:23, 9.26it/s] 44%|####4 | 608/1377 [01:08<01:19, 9.70it/s] 44%|####4 | 609/1377 [01:08<01:19, 9.67it/s] 44%|####4 | 610/1377 [01:09<01:18, 9.72it/s] 44%|####4 | 611/1377 [01:09<01:20, 9.50it/s] 45%|####4 | 613/1377 [01:09<01:19, 9.66it/s] 45%|####4 | 614/1377 [01:09<01:19, 9.61it/s] 45%|####4 | 615/1377 [01:09<01:21, 9.40it/s] 45%|####4 | 616/1377 [01:09<01:21, 9.35it/s] 45%|####4 | 617/1377 [01:09<01:20, 9.42it/s] 45%|####4 | 618/1377 [01:09<01:20, 9.40it/s] 45%|####4 | 619/1377 [01:10<01:21, 9.30it/s] 45%|####5 | 620/1377 [01:10<01:21, 9.33it/s] 45%|####5 | 621/1377 [01:10<01:21, 9.27it/s] 45%|####5 | 622/1377 [01:10<01:21, 9.22it/s] 45%|####5 | 623/1377 [01:10<01:26, 8.76it/s] 45%|####5 | 624/1377 [01:10<01:24, 8.92it/s] 45%|####5 | 626/1377 [01:10<01:20, 9.37it/s] 46%|####5 | 627/1377 [01:10<01:22, 9.07it/s] 46%|####5 | 628/1377 [01:10<01:22, 9.13it/s] 46%|####5 | 629/1377 [01:11<01:21, 9.16it/s] 46%|####5 | 630/1377 [01:11<01:21, 9.17it/s] 46%|####5 | 631/1377 [01:11<01:22, 9.08it/s] 46%|####5 | 633/1377 [01:11<01:20, 9.20it/s] 46%|####6 | 634/1377 [01:11<01:21, 9.15it/s] 46%|####6 | 635/1377 [01:11<01:23, 8.89it/s] 46%|####6 | 636/1377 [01:11<01:22, 8.95it/s] 46%|####6 | 637/1377 [01:11<01:21, 9.07it/s] 46%|####6 | 638/1377 [01:12<01:21, 9.08it/s] 46%|####6 | 639/1377 [01:12<01:21, 9.11it/s] 46%|####6 | 640/1377 [01:12<01:19, 9.23it/s] 47%|####6 | 641/1377 [01:12<01:18, 9.32it/s] 47%|####6 | 642/1377 [01:12<01:19, 9.24it/s] 47%|####6 | 643/1377 [01:12<01:18, 9.41it/s] 47%|####6 | 644/1377 [01:12<01:23, 8.77it/s] 47%|####6 | 645/1377 [01:12<01:24, 8.64it/s] 47%|####6 | 647/1377 [01:13<01:19, 9.14it/s] 47%|####7 | 648/1377 [01:13<01:19, 9.16it/s] 47%|####7 | 649/1377 [01:13<01:19, 9.13it/s] 47%|####7 | 651/1377 [01:13<01:16, 9.48it/s] 47%|####7 | 652/1377 [01:13<01:17, 9.37it/s] 47%|####7 | 653/1377 [01:13<01:17, 9.36it/s] 47%|####7 | 654/1377 [01:13<01:18, 9.24it/s] 48%|####7 | 655/1377 [01:13<01:17, 9.34it/s] 48%|####7 | 657/1377 [01:14<01:12, 9.91it/s] 48%|####7 | 658/1377 [01:14<01:13, 9.75it/s] 48%|####7 | 659/1377 [01:14<01:14, 9.59it/s] 48%|####8 | 661/1377 [01:14<01:16, 9.38it/s] 48%|####8 | 662/1377 [01:14<01:16, 9.36it/s] 48%|####8 | 663/1377 [01:14<01:16, 9.37it/s] 48%|####8 | 665/1377 [01:14<01:14, 9.61it/s] 48%|####8 | 666/1377 [01:15<01:16, 9.32it/s] 48%|####8 | 667/1377 [01:15<01:16, 9.24it/s] 49%|####8 | 668/1377 [01:15<01:17, 9.20it/s] 49%|####8 | 669/1377 [01:15<01:17, 9.17it/s] 49%|####8 | 670/1377 [01:15<01:17, 9.17it/s] 49%|####8 | 671/1377 [01:15<01:17, 9.08it/s] 49%|####8 | 672/1377 [01:15<01:17, 9.13it/s] 49%|####8 | 673/1377 [01:15<01:16, 9.26it/s] 49%|####8 | 674/1377 [01:15<01:15, 9.31it/s] 49%|####9 | 675/1377 [01:16<01:16, 9.21it/s] 49%|####9 | 676/1377 [01:16<01:16, 9.15it/s] 49%|####9 | 677/1377 [01:16<01:16, 9.16it/s] 49%|####9 | 678/1377 [01:16<01:17, 9.07it/s] 
49%|####9 | 679/1377 [01:16<01:18, 8.84it/s] 49%|####9 | 680/1377 [01:16<01:19, 8.77it/s] 49%|####9 | 681/1377 [01:16<01:17, 8.98it/s] 50%|####9 | 682/1377 [01:16<01:16, 9.13it/s] 50%|####9 | 683/1377 [01:16<01:16, 9.09it/s] 50%|####9 | 684/1377 [01:17<01:15, 9.14it/s] 50%|####9 | 685/1377 [01:17<01:15, 9.13it/s] 50%|####9 | 686/1377 [01:17<01:17, 8.88it/s] 50%|####9 | 687/1377 [01:17<01:16, 9.05it/s] 50%|####9 | 688/1377 [01:17<01:14, 9.20it/s] 50%|##### | 690/1377 [01:17<01:12, 9.44it/s] 50%|##### | 691/1377 [01:17<01:16, 8.99it/s] 50%|##### | 692/1377 [01:17<01:18, 8.68it/s] 50%|##### | 693/1377 [01:18<01:17, 8.79it/s] 50%|##### | 694/1377 [01:18<01:17, 8.86it/s] 50%|##### | 695/1377 [01:18<01:15, 9.06it/s] 51%|##### | 696/1377 [01:18<01:16, 8.85it/s] 51%|##### | 697/1377 [01:18<01:18, 8.64it/s] 51%|##### | 698/1377 [01:18<01:16, 8.89it/s] 51%|##### | 699/1377 [01:18<01:18, 8.66it/s] 51%|##### | 700/1377 [01:18<01:16, 8.81it/s] 51%|##### | 701/1377 [01:18<01:17, 8.77it/s] 51%|##### | 702/1377 [01:19<01:15, 8.89it/s] 51%|#####1 | 704/1377 [01:19<01:13, 9.22it/s] 51%|#####1 | 705/1377 [01:19<01:12, 9.27it/s] 51%|#####1 | 707/1377 [01:19<01:08, 9.76it/s] 51%|#####1 | 708/1377 [01:19<01:10, 9.44it/s] 51%|#####1 | 709/1377 [01:19<01:10, 9.43it/s] 52%|#####1 | 710/1377 [01:19<01:13, 9.11it/s] 52%|#####1 | 711/1377 [01:20<01:12, 9.19it/s] 52%|#####1 | 712/1377 [01:20<01:11, 9.36it/s] 52%|#####1 | 714/1377 [01:20<01:10, 9.46it/s] 52%|#####1 | 715/1377 [01:20<01:11, 9.29it/s] 52%|#####1 | 716/1377 [01:20<01:11, 9.29it/s] 52%|#####2 | 717/1377 [01:20<01:10, 9.34it/s] 52%|#####2 | 719/1377 [01:20<01:07, 9.70it/s] 52%|#####2 | 720/1377 [01:20<01:07, 9.68it/s] 52%|#####2 | 722/1377 [01:21<01:07, 9.71it/s] 53%|#####2 | 723/1377 [01:21<01:08, 9.53it/s] 53%|#####2 | 724/1377 [01:21<01:09, 9.46it/s] 53%|#####2 | 725/1377 [01:21<01:10, 9.28it/s] 53%|#####2 | 726/1377 [01:21<01:12, 8.96it/s] 53%|#####2 | 728/1377 [01:21<01:07, 9.59it/s] 53%|#####2 | 729/1377 [01:21<01:07, 9.59it/s] 53%|#####3 | 731/1377 [01:22<01:06, 9.66it/s] 53%|#####3 | 733/1377 [01:22<01:05, 9.84it/s] 53%|#####3 | 735/1377 [01:22<01:05, 9.85it/s] 53%|#####3 | 736/1377 [01:22<01:07, 9.53it/s] 54%|#####3 | 738/1377 [01:22<01:06, 9.60it/s] 54%|#####3 | 739/1377 [01:22<01:06, 9.56it/s] 54%|#####3 | 741/1377 [01:23<01:05, 9.73it/s] 54%|#####3 | 743/1377 [01:23<01:04, 9.79it/s] 54%|#####4 | 745/1377 [01:23<01:05, 9.65it/s] 54%|#####4 | 746/1377 [01:23<01:05, 9.61it/s] 54%|#####4 | 748/1377 [01:23<01:04, 9.72it/s] 54%|#####4 | 749/1377 [01:24<01:05, 9.63it/s] 55%|#####4 | 751/1377 [01:24<01:04, 9.72it/s] 55%|#####4 | 752/1377 [01:24<01:05, 9.57it/s] 55%|#####4 | 753/1377 [01:24<01:06, 9.41it/s] 55%|#####4 | 755/1377 [01:24<01:04, 9.64it/s] 55%|#####4 | 756/1377 [01:24<01:04, 9.60it/s] 55%|#####4 | 757/1377 [01:24<01:09, 8.97it/s] 55%|#####5 | 759/1377 [01:25<01:05, 9.45it/s] 55%|#####5 | 761/1377 [01:25<01:03, 9.65it/s] 55%|#####5 | 762/1377 [01:25<01:04, 9.59it/s] 55%|#####5 | 763/1377 [01:25<01:05, 9.40it/s] 55%|#####5 | 764/1377 [01:25<01:04, 9.48it/s] 56%|#####5 | 765/1377 [01:25<01:05, 9.32it/s] 56%|#####5 | 766/1377 [01:25<01:05, 9.35it/s] 56%|#####5 | 767/1377 [01:25<01:06, 9.23it/s] 56%|#####5 | 768/1377 [01:26<01:08, 8.96it/s] 56%|#####5 | 769/1377 [01:26<01:07, 9.01it/s] 56%|#####5 | 770/1377 [01:26<01:06, 9.12it/s] 56%|#####6 | 772/1377 [01:26<01:02, 9.65it/s] 56%|#####6 | 773/1377 [01:26<01:02, 9.61it/s] 56%|#####6 | 774/1377 [01:26<01:02, 9.60it/s] 56%|#####6 | 775/1377 [01:26<01:02, 9.61it/s] 56%|#####6 | 777/1377 [01:26<01:02, 
9.61it/s] 57%|#####6 | 779/1377 [01:27<01:01, 9.66it/s] 57%|#####6 | 780/1377 [01:27<01:02, 9.62it/s] 57%|#####6 | 782/1377 [01:27<01:00, 9.80it/s] 57%|#####6 | 784/1377 [01:27<01:01, 9.65it/s] 57%|#####7 | 785/1377 [01:27<01:01, 9.55it/s] 57%|#####7 | 786/1377 [01:27<01:02, 9.52it/s] 57%|#####7 | 787/1377 [01:28<01:02, 9.47it/s] 57%|#####7 | 788/1377 [01:28<01:02, 9.42it/s] 57%|#####7 | 789/1377 [01:28<01:02, 9.43it/s] 57%|#####7 | 790/1377 [01:28<01:02, 9.45it/s] 57%|#####7 | 791/1377 [01:28<01:03, 9.30it/s] 58%|#####7 | 792/1377 [01:28<01:02, 9.35it/s] 58%|#####7 | 793/1377 [01:28<01:01, 9.43it/s] 58%|#####7 | 794/1377 [01:28<01:01, 9.52it/s] 58%|#####7 | 795/1377 [01:28<01:00, 9.60it/s] 58%|#####7 | 796/1377 [01:28<01:01, 9.48it/s] 58%|#####7 | 797/1377 [01:29<01:02, 9.30it/s] 58%|#####7 | 798/1377 [01:29<01:04, 8.96it/s] 58%|#####8 | 799/1377 [01:29<01:04, 8.97it/s] 58%|#####8 | 800/1377 [01:29<01:03, 9.16it/s] 58%|#####8 | 802/1377 [01:29<00:59, 9.62it/s] 58%|#####8 | 803/1377 [01:29<01:02, 9.22it/s] 58%|#####8 | 804/1377 [01:29<01:01, 9.26it/s] 58%|#####8 | 805/1377 [01:29<01:01, 9.31it/s] 59%|#####8 | 807/1377 [01:30<01:00, 9.42it/s] 59%|#####8 | 808/1377 [01:30<01:00, 9.37it/s] 59%|#####8 | 809/1377 [01:30<01:01, 9.30it/s] 59%|#####8 | 810/1377 [01:30<01:01, 9.29it/s] 59%|#####8 | 811/1377 [01:30<01:00, 9.35it/s] 59%|#####9 | 813/1377 [01:30<00:59, 9.55it/s] 59%|#####9 | 815/1377 [01:30<00:57, 9.69it/s] 59%|#####9 | 817/1377 [01:31<00:56, 9.83it/s] 59%|#####9 | 818/1377 [01:31<00:57, 9.73it/s] 59%|#####9 | 819/1377 [01:31<00:57, 9.67it/s] 60%|#####9 | 821/1377 [01:31<00:57, 9.63it/s] 60%|#####9 | 822/1377 [01:31<00:58, 9.50it/s] 60%|#####9 | 823/1377 [01:31<00:58, 9.45it/s] 60%|#####9 | 824/1377 [01:31<00:59, 9.33it/s] 60%|#####9 | 825/1377 [01:32<00:59, 9.31it/s] 60%|#####9 | 826/1377 [01:32<00:59, 9.29it/s] 60%|###### | 827/1377 [01:32<00:59, 9.29it/s] 60%|###### | 828/1377 [01:32<00:59, 9.19it/s] 60%|###### | 829/1377 [01:32<00:59, 9.24it/s] 60%|###### | 830/1377 [01:32<00:58, 9.30it/s] 60%|###### | 832/1377 [01:32<00:57, 9.56it/s] 61%|###### | 834/1377 [01:33<00:58, 9.24it/s] 61%|###### | 835/1377 [01:33<00:58, 9.34it/s] 61%|###### | 836/1377 [01:33<00:58, 9.26it/s] 61%|###### | 837/1377 [01:33<00:58, 9.26it/s] 61%|###### | 838/1377 [01:33<00:58, 9.26it/s] 61%|###### | 839/1377 [01:33<00:57, 9.32it/s] 61%|######1 | 840/1377 [01:33<00:57, 9.32it/s] 61%|######1 | 841/1377 [01:33<00:57, 9.34it/s] 61%|######1 | 842/1377 [01:33<00:57, 9.32it/s] 61%|######1 | 843/1377 [01:33<00:56, 9.37it/s] 61%|######1 | 844/1377 [01:34<00:56, 9.46it/s] 61%|######1 | 845/1377 [01:34<00:55, 9.52it/s] 61%|######1 | 846/1377 [01:34<00:56, 9.47it/s] 62%|######1 | 847/1377 [01:34<00:58, 9.05it/s] 62%|######1 | 848/1377 [01:34<00:57, 9.16it/s] 62%|######1 | 849/1377 [01:34<00:59, 8.91it/s] 62%|######1 | 850/1377 [01:34<00:58, 9.08it/s] 62%|######1 | 851/1377 [01:34<00:57, 9.20it/s] 62%|######1 | 852/1377 [01:34<00:57, 9.13it/s] 62%|######1 | 853/1377 [01:35<00:56, 9.26it/s] 62%|######2 | 854/1377 [01:35<00:56, 9.22it/s] 62%|######2 | 855/1377 [01:35<00:55, 9.33it/s] 62%|######2 | 856/1377 [01:35<00:58, 8.84it/s] 62%|######2 | 857/1377 [01:35<00:57, 9.11it/s] 62%|######2 | 858/1377 [01:35<00:58, 8.86it/s] 62%|######2 | 859/1377 [01:35<00:57, 9.06it/s] 62%|######2 | 860/1377 [01:35<00:56, 9.14it/s] 63%|######2 | 861/1377 [01:35<00:55, 9.24it/s] 63%|######2 | 862/1377 [01:36<00:56, 9.19it/s] 63%|######2 | 863/1377 [01:36<00:55, 9.28it/s] 63%|######2 | 865/1377 [01:36<00:54, 9.38it/s] 63%|######2 | 866/1377 
[01:36<00:54, 9.45it/s] 63%|######2 | 867/1377 [01:36<00:55, 9.14it/s] 63%|######3 | 868/1377 [01:36<00:55, 9.24it/s] 63%|######3 | 870/1377 [01:36<00:54, 9.35it/s] 63%|######3 | 871/1377 [01:37<00:55, 9.13it/s] 63%|######3 | 872/1377 [01:37<00:55, 9.16it/s] 63%|######3 | 873/1377 [01:37<00:54, 9.20it/s] 63%|######3 | 874/1377 [01:37<00:54, 9.26it/s] 64%|######3 | 875/1377 [01:37<00:54, 9.25it/s] 64%|######3 | 876/1377 [01:37<00:54, 9.23it/s] 64%|######3 | 877/1377 [01:37<00:53, 9.29it/s] 64%|######3 | 878/1377 [01:37<00:53, 9.41it/s] 64%|######3 | 879/1377 [01:37<00:52, 9.40it/s] 64%|######3 | 880/1377 [01:37<00:53, 9.31it/s] 64%|######3 | 881/1377 [01:38<00:53, 9.31it/s] 64%|######4 | 882/1377 [01:38<00:52, 9.35it/s] 64%|######4 | 883/1377 [01:38<00:53, 9.32it/s] 64%|######4 | 884/1377 [01:38<00:52, 9.40it/s] 64%|######4 | 885/1377 [01:38<00:52, 9.30it/s] 64%|######4 | 886/1377 [01:38<00:52, 9.30it/s] 64%|######4 | 887/1377 [01:38<00:54, 8.99it/s] 64%|######4 | 888/1377 [01:38<00:53, 9.07it/s] 65%|######4 | 890/1377 [01:39<00:55, 8.84it/s] 65%|######4 | 892/1377 [01:39<00:52, 9.31it/s] 65%|######4 | 893/1377 [01:39<00:52, 9.25it/s] 65%|######4 | 894/1377 [01:39<00:52, 9.20it/s] 65%|######4 | 895/1377 [01:39<00:51, 9.33it/s] 65%|######5 | 897/1377 [01:39<00:51, 9.30it/s] 65%|######5 | 898/1377 [01:39<00:51, 9.34it/s] 65%|######5 | 899/1377 [01:40<00:51, 9.32it/s] 65%|######5 | 900/1377 [01:40<00:50, 9.38it/s] 65%|######5 | 901/1377 [01:40<00:50, 9.45it/s] 66%|######5 | 902/1377 [01:40<00:50, 9.43it/s] 66%|######5 | 903/1377 [01:40<00:49, 9.49it/s] 66%|######5 | 904/1377 [01:40<00:50, 9.33it/s] 66%|######5 | 905/1377 [01:40<00:50, 9.30it/s] 66%|######5 | 906/1377 [01:40<00:49, 9.43it/s] 66%|######5 | 908/1377 [01:40<00:48, 9.58it/s] 66%|######6 | 909/1377 [01:41<00:48, 9.61it/s] 66%|######6 | 910/1377 [01:41<00:48, 9.58it/s] 66%|######6 | 912/1377 [01:41<00:47, 9.83it/s] 66%|######6 | 913/1377 [01:41<00:49, 9.46it/s] 66%|######6 | 914/1377 [01:41<00:48, 9.52it/s] 66%|######6 | 915/1377 [01:41<00:48, 9.56it/s] 67%|######6 | 916/1377 [01:41<00:48, 9.56it/s] 67%|######6 | 917/1377 [01:41<00:47, 9.61it/s] 67%|######6 | 918/1377 [01:42<00:49, 9.31it/s]/Users/peltouz/Documents/GitHub/M2-Py-DS2E/hf/lib/python3.13/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
## warnings.warn(warn_msg)
##
## 0%| | 0/51 [00:00<?, ?it/s][A
## 10%|9 | 5/51 [00:00<00:01, 45.53it/s][A
## 20%|#9 | 10/51 [00:00<00:01, 39.48it/s][A
## 27%|##7 | 14/51 [00:00<00:00, 38.73it/s][A
## 35%|###5 | 18/51 [00:00<00:00, 35.84it/s][A
## 43%|####3 | 22/51 [00:00<00:00, 36.57it/s][A
## 51%|##### | 26/51 [00:00<00:00, 37.20it/s][A
## 59%|#####8 | 30/51 [00:00<00:00, 37.09it/s][A
## 69%|######8 | 35/51 [00:00<00:00, 38.11it/s][A
## 76%|#######6 | 39/51 [00:01<00:00, 38.62it/s][A
## 84%|########4 | 43/51 [00:01<00:00, 37.13it/s][A
## 92%|#########2| 47/51 [00:01<00:00, 37.02it/s][A
## 100%|##########| 51/51 [00:01<00:00, 37.34it/s][A
## [A{'eval_loss': 0.424141526222229, 'eval_accuracy': 0.8480392156862745, 'eval_f1': 0.8938356164383562, 'eval_runtime': 2.1727, 'eval_samples_per_second': 187.785, 'eval_steps_per_second': 23.473, 'epoch': 2.0}
## 67%|######6 | 918/1377 [01:44<00:49, 9.31it/s]
## 100%|##########| 51/51 [00:02<00:00, 37.34it/s][A
## [A/Users/peltouz/Documents/GitHub/M2-Py-DS2E/hf/lib/python3.13/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
## warnings.warn(warn_msg)
## 67%|######6 | 919/1377 [01:44<05:36, 1.36it/s] 67%|######6 | 920/1377 [01:44<04:11, 1.81it/s] 67%|######6 | 921/1377 [01:44<03:12, 2.37it/s] 67%|######7 | 923/1377 [01:44<02:04, 3.63it/s] 67%|######7 | 925/1377 [01:44<01:32, 4.86it/s] 67%|######7 | 926/1377 [01:45<01:22, 5.45it/s] 67%|######7 | 928/1377 [01:45<01:07, 6.62it/s] 67%|######7 | 929/1377 [01:45<01:04, 6.99it/s] 68%|######7 | 930/1377 [01:45<01:00, 7.43it/s] 68%|######7 | 931/1377 [01:45<00:57, 7.73it/s] 68%|######7 | 932/1377 [01:45<00:55, 8.08it/s] 68%|######7 | 933/1377 [01:45<00:52, 8.46it/s] 68%|######7 | 934/1377 [01:45<00:52, 8.44it/s] 68%|######7 | 935/1377 [01:46<00:50, 8.69it/s] 68%|######8 | 937/1377 [01:46<00:47, 9.24it/s] 68%|######8 | 938/1377 [01:46<00:47, 9.28it/s] 68%|######8 | 939/1377 [01:46<00:48, 8.97it/s] 68%|######8 | 940/1377 [01:46<00:48, 9.08it/s] 68%|######8 | 941/1377 [01:46<00:47, 9.27it/s] 68%|######8 | 942/1377 [01:46<00:46, 9.31it/s] 68%|######8 | 943/1377 [01:46<00:48, 8.97it/s] 69%|######8 | 944/1377 [01:46<00:48, 8.84it/s] 69%|######8 | 945/1377 [01:47<00:48, 8.95it/s] 69%|######8 | 947/1377 [01:47<00:46, 9.22it/s] 69%|######8 | 948/1377 [01:47<00:48, 8.79it/s] 69%|######8 | 949/1377 [01:47<00:47, 8.98it/s] 69%|######8 | 950/1377 [01:47<00:46, 9.10it/s] 69%|######9 | 951/1377 [01:47<00:46, 9.14it/s] 69%|######9 | 952/1377 [01:47<00:46, 9.21it/s] 69%|######9 | 953/1377 [01:47<00:46, 9.18it/s] 69%|######9 | 954/1377 [01:48<00:47, 8.93it/s] 69%|######9 | 955/1377 [01:48<00:47, 8.90it/s] 69%|######9 | 956/1377 [01:48<00:46, 9.14it/s] 69%|######9 | 957/1377 [01:48<00:45, 9.17it/s] 70%|######9 | 958/1377 [01:48<00:46, 9.00it/s] 70%|######9 | 959/1377 [01:48<00:46, 9.04it/s] 70%|######9 | 960/1377 [01:48<00:47, 8.77it/s] 70%|######9 | 961/1377 [01:48<00:46, 8.87it/s] 70%|######9 | 962/1377 [01:48<00:46, 8.91it/s] 70%|######9 | 963/1377 [01:49<00:47, 8.81it/s] 70%|####### | 964/1377 [01:49<00:46, 8.90it/s] 70%|####### | 966/1377 [01:49<00:44, 9.31it/s] 70%|####### | 967/1377 [01:49<00:43, 9.42it/s] 70%|####### | 969/1377 [01:49<00:43, 9.33it/s] 70%|####### | 970/1377 [01:49<00:44, 9.21it/s] 71%|####### | 971/1377 [01:49<00:44, 9.07it/s] 71%|####### | 972/1377 [01:50<00:44, 9.02it/s] 71%|####### | 973/1377 [01:50<00:44, 9.08it/s] 71%|####### | 974/1377 [01:50<00:44, 9.15it/s] 71%|####### | 975/1377 [01:50<00:43, 9.14it/s] 71%|####### | 976/1377 [01:50<00:43, 9.16it/s] 71%|####### | 977/1377 [01:50<00:43, 9.10it/s] 71%|#######1 | 978/1377 [01:50<00:45, 8.73it/s] 71%|#######1 | 979/1377 [01:50<00:45, 8.79it/s] 71%|#######1 | 980/1377 [01:50<00:43, 9.08it/s] 71%|#######1 | 981/1377 [01:51<00:44, 8.81it/s] 71%|#######1 | 982/1377 [01:51<00:43, 9.05it/s] 71%|#######1 | 983/1377 [01:51<00:43, 9.13it/s] 71%|#######1 | 984/1377 [01:51<00:42, 9.34it/s] 72%|#######1 | 985/1377 [01:51<00:43, 8.96it/s] 72%|#######1 | 986/1377 [01:51<00:43, 9.04it/s] 72%|#######1 | 987/1377 [01:51<00:42, 9.16it/s] 72%|#######1 | 988/1377 [01:51<00:42, 9.10it/s] 72%|#######1 | 989/1377 [01:51<00:42, 9.03it/s] 72%|#######1 | 991/1377 [01:52<00:40, 9.43it/s] 72%|#######2 | 992/1377 [01:52<00:40, 9.40it/s] 72%|#######2 | 994/1377 [01:52<00:40, 9.46it/s] 72%|#######2 | 995/1377 [01:52<00:40, 9.37it/s] 72%|#######2 | 996/1377 [01:52<00:40, 9.45it/s] 72%|#######2 | 997/1377 [01:52<00:40, 9.36it/s] 72%|#######2 | 998/1377 [01:52<00:41, 9.15it/s] 73%|#######2 | 1000/1377 [01:53<00:40, 9.41it/s] {'loss': 0.3412, 'grad_norm': 0.13441860675811768, 'learning_rate': 1.3725490196078432e-05, 'epoch': 2.18}
## [evaluation progress bar output truncated]
## {'eval_loss': 0.6223188042640686, 'eval_accuracy': 0.8529411764705882, 'eval_f1': 0.8969072164948454, 'eval_runtime': 2.1818, 'eval_samples_per_second': 186.999, 'eval_steps_per_second': 23.375, 'epoch': 3.0}
## {'train_runtime': 157.2799, 'train_samples_per_second': 69.964, 'train_steps_per_second': 8.755, 'train_loss': 0.38249307935793675, 'epoch': 3.0}
## TrainOutput(global_step=1377, training_loss=0.38249307935793675, metrics={'train_runtime': 157.2799, 'train_samples_per_second': 69.964, 'train_steps_per_second': 8.755, 'total_flos': 405114969714960.0, 'train_loss': 0.38249307935793675, 'epoch': 3.0})
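The TrainOutput above only summarizes the run. The per-step dictionaries streamed to the console are also kept in memory on the trainer; a minimal sketch for pulling the evaluation snapshots back out, assuming the trainer object is still in scope:

# trainer.state.log_history holds every {'loss': ...} / {'eval_loss': ...} dict logged during training
for entry in trainer.state.log_history:
    if "eval_loss" in entry:  # keep only the evaluation snapshots
        print(f"epoch {entry['epoch']}: eval_loss={entry['eval_loss']:.4f}, eval_f1={entry['eval_f1']:.4f}")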
test_pred = trainer.predict(tokenized_datasets["test"])
## /Users/peltouz/Documents/GitHub/M2-Py-DS2E/hf/lib/python3.13/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
## warnings.warn(warn_msg)
## [prediction progress bar output truncated]
preds = np.argmax(test_pred.predictions, axis=-1)
metric = evaluate.load("glue", "mrpc")
metric.compute(predictions=preds, references=test_pred.label_ids)
## {'accuracy': 0.8121739130434783, 'f1': 0.8648874061718098}
print("Preds:", preds[:10])
## Preds: [1 1 1 1 0 1 0 0 1 0]
print("Labels:", predictions.label_ids[:10])
## Labels: [1 0 0 1 0 1 0 1 1 1]
print("Preds type:", type(preds[0]))
## Preds type: <class 'numpy.int64'>
print("Labels type:", type(predictions.label_ids[0]))
## Labels type: <class 'numpy.int64'>
print("Unique values in preds:", np.unique(preds))
## Unique values in preds: [0 1]
print("Unique values in labels:", np.unique(predictions.label_ids))
## Unique values in labels: [0 1]
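Both arrays are integer-coded 0/1, so a quick cross-tabulation shows where predictions and gold labels disagree. A minimal numpy-only sketch, using test_pred from the predict call above:

labels = test_pred.label_ids
# 2x2 confusion matrix: rows = gold label, columns = predicted label
conf_mat = np.bincount(labels * 2 + preds, minlength=4).reshape(2, 2)
print(conf_mat)
print("manual accuracy:", (preds == labels).mean())  # should match the 'accuracy' reported above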
from transformers import AutoTokenizer, DataCollatorWithPadding, Trainer, TrainingArguments
from transformers import AutoModelForSequenceClassification, EarlyStoppingCallback
from datasets import load_from_disk
from evaluate import load
import numpy as np
import torch
datasets = load_from_disk('Data/test_novelty')
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize_function(example):
    return tokenizer(example["text"], truncation=True)
tokenized_datasets = datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
tokenized_datasets = tokenized_datasets.remove_columns(["text", "id"])
tokenized_datasets = tokenized_datasets.rename_column("score", "labels")
tokenized_datasets.set_format("torch")
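Before training, it is worth eyeballing one processed example to confirm which columns the Trainer will actually see. A minimal sanity-check sketch:

sample = tokenized_datasets["train"][0]
print(sample.keys())                               # expect labels, input_ids, token_type_ids, attention_mask
print(tokenizer.decode(sample["input_ids"][:15]))  # round-trip the first token ids back to text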
def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    # Note: each load() call re-resolves the metric script; see the combined variant after this function.
    accuracy = load('accuracy')
    f1 = load('f1')
    precision = load('precision')
    recall = load('recall')
    accuracy_result = accuracy.compute(predictions=predictions, references=labels)
    f1_result = f1.compute(predictions=predictions, references=labels)
    precision_result = precision.compute(predictions=predictions, references=labels)
    recall_result = recall.compute(predictions=predictions, references=labels)
    return {
        'accuracy': accuracy_result['accuracy'],
        'f1': f1_result['f1'],
        'precision': precision_result['precision'],
        'recall': recall_result['recall']
    }
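Calling load() inside compute_metrics re-resolves each metric script on every evaluation; the repeated "Downloading builder script" lines in the training log below come from exactly this. A sketch of an equivalent variant that bundles the metrics once with evaluate.combine:

import evaluate

clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])  # resolved a single time

def compute_metrics_combined(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    # compute() merges all four results into one dict: {'accuracy': ..., 'f1': ..., ...}
    return clf_metrics.compute(predictions=predictions, references=labels)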
# device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
model.to(device)
training_args = TrainingArguments(
    output_dir='./novelty_bert_checkpoints',
    eval_strategy="steps",            # Evaluate every N steps (not per epoch)
    eval_steps=200,                   # Run evaluation on the validation set every 200 training steps
    save_steps=400,                   # Save a checkpoint every 400 steps (must be a multiple of eval_steps when load_best_model_at_end=True)
    learning_rate=2e-5,               # Learning rate for the optimizer
    per_device_train_batch_size=16,   # Training samples processed per device before updating weights
    per_device_eval_batch_size=16,    # Evaluation samples processed per device
    num_train_epochs=1,               # Total number of complete passes through the training dataset
    weight_decay=0.01,                # L2 regularization coefficient to limit overfitting
    load_best_model_at_end=True,      # After training, load the checkpoint with the best validation performance
    metric_for_best_model="f1",       # Use F1 score to determine which checkpoint is "best"
    greater_is_better=True,           # Higher F1 is better
    warmup_steps=100,                 # Gradually increase the learning rate from 0 over the first 100 steps (helps training stability)
    logging_steps=50,                 # Log training metrics (loss, learning rate) every 50 steps
    report_to="none",                 # Disable automatic logging to external tools (wandb, tensorboard, etc.)
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    compute_metrics=compute_metrics,  # Calculate metrics on the validation set
    processing_class=tokenizer,       # Tokenizer for processing
    data_collator=data_collator,      # Handles dynamic padding of batches
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]  # Stop training if the validation metric doesn't improve for 3 consecutive evaluations
)
trainer.train()
trainer.save_model('./novelty_bert_final')
tokenizer.save_pretrained('./novelty_bert_final')
##
## Map: 100%|██████████████████████████████████████████████████████████████████████████████| 12706/12706 [00:01<00:00, 8906.47 examples/s]
## Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
## You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
## /Users/peltouz/Documents/GitHub/M2-Py-DS2E/hf/lib/python3.13/site-packages/torch/utils/data/dataloader.py:692: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
## warnings.warn(warn_msg)
## {'loss': 0.7648, 'grad_norm': 12.595635414123535, 'learning_rate': 9.800000000000001e-06, 'epoch': 0.02}
## {'loss': 0.6888, 'grad_norm': 5.888716697692871, 'learning_rate': 1.98e-05, 'epoch': 0.04}
## {'loss': 0.6858, 'grad_norm': 2.0533313751220703, 'learning_rate': 1.9570740254051688e-05, 'epoch': 0.06}
## {'loss': 0.6569, 'grad_norm': 3.2530934810638428, 'learning_rate': 1.9132720105124838e-05, 'epoch': 0.08}
## Downloading builder script: 7.56kB [00:00, 1.44MB/s]
## Downloading builder script: 7.38kB [00:00, 5.48MB/s]
## {'eval_loss': 0.6507338881492615, 'eval_accuracy': 0.6109405745769382, 'eval_f1': 0.47532109117928034, 'eval_precision': 0.7180885182809493, 'eval_recall': 0.355227669363795, 'eval_runtime': 309.63, 'eval_samples_per_second': 41.033, 'eval_steps_per_second': 2.568, 'epoch': 0.08}
## {'loss': 0.6463, 'grad_norm': 3.86037015914917, 'learning_rate': 1.8694699956197987e-05, 'epoch': 0.1}
## {'loss': 0.6469, 'grad_norm': 7.634893417358398, 'learning_rate': 1.8256679807271137e-05, 'epoch': 0.13}
## {'loss': 0.6517, 'grad_norm': 2.9199094772338867, 'learning_rate': 1.7818659658344287e-05, 'epoch': 0.15}
## {'loss': 0.6356, 'grad_norm': 6.318357944488525, 'learning_rate': 1.7380639509417433e-05, 'epoch': 0.17}
## Downloading builder script: 7.56kB [00:00, 3.58MB/s]
## Downloading builder script: 7.38kB [00:00, 3.77MB/s]
## {'eval_loss': 0.6258542537689209, 'eval_accuracy': 0.657693821330185, 'eval_f1': 0.6632597754548974, 'eval_precision': 0.6477616454930429, 'eval_recall': 0.6795176899888942, 'eval_runtime': 298.9649, 'eval_samples_per_second': 42.497, 'eval_steps_per_second': 2.659, 'epoch': 0.17}
## {'loss': 0.6368, 'grad_norm': 2.8214261531829834, 'learning_rate': 1.6942619360490583e-05, 'epoch': 0.19}
## {'loss': 0.6433, 'grad_norm': 6.014491558074951, 'learning_rate': 1.6504599211563733e-05, 'epoch': 0.21}
## {'loss': 0.6428, 'grad_norm': 6.863008499145508, 'learning_rate': 1.6066579062636882e-05, 'epoch': 0.23}
## {'loss': 0.6229, 'grad_norm': 4.182170867919922, 'learning_rate': 1.5628558913710032e-05, 'epoch': 0.25}
## {'eval_loss': 0.622889518737793, 'eval_accuracy': 0.6445493900039354, 'eval_f1': 0.6951120712935458, 'eval_precision': 0.6050064637442708, 'eval_recall': 0.8167539267015707, 'eval_runtime': 312.7921, 'eval_samples_per_second': 40.618, 'eval_steps_per_second': 2.542, 'epoch': 0.25}
## {'loss': 0.6093, 'grad_norm': 4.26153039932251, 'learning_rate': 1.519053876478318e-05, 'epoch': 0.27}
## {'loss': 0.6248, 'grad_norm': 3.7737467288970947, 'learning_rate': 1.475251861585633e-05, 'epoch': 0.29}
## {'loss': 0.6273, 'grad_norm': 7.192020416259766, 'learning_rate': 1.431449846692948e-05, 'epoch': 0.31}
## {'loss': 0.6266, 'grad_norm': 5.350361347198486, 'learning_rate': 1.3876478318002628e-05, 'epoch': 0.34}
## {'eval_loss': 0.6135651469230652, 'eval_accuracy': 0.6554112554112554, 'eval_f1': 0.6299239222316145, 'eval_precision': 0.6741451058440383, 'eval_recall': 0.5911470728224655, 'eval_runtime': 300.898, 'eval_samples_per_second': 42.224, 'eval_steps_per_second': 2.642, 'epoch': 0.34}
## {'loss': 0.6263, 'grad_norm': 4.366634845733643, 'learning_rate': 1.343845816907578e-05, 'epoch': 0.36}
## {'loss': 0.6162, 'grad_norm': 6.9116950035095215, 'learning_rate': 1.3000438020148929e-05, 'epoch': 0.38}
## {'loss': 0.633, 'grad_norm': 2.419966220855713, 'learning_rate': 1.2562417871222077e-05, 'epoch': 0.4}
## {'loss': 0.619, 'grad_norm': 3.9776971340179443, 'learning_rate': 1.2124397722295227e-05, 'epoch': 0.42}
## {'eval_loss': 0.6167282462120056, 'eval_accuracy': 0.6605273514364424, 'eval_f1': 0.663808558734118, 'eval_precision': 0.6524670548574931, 'eval_recall': 0.6755513247659845, 'eval_runtime': 291.3567, 'eval_samples_per_second': 43.606, 'eval_steps_per_second': 2.729, 'epoch': 0.42}
## {'loss': 0.6176, 'grad_norm': 2.9242210388183594, 'learning_rate': 1.1686377573368375e-05, 'epoch': 0.44}
## {'loss': 0.6312, 'grad_norm': 8.742416381835938, 'learning_rate': 1.1248357424441525e-05, 'epoch': 0.46}
## {'loss': 0.6317, 'grad_norm': 5.807154178619385, 'learning_rate': 1.0810337275514674e-05, 'epoch': 0.48}
## {'loss': 0.6111, 'grad_norm': 6.02040433883667, 'learning_rate': 1.0372317126587823e-05, 'epoch': 0.5}
## {'eval_loss': 0.6142168641090393, 'eval_accuracy': 0.6557260920897284, 'eval_f1': 0.6404734506000329, 'eval_precision': 0.6645062254818352, 'eval_recall': 0.6181183563382516, 'eval_runtime': 294.665, 'eval_samples_per_second': 43.117, 'eval_steps_per_second': 2.698, 'epoch': 0.5}
## {'train_runtime': 3451.0748, 'train_samples_per_second': 11.044, 'train_steps_per_second': 0.691, 'train_loss': 0.6415314928690592, 'epoch': 0.5}
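The run above stopped at step 1200 of 2383 (epoch 0.5): validation F1 peaked at 0.695 at step 600, and the next three evaluations (0.630, 0.664, 0.640) failed to beat it, so the EarlyStoppingCallback ended training. Because load_best_model_at_end=True, the best-scoring saved checkpoint, not the final step-1200 weights, is what save_model then wrote to ./novelty_bert_final.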
from transformers import AutoModelForSequenceClassification
from torch.utils.data import DataLoader
import evaluate
from tqdm import tqdm
import torch
import numpy as np
def get_model_eval_testset(checkpoint):
    """Evaluate a checkpoint (Hub name or local path) on the test split."""
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    model.to(device)
    eval_dataloader = DataLoader(
        tokenized_datasets["test"], batch_size=16, collate_fn=data_collator
    )
    # Collect all predictions and labels
    all_logits = []
    all_labels = []
    model.eval()
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        logits = outputs.logits
        all_logits.append(logits.cpu().numpy())
        all_labels.append(batch["labels"].cpu().numpy())
    all_logits = np.concatenate(all_logits, axis=0)
    all_labels = np.concatenate(all_labels, axis=0)
    results = compute_metrics((all_logits, all_labels))
    print("\nTest Set Results:")
    print("=" * 50)
    for metric_name, value in results.items():
        print(f"{metric_name}: {value:.4f}")
    print("=" * 50)
get_model_eval_testset('bert-base-uncased')
##
## Test Set Results:
## ==================================================
## accuracy: 0.5026
## f1: 0.0427
## precision: 0.4947
## recall: 0.0223
## ==================================================
get_model_eval_testset('./novelty_bert_final')
##
## Test Set Results:
## ==================================================
## accuracy: 0.6585
## f1: 0.6666
## precision: 0.6477
## recall: 0.6867
## ==================================================
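The contrast is stark: the raw bert-base-uncased checkpoint sits at chance accuracy (0.50) with near-zero F1 because its classification head is freshly initialized, exactly as the "newly initialized" warning said, while the fine-tuned model reaches F1 0.67. To score new text with the saved model, a pipeline is the simplest route; a minimal sketch (the example sentence is made up):

from transformers import pipeline

clf = pipeline("text-classification", model="./novelty_bert_final")
result = clf("A hypothetical abstract describing a genuinely new method.")
print(result)  # a list like [{'label': 'LABEL_1', 'score': ...}]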