|
|
+ deepspeed --master_port 18414 --module safe_rlhf.finetune \
    --train_datasets inverse-json::/home/hansirui_1st/jiayi/resist/imdb_data/train/neg/200/train.json \
    --model_name_or_path /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000 \
    --max_length 512 \
    --trust_remote_code True \
    --epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --learning_rate 1e-5 \
    --lr_warmup_ratio 0 \
    --lr_scheduler_type constant \
    --weight_decay 0.0 \
    --seed 42 \
    --output_dir /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000-Q2-200 \
    --log_type wandb \
    --log_run_name imdb-tinyllama-3T-s3-Q1-5000-Q2-200 \
    --log_project Inverse_Alignment_IMDb \
    --zero_stage 3 \
    --offload none \
    --bf16 True \
    --tf32 True \
    --save_16bit
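A quick sanity check on the step counts implied by these flags. This is a hedged back-of-the-envelope sketch, assuming the 8 ranks seen later in this log and the 200-sample dataset implied by the `.../neg/200/train.json` path; `train_steps` is a hypothetical helper, not part of safe_rlhf.

```python
import math

def train_steps(num_samples: int, num_gpus: int, per_device_bs: int, grad_accum: int):
    """Return (micro_steps_per_rank, optimizer_steps) for one epoch."""
    # Each rank sees num_samples / num_gpus examples, per_device_bs at a time.
    micro_steps = math.ceil(num_samples / (num_gpus * per_device_bs))
    # The optimizer only steps once per grad_accum micro-batches.
    optimizer_steps = math.ceil(micro_steps / grad_accum)
    return micro_steps, optimizer_steps

print(train_steps(200, 8, 1, 8))  # -> (25, 4)
```

The 25 micro-steps per rank match the 25-step progress bar in this log; with accumulation 8, the constant-LR optimizer steps only about 4 times in the whole run.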
|
|
nvcc warning : incompatible redefinition for option |
|
|
|
|
[rank7]:[W527 21:03:55.648168998 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank1]:[W527 21:03:55.678747202 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank6]:[W527 21:03:55.686524965 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank2]:[W527 21:03:55.720479225 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank5]:[W527 21:03:55.729222841 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank4]:[W527 21:03:55.752663867 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank3]:[W527 21:03:55.812318877 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
|
|
[rank0]:[W527 21:03:55.876239923 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
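The repeated ProcessGroupNCCL warnings above can be silenced by binding each rank to its device before the first collective, as the message itself suggests. A minimal sketch, assuming the launcher exports `LOCAL_RANK` (deepspeed does) and falling back to the gloo backend on CPU-only machines for illustration:

```python
import os
import torch
import torch.distributed as dist

def init_distributed() -> torch.device:
    """Bind this rank to a device up front and pass device_id to
    init_process_group, so barrier() does not have to guess the
    rank -> GPU mapping (the cause of the warnings above)."""
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if torch.cuda.is_available():
        device = torch.device("cuda", local_rank)
        torch.cuda.set_device(device)
        dist.init_process_group("nccl", device_id=device)
    else:
        device = torch.device("cpu")
        dist.init_process_group("gloo")  # CPU fallback for illustration only
    return device
```

With `device_id` set, later `dist.barrier()` calls use the declared device instead of inferring one, avoiding the potential hang the warning describes.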
|
|
loading configuration file /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000/config.json |
|
|
|
|
Model config LlamaConfig { |
|
|
"architectures": [ |
|
|
"LlamaForCausalLM" |
|
|
], |
|
|
"attention_bias": false, |
|
|
"attention_dropout": 0.0, |
|
|
"bos_token_id": 1, |
|
|
"eos_token_id": 2, |
|
|
"head_dim": 64, |
|
|
"hidden_act": "silu", |
|
|
"hidden_size": 2048, |
|
|
"initializer_range": 0.02, |
|
|
"intermediate_size": 5632, |
|
|
"max_position_embeddings": 2048, |
|
|
"mlp_bias": false, |
|
|
"model_type": "llama", |
|
|
"num_attention_heads": 32, |
|
|
"num_hidden_layers": 22, |
|
|
"num_key_value_heads": 4, |
|
|
"pad_token_id": 32000, |
|
|
"pretraining_tp": 1, |
|
|
"rms_norm_eps": 1e-05, |
|
|
"rope_scaling": null, |
|
|
"rope_theta": 10000.0, |
|
|
"tie_word_embeddings": false, |
|
|
"torch_dtype": "float32", |
|
|
"transformers_version": "4.52.1", |
|
|
"use_cache": true, |
|
|
"vocab_size": 32001 |
|
|
} |
|
|
|
|
|
|
|
|
|
|
loading weights file /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000/pytorch_model.bin |
|
|
Will use torch_dtype=torch.float32 as defined in model |
|
|
Instantiating LlamaForCausalLM model under default dtype torch.float32. |
|
|
Detected DeepSpeed ZeRO-3: activating zero.init() for this model |
|
|
|
|
Generate config GenerationConfig { |
|
|
"bos_token_id": 1, |
|
|
"eos_token_id": 2, |
|
|
"pad_token_id": 32000 |
|
|
} |
|
|
|
|
|
|
|
|
|
|
All model checkpoint weights were used when initializing LlamaForCausalLM. |
|
|
|
|
|
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000. |
|
|
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. |
|
|
Generation config file not found, using a generation config created from the model config.
|
|
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
|
|
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root... |
|
|
|
|
Detected CUDA files, patching ldflags |
|
|
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja... |
|
|
/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
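The warning above can be avoided by pinning the target architectures before the fused_adam JIT build runs. A hedged sketch; the `"8.0"` value (A100-class) is an assumption, substitute the compute capability of your own GPUs:

```python
import os

# Pin nvcc's target architectures *before* importing torch/deepspeed ops,
# so the JIT build does not compile for every visible card.
# "8.0" is an assumed value, not taken from this log.
os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "8.0")
print(os.environ["TORCH_CUDA_ARCH_LIST"])
```

`setdefault` keeps any value already exported by the cluster environment, so this is safe to put at the top of the training entry point.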
|
|
Building extension module fused_adam... |
|
|
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) |
|
|
Loading extension module fused_adam... |
|
|
|
|
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`. |
|
|
|
|
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin |
|
|
wandb: Tracking run with wandb version 0.19.11 |
|
|
wandb: Run data is saved locally in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000-Q2-200/wandb/run-20250527_210413-yqdyjjre |
|
|
wandb: Run `wandb offline` to turn off syncing. |
|
|
wandb: Syncing run imdb-tinyllama-3T-s3-Q1-5000-Q2-200 |
|
|
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment_IMDb
|
|
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment_IMDb/runs/yqdyjjre
|
|
Training 1/1 epoch: 0%| | 0/25 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`. |
|
|
Training 1/1 epoch (loss 2.7574):   4%|          | 1/25 [00:10<04:20, 10.85s/it]
Training 1/1 epoch (loss 2.5256):   8%|█         | 2/25 [00:13<02:15, 5.90s/it]
Training 1/1 epoch (loss 2.8368):  12%|█         | 3/25 [00:14<01:24, 3.84s/it]
Training 1/1 epoch (loss 2.6624):  16%|██        | 4/25 [00:16<01:03, 3.05s/it]
Training 1/1 epoch (loss 2.4198):  20%|██        | 5/25 [00:16<00:42, 2.12s/it]
Training 1/1 epoch (loss 2.7459):  24%|██        | 6/25 [00:18<00:39, 2.06s/it]
Training 1/1 epoch (loss 2.7951):  28%|███       | 7/25 [00:20<00:33, 1.86s/it]
Training 1/1 epoch (loss 2.8293):  32%|███       | 8/25 [00:21<00:26, 1.56s/it]
Training 1/1 epoch (loss 2.5869):  36%|████      | 9/25 [00:22<00:24, 1.56s/it]
Training 1/1 epoch (loss 2.7101):  40%|████      | 10/25 [00:24<00:21, 1.45s/it]
Training 1/1 epoch (loss 2.5700):  44%|████      | 11/25 [00:26<00:24, 1.74s/it]
Training 1/1 epoch (loss 2.5706):  48%|█████     | 12/25 [00:28<00:22, 1.73s/it]
Training 1/1 epoch (loss 2.8329):  52%|█████     | 13/25 [00:29<00:17, 1.48s/it]
Training 1/1 epoch (loss 2.8145):  56%|██████    | 14/25 [00:31<00:19, 1.79s/it]
Training 1/1 epoch (loss 2.7156):  60%|██████    | 15/25 [00:33<00:18, 1.84s/it]
Training 1/1 epoch (loss 2.7377):  64%|██████    | 16/25 [00:34<00:13, 1.50s/it]
Training 1/1 epoch (loss 2.4893):  68%|███████   | 17/25 [00:35<00:11, 1.38s/it]
Training 1/1 epoch (loss 2.7379):  72%|███████   | 18/25 [00:37<00:10, 1.57s/it]
Training 1/1 epoch (loss 2.5085):  76%|████████  | 19/25 [00:38<00:09, 1.53s/it]
Training 1/1 epoch (loss 2.4908):  80%|████████  | 20/25 [00:40<00:07, 1.50s/it]
Training 1/1 epoch (loss 2.7256):  84%|████████  | 21/25 [00:41<00:05, 1.48s/it]
Training 1/1 epoch (loss 2.5723):  88%|█████████ | 22/25 [00:42<00:04, 1.42s/it]
Training 1/1 epoch (loss 2.5233):  92%|█████████ | 23/25 [00:44<00:03, 1.56s/it]
Training 1/1 epoch (loss 2.5882):  96%|██████████| 24/25 [00:46<00:01, 1.46s/it]
Training 1/1 epoch (loss 2.6040): 100%|██████████| 25/25 [00:47<00:00, 1.49s/it]
Training 1/1 epoch (loss 2.6040): 100%|██████████| 25/25 [00:47<00:00, 1.90s/it]
|
|
tokenizer config file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000-Q2-200/tokenizer_config.json |
|
|
Special tokens file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000-Q2-200/special_tokens_map.json |
|
|
wandb: |
|
|
wandb: |
|
|
wandb: Run history: |
|
|
wandb: train/epoch (sparkline omitted; characters garbled in capture)
wandb: train/loss (sparkline omitted; characters garbled in capture)
wandb: train/lr (sparkline omitted; characters garbled in capture)
wandb: train/step (sparkline omitted; characters garbled in capture)
|
|
wandb: |
|
|
wandb: Run summary: |
|
|
wandb: train/epoch 1 |
|
|
wandb: train/loss 2.60398 |
|
|
wandb: train/lr 1e-05 |
|
|
wandb: train/step 25 |
|
|
wandb: |
|
|
wandb: 🚀 View run imdb-tinyllama-3T-s3-Q1-5000-Q2-200 at: https://wandb.ai/xtom/Inverse_Alignment_IMDb/runs/yqdyjjre
|
|
wandb: ⭐️ View project at: https://wandb.ai/xtom/Inverse_Alignment_IMDb
|
|
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) |
|
|
wandb: Find logs at: /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-3T/tinyllama-3T-s3-Q1-5000-Q2-200/wandb/run-20250527_210413-yqdyjjre/logs |
|
|
|