An important conclusion of the paper: compared with a baseline obtained by directly fine-tuning GPT-3, training a verifier yields roughly the same performance boost as scaling the baseline to ~30x as many parameters, and the advantage of verifiers grows clearly as more data becomes available.
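The verifier approach described above works by sampling many candidate solutions and letting the verifier pick the best one (best-of-N reranking). A minimal sketch of that idea, assuming hypothetical `generate_solutions` and `verifier_score` functions (neither is from the paper's codebase; they stand in for the generator and the trained verifier):

```python
def rerank_with_verifier(problem, generate_solutions, verifier_score, n=100):
    """Sample n candidate solutions and return the one the verifier rates highest.

    generate_solutions(problem, n) -> list of n candidate solution strings
    verifier_score(problem, candidate) -> scalar correctness score
    (Both are hypothetical stand-ins for the fine-tuned generator and verifier.)
    """
    candidates = generate_solutions(problem, n)
    scores = [verifier_score(problem, c) for c in candidates]
    # Pick the candidate the verifier judges most likely to be correct.
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```

At test time, the compute cost shifts from a larger model to sampling n completions from a smaller one, which is where the "verifier ≈ 30x larger baseline" comparison comes from.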
A few notes:
- The paper uses the GPT-3 pretrained model directly as initialization, with no additional pretraining; the Pretraining section below only covers the datasets;
- The Fine-tuning section records the fine-tuning method;
- The Verification section records the verification method.
Pretraining
- 1. Which foundation model is the work based on?
- 2. Which datasets were collected specifically for math?
Fine-tuning
- What types of pre-processing methods are used?
- What types of test/evaluation methods are used?
Verification