[HuggingFace Transformers] (Stack Overflow Q&A) Loading a pre-trained model from disk with Huggingface Transformers


This is a question and answer from Stack Overflow:

Load a pre-trained model from disk with Huggingface Transformers - Stack Overflow
https://stackoverflow.com/questions/64001128/load-a-pre-trained-model-from-disk-with-huggingface-transformers

From this Q&A we can learn the following:

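Use the from_pretrained method to load a pre-trained model, so the weights do not have to be downloaded every time.
When loading a model, make sure the path you supply is either a valid model identifier or the path to a directory that contains a config.json file.
Either a relative or an absolute path can be used.
Use the save_pretrained method to save the files.
The example code shows how to save and load a pre-trained model.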


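Question:

Going by the documentation of from_pretrained, I understand that I don't have to download the pretrained vectors (the weights) every time; I can save them and load them from disk with this syntax: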

 - a path to a `directory` containing vocabulary files required by the tokenizer, for instance saved using the :func:`~transformers.PreTrainedTokenizer.save_pretrained` method, e.g.: ``./my_model_directory/``.
 - (not applicable to all derived classes, deprecated) a path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (e.g. Bert, XLNet), e.g.: ``./my_model_directory/vocab.txt``.

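So, I went to the model hub:

https://huggingface.co/models

I found the model I wanted:

https://huggingface.co/bert-base-cased

I downloaded it from the link they provide:

Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English.

It is stored at this path: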

  /my/local/models/cased_L-12_H-768_A-12/

The directory contains:

  ./
  ../
  bert_config.json
  bert_model.ckpt.data-00000-of-00001
  bert_model.ckpt.index
  bert_model.ckpt.meta
  vocab.txt

I set the path and loaded the tokenizer:

  PATH = '/my/local/models/cased_L-12_H-768_A-12/'
  tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)

This raised an error:

>           raise EnvironmentError(msg)
E           OSError: Can't load config for '/my/local/models/cased_L-12_H-768_A-12/'. Make sure that:
E           
E           - '/my/local/models/cased_L-12_H-768_A-12/' is a correct model identifier listed on 'https://huggingface.co/models'
E           
E           - or '/my/local/models/cased_L-12_H-768_A-12/' is the correct path to a directory containing a config.json file

A similar problem occurs when I point directly at the json file:

  PATH = '/my/local/models/cased_L-12_H-768_A-12/bert_config.json'
  tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)

        if state_dict is None and not from_tf:
            try:
                state_dict = torch.load(resolved_archive_file, map_location="cpu")
            except Exception:
                raise OSError(
>                   "Unable to load weights from pytorch checkpoint file. "
                    "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. ")
E               OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
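The first error message asks for a directory that contains a config.json file, while the directory listing in the question shows bert_config.json and bert_model.ckpt.* files instead. A quick way to check a folder before calling from_pretrained might look like the sketch below. The helper name `diagnose_model_dir` and the returned strings are hypothetical; the file names are taken from the error message and the listing, and the "TensorFlow checkpoint" label is an interpretation of the second error, not something the library reports.

```python
from pathlib import Path

# File names assumed from the error message and the directory listing in the question.
EXPECTED_CONFIG = "config.json"
TF_CHECKPOINT_MARKERS = ("bert_config.json", "bert_model.ckpt.index")


def diagnose_model_dir(path):
    """Return a short description of what `path` looks like (hypothetical helper)."""
    directory = Path(path)
    if not directory.is_dir():
        # A path to a single file (such as bert_config.json) ends up here.
        return "not a directory"
    names = {entry.name for entry in directory.iterdir()}
    if EXPECTED_CONFIG in names:
        return "has config.json"
    if any(marker in names for marker in TF_CHECKPOINT_MARKERS):
        return "original TensorFlow checkpoint (no config.json)"
    return "no config.json"


# Example call, using the path from the question:
# print(diagnose_model_dir('/my/local/models/cased_L-12_H-768_A-12/'))
```

For the folder in the question this check would report the TensorFlow-checkpoint case, which is consistent with both error messages.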

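Answer

Relative path? Absolute path?

Where is the file located relative to your model folder? I believe it has to be a relative path rather than an absolute one. So if the file in which you are writing the code is located in 'my/local/', your code should look like this: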

```
PATH = 'models/cased_L-12_H-768_A-12/'
tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)
```

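You only need to specify the folder that contains all the files, not a file directly. I think this is definitely a problem with the path. Try changing the style of the "slashes": '/' vs '\', which differ between operating systems. Also try using ".", as in ./models/cased_L-12_H-768_A-12/ etc.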
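The slash and "." concerns in this answer can be sidestepped by normalizing the path first. The sketch below uses only the standard library; `resolve_model_dir` is a hypothetical helper name. It turns a relative or absolute path into an absolute one and, if the path points at a file, falls back to the folder that contains it.

```python
from pathlib import Path


def resolve_model_dir(path):
    """Return an absolute directory path for `path` (hypothetical helper).

    If `path` points at a file (for example a config file), the parent
    folder is returned, since the folder is what should be passed on.
    """
    resolved = Path(path).expanduser().resolve()
    if resolved.is_file():
        resolved = resolved.parent
    return resolved


# Possible usage with the tokenizer call from the question:
# model_dir = resolve_model_dir('./models/cased_L-12_H-768_A-12/')
# tokenizer = BertTokenizer.from_pretrained(str(model_dir), local_files_only=True)
```

pathlib accepts forward slashes on every operating system, so the same string works on Windows and Linux. An absolute path produced this way should be accepted by from_pretrained just as a relative one is, as long as the folder holds the expected files.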

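Saving the files with save_pretrained is recommended.

Not sure where you got these files from. When I check the link, I can download the following files: config.json, flax_model.msgpack, modelcard.json, pytorch_model.bin, tf_model.h5, vocab.txt. Also, it is better to save the files via tokenizer.save_pretrained('YOURPATH') and model.save_pretrained('YOURPATH') than to download them directly. - cronoik, Oct 4, 2020 at 21:59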

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Local target folder and the model identifier on the hub
YOURPATH = 'E:/workspace/Qwen/Qwen-7B-Chat'
name = 'Qwen/Qwen-7B-Chat'

# Download once from the hub ...
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", trust_remote_code=True, bf16=True).eval()

# ... then write tokenizer and model files to disk
tokenizer.save_pretrained(YOURPATH)
model.save_pretrained(YOURPATH)
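Once the save_pretrained calls above have written the files, later runs can load from that folder instead of the hub. The following is a minimal sketch of that step, not a verified recipe for Qwen: `load_from_disk` is a hypothetical helper, the config.json check mirrors the error message from the question, and whether `trust_remote_code=True` is needed (and whether the saved folder contains the model's custom code files) depends on the model and the transformers version.

```python
from pathlib import Path


def load_from_disk(model_dir, trust_remote_code=False):
    """Load a tokenizer and model from a folder written by save_pretrained."""
    model_dir = Path(model_dir)
    # Fail early with a clear message if the folder is not a saved model.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"{model_dir} does not contain a config.json")

    # Imported here so the path check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        str(model_dir), local_files_only=True, trust_remote_code=trust_remote_code
    )
    model = AutoModelForCausalLM.from_pretrained(
        str(model_dir), local_files_only=True, trust_remote_code=trust_remote_code
    )
    return tokenizer, model


# Possible usage with the folder from the example above:
# tokenizer, model = load_from_disk('E:/workspace/Qwen/Qwen-7B-Chat', trust_remote_code=True)
```

Passing `local_files_only=True`, as in the question, keeps the call from reaching out to the hub, so a wrong path surfaces as an error rather than as a download attempt.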
