6.3. Pre-trained models
We are releasing the BERT-Base and BERT-Large models from the
paper. Uncased means that the text has been lowercased before
WordPiece tokenization, e.g., John Smith becomes john smith. The
Uncased model also strips out any accent markers. Cased means
that the true case and accent markers are preserved. Typically, the
Uncased model is better unless you know that case information is
important for your task (e.g., Named Entity Recognition or
Part-of-Speech tagging).
These models are all released under the same license as the source code (Apache 2.0).
For information about the Multilingual and Chinese model, see the Multilingual README.
When using a cased model, make sure to pass ``--do_lower_case=False`` to the training scripts. (Or pass ``do_lower_case=False`` directly to ``FullTokenizer`` if you're using your own script; see the sketch below.)
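For example, a minimal sketch of constructing the repo's ``tokenization.FullTokenizer`` for a cased checkpoint (the ``vocab.txt`` path is a placeholder for wherever you unpacked the model):

.. code-block:: python

    import tokenization  # tokenization.py from the BERT repository

    # For a cased model, keep the true case and accents: do_lower_case=False.
    tokenizer = tokenization.FullTokenizer(
        vocab_file="cased_L-12_H-768_A-12/vocab.txt",  # placeholder path
        do_lower_case=False)

    tokens = tokenizer.tokenize("John Smith lives in Zürich.")
    ids = tokenizer.convert_tokens_to_ids(tokens)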
The links to the models are here (right-click, ‘Save link as…’ on the name):
- `BERT-Base, Uncased <https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip>`__: 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Large, Uncased <https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-24_H-1024_A-16.zip>`__: 24-layer, 1024-hidden, 16-heads, 340M parameters
- `BERT-Base, Cased <https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip>`__: 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Large, Cased <https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip>`__: 24-layer, 1024-hidden, 16-heads, 340M parameters
- `BERT-Base, Multilingual Cased (New, recommended) <https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip>`__: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Base, Multilingual Uncased (Orig, not recommended) <https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip>`__ (not recommended, use ``Multilingual Cased`` instead): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Base, Chinese <https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip>`__: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
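If you prefer to fetch a checkpoint from a script rather than the browser, a sketch like the following (using the ``BERT-Base, Uncased`` URL above; the output directory is arbitrary) will download and unpack it:

.. code-block:: python

    import urllib.request
    import zipfile

    # URL of BERT-Base, Uncased from the list above.
    url = ("https://storage.googleapis.com/bert_models/2018_10_18/"
           "uncased_L-12_H-768_A-12.zip")

    # Download the archive and unpack it into the current directory,
    # producing the folder uncased_L-12_H-768_A-12/.
    urllib.request.urlretrieve(url, "uncased_L-12_H-768_A-12.zip")
    with zipfile.ZipFile("uncased_L-12_H-768_A-12.zip") as zf:
        zf.extractall(".")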
Each ``.zip`` file contains three items:

- A TensorFlow checkpoint (``bert_model.ckpt``) containing the pre-trained weights (which is actually 3 files).
- A vocab file (``vocab.txt``) to map WordPiece to word id.
- A config file (``bert_config.json``) which specifies the hyperparameters of the model.
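As a rough sketch of where each item is used, assuming the BERT repo's ``modeling.py`` and ``tokenization.py`` are importable and the uncased base model was unpacked locally:

.. code-block:: python

    import modeling       # modeling.py from the BERT repository
    import tokenization   # tokenization.py from the BERT repository

    model_dir = "uncased_L-12_H-768_A-12"  # wherever the .zip was unpacked

    # bert_config.json -> hyperparameters of the Transformer.
    bert_config = modeling.BertConfig.from_json_file(
        model_dir + "/bert_config.json")

    # vocab.txt -> WordPiece vocabulary used by the tokenizer.
    tokenizer = tokenization.FullTokenizer(
        vocab_file=model_dir + "/vocab.txt", do_lower_case=True)

    # bert_model.ckpt -> pre-trained weights; the training scripts take it
    # via --init_checkpoint=$model_dir/bert_model.ckpt.
    init_checkpoint = model_dir + "/bert_model.ckpt"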