6.3. Pre-trained models
We are releasing the BERT-Base and BERT-Large models from the
paper. Uncased means that the text has been lowercased before
WordPiece tokenization, e.g., John Smith becomes john smith. The
Uncased model also strips out any accent markers. Cased means
that the true case and accent markers are preserved. Typically, the
Uncased model is better unless you know that case information is
important for your task (e.g., Named Entity Recognition or
Part-of-Speech tagging).
These models are all released under the same license as the source code (Apache 2.0).
For information about the Multilingual and Chinese model, see the Multilingual README.
When using a cased model, make sure to pass ``--do_lower_case=False`` to the training scripts. (Or pass ``do_lower_case=False`` directly to ``FullTokenizer`` if you're using your own script; see the sketch below.)
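For example, a minimal sketch of constructing the repo's ``tokenization.FullTokenizer`` for a cased checkpoint (the ``vocab.txt`` path is a placeholder for wherever you unpacked the model):

.. code-block:: python

    import tokenization  # tokenization.py from the BERT repository

    # For a cased model, keep the true case and accents: do_lower_case=False.
    tokenizer = tokenization.FullTokenizer(
        vocab_file="cased_L-12_H-768_A-12/vocab.txt",  # placeholder path
        do_lower_case=False)

    tokens = tokenizer.tokenize("John Smith lives in Zürich.")
    ids = tokenizer.convert_tokens_to_ids(tokens)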
The links to the models are here (right-click, ‘Save link as…’ on the name):
- `BERT-Base, Uncased <https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip>`__: 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Large, Uncased <https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-24_H-1024_A-16.zip>`__: 24-layer, 1024-hidden, 16-heads, 340M parameters
- `BERT-Base, Cased <https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip>`__: 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Large, Cased <https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip>`__: 24-layer, 1024-hidden, 16-heads, 340M parameters
- `BERT-Base, Multilingual Cased (New, recommended) <https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip>`__: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Base, Multilingual Uncased (Orig, not recommended) <https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip>`__ (not recommended, use ``Multilingual Cased`` instead): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- `BERT-Base, Chinese <https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip>`__: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
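If you prefer to fetch a checkpoint from a script rather than the browser, a sketch like the following (using the ``BERT-Base, Uncased`` URL above; the output directory is arbitrary) will download and unpack it:

.. code-block:: python

    import urllib.request
    import zipfile

    # URL of BERT-Base, Uncased from the list above.
    url = ("https://storage.googleapis.com/bert_models/2018_10_18/"
           "uncased_L-12_H-768_A-12.zip")

    # Download the archive and unpack it into the current directory,
    # producing the folder uncased_L-12_H-768_A-12/.
    urllib.request.urlretrieve(url, "uncased_L-12_H-768_A-12.zip")
    with zipfile.ZipFile("uncased_L-12_H-768_A-12.zip") as zf:
        zf.extractall(".")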
Each ``.zip`` file contains three items:

- A TensorFlow checkpoint (``bert_model.ckpt``) containing the pre-trained weights (which is actually 3 files).
- A vocab file (``vocab.txt``) to map WordPiece to word id.
- A config file (``bert_config.json``) which specifies the hyperparameters of the model.
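As a rough sketch of where each item is used, assuming the BERT repo's ``modeling.py`` and ``tokenization.py`` are importable and the uncased base model was unpacked locally:

.. code-block:: python

    import modeling       # modeling.py from the BERT repository
    import tokenization   # tokenization.py from the BERT repository

    model_dir = "uncased_L-12_H-768_A-12"  # wherever the .zip was unpacked

    # bert_config.json -> hyperparameters of the Transformer.
    bert_config = modeling.BertConfig.from_json_file(
        model_dir + "/bert_config.json")

    # vocab.txt -> WordPiece vocabulary used by the tokenizer.
    tokenizer = tokenization.FullTokenizer(
        vocab_file=model_dir + "/vocab.txt", do_lower_case=True)

    # bert_model.ckpt -> pre-trained weights; the training scripts take it
    # via --init_checkpoint=$model_dir/bert_model.ckpt.
    init_checkpoint = model_dir + "/bert_model.ckpt"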