王树森 Transformer Study Notes

Contents

- Transformer
- Attention structure
- Self-Attention structure
- Multi-head Self-Attention
- BERT: Bidirectional Encoder Representations from Transformers
- Summary
- Reference

Transforme…