Paper Reading - A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks


Perform multiple tasks hierarchically: train a single multi-layer model in which different layers handle different tasks, ranging from morphology through syntax to semantics.

Key Ideas

  1. Different layers handle different tasks.
  2. Low-level layers handle easy tasks; high-level layers handle difficult tasks.
  3. Tasks are stacked: POS - CHUNK - DEP - Relatedness - Entailment.
  4. Train tasks sequentially, from easy to hard; add a regularization term to limit catastrophic forgetting.
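
The stacking idea can be sketched in a few lines of Python. This is a toy illustration only, not the paper's implementation: `make_toy_layer` is a hypothetical stand-in for a real layer, and `TASK_ORDER`, `forward`, and the scaling factor are names and values assumed here. It shows that a prediction for a given task runs every layer from the bottom of the stack up to that task's layer.

```python
from typing import Callable, Dict, List, Sequence

# Task order as listed in the notes, from low-level to high-level.
TASK_ORDER = ["POS", "CHUNK", "DEP", "Relatedness", "Entailment"]

Vector = List[float]
Layer = Callable[[Sequence[float]], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a real layer: scales every feature."""
    def layer(features: Sequence[float]) -> Vector:
        return [scale * x for x in features]
    return layer


def forward(task: str, features: Sequence[float], layers: Dict[str, Layer]) -> Vector:
    """Run every layer from the bottom of the stack up to the requested task."""
    depth = TASK_ORDER.index(task) + 1
    hidden = list(features)
    for name in TASK_ORDER[:depth]:
        # Each layer consumes the output of the layer below it.
        hidden = layers[name](hidden)
    return hidden


# One toy layer per task (assumption: every layer doubles its input).
layers = {name: make_toy_layer(2.0) for name in TASK_ORDER}
pos_out = forward("POS", [1.0, 0.5], layers)  # uses 1 layer
dep_out = forward("DEP", [1.0, 0.5], layers)  # uses 3 layers
```

A POS prediction touches only the bottom layer, while a DEP prediction passes through the POS and CHUNK layers first, which is why changes made while training a higher task can disturb the lower ones.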


Joint Many-Task Model

  1. Structure

    • Each task uses one BiLSTM layer.
    • Layer n + 1 depends on the output of layer n.
  2. Data

    Use existing labeled training data, which can differ from task to task.

  3. Training

    Train tasks in sequence: POS, CHUNK, DEP, Relatedness, Entailment (from low level to high level). Add successive regularization: keep the output of the previous layers from changing too much while training the current layer.
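
A minimal sketch of what a successive regularization term could look like. It assumes the penalty is a squared L2 distance between the lower layers' current parameters and a snapshot taken before training the current task, weighted by a coefficient `delta`. The function names, parameter names, and numbers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def successive_penalty(current: Params, snapshot: Params, delta: float) -> float:
    """delta * squared L2 distance between current and snapshot parameters."""
    total = 0.0
    for name, old_values in snapshot.items():
        total += sum((c - s) ** 2 for c, s in zip(current[name], old_values))
    return delta * total


def regularized_loss(task_loss: float, current: Params,
                     snapshot: Params, delta: float) -> float:
    """Loss of the current task plus the drift penalty on lower layers."""
    return task_loss + successive_penalty(current, snapshot, delta)


# Snapshot of lower-layer (POS) parameters taken before training CHUNK.
pos_snapshot = {"pos.weight": [0.5, -1.0]}
# The same parameters after a few hypothetical CHUNK updates.
pos_current = {"pos.weight": [1.0, -1.0]}

loss = regularized_loss(task_loss=2.0, current=pos_current,
                        snapshot=pos_snapshot, delta=0.5)
```

The penalty is zero when the lower layers have not moved and grows with the squared drift, so a larger `delta` protects earlier tasks more strongly at the cost of flexibility for the current one.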


Results

  1. The joint model performs better than single-task models.
  2. The joint model's performance is comparable with that of existing single-task models from prior work.
  3. Training tasks in sequential order works better than training them in shuffled order.
  4. Regularization is useful for tasks with a small amount of data.


Takeaways

  1. Multi-task learning and a task hierarchy are both useful.
  2. Train from bottom to top.
  3. We can learn the task dependency structure.
  4. Better techniques could be used against catastrophic forgetting. In addition, sampling may help tasks with limited data.


Paper link: