shikhar on Twitter: "Instead of asking whether tree structure should be baked into NNs, our new paper asks if transformers already have a tendency to learn tree structured computations when trained on language, and if this structure is predictive of generalization! "
