image-text-to-text · transformers model
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks