Meet PC-Agent: A hierarchical multi-agent collaborative framework for complex task automation on PC
Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across diverse domains, which is driving their development into multimodal assistants for human users. GUI automation agents for PCs face particularly daunting challenges compared with their smartphone counterparts. PC environments present significantly more complex interactive elements, with dense, diverse icons and widgets that often lack text labels, …