JARVIS: Connecting LLMs with the ML Community for AGI Exploration
Summary
JARVIS is an innovative system developed by Microsoft that aims to bridge Large Language Models (LLMs) with the broader Machine Learning community. It serves as a collaborative platform, using an LLM as a controller to orchestrate numerous expert models from Hugging Face Hub, thereby facilitating the exploration of Artificial General Intelligence (AGI) and solving complex AI tasks. This system streamlines the process of task planning, model selection, execution, and response generation.
Repository Info
Tags
Click on any tag to explore related repositories
Introduction
JARVIS, developed by Microsoft, is a pioneering system designed to connect Large Language Models (LLMs) with the vast Machine Learning (ML) community. Its core mission is to explore Artificial General Intelligence (AGI) and deliver cutting-edge research. JARVIS acts as a collaborative platform, leveraging an LLM as a central controller to orchestrate numerous expert models available on the Hugging Face Hub. This innovative approach, detailed in the "HuggingGPT" paper, enables JARVIS to tackle complex AI tasks through a four-stage workflow: Task Planning, Model Selection, Task Execution, and Response Generation.
Installation
To get started with JARVIS in server mode, follow these steps. Ensure you have git-lfs installed for model downloads.
Set up environment:
cd server conda create -n jarvis python=3.8 conda activate jarvis conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia pip install -r requirements.txtDownload models: (Required when
inference_modeislocalorhybrid)cd models bash download.shConfigure API Keys:
Replaceopenai.keyandhuggingface.tokeninserver/configs/config.default.yamlwith your personal OpenAI Key and Hugging Face Token, or set them as environment variablesOPENAI_API_KEYandHUGGINGFACE_ACCESS_TOKEN.Run server:
cd .. python models_server.py --config configs/config.default.yaml # required when `inference_mode` is `local` or `hybrid` python awesome_chat.py --config configs/config.default.yaml --mode serverOnce the server is running, you can access JARVIS's services via its Web API.
Examples
JARVIS can handle a variety of complex requests by intelligently combining different AI models. Here are a couple of examples from its CLI mode:
Counting Zebras in Images:
[ User ]: Given a collection of image A: /examples/a.jpg, B: /examples/b.jpg, C: /examples/c.jpg, please tell me how many zebras in these picture?
[ Jarvis ]: Based on your request, I have used several models to process the images you provided. For the image A: /examples/a.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "a cat sitting on a window sill looking out". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/a3dc.jpg', which contains two objects with labels 'potted plant' and 'cat'. For the image B: /examples/b.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "two zebras standing in a field of dry grass". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/f5df.jpg', which contains three objects with labels 'zebra'. For the image C: /examples/c.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "a zebra standing in a field of tall grass". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/69c7.jpg', which contains one object with label 'zebra'. Finally, I used the canny-control model canny-control to generate the image '/images/7e18.png' based on the image /examples/f.jpg. Therefore, there are four zebras in these pictures. Is there anything else I can help you with?
Named Entity Recognition:
[ User ]: Please answer all the named entities in the sentence: Iron Man is a superhero appearing in American comic books published by Marvel Comics. The character was co-created by writer and editor Stan Lee, developed by scripter Larry Lieber, and designed by artists Don Heck and Jack Kirby.
[ Jarvis ]: Sure, I understand your request. Based on the inference results of the models, I will answer your request step by step. Firstly, I used the token-classification model dslim/bert-base-NER to recognize the named entities in the sentence. The inference result is that the named entities are Iron Man, Stan Lee, Larry Lieber, Don Heck and Jack Kirby. Then, I used the question-answering model deepset/roberta-base-squad2 to answer your request. The inference result is that there is no output since the context cannot be empty. Therefore, I cannot make it. I hope this answer your request. Is there anything else I can help you with?
Why Use JARVIS?
JARVIS offers a powerful framework for anyone looking to push the boundaries of AI. Its key advantages include:
- LLM Orchestration: Seamlessly connects powerful LLMs with a diverse array of expert ML models, enabling complex multi-modal and multi-task AI solutions.
- AGI Exploration: Provides a research platform for exploring Artificial General Intelligence by integrating various AI capabilities.
- Flexible Deployment: Supports local, Hugging Face, and hybrid inference modes, allowing users to balance performance and resource requirements.
- Task Automation: Capable of planning, selecting, executing, and integrating results from different models to automate sophisticated AI tasks.
- Community-Driven: A project by Microsoft, fostering collaboration and cutting-edge research in the ML community.
Links
- GitHub Repository: https://github.com/microsoft/JARVIS
- HuggingGPT Paper: https://arxiv.org/pdf/2303.17580.pdf
- Hugging Face Space: https://huggingface.co/spaces/microsoft/HuggingGPT
- EasyTool Paper: https://arxiv.org/abs/2401.06201
- TaskBench Paper: https://arxiv.org/abs/2311.18760