Terms of use

By using this service, users agree to the following terms: The service is a research preview intended for non-commercial use only. It provides only limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The service may collect user dialogue data for future research.

We deploy our model backend with SGLang. Congestion during serving may delay responses. If you encounter any issues with the webpage, please refresh it.

License

The service is a research preview and is subject to the License of Qwen2, the License of LLaVA-NeXT, and the Terms of Use governing the data generated by OpenAI. Users are required to strictly adhere to the terms outlined in these licenses. Please contact us if you identify any potential violations.

Citation

@article{visualwebinstruct,
    title={VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search},
    author={Jia, Yiming and Li, Jiachen and Yue, Xiang and Li, Bo and Nie, Ping and Zou, Kai and Chen, Wenhu},
    journal={arXiv preprint arXiv:2503.10582},
    year={2025}
}