Using ollama and open-webui to Play with Open-Source Large Models
I have been using large models for over a year now, from the early ChatGPT 3.5/4 to today's increasingly capable Chinese and open-source models such as Llama and Stable Diffusion (SD). In this post I will introduce two tools that I have been using for a long time: Ollama and Open-WebUI.
Ollama
Ollama is an open-source framework designed for convenient deployment and running of large language models (LLMs) on local machines. It packages model weights, configuration, and runtime together, and provides a command-line tool and API for downloading, running, and managing models.
Project Address: https://github.com/ollama/ollama
Main Features of Ollama:
- Local Deployment and Running: One of Ollama's main goals is to make deploying and running large language models locally as simple as running a Docker container, so even non-specialist users can manage and run these complex models.
- Lightweight and Scalable: As a lightweight framework, Ollama maintains a small resource footprint while offering good scalability. This allows users to adjust configurations based on the scale of their projects and hardware conditions.
- API Support: Ollama provides a simple API, enabling developers to easily create, run, and manage instances of large language models. This lowers the technical barrier for interacting with the models.
- Pre-built Model Library: Ollama offers a library of ready-made large language models that users can pull and use directly in their applications, without having to train or package models themselves.
- Model Import and Customization:
  - Import from GGUF: supports importing existing large language models packaged in the GGUF file format.
  - Import from PyTorch or Safetensors: models saved as PyTorch weights or Safetensors files can be converted and brought into Ollama.
  - Custom Prompts: users can add or modify the prompts attached to a model to steer it toward specific types or styles of text output.
- Cross-platform Support: Provides installation guides for macOS, Windows (preview), Linux, and Docker, ensuring that users can deploy and use Ollama smoothly across multiple operating system environments.
- Command-line Tools and Environment Variables:
  - Command-line start: the Ollama service can be started via the command `ollama serve` (or its aliases `serve` and `start`).
  - Environment variable configuration: variables such as `OLLAMA_HOST` specify the host address and port the service binds to, and can be changed as needed (see the example below).
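For example, to make the service reachable from other machines instead of only the default `127.0.0.1:11434`, set `OLLAMA_HOST` before starting the server:

```bash
# Bind the Ollama server to all interfaces on the default port
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```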
In addition, Ollama offers rich API interfaces, community support, and documentation, making it a powerful tool for developers and individual users to run and manage large language models locally.
Quick Installation of Ollama
Linux installation command for Ollama:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Basic Ollama Commands
Dialogue with Llama3 Chinese Version:
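The exact tag of the Chinese-tuned Llama 3 build depends on which publisher you pull from, so `llama3` below is a placeholder; the surrounding commands show the everyday Ollama workflow:

```bash
# Download the model (on first run) and start an interactive chat
ollama run llama3

# Other common commands
ollama pull qwen     # download a model without starting a chat
ollama list          # show locally installed models
ollama rm llama3     # delete a local model
```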
Using Ollama to Adjust Large Model Prompts and Parameters
Here, we create a Modelfile based on Alibaba’s open-source Qwen large model:
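A minimal sketch of such a Modelfile; the base model tag `qwen` and the parameter values here are illustrative rather than the exact originals:

```
# Modelfile: customize Qwen's system prompt and sampling parameters
FROM qwen

# Higher temperature -> more creative output; lower -> more deterministic
PARAMETER temperature 0.7
# Context window size, in tokens
PARAMETER num_ctx 4096

# System prompt that fixes the assistant's persona
SYSTEM """You are a helpful bilingual (Chinese/English) assistant."""
```

Build a named model from the Modelfile and chat with it:

```bash
ollama create my-qwen -f Modelfile
ollama run my-qwen
```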
REST API Requests
Generate response:
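A single-turn completion goes to the `/api/generate` endpoint on the default port 11434 (the model name below is a placeholder):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```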
Model conversation:
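Multi-turn conversations use the `/api/chat` endpoint, which accepts a message history:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello, who are you?" }
  ],
  "stream": false
}'
```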
Ollama supports many large models. For more experiences, visit: https://ollama.com/library
Open-WebUI
Open-WebUI is an extensible, feature-rich, and user-friendly self-hosted web UI, built with the Svelte front-end framework and designed to run completely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
Previously, operations were done via Ollama command line, but now they can be fully managed through the web interface.
Open-WebUI Features
- Easy Installation: Seamless installation using Docker or Kubernetes (kubectl, kustomize, or helm). Supports both `:ollama` and `:cuda` tagged images for a hassle-free experience.
- Ollama/OpenAI API Integration: Easily integrate OpenAI-compatible APIs alongside Ollama models for versatile conversations. Customize the OpenAI API URL to link to LMStudio, GroqCloud, Mistral, OpenRouter, and more.
- Pipelines (Open WebUI Plugin Framework): Seamlessly integrate custom logic and Python libraries into Open WebUI using the Pipelines plugin framework. Launch a pipeline instance, set the OpenAI URL to the pipeline URL, and explore endless possibilities. Examples include function calling, user rate limiting to control access, usage monitoring with tools like Langfuse, real-time translation with LibreTranslate for multilingual support, toxic message filtering, and more.
- Responsive Design: Enjoy a seamless experience on desktops, laptops, and mobile devices.
- Mobile Progressive Web App (PWA): Use the PWA to enjoy a native app-like experience on your mobile device, including offline access on localhost and a seamless user interface.
- Full Markdown and LaTeX Support: Comprehensive Markdown and LaTeX rendering for rich interactions.
- Model Builder: Easily create Ollama models through the web UI. Create and add custom characters/agents, customize chat elements, and import models via Open WebUI Community integration.
- Local RAG Integration: Groundbreaking Retrieval-Augmented Generation (RAG) support integrates document interactions into your chat experience. Load documents directly into a chat or add files to the document library, then reference them with the `#` command before a query.
- Web Search for RAG: Perform web searches using providers like SearXNG, Google PSE, Brave Search, serpstack, and serper, and inject the results directly into your chat.
- Web Browsing: Merge live web content into a conversation by typing the `#` command followed by a URL, enhancing the richness and depth of interactions.
- Image Generation Integration: Seamlessly integrate image generation via the AUTOMATIC1111 API or ComfyUI (local) and OpenAI's DALL-E (external) to enrich chats with dynamic visual content.
- Multi-Model Conversations: Effortlessly engage with several models simultaneously, leveraging their unique strengths for the best responses.
- Role-Based Access Control (RBAC): Ensure secure access with restricted permissions; only authorized users can reach your Ollama instance, and model creation/pulling rights are reserved for administrators.
- Multilingual Support: Use Open WebUI in your preferred language thanks to internationalization (i18n) support. Contributions of new languages are actively welcomed.
- Continuous Updates: The project is committed to regular updates, fixes, and new features.
Deploying Open-WebUI
To use it with Ollama, quickly deploy Open-WebUI using docker-compose:
Refer to: https://github.com/valiantlynx/ollama-docker
Since I am using an Nvidia Tesla T4 card locally, I will use the GPU configuration (the NVIDIA driver and container toolkit need to be installed beforehand, which is not covered here).
docker-compose-ollama-gpu.yaml
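Below is a sketch following the common Ollama + Open-WebUI compose pattern; the service names, volume names, and image tags here are assumptions, so check the repository above for the exact file:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or 1 to pin a single card such as the Tesla T4
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # point the UI at the Ollama service
    depends_on:
      - ollama
    volumes:
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
```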
Start the services:
```bash
docker compose -f docker-compose-ollama-gpu.yaml up -d
```
Access http://xxx.xxx.xxx.xxx:8080/ and register an admin account to start using it.

Summary
The combination of Ollama and Open-WebUI brings many advantages. First, the intuitive web UI lets even non-technical users deploy and chat with local large models without touching the command line. Second, Open-WebUI layers rich features on top of Ollama, such as model management, multi-model conversations, and RAG-based document chat. Finally, because both Ollama and Open-WebUI are open source, users can customize and extend them according to their needs.