Local Models

Connect to a model running on your local machine

Intro & Initial Setup (0:00-1:18), LM Studio (1:19-2:25), Ollama (2:26-3:10), Jan (3:11-3:59)

What is a Local Model?

When you run a local model, the LLM runs on your own computer. Steve and your Agents can then use that model instead of one in the cloud. The primary benefit is that your requests never leave your machine, so your data stays private and secure. The secondary benefit is that it's FREE. We have tested Ask Steve with the providers below, but you should be able to use any system that can expose a model via a local web server.

What are the downsides?

You have to download big model files (e.g. 3+ GB)
You need to have a capable machine
The models your local machine can run generally aren't as good as the ones you can access in the cloud
It can be slower, even though it's running locally

How Do I Use LM Studio With Ask Steve?

Download and install LM Studio
Open LM Studio. Click the purple Discover icon in the left nav to find a model and download it.
Once it's downloaded, press the Load Model button on the 'Download Complete' dialog.
When it's done loading, go to the Local Server page in LM Studio by pressing the green icon in the left nav. Switch the Status toggle on to Running. This will start up a local server on port 1234 with the selected model.
Finally, go to the Models page in Ask Steve Settings, press ADD NEW MODEL and select Local: Streaming from the menu. You won't need an API Key since it's running on your own machine.
Change the port number in the URL field to match what LM Studio is serving on. LM Studio's port is typically 1234, so the beginning of the URL should be http://localhost:1234...
Change the model name to match what you downloaded, and any other attributes that you want (context window, output tokens, temperature, etc.). Press TEST to ensure it works, then SAVE NEW MODEL to save it.
Congratulations! Steve is now using a model running completely on your own computer!
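If TEST fails, a quick first check is whether anything is listening on the port at all. The sketch below is one way to do that in Python, not part of Ask Steve or LM Studio. The function name `is_port_open` is hypothetical, and it only confirms that a TCP connection is accepted, not that a model is loaded.

```python
import socket


def is_port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open(1234)` should return True once the Status toggle in LM Studio is switched to Running, assuming LM Studio is serving on its typical port.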

How Do I Use Jan.ai With Ask Steve?

Download and install Jan
Open Jan. Click the Explore the Hub button and pick a model to download.
Once it's downloaded, press the button near the bottom of the left nav with < > inside. Press the Start Server button.
Finally, go to the Models page in Ask Steve Settings, press ADD NEW MODEL and select Local: Streaming from the menu. You won't need an API Key since it's running on your own machine.
Change the port number in the URL field to match what Jan is serving on. Jan's port is typically 1337, so the beginning of the URL should be http://localhost:1337...
Change the model name to match what you downloaded, and any other attributes that you want (context window, output tokens, temperature, etc.). Press TEST to ensure it works, then SAVE NEW MODEL to save it.
Congratulations! Steve is now using a model running completely on your own computer!
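The port change in the URL field can be sketched in Python. This is an illustration only, not how Ask Steve works internally. `with_port` and `DEFAULT_PORTS` are hypothetical names, the ports are the typical defaults quoted on this page, and `/example/path` in the usage line is a placeholder rather than a real endpoint.

```python
from urllib.parse import urlsplit, urlunsplit

# Typical default ports quoted on this page; your install may differ.
DEFAULT_PORTS = {
    "LM Studio": 1234,
    "Jan": 1337,
    "gpt4all": 4891,
    "Ollama": 11434,
}


def with_port(url: str, port: int) -> str:
    """Return url with its port replaced; scheme, host, and path are kept."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    return urlunsplit(
        (parts.scheme, f"{host}:{port}", parts.path, parts.query, parts.fragment)
    )
```

Calling `with_port("http://localhost:1234/example/path", DEFAULT_PORTS["Jan"])` gives `http://localhost:1337/example/path`: only the port changes, which is all the URL field needs when you switch providers.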

How Do I Use gpt4all With Ask Steve?

Download and install gpt4all
Open gpt4all. Click the Install a Model button and pick a model to download.
Once it's downloaded, press the Settings icon and, under Application Settings, check Enable Local API Server.
Finally, go to the Models page in Ask Steve Settings, press ADD NEW MODEL and select Local: Non-Streaming from the menu. You won't need an API Key since it's running on your own machine.
Change the port number in the URL field to match what gpt4all is serving on. gpt4all's port is typically 4891, so the beginning of the URL should be http://localhost:4891...
Change the model name to match what you downloaded, and any other attributes that you want (context window, output tokens, temperature, etc.). Press TEST to ensure it works, then SAVE NEW MODEL to save it.
Congratulations! Steve is now using a model running completely on your own computer!
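To see why the Streaming versus Non-Streaming choice matters, here is a rough Python sketch of how a client reads each kind of reply. It assumes OpenAI-style response shapes (a single JSON body, or a series of `data:` lines ending in `[DONE]`). This page does not document what Ask Steve or gpt4all actually exchange, so treat the shapes and the function names as assumptions.

```python
import json
from typing import Iterable


def text_from_full_response(body: str) -> str:
    """Non-streaming: the whole reply arrives as one JSON document."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


def text_from_stream(lines: Iterable[str]) -> str:
    """Streaming: the reply arrives as 'data: {...}' lines, one fragment each."""
    pieces = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream marker
        delta = json.loads(payload)["choices"][0].get("delta", {})
        pieces.append(delta.get("content") or "")
    return "".join(pieces)
```

A client expecting one format cannot parse the other, which is why a server that only returns complete replies needs the Non-Streaming option.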

How Do I Use Ollama With Ask Steve?

Download and install Ollama
Download a model. For example, this terminal command will pull Meta's Llama 3: ollama pull llama3
You will need to allow the Ask Steve extension to connect to Ollama. To do this, configure the Ollama server with the environment variable OLLAMA_ORIGINS set to "chrome-extension://gldebcpkoojijledacjeboaehblhfbjg".
Instructions for how to do so on various platforms are here. For example, with Chrome on a Mac, you'd issue this terminal command to give Ask Steve access to Ollama: launchctl setenv OLLAMA_ORIGINS "chrome-extension://gldebcpkoojijledacjeboaehblhfbjg". Other ways to start Ollama with the correct configuration are described here.
After setting OLLAMA_ORIGINS, you will need to restart the Ollama server. On a Mac you can do this by quitting and restarting Ollama from the menu bar.
Finally, go to the Models page in Ask Steve Settings, press ADD NEW MODEL and select Local: Streaming from the menu. You won't need an API Key since it's running on your own machine.
Change the port number in the URL field to match what Ollama is serving on. Ollama's port is typically 11434, so the beginning of the URL should be http://localhost:11434...
Change the model name to match what you downloaded, and any other attributes that you want (context window, output tokens, temperature, etc.). Press TEST to ensure it works, then SAVE NEW MODEL to save it.
Congratulations! Steve is now using a model running completely on your own computer!
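As one possible alternative to setting OLLAMA_ORIGINS system-wide, the variable can be passed only to the Ollama process you start yourself. The Python sketch below shows the idea. `ollama_env` and `start_ollama` are hypothetical helper names, and it assumes the `ollama` command is on your PATH and that no other Ollama server is already running on the same port.

```python
import os
import subprocess
from typing import Dict, Optional

# The origin Ask Steve connects from, as given in the steps above.
ASK_STEVE_ORIGIN = "chrome-extension://gldebcpkoojijledacjeboaehblhfbjg"


def ollama_env(origin: str = ASK_STEVE_ORIGIN,
               base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with OLLAMA_ORIGINS set to origin."""
    env = dict(os.environ if base is None else base)
    env["OLLAMA_ORIGINS"] = origin
    return env


def start_ollama(origin: str = ASK_STEVE_ORIGIN) -> subprocess.Popen:
    """Launch `ollama serve` with OLLAMA_ORIGINS set for that process only."""
    return subprocess.Popen(["ollama", "serve"], env=ollama_env(origin))
```

Because the variable applies only to the process started by `start_ollama()`, it has to be launched this way each time. Quit any Ollama instance that is already running first.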