Notes on running a local/private AI
Here are a few notes and tools to run AI locally on your computer.
Table of Contents
- [Option 1. Using Ollama](#option-1-using-ollama)
- [Option 2. Using Private-GPT](#option-2-using-private-gpt)
- [Option 3. Using Open WebUI](#option-3-using-open-webui)
- [Option 4. Using command line with Fabric (my favorite)](#option-4-using-command-line-with-fabric-my-favorite)
- [Option 5. Using Hugging Face](#option-5-using-hugging-face)
- [WhisperX](#whisperx)
- [A few public useful tools](#a-few-public-useful-tools)
Option 1. Using Ollama
Step 1 - Download and install Ollama
One command for Linux:
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Downloading ollama...
######################################################################## 100.0%
>>> Installing ollama to /usr/local/bin...
NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
If you have a fast GPU, make sure the installer output includes the line "NVIDIA GPU installed." AI is GPU hungry, so the faster the GPU the better. If you don’t have a GPU, you can still run AI, but it will be much slower and you will have to get by with smaller models.
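The installer output above says the Ollama API is available at 127.0.0.1:11434. A quick way to confirm that something is actually listening there is a plain TCP connection test. This is only a minimal sketch: the function name is made up for illustration, and a successful connection shows that the port is open, not that the process behind it is Ollama.

```python
import socket


def ollama_is_listening(host: str = "127.0.0.1", port: int = 11434,
                        timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass if the port is closed
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `ollama_is_listening()` right after the install should return True if the service started.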
Step 2 - Choose your AI model.
You can download thousands of AI models from Hugging Face.
Here we are going to set up the Llama 3 LLM (Large Language Model), which comes in different flavours depending on the size of the model you want to use. The larger the model, the more accurate the results, but also the more resources it will consume.
You may want to run the biggest model your computer can handle. As a rough guide, the requirements for two of the models are:
- llama3 8B - 16GB of RAM
- llama3 70B - 32GB of RAM
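To get a feel for where such numbers come from, the size of the weights can be estimated as parameters times bits per weight. The sketch below is a rule of thumb only: the 4 bits per weight default is an assumption (a common quantization level, not something stated in these notes), and real usage is higher because of context memory and runtime overhead.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for name, size in [("8B", 8), ("70B", 70)]:
    print(f"llama3 {name}: ~{estimate_weights_gb(size):.0f} GB at 4-bit, "
          f"~{estimate_weights_gb(size, 16):.0f} GB at 16-bit")
```

The 4-bit estimates land a little below the download sizes that `ollama list` reports further down (4.7 GB and 39 GB), which is the direction you would expect if those files are quantized to roughly that level.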
Step 3 - Run the model you want with ollama.
ollama run llama3:70b
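Besides the interactive prompt, a running Ollama server can be queried over the HTTP API on the address printed by the installer. The following is a minimal sketch using only the standard library. It assumes the `/api/generate` endpoint with a non-streaming request, and the helper names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # address printed by the installer


def build_generate_request(prompt: str, model: str = "llama3:70b") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3:70b") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model already pulled, `print(ask("Tell me a joke"))` should print the answer once generation finishes.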
Step 4 - List your models. You can have multiple models installed and view them with ollama list:
$ ollama list
NAME             ID              SIZE      MODIFIED
gemma2:27b       371038893ee3    15 GB     7 days ago
llama3:70b       786f3184aec0    39 GB     7 days ago
llama3:latest    365c0bd3c000    4.7 GB    7 days ago
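The same information is available programmatically: the Ollama API exposes the installed models as JSON at `/api/tags`. The sketch below only formats such a payload, so it runs without a server. The sample data is illustrative (names taken from the listing above, byte sizes rounded), and the function name is made up.

```python
from typing import List


def format_models(tags_payload: dict) -> List[str]:
    """Turn an /api/tags-style payload into 'name  size' lines, largest first."""
    models = sorted(tags_payload.get("models", []), key=lambda m: m["size"], reverse=True)
    return [f"{m['name']:<16}{m['size'] / 1e9:.1f} GB" for m in models]


# Illustrative payload, not real API output
sample = {"models": [
    {"name": "llama3:latest", "size": 4_700_000_000},
    {"name": "llama3:70b", "size": 39_000_000_000},
]}
print("\n".join(format_models(sample)))
```

To use it against a live server, fetch `http://127.0.0.1:11434/api/tags` with `urllib.request.urlopen`, parse the body with `json.loads`, and pass the result to `format_models`.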
Fine-tune
Option 2. Using Private-GPT
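With PrivateGPT you can also query the model about your own data. Strictly speaking this is document ingestion for retrieval rather than fine-tuning: the documents are indexed locally and the model weights are not changed.
To ingest your own data with one command: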
make ingest ./my-data -- --watch
Option 3. Using Open WebUI
To interact with GPT, Ollama or other models through your browser, there is Open WebUI, which can be started with a single command:
sudo docker run -d --gpus all --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
View on http://127.0.0.1:8080
Notice the --gpus all flag, which activates GPU usage.
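The docker command above packs several decisions into one line. As a reading aid, here is a sketch that assembles the same arguments one flag at a time and makes the GPU flag optional for machines without one. The function name and the `use_gpu` parameter are made up for illustration, and the comments reflect my reading of each flag.

```python
from typing import List


def open_webui_command(use_gpu: bool = True,
                       ollama_url: str = "http://127.0.0.1:11434") -> List[str]:
    """Assemble the docker run arguments from the command above, flag by flag."""
    cmd = ["docker", "run", "-d"]                  # run detached
    if use_gpu:
        cmd += ["--gpus", "all"]                   # expose all GPUs to the container
    cmd += [
        "--network=host",                          # share the host network
        "-v", "open-webui:/app/backend/data",      # named volume for persistent data
        "-e", f"OLLAMA_BASE_URL={ollama_url}",     # where the UI reaches Ollama
        "--name", "open-webui",
        "--restart", "always",
        "ghcr.io/open-webui/open-webui:main",
    ]
    return cmd
```

It could be run with `subprocess.run(["sudo", *open_webui_command()], check=True)`, or just printed with `" ".join(...)` to compare against the one-liner.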
Option 4. Using command line with Fabric (my favorite)
Get the transcript of a YouTube video:
yt --transcript https://www.youtube.com/watch\?v\=UbDyjIIGaxQ
Pipe the transcript into a pattern:
yt --transcript https://www.youtube.com/watch\?v\=UbDyjIIGaxQ | fabric --stream --pattern extract_wisdom
Ask a question using a specific model:
echo "Tell me a joke" | fabric -s -p ai --model llama3:70b
Summarize the content on the clipboard:
xsel | fabric -sp summarize
Create your own prompt for later use:
echo "You are an expert on the LayerZero and CCIP protocols" | fabric -sp improve_prompt
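The shell pipelines above can also be driven from a script, which helps when the input text is produced by other code. This is a minimal sketch that uses only the flags shown in these notes (`--stream`, `--pattern`, `--model`). The helper names are made up, and it assumes `fabric` on your PATH is Daniel Miessler's tool.

```python
import subprocess
from typing import List, Optional


def fabric_command(pattern: str, model: Optional[str] = None, stream: bool = True) -> List[str]:
    """Build the fabric argument list for a given pattern."""
    cmd = ["fabric"]
    if stream:
        cmd.append("--stream")
    cmd += ["--pattern", pattern]
    if model:
        cmd += ["--model", model]
    return cmd


def run_pattern(text: str, pattern: str, model: Optional[str] = None) -> str:
    """Pipe text into fabric on stdin and return what it prints."""
    result = subprocess.run(fabric_command(pattern, model, stream=False),
                            input=text, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_pattern("Tell me a joke", "ai", model="llama3:70b")` mirrors the `echo ... | fabric` line above.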
Some interesting patterns are:
- summarize
- write_essay
- ai
- label_and_rate
- create_mermaid_visualization
To convert Markdown to PDFs, install pandoc and texlive-full (for higher-quality PDF generation using LaTeX).
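Once both are installed, the conversion itself is a single pandoc call. The sketch below builds that call. The file name `notes.md` in the usage line is hypothetical, and choosing `xelatex` as the PDF engine is an assumption on my part (it ships with texlive-full), not something these notes prescribe.

```python
from pathlib import Path
from typing import List


def pandoc_pdf_command(markdown_file: str, pdf_engine: str = "xelatex") -> List[str]:
    """Build a pandoc command that writes a PDF next to the Markdown file."""
    source = Path(markdown_file)
    target = source.with_suffix(".pdf")   # same name, .pdf extension
    return ["pandoc", str(source), "-o", str(target), f"--pdf-engine={pdf_engine}"]
```

Run it with `subprocess.run(pandoc_pdf_command("notes.md"), check=True)`. The same function works for Fabric output saved as Markdown.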
Some useful fabric commands:
- fabric --update: updates the patterns from the GitHub repo.
- fabric --listmodels: lists the available models.
Duplicate Fabric command problem
After installing Daniel Miessler's fabric, I installed a Hugging Face model that also installed a command called fabric on my system.
I looked for fabric executables on my system and found both files:
$ find / -path /mnt -prune -o -type f -name "fabric" -executable 2>/dev/null
/mnt
/home/alejandro/.local/pipx/venvs/fabric/bin/fabric
/home/alejandro/.local/bin/fabric
So now, if I want to use Daniel Miessler's fabric, I have to run it with its full path (the first of the two fabric paths listed).
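To see which of several same-named commands the shell will actually pick, it helps to list every match on PATH in lookup order, since the first hit wins. The sketch below does that. The function name is made up, and unlike the `find` command above it only looks at PATH directories, not the whole disk.

```python
import os
from typing import List, Optional


def find_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """Return every executable called `name` on PATH, in lookup order."""
    search_path = os.environ.get("PATH", "") if path is None else path
    found: List[str] = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # keep regular files that are executable, skipping repeated PATH entries
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in found:
            found.append(candidate)
    return found
```

`find_on_path("fabric")` returning more than one entry means a name clash. The first entry is the one that runs when you type `fabric`.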
Option 5. Using Hugging Face
Create a separate Python environment for Hugging Face:
mkdir hugginface-cli-venv
cd hugginface-cli-venv
python3 -m venv myvenv
source myvenv/bin/activate
Install the Hugging Face CLI, which comes in the huggingface_hub package, and log in with huggingface-cli login. You will need a token, which you can generate in your account settings on Hugging Face.
pip install -U "huggingface_hub[cli]"
$ huggingface-cli login
[HUGGING FACE ASCII-art banner]
A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
Setting a new token will erase the existing one.
To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible):
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
Your token has been saved to /home/alejandro/.cache/huggingface/token
Login successful
Then download a model like this:
huggingface-cli download stabilityai/stable-diffusion-3-medium
Some models require you to accept their conditions before downloading. If so, you will be given a link where you can accept them, after which you can retry the download.
To view what you downloaded:
huggingface-cli scan-cache
REPO ID                               REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH
------------------------------------- --------- ------------ -------- ------------- ------------- ---- -------------------------------------------------------------------------------------
Systran/faster-whisper-large-v2       model             3.1G        4 6 days ago    6 days ago    main /home/alejandro/.cache/huggingface/hub/models--Systran--faster-whisper-large-v2
Systran/faster-whisper-small          model           486.2M        4 6 days ago    6 days ago    main /home/alejandro/.cache/huggingface/hub/models--Systran--faster-whisper-small
stabilityai/stable-diffusion-3-medium model            53.3G       28 6 days ago    6 days ago    main /home/alejandro/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3-medium
Done in 0.0s. Scanned 3 repo(s) for a total of 56.9G.
Got 1 warning(s) while scanning. Use -vvv to print details.
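The LOCAL PATH column shows how the cache is laid out: one `models--<owner>--<name>` folder per repository. Based on that, a rough per-repository size can be computed with the standard library alone. This is a simplified sketch, not a replacement for `scan-cache`: the function name is made up, and it skips symlinks on the assumption that they point at files already stored elsewhere in the same folder.

```python
import os
from typing import Dict


def scan_hub_cache(hub_dir: str) -> Dict[str, int]:
    """Map repo id -> bytes on disk for each models--* folder in hub_dir."""
    sizes: Dict[str, int] = {}
    for entry in sorted(os.listdir(hub_dir)):
        repo_dir = os.path.join(hub_dir, entry)
        if not (entry.startswith("models--") and os.path.isdir(repo_dir)):
            continue
        total = 0
        for root, _dirs, files in os.walk(repo_dir):
            for name in files:
                path = os.path.join(root, name)
                if not os.path.islink(path):   # count stored files once, skip symlinks
                    total += os.path.getsize(path)
        # models--Systran--faster-whisper-small -> Systran/faster-whisper-small
        repo_id = entry[len("models--"):].replace("--", "/", 1)
        sizes[repo_id] = total
    return sizes
```

On the machine above it would be called as `scan_hub_cache(os.path.expanduser("~/.cache/huggingface/hub"))`.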
Whisper and WhisperX
whisper Video_2024-07-02_16-00-54.mkv --model medium --language English
WhisperX requires Hugging Face and Whisper, but has more options:
whisperx 2024-07-04\ 00-11-02.mkv --model large-v2 --diarize --min_speakers 3 --max_speakers 3 --highlight_words True --language English --output_format vtt
Note: If you run into an out-of-memory error (RuntimeError: CUDA failed with error out of memory), you can reduce the batch size, for example with --batch_size 4.
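If you are unsure which batch size fits your GPU, one pragmatic approach is to start high and halve it after each out-of-memory failure. The sketch below only generates that sequence and the matching command line. The helper names and the starting value are assumptions for illustration, and it uses only flags that appear in the command above.

```python
from typing import Iterator, List


def batch_sizes(start: int, minimum: int = 1) -> Iterator[int]:
    """Yield start, start // 2, start // 4, ... stopping at minimum."""
    minimum = max(1, minimum)   # guard against looping forever at 0
    size = start
    while size >= minimum:
        yield size
        size //= 2


def whisperx_command(media: str, batch_size: int) -> List[str]:
    """Build a whisperx call with an explicit batch size."""
    return ["whisperx", media, "--model", "large-v2", "--language", "English",
            "--batch_size", str(batch_size)]
```

A retry loop would call `subprocess.run(whisperx_command(path, size))` for each `size` in `batch_sizes(16)` and stop at the first run that exits successfully.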
A few public useful tools
- Tome - Create presentations
- Mixo - Create a website
- Fliki - Ideas to videos
- Canva.com - Automate social media
- Midjourney - Image generation
- DALL-E - Image generation
- VALL-E - Audio to audio
- Stable Diffusion - Text to image
- Gemini
- Fabric