Here are a few notes and tools to run AI locally on your computer.

Table of Contents

- [Option 1. Using Ollama](#option-1-using-ollama)
- [Option 2. Using Private-GPT](#option-2-using-private-gpt)
- [Option 3. Using Open WebUI](#option-3-using-open-webui)
- [Option 4. Using command line with Fabric (my favorite)](#option-4-using-command-line-with-fabric-my-favorite)
- [Option 5. Using Hugging face](#option-5-using-hugging-face)
  - [WhisperX](#whisperx)
  - [A few public useful tools](#a-few-public-useful-tools)

Option 1. Using Ollama

Step 1 - Download and install Ollama

One command for Linux:

$ curl -fsSL https://ollama.com/install.sh | sh
>>> Downloading ollama...
######################################################################## 100.0%
>>> Installing ollama to /usr/local/bin...
NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

If you have a fast GPU, make sure the installer prints the "NVIDIA GPU installed." line. AI is GPU hungry, so the faster the GPU the better. If you don't have a GPU, you can still run AI, but it will be much slower and you will have to get by with smaller models.
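If you want to check for an NVIDIA GPU yourself, a minimal sketch could look like this. It assumes the NVIDIA driver's `nvidia-smi` tool is the right thing to probe; the function name `gpu_status` is made up for this example and other GPU vendors are not covered.

```shell
# Print "gpu" if nvidia-smi is installed and exits successfully,
# otherwise print "cpu". Hypothetical helper, not part of Ollama.
gpu_status() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo gpu
    else
        echo cpu
    fi
}
```

Running `gpu_status` before the install gives a quick hint about whether to expect the GPU line in the installer output.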

Step 2 - Choose your AI model.

You can download thousands of AI models from Hugging Face.

Here we are going to set up the Llama 3 LLM (Large Language Model), which comes in several sizes. The larger the model, the more accurate the results, but also the more resources it will consume.

You may want to run the biggest model your computer can handle. As a rough guide, the requirements are:

  • llama3 8B - 16GB of RAM
  • llama3 70B - 32GB of RAM
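To get a feel for where such numbers come from, here is a back-of-the-envelope sketch. It only estimates the size of the weights as parameters times bits per weight, and it assumes 4-bit quantization for the default downloads, which is my assumption and not something stated by Ollama here. The function name `estimate_weights_gb` is hypothetical. Real usage is higher because of context memory and runtime overhead.

```shell
# Rough size of the model weights in GB: billions of parameters * bits / 8.
# Hypothetical helper; ignores context (KV cache) memory and runtime overhead.
estimate_weights_gb() {
    params_billions=$1
    bits_per_weight=$2
    echo $(( params_billions * bits_per_weight / 8 ))
}

estimate_weights_gb 8 4    # prints 4
estimate_weights_gb 70 4   # prints 35
estimate_weights_gb 70 16  # prints 140
```

The 4-bit results are in the same ballpark as the download sizes that `ollama list` reports for llama3:latest and llama3:70b.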

Step 3 - Run the model you want with ollama.

ollama run llama3:70b

Step 4 - You can have multiple models installed and view them with ollama list.

$ ollama list
NAME            ID              SIZE    MODIFIED   
gemma2:27b      371038893ee3    15 GB   7 days ago
llama3:70b      786f3184aec0    39 GB   7 days ago
llama3:latest   365c0bd3c000    4.7 GB  7 days ago
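For scripting, it can be handy to pull just the model names out of that table. The following is a small sketch that assumes the layout shown above (one header row, name in the first column); `model_names` and `has_model` are hypothetical helper names.

```shell
# Read `ollama list` output on stdin and print one model name per line.
# Hypothetical helper; assumes a single header row and the name in column 1.
model_names() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Succeed if the model named in $1 appears in `ollama list` output on stdin.
has_model() {
    model_names | grep -Fxq -- "$1"
}
```

For example, `ollama list | has_model llama3:70b || ollama pull llama3:70b` would only download the model when it is missing.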

Fine-tune

Option 2. Using Private-GPT

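With PrivateGPT you can also use the model with your own data. Note that this ingests your documents so they can be retrieved at query time; it is not fine-tuning in the strict sense, because the model weights do not change.

To ingest your own data with one command: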

make ingest ./my-data -- --watch

Option 3. Using Open WebUI

To interact with GPT, Ollama, or other backends through your browser, there is Open WebUI, which can be started with a single command:

sudo docker run -d --gpus all --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main

View on http://127.0.0.1:8080

Notice the --gpus all flag, which activates GPU usage inside the container.

Option 4. Using command line with Fabric (my favorite)

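Fabric is Daniel Miessler's command-line tool for piping text through reusable prompts, which it calls patterns.

Get the transcript of a YouTube video:

yt --transcript https://www.youtube.com/watch\?v\=UbDyjIIGaxQ

Pipe that transcript into the extract_wisdom pattern:

yt --transcript https://www.youtube.com/watch\?v\=UbDyjIIGaxQ | fabric --stream --pattern extract_wisdom

Ask a one-off question with a specific model:

echo "Tell me a joke" | fabric -s -p ai --model llama3:70b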

Summarize the content on the clipboard:

xsel | fabric -sp summarize

Create your own prompt for later use:

echo "You are an expert on the LayerZero and CCIP protocols" | fabric -sp improve_prompt

Some interesting patterns are:

  • summarize
  • write_essay
  • ai
  • label_and_rate
  • create_mermaid_visualization

To convert Markdown to PDF, install pandoc and texlive-full (for higher quality PDF generation using LaTeX).
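Once both are installed, the conversion itself is one pandoc call. Here is a minimal sketch, assuming the XeLaTeX engine is available on your system; `pdf_name` and `md_to_pdf` are hypothetical helper names.

```shell
# Derive the PDF file name from a Markdown file name (notes.md -> notes.pdf).
pdf_name() {
    printf '%s.pdf\n' "${1%.md}"
}

# Convert one Markdown file to PDF with pandoc, using XeLaTeX as the engine.
# Hypothetical wrapper; requires pandoc and a TeX installation.
md_to_pdf() {
    pandoc "$1" -o "$(pdf_name "$1")" --pdf-engine=xelatex
}
```

For example, `xsel | fabric -sp summarize > summary.md && md_to_pdf summary.md` would turn a clipboard summary into summary.pdf.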

Some useful fabric commands:

  • fabric --update: update the patterns from the GitHub repo.
  • fabric --listmodels: list the available models.

Duplicate Fabric command problem

After installing Daniel Miessler's Fabric, I installed a Hugging Face model that also installed a command called fabric on my system.

I looked for fabric executables on my system and I found both files:

$ find / -path /mnt -prune -o -type f -name "fabric" -executable 2>/dev/null
/mnt
/home/alejandro/.local/pipx/venvs/fabric/bin/fabric
/home/alejandro/.local/bin/fabric

So now if I want to use Daniel Miessler's fabric I have to run it with the full path (which is the first one).
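A quicker way to spot this kind of clash is to look only at the directories on PATH, since those decide which fabric the shell runs. The sketch below does roughly what `type -a fabric` does in bash; the function name `find_on_path` is made up for this example.

```shell
# List every executable called NAME found in the PATH directories,
# in lookup order. Hypothetical helper; runs in a subshell so the
# IFS and globbing changes do not leak into the calling shell.
find_on_path() (
    name=$1
    set -f      # do not expand glob characters in PATH entries
    IFS=:       # split PATH on colons
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)
```

The first line printed by `find_on_path fabric` is the one the shell picks. To avoid typing the full path each time, an alias that points at the fabric you want is one option.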

Option 5. Using Hugging face

Create a separate Python environment for Hugging Face:

mkdir hugginface-cli-venv
cd hugginface-cli-venv
python3 -m venv myvenv
source myvenv/bin/activate

Install the Hugging Face CLI, which comes in the huggingface_hub package, and log in with huggingface-cli login. You will need a token, which you can generate in your Hugging Face account settings.

$ pip install -U "huggingface_hub[cli]"
$ huggingface-cli login 

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
Your token has been saved to /home/alejandro/.cache/huggingface/token
Login successful

Then download a model like this:

huggingface-cli download stabilityai/stable-diffusion-3-medium 

Some models require you to accept their conditions before downloading. If this happens, you will be given a link to do so, and you can then retry the download.

To view what you downloaded:

huggingface-cli scan-cache


REPO ID                               REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH                                                                            
------------------------------------- --------- ------------ -------- ------------- ------------- ---- ------------------------------------------------------------------------------------- 
Systran/faster-whisper-large-v2       model             3.1G        4 6 days ago    6 days ago    main /home/alejandro/.cache/huggingface/hub/models--Systran--faster-whisper-large-v2       
Systran/faster-whisper-small          model           486.2M        4 6 days ago    6 days ago    main /home/alejandro/.cache/huggingface/hub/models--Systran--faster-whisper-small          
stabilityai/stable-diffusion-3-medium model            53.3G       28 6 days ago    6 days ago    main /home/alejandro/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3-medium 

Done in 0.0s. Scanned 3 repo(s) for a total of 56.9G.
Got 1 warning(s) while scanning. Use -vvv to print details.
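When the cache grows, it helps to see the biggest repos first. The following sketch sorts the table by size; it assumes the column layout shown above (REPO ID, REPO TYPE, SIZE ON DISK) and decimal units, and `cache_by_size` is a hypothetical helper name.

```shell
# Read `huggingface-cli scan-cache` output on stdin and print
# "<size in GB> <repo id>" lines, largest first.
# Hypothetical helper; only rows whose second column is a repo type are used.
cache_by_size() {
    awk '
        $2 == "model" || $2 == "dataset" || $2 == "space" {
            size = $3
            unit = substr(size, length(size))
            if (unit ~ /[0-9]/) {
                gb = size / 1000000000      # plain byte count, no unit suffix
            } else {
                n = substr(size, 1, length(size) - 1) + 0
                if (unit == "K") gb = n / 1000000
                else if (unit == "M") gb = n / 1000
                else if (unit == "T") gb = n * 1000
                else gb = n                 # "G"
            }
            printf "%.2f %s\n", gb, $1
        }
    ' | sort -rn
}
```

Use it as `huggingface-cli scan-cache | cache_by_size`. With the output above, the stable-diffusion-3-medium repo comes out on top.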

Whisper and WhisperX

Whisper

whisper Video_2024-07-02_16-00-54.mkv --model medium --language English

WhisperX requires Hugging Face and Whisper, but has more options, such as speaker diarization:

whisperx 2024-07-04\ 00-11-02.mkv --model large-v2 --diarize --min_speakers 3 --max_speakers 3 --highlight_words True --language English --output_format vtt

Note: If you run into an out-of-memory error (RuntimeError: CUDA failed with error out of memory), you can reduce the batch size with --batch_size 4

A few public useful tools