Bringing ChatGPT into Ecommerce
January 21, 2025
Building an LLM (platform) with an LLM
LLM has already established itself as a common (and popular) keyword in the global news lately.
What is an LLM? Let’s ask ChatGPT…
LLM typically stands for Large Language Model in the context of AI and machine learning.
A Large Language Model is:
- A type of artificial intelligence model trained on huge amounts of text data.
- Designed to understand and generate human language, including answering questions, summarizing content, translating languages, writing code, and more.
- Built using deep learning, especially architectures like Transformers (e.g., GPT, BERT, LLaMA).
…….
Examples of LLMs:
- GPT-4 (by OpenAI)
- Claude (by Anthropic)
- LLaMA (by Meta)
- Gemini (by Google)
- Phi (by Microsoft)
- Mistral, Mixtral (open-source)
My Exercise:
I’m keen to develop an LLM platform that is cost-effective and built on a totally open-source framework, using a back-end language that is completely my own choice.
The product I want to build is an “offline”, locally hosted LLM (I’m thinking DeepSeek, or a smaller-footprint model like Phi), with RAG built in.
In short, this means it’s a totally self-hosted product (no AI subscription necessary) which enables someone to upload a contextual document, then make AI queries based on it.
So with this in mind, I’m choosing to code this framework in Node.js, with RESTful API-based communications.
My first steps:
Initially I thought the best way to do this was to search online for some tutorials and resources. I mean, I like to code, but there’s no point in reinventing the wheel, is there?
The main hurdle I came across was finding resources and materials to help me produce this framework in pure Node.js. All the Python material out there is very helpful, but in my stubbornness I’m insisting on keeping all the coding within Node.js.
Now this lack of Node.js material left me scratching my head for a bit! I tried experimenting with stitching together multiple samples, and it repeatedly ended up as a Frankenstein’s monster! There has to be a better way!
Eureka!
Of course, the solution has been right under my nose all along… I’m building an AI product, so why not engage AI from the start and just ask ChatGPT to help build this, from the ground up? So this is where it starts…
As I’ve been exploring AI for a while, I already have a ChatGPT Plus subscription, so I’m good to go.
Note: for a small fee, let me tell you, this is so worth it, even just for the small but crucial benefits it gives you (i.e. RAG, file-upload quota, fewer limitations on chats, etc.).
That’s a topic for another conversation; head to the OpenAI homepage and check out the plans for yourself.
Here’s how my conversation went with ChatGPT:
As I’m still in my “AI-groove”, I decided to ask ChatGPT to summarise the whole process for me – see below…
🚀 Getting Started: Can I Build an Offline RAG Chatbot with Node.js?
I began with the question: “Can I build a completely local RAG chatbot using Node.js and host my model locally?”
Answer: Yes — with tools like Ollama for local LLM inference and Qdrant for vector search, Node.js can fully power an offline RAG setup.
I chose to run DeepSeek or Phi via Ollama and use @xenova/transformers for local embeddings. Qdrant served as the vector database.
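As a quick sanity check, the Ollama side can be poked directly from Node.js. Here is a minimal sketch (it assumes Ollama is already running on its default port 11434, a model such as phi has been pulled, and Node 18+ so fetch is built in; run it as an ES module):

// quick-check.mjs – one-off call to the locally hosted model via Ollama's REST API
// Assumes Ollama on localhost:11434 and a model already installed (e.g. `ollama pull phi`).
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'phi',                                // or whichever local model tag you pulled
    prompt: 'In one sentence, what is a Large Language Model?',
    stream: false,                               // return a single JSON object, not a stream
  }),
});
const data = await res.json();
console.log(data.response);                      // the model's answer, generated entirely locally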
🌐 No Python. No Internet. All Local.
A key requirement for me was that the entire platform must run:
- Without internet connection
- Without any Python dependencies
- Entirely inside Docker
I confirmed that @xenova/transformers downloads model weights once and caches them. Qdrant and Ollama also work fully offline once their dependencies are loaded.
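To give a flavour of the embedding side, here is a minimal sketch of the @xenova/transformers usage (the model name, cache path and offline env flags are assumptions on my part; the very first run still needs to fetch and cache the weights):

// embed.js (sketch) – local sentence embeddings with transformers.js: no Python, no internet after the first run
import { pipeline, env } from '@xenova/transformers';

// Assumption: once the weights are cached, remote downloads can be switched off entirely.
env.allowRemoteModels = false;                   // comment this out for the very first (weight-caching) run
env.localModelPath = './models';                 // where the cached weights live inside the container

// Load the feature-extraction pipeline once and reuse it for every chunk and query.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

export async function embed(text) {
  // Mean-pool the token embeddings and normalise to get one fixed-size vector per text.
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);                // plain number[] that Qdrant accepts
}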
🛋️ Dockerized Stack Setup
I created a minimal Node.js backend and paired it with Qdrant. Docker Compose was used to manage both services.
My Docker Compose stack:
services:
  backend:
    build: .
    command: sh -c "npm install && node ingest.js && node server.js"
  qdrant:
    image: qdrant/qdrant
The backend server performed document ingestion, chunking, vector embedding, and served a /chat endpoint for inference.
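For illustration, here is a rough sketch of how such a /chat endpoint could be wired together with Express, the Qdrant JS client and the helpers sketched in the other sections (the collection name "docs", the text payload field, the env vars and the model tag are my own placeholders, not the exact code):

// server.js (sketch) – Express backend: embed the query, retrieve context from Qdrant, ask the local LLM
import express from 'express';
import { QdrantClient } from '@qdrant/js-client-rest';
import { embed } from './embed.js';          // embedding helper sketched earlier
import { buildPrompt } from './prompt.js';   // prompt template sketched in the prompt-tuning section below

const app = express();
app.use(express.json());

// Inside Docker Compose the service name "qdrant" resolves to the Qdrant container.
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL ?? 'http://qdrant:6333' });
const OLLAMA_URL = process.env.OLLAMA_URL ?? 'http://localhost:11434';   // point at wherever Ollama runs

app.post('/chat', async (req, res) => {
  const { query } = req.body;
  const vector = await embed(query);                                // embed the user question locally
  const hits = await qdrant.search('docs', { vector, limit: 5 });   // top matching chunks
  const context = hits.map((h) => h.payload.text).join('\n---\n');

  // Generate the answer with the local model via Ollama's REST API.
  const ollamaRes = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'phi', prompt: buildPrompt(context, query), stream: false }),
  });
  const { response } = await ollamaRes.json();
  res.json({ answer: response });
});

app.listen(3000, () => console.log('RAG backend listening on :3000'));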
📁 Ingesting Files: PDFs, Markdown, Text, and JSON
A core milestone was ingesting multiple document types:
- PDFs via pdfjs-dist
- Markdown via marked
- TXT via native fs
- JSON via a custom flattener for product data
Useful insight:
Flattening structured JSON into natural language sentences improved both embedding quality and search relevance.
Example:
Product Title: Jarrah Bee Pollen
Vendor: Bushranger Honey
Description: Seasonal superfood harvested in Western Australia.
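The flattener itself doesn’t need to be clever; here is a small sketch of the idea (the field names title, vendor and description are assumptions about the shape of the product JSON):

// flatten.js (sketch) – turn one structured product record into a natural-language block for embedding
// The field names below are assumptions; adjust them to whatever your JSON export actually contains.
export function flattenProduct(product) {
  const fields = {
    'Product Title': product.title,
    'Vendor': product.vendor,
    'Description': product.description,
  };
  // Skip missing values so sparse records don't embed empty noise.
  return Object.entries(fields)
    .filter(([, value]) => value != null && value !== '')
    .map(([label, value]) => `${label}: ${value}`)
    .join('\n');
}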
💪 Embedding and Retrieval
I initially used Xenova/all-MiniLM-L6-v2 but later upgraded to bge-small-en for better semantic results.
Key learning: You must re-ingest all documents when changing embedding models, as vector spaces change.
I set limit: 5 for retrievals, which balances precision and recall well for most queries.
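In practice, switching models means dropping and re-creating the Qdrant collection before re-running ingestion. A sketch of that step (the collection name, the 384-dimension size and cosine distance are my assumptions for bge-small-en):

// reingest.js (sketch) – recreate the collection whenever the embedding model changes, then upsert everything again
import { QdrantClient } from '@qdrant/js-client-rest';
import { embed } from './embed.js';

const qdrant = new QdrantClient({ url: 'http://qdrant:6333' });

// The old vectors live in the previous model's vector space, so they have to go.
await qdrant.deleteCollection('docs').catch(() => {});    // ignore "not found" on a fresh database
await qdrant.createCollection('docs', {
  vectors: { size: 384, distance: 'Cosine' },              // assumption: bge-small-en outputs 384-dim vectors
});

// In the real ingest script these chunks come from the PDF/Markdown/TXT/JSON loaders.
const chunks = [{ text: 'Product Title: Jarrah Bee Pollen\nVendor: Bushranger Honey' }];

for (const [i, chunk] of chunks.entries()) {
  await qdrant.upsert('docs', {
    points: [{ id: i, vector: await embed(chunk.text), payload: { text: chunk.text } }],
  });
}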
✍️ Prompt Tuning for Factual, Tight Responses
I noticed verbose outputs at first. To improve LLM accuracy and conciseness, I changed the prompt:
You are an assistant that answers questions briefly and factually.
User: [query]
Assistant (respond with a short, direct answer):
This alone dramatically reduced rambling and improved clarity.
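In code, the template is just a small string builder. A sketch (the buildPrompt name and the extra lines injecting the retrieved context are my additions):

// prompt.js (sketch) – build the tightened prompt from the retrieved context and the user query
export function buildPrompt(context, query) {
  return [
    'You are an assistant that answers questions briefly and factually.',
    '',
    `Context:\n${context}`,            // assumption: the retrieved chunks are injected here
    '',
    `User: ${query}`,
    'Assistant (respond with a short, direct answer):',
  ].join('\n');
}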
🪡 Final Features
- Query endpoint with embedded context
- PDF/Markdown/JSON ingestion
- Smart chunking with token estimation
- Local embeddings + local LLM inference
- Works 100% offline
📖 Want the Full Package?
I packaged the entire codebase with:
- Dockerfile & docker-compose
- Node backend
- Embedding logic
- Query prompt handler
Ready to run fully offline with a single docker-compose up --build
Thanks for reading! If you’re building your own offline assistant or chatbot, I hope this walkthrough helps you avoid some of the roadblocks I encountered. Feel free to reach out for tips or code snippets!