Bringing ChatGPT into Ecommerce
January 21, 2025
Building an LLM (platform) with an LLM
LLM has already established itself as a common (and popular) keyword in the global news lately.
What is an LLM? Let’s ask ChatGPT…
LLM typically stands for Large Language Model in the context of AI and machine learning.
A Large Language Model is:
- A type of artificial intelligence model trained on huge amounts of text data.
- Designed to understand and generate human language, including answering questions, summarizing content, translating languages, writing code, and more.
- Built using deep learning, especially architectures like Transformers (e.g., GPT, BERT, LLaMA).
…….
Examples of LLMs:
- GPT-4 (by OpenAI)
- Claude (by Anthropic)
- LLaMA (by Meta)
- Gemini (by Google)
- Phi (by Microsoft)
- Mistral, Mixtral (open-source)
My Exercise:
I’m keen to develop an LLM platform that is cost-effective and built on a totally open-source framework, using a back-end language that is completely my own choice.
The product I want to build is an “offline”, locally hosted LLM (I’m thinking DeepSeek, or a smaller-footprint model like Phi), with RAG built in.
In short, this means it’s a totally self-hosted product (no AI subscription necessary) which enables someone to upload a contextual document, then make AI queries based on it.
So with this in mind, I’m choosing to code this framework in Node.js, with RESTful API-based communications.
My first steps:
Initially I thought the best way to do this was to search online for some tutorials and resources. I mean, I like to code, but there’s no point in reinventing the wheel, is there?
The main hurdle I came across was finding resources and materials to help me produce this framework in pure Node.js. All the Python material out there is very helpful, but in my stubbornness I’m insisting on keeping all the coding within Node.js.
Now this lack of Node.js material left me scratching my head for a bit! I tried experimenting with stitching together multiple samples, and it repeatedly ended up as a Frankenstein’s monster! There has to be a better way!
Eureka!
Of course, the solution has been right under my nose all along… I’m building an AI product, so why not engage AI from the start and just ask ChatGPT to help build this, from the ground up? So this is where it starts…
As I’ve been exploring AI for a while, I already have a ChatGPT Plus subscription, so I’m good to go.
Note: for a small fee, let me tell you, this is so worth it, even just for the small but crucial benefits it gives you (i.e. RAG, file-upload quota, fewer limitations on chats, etc.).
That’s a topic for another conversation; head to the OpenAI homepage and check out the plans for yourself.
Here’s how my conversation went with ChatGPT:
As I’m still in my “AI-groove”, I decided to ask ChatGPT to summarise the whole process for me – see below…
🚀 Getting Started: Can I Build an Offline RAG Chatbot with Node.js?
I began with the question: “Can I build a completely local RAG chatbot using Node.js and host my model locally?”
Answer: Yes — with tools like Ollama for local LLM inference and Qdrant for vector search, Node.js can fully power an offline RAG setup.
I chose to run DeepSeek or Phi via Ollama and use @xenova/transformers for local embeddings. Qdrant served as the vector database.
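As a quick sanity check, the Ollama side can be poked directly from Node.js. Here is a minimal sketch (it assumes Ollama is already running on its default port 11434, a model such as phi has been pulled, and Node 18+ so fetch is built in; run it as an ES module):

// quick-check.mjs – one-off call to the locally hosted model via Ollama's REST API
// Assumes Ollama on localhost:11434 and a model already installed (e.g. `ollama pull phi`).
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'phi',                                // or whichever local model tag you pulled
    prompt: 'In one sentence, what is a Large Language Model?',
    stream: false,                               // return a single JSON object, not a stream
  }),
});
const data = await res.json();
console.log(data.response);                      // the model's answer, generated entirely locally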
🌐 No Python. No Internet. All Local.
A key requirement for me was that the entire platform must run:
- Without internet connection
- Without any Python dependencies
- Entirely inside Docker
I confirmed that @xenova/transformers downloads model weights once and caches them. Qdrant and Ollama also work fully offline once their dependencies are loaded.
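To give a flavour of the embedding side, here is a minimal sketch of the @xenova/transformers usage (the model name, cache path and offline env flags are assumptions on my part; the very first run still needs to fetch and cache the weights):

// embed.js (sketch) – local sentence embeddings with transformers.js: no Python, no internet after the first run
import { pipeline, env } from '@xenova/transformers';

// Assumption: once the weights are cached, remote downloads can be switched off entirely.
env.allowRemoteModels = false;                   // comment this out for the very first (weight-caching) run
env.localModelPath = './models';                 // where the cached weights live inside the container

// Load the feature-extraction pipeline once and reuse it for every chunk and query.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

export async function embed(text) {
  // Mean-pool the token embeddings and normalise to get one fixed-size vector per text.
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);                // plain number[] that Qdrant accepts
}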
🛋️ Dockerized Stack Setup
I created a minimal Node.js backend and paired it with Qdrant. Docker Compose was used to manage both services.
My Docker Compose stack:
services:
  backend:
    build: .
    command: sh -c "npm install && node ingest.js && node server.js"
  qdrant:
    image: qdrant/qdrant
The backend server performed document ingestion, chunking, vector embedding, and served a /chat endpoint for inference.
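For illustration, here is a rough sketch of how such a /chat endpoint could be wired together with Express, the Qdrant JS client and the helpers sketched in the other sections (the collection name "docs", the text payload field, the env vars and the model tag are my own placeholders, not the exact code):

// server.js (sketch) – Express backend: embed the query, retrieve context from Qdrant, ask the local LLM
import express from 'express';
import { QdrantClient } from '@qdrant/js-client-rest';
import { embed } from './embed.js';          // embedding helper sketched earlier
import { buildPrompt } from './prompt.js';   // prompt template sketched in the prompt-tuning section below

const app = express();
app.use(express.json());

// Inside Docker Compose the service name "qdrant" resolves to the Qdrant container.
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL ?? 'http://qdrant:6333' });
const OLLAMA_URL = process.env.OLLAMA_URL ?? 'http://localhost:11434';   // point at wherever Ollama runs

app.post('/chat', async (req, res) => {
  const { query } = req.body;
  const vector = await embed(query);                                // embed the user question locally
  const hits = await qdrant.search('docs', { vector, limit: 5 });   // top matching chunks
  const context = hits.map((h) => h.payload.text).join('\n---\n');

  // Generate the answer with the local model via Ollama's REST API.
  const ollamaRes = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'phi', prompt: buildPrompt(context, query), stream: false }),
  });
  const { response } = await ollamaRes.json();
  res.json({ answer: response });
});

app.listen(3000, () => console.log('RAG backend listening on :3000'));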
📁 Ingesting Files: PDFs, Markdown, Text, and JSON
A core milestone was ingesting multiple document types:
- PDFs via pdfjs-dist
- Markdown via marked
- TXT via native fs
- JSON via a custom flattener for product data
Useful insight:
Flattening structured JSON into natural language sentences improved both embedding quality and search relevance.
Example:
Product Title: Jarrah Bee Pollen
Vendor: Bushranger Honey
Description: Seasonal superfood harvested in Western Australia.
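The flattener itself doesn’t need to be clever; here is a small sketch of the idea (the field names title, vendor and description are assumptions about the shape of the product JSON):

// flatten.js (sketch) – turn one structured product record into a natural-language block for embedding
// The field names below are assumptions; adjust them to whatever your JSON export actually contains.
export function flattenProduct(product) {
  const fields = {
    'Product Title': product.title,
    'Vendor': product.vendor,
    'Description': product.description,
  };
  // Skip missing values so sparse records don't embed empty noise.
  return Object.entries(fields)
    .filter(([, value]) => value != null && value !== '')
    .map(([label, value]) => `${label}: ${value}`)
    .join('\n');
}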
💪 Embedding and Retrieval
I initially used Xenova/all-MiniLM-L6-v2 but later upgraded to bge-small-en for better semantic results.
Key learning: You must re-ingest all documents when changing embedding models, as vector spaces change.
I set limit: 5 for retrievals, which balances precision and recall well for most queries.
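In practice, switching models means dropping and re-creating the Qdrant collection before re-running ingestion. A sketch of that step (the collection name, the 384-dimension size and cosine distance are my assumptions for bge-small-en):

// reingest.js (sketch) – recreate the collection whenever the embedding model changes, then upsert everything again
import { QdrantClient } from '@qdrant/js-client-rest';
import { embed } from './embed.js';

const qdrant = new QdrantClient({ url: 'http://qdrant:6333' });

// The old vectors live in the previous model's vector space, so they have to go.
await qdrant.deleteCollection('docs').catch(() => {});    // ignore "not found" on a fresh database
await qdrant.createCollection('docs', {
  vectors: { size: 384, distance: 'Cosine' },              // assumption: bge-small-en outputs 384-dim vectors
});

// In the real ingest script these chunks come from the PDF/Markdown/TXT/JSON loaders.
const chunks = [{ text: 'Product Title: Jarrah Bee Pollen\nVendor: Bushranger Honey' }];

for (const [i, chunk] of chunks.entries()) {
  await qdrant.upsert('docs', {
    points: [{ id: i, vector: await embed(chunk.text), payload: { text: chunk.text } }],
  });
}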
✍️ Prompt Tuning for Factual, Tight Responses
I noticed verbose outputs at first. To improve LLM accuracy and conciseness, I changed the prompt:
You are an assistant that answers questions briefly and factually.
User: [query]
Assistant (respond with a short, direct answer):
This alone dramatically reduced rambling and improved clarity.
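In code, the template is just a small string builder. A sketch (the buildPrompt name and the extra lines injecting the retrieved context are my additions):

// prompt.js (sketch) – build the tightened prompt from the retrieved context and the user query
export function buildPrompt(context, query) {
  return [
    'You are an assistant that answers questions briefly and factually.',
    '',
    `Context:\n${context}`,            // assumption: the retrieved chunks are injected here
    '',
    `User: ${query}`,
    'Assistant (respond with a short, direct answer):',
  ].join('\n');
}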
🪡 Final Features
- Query endpoint with embedded context
- PDF/Markdown/JSON ingestion
- Smart chunking with token estimation
- Local embeddings + local LLM inference
- Works 100% offline
📖 Want the Full Package?
I packaged the entire codebase with:
- Dockerfile & docker-compose
- Node backend
- Embedding logic
- Query prompt handler
Ready to run fully offline with a single docker-compose up --build
Thanks for reading! If you’re building your own offline assistant or chatbot, I hope this walkthrough helps you avoid some of the roadblocks I encountered. Feel free to reach out for tips or code snippets!