Every organisation accumulates a large body of internal knowledge, from policies to standard operating procedures (SOPs). Small and medium-sized enterprises (SMEs) often struggle to manage and access this information efficiently while staying lean and productive.

I decided to write this short guide to help organisations deploy an organisational assistant so employees can simply ask the assistant for a procedure, a financial threshold, or even the company’s holiday policy. Leveraging AI, specifically a customised organisational system based on large language models (LLMs), offers a practical and scalable solution to transform knowledge management and empower employees with instant access to relevant information.

AI Deployment Strategies for SMEs

There are several approaches to deploying an AI assistant. The chart below provides a subjective assessment of three primary strategies—New Custom Models, Retrieval-Augmented Generation (RAG), and Third-Party AI Platforms—against the main criteria that might affect selection.

Comparison chart for AI Deployment Options - RAG vs Model Training vs Third-Party Platforms by Radko Diev, sofpact

Key Approaches

Custom Models: Essentially, this involves training an LLM from scratch based on organisational data. It is extremely costly and suitable for organisations with highly unique needs and significant resources, or those intending to commercialise their AI. For instance, Bloomberg trained their own LLM in 2023, requiring over a million GPU hours. The advantage is its high customisation potential, allowing for domain-specific tailoring to exact organisational needs.
Retrieval-Augmented Generation (RAG): This approach offers a balanced and practical option for most organisations. Someone else has already invested in training the foundation model (e.g., Llama), and by using a vector database, the AI assistant can retrieve organisational data dynamically to generate responses. It is cost-effective, adaptable, and scalable, making it an ideal choice for SMEs.
Third-Party AI Platforms: These solutions are the fastest to deploy. They resemble RAG in functionality but rely on APIs provided by platforms like Microsoft Azure, OpenAI or Cohere. The initial deployment cost is often low because the approach is still based on RAG but does not require heavy implementation work. However, these platforms may limit customisation and create a dependence on external providers.

While subjective, the chart reflects a generalised comparison of these strategies to help SMEs navigate their decision-making process. The ultimate choice depends on unique organisational priorities.

I will focus on RAG because it is the most feasible option for SMEs. It provides a balance between cost, scalability, and functionality, whether leveraging open-source models or third-party APIs.

Building Your AI Assistant

Once the right approach is selected, the next step is to implement the AI assistant effectively. Here are the key steps for building a RAG-based assistant:

1. Data Collection

Gather and organise all relevant internal documents, emails, and databases. Ensure the data is up-to-date, high-quality, and relevant to the use cases you aim to address. Examples include HR policies, customer service logs, and operational guidelines. Confidentiality is a critical consideration here; ensure that sensitive information is handled securely. You might consider compartmentalising data to control access or building unique assistants tailored to specific departments or needs.
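
The compartmentalisation idea above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not a prescribed implementation: the `Chunk` type, the `department` tag, the `chunk_document` and `visible_chunks` helpers, and the sample policy text are all hypothetical. It splits each document into small chunks, tags every chunk with the department that owns it, and filters by the departments a user is allowed to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A piece of a source document plus the metadata used for access control."""
    doc_id: str
    department: str  # hypothetical access tag, e.g. "hr" or "finance"
    text: str


def chunk_document(doc_id: str, department: str, text: str,
                   max_words: int = 50) -> list[Chunk]:
    """Split a document into chunks of at most max_words words each."""
    words = text.split()
    return [
        Chunk(doc_id, department, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def visible_chunks(chunks: list[Chunk], allowed: set[str]) -> list[Chunk]:
    """Keep only the chunks whose department the current user may access."""
    return [c for c in chunks if c.department in allowed]


# Illustrative documents only; real content would come from your own files.
corpus = (
    chunk_document("holiday-policy", "hr",
                   "Employees receive 25 days of annual leave. " * 20)
    + chunk_document("approval-limits", "finance",
                     "Purchases above the approval threshold need sign-off.")
)

# An HR-only assistant would index just this subset.
hr_only = visible_chunks(corpus, {"hr"})
```

Filtering before indexing gives each department its own assistant; filtering at query time instead would let one shared index serve users with different permissions.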

2. Model Selection

Select a pre-trained language model based on your requirements and resources:

Proprietary APIs: OpenAI, Google’s Gemini, Microsoft Azure OpenAI Service, Anthropic’s Claude, IBM Watson, Amazon Bedrock. These platforms offer ready-to-use models and tools for integrating AI functionalities into workflows.
Open-Source Models: Meta’s LLaMA, EleutherAI’s GPT-NeoX and GPT-J, MosaicML’s MPT. Platforms like Hugging Face provide a wide variety of open-source models and tools for fine-tuning and deployment, which suits organisations with highly specific needs.

3. Setting Up the RAG Pipeline

Use embedding models like OpenAI’s embeddings or SentenceTransformers to convert documents into vector representations.
Store these embeddings in a vector database or similarity-search index (e.g., Pinecone, Weaviate, or FAISS).
Use frameworks like LangChain or LlamaIndex to integrate the vector database with the LLM.
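
To make the three steps above concrete, here is a toy sketch of the retrieve-then-prompt flow that uses only the Python standard library. Everything in it is a stand-in I chose for illustration: `embed` is a simple bag-of-words counter rather than a real embedding model, `InMemoryVectorStore` is a plain list rather than a real vector database, and the sample passages are invented. In a real deployment those roles would be filled by the tools named above, and the final prompt would be sent to your chosen LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Stand-in for a vector database: keeps (vector, text) pairs in a list."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._rows.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored passages most similar to the query."""
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


store = InMemoryVectorStore()
for passage in [
    "Holiday policy: staff request annual leave through the HR portal.",
    "Expense policy: purchases above the approval threshold need manager sign-off.",
    "IT policy: laptops must be encrypted and locked when unattended.",
]:
    store.add(passage)

question = "How do I request annual leave?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)  # this string is what the LLM would receive
```

The LLM call itself is left out because it depends on the provider. The point of the sketch is the shape of the pipeline: embed once at indexing time, embed again at query time, retrieve the closest passages, and ground the prompt in them.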

4. Deployment

Deploy the assistant through user-friendly applications, such as:

Internal chatbots for employees.
Automated email responders or CRM tools.
Interfaces for drafting reports or content generation.
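
One way to serve several of these channels from a single assistant is to keep the answering logic in one function and wrap it in thin channel adapters. The sketch below is only an illustration of that design: `AnswerFn`, `stub_answer`, `chat_reply`, and `email_reply` are hypothetical names, and `stub_answer` is a placeholder for the real retrieve-and-generate call.

```python
from typing import Callable

# Hypothetical signature: takes a question, returns the assistant's answer.
AnswerFn = Callable[[str], str]


def stub_answer(question: str) -> str:
    """Placeholder for the real retrieve-then-generate call."""
    return f"[assistant reply to: {question}]"


def chat_reply(answer: AnswerFn, message: str) -> str:
    """Chat channel: return the answer as-is."""
    return answer(message.strip())


def email_reply(answer: AnswerFn, subject: str, body: str) -> str:
    """Email channel: wrap the same answer in a simple reply template."""
    return (
        f"Subject: Re: {subject}\n\n"
        f"Hello,\n\n{answer(body.strip())}\n\n"
        "This reply was generated automatically."
    )


chat = chat_reply(stub_answer, "  What is the holiday policy? ")
email = email_reply(stub_answer, "Leave request", "How do I book leave?")
```

Because every channel calls the same function, improvements to retrieval or prompting reach the chatbot and the email responder at the same time.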

5. Monitoring and Maintenance

Continuously monitor the system’s performance and refine it as needed:

Regularly refresh the vector database with current versions of your documents.
Track user feedback to improve system responses.
Refine workflows or prompts for better accuracy.
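
Two of these maintenance tasks lend themselves to a small sketch: deciding which documents need re-embedding, and summarising user feedback. The example below is one possible approach under my own assumptions, not a standard recipe. It compares content hashes to detect new, changed, and deleted documents, and it computes the share of helpful ratings per topic. The function names, the document IDs, and the feedback format are all hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(text: str) -> str:
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed: dict[str, str],
                 current_docs: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored fingerprints with the current documents.

    indexed maps doc_id -> fingerprint recorded at the last indexing run;
    current_docs maps doc_id -> current document text.
    """
    current = {doc_id: fingerprint(text) for doc_id, text in current_docs.items()}
    return {
        "add": sorted(d for d in current if d not in indexed),
        "update": sorted(d for d in current
                         if d in indexed and indexed[d] != current[d]),
        "remove": sorted(d for d in indexed if d not in current),
    }


def helpful_rate(feedback: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of 'helpful' ratings per topic, from (topic, was_helpful) pairs."""
    totals: Counter = Counter()
    helpful: Counter = Counter()
    for topic, was_helpful in feedback:
        totals[topic] += 1
        helpful[topic] += int(was_helpful)
    return {topic: helpful[topic] / totals[topic] for topic in totals}


# Illustrative state: what was indexed last time versus what exists now.
indexed = {
    "holiday": fingerprint("v1"),
    "expenses": fingerprint("v1"),
    "old-sop": fingerprint("v1"),
}
current_docs = {"holiday": "v1", "expenses": "v2", "remote-work": "v1"}
plan = plan_reindex(indexed, current_docs)

rates = helpful_rate([
    ("holiday", True), ("holiday", True),
    ("expenses", False), ("expenses", True),
])
```

Re-embedding only the documents listed under `add` and `update` keeps embedding costs proportional to what actually changed, and topics with a low helpful rate are natural candidates for better source documents or revised prompts.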

Balancing ROI and Scalability

A key differentiator between the approaches is their cost structure. All of them, including custom-built models, incur ongoing operational costs such as API usage or computational resources, but the large upfront training cost of custom models sets them apart. Building a custom model requires substantial investment in data science expertise, time, and GPU resources, making it feasible only for organisations with large budgets and specialised needs.

RAG, by contrast, avoids these training costs as it leverages pre-trained models. According to a recent analysis by AdaSci, the operational costs of a RAG pipeline primarily stem from embedding generation, database maintenance, and LLM API calls. These costs vary significantly depending on the scale of implementation and usage.
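
The three cost drivers above can be combined into a rough monthly estimate. The sketch below is a back-of-the-envelope calculator, not a pricing reference: the `Prices` and `Usage` fields are my own simplification, and every number in the example is a made-up placeholder that should be replaced with your provider's current rates and your own usage figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Placeholder unit prices; substitute your provider's actual rates."""
    embedding_per_1k_tokens: float
    llm_input_per_1k_tokens: float
    llm_output_per_1k_tokens: float
    vector_db_per_month: float


@dataclass(frozen=True)
class Usage:
    """Assumed monthly usage profile."""
    tokens_embedded_per_month: int  # new or changed document tokens
    queries_per_month: int
    question_tokens: int            # average tokens per user question
    context_tokens: int             # retrieved passages added to each prompt
    answer_tokens: int              # average tokens per generated answer


def monthly_cost(p: Prices, u: Usage) -> dict[str, float]:
    """Break the monthly bill into embedding, database, and LLM components."""
    # Both documents and incoming questions have to be embedded.
    embedded = u.tokens_embedded_per_month + u.queries_per_month * u.question_tokens
    llm_in = u.queries_per_month * (u.question_tokens + u.context_tokens)
    llm_out = u.queries_per_month * u.answer_tokens
    costs = {
        "embedding": embedded / 1000 * p.embedding_per_1k_tokens,
        "database": p.vector_db_per_month,
        "llm_input": llm_in / 1000 * p.llm_input_per_1k_tokens,
        "llm_output": llm_out / 1000 * p.llm_output_per_1k_tokens,
    }
    costs["total"] = sum(costs.values())
    return costs


# Made-up numbers purely to show the calculation.
estimate = monthly_cost(
    Prices(embedding_per_1k_tokens=0.0001, llm_input_per_1k_tokens=0.001,
           llm_output_per_1k_tokens=0.002, vector_db_per_month=20.0),
    Usage(tokens_embedded_per_month=1_000_000, queries_per_month=10_000,
          question_tokens=50, context_tokens=1_500, answer_tokens=200),
)
```

Even with placeholder prices, the breakdown shows where the levers are: the retrieved context usually dominates LLM input tokens, so retrieving fewer or shorter passages per query is often the most direct way to lower the bill.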

Affordable AI for SMEs

As seen with Otto Group and Walmart, AI tools are transforming workplaces by saving time and improving efficiency. They allow SMEs, for the first time, to keep up with large organisations and drive growth.