What Is a RAG System? How Law Firms Use It to Search Their Own Documents with AI
A managing partner asked us a question last month that captured the actual gap in how legal AI is described to attorneys: "Everyone is talking about RAG. What is it actually, and how is it different from ChatGPT?"
The honest answer is that the term is a piece of architectural jargon that vendors use without defining. Most attorneys nod along and approve a purchase decision based on a description that does not explain what the product does. This article fixes that. It defines retrieval-augmented generation (RAG), explains how the architecture applies to law firm document search, and identifies what RAG does and does not do.
The audience is attorneys, paralegals, and firm administrators evaluating legal AI. No technical background is assumed. References to specific technologies (vector databases, embedding models) are kept to footnote-equivalent depth.
The Plain-Language Definition
Retrieval-augmented generation is an architecture that combines two steps:
- Retrieval: When you ask a question, the system searches a body of documents and pulls out the most relevant passages.
- Generation: The system then uses an AI language model to write an answer based on the retrieved passages, with citations to where each piece of information came from.
The key phrase is "based on." The AI does not answer from its general training data. It answers from the specific documents the firm has provided. The retrieval step is what makes this possible; without it, the language model has nothing specific to draw on.
Compare this to a general-purpose chatbot like ChatGPT. ChatGPT answers from the patterns in its training data, which is the broader internet as of its training cut-off. It does not know about your firm's case files unless you paste them in. RAG is built specifically to answer from a defined document set, not from the open web.
Why This Architecture Fits Law Firms
Law firms have a specific information retrieval problem. The firm has thousands or millions of pages of documents (case files, contracts, pleadings, correspondence, research memos), and an attorney needs to find the answer to a specific question buried in that corpus.
Traditional search (keyword search in a document management system) is how firms have addressed this for two decades. Keyword search is reliable but limited: it finds documents that contain the exact words you typed. It does not find documents that contain a synonym, a related concept, or an answer phrased differently from the question.
RAG addresses both limitations. It retrieves documents based on semantic similarity (the underlying meaning of the question, not just the exact words), and it generates an answer by reading the retrieved passages and synthesizing them. The output is not a list of documents to review. It is an answer with citations.
For a firm whose attorneys spend 35% to 50% of billable time on document review and information retrieval (a figure consistent with ABA-affiliated time tracking research), the time recovered by an effective RAG system is significant.
How the System Actually Works (Without the Jargon)
The end-to-end flow has three phases.
Phase 1: Indexing
The firm's documents are processed once and stored in a way the system can search efficiently. For each document, the system:
- Extracts the text (using OCR if the document is scanned)
- Splits the text into smaller passages (typically a few hundred words each)
- Converts each passage into a numerical representation that captures its meaning
- Stores those representations in a searchable index
The numerical representation is what allows semantic search. Two passages with similar meaning end up with similar numerical representations, even if they use different words. The index is the searchable structure that makes retrieval fast.
Indexing is a one-time process per document. New documents added to the firm's case files are picked up incrementally. The firm does not need to re-index everything when a new file arrives.
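For readers who want to see the shape of the indexing step, the four bullets above can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's implementation: the `embed` function here is a simple word-count stand-in for a real embedding model (which produces dense numerical vectors), and `split_into_passages` is a naive word-window splitter.

```python
from collections import Counter

def split_into_passages(text, max_words=300):
    """Split extracted document text into passages of roughly max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def embed(passage):
    """Toy stand-in for an embedding model: a word-count vector.
    A real system maps each passage to a dense numerical vector
    that captures meaning, not just word counts."""
    return Counter(passage.lower().split())

def add_document(index, doc_id, text):
    """Index one document incrementally; earlier documents are untouched."""
    for n, passage in enumerate(split_into_passages(text)):
        index.append({"vector": embed(passage),
                      "text": passage,
                      "source": (doc_id, n)})

# Build a tiny index with one (hypothetical) document.
index = []
add_document(index, "lease_2021.pdf",
             "The tenant shall maintain commercial liability insurance ...")
```

The incremental property described above falls out of the structure: `add_document` only appends, so adding a new file never touches existing index entries.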
Phase 2: Retrieval
When an attorney asks a question, the system:
- Converts the question into the same kind of numerical representation
- Searches the index for the passages with the most similar representations
- Pulls back the top-ranked passages along with their source documents
The retrieved passages are the raw material for the answer. They typically include 5 to 20 passages drawn from across the firm's corpus, ranked by relevance to the question.
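The ranking step can be sketched with cosine similarity, the standard measure of how closely two vectors point in the same direction. Again this is a toy under stated assumptions: the word-count `embed` function stands in for a real embedding model, and the sample passages and filenames are invented for illustration.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy word-count stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Similarity between two vectors: 1.0 means identical direction."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, index, top_k=5):
    """Return the top_k passages whose vectors best match the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item["vector"]),
                    reverse=True)
    return ranked[:top_k]

# A two-passage index with invented case files.
index = [
    {"vector": embed(t), "text": t, "source": s}
    for t, s in [
        ("rotator cuff tear treated surgically, settled for $180,000",
         ("case_114.pdf", 12)),
        ("rear-end collision, soft tissue injury, settled pre-suit",
         ("case_201.pdf", 3)),
    ]
]
top = retrieve("rotator cuff surgery settlement amounts", index, top_k=1)
```

With real embeddings, the same ranking would also surface a passage that says "supraspinatus repair" even though the question said "rotator cuff surgery"; that is the semantic-similarity property the word-count toy cannot show.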
Phase 3: Generation
The system then uses an AI language model to synthesize an answer from the retrieved passages. The model:
- Reads the question and the retrieved passages
- Constructs an answer that draws on the passages
- Includes citations to the specific documents and page locations where each piece of information appeared
The output is a coherent answer with footnoted sources. The attorney can read the answer and click through to the cited source to verify.
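Under the hood, the generation step amounts to handing the language model the question together with the retrieved passages and an instruction to cite. A minimal sketch of that prompt assembly follows; the instruction wording, passage format, and sample filenames are illustrative assumptions, not any vendor's actual prompt.

```python
def build_prompt(question, passages):
    """Assemble a grounded prompt: numbered passages first, then the question.
    passages: list of (text, source_doc, page) tuples from the retrieval step."""
    context = "\n\n".join(
        f"[{i}] ({doc}, p. {page}) {text}"
        for i, (text, doc, page) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the numbered passages below.\n"
        "After each claim, cite the supporting passage number in brackets.\n"
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What did the rotator cuff cases settle for?",
    [("Case settled for $180,000 after surgery.", "case_114.pdf", 12)],
)
# The assembled prompt is then sent to the language model; the bracketed
# numbers in the model's answer map back to document and page locations.
```

Because every passage in the prompt carries its source document and page, the citations in the answer are traceable rather than generated from memory.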
What RAG Does Well
The architecture is well suited to specific tasks:
- Question answering across a defined corpus: "What did Dr. Smith testify to in depositions in our prior cases involving lumbar injuries?"
- Summarization with citations: "Summarize the key terms in our standard commercial lease form, with citations to the relevant clauses."
- Cross-document analysis: "Across all our family law cases involving high-net-worth divorces, what were the most common discovery dispute issues?"
- Compliance retrieval: "What does our document retention policy say about emails older than seven years?"
In each case, the system retrieves from the firm's own documents and answers from those documents. Citations make verification fast.
What RAG Does Not Do
The architecture has clear limits. Attorneys evaluating RAG-based products should understand them.
- It does not search outside the indexed corpus. A RAG system indexed on the firm's case files does not know about Florida statutes, current case law, or anything else not in the firm's documents. For external research, attorneys still use Westlaw, Lexis, or other research tools.
- It does not draft pleadings or contracts. The architecture is optimized for retrieval and synthesis, not for drafting first drafts of legal documents. Generative drafting is a different workflow with different competence implications under Florida Bar Rule 4-1.1.
- It does not provide legal advice. The output is a retrieval and summary tool. The attorney remains responsible for the legal analysis and any work product derived from the retrieval.
- It is not perfect. The retrieval can miss relevant passages if they are phrased very differently from the question. The generation can make small errors in summarization. Source citations exist precisely so the attorney can catch these errors.
The verification duty under Florida Bar Opinion 24-1 (see our companion article) is the operational answer to this imperfection: the citations make verification fast, and verification catches errors before they become problems.
RAG and Florida Bar Rule 4-1.6: The Architecture Question
The architecture of a RAG system has direct implications for Florida Bar Rule 4-1.6 and ABA Model Rule 1.6, which prohibit disclosure of confidential information without informed client consent.
A cloud-hosted RAG system processes the firm's documents on the vendor's servers. The documents are transmitted to the vendor for indexing. Even with strong vendor data handling commitments, the transmission is itself a disclosure under Rule 4-1.6.
An on-premise RAG system processes the firm's documents on the firm's own hardware. The documents do not leave the firm's network. No third-party disclosure occurs. The Rule 4-1.6 consent requirement is not triggered for that processing.
This architectural distinction is not new to RAG; it applies to any AI system that processes firm documents. But the distinction is especially consequential for RAG, because RAG works best when it has access to the firm's full corpus, so the architectural decision touches nearly everything the firm has.
A Worked Example: PI Case File Search
To make the abstraction concrete, consider an actual workflow at a Florida personal injury firm.
The firm has 600 active and closed PI cases. The total document corpus is approximately 2 million pages. An attorney handling a new shoulder injury case wants to find the firm's prior cases that involved similar diagnoses and what those cases settled for.
Without RAG, the attorney searches the document management system for keywords like "shoulder," "rotator cuff," and specific ICD-10 codes. The keyword search returns hundreds of documents. The attorney spends two hours scanning them to identify the most analogous prior cases.
With RAG, the attorney asks: "Find prior cases involving rotator cuff tears with surgical treatment, and tell me the settlement amounts." The system retrieves passages from the prior case files that match the diagnostic and procedural pattern, synthesizes them into an answer with case-by-case citations, and returns settlement amounts where they appear in the records.
The attorney spends 10 minutes reviewing the citations to verify that the retrieved cases are genuinely analogous, then uses the verified information to inform the demand letter on the new case. The two-hour search becomes a 10-minute verification.
What to Ask a RAG Vendor
A managing partner evaluating a RAG-based legal AI product should be able to obtain clear answers to the following before signing a contract:
- Where does the indexing happen, and where are the indexed documents stored?
- Where does the retrieval happen, and where is the retrieval engine hosted?
- Where does the generation happen, and which AI model is used?
- Does the system return source citations on every answer?
- What is the indexing time for a corpus of our size, and what is the retrieval latency?
- How does the system handle scanned documents and OCR quality issues?
- What is the contractual posture on data retention and training on firm inputs?
A vendor that cannot answer these questions has a product transparency problem. A vendor that answers them in a way that exposes firm data to a third party has a Rule 4-1.6 issue the firm needs to address through client consent or an architectural change.
What Mi Assist Legal Does
Mi Assist Legal is an on-premise RAG system installed on a Mac Mini or compatible server inside the firm's office. The system indexes the firm's case files, contracts, and correspondence locally. Retrieval and generation both happen on the firm's hardware. Every answer includes source citations to the specific document and page location.
Because the system is on-premise, the firm's documents do not leave the firm's network in the course of routine processing. This addresses the Rule 4-1.6 disclosure question and simplifies vendor due diligence. Our pages on how the system works and on the security architecture describe the deployment in detail.
Frequently Asked Questions
Q: Is RAG the same as ChatGPT?
No. ChatGPT is a general-purpose chatbot that answers from its training data. RAG is an architecture that retrieves from a specific document corpus and generates answers from the retrieved content. RAG can use the same kind of language model that powers ChatGPT, but the retrieval step is what distinguishes the architecture.
Q: Does RAG hallucinate the way ChatGPT does?
RAG significantly reduces but does not eliminate hallucination. Because the system answers from retrieved passages with citations, the attorney can verify the answer against the cited source. Errors that do occur are typically retrieval errors (the system retrieved the wrong passages) rather than fabricated content.
Q: Can a RAG system search Florida case law and statutes?
Only if the system has been indexed on a corpus that includes those sources. A RAG system installed on the firm's case files only does not know about Florida case law or statutes outside what appears in the firm's documents. For external research, attorneys continue to use traditional research tools.
Q: How much does it cost to set up a RAG system for a small firm?
Cost varies widely depending on architecture (cloud vs. on-premise), corpus size, and support level. Cloud-based RAG products typically use per-attorney monthly licensing. On-premise systems typically use a one-time installation cost plus optional support. Our pricing page describes the on-premise structure.
Q: How long does it take to deploy a RAG system at a firm with 5 to 30 attorneys?
For an on-premise system, initial deployment is typically a few days plus indexing time, which depends on corpus size. A firm with 1 to 2 million pages of documents typically completes initial indexing within a week of installation. The system is usable on partial indexes during the indexing process.
---
This article is intended as a plain-language introduction to RAG architecture for legal professionals evaluating AI tools. It does not constitute legal advice or technical engineering guidance. Specific implementation decisions should involve consultation with the firm's IT advisors and review of vendor documentation.
Mi Assist Legal
Private AI document search for Florida law firms.
Mi Assist Legal installs on a Mac Mini or server inside your firm. No cloud. No third-party access. Designed for Florida Bar Rule 4-1.6 and ABA Model Rule 1.6 compliance by architecture.
Book a Consultation