AI SERVICES · SEMANTIC SEARCH & RETRIEVAL
Search that understands the question, not just the keywords.
RankSaga builds end-to-end retrieval systems for the workloads that matter: customer support knowledge bases, internal document search, regulatory and policy retrieval, and the retrieval layer underneath enterprise RAG. Chunking strategy, dense and sparse retrieval, re-ranking, query understanding, and the eval framework that keeps it honest.
Semantic search is not a model. It is a pipeline. The model is one component, alongside chunking, indexing, query understanding, hybrid retrieval, re-ranking, and the relevance metric that tells you when one of those components quietly broke.
WHY THIS MATTERS
Retrieval is the thing your users actually feel.
When a search experience is bad, users do not blame the embedding model. They blame the product. They give up, they ask a colleague, they file the ticket the search bar was supposed to prevent. The cost of bad retrieval is invisible to the engineering team and obvious to everyone else.
RankSaga treats search as a system. We design the chunking strategy against the actual document distribution. We pick dense and sparse retrieval components against the query distribution we observe in real logs. We add re-ranking where precision matters and skip it where the gain does not justify the added latency. We instrument the relevance metrics that tell us when something has regressed before a user notices.
The work composes with our other AI services capabilities. The vector index sits on Vector Database Management. The embeddings come from Embedding Model Optimisation. The system feeds into Retrieval-Augmented Generation when generation is the next step. Every layer is measured against the application-level outcome it is supposed to produce.
WHAT WE SHIP
Six concrete pieces of work.
01 / Capability
Chunking Strategy
Document segmentation tuned to the corpus shape. Fixed-size, semantic, recursive, layout-aware. Choice driven by measured retrieval quality on your real queries, not on a default.
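As a rough illustration of two of the strategies named above, the sketch below shows a fixed-size chunker with overlap and a paragraph-packing chunker. The function names, the word-based sizing, and the default sizes are assumptions made for this example, not a description of a specific RankSaga implementation; real pipelines usually count model tokens rather than whitespace-separated words.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size chunks of `size` words; each shares `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def chunk_by_paragraph(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs (split on blank lines) into chunks of at most `max_words` words.

    A single paragraph longer than `max_words` is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Either function yields candidate chunks to index. Which one retrieves better is an empirical question for the corpus and query set in hand.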
02 / Capability
Hybrid Retrieval
Dense vector retrieval combined with sparse retrieval (BM25, SPLADE) and metadata filtering. Reciprocal rank fusion, learned weighting, or query-classifier routing where appropriate.
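Reciprocal rank fusion is simple enough to show in full. The sketch below merges any number of ranked ID lists by summing 1 / (k + rank) per document. The function name and the document IDs are illustrative; k = 60 is a commonly used default, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["d1", "d2", "d3"]   # hypothetical top results from a vector index
sparse = ["d3", "d1", "d4"]  # hypothetical top results from BM25
print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd2', 'd4']
```

Because it uses only ranks, the fusion needs no score normalisation between the dense and sparse branches, which is one reason it is a common first choice before learned weighting.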
03 / Capability
Re-Ranking
Cross-encoder re-rankers (BGE, Cohere Rerank) and late-interaction models (ColBERT) applied where precision matters more than latency. The trade-off is measured against the application-level metric, not assumed.
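The two-stage shape can be sketched without committing to a particular model. In the example below, `rerank` re-orders first-stage candidates with whatever scoring function it is given. `token_overlap_score` is a deliberately naive stand-in written for this illustration; in a real system that slot would be filled by a re-ranking model's query-document score.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[tuple[str, str]],    # (doc_id, text) pairs from first-stage retrieval
    score_fn: Callable[[str, str], float],    # higher score means more relevant
    top_n: int = 10,
) -> list[str]:
    """Re-score every candidate against the query and return the best `top_n` IDs."""
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    # Stable sort: candidates with equal scores keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


def token_overlap_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query tokens that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


candidates = [
    ("a", "reset your router"),
    ("b", "how to reset a forgotten password"),
    ("c", "billing and invoices"),
]
print(rerank("reset password", candidates, token_overlap_score, top_n=2))  # ['b', 'a']
```

The scoring call runs once per candidate, so cost grows with the size of the candidate set. That is the latency side of the trade-off.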
04 / Capability
Query Understanding
Intent classification, query rewriting, expansion, and decomposition. The pre-retrieval layer that turns ambiguous user queries into something the index can answer.
05 / Capability
Relevance Evaluation
Labelled eval sets drawn from real query logs. Recall@k, MRR, nDCG, plus the downstream metric the search system is supposed to move (task completion, ticket deflection, time-to-answer).
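For reference, the three ranking metrics named above can be computed as in the sketch below. The function names and the example labels are assumptions for illustration. The nDCG variant shown uses linear gain with a log2 rank discount, which is one common convention among several.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR: the reciprocal rank averaged over (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    """nDCG@k with graded relevance labels in `gains` (unlabelled documents count as 0)."""
    dcg = sum(gains.get(doc_id, 0.0) / math.log2(i + 1)
              for i, doc_id in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]  # hypothetical system output for one query
relevant = {"d1", "d2"}            # hypothetical labels for that query
print(recall_at_k(ranked, relevant, k=3))   # 0.5
print(reciprocal_rank(ranked, relevant))    # 0.5
```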
06 / Capability
Production Search Service
The API, the caching layer, the observability, and the operational posture. Latency under load, query cost, drift detection, and the runbooks the operating team uses.
HOW WE OPERATE
Measure, design, deploy, monitor.
01 / Step
Measure What You Have
We sample real queries from production logs, label them against expected results, and measure the current system on recall@k, MRR, and the downstream task metric. Baseline before any changes.
02 / Step
Design the Pipeline
Chunking, embedding, indexing, retrieval, re-ranking, query understanding, all selected against the measured baseline. Every component change is an experiment with a measured outcome.
03 / Step
Deploy and Monitor
Production service inside your environment. Drift detection, alerting on relevance regression, and the next round of tuning as the corpus and query mix evolve.
WHAT YOU GET
A search system you operate.
01 / Deliverable
A working production search system
Inside your environment. API contract, latency targets, and operational documentation.
02 / Deliverable
An evaluation harness you own
Labelled eval sets from your real query logs. Regression suite that runs on every change. Dashboards your team operates.
03 / Deliverable
Re-indexing and re-tuning playbook
The runbooks for what to do when the corpus changes shape, when a new document type lands, when relevance drifts.
04 / Deliverable
Embedded knowledge transfer
Your engineers work alongside ours throughout. By the end of the engagement the system is operable by your team without us.
PROOF
BEIR-grade retrieval methodology, applied to enterprise corpora.
RankSaga's published BEIR work demonstrates the methodology in the open: measured recall lift through embedding optimisation, hybrid retrieval, and re-ranking. The same engineers apply the same methodology against customer corpora inside customer environments under NDA.
RANKSAGA · BEIR BENCHMARK · 2026
- Published BEIR results across multiple datasets.
- Open-source RankSaga-Optimised-E5-v2 model on HuggingFace.
- Retrieval engineering deployed inside customer VPC, on-premise, and air-gapped environments.
- Same team that operates AI in live deployment for the Australian Armed Forces.
RELATED CAPABILITIES
Where retrieval connects.
Adjacent
Vector Database Management →
The index underneath retrieval. Architecture, hybrid retrieval, ops.
Adjacent
Embedding Model Optimisation →
The embeddings retrieval depends on. Domain-tuned for measurable lift.
Adjacent
Retrieval-Augmented Generation →
The application layer when retrieval feeds an LLM. Grounded generation, attribution.
QUESTIONS
What customers ask before we start.
Do we need re-ranking, or is dense retrieval enough?
It depends on the precision the application requires and the latency budget. For RAG over technical documentation, re-ranking with a cross-encoder typically lifts precision@10 by 10-30 percent at the cost of 50-200 ms of added latency. We measure the trade-off rather than assume it.
How do we handle queries that mix natural language and structured filters?
Query understanding. We classify the query, extract structured constraints (dates, products, document types), and route the natural-language portion to semantic retrieval with the constraints applied as metadata filters. This is one of the highest-leverage changes we make on existing search systems.
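A minimal sketch of the constraint-extraction step, assuming a rule-based parser: it pulls a year and a document type out of the raw query, keeps the rest as the text for semantic retrieval, and checks candidate metadata against the extracted filters. The vocabulary, field names, and regular expression are hypothetical. A production system might use a trained classifier or an LLM for this step instead.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabulary mapping surface forms to a `doc_type` metadata value.
DOC_TYPE_TERMS = {
    "policy": "policy", "policies": "policy",
    "invoice": "invoice", "invoices": "invoice",
    "contract": "contract", "contracts": "contract",
}
# A four-digit year, optionally introduced by "from" or "in".
YEAR = re.compile(r"\b(?:(?:from|in)\s+)?((?:19|20)\d{2})\b", re.IGNORECASE)


@dataclass
class ParsedQuery:
    text: str                                   # portion sent to semantic retrieval
    filters: dict = field(default_factory=dict)  # structured constraints


def parse_query(raw: str) -> ParsedQuery:
    filters = {}
    remaining = raw
    match = YEAR.search(raw)
    if match:
        filters["year"] = int(match.group(1))
        # Drop the year phrase from the text; the filter now carries it.
        remaining = raw[:match.start()] + " " + raw[match.end():]
    for token in remaining.lower().split():
        doc_type = DOC_TYPE_TERMS.get(token.strip(".,?!"))
        if doc_type:
            filters["doc_type"] = doc_type  # the word itself stays in the text
            break
    return ParsedQuery(text=" ".join(remaining.split()), filters=filters)


def matches(metadata: dict, filters: dict) -> bool:
    """True if a document's metadata satisfies every extracted filter."""
    return all(metadata.get(key) == value for key, value in filters.items())


parsed = parse_query("parental leave policies from 2023")
print(parsed.text)     # parental leave policies
print(parsed.filters)  # {'year': 2023, 'doc_type': 'policy'}
```

Most vector stores can apply filters like these inside the index at query time. The `matches` helper only shows the semantics.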
What chunking strategy works best?
There is no universal answer. Fixed-size chunks work for short, homogeneous documents. Layout-aware chunking is essential for technical PDFs and forms. Semantic chunking pays off when the corpus has natural section breaks. We measure each strategy against a representative query set on your corpus before committing.
Can you work over our existing search stack (Elasticsearch, OpenSearch)?
Yes. Many engagements augment an existing keyword search system rather than replacing it. We add a vector retrieval branch, fuse the results, and demonstrate the lift on the production query mix before any cutover.
How is this different from buying a search SaaS?
A search SaaS sells you defaults. We build the system around your corpus, your query distribution, and your latency and residency constraints, and we leave behind the eval harness and the operational documentation. SaaS is the right answer when defaults are good enough; an engagement is the right answer when they are not.
ENGAGE
If users are bouncing off your search, we want to look at it.
Search quality is one of the most measurable surfaces in an enterprise stack. The first deliverable is usually a number that nobody on the team has seen before. Engagements typically start there.