ERROR: Multiple top-level packages discovered in a flat-layout: ['api', 'alembic', 'frontend']
Taking a snapshot of the code... [5 minutes, 176MB]
"Hobby deploys are paused. Usage limit exceeded."
Q: "What is Valkyrie Dispatch?" A: "I don't have specific information about that."
FATAL: Production_Is_Not_Localhost_Exception
I watched the Railway deploy spinner complete for the sixth time. Successful build. Healthy container. Green checkmarks across the board. I opened the chat UI, typed “What is Valkyrie Dispatch?” and received, essentially, a shrug emoji in paragraph form.
The app was live. The app was useless. And I had just spent four hours getting it to this particular flavor of useless.
Here is the thing about deploying a RAG pipeline to production: there is no single wall to hit. There is a sequence of walls, each one hidden behind the last, each one a different material, and each one requiring a completely different tool to break through. You do not know the next wall exists until you have demolished the current one. It is Dante’s Inferno for engineers. Each circle is a new deploy. Each deploy is a new sin.
Circle One: The Dockerfile. I had carefully structured my Dockerfile to install dependencies before copying source code, which is the correct Docker optimization for layer caching. The problem is that pip install . needs the source code to exist in order to discover what it is installing. Setuptools found three top-level directories and panicked. The fix was adding two lines to pyproject.toml telling setuptools to only discover the api package. Time wasted: twenty minutes of staring at a build log wondering why a file that worked locally was failing in a container.
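For reference, the two lines are the standard setuptools discovery override. The post doesn't show the exact file, so this is the usual shape, assuming the package directory really is `api/`:

```toml
[tool.setuptools.packages.find]
include = ["api*"]
```

With that in place, setuptools stops trying to guess, and `alembic/` and `frontend/` are no longer treated as installable packages.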
Circle Two: The Snapshot. Railway uploads your entire working directory before building. My working directory contained a 176-megabyte SQLite database full of 118,000 indexed document chunks. There was no .railwayignore file. The upload progress bar moved like a glacier calving into the sea. The fix was nine lines in a new file. Time wasted: fifteen minutes waiting for a progress bar that I could have eliminated in thirty seconds.
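I won't reproduce the nine lines from memory, but `.railwayignore` uses gitignore-style syntax, and a version for this layout would look something like this (entries are illustrative, not the post's actual file):

```
# hypothetical .railwayignore — keep the 176MB index out of the snapshot
*.db
*.sqlite3
__pycache__/
.venv/
node_modules/
```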
Circle Three: The Outage. Railway’s status page cheerfully informed me that “Builds and deployments are slow to progress.” This was not my fault, but it was my problem. I stared at a deployment that said “DEPLOYING” for forty-five minutes before checking the status page. Time wasted: forty-five minutes of refreshing a dashboard like a lab rat pressing a pellet button.
Circle Four: The Budget. Railway’s Hobby plan has a default hard usage limit of five dollars per month. My actual usage was two cents. But Railway estimates your monthly cost based on current resource allocation, and the estimate exceeded five dollars, so it paused all deployments. The fix was clicking a dropdown and changing five to ten. Time wasted: ten minutes of reading error messages that said “paused” without saying why.
Circle Five: The Deadlock. I had three stuck deployments from my earlier attempts. Each one was trying to run alembic upgrade head on the same Postgres database simultaneously. They were deadlocked. The fix was cancelling all three and deploying fresh. Time wasted: twenty minutes of wondering why a healthy build was not starting.
Circle Six: The Timeout. My application checked for the pgvector extension on startup with a database query that had no connection timeout. In the container, this query hung indefinitely during cold starts. The fix was connect_args={"connect_timeout": 5}. Time wasted: fifteen minutes of watching a container pass its health check deadline and restart in an infinite loop.
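The one-liner passes psycopg2's `connect_timeout` parameter through SQLAlchemy's `create_engine(..., connect_args=...)`. The underlying principle, that no startup check should be allowed to hang forever, can be sketched with nothing but the standard library. The helper name and five-second default below are mine, not the post's:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as CheckTimeout

def run_with_timeout(check, seconds=5.0):
    """Run a startup check in a worker thread; give up after `seconds`
    instead of hanging the container past its health-check deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(check).result(timeout=seconds)
    except CheckTimeout:
        return None  # treat a hung check as "not confirmed", not "wait forever"
    finally:
        pool.shutdown(wait=False)
```

A startup probe like the pgvector check would then be wrapped as `run_with_timeout(lambda: conn.execute(...))`, so a cold-start stall degrades to a failed check instead of an infinite restart loop.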
And then, finally, the deploy succeeded. The API returned 200. The health check was green. I had fought through six circles of deploy hell, and the application was live in production.
I typed my first question and got garbage.
This is where the story actually starts. The deploy failures were speed bumps. The search quality was the crater.
The chat was powered by a RAG pipeline: user asks a question, the system searches indexed documents, injects relevant context into a prompt, and sends it to Gemini for synthesis. Locally, this worked beautifully. In production, every answer was either vague platitudes or “I don’t have information about that.”
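The whole pipeline reduces to a short loop. This is a sketch under my own names; `search` and `llm` stand in for the real Postgres retrieval and Gemini client:

```python
def answer(question: str, search, llm, k: int = 8) -> str:
    """Minimal RAG loop: retrieve chunks, stuff them into a prompt, synthesize.

    `search(question, limit)` and `llm(prompt)` are placeholders for the
    real retrieval and model calls; nothing here is the post's actual code.
    """
    chunks = search(question, limit=k)                       # 1. retrieval
    context = "\n---\n".join(c["content"] for c in chunks)   # 2. context assembly
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)                                       # 3. synthesis
```

The important property, and the source of everything that follows: the model only ever sees what step 1 hands it. If retrieval returns the wrong documents, steps 2 and 3 will fluently summarize the wrong documents.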
The wrong theory was that something was broken in the deployment. The API was returning results. Gemini was generating responses. The pipeline was functioning exactly as designed. The problem was that the design was wrong for production data.
Three problems, stacked on top of each other like geological strata.
First, the keyword search was using ILIKE '%query%' against document content. When a user asked “tell me about Valkyrie Dispatch,” the word “about” matched every single document in the database. Every session summary, every README, every changelog. The search returned whichever documents happened to mention “about” first.
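You can see the failure in miniature without a database. A substring match treats "about" as signal; any tokenizer with a stop-word list does not. The toy stop-word set here is illustrative, not Postgres's real dictionary:

```python
STOP_WORDS = {"tell", "me", "about", "the", "a", "is", "what"}  # toy list

def ilike_style(query: str, doc: str) -> bool:
    """Mimics WHERE content ILIKE '%word%' for each query word."""
    return any(word in doc.lower() for word in query.lower().split())

def fts_style(query: str, doc: str) -> bool:
    """Mimics plainto_tsquery: drop stop words, require remaining terms."""
    terms = [w for w in query.lower().split() if w not in STOP_WORDS]
    tokens = set(doc.lower().split())
    return bool(terms) and all(t in tokens for t in terms)
```

The substring version has a second pathology worth noticing: "me" matches inside "summaries," so even the short words that aren't in every document still match almost everything.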
Second, the multi-project scope was broken. The partner account had access to three projects, but the code was only searching the first one in the list. A query about Valkyrie was being searched exclusively in the Garret project.
Third, 110,000 of the 118,000 indexed chunks were Garret session summaries. These summaries mentioned every project by name because they documented cross-project work. A search for “Valkyrie” returned eight Garret meta-documents talking about Valkyrie instead of the actual Valkyrie documentation.
The model was doing its best with context that was entirely about someone else talking about the thing, rather than the thing itself.
Rook Mode
- Replace `ILIKE` with Postgres full-text search (`tsvector`/`tsquery` with `plainto_tsquery`). Stop words handled natively. Ranking by `ts_rank`.
- Fix project scope: pass the full `list[str]` of allowed projects, not `allowed[0]`.
- Add Gemini `text-embedding-004` for query embeddings. Local Ollama is not available on Railway. Fall back to keyword search if embedding fails.
- Add project-name detection: scan the query for keywords like “valkyrie,” “dispatch,” “dipradar.” When detected, fetch six documents from the target project and four from the broader scope. Deduplicate by source path.
- Redeploy. Test with partner credentials.
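The project-name detection step reduces to a small routing function. A sketch under assumed names and data shapes; the 6/4 split and the keyword list come from the fix above, everything else is mine:

```python
# Hypothetical keyword → project mapping; the real one lives in the app's config.
PROJECT_KEYWORDS = {
    "valkyrie": "valkyrie-dispatch",
    "dispatch": "valkyrie-dispatch",
    "dipradar": "dipradar",
}

def blended_search(query, search, allowed_projects):
    """Fetch 6 chunks from a detected target project plus 4 from the broad
    scope, deduplicated by source path. `search(query, projects, limit)`
    stands in for the real retrieval call."""
    target = next(
        (proj for word, proj in PROJECT_KEYWORDS.items()
         if word in query.lower() and proj in allowed_projects),
        None,
    )
    results = []
    if target:
        results += search(query, [target], limit=6)       # project-specific docs
    results += search(query, allowed_projects, limit=4)   # broad-scope docs
    seen, deduped = set(), []
    for chunk in results:                                 # dedupe by source path
        if chunk["source_path"] not in seen:
            seen.add(chunk["source_path"])
            deduped.append(chunk)
    return deduped
```

Because the target-project results come first, a document that appears in both passes keeps its project-specific slot, which is exactly the bias the fix was after: Valkyrie documentation outranks Garret documents that merely mention Valkyrie.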
After the seventh deploy, I typed “What is Valkyrie Dispatch?” and got a three-paragraph answer citing actual Valkyrie documentation. Service areas, fleet operations, compliance requirements, revenue model. Sourced and specific.
The lesson is not that production deployments are hard. Everyone knows that. The lesson is that the deploy is not the product. I spent four hours fighting six infrastructure failures and felt like I was making progress the entire time. Green health checks. Successful builds. Each fix felt like a victory. But the actual product — a chat interface that answers questions about your business — was broken the entire time, and I would not have known it if I had not typed a real question with a real user’s credentials.
The infrastructure failures were loud. They threw errors, logged stack traces, turned dashboards red. The search quality failure was silent. The Fighter hears when a wall breaks. Nobody hears when the map is wrong. The API returned 200. The model generated fluent, confident English. The answer just happened to be synthesized from the wrong documents. No error code for “technically correct but completely useless.”
Build the feedback loop before you build the feature. I added thumbs-up and thumbs-down buttons to the chat UI on deploy seven. I should have added them on deploy one. The deploy log will tell you when the server is healthy. Only the user will tell you when the answer is wrong.