TL;DR
useChat hook, and a polling pattern for long-running jobs like video generation.
Check out the full tutorial here:
The whole tutorial rests on one idea. Every AI SaaS app has two halves, frontend and backend, and AI is now genuinely good at one of them. UI code, components, even mobile screens, you can generate in minutes and throw away if you don't like the result. The cost of getting it wrong is low.
The backend is different. A bug in your auth flow logs people out. A bug in your data model leaks data into the wrong tenant. A bug in your billing logic charges people twice. None of these are styling fixes you patch later.
So the split the video proposes is: generate the surface, control the foundation. AI is still useful on the backend, but for extending what you already have (writing a custom controller, a service, a lifecycle hook), not for owning the whole thing end to end. This is exactly where a headless CMS earns its place in the stack — it gives you the backend structure without making you write it.
Strapi shows up in this build because it solves the "I need a real backend, not a toy" problem without forcing you to write one from scratch. A headless CMS is the right shape for an AI SaaS because the data layer needs to serve multiple consumers — your web app today, an API client or AI agent tomorrow — and the CMS shouldn't dictate what any of them look like.
A few things stand out:
The takeaway: a good headless CMS gives you the parts of a backend that are tedious to build and easy to get wrong (data modeling, auth, permissions, an admin UI) in a configuration you fully control. For an AI SaaS where the surface area changes weekly, that stable foundation is the thing that lets the rest of the stack move fast.
One of the biggest time savings in the tutorial is that auth is done on day one. You do not write the registration flow, the password hashing, or the JWT logic. Strapi's built-in Users and Permissions plugin handles all of it: registration, login, token issuance, and session management.
The Next.js side wraps this in a small API layer. The video walks through a handful of helpers, all backed by a shared strappyFetch function:
registerWithStrappy calls the /api/register endpoint to create a new user.
loginWithStrappy authenticates an existing user against /api/login.
fetchCurrentUser retrieves the logged-in user's data using the JWT.
Each of these is mirrored by a Next.js API route (/api/register, /api/login, /api/logout) that handles the server-side flow.
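To make the shape of that API layer concrete, here is a minimal sketch of a shared fetch helper with the three wrappers on top. Only the helper names and the /api/register and /api/login paths come from the tutorial. The config object, the injected fetchImpl, the request body fields, and the /api/users/me path (the route Strapi's Users and Permissions plugin exposes for the current user) are assumptions made for illustration.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
type FetchInit = { method: string; headers: Record<string, string>; body?: string };
type FetchLike = (
  url: string,
  init: FetchInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface StrappyConfig {
  baseUrl: string;      // assumption: the origin that serves the auth endpoints
  fetchImpl: FetchLike; // pass the global `fetch` in real code
}

interface RequestOptions {
  method?: "GET" | "POST";
  body?: unknown;
  jwt?: string;
}

// Shared helper: builds the URL, sets JSON headers, attaches the JWT when
// present, and turns non-2xx responses into thrown errors.
async function strappyFetch(
  config: StrappyConfig,
  path: string,
  options: RequestOptions = {},
): Promise<unknown> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (options.jwt) headers["Authorization"] = `Bearer ${options.jwt}`;

  const init: FetchInit = { method: options.method ?? "GET", headers };
  if (options.body !== undefined) init.body = JSON.stringify(options.body);

  const response = await config.fetchImpl(`${config.baseUrl}${path}`, init);
  if (!response.ok) {
    throw new Error(`Request to ${path} failed with status ${response.status}`);
  }
  return response.json();
}

// Body field names below are assumptions, not taken from the tutorial.
const registerWithStrappy = (
  config: StrappyConfig,
  input: { username: string; email: string; password: string },
) => strappyFetch(config, "/api/register", { method: "POST", body: input });

const loginWithStrappy = (
  config: StrappyConfig,
  input: { identifier: string; password: string },
) => strappyFetch(config, "/api/login", { method: "POST", body: input });

// Assumed path: Strapi's Users and Permissions plugin serves the current user here.
const fetchCurrentUser = (config: StrappyConfig, jwt: string) =>
  strappyFetch(config, "/api/users/me", { jwt });
```

Keeping headers, JSON encoding, and error handling in one function means each wrapper is a one-liner, and a change such as a new base URL lands in a single place.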
One detail worth copying: on successful login or registration, the API route stores the JWT in an HTTP-only cookie, not in localStorage. An HTTP-only cookie cannot be read by JavaScript running on the page, so a third-party script (an analytics tag, an injected ad, a malicious dependency) cannot steal the token. localStorage gives you no such protection.
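As a rough illustration of what "HTTP-only cookie" means on the wire, the sketch below builds the Set-Cookie header value by hand. The cookie name, the one-week lifetime, and the SameSite setting are assumptions, not values from the tutorial.

```typescript
interface AuthCookieOptions {
  name?: string;          // hypothetical cookie name
  maxAgeSeconds?: number; // assumption: one week by default
  secure?: boolean;       // disable only for local HTTP development
}

// Builds a Set-Cookie header value that keeps the JWT out of reach of page scripts.
function buildAuthCookie(jwt: string, options: AuthCookieOptions = {}): string {
  const { name = "jwt", maxAgeSeconds = 60 * 60 * 24 * 7, secure = true } = options;
  const parts = [
    `${name}=${encodeURIComponent(jwt)}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "HttpOnly",     // not readable through document.cookie
    "SameSite=Lax", // withheld from most cross-site subrequests
  ];
  if (secure) parts.push("Secure"); // only sent over HTTPS
  return parts.join("; ");
}

// Logout: same name and path, empty value, Max-Age=0 so the browser drops it.
function buildLogoutCookie(name = "jwt"): string {
  return `${name}=; Path=/; Max-Age=0; HttpOnly; SameSite=Lax`;
}
```

In a Next.js route handler you would more likely set the same attributes through the framework's cookie helpers than assemble the string yourself. The attributes are the part that matters.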
The app also uses a requireAuth() helper in the layout to enforce protected routes. Two things happen automatically:
Unauthenticated users who visit /dashboard are redirected to /login.
Authenticated users who visit /login or /register are redirected to /dashboard.
Small detail, often forgotten, and very annoying when missing.
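The two redirect rules reduce to a small pure function. This is a sketch of the decision only, not the tutorial's requireAuth() implementation. The function name, the constant names, and the choice to treat everything under /dashboard as protected are assumptions.

```typescript
const PROTECTED_PREFIXES = ["/dashboard"]; // assumption: all nested dashboard routes are protected
const AUTH_PAGES = ["/login", "/register"];

// Returns the path to redirect to, or null when the request may proceed.
function resolveRedirect(pathname: string, isAuthenticated: boolean): string | null {
  const isProtected = PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  const isAuthPage = AUTH_PAGES.includes(pathname);

  if (isProtected && !isAuthenticated) return "/login";
  if (isAuthPage && isAuthenticated) return "/dashboard";
  return null;
}
```

Keeping the rule separate from the framework's redirect call makes both directions easy to test, including the second one that tends to get forgotten.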
With the backend and auth in place, the dashboard ties together chat, image, and video generation. These are the three features most AI SaaS products will need at some point, and the interesting thing is that each of them teaches a slightly different backend pattern.
The chat module is not a send-message-wait-render loop. It uses the AI SDK with Google Gemini and streams the response back to the client in chunks. On the frontend, the @ai-sdk/react package's useChat hook handles the streaming lifecycle. Tokens render as they arrive, the way ChatGPT does it. The user doesn't sit on a loading spinner.
The persistence flow on the backend is worth pulling out, and it's where the headless CMS really pays off:
conversationId. If one doesn't exist, it creates a new one.
Step 5 is the one to copy. If you save partial chunks as they come in, a stream error halfway through leaves you with half a reply in the database. Wait for the complete response, then write once.
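The write-once rule can be sketched without any SDK. The function below forwards chunks to the client as they arrive and only touches the database after the stream has ended cleanly. The ChatStore interface and the function name are hypothetical, and the stream is modeled as a plain async iterable of strings rather than the AI SDK's own stream type.

```typescript
interface ChatStore {
  // Hypothetical persistence call; in this stack it would write to Strapi.
  saveAssistantMessage(conversationId: string, text: string): Promise<void>;
}

async function streamAndPersist(
  conversationId: string,
  chunks: AsyncIterable<string>,
  onChunk: (chunk: string) => void,
  store: ChatStore,
): Promise<string> {
  let fullText = "";
  for await (const chunk of chunks) {
    fullText += chunk; // accumulate in memory only
    onChunk(chunk);    // forward to the client immediately
  }
  // Reached only if the stream ended without throwing, so a mid-stream
  // error never leaves a partial reply in the database.
  await store.saveAssistantMessage(conversationId, fullText);
  return fullText;
}
```

If the iterable throws partway through, the error propagates to the caller and saveAssistantMessage is never called, which is the behavior the step describes.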
The image module is structurally simpler than chat, but it adds a piece chat doesn't have: a gallery of past generations.
A new Image collection in Strapi stores the prompt, the generated image URL, and the user it belongs to. The flow:
aspectRatio (1:1, 16:9, and so on).
generateImage helper.
On the frontend, a handleGenerate function manages the loading state and displays the new image when it arrives. A dedicated /api/images endpoint returns the user's past generations, and the UI renders them as a gallery alongside the prompt input.
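For the gallery, the server side needs to ask Strapi for one user's images, newest first. Below is a sketch of how that query string might be built with Strapi's REST filter, sort, and pagination parameters. It assumes the Image collection is exposed at /api/images and that its relation field is named user. The record interface and the function name are illustrative.

```typescript
// Fields the tutorial says the Image collection stores; the property names are assumptions.
interface ImageRecord {
  prompt: string;
  url: string;
  userId: number;
}

// Builds a Strapi REST query for one user's images, newest first.
function buildGalleryQuery(userId: number, pageSize = 20): string {
  const params = [
    `filters[user][id][$eq]=${encodeURIComponent(String(userId))}`,
    "sort=createdAt:desc",
    `pagination[pageSize]=${pageSize}`,
  ];
  return `/api/images?${params.join("&")}`;
}
```

Filtering by the user relation on the server, rather than fetching everything and filtering in the UI, is what keeps one user's generations out of another user's gallery.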
The gallery is the design choice that matters. Without it, the feature feels like a one-shot tool. With it, the feature feels like a workspace — which is the difference between a demo and an AI SaaS people actually pay for.
Video changes the shape of the backend call. Generation takes long enough that you cannot just await the API and return a response. The HTTP request would time out, or the user would stare at a spinner for thirty seconds with no feedback.
The fix is polling. The backend kicks off the generation job, gets back a job ID, then checks the API status every ten seconds until the video is ready or a timeout hits. Only then does it return the final video URL.
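A minimal sketch of that polling loop follows. The ten-second interval comes from the tutorial. The five-minute timeout, the JobStatus shape, and the checkStatus callback are assumptions standing in for whatever the video API actually returns.

```typescript
type JobStatus =
  | { state: "pending" }
  | { state: "done"; videoUrl: string }
  | { state: "failed"; reason: string };

interface PollOptions {
  intervalMs?: number; // the tutorial checks every ten seconds
  timeoutMs?: number;  // assumption: give up after five minutes
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not really wait
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job finishes, fails, or the time budget runs out.
async function pollForVideo(
  jobId: string,
  checkStatus: (jobId: string) => Promise<JobStatus>,
  options: PollOptions = {},
): Promise<string> {
  const { intervalMs = 10_000, timeoutMs = 5 * 60_000, sleep = defaultSleep } = options;
  let waitedMs = 0; // counts sleep time only, not time spent in checkStatus

  while (true) {
    const status = await checkStatus(jobId);
    if (status.state === "done") return status.videoUrl;
    if (status.state === "failed") {
      throw new Error(`Video job ${jobId} failed: ${status.reason}`);
    }
    if (waitedMs + intervalMs > timeoutMs) {
      throw new Error(`Video job ${jobId} timed out after ${waitedMs} ms`);
    }
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

The explicit timeout matters as much as the interval: without it, a job that never completes keeps the loop, and the request behind it, alive indefinitely.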
The Strapi side gets a Video collection that mirrors the Image one: prompt, URL, user relation. The frontend adds a duration picker (4, 6, or 8 seconds) so the user can pick how long the generated clip should be. The render layer swaps the <img> tag for a <video> tag. Everything else stays the same as the image flow.
That last point is worth pausing on. The image and video flows are nearly identical because the architecture was set up consistently the first time. When the backend pattern is clean, adding a new AI capability is a small extension, not a rebuild — which is exactly the property you want when shipping AI SaaS features on a tight cadence.
The architecture the tutorial walks through is modular by design. Each layer has a clear job:
That split is what makes an AI SaaS build scale. You can swap Gemini for another model without touching auth. You can add a mobile client without rewriting your data layer. You can give an AI agent scoped API access without exposing the whole system. The headless CMS keeps the data contract stable while everything around it evolves.
The frontend can be generated. The backend has to be designed. This tutorial is a solid template for getting the second half right, so the first half can move as fast as the tools allow.