Meta's new NotebookLlama first creates a transcript from a file (for example, a PDF document from a movie or a news article) and then automatically rewrites it to be better suited to a podcast. After the script has been made more cinematic and emotional, it is fed to an AI text-to-speech generator, which produces the finished podcast.
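The stages described above can be pictured as a simple chain of functions. The sketch below is only an illustration of that flow: every function name is hypothetical, and each stage is a plain-Python stand-in for what would be a model call in practice. It is not NotebookLlama's actual code.

```python
from typing import List, Tuple

# All names here are hypothetical placeholders, not NotebookLlama's real API.
Turn = Tuple[str, str]  # (speaker, line)


def extract_text(raw: str) -> str:
    """Stage 1: clean up text pulled from the source file (whitespace only)."""
    return " ".join(raw.split())


def write_transcript(text: str) -> List[Turn]:
    """Stage 2: stand-in for a model that drafts a two-speaker script.

    Here we simply alternate speakers sentence by sentence.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    speakers = ("Host", "Guest")
    return [(speakers[i % 2], s + ".") for i, s in enumerate(sentences)]


def dramatize(turns: List[Turn]) -> List[Turn]:
    """Stage 3: stand-in for the rewrite pass; only adds an opening line."""
    return [
        (speaker, line if i else "Welcome to the show! " + line)
        for i, (speaker, line) in enumerate(turns)
    ]


def synthesize(turns: List[Turn]) -> List[bytes]:
    """Stage 4: stand-in for text-to-speech; one fake audio chunk per turn."""
    return [f"{speaker}: {line}".encode("utf-8") for speaker, line in turns]


def make_podcast(raw: str) -> List[bytes]:
    """Run the four stages in order and return the per-turn audio chunks."""
    return synthesize(dramatize(write_transcript(extract_text(raw))))
```

Calling `make_podcast("Llamas are social.  They live in herds.")` in this sketch returns two chunks, one per speaker turn, in speaking order.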
The generated voices sound more robotic than those in podcasts produced by NotebookLM. The voices in NotebookLlama podcasts also tend to talk over each other at certain points.
However, the Meta researchers behind NotebookLlama say that these flaws can be addressed and the overall quality improved. “The text-to-speech model is the limitation of how natural this will sound,” the researchers wrote on NotebookLlama’s GitHub page. “[Also,] another approach to writing the podcast would be having two agents debate the topic of interest and write the podcast outline. Right now we use a single model to write the podcast outline.”
It is worth mentioning that this is not the first attempt to replicate NotebookLM. Some attempts have been more successful than others, and it remains to be seen how they will fare.