If a podcast falls on a platform and no one is there to listen, does it make a sound?
No, it doesn't. Neither would a video. In other words, Google would never know what the author is talking about. Unless, of course, the author also posted an accompanying transcript and handled the technical aspects of SEO correctly.
Sure, AI is making progress in the field of speech recognition, but even assuming search engines can "play" your podcast, their AI is nowhere near understanding what it hears. If you've ever owned a Google phone number, you know that while Google will try to convert a voice message from speech to text in order to notify you, its transcription is shaky at best. More often than not, the results are mangled beyond recognition.
How to rank a podcast in 2019?
In order to rank your audio content, you still need to convert the spoken audio to the only medium that Google can "read": text. There are a number of speech-to-text utilities out there that will do a respectable job of translating your spoken words into readable characters. But at least at this time, they still need your help before they can help you. Depending on the nature and the context of the speech, that help may be more than you can afford. The whole purpose of a podcast (or a video) is to shorten the time it takes an author to get the message out. Speech is far more natural and intuitive than text, but to a computer that reality is turned on its head.
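To give a sense of the kind of "help" a machine transcript needs, here is a minimal Python sketch that tidies raw speech-to-text output before you publish it. The function name clean_transcript and the filler-word list are assumptions made up for this illustration, not part of any particular transcription tool, and real transcripts will still need a human read-through.

```python
import re

# Hypothetical filler list for illustration; extend it to match how you speak.
FILLERS = ("um", "uh", "erm")


def clean_transcript(raw: str) -> str:
    """Strip filler words and tidy spacing in raw speech-to-text output."""
    # Match each filler as a whole word, plus a comma that directly follows it.
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?"
    text = re.sub(pattern, "", raw, flags=re.IGNORECASE)
    # Collapse the runs of whitespace left behind by the removals.
    text = re.sub(r"\s+", " ", text)
    # Remove any space that ended up in front of punctuation.
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)
    return text.strip()


if __name__ == "__main__":
    print(clean_transcript("Um, welcome to the uh show."))
```

A pass like this only removes noise. Misheard names, jargon, and missing punctuation are where most of your editing time will go.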
That doesn't mean that Google will never be able to read your audio content. One day soon, it won't matter whether you've uploaded a spoken or a written post: both will rank the same for equivalent content. This means that you have to plan ahead and massage your audio content as if Google could read it. As this article points out:
Ultimately, audio SEO will mean treating our audio content in a more structured and deliberate way. The broader evolution of Google across many devices also means that we need to be more aware of what type of content best fits our audience's needs. Is the searcher looking for text, video, or audio? Each modality fits a different need and a different device (or set of devices) in the broader search ecosystem.
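One way to read "structured and deliberate" is to describe each episode in machine-readable form next to its transcript. The Python sketch below builds a schema.org JSON-LD snippet for an episode page, using the PodcastEpisode and AudioObject types and the transcript property from the schema.org vocabulary. The function name episode_jsonld and the example values are made up for this illustration, and whether a given search engine actually uses this markup for ranking is an assumption, not something the quoted article claims.

```python
import json


def episode_jsonld(title: str, description: str, audio_url: str, transcript: str) -> str:
    """Return a <script> tag holding schema.org JSON-LD for one episode."""
    data = {
        "@context": "https://schema.org",
        "@type": "PodcastEpisode",
        "name": title,
        "description": description,
        "associatedMedia": {
            "@type": "AudioObject",
            "contentUrl": audio_url,
            "transcript": transcript,
        },
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so text inside the transcript cannot close the script tag early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Placeholder values; substitute your own episode details.
    print(episode_jsonld(
        "Episode 1",
        "A short introduction to the show.",
        "https://example.com/audio/episode-1.mp3",
        "Welcome to the show.",
    ))
```

The resulting tag would go in the HTML of the episode page, alongside the visible transcript rather than instead of it.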