I continued with this method for my second video. It took only half as long as the first, mostly because I knew the software better, but also because I used fewer unnecessary effects ;-).
After creating the first two videos, I felt reasonably confident in my ability to create the voice-over and tried to record it without any script. This was harder than I thought, and I had to repeat some sentences multiple times before they sounded right. Mostly, I had to get rid of some awkward mid-sentence break-offs. Once I had the voice-over, I created the video to match it, which worked the same way as it did with a script. This resulted in the following video.
After I had created my first three videos, I had four weeks until I had to create the next batch. I had no access to my recording setup for a few days, so I started by creating the videos first. Fortunately, this worked really well. I had to cut the voice-over a lot more than before, and I also had to change the speed of the recording quite a lot to match the voice-over, but overall this process was not only the fastest but also resulted in better videos. I think this is because I could create the videos without being constrained by an existing voice-over (or script), and then had a script in the form of the video when recording the voice-over. Hence, this is overall the best way to create videos (at least for me).