Publication: How Frequency and Harmonic Profiling of a ‘Voice’ Can Inform Authentication of Deepfake Audio: An Efficiency Investigation
Type: Article
Date: 2025-01
Publisher: SLIIT, Faculty of Engineering
Abstract
As life in the digital era becomes more complex, the scope for criminal activity within the digital realm becomes ever more widespread. More recently, the development of AI-powered deepfake media generation has pushed audio and video content into a realm of doubt, misinformation, and misrepresentation. Deepfake incidents are numerous, with infamous cases ranging from fabricated graphic images of the musician Taylor Swift to the loss of $25 million transferred after a faked video call. Deepfakes are of growing concern to the general public when such material is submitted as evidence in a court case, especially a criminal trial, and current methods for authenticating evidence against such threats are insufficient. Within audio forensics, a speaker’s voice carries sufficient ‘individuality’ to enable comparison for identification. When authenticating audio for deepfake speech, the same comparative approach can be used to identify rogue or mismatched harmonic and formant patterns within the speech. The presence of deepfake media in illegal activity demands appropriate legal enforcement, which in turn requires robust detection methods. This paper proposes a robust technique for identifying such AI-synthesized speech using a quantifiable method that can be justified within court proceedings. It also presents the correlation between the harmonic content of human speech patterns and the AI-generated clones produced from them, and details the spectrographic audio characteristics found that may help authenticate speech for forensic purposes in the future. The results demonstrate that comparing specific frequency ranges against a known audio sample of a person’s speech indicates the presence of deepfake media through differences in harmonic structure.
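The abstract does not state which frequency ranges or comparison metric the study used. Purely as an illustration of the general idea (comparing how a known reference sample and a questioned sample distribute energy across frequency ranges), the following is a minimal sketch under stated assumptions: the analysis bands in `DEFAULT_BANDS`, the Hann-windowed FFT band energies, and cosine similarity as the score are all hypothetical choices made here, not the paper’s method.

```python
import numpy as np

# Hypothetical analysis bands in Hz (assumption: the paper's actual ranges are not given).
DEFAULT_BANDS = ((80, 300), (300, 1000), (1000, 3000), (3000, 8000))


def band_energy_profile(signal, sample_rate, bands=DEFAULT_BANDS):
    """Return the fraction of spectral energy that falls in each band."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))  # taper the frame to reduce spectral leakage
    power = np.abs(np.fft.rfft(signal * window)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    energies = np.array(
        [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
    )
    total = energies.sum()
    # Normalising removes overall loudness, so only the spectral shape is compared.
    return energies / total if total > 0 else energies


def profile_similarity(reference, questioned, sample_rate, bands=DEFAULT_BANDS):
    """Cosine similarity of band-energy profiles (1.0 means identical shape)."""
    a = band_energy_profile(reference, sample_rate, bands)
    b = band_energy_profile(questioned, sample_rate, bands)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one second of audio

    def synthetic_voice(f0, rolloff):
        # Harmonic series at multiples of f0 with amplitude 1 / k**rolloff.
        return sum((1.0 / k**rolloff) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 40))

    reference = synthetic_voice(120, 1.0)
    print("same source, quieter:", profile_similarity(reference, 0.5 * reference, sr))
    print("different harmonics: ", profile_similarity(reference, synthetic_voice(120, 0.2), sr))
```

A score near 1.0 means the questioned sample spreads its energy across the chosen bands much like the reference does. The synthetic signals above only stand in for real recordings; any decision threshold would need to be established on genuine reference and deepfake speech, which this sketch does not attempt.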
Keywords: Artificial Intelligence, Digital Forensics, Speech Processing, Speech Analysis
