Voice Conversion and Spoofing Countermeasure in Speaker Verification
Automatic speaker verification (ASV) offers a flexible biometric solution to person authentication. While the reliability of ASV systems is considered sufficient for mass-market adoption, there are concerns about their vulnerability to spoofing, an attack in which a fraudster attempts to manipulate an ASV system with synthetic or replayed voice. Because high-quality, low-cost recording devices such as smartphones are widely available, replay attacks are arguably the most accessible and therefore present a significant threat. Similarly, speaker adaptation in speech synthesis and voice conversion techniques attempt to mimic a target speaker's voice automatically, and hence present a genuine threat to ASV systems. The research community has responded to replay, speech synthesis and voice conversion spoofing attacks with dedicated countermeasures that aim to detect and deflect such attacks. Although the literature shows that these countermeasures can be effective, the problem is far from solved: ASV systems remain vulnerable to spoofing, and a deeper understanding of speaker verification, speech synthesis and voice conversion will be fundamental to the pursuit of spoofing-robust speaker verification. In this talk, we will look into the fundamentals of voice conversion and spoofing countermeasures.
Haizhou Li received the B.Sc., M.Sc., and Ph.D. degrees in electrical and electronic engineering from South China University of Technology, Guangzhou, China, in 1984, 1987, and 1990, respectively. He is now a Professor in the Department of Electrical and Computer Engineering and the Department of Mechanical Engineering at the National University of Singapore. Professor Li's research interests include speech information processing, natural language processing, and human-robot interaction. He has published over 300 technical papers. Since 1988, he has taught in Hong Kong, Mainland China, Singapore, Finland, and Australia. As a technologist, he has held a number of research and technical leadership positions, most recently as Research Director of the Institute for Infocomm Research of Singapore. He co-founded the Baidu-I2R Research Centre in Singapore in 2012. Professor Li is known for his technical contributions to several award-winning speech products, such as Apple's Chinese Dictation Kits for Macintosh (1996) and Lernout & Hauspie's Speech-Pen-Keyboard Text Entry Solution for Asian languages (1999). He was the architect of a series of major technology deployments, including the TELEFIQS voice-automated call centre service at Singapore Changi International Airport (2001), the voiceprint engine for the Lenovo A586 smartphone (2012), and Baidu Music Search (2013). Professor Li is currently serving as Editor-in-Chief (2015-2017) of the IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, and was a Member of the IEEE Speech and Language Processing Technical Committee. He has served as President of the International Speech Communication Association (ISCA, 2015-2017), President of the Asia Pacific Signal and Information Processing Association (APSIPA, 2015-2016), President of the Chinese and Oriental Language Information Processing Society (COLIPS, 2011-2013), and Vice President of the Asian Federation of Natural Language Processing (AFNLP, 2015-2016).
He was the recipient of Singapore's National Infocomm Awards 2002, the Institution of Engineers Singapore (IES) Prestigious Engineering Achievement Award 2013 and 2015, the President's Technology Award 2013, and the MTI Innovation Activist Gold Award 2015 in Singapore. He is an IEEE Fellow and was named one of the two Nokia Visiting Professors in 2009 by the Nokia Foundation.
Parallel and Distributed Stochastic Learning
Big data machine learning (BDML) studies machine learning techniques for big data applications and has become one of the driving forces behind the advancement of artificial intelligence. Stochastic learning, such as stochastic gradient descent (SGD) and its extensions, has become one of the key techniques in BDML. This talk will introduce our recent work on parallel and distributed stochastic learning. It will also introduce LIBBLE (https://github.com/LIBBLE/LIBBLE-Spark/), a BDML platform that our group has developed and open-sourced.
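As a rough illustration of the kind of technique the talk covers, the sketch below simulates synchronous data-parallel mini-batch SGD in plain Python: each of four hypothetical workers samples a mini-batch from its own data shard and computes a local least-squares gradient, and a central server averages the gradients before updating the shared parameters. This is a generic textbook scheme chosen for illustration, not the algorithm used in LIBBLE or in the speaker's work; the function names, hyperparameters, and synthetic data are assumptions.

```python
import random

def local_gradient(w, b, batch):
    """Gradient of the mean of 0.5 * (w*x + b - y)**2 over one worker's mini-batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += err * x
        gb += err
    n = len(batch)
    return gw / n, gb / n

def parallel_sgd(shards, lr=0.1, steps=500, batch_size=8, seed=0):
    """Simulated synchronous data-parallel SGD: at every step each worker draws a
    mini-batch from its own shard, and the server averages the workers' gradients."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # All gradients are computed at the current (w, b) before any update.
        grads = [local_gradient(w, b, rng.sample(shard, batch_size)) for shard in shards]
        w -= lr * sum(g[0] for g in grads) / len(grads)
        b -= lr * sum(g[1] for g in grads) / len(grads)
    return w, b

# Noise-free synthetic data from y = 3x - 1, split across 4 hypothetical workers.
rng = random.Random(42)
data = [(x, 3.0 * x - 1.0) for x in (rng.uniform(-1, 1) for _ in range(400))]
shards = [data[i::4] for i in range(4)]
w, b = parallel_sgd(shards)
print(f"w = {w:.3f}, b = {b:.3f}")
```

With equal mini-batch sizes, averaging the workers' gradients amounts to one SGD step on a combined mini-batch of 32 samples drawn across the shards; asynchronous or communication-efficient variants relax this lockstep synchronization.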
Wu-Jun Li is currently an Associate Professor in the Department of Computer Science and Technology, Nanjing University, P. R. China. His research interests include machine learning, big data, and artificial intelligence. In these areas he has published more than 30 peer-reviewed papers, most of them in prestigious journals such as TKDE and top conferences such as AAAI, CVPR, ICML, IJCAI, NIPS, and SIGIR. He has served as a PC member for most of the top conferences in machine learning and artificial intelligence, including AAAI, CVPR, ICCV, ICML, IJCAI, NIPS, and KDD. For more information, please refer to: http://cs.nju.edu.cn/lwj/
Weakly Supervised Image Understanding
Semantic segmentation of natural images is a fundamental problem in computer vision. While significant research progress has been made in the last few years, the success of most existing methods relies heavily on large-scale, pixel-accurate annotations. However, humans effortlessly learn robust and accurate visual cognitive models without requiring huge amounts of pixel-accurate semantic annotation. During childhood, we learn to robustly recognize and precisely locate object regions with limited supervision from parents and other sources. Inspired by this process, our research focuses on human-cognition-inspired weakly supervised image understanding. By utilizing visual attention, category-independent edge detection, region clustering, and similar cues, we have observed consistent performance boosts in weakly supervised image understanding.
Ming-Ming Cheng is a professor with CCCE, Nankai University. He received his PhD degree from Tsinghua University in 2012 and then spent two years as a research fellow working with Prof. Philip Torr at Oxford. Dr. Cheng’s research primarily centers on algorithmic issues in image understanding and processing, including image segmentation, editing, and retrieval. He has published over 30 papers in leading journals and conferences, such as IEEE TPAMI, ACM TOG, ACM SIGGRAPH, IEEE CVPR, and IEEE ICCV. He has designed a series of popular methods and novel systems, as indicated by more than 5000 paper citations (over 1700 of them to his first-author paper on salient object detection). His work has been reported by several prominent international media outlets, such as the BBC, the UK Telegraph, Der Spiegel, and the Huffington Post.
Object Detection at Google
At Google, we develop flexible, state-of-the-art machine learning (ML) systems for computer vision that can not only be used to improve our products and services but also spur progress in the research community. Creating accurate ML models capable of localizing and identifying multiple objects in a single image remains a core challenge in the field, and we invest a significant amount of time training and experimenting with these systems. In this talk, I will introduce the TensorFlow Object Detection API, our in-house object detection codebase. We have used this codebase in research projects and in multiple production settings across different platforms, as well as to build the model that placed first in the 2016 COCO Detection Challenge. Most recently, we open-sourced the entire codebase; we believe it will be useful to many researchers, and we welcome contributions from the community.
Jonathan Huang is a senior research scientist at Google, where he currently works on deep learning for machine perception. He received his M.Sc. and Ph.D. degrees from the School of Computer Science at Carnegie Mellon University in 2008 and 2011, respectively. From 2011 to 2014 he was an NSF Computing Innovation (CI) postdoctoral fellow in the geometric computing group at Stanford University, where he had also received his B.S. degree in Mathematics in 2005. His research interests lie primarily in deep learning and in probabilistic reasoning with combinatorially structured data, with applications in computer vision and online education. To see a list of publications and projects, visit www.jonathan-huang.org.
Data-based Source Localization for a Moving Source: Theory and Experimental Results
Matched field/mode processing was introduced some time ago for source localization: it searches for the hypothesized source location whose replica field best matches the acoustic data received on a vertical or horizontal line array. However, this method suffers from the well-known mismatch problem, because the acoustic environment used to calculate the replica field is often time-varying and not known accurately. A data-based matched-mode source localization method is introduced for a moving source. It uses mode wavenumbers and depth functions estimated directly from the moving-source data, without requiring any environmental acoustic information or assuming any propagation model to calculate the replica field. In theory, the method is free of the environmental mismatch problem, since the mode replicas are estimated from the same data used to localize the source. As with many methods, however, there are technical problems in applying the theory to real data. This talk presents the basic theory, together with experimental results that illustrate these technical problems and how they have been solved so far.
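The generic matched-mode matching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's data-based method: the mode shapes and wavenumbers here come from a hypothetical ideal waveguide (isovelocity water with pressure-release surface and bottom), whereas the method described above estimates them from the moving-source data. The sketch mode-filters the vertical-array data into modal amplitudes, then grid-searches for the (range, depth) whose modal replica maximizes the normalized Bartlett correlation |w^H d|^2 / (|w|^2 |d|^2). All names and parameter values are illustrative.

```python
import cmath
import math

# Assumed ideal waveguide (illustration only): isovelocity water of depth D with
# pressure-release surface and bottom, so mode m has shape sin(m*pi*z/D).
D, C, F = 100.0, 1500.0, 100.0            # depth (m), sound speed (m/s), frequency (Hz)
K = 2 * math.pi * F / C                   # acoustic wavenumber (rad/m)
N_MODES = 10                              # modes used (13 propagate at these values)
N_PHONES = 19
ARRAY_Z = [j * D / (N_PHONES + 1) for j in range(1, N_PHONES + 1)]  # 5, 10, ..., 95 m

def mode_shape(m, z):
    return math.sin(m * math.pi * z / D)

def wavenumber(m):
    return math.sqrt(K ** 2 - (m * math.pi / D) ** 2)

def modal_replica(r, zs):
    """Modal amplitudes for a source at range r (m), depth zs (m); prefactors dropped."""
    return [mode_shape(m, zs) * cmath.exp(1j * wavenumber(m) * r) / math.sqrt(wavenumber(m) * r)
            for m in range(1, N_MODES + 1)]

def array_field(r, zs):
    """Synthetic pressure on the vertical array: modes weighted by modal amplitudes."""
    amps = modal_replica(r, zs)
    return [sum(a * mode_shape(m, z) for m, a in enumerate(amps, start=1)) for z in ARRAY_Z]

def mode_filter(pressure):
    """Project array data onto the mode shapes. For this equally spaced array the
    sampled sines are orthogonal (discrete sine transform), so the projection is exact."""
    scale = 2.0 / (N_PHONES + 1)
    return [scale * sum(mode_shape(m, z) * p for z, p in zip(ARRAY_Z, pressure))
            for m in range(1, N_MODES + 1)]

def bartlett(replica, data):
    """Normalized Bartlett power |w^H d|^2 / (|w|^2 |d|^2), between 0 and 1."""
    inner = sum(w.conjugate() * d for w, d in zip(replica, data))
    norm = sum(abs(w) ** 2 for w in replica) * sum(abs(d) ** 2 for d in data)
    return abs(inner) ** 2 / norm

def localize(pressure, ranges, depths):
    """Grid search for the (range, depth) whose modal replica best matches the data."""
    amps = mode_filter(pressure)
    return max(((r, z) for r in ranges for z in depths),
               key=lambda rz: bartlett(modal_replica(*rz), amps))

data = array_field(5000, 40)              # synthetic "measurement": source at 5 km, 40 m
ranges = [4000 + 50 * i for i in range(41)]
depths = [10 + 5 * j for j in range(17)]
print(localize(data, ranges, depths))
```

Because the synthetic data are generated from the same modes used for the replicas, the Bartlett value reaches 1 at the true location; with real data, errors in the assumed or estimated modes lower and broaden this peak, which is why estimating the replicas from the data itself is attractive.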
T.C. Yang received the Ph.D. degree in high energy physics from the University of Rochester, Rochester, NY, USA, in 1971.
He is currently a Professor, and was previously a Pao Yu-Kong Chair Professor, at Zhejiang University, Hangzhou, China. From 2012 to 2014, he was a National Science Council Chair Professor at National Sun Yat-Sen University, Kaohsiung, Taiwan. Before that, he spent 32 years at the Naval Research Laboratory, Washington, DC, USA, serving as Head of the Arctic Section, Dispersive Wave Guide Effects Group, and Head of the Acoustic Signal Processing Branch, and as a consultant to the division on research proposals. His current research focuses on environmental impacts on underwater acoustic communications and networking, exploiting the channel physics to characterize and improve performance, environmental acoustic sensing and signal processing using distributed networked sensors, and methods for improved channel tracking and data-based source localization. In earlier years, he pioneered matched-mode processing for a vertical line array and matched-beam processing for a horizontal line array. His other research areas have included geoacoustic inversion, waveguide invariants, the effects of internal waves on sound propagation in shallow water, and Arctic acoustics.
Prof. Yang is a Fellow of the Acoustical Society of America.
Z. Jane Wang
University of British Columbia
Joint Blind Source Separation (JBSS) for Multiset, Multimodal Data Analysis
Blind source separation (BSS) has been attracting increasing attention due to its promising applications in numerous areas. Joint blind source separation (JBSS) presents both challenges and opportunities for multiset, multimodal data analysis. For example, the neurophysiological signal processing community seeks to better understand normal brain function and the pathophysiology of many brain diseases by using JBSS to extract information from complementary modalities. We will discuss (1) the over-determined JBSS case, investigating different statistical assumptions and the tradeoffs between different JBSS methods, with a focus on applications to cortico-muscular coupling analysis and biosensor-based heart rate monitoring; and (2) the less-studied under-determined JBSS (UJBSS) case, where the number of sensors M is smaller than the number of sources N, with a focus on developing new UJBSS methods for two or more datasets by exploiting the second-order statistics of the underlying sources. We present a novel UJBSS approach for artifact removal (e.g., removing electromyogram (EMG) artifacts from electroencephalography (EEG) signals).
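To make the role of second-order statistics concrete, the sketch below implements AMUSE, a classic second-order BSS algorithm, for the simplest determined case of two sensors and two sources: whiten the mixtures, then find the rotation that diagonalizes the symmetrized time-lagged covariance. It is a swapped-in single-dataset method, not a JBSS or UJBSS method (those handle multiple datasets and, for UJBSS, fewer sensors than sources); the sources, mixing matrix, lag, and function names are assumptions chosen for illustration.

```python
import math

def cov(u, v, lag=0):
    """Sample covariance E[u(t) * v(t + lag)] of two zero-mean sequences."""
    n = len(u) - lag
    return sum(u[t] * v[t + lag] for t in range(n)) / n

def corr(u, v):
    """Correlation coefficient of two zero-mean sequences."""
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

def jacobi_angle(a, b, d):
    """Angle theta whose rotation R diagonalizes [[a, b], [b, d]] via R^T M R."""
    return 0.5 * math.atan2(2 * b, a - d)

def rotate(x1, x2, theta):
    """Apply R^T, with R = [[cos, -sin], [sin, cos]], to every sample pair."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * p + s * q for p, q in zip(x1, x2)],
            [-s * p + c * q for p, q in zip(x1, x2)])

def amuse_2x2(x1, x2, lag=5):
    """AMUSE for two mixtures: whiten, then diagonalize the symmetrized
    lag-`lag` covariance of the whitened data."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    # Whitening: rotate onto the eigenvectors of the zero-lag covariance,
    # then scale each component to unit variance.
    z1, z2 = rotate(x1, x2, jacobi_angle(cov(x1, x1), cov(x1, x2), cov(x2, x2)))
    s1, s2 = math.sqrt(cov(z1, z1)), math.sqrt(cov(z2, z2))
    z1, z2 = [v / s1 for v in z1], [v / s2 for v in z2]
    # The rotation that diagonalizes the symmetrized lagged covariance
    # separates sources with distinct autocorrelations at this lag.
    sym = 0.5 * (cov(z1, z2, lag) + cov(z2, z1, lag))
    return rotate(z1, z2, jacobi_angle(cov(z1, z1, lag), sym, cov(z2, z2, lag)))

# Two sinusoidal sources (whole cycles over the window) and a hypothetical mixing matrix.
T = 2000
src1 = [math.sin(2 * math.pi * 20 * t / T) for t in range(T)]
src2 = [math.sin(2 * math.pi * 86 * t / T) for t in range(T)]
mix1 = [1.0 * p + 0.6 * q for p, q in zip(src1, src2)]
mix2 = [0.4 * p + 1.0 * q for p, q in zip(src1, src2)]
y1, y2 = amuse_2x2(mix1, mix2)
print([round(abs(corr(y, s)), 3) for y in (y1, y2) for s in (src1, src2)])
```

Separation relies on the sources having different autocorrelations at the chosen lag. The outputs match the sources only up to order, sign, and scale, which is why the check above uses absolute correlation.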
Z. Jane Wang received the B.Sc. degree from Tsinghua University, China, in 1996, and the M.Sc. and Ph.D. degrees from the University of Connecticut in 2000 and 2002, respectively, all in electrical engineering. She was a Research Associate in the Electrical and Computer Engineering Department at the University of Maryland, College Park. Since August 1, 2004, she has been with the Department of Electrical and Computer Engineering at the University of British Columbia, Canada, where she is currently a Professor. She is an IEEE Fellow. Her research interests are in the broad areas of statistical signal processing theory and applications. She co-received the EURASIP Journal on Applied Signal Processing (JASP) Best Paper Award 2004 and the IEEE Signal Processing Society Best Paper Award 2005. She has published over 100 journal papers and about 90 conference papers. She has served or is serving as an Associate Editor for IEEE journals including the IEEE Transactions on Signal Processing, IEEE Transactions on Information Forensics and Security, IEEE Transactions on Biomedical Engineering, IEEE Signal Processing Letters, and IEEE Transactions on Multimedia.
Antonio Ortega
University of Southern California
An Overview of Graph Signal Processing (GSP)
Antonio Ortega received the Telecommunications Engineering degree from the Universidad Politécnica de Madrid, Madrid, Spain, in 1989 and the Ph.D. in Electrical Engineering from Columbia University, New York, NY, in 1994. His Ph.D. work was supported by the Fulbright Commission and the Ministry of Education of Spain. He joined the University of Southern California as an Assistant Professor in 1994 and is currently a Professor. At USC he is a member of the Integrated Media Systems Center, an NSF Engineering Research Center. He was Director of the Signal and Image Processing Institute (2004-2006) and Associate Chair of Electrical Engineering-Systems (2004-2007). In 1995 he received the NSF Faculty Early Career Development (CAREER) Award. He is a Fellow of the IEEE and a member of the ACM. He has been an Associate Editor of the IEEE Transactions on Image Processing and of the IEEE Signal Processing Letters. He is also a member of the IEEE Signal Processing Society Multimedia Signal Processing (MMSP) and Image and Multidimensional Signal Processing (IMDSP) technical committees, and was Chair of the IMDSP committee in 2004-2005. He received the 1997 Northrop Grumman Junior Research Award from the School of Engineering at USC. In 1998 he received the Leonard G. Abraham IEEE Communications Society Prize Paper Award for the best paper published in the IEEE Journal on Selected Areas in Communications in 1997, for his paper co-authored with Chi-Yuan Hsu and Amy R. Reibman. He also received the IEEE Signal Processing Society Signal Processing Magazine Award in 1999 for a paper co-authored with Kannan Ramchandran, which appeared in the Signal Processing Magazine in November 1998, and the 2006 EURASIP Journal on Advances in Signal Processing Best Paper Award for his paper "A Framework for Adaptive Scalable Video Coding Using Wyner-Ziv Techniques," co-authored with Huisheng Wang and Ngai-Man Cheung. He is the technical program co-chair for ICIP 2008.
His research interests are in the area of digital image and video compression, with a focus on systems issues related to transmission over networks, application-specific compression techniques, and fault/error tolerant signal processing algorithms.