Should We Expect an Intelligence Explosion?
Ricky Mouser
Johns Hopkins Berman Institute of Bioethics
Philosophy PhD (2024)
Mentor: Gary Ebbs
The AI literature broadly accepts that we should expect an intelligence explosion, a point when AI becomes so good at improving itself that it becomes superintelligent within a short period of time. Here intelligence just means effectiveness at figuring out how to achieve a wide variety of goals. Contrast that with wisdom, which we can gloss as correctly seeing which goals are worthwhile. But since the Orthogonality Thesis suggests an agent’s instrumental effectiveness and ultimate goals are unrelated, a foolish AI might very effectively turn the galaxy into paperclips. We can make sense of intelligence explosions because the standards of progress can largely be stated in advance (Can you achieve X?) and demonstrated empirically (Did you?). A superintelligent being could basically do anything. But we are much less clear on what superwisdom would look like. What would it be like for a being to basically see anything correctly? Given this, should we expect a wisdom explosion, a point when AI becomes so wise that further reflection makes it superwise within a short period of time? If so, I worry that we might mistake a superwise AI’s moral progress for moral confusion or regress. Might we even be taken in by highly effective supersophists with persuasive arguments?
Me, my Mom, and my Grandmother! Privacy with Robots for All Ages
Leigh Levinson and Long-Jing Hsu
Department of Informatics, Luddy School of Informatics, Computing, and Engineering
Both are PhD Candidates in the Informatics Program under the supervision of Dr. Selma Šabanović
In the age of artificial intelligence, and particularly of embodied agents like social robots, which provide additional functional and emotional benefits (Lutz & Tamó-Larrieux, 2020), it is imperative to understand not only users' privacy concerns but also how these concerns differ across intergenerational populations. In the following presentation, we discuss three studies that used different methods and catered to different populations to understand user perceptions of privacy with social robots and to design more privacy-aware systems.
We integrate findings from a focus group with 15 teens about robots in the home, a co-learning workshop with 6 families about robots in the home, a panel with 6 older adults about robots in their daily lives, and a cohabitation study with 11 families. During the panels, workshops, and cohabitation, we found evidence of the robot privacy paradox: user behavior did not always match a user’s vocalized privacy concerns. We also assert not only that users of all ages are capable of recognizing privacy concerns but also that different situations and contexts affect their comfort with a robot collecting, analyzing, and sharing information. Overall, our work brings attention to privacy discussions with social robots as robots become integrated into people’s living spaces.
Creating Granular Textures Through Dimension Reduction and Bespoke Two-Dimensional Spaces
Isaac Smith
Jacobs School of Music, Composition Department
Mentor: Prof. John Gibson
While granulation and granular synthesis are widely used digital music techniques, they are limited by a one-dimensional temporal organization of their source material; that is, audio samples are organized and auditioned linearly in time. “Grains,” short snippets of audio used as building blocks for granulation, are defined parametrically through time codes in an audio file. Changing the granular profile of a synthesizer is most commonly accomplished by “scrubbing” forward or backward through the audio.
Machine learning can help categorize and assemble granular profiles by defining and analyzing grains algorithmically, enabling the creation of textures made up of grains with similar sonic profiles, even if they are not temporally adjacent in the audio file. This project employs the Max/MSP scripting language and the Fluid Corpus Manipulation (FluCoMa) library to create more complex, organic granular sound textures than can be easily achieved with traditional granulation techniques.
To accomplish this, multiple sound files from a variety of sources are analyzed and broken into sections or “grains” via an amplitude-based break point analysis. Once these grains are established, they are categorized using FluCoMa’s statistical analysis tools and mapped onto a bespoke two-dimensional space accessible by a performer. This space allows sound grains from different files – or different time points in the same files – to be located “near” other samples that are similar in timbre. A performer or electronic algorithm can then move smoothly through this space, exploring and creating unified granular textures composed of grains from many different sources simultaneously. It also allows for a smooth interpolation of textures that is impossible with traditional granular synthesis.
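The project itself is built in Max/MSP with FluCoMa, but the shape of the pipeline can be illustrated with a rough Python analogue. In the sketch below, librosa, scikit-learn, the MFCC descriptors, and the placeholder file names are assumptions made for illustration, not the tools or features the project actually uses.

```python
# Rough Python analogue of the grain pipeline described above (the project
# itself uses Max/MSP + FluCoMa). librosa, scikit-learn, MFCC descriptors,
# and the file names below are illustrative assumptions, not the actual tools.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def extract_grains(path, top_db=30):
    """Split an audio file into grains at amplitude break points."""
    y, sr = librosa.load(path, sr=None, mono=True)
    intervals = librosa.effects.split(y, top_db=top_db)  # non-silent regions
    return [(y[start:end], sr) for start, end in intervals]

def describe(grain, sr):
    """Summarize a grain's timbre with the mean and spread of its MFCCs."""
    mfcc = librosa.feature.mfcc(y=grain, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Build one corpus of grains from several source files (placeholder names).
grains, features = [], []
for path in ["bowed_metal.wav", "voice.wav", "rain.wav"]:
    for grain, sr in extract_grains(path):
        grains.append((grain, sr))
        features.append(describe(grain, sr))

# Reduce the statistical descriptors to a 2-D space a performer can navigate,
# so timbrally similar grains sit near each other regardless of source file.
coords = PCA(n_components=2).fit_transform(np.array(features))
index = NearestNeighbors(n_neighbors=4).fit(coords)

def grains_near(x, y):
    """Return the grains whose timbres sit closest to a performer's (x, y)."""
    _, idx = index.kneighbors([[x, y]])
    return [grains[i] for i in idx[0]]
```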
Because the computational analysis occurs prior to performance, this project will enable machine-learning assisted live performance of granulation and granular synthesis and provide performers with a broader and more cohesive timbral palette for their expressive endeavors.
None of These Words Appear in the Bible: The Conservative–Evangelical Foundation of Meta's Massively Multilingual Speech
Eli Beaton
History Department, College of Arts and Sciences
PhD in History, MA in Art History
Meta’s generative text-to-speech AI model, Massively Multilingual Speech, always pronounces certain words incorrectly: machine, financial, development, accumulation, framework, erosion, and any number bigger than 40. Massively Multilingual Speech is capable of generating and transcribing over 1,100 languages and identifying over 4,000 languages. Yet in all of these languages, there is a pattern of errors, buried underneath ambiguous pronunciations and speech quirks: Massively Multilingual Speech struggles to pronounce words that do not appear in translations of the Bible.
These errors in Meta’s speech generation are not failures in the architecture of the AI model, but rather the inevitable artifacts of the data used to train the model: thousands and thousands of hours of audio recordings of the New Testament. The dataset powering Massively Multilingual Speech was first produced by a non-profit called Faith Comes by Hearing during the conservative–evangelical movement of the 1980s–2010s as an effort to spread "Christian Values" globally. Faith Comes by Hearing relied on a system of low-wage and volunteer labor to produce each recording of the Bible in a new language. Meta harvested and further anonymized those recordings for the development of Massively Multilingual Speech; it is unclear whether Faith Comes by Hearing even realizes its recordings are so foundational to Meta's AI development. The labor, cost, information, and ideology which produced the data are obscured in Massively Multilingual Speech, but the model speaks uncannily in the voices of the original readers. This project correlated errors in the model's output with words in Bible translations by using MMS to generate and interpret its own speech. This circular loop of data translations reveals a clear relationship between the errors in Massively Multilingual Speech and words which do not appear in translations of the Bible, gesturing to the now doubly-hidden labor of the readers that power Meta's AI model.
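As a rough illustration of the circular generate-and-interpret loop, the sketch below synthesizes a word with a publicly released MMS text-to-speech checkpoint and transcribes it with an MMS recognition checkpoint via Hugging Face transformers. The English-only checkpoints and the word list are assumptions made for illustration; they are not the project's actual languages, models, or methodology.

```python
# Sketch of the generate-and-interpret loop: synthesize a word with an MMS
# text-to-speech checkpoint, then transcribe it with an MMS recognition
# checkpoint. Checkpoints and word list are illustrative assumptions.
import torch
from transformers import AutoProcessor, AutoTokenizer, VitsModel, Wav2Vec2ForCTC

tts_model = VitsModel.from_pretrained("facebook/mms-tts-eng")
tts_tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")
asr_processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
asr_model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")

def round_trip(word: str) -> str:
    """Synthesize a word with MMS-TTS, then transcribe it with MMS-ASR."""
    inputs = tts_tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        waveform = tts_model(**inputs).waveform[0]
    sr = tts_model.config.sampling_rate
    asr_inputs = asr_processor(waveform.numpy(), sampling_rate=sr,
                               return_tensors="pt")
    with torch.no_grad():
        logits = asr_model(**asr_inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return asr_processor.decode(ids)

# Compare words the abstract flags as absent from Bible translations.
for word in ["machine", "financial", "development", "framework", "erosion"]:
    print(word, "->", round_trip(word))
```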
Prepared piano detection for real-time applications
Tommaso Rosati
Indiana University Music and Arts Technology
Mentor: Dr. Timothy Hsu
Prepared piano and extended techniques have been widely used in contemporary music since the last century. Combined with live electronics, they offer a vast range of sonic possibilities and allow composers and performers to explore new timbres and textures. Before the introduction of score-following technologies, first presented by Barry Vercoe and Roger Dannenberg at the 1984 ICMC and later followed by IRCAM's Antescofo system, the connection between acoustic performers and live electronics was indirect and required manual coordination. With the advancement of machine learning and deep learning techniques, it is now possible to detect and analyze piano performances in real time, opening up new possibilities for composers and performers and crossing the boundaries between the acoustic and electronic worlds. This research analyzes how different machine learning and deep learning techniques can detect pure and prepared piano sounds, and how fast and efficient they can be for real-time live electronic music. It also develops two plug-ins that apply the theory to a real musical context.
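The plug-ins themselves run in a real-time environment, but the core detection task can be sketched offline in Python. The example below is a minimal, hypothetical baseline only: the file names, the MFCC features, and the SVM classifier are stand-ins chosen for illustration, not the models the research actually evaluates.

```python
# Minimal, hypothetical baseline for the detection task: label short audio
# frames as pure (0) or prepared (1) piano. The file names, MFCC features,
# and SVM classifier are stand-ins, not the models the research evaluates.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def frame_features(path, label, frame_sec=0.1):
    """Cut a recording into short frames and describe each with mean MFCCs."""
    y, sr = librosa.load(path, sr=None, mono=True)
    hop = int(frame_sec * sr)
    feats = []
    for start in range(0, len(y) - hop, hop):
        mfcc = librosa.feature.mfcc(y=y[start:start + hop], sr=sr, n_mfcc=13)
        feats.append(mfcc.mean(axis=1))
    return feats, [label] * len(feats)

X, y = [], []
for path, label in [("pure_piano.wav", 0), ("prepared_piano.wav", 1)]:
    feats, labels = frame_features(path, label)
    X.extend(feats)
    y.extend(labels)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("frame-level accuracy:", clf.score(X_test, y_test))
```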
Answer, real-time MIDI AI re-elaboration
Tommaso Rosati
Indiana University Music and Arts Technology
Mentor: Dr. Timothy Hsu
Answer is a Max for Live device that listens to incoming MIDI, trains its model, and starts playing when the input stops, reworking what it has heard. A new MIDI input from the performer stops the device's output and returns it to the listening and training phase.
During the listening phase, the device sorts the incoming input into three different Markov chains: one for single-note basses, one for other single notes, and one for chords. This provides up to three lines with different tempos, which determines the complexity of the results. The device also records durations and dynamics for each category and then uses all of this information to re-elaborate the material into a new stream of notes. It can be used with any kind of MIDI input: keyboard controllers, percussive controllers, and MIDI files.
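A minimal sketch of the three-chain idea, in Python rather than Max for Live, is given below. It models pitch content only with first-order chains; the bass split at MIDI note 48, the tuple encoding of events, and the omission of durations and dynamics are simplifying assumptions, not the device's actual implementation.

```python
# Sketch of the three-chain idea with first-order Markov chains over pitch
# content only. The bass split at MIDI note 48, the tuple event encoding, and
# the omission of durations/dynamics are simplifying assumptions.
import random
from collections import defaultdict

BASS_THRESHOLD = 48  # assumed split: notes below C3 go to the bass chain

def route(event):
    """Assign an incoming MIDI event (tuple of note numbers) to one stream."""
    if len(event) > 1:
        return "chords"
    return "basses" if event[0] < BASS_THRESHOLD else "single_notes"

# One transition table per stream, filled during the listening phase.
chains = {name: defaultdict(list) for name in ("basses", "single_notes", "chords")}
previous = {name: None for name in chains}

def listen(event):
    """Listening phase: record which event follows which within each stream."""
    stream = route(event)
    if previous[stream] is not None:
        chains[stream][previous[stream]].append(event)
    previous[stream] = event

def answer(stream, length=8):
    """Playing phase: walk one stream's chain to re-elaborate the material."""
    table = chains[stream]
    if not table:
        return []
    state = random.choice(list(table))
    out = [state]
    for _ in range(length - 1):
        # Restart from a random learned state when the walk hits a dead end.
        state = random.choice(table[state]) if state in table else random.choice(list(table))
        out.append(state)
    return out

# Example: feed a short phrase, then let the single-note chain respond.
for event in [(60,), (64,), (36,), (67,), (60, 64, 67), (64,), (60,)]:
    listen(event)
print(answer("single_notes"))
```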