Sonny Yan
What Do (AI) See
Project Type: Capstone Project (Individual)
Time: May 2022
(Exhibited at Illumination: A Symphony of Emerging Media Experiences)
A collection of handwritten Chinese input alongside AI’s predictive output.
Abstract
Unlike most language scripts, Chinese is a pictographic script whose structure and composition consist not of a simple linear arrangement of letters, but of interwoven combinations of strokes in different shapes. As a result, a phenomenon called “character amnesia” (提笔忘字) has emerged among Chinese people today: forgetting how to write Chinese characters by hand under the prevalence of the Smart Pinyin Input Method. To keep character amnesia from worsening, some people have started using the handwriting input method instead of the Smart Pinyin Input Method to practice writing in their daily lives.
The handwriting input method refers to handwriting Chinese on the phone with a finger; the phone then “translates” the handwritten script into a standard typeface displayed on the screen. What Do (AI) See is a project based on a collection of people’s handwritten Chinese, the AI’s predictions of what the characters were intended to be, and the extraction of new appearances for existing Chinese characters from the handwritten script. These new forms of written Chinese are derived from the moment of every recognition made by the handwriting input method during the writing process. For the AI behind the handwriting input method to decipher handwritten Chinese, it needs to be trained on enormous sample datasets. However, this process is largely invisible to the user. This project sheds light on the transformation of handwritten strokes into digital language. It asks what makes handwritten strokes a language, and how artificial intelligence sees and understands Chinese calligraphy.
People write characters from memory, emphasizing each character’s most important strokes. As people write on their phones, the digitized handwritten character is interpreted by the AI, which presents us with its best guesses. By recording the whole process of these interactions, this project monitors the AI’s guesses as the individual strokes of a character are drawn one by one by the human. Despite obvious differences from the actual stroke combinations, it detects stroke compositions that the AI recognizes as specific Chinese characters. Upon detection, the stroke composition, along with the AI’s best guesses, is captured and collected. What emerges from programming and assembling these collections is a database of new appearances for existing Chinese characters.
AI recognition of Chinese characters has become sophisticated. And as everyday use of written Chinese increasingly passes through this AI, observing what the AI has learned lets us take a deeper look at the characters themselves. This project not only serves as a lens to peek into the AI’s mindset but also reveals the beauty of the strokes and structure of Chinese characters.
Recording Collection & Editing
As I was documenting people using the handwriting input method, I noticed something very interesting: the real-time recognition that handwriting input methods provide for our handwritten input. The AI of the handwriting input method keeps making guesses as we add strokes to complete a character. And for that AI to decipher handwritten Chinese, it needs to be trained on enormous sample datasets, a process that is largely invisible to the user. So I decided to focus on the transformation of handwritten strokes into digital language, centered on the question of what makes handwritten strokes a language and how artificial intelligence sees and understands Chinese calligraphy.
While editing and analyzing these videos, I found that the guesses given by the AI often appeared incorrect, but never unreasonable. Sometimes these guesses do look similar to the incomplete characters that people write out, and sometimes they even share the same stroke order. These incorrect guesses made along the way are correct in the AI’s eyes. So I decided to crop out each of the AI’s recognitions as a new look for an existing character. I started working through the recorded videos, extracting each moment when the AI made a recognition and analyzing it as an individual case.
Coding
Python for Building the Database
To standardize the cropped images, I used Python to assemble the captured images into folders, using the AI-recognized character as the folder name; each folder contains both the handwritten part of the image and the first character guess the AI made. Since the AI recognizes the same character in different cases during collection, I also used a JSON file to merge those recognitions, so that one character can carry many different handwritten forms. For the interactive part, a character with multiple forms has one of them randomly selected and presented.
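A minimal sketch of this assembly step might look like the following, assuming each cropped screenshot is saved as `<recognized-char>_<n>.png`; the file layout and names here are illustrative, not the project’s actual scripts:

```python
import json
from pathlib import Path

# Illustrative layout (an assumption, not the project's real paths):
# captures/ holds cropped screenshots named "<recognized-char>_<n>.png".
CAPTURES = Path("captures")
DATABASE = Path("database")

def build_database():
    """Sort captures into one folder per recognized character and
    merge every form of the same character into a JSON index."""
    index = {}  # character -> list of its collected handwritten forms
    for image in sorted(CAPTURES.glob("*.png")):
        char = image.stem.split("_")[0]   # the AI-recognized character
        folder = DATABASE / char
        folder.mkdir(parents=True, exist_ok=True)
        dest = folder / image.name
        dest.write_bytes(image.read_bytes())  # copy the capture over
        index.setdefault(char, []).append(str(dest))
    # One JSON entry per character, merging all of its recognitions.
    with open(DATABASE / "index.json", "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    build_database()
```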
Webpage Design
For the layout of the website, I decided to keep it as simple as possible. The goal of the website is to explain the meaning of this project and show the process of collecting the data and building the database. So I built a blog-like interface that walks through the main steps of the process, each with a short description. At the end of the page, there is an input box where the audience can type Chinese using the Smart Pinyin Input Method. What they receive are the new forms of the characters I have collected; if a character has not yet been collected in the database, it is shown in its original form.
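The lookup behind the input box can be sketched as follows; the live site runs in the browser, so this Python version (reusing the hypothetical index.json above) only illustrates the pick-a-random-form-or-fall-back logic:

```python
import json
import random

# Assumes the index.json built above: character -> list of image paths.
with open("database/index.json", encoding="utf-8") as f:
    INDEX = json.load(f)

def render(text):
    """For each typed character, randomly pick one collected form;
    characters not yet in the database keep their original form."""
    return [random.choice(INDEX[c]) if c in INDEX else c for c in text]

print(render("提笔忘字"))  # mix of collected image paths and raw characters
```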
The link to the website is here: https://sonnyyy77.github.io/WhatDoAISee/intro/
Reflection
Overall, the project was a success for me because it met my expectation of becoming a character database that people can use to type out sentences in the new forms of characters. Throughout the process of completing this project, I also managed to observe more closely how the AI recognizes our handwritten characters, which I think sets it apart from related projects, which focus either on how people memorize characters or on how to train the handwriting input method’s AI to recognize handwritten input more intelligently and accurately. Also, making it a fully screen-based project lets other people play with it, especially under the current lockdown.
For the next step, the first thing I would do is collect more handwriting data to enrich the database. Due to time constraints, there is still not enough data available, which may affect the audience’s experience of playing with the website. So I will continue collecting data to improve the experience; ideally, original forms will appear rarely, or not at all, when people type in the input box. I will also polish the descriptions on the website so the project is clearer and more straightforward for the audience to understand. If possible, I also want to implement more forms of interaction, so that it is not just a screen-based project but can also be turned into an interactive installation for a better experience.
Physical Installation
This project was exhibited in Illumination: A Symphony of Emerging Media Experiences.
Reference
Music in the video: Awake by Sappheiros | https://soundcloud.com/sappheirosmusic
Music promoted on https://www.chosic.com/free-music/all/
Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Special Thanks to...
Leon Eckert and Anna Greenspan, who guided me throughout this project and helped me finalize it🎉
My friends who participated in my data collection and offered suggestions as well as insights into my project🌹
My family who always support me and encourage me along the way💗
My precious NYU Shanghai IMA family for always providing encouragement to each other💜