Text-to-image generation in AI is a task where a machine learning model is trained to generate an image based on a given text description. The text description can include information about the object or scene in the image, such as its size, shape, color, and location. The goal of text-to-image generation is to generate an image that is visually similar to what the text describes. This task is challenging because it requires the model to understand the meaning of the text and be able to generate an image that corresponds to it. - ChatGPT
To study whether these images had any relationship to each other or were decided solely by the data on which the model was trained (the training data), and to find whether different apps (and hence different data sets) would generate different images, I set up this small experiment. These are 27 images generated by the AI app Dream from the seed words temple on a mountain, in my attempt to understand AI and how it performs. As we see, the aim of this exercise is to generate an image similar to the description given by the text.
This text-to-image generation uses the diffusion method. The potential is limitless, but we cannot predict what the output will be.
I wanted to check whether images generated from the same prompt words, Temple on a mountain, with other AI apps would be different. The analogy here is of a craftsman chipping at blocks of marble obtained from different geographical locations, arriving at different 3D carvings.
Learning 2 - the prompt words trigger different calculations on the image bank on which the basic image data set works (the craftsman uses a different tool set on the marble block)
Learning 3 - the different themes available in turn generate different images (each craftsman uses the tools in his own style)
If I were to use the same prompt words on a different app that has been trained on a different data set, the output would also differ.
The data set is like a black box; we do not know what will come out of it. That is, if we ask the app to repeat the generation with the same prompt words and the same style, the images are different. There is no firm clarity on the image output; it depends entirely on the data set on which the app has been trained.
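One reason repeat runs differ, beyond the black box of the training data, is that diffusion sampling begins from a latent image of pure random noise and then denoises it under the guidance of the prompt. A different starting noise leads to a different final image even for an identical prompt. The numpy sketch below is purely illustrative (the function and shapes are hypothetical, not any app's real API); it shows only the noise-initialisation step that makes repeat runs vary:

```python
import numpy as np

def generate_latent(prompt, seed=None):
    """Illustrative first step of diffusion sampling: the sampler starts
    from random noise, which a real model would then gradually denoise
    while being guided by the prompt. Only the noise draw is shown."""
    rng = np.random.default_rng(seed)
    # A real latent might have a shape like (4, 64, 64); a tiny 4x4
    # array is enough to illustrate the idea.
    return rng.standard_normal((4, 4))

# Same prompt, no fixed seed: the starting noise (and hence the final
# image) differs on every run.
a = generate_latent("temple on a mountain")
b = generate_latent("temple on a mountain")
print(np.allclose(a, b))   # False

# Same prompt with the same fixed seed: the starting noise is identical,
# so a deterministic sampler would reproduce the same image.
c = generate_latent("temple on a mountain", seed=42)
d = generate_latent("temple on a mountain", seed=42)
print(np.allclose(c, d))   # True
```

This is why some image tools let you fix a "seed": it pins down the random starting point, making an otherwise unpredictable generation repeatable.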
To give a more detailed view, I also generated images from the text prompt Temple on a mountain in another AI image generation app, Dalle 2e. The images were generated over the same Internet connection, on the same phone, within a window of 3 hours from 8-11 am (same internet traffic). It was a controlled experiment: only the apps were changed; all other conditions remained the same.
From ChatGPT -
Text-to-image generation can be useful in a variety of ways. Some examples include:
- Image creation: Text-to-image generation can be used to generate images from text descriptions, which can be useful for creating images for websites, advertisements, and other online content.
- Data augmentation: Text-to-image generation can be used to augment image datasets for training computer vision models, providing more diverse and realistic images for models to learn from.
- Text-based search: Text-to-image generation can be used to improve text-based image search, by generating images from text descriptions that can be visually matched to the text.
- Artistic expression: Text-to-image generation can be used to create new forms of digital art by allowing artists to create images from text-based descriptions.
- Assistive technology: Text-to-image generation can be used to assist visually impaired individuals by generating images from text-based descriptions of objects or scenes, allowing them to better understand their surroundings.
- Game development: Text-to-image generation can be used in game development to generate new scenes, characters, and environments based on text descriptions.
- Robotics: Text-to-image generation can be used in robotics to generate images that can be used to train robots to recognize objects and navigate environments.
--------------------
In the short experiment that I did, with many instances of image generation on two different AI image generation apps, using many different styles but the same prompt words, there is not much one can conclude, except that the variety is infinite and no two images are the same. I do not see any trend in the output images, or in the range and intensity of colours used.
In short, it is really marvelous to see the variety of images created by the different apps on the same prompt words TEMPLE ON A MOUNTAIN.
The only explanation we can offer, keeping aside the mathematics by which the images are generated, is that the output speaks volumes about the black box of data sets from which the images arise.
Other than for artistic expression and games, can we use such black-box techniques in real-life situations that affect the daily life of mankind as a whole?
In my view, Machine Learning is not an original and exact branch of science or mathematics, or even of programming neural networks, but a fair statistical (and often inaccurate) approximation of the original cognitive thinking ability of humans, based on past data. It is not an exact science, but an applied one.
George.