Michelle Dong, Caden Cheng, Julia Jin, David Wan
inVISION is a lightweight 3D visualizer that helps you envision what an object portrayed in 2D looks like in 3D. It works in three steps: first, inVISION takes a photo as input on its website; then, the program renders the object in 3D in our modeling software; finally, the user can rotate and scale the object with hand gestures to truly visualize it. Our vision for the project is for it to take in a photo, drawing, or CAD file and bring those ideas into reality through AR or hologram technology, producing a projection that can be manipulated and edited, allowing it to be used for a variety of purposes: art, engineering, interior design, film and entertainment, and more.
We originally intended for our project to serve as a tool for artists: inVISION would act as a portable 3D reference tool. By taking a photo of a person, object, or scene, our model would inVISION that scene in 3D so artists can practice drawing different perspectives from an easily portable reference, rather than attempting to recapture the same scene over and over again. For people with aphantasia, the inability to visualize images mentally, inVISION can also help map out their ideas in a more concrete manner.
inVISION serves as a way to visualize anything you want. When decorating your house, for example, furniture can be difficult to move around for a number of reasons; inVISION lets you envision what each room would look like with the furniture in the locations you imagined, without having to move the objects around constantly. For fashion, too, whether you just like to dress up or you are a designer working on a new design, inVISION can help you conceptualize how different pieces of clothing look together, or decide whether an idea for upcycling a garment would work out. Because the alterations are simulated, you avoid the permanence and irreversibility of physically altering the clothing before you are satisfied with the design.
Imagine projecting a hologram of the design you envisioned, or using AR to see it in reality. inVISION could also help direct construction, such as building a chair, by displaying where each part should be placed or attached and providing audio instructions alongside. We envision inVISION serving as a 3D building tutorial with audio.
In the future, we were thinking of adding voice recognition to our model. That way, users who cannot use their hands or arms (or who simply prefer it for convenience) could tell the program to "Rotate the object left... a little more." We would also incorporate lip-reading technology as an accessibility feature for those who can neither talk nor use their hands and arms.
We also wanted to incorporate inVISION into AR/VR technology and add physics and time simulations to our model. Doing so would let us test what is normally expensive in a cost-efficient way (e.g., sending a rocket into space), imagine what a creative vision looks like in reality, or view a scene as a whole, to scale, in VR. From an engineering perspective, inVISION could help map out city plans and facilitate building construction: it would create models from blueprints so an idea can truly be envisioned in 3D, letting the user easily see the project from every angle and make changes to it, like a better, 3D-native CAD design tool and visualizer. Another idea we had was to integrate inVISION into a navigation system: instead of glancing at the screen in your car to navigate, you would wear AR glasses while driving, and inVISION would overlay arrows on the road and verbally announce the turn as you approach the target intersection.
Furthermore, mind-reading and thought-detection technology continues to evolve; we would also like to incorporate inVISION into such technology in the future so that users can visualize what they are envisioning in their minds with a single thought. This would not only be useful to those with aphantasia, but would also make the interface even more user-friendly and convenient.
Currently, the only well-known existing 3D visualizer programs are Blender and CAD modeling software such as Onshape. inVISION is not merely a copy of such programs: it seeks to combine traditional CAD engineering, art visualization, and a physics-and-time experimental simulator in an efficient, low-cost operation that is much easier to learn and understand than Blender or Onshape, since it is navigated using hand gestures. Furthermore, given ample time and resources to fully realize our vision for this project, inVISION would be applicable and useful across many different fields, whereas Blender is mainly for art and CAD is targeted at engineering design. Our program is also efficient, generating 3D models in a short processing time and running on our Mac CPUs, compared to Blender, which requires high processing power, or other interactive hologram technology, which at present is very costly.
Our system is made up of three key components: AI generation of 3D meshes, a custom 3D renderer, and custom hand controls. Our project pipeline combines all three by taking a single photo of an object, generating a 3D mesh of the object from it, and then displaying that mesh in an extremely lightweight 3D renderer that the user can interact with through gesture controls.
For our project, we wanted to highlight the near-endless possibilities for entertainment and curiosity that our project could provide by allowing a user to visualize almost anything they want in three dimensions. To achieve this goal, we looked into using AI to generate 3D meshes from image data. Online, we found a few models that were able to generate rough 3D meshes from 2D images. Unfortunately, the AI-generated 3D meshes tended to be inconsistent, which necessitated postprocessing to make them look accurate. During the hackathon, we looked into solving this problem by designing a new pipeline that uses web scraping to find multiple perspectives of the given object, and then exploits those different perspectives to make the 3D mesh as accurate as possible. We believe web scraping is better suited to this task than other AI-centered avenues because it harnesses the already massive scale of the internet; a purely AI approach would essentially require creating a new dataset covering every object that exists. Web scraping circumvents this problem, making our overall architecture more cost-efficient. In the future, we want to continue exploring this potential solution to AI-based mesh generation to allow our users to bring whatever vision they may have into reality.
Our custom 3D renderer with hand controls is made entirely in Python. The 3D renderer is built with Pygame, and the hand controls combine OpenCV, Mediapipe, and a multitude of matrix algebra calculations done with NumPy.
We built the first version of the custom 3D renderer by simply running hand keypoint detection with Mediapipe and OpenCV. We added these baseline controls to a simple wireframe 3D visualizer we made with Pygame, one that could render vertices and vectors.
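The projection step at the heart of such a wireframe visualizer can be sketched with a simple pinhole camera model. The function and parameter names below are illustrative assumptions, not our actual code:

```python
import numpy as np

def project_vertices(vertices, focal_length=500.0, camera_distance=5.0,
                     screen_center=(400, 300)):
    """Project 3D vertices onto 2D screen coordinates (pinhole model).

    Illustrative sketch of the math a Pygame wireframe renderer needs;
    the parameter values are placeholder calibration, not tuned ones.
    """
    vertices = np.asarray(vertices, dtype=float)
    z = vertices[:, 2] + camera_distance            # push the object in front of the camera
    scale = focal_length / z                        # perspective divide: nearer -> larger
    x = vertices[:, 0] * scale + screen_center[0]
    y = -vertices[:, 1] * scale + screen_center[1]  # flip y: screen y grows downward
    return np.stack([x, y], axis=1)

# Project a unit cube's eight corners to screen space:
cube = [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
points_2d = project_vertices(cube)
```

Pygame can then draw lines between the projected points to produce the wireframe.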
To further develop our idea, we repurposed the wireframe visualizer to render the edges of triangles (three-point polygons) instead of plain lines (two-point polygons), allowing us to render basic .obj files in greater depth than before.
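A minimal sketch of what parsing triangle faces from an .obj file involves (real .obj files also carry normals, texture coordinates, and polygons with more than three vertices, which this sketch ignores):

```python
def load_obj(text):
    """Parse vertex ('v') and triangular face ('f') records from .obj text."""
    vertices, triangles = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "f" and len(parts) == 4:
            # Face entries may look like "3/1/2"; keep only the vertex index.
            # .obj indices are 1-based, so shift them to 0-based.
            triangles.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return vertices, triangles

sample = """\
v 0 0 0
v 1 0 0
v 0 1 0
f 1 2 3
"""
verts, tris = load_obj(sample)
# Each triangle contributes three edges to the wireframe:
edges = {tuple(sorted((t[i], t[(i + 1) % 3]))) for t in tris for i in range(3)}
```

Deduplicating edges this way avoids drawing shared triangle edges twice.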
For the true hand controls, we tested many control schemes and eventually settled on one that felt intuitive: scale is controlled by the distance between two fingers, and rotation is controlled by the overall orientation of the hand. The user can imagine their palm as a camera, where the roll, pitch, and yaw of the hand determine the angle the object is viewed from. Lastly, we added hand gestures to swap between modes, as well as controls to pause and reset the render. By making a highly responsive and intuitive control scheme, we hope that almost anyone can use the system we built as both a tool and a toy.