# AR : Placing 3D Object in 2D picture using picked points

Hi everybody!

I'm working on an Augmented Reality problem that involves math too complex for me.
To be honest, I'm not sure there's a solution to this problem.
Any help is of course welcome (even if this help consists of telling me "Impossible, you cannot go from N dimensions to N+1 dimensions").

I would like to place a 3D object on a photo uploaded by a user.

My intuition is that if the user places the origin and the axes of the coordinate system on the photo, and specifies for each axis a "real" length taken from a known reference, it could be possible to determine the sequence of transformations that led to such a projection of the 3D coordinate system.
From there, I could deduce the position of the corresponding camera, etc., to get the correct parameters to simulate the presence of a 3D object in the picture.

This could of course only work if the user is able to draw the 3 axes and define their real length on the picture.
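
To make the idea concrete, here is a minimal sketch of the forward model that would have to be inverted: a pinhole camera projecting the origin and the three axis tips onto the picture. The function names, the rotation/translation layout, and the convention that image coordinates are measured from the picture centre are assumptions for illustration, not something taken from the playground.

```javascript
// Pinhole projection sketch (assumed convention: the camera looks down +Z and
// image coordinates are measured from the picture centre, in pixels).

// Apply a 3x3 rotation (array of 3 rows) and a translation to a 3D point.
function toCameraSpace(point, rotation, translation) {
  return rotation.map(
    (row, r) => row[0] * point[0] + row[1] * point[1] + row[2] * point[2] + translation[r]
  );
}

// Project a 3D point into the image, with the focal length given in pixels.
function projectPoint(point, rotation, translation, focalLength) {
  const [x, y, z] = toCameraSpace(point, rotation, translation);
  return [(focalLength * x) / z, (focalLength * y) / z];
}

// The four points the user would pick: the origin plus one tip per axis.
function axisPoints(lengthX, lengthY, lengthZ) {
  return [
    [0, 0, 0],
    [lengthX, 0, 0],
    [0, lengthY, 0],
    [0, 0, lengthZ],
  ];
}
```

For example, with an identity rotation, a translation of `[0, 0, 5]` and a focal length of 1000, `projectPoint([1, 0, 0], ...)` returns `[200, 0]`. The problem described above is the reverse: given the four picked image points and the axis lengths, find the rotation and translation.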

Here is a nice drawing of the goal I would like to achieve :

The user has drawn the 3 axes on the picture and, thanks to known distances, has given each of them a real length.

The magic of math goes through there...

We are able to draw a 3D object in this scene : here is a perfectly drawn unit cube on the XOZ plane😁

Here is a PG that reproduces the first part (the data entered by the user is hard-coded):
https://playground.babylonjs.com/#2XZ6M5
I don't own that background, I just needed an image to show the problem.

PS: I know that the field of view of the camera has to be taken into account as well, but I think a simple slider may allow the user to change it to match the picture's FOV.
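
A small sketch of how such a slider could relate to the projection math: the field of view and the focal length in pixels are two ways of expressing the same quantity. This assumes the FOV is the vertical field of view in radians (which is my understanding of the default meaning of `camera.fov` in Babylon.js); the function names and slider bounds are made up for the example.

```javascript
// Vertical field of view (radians) -> focal length in pixels.
function focalLengthFromFov(fovRadians, imageHeightPx) {
  return imageHeightPx / 2 / Math.tan(fovRadians / 2);
}

// Focal length in pixels -> vertical field of view (radians).
function fovFromFocalLength(focalLengthPx, imageHeightPx) {
  return 2 * Math.atan(imageHeightPx / 2 / focalLengthPx);
}

// Map a slider position t in [0, 1] to a field of view between two bounds.
function fovFromSlider(t, minFov, maxFov) {
  return minFov + t * (maxFov - minFov);
}
```

A 90 degree vertical FOV on a 1000 px tall picture corresponds to a focal length of roughly 500 px. The slider value could then be written to the camera's FOV and, at the same time, converted to a focal length for any pose computation.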

Edited by Amarth2Estel

##### Share on other sites

Hiya A2E.  I don't have any fresh knowledge, but isn't this the idea behind AR (augmented reality)?  There has been some talk on the forum about that... were you able to find it?

Yes, AR uses live cam video (not stills) with inserted 3D... but somewhat related, I would think.  Not sure.

Anyway, I saw no reference to "AR" in your message, so I wanted to make sure that you knew the term, so you could do some forum/web searches for more info.

Cool project!  Challenging project, too.  I hope you get more responses, and/or make useful discoveries.

I was once a primary candidate for a gov job that did these things... and aspect ratio was very important.  In this job, they were building walls along busy highways that ran thru urban neighborhoods (noise reduction)... and they wanted to show neighborhoods WHAT the wall WOULD look like, from their side, after completion (a NIMBY consideration, Not In My Back Yard).

A still-pending-at-the-time marijuana possession case... took me out-of the running.

It would be handy if cameras had (laser) range-finders (distance measuring equipment) on-board... and the cameras could/would package that data WITH the picture.  (picture files that also contain spatialization data).  Hopefully, that will become commonplace in the future.  Oil-drilling speculators "thump" the ground and then record reflected sound... to produce a 3D spatial map of the ground below.  I hope the future cameras won't need to "thump" things... to gather their spatialization data.  Picture-taking would become a much louder activity, and likely require some large amplifiers or percussive explosives.

##### Share on other sites

Hi Wingnut !

Yes, this is an augmented reality project and you are completely right, I should have mentioned it. I will edit my post.
But, from what I saw, it seems that most AR apps use a video stream and a marker with a specific pattern. Using pattern recognition, the marker can be found... and the magic of math happens!
I won't use a specific marker because I cannot ask users to print a carpet-wide pattern and because I only want to use static images. However, I am well aware of the relation between these two approaches.

I found an algorithm named "POSIT" which seems to fit my needs and data. It seems to compute the pose of the camera from points picked in 2D. It exists for non-coplanar points (the case of my problem) and coplanar points (markers with specific patterns).
Video
Source code
Maths and algorithm alternatives
I will go on eating math all day until I find an answer! I will post it here!
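
As a rough illustration of what a POSIT-style iteration looks like for this exact setup, here is a simplified sketch for four points only: the origin and the tips of three perpendicular axes of known lengths. It is my own reading of the general idea (start from a scaled orthographic guess, then refine a perspective correction per point), not the linked source code. It assumes image points are given in pixels relative to the picture centre, that the focal length in pixels is known, and that the camera looks down +Z. All names are hypothetical.

```javascript
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const norm = (a) => Math.sqrt(dot(a, a));
const scale = (a, s) => [a[0] * s, a[1] * s, a[2] * s];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// imagePoints: [origin, xTip, yTip, zTip], each [u, v] in pixels.
// axisLengths: [lengthX, lengthY, lengthZ] in real-world units.
function estimatePose(imagePoints, axisLengths, focalLength, iterations = 20) {
  const [origin, ...tips] = imagePoints;
  let eps = [0, 0, 0]; // perspective correction per axis tip, refined each pass
  let pose = null;
  for (let n = 0; n < iterations; n++) {
    // The object vectors are axis-aligned, so the object matrix is diagonal
    // and inverting it is just a division by each axis length.
    const I = tips.map((p, a) => (p[0] * (1 + eps[a]) - origin[0]) / axisLengths[a]);
    const J = tips.map((p, a) => (p[1] * (1 + eps[a]) - origin[1]) / axisLengths[a]);
    const s = (norm(I) + norm(J)) / 2; // image scale = focalLength / depth
    const i = scale(I, 1 / norm(I));
    const j = scale(J, 1 / norm(J));
    const kRaw = cross(i, j);
    const k = scale(kRaw, 1 / norm(kRaw));
    const depth = focalLength / s;
    // Depth of each tip relative to the origin, as a fraction of the origin depth.
    eps = axisLengths.map((len, a) => (len * k[a]) / depth);
    pose = {
      rotation: [i, j, k], // rows: camera X, Y, Z axes expressed in object coordinates
      translation: [origin[0] / s, origin[1] / s, depth], // origin in camera space
    };
  }
  return pose;
}
```

The returned `rotation` and `translation` map an object-space point `M` to camera space as `rotation * M + translation`. Converting that to a Babylon.js camera would still require handling the engine's axis conventions, which this sketch leaves out.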

I completely agree with you about spatialization data. I hope it will become commonplace as soon as possible !

##### Share on other sites

You're so kind and gracious A2E... it's a pleasure to converse with you.  Highly intelligent, too.  Keep hanging around with us and don't ever leave, ok?  thx.

I'm not sure what your end-goal is, but, is there any chance for... "Hey user, keep adjusting these sliders until this added 3D 1x1x1 meter box... looks good in your picture."

After they do this, and press the OK button, that would store those settings... possibly usable for placement of additional 3D objects.

But still, those values are only for the INITIAL fov/scale/etc... of future-added 3D objects.  Perhaps you can provide dragging/sliders after placing each ADDITIONAL 3D object, too... and store THOSE with each added object, as well.

There's one sure way to ensure all the 3D objects... look good/right to the user.  Let THEM make it look good/right.  *shrug*

Is that possible in your project/hopes?  I suppose you could store a "display info" object in each mesh.metadata property.... stored/refreshed each time the user presses the "done editing" button.  I'm not sure if/when necessary.  For example, storing a camera.fov setting ON an object... is quite useless and unnecessary, but not a .requiredFOV.

It starts to venture into smart 3D objects... which KNOW what picture URL they are being used-with, and carry a database of URL-specific/keyed settings... for each picture URL they ever get used-with.  The 3D objects know how to set the correct look per any picture that the user stores on that object.

For example, Wingnut's 3D vase.  In vase.metadata object, there might be 3 data records... one record for when the vase is being displayed in Wingy's bathroom picture, another record for when it is being displayed in Wingy's living room picture, and a third record to "composite" it into my bedroom picture.  (sorry for the over-expounding).  Just thinkin', and likely not very well.
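
A minimal sketch of that per-picture record idea, assuming the object exposes a free-form `metadata` property (as Babylon.js meshes do). The `placements` key, the function names, and the fields inside each record are invented for the example.

```javascript
// Store one settings record per picture URL on the object's own metadata.
function savePlacement(mesh, pictureUrl, settings) {
  mesh.metadata = mesh.metadata || {};
  mesh.metadata.placements = mesh.metadata.placements || {};
  // Copy the settings so later slider changes do not silently alter the record.
  mesh.metadata.placements[pictureUrl] = { ...settings };
}

// Return the record saved for this picture, or null if there is none yet.
function loadPlacement(mesh, pictureUrl) {
  const placements = (mesh.metadata && mesh.metadata.placements) || {};
  return placements[pictureUrl] || null;
}

// Usage: one record per room picture, written when the user presses "done editing".
const vase = { metadata: null }; // stand-in for a real mesh
savePlacement(vase, "bathroom.jpg", { requiredFOV: 0.8, scaling: 1.2 });
savePlacement(vase, "livingroom.jpg", { requiredFOV: 0.6, scaling: 0.9 });
```

When the vase is dropped into a picture it has seen before, `loadPlacement` gives back the saved values to use as the starting point for the sliders.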

I imagine lights are a similar issue, even though not rendered.  .emissiveColor might be able to handle the object lighting, via self-illumination.

I went "out there" looking at room pictures, and thought about the issues.  During my tour, "let the user adjust the 3D object"... kept being whispered in my ear.

I think my dog was doing the whispering, cuz I felt a little tongue here and there, too.

##### Share on other sites

You don't need more info than the corners of the room. This is more than enough info to calculate an accurate tracking of your environment.

DB

##### Share on other sites

Had a play around and produced this result using a method that may be developable, but at the moment it is a bit fiddly and not very user friendly.

Here is the playground I used https://playground.babylonjs.com/#2XZ6M5#1

Method

Create two planes perpendicular to each other for walls.

Use an ArcRotateCamera, with initial target (0, 0, 0) and radius 100, alpha -Math.PI/2 and beta Math.PI/2.

Rotate the camera until the bottoms of both planes are nearly parallel to the wall in the image.

Read the alpha value from the console output and substitute it for the alpha value in the ArcRotateCamera constructor.

Play around with the y value of the camera target, and the y and x values of the wall position, until the join of the two planes lies on the image as close to correct as possible.

Create a box that has the transformNode as a parent, using these values to place the box in the corner of the two planes at floor level.

```
box.position.x = width / 2 - size / 2;
box.position.y = -height / 2 + size / 2;
box.position.z = -size / 2;
```

Subtract from the x and z values to move the box into the "room".
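
The corner placement and the "move into the room" step can be wrapped in one small helper. This is only a sketch built from the three assignments above; `insetX` and `insetZ` are hypothetical names for the amounts subtracted from x and z.

```javascript
// Position of a box of edge `size` sitting on the floor in the corner formed by
// two perpendicular walls of the given width and height.
// insetX / insetZ pull the box away from the walls and into the room.
function cornerBoxPosition(width, height, size, insetX = 0, insetZ = 0) {
  return {
    x: width / 2 - size / 2 - insetX,
    y: -height / 2 + size / 2,
    z: -size / 2 - insetZ,
  };
}
```

For a wall 10 wide and 4 high with a unit box, `cornerBoxPosition(10, 4, 1)` gives `{ x: 4.5, y: -1.5, z: -0.5 }`. The three fields can then be copied onto `box.position`, and the two insets are natural candidates for GUI sliders.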

Whether GUI sliders would make the operation easier is yet to be determined.