In one aspect, images of a person's interactions with images presented on a display are captured. The person's interactions are segmented from the presented images in the captured images. A multimedia data object is generated. The multimedia data object includes a presentation media object containing digital representations of the presented images, an image presence media object containing the segmented interactions of the person, and at least one link synchronizing the presentation media object and the image presence media object.
1. A method, comprising:
capturing images of a person's interactions with images presented on a display;
segmenting the person's interactions from the presented images in the captured images; and
generating a multimedia data object comprising a presentation media object containing digital representations of the presented images, an image presence media object containing the segmented interactions of the person, and at least one link synchronizing the presentation media object and the image presence media object.
23. A system, comprising:
an image recording device operable to capture images of a person's interactions with images presented on a display; and
a processing system operable to
segment the person's interactions from the presented images in the captured images, and
generate a multimedia data object comprising a presentation media object containing digital representations of the presented images, an image presence media object containing the segmented interactions of the person, and at least one link synchronizing the presentation media object and the image presence media object.
15. A machine-readable medium storing machine-readable instructions for causing a machine to perform operations comprising:
capturing images of a person's interactions with images presented on a display;
segmenting the person's interactions from the presented images in the captured images; and
generating a multimedia data object comprising a presentation media object containing digital representations of the presented images, an image presence media object containing the segmented interactions of the person, and at least one link synchronizing the presentation media object and the image presence media object.
Dependent claims 2-5 and 7-14 recite further limitations of the method; dependent claims 16-22 recite further limitations of the machine-readable medium; and dependent claims 24-31 recite further limitations of the system.
A person may interact with image-based media in a variety of different ways. For example, one common way for a person to convey information to others is by interacting with images (e.g., slides of a slideshow presentation) that are presented on a display. In some cases, the images may be presented by a computer-controlled projection system that includes a computer that generates image data and a projector that projects the image data onto a projection screen. A person may interact with the projected images by pointing to notable areas of the projected images with a finger, laser pointer, or some other pointing device or instrument.
During a presentation, a person's interactions with the presented images augment the presentation with additional information. For example, spoken words, together with gestures pointing out particular areas of interest in the presented images, provide additional context and meaning that expands the total information conveyed beyond the information summarized in the presented images. This additional information may be captured in video and audio recordings of the person's interactions with the presented images. These recordings, however, have several drawbacks. For example, the resolution of the video recording may be insufficient to allow users to discern features in the projected images. In addition, portions of various ones of the presented images may be obscured by the person's body during his or her interactions with the images.
Rather than passively record a person's interactions with image-based media, other systems have been designed to interpret a person's interactions with images that are presented on a display. For example, some presentation systems include a camera that captures images of a person's interactions with the images that are presented on a display. The presentation system processes the captured images to determine the location of a particular object (e.g., a finger, a hand, or a pointing device) or to determine the intended meanings associated with particular movements, gestures, or configurations of the person's body. The results of these interpretations may be used to control the presentation on the display.
In one aspect, the invention features a method in accordance with which images of a person's interactions with images presented on a display are captured. The person's interactions are segmented from the presented images in the captured images. A multimedia data object is generated. The multimedia data object includes a presentation media object containing digital representations of the presented images, an image presence media object containing the segmented interactions of the person, and at least one link synchronizing the presentation media object and the image presence media object.
The invention also features a machine-readable medium storing machine-readable instructions for causing a machine to implement the above-described method and a system for implementing the above-described method.
Other features and advantages of the invention will become apparent from the following description, including the drawings and the claims.
In the following description, like reference numbers are used to identify like elements. Furthermore, the drawings are intended to illustrate major features of exemplary embodiments in a diagrammatic manner. The drawings are not intended to depict every feature of actual embodiments nor relative dimensions of the depicted elements, and are not drawn to scale.
The embodiments that are described in detail below enable a person's interactions with image-based media to be captured and his or her presence with respect to the image-based media to be flexibly encapsulated in a multimedia data object that may be exploited in a wide variety of different application environments to enhance a user's experience with the image-based media. Some embodiments encapsulate a person's interactions that augment image-based media with visual and verbal annotations in a multimedia data object that preserves the person's interactions with the image-based media without losing any information contained in the image-based media.
Other embodiments allow a user to experience the encapsulated presence of the person in a remote setting, thereby enabling an enhanced joint interaction and collaboration between the person and the user with the image-based media. For example, in some implementations, a multimedia data object encapsulating the presence of a local person is generated quickly enough that a remote user watching a presentation of the multimedia object can send feedback to the local person in real time. This feedback can either be in the form of changes to the digital media object that reflect the interactions of the remote user with an input device (e.g., a computer) or in the form of an object encapsulating the presence of the remote user that is combined with the original multimedia object into a joint multimedia object that may be presented to the local person.
The display 12 may be any type of display that is capable of presenting images, including a light-emitting display and a light-reflecting display. Among the exemplary types of light-emitting displays are LED-based display screens and plasma display screens. Among the exemplary types of light-reflecting displays are projection screens, which are designed to reflect light that is projected from one or more light projectors. In the illustrated embodiment, the display 12 is a remote-controlled light-emitting display. In other embodiments, the display may be a projection screen that is arranged to receive images that are projected from one or more remote-controlled light projectors.
The image recording device 14 may be any type of imaging device, including a computer-controllable digital camera and a video camera. USB video cameras or “webcams” generally capture images at 30 fps (frames per second) and a resolution of 320×240, while FireWire video cameras can capture at higher frame rates and/or resolutions. The image recording device 14 typically remains fixed in place and is oriented toward the display 12. Some embodiments may include more than one image recording device.
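As an illustrative sketch only (assuming OpenCV and a camera exposed as device index 0, neither of which is required by these embodiments), frames might be pulled from such a camera as follows:

```python
import cv2  # OpenCV; an assumption, not mandated by these embodiments

# Open the first attached camera (e.g., a USB "webcam") and request the
# 320x240 resolution mentioned above. The device index and resolution are
# illustrative placeholders.
camera = cv2.VideoCapture(0)
camera.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

frames = []
for _ in range(30):  # roughly one second of video at 30 fps
    ok, frame = camera.read()
    if not ok:
        break
    frames.append(frame)

camera.release()
```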
The audio recording device 16 may be any type of device that is capable of recording sounds that are produced in the vicinity of the display 12, including wired and wireless microphones. The audio recording device 16 may be fixed in position or it may be attached to the person 20. Some embodiments may include more than one audio recording device.
The data processing and control unit 18 may be implemented by any type of processing system that is capable of (1) choreographing the presentation of images on the display 12 with the capture of images by the image recording device 14, (2) receiving image and audio data from the image recording device 14 and the audio recording device 16, and (3) generating from the received data a multimedia data object that encapsulates the person's interactions with images presented on the display 12 together with the one or more presented images in a multimedia data object, as described in detail below. In some embodiments, the data processing and control unit 18 is implemented by a computer (e.g., a workstation computer, a desktop computer, or a laptop computer).
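The description below refers to several functional modules of the data processing and control unit 18 (an image presentation module 40, an image capture module 42, an audio capture module 44, an image segmentation module 46, a multimedia object generation module 48, a calibration module 50, and a data store 52). The following skeleton is a hypothetical sketch of how those responsibilities might be grouped in software; the class and method names are illustrative only.

```python
class DataProcessingAndControlUnit:
    """Hypothetical grouping of the modules described in this document."""

    def __init__(self):
        self.data_store = {}  # data store 52: captured images, audio, calibration data

    def present_images(self, image_data):
        """Image presentation module 40: send image and control data to the display 12."""

    def capture_images(self):
        """Image capture module 42: command the image recording device 14 and store frames."""

    def capture_audio(self):
        """Audio capture module 44: store recordings from the audio recording device 16."""

    def calibrate(self):
        """Calibration module 50: compute the correspondence mapping F(x, y) = (u, v)."""

    def segment_interactions(self, captured_image, warped_image):
        """Image segmentation module 46: separate the person's interactions."""

    def generate_multimedia_object(self):
        """Multimedia object generation module 48: assemble the multimedia data object."""
```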
In operation, if the system 10 already has been calibrated (block 54), the data processing and control unit 18 proceeds by executing the image capture process of block 58.
The calibration module 50 generates a correspondence mapping F(x,y)=(u,v), which maps the calibration object locations (x,y) in either the plane of the display 12 (when the display is a light-emitting display) or the projection plane of a projector (when the display is a light-reflecting display) to corresponding calibration object locations (u,v) of the images captured at the capture plane of the image recording device 14. The locations of the calibration object in the display plane or the projection plane are determined from the image data that are used to generate the presented images. The locations of the calibration object in the captured images are determined by searching for the calibration object in a predetermined area of the captured images corresponding to the display 12. The predetermined display area may be determined in a variety of different ways. In one approach, a known test pattern is presented on the display, a perspective transform is computed based on the known test pattern, and a correspondence mapping is derived from the computed perspective transform. The predetermined display area is determined from the correspondence mapping.
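One possible realization of the correspondence mapping F(x,y)=(u,v) is sketched below. It assumes OpenCV, assumes the display-to-camera geometry is well modeled by a planar perspective transform (homography), and uses placeholder point coordinates; none of these choices is prescribed by the embodiments described here.

```python
import cv2
import numpy as np

# Matched calibration-object locations: (x, y) in the display/projection plane
# and (u, v) in the capture plane of the image recording device 14.
# The specific coordinates below are placeholders.
display_pts = np.float32([[0, 0], [1024, 0], [1024, 768], [0, 768]])
capture_pts = np.float32([[112, 80], [530, 92], [522, 410], [105, 398]])

# Perspective transform (homography) from the display plane to the capture plane.
H, _ = cv2.findHomography(display_pts, capture_pts)

def F(x, y):
    """Correspondence mapping F(x, y) = (u, v) into the capture plane."""
    pt = np.float32([[[x, y]]])
    u, v = cv2.perspectiveTransform(pt, H)[0, 0]
    return u, v

# The predetermined display area in the captured images is then the image of
# the display's corners under F.
display_area = [F(x, y) for (x, y) in display_pts]
```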
After the system 10 has been calibrated, the data processing and control unit 18 captures images of the person's interactions with images that are presented on the display 12 (block 58). In this process, the image presentation module 40 transmits image data and control data for presentation on the display and the image capture module 42 transmits to the image recording device 14 control data that causes the image recording device 14 to begin recording images of a scene that encompasses the display 12. The image recording device 14 transmits the recorded images to the image capture module 42, which stores the received image recordings in the data store 52. During this time, the audio recording device 16 transmits recordings of the sounds captured in the vicinity of the display 12 to the audio capture module 44, which stores the received audio recordings in the data store 52.
In general, the image segmentation module 46 determines the parts of the display in the captured images that are occluded (e.g., by the person or an object carried by the person). In one embodiment, the image segmentation module 46 compares ones of the captured images with corresponding ones of the warped images (block 72). In this process, the image segmentation module 46 compares coordinate regions of one or more pixels in the warped images 68 in the capture plane 70 to corresponding coordinate regions in the predetermined display area 74 in the corresponding captured images 76. The image segmentation module 46 identifies the person's interactions in the captured images 76 as portions of the captured images 76 that are different from comparable portions of corresponding ones of the warped images (block 78). In some implementations, the image segmentation module 46 computes the magnitude of the difference δ(u_i, v_i) between corresponding intensity values in the warped images 68 and the captured images 76. That is,
δ(u_i, v_i) = ∥warped_image(u_i, v_i) − captured_image(u_i, v_i)∥  (1)
The coordinate regions in the captured images 76 that are associated with difference values that exceed a threshold are identified as part of the person's interactions. The threshold typically is a constant and may be determined based on the lighting conditions, the nature of the presented images, and the parameters of the image recording device 14.
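A minimal sketch of this comparison and thresholding, assuming NumPy/OpenCV and reusing the hypothetical homography H from the calibration sketch above, might look like this; the threshold value is an assumption.

```python
import cv2
import numpy as np

THRESHOLD = 40.0  # assumed constant; in practice chosen from lighting, content, and camera parameters

def segment_interactions(display_image, captured_image, H):
    """Return an occlusion mask and the segmented person's interactions."""
    h, w = captured_image.shape[:2]

    # Warp the presented image into the capture plane so that it can be
    # compared pixel-for-pixel with the captured image (Equation (1)).
    warped = cv2.warpPerspective(display_image, H, (w, h))

    # delta(u_i, v_i) = || warped_image(u_i, v_i) - captured_image(u_i, v_i) ||
    delta = np.linalg.norm(
        warped.astype(np.float32) - captured_image.astype(np.float32), axis=2)

    # Regions whose difference exceeds the threshold are treated as occluded by
    # the person (or by an object carried by the person). In practice the
    # comparison would be restricted to the predetermined display area 74.
    mask = delta > THRESHOLD

    segmented = np.zeros_like(captured_image)
    segmented[mask] = captured_image[mask]
    return mask, segmented
```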
The multimedia object generation module 48 incorporates into a multimedia data object 84 a presentation media object 88 containing digital representations of the presented images, an image presence media object 90 containing the segmented interactions of the person, and, in some embodiments, an audio presence media object 92 containing a digital representation of the captured audio recordings. The multimedia data object 84 additionally includes at least one link for synchronizing the presentation media object 88, the image presence media object 90, and (if present) the audio presence media object 92. The synchronization links choreograph the rendering of these objects with respect to a common time reference. In some implementations, the time reference may be provided by synchronized timestamps that are associated with the constituent elements of the objects 88, 90, 92.
In some embodiments, the multimedia data object 84 may be a SMIL 2.0 (Synchronized Multimedia Integration Language) file that contains links to the presentation media object 88, the image presence media object 90, and the audio presence media object 92, as well as data specifying the rendering parameters for these objects 88, 90, 92 and indications of relative rendering locations and times of these objects 88, 90, 92. In other embodiments, the multimedia data object 84 may be in a streaming file format that includes multiple synchronized, overlaid, replayable bitstreams representing the real-time presentation of the images on the display 12, the segmented person's interactions, and the audio recordings captured by the audio recording device 16.
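By way of illustration only, the following sketch (using Python's standard xml.etree library) writes a minimal SMIL-style file that links three hypothetical media files and schedules them in parallel on a shared timeline. The file names, durations, and the exact SMIL layout are assumptions, not an encoding required by these embodiments.

```python
import xml.etree.ElementTree as ET

# Build a minimal SMIL 2.0 document that links the presentation media object,
# the image presence media object, and the audio presence media object and
# plays them in parallel against a common timeline.
SMIL_NS = "http://www.w3.org/2001/SMIL20/Language"
smil = ET.Element("smil", xmlns=SMIL_NS)
body = ET.SubElement(smil, "body")
par = ET.SubElement(body, "par")  # render the three objects in parallel

# Hypothetical file names and timing values for the three media objects.
ET.SubElement(par, "img", src="presentation_media.png", begin="0s", dur="120s")
ET.SubElement(par, "video", src="image_presence.mpg", begin="0s", dur="120s")
ET.SubElement(par, "audio", src="audio_presence.wav", begin="0s", dur="120s")

ET.ElementTree(smil).write("multimedia_data_object.smil",
                           xml_declaration=True, encoding="utf-8")
```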
As explained above, the multimedia data object 84 may be used in a wide variety of different application environments to enhance a user's experience with the image-based media. For example, by embedding the local user's interactions within the multimedia data object 84 in real time, some implementations allow a remote user to interact directly with a local user either by 1) directly interacting with the digital media via a mouse and keyboard or 2) using the same technology to embed himself/herself in the same multimedia data object 84 and interacting with the digital version of the local user.
In the first scenario, the remote and local users may be a teacher and a student, or a computer support technician and a customer, respectively. In the first case, the teacher may appear live overlaid on top of a computerized lesson plan in the presentation to the student. In this presentation, the teacher may lead the student through the lesson by indicating which buttons to press or indicating the correct answer if the student chose the wrong answer via his/her mouse. In the second case, the technical support person may appear live overlaid on top of the customer's desktop in the presentation to the customer. In this presentation, the technical support person may, for example, lead the customer to start up his/her web browser, download a driver, and install it correctly.
In the second scenario, an implementation of the image-based media interaction capture system 10 may be provided in each of the local and remote locations. In one exemplary application, a presenter in the local location may give a presentation to one or more persons in the remote location. A person in the remote location who wants to question the presenter, for example, may stand up in front of the display in the remote location, ask his/her question, and supplement the question with interactions with the current presentation. The remote image-based media interaction capture system encapsulates the questioner's presence in a multimedia data object and sends the multimedia data object for presentation to the presenter in the local location. The local image-based media interaction capture system may encapsulate the presenter's response to the remote person's question in a multimedia data object and send the multimedia data object for presentation at the remote location, whereby the questioner receives the answer to his/her question.
The image-based media interaction capture system 100 captures the person's interactions with the projected images 101 and encapsulates these interactions along with the presented images 101 in a multimedia data object 104. In this process, the image segmentation module 46 segments the person's interactions from the images captured by the image recording device 14. The multimedia object generation module 48 incorporates into the multimedia data object 104 a presentation media object 106 that contains digital representations of the presented images, an image presence media object 108 that contains the segmented interactions with the presented images 101, and an audio presence media object 110 that contains a digital representation of the audio recordings captured by the audio recording device 16.
The resulting multimedia data object 104 may be rendered in a wide variety of different local and remote settings. In some implementations, the presentation media object 106 preserves the original resolution of the presented images. In these implementations, the multimedia data object 104 may be rendered to provide a high-definition presentation of the images along with the person's physical and verbal commentary relating to the images. In addition, the format of the multimedia data object 104 allows a user to easily browse its contents while preserving the context and meanings associated with the person's interactions with the presented images 101.
In this embodiment, the display 12 is a light-emitting screen that is controlled by the data processing and control unit 18. The multimedia object presentation system 122 is a desktop computer system that includes a monitor 126, a pair of speakers 128, and a microphone 130. The graphical user interface 124 is a windows-based graphical user interface that includes multiple windows 132, icons 134, and a pointer 136. The images 121 of the graphical user interface 124 and the multimedia data objects that are generated by the data processing and control unit 18 are transmitted between the data processing and control unit 18 and the customer's multimedia data object presentation system 122 over a global communication network 138 (e.g., the Internet).
In operation, the person 20 (e.g., a customer support person) interacts with images 121 of the remote customer's graphical user interface 124 that are presented on the display 12. For example, the person 20 may guide the remote customer through a series of steps for reconfiguring the remote customer's system 122. Because the person 20 can see the images 121 of the graphical user interface 124 on the display 12, the person is able to interact seamlessly with the presented images 121. For example, the person 20 can accurately describe and point to locations of interest synchronously with the different images 121 of the graphical user interface 124 that are presented on the display 12.
The image-based media interaction capture system 120 captures the person's interactions with the projected images 121 and encapsulates these interactions along with the presented images in a multimedia data object 104. In this process, the image segmentation module 46 segments the person's interactions from the images that are captured by the image recording device 14. The multimedia object generation module 48 incorporates into the multimedia data object 104 a presentation media object 106 that contains digital representations of the presented images 121, an image presence media object 108 that contains the segmented interactions with the presented images 121, and an audio presence media object 110 that contains a digital representation of the audio recordings captured by the audio recording device 16.
The resulting multimedia data object is transmitted to the remote customer's system 122 where one or more components of the multimedia data object are rendered. In this regard, the remote customer's system 122 superimposes the segmented images of the person's interactions onto the graphical user interface 124 that is presented on the monitor 126. The remote customer's system 122 also synchronously renders the sounds (e.g., “Click Here”) that are encapsulated in the audio presence media object through the speakers 128. The remote customer's interactions with the graphical user interface 124 are encapsulated in multimedia data objects that are generated by a version of the data processing and control unit 18 that is executing on the remote customer's system 122. Any voice recordings that are captured by the microphone 130 also may be encapsulated in these multimedia data objects. The multimedia data objects that are generated by the remote customer's system are transmitted to the image-based media interaction capture system 120, where changes to the graphical user interface are reflected in the images presented on the display 12. Audio recordings that are encapsulated in the multimedia data objects that are generated by the remote customer's system may be rendered through a speaker 139 that is located near the person 20.
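One simple way the remote customer's system 122 could perform this superposition (a sketch assuming NumPy image arrays and the boolean occlusion mask produced by the segmentation described earlier) is a per-pixel selection:

```python
import numpy as np

def superimpose(gui_frame, segmented_person, mask):
    """Overlay the segmented person's interactions onto a GUI frame.

    gui_frame and segmented_person are H x W x 3 arrays of the same size;
    mask is the boolean occlusion mask produced during segmentation.
    """
    composite = gui_frame.copy()
    composite[mask] = segmented_person[mask]  # show the person where occlusion was detected
    return composite
```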
In operation, the persons 20, 144 interact with the images 141, 143 that are presented on the displays 12, 146. Because the persons 20, 144 can see the images presented on the displays 12, 146, the persons 20, 144 are able to interact seamlessly with the presented images.
The image-based media interaction capture systems 140, 142 capture the persons' interactions with the presented images 141, 143 and encapsulate these interactions along with the presented images 141, 143 in respective sets of multimedia data objects. In this process, the image segmentation modules 46 segment the persons' interactions from the images captured by the image recording devices 14, 152. The multimedia object generation modules 48 incorporate the following into the multimedia data objects: the presentation media objects that contain digital representations of the presented images 141, 143; the image presence media objects that contain the segmented persons' interactions; and the audio presence media objects that contain digital representations of the audio recordings that are captured by the audio recording devices 16, 154.
The image-based media interaction capture systems 140, 142 exchange and render the resulting multimedia data objects 104. In this regard, the segmented images of the persons' interactions are superimposed on the common set of images that are presented synchronously on the displays 12, 146. The image-based media interaction capture systems 140, 142 also synchronously render the sounds that are encapsulated in the audio presence media objects contained in the multimedia data objects through the speakers 156, 158. The image presentation module 40 executed by the data processing and control units 18, 150 may include heuristics for rendering the segmented interactions of the persons 20, 144 in areas of the displays 12, 146 that overlap with areas that are obscured by the physical presences of the persons 20, 144.
After the persons' interactions have been captured and segmented, the multimedia data objects that are generated by the image-based media interaction capture systems 140, 142 can be merged into a single multimedia data object that contains the common set of presented images, the segmented interactions of both persons 20, 144, sound recordings, and at least one synchronization link.
Other embodiments are within the scope of the claims.