The disclosed technology is a system and computer-implemented method for assembling and editing a video program from spoken words or soundbites. The disclosed technology imports source audio/video clips in any of multiple formats. Spoken audio is transcribed into searchable text. The text transcript is synchronized to the video track by timecode markers. Each spoken word corresponds to a timecode marker, which in turn corresponds to a video frame or frames. Using word processing operations and text editing functions, a user selects video segments by selecting corresponding transcribed text segments. By selecting text and arranging that text, a corresponding video program is assembled. The selected video segments are assembled on a timeline display in any order chosen by the user. The sequence of video segments may be reordered and edited, as desired, to produce a finished video program for export.
|
1. A computer-implemented method comprising:
generating a video assembly interface comprising a video playback window associated with a digital video, a transcribed text window displaying a transcription of an audio track of the digital video, and a video timeline;
receiving an indication of a selection that copies one or more words from the transcription of the audio track in the transcribed text window;
generating, within the video timeline and in response to detecting a user interaction that drags and drops the one or more words copied from the transcription from the transcribed text window to the video timeline, a display of the one or more words copied from the transcription and a first video clip corresponding to the one or more words; and
in response to a detected selection of an additional digital video:
adding the additional digital video to the video playback window and a transcription of an audio track of the additional digital video to the transcribed text window within the video assembly interface; and
maintaining the first video clip within the video timeline of the video assembly interface.
13. A non-transitory computer-readable medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to perform acts comprising:
generating a video assembly interface comprising a video playback window associated with a digital video, a transcribed text window displaying a transcription of an audio track of the digital video, and a video timeline;
receiving an indication of a selection that copies one or more words from the transcription of the audio track in the transcribed text window;
generating, within the video timeline and in response to detecting a user interaction that drags and drops the one or more words copied from the transcription from the transcribed text window to the video timeline, a display of the one or more words copied from the transcription and a first video clip corresponding to the one or more words; and
in response to a detected selection of an additional digital video:
adding the additional digital video to the video playback window and a transcription of an audio track of the additional digital video to the transcribed text window within the video assembly interface; and
maintaining the first video clip within the video timeline of the video assembly interface.
7. A system comprising:
at least one physical processor; and
physical memory comprising computer-executable instructions that, when executed by the at least one physical processor, cause the at least one physical processor to perform acts comprising:
generating a video assembly interface comprising a video playback window associated with a digital video, a transcribed text window displaying a transcription of an audio track of the digital video, and a video timeline;
receiving an indication of a selection that copies one or more words from the transcription of the audio track in the transcribed text window;
generating, within the video timeline and in response to detecting a user interaction that drags and drops the one or more words copied from the transcription from the transcribed text window to the video timeline, a display of the one or more words copied from the transcription and a first video clip corresponding to the one or more words; and
in response to a detected selection of an additional digital video:
adding the additional digital video to the video playback window and a transcription of an audio track of the additional digital video to the transcribed text window within the video assembly interface; and
maintaining the first video clip within the video timeline of the video assembly interface.
2. The computer-implemented method as recited in
3. The computer-implemented method as recited in
4. The computer-implemented method as recited in
receiving an indication of a selection of additional words from the transcription of the audio track of the digital video in the transcribed text window; and
generating, within the video timeline, a display of the selected additional words and a second video clip from the digital video corresponding to the selected additional words.
5. The computer-implemented method as recited in
6. The computer-implemented method as recited in
removing the digital video from the video playback window within the video assembly interface; and
removing the transcription of the audio track of the digital video from the transcribed text window within the video assembly interface.
8. The system as recited in
9. The system as recited in
10. The system as recited in
receiving an indication of a selection of additional words from the transcription of the audio track of the digital video in the transcribed text window; and
generating, within the video timeline, a display of the selected additional words and a second video clip from the digital video corresponding to the selected additional words.
11. The system as recited in
12. The system as recited in
removing the digital video from the video playback window within the video assembly interface; and
removing the transcription of the audio track of the digital video from the transcribed text window within the video assembly interface.
14. The non-transitory computer-readable medium as recited in
15. The non-transitory computer-readable medium as recited in
receiving an indication of a selection of additional words from the transcription of the audio track of the digital video in the transcribed text window; and
generating, within the video timeline, a display of the selected additional words and a second video clip from the digital video corresponding to the selected additional words.
16. The non-transitory computer-readable medium as recited in
17. The non-transitory computer-readable medium as recited in
removing the digital video from the video playback window within the video assembly interface; and
removing the transcription of the audio track of the digital video from the transcribed text window within the video assembly interface.
18. The non-transitory computer-readable medium as recited in
19. The non-transitory computer-readable medium as recited in
20. The non-transitory computer-readable medium as recited in
|
This application claims the benefit of U.S. Provisional Application 63/106,648, titled “TEXT-DRIVEN EDITOR FOR AUDIO AND VIDEO ASSEMBLY”, filed Oct. 28, 2020; and U.S. Provisional Application 63/106,649, titled “TEXT-DRIVEN EDITOR FOR AUDIO AND VIDEO EDITING”, filed Oct. 28, 2020. The priority provisional applications are incorporated by reference herein in their entirety.
This application is related to concurrently filed U.S. patent application No. 17/378,740 (now U.S. Pat. No. 11,508,411), filed Jul. 18, 2021, for “Text-Driven Editor for Audio and Video Assembly.” The related application is incorporated by reference herein in its entirety.
The disclosed technology generally relates to computer-implemented methods, systems, and computer programs for video editing and assembling a video program. More particularly, the present invention relates to a method, system, and computer program for editing and assembling a video program based on keywords or sound bites derived from the transcribed speech in the audio tracks of video clips.
The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.
Video editing is the process of editing segments of video footage, video clips, special effects, and sound recordings into a finalized video program. In the past, nonlinear video editing (NLE) has been performed on complex and expensive dedicated machines with dedicated software, but over time video editing software has evolved to be widely available for use on personal computers and even computer tablets and smart phones. The need for video editing software has grown over the last decade as more social media platforms incorporating video have become widely available. The exponential growth of social media video platforms has resulted in a corresponding growth in content creators who are generating video content, editing that video content, and uploading the video content to the social media video platforms and elsewhere.
In professional video editing, the computer programs are expensive and complex, requiring that the user be trained in the use of a generally complex user interface. To become adept, users of nonlinear video editing must acquire an expert level of knowledge and training to master the processes and user interfaces for nonlinear video editing systems. Known nonlinear video editing systems can be intimidating for the general user because of the complexity.
There is a need for a simplified method of video editing and of assembling a video program which is user-friendly and intuitive to a novice user. The present technology simplifies the creation of a video program by using a text-based user interface. The audio track in a video file is transcribed into text representing the speakers' voices in the audio track. In a manner similar to a word processing program, the user selects and sequences the transcribed text, and video frames corresponding to the text are assembled into a video program. The present technology is highly intuitive and provides an improved user interface and user experience (UX).
The following materials are incorporated by reference as if fully set forth herein:
In the drawings, like reference characters generally refer to like parts throughout the different views. Also, the drawings are not necessarily to scale, with an emphasis instead generally being placed upon illustrating the principles of the technology disclosed. In the following description, various implementations of the technology disclosed are described with reference to the following drawings.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The color drawings also may be available in PAIR via the Supplemental Content tab.
System Overview
The present technology allows a user to edit and assemble a video program from transcribed text. In one aspect, the present technology is implemented as a browser-based or web-based application which includes cybersecurity features. The remote user uploads video files or video clips to an application server. The application server imports the video files or video clips to a storage device. For the purposes of this disclosure, video files or video clips generally include a file containing a synchronized audio track and a video track, along with metadata of various kinds including frame rate, image resolution, and timecode. The audio track and video track are synchronized by timecode. Each frame of audio is synchronized with the frame of video that has the same timecode or timestamp.
The video track includes a sequence of images where each image is a frame played at a known frame rate. A video clip also includes an audio track which carries the audio for sounds represented in each frame of video. Customarily, the audio track will include the vocal speech of a person or persons shown in the video track. The sounds in the audio track are synchronized to the images in the video track. A video file normally consists of a container containing video data in a video coding format alongside audio data in an audio coding format. The container can also contain synchronization information and various metadata.
In one embodiment, the disclosed technology is a browser-based transaction system. The user pays a fee for using the host editing and video assembly facilities, with the cost based on the duration of the audio video editing session or other metrics. Or, the host editing facilities may be provided on a subscription basis.
The technology has the capability of assembling a new video program. The features include video program editing, annotating, collaborating and then exporting to Microsoft Word, subtitles, Adobe Premiere, Final Cut Pro, Avid Media Composer, and other nonlinear editing applications (NLE) and related applications for providing effects, captioning, titling, etc.
A user interacts with the first and second panels of the user interface by selecting text from a transcript of a selected resource in the first panel, dragging and dropping it onto the second panel, which causes the application to create a new clip. The order of clips can be arranged by the user by dragging and dropping clips.
The editing and assembling of a video program proceeds in phases. In the first phase, uploading and transcription, video files are uploaded from the user's device to the website and stored in the application's video storage, much like the video storage bin of a nonlinear editing system. The audio track is extracted and undergoes a speech-to-text transcription, converting spoken voices in the audio track into transcribed speech in the form of formatted text, which is editable and searchable. A lengthy audio track can be transcribed efficiently in a short time. The transcribed text is displayed in the text window on the user's video device. When multiple speakers are involved, the software is capable of automatically assigning speaker identification headings to the various speakers.
The user has the capability of editing the text and adding comments and annotations to the transcription. The video is accurately synchronized to the text representation using timecode. Every text word has an associated timecode, and every video frame has a timecode synchronized to the text words.
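By way of illustration only, the word-level synchronization described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the `Word` record, the millisecond time base, and the `frames_for_word` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class Word:
    """One transcribed word with in and out times (assumed here to be milliseconds)."""
    text: str
    start_ms: int
    end_ms: int


def frames_for_word(word: Word, fps: Fraction) -> tuple[int, int]:
    """Return the first and last frame indices whose display interval overlaps the word."""
    first = math.floor(Fraction(word.start_ms, 1000) * fps)
    last = math.ceil(Fraction(word.end_ms, 1000) * fps) - 1
    # A zero-length word still maps to the single frame it falls in.
    return first, max(first, last)


# Hypothetical two-word transcript of a 30 fps clip.
transcript = [Word("Hello", 0, 400), Word("world", 500, 1000)]
frame_ranges = {w.text: frames_for_word(w, Fraction(30)) for w in transcript}
```

Using exact fractions for the frame rate avoids rounding drift at rates such as 30000/1001; a production system could equally store frame numbers directly.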
In the next phase, the transcribed text is displayed in a text window in the user interface, alongside a video window that displays the video corresponding to the text display. If the cursor is moved to a particular point in the text display, highlighting a particular word, the corresponding video frame is displayed. If several words or sentences are highlighted in the text display, the corresponding video may be played in a video window. The text may be searched, as in a conventional word processor, and text selections may be highlighted. The highlighted text selections or text segments may be moved to a video timeline portion of the display. The video timeline shows the selected text along with the thumbnail icon of the first preview frame of the corresponding video segment.
In one aspect, the disclosed technology is a text-based video editor using spoken informational content. The system provides asynchronous transcription of the audio track in a video clip. The video editing process begins after the initial step of transcribing and editing the text from the audio track. A timecode is associated with every spoken word. Segments of speech are identified by in-and-out timecodes. The transcribed text can be word-searched to locate soundbites, which are used in the assembly of a video program. The video assembly editor is based on text-based editing and assembly. The disclosed technology generally uses short video segments for quick and easy program assembly. Segments can all originate from the same media or video file or from a combination of files. A preview video can be assembled and exported for post-processing to a finishing nonlinear editor such as Avid Media Composer, Adobe Premiere, or Final Cut Pro to produce a fine cut video program.
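As a non-limiting illustration, a word search that returns in-and-out timecodes for each matching soundbite might look like the following Python sketch. The tuple layout `(text, in_ms, out_ms)`, the `find_phrase` name, and the simple case and punctuation normalization are assumptions made for this example only.

```python
import string


def _normalize(token: str) -> str:
    """Lower-case a token and strip surrounding punctuation for matching."""
    return token.lower().strip(string.punctuation)


def find_phrase(words, phrase):
    """Return (in_ms, out_ms) for every occurrence of phrase in the transcript.

    words is a list of (text, in_ms, out_ms) tuples in spoken order.
    """
    target = [_normalize(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [_normalize(w[0]) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            # In point of the first matched word, out point of the last.
            hits.append((words[i][1], words[i + len(target) - 1][2]))
    return hits


# Hypothetical transcript fragment.
words = [
    ("We", 0, 200), ("love", 200, 500), ("editing.", 500, 1100),
    ("We", 1300, 1500), ("love", 1500, 1800), ("text", 1800, 2200),
]
soundbites = find_phrase(words, "we love")
```

Each returned pair identifies a candidate segment that could then be previewed or placed on the timeline.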
Transcription of Audio Track Into Text
The speech-to-text transcription disclosed may be based on machine learning or deep machine learning algorithms that select a speech recognition engine from multiple available recognition engines for accurately transcribing an audio channel based on sound and speech characteristics of the audio channel. A method and system of this type is disclosed in U.S. patent application Ser. No. 16/373,503, which is hereby incorporated by reference in its entirety for all purposes. The technology disclosed and the incorporated application relate to a machine learning based speech-to-text transcription method and system that selects one or more speech transcription engines from multiple available speech recognition engines to provide the most accurate transcription of speech from an audio channel based on sound and speech characteristics. Any other speech-to-text transcription programs may be used, as long as they provide an output of high quality and high accuracy text from the audio track in a video clip.
Video Editing Workflow
The most accepted method of video editing workflow is the creation of a rough-cut followed by a fine-cut. In the creation of an audio/video project, leading to a final program, it is the accepted process to first create a version of the video that shows how the main features fit together in a sort of global view. This is called a rough-cut. In rough-cut form, collaborators may comment on the video, offering suggestions for improvement. It is much easier to work with a rough-cut, which can often be assembled in minutes and in a form wherein changes can easily be made. Once all the major components are laid out in a viable form, the video editing workflow proceeds to the next stage in the process, which can be thought of as fine-tuning, which would include performing transitions, adding visual effects, sound editing, inclusion of captions, and anything else that brings the video project closer to its final form. The disclosed technology supports this workflow by providing text-based video program assembly. The making of a rough-cut video has typically been a collaborative effort, which involved cutting out a printed transcript and having teams reorder the cut transcript on a whiteboard. The present technology streamlines this process using the video editing workflow described herein.
Assembling a Video Program with Multiple Speakers
One application of the present technology is the assembly of a video program in which multiple persons are interviewed about a specific subject, and the individuals present their views on camera. Oftentimes, in answer to an interviewer's questions, the interviewees present viewpoints on multiple topics. The problem of the video program editor is to bring together a coherent video program, in which the subjects discussed are organized by topic. This requires editing the video program to show comments on the same topic, before moving on to a different topic.
With conventional video editors, this is a very complicated operation in video cutting and video assembly, where the user must digitally clip small video segments and move them around. The user makes individual decisions about where to clip the video segments and organize those segments that will make up the final program. This can be an unwieldy process with a conventional video editing system, even to produce a rough-cut which can later be refined. Basically, if there are three or more speakers commenting on multiple subjects, the video will need to be clipped and sequenced multiple times in order to make a coherent program.
With the present technology, the desired result is easily achieved. After the transcription phase, where the voices of the different speakers are transcribed into text form, the speakers' names may be automatically applied as labels to assist in searching and locating relevant text and content.
Word processing methods may then be used on the transcribed text to locate relevant content pertaining to a particular question posed. The editor may select the relevant content for each of the multiple speakers, and this will be stored in a video content bin. With the disclosed technology, the video assembly editor may be used to easily create a visual timeline that shows the spoken words of each speaker in a chosen sequence. The individual soundbites may be arranged in a simple manner on the timeline to assemble a rough-cut of the video program. The program can be previewed by the user and also by authorized collaborators, who may provide comments and suggestions after viewing the rough-cut video program.
Automatic Tagging of Speakers' Voices
In one aspect of the present technology, the system provides tagging of speakers' names. The program detects features associated with the speaker's name to provide labels in the transcribed text that identify a speaker by the attributes of the speaker's voice. Using these speaker labels, the text transcription may be searched for transcribed text that is associated with a particular speaker. One advantage of this feature is that it allows all the video segments associated with a particular speaker to be grouped together by the corresponding video timecodes to assemble a rough video program comprising video segments associated with a single speaker. For example, in an interview, it may be advantageous to bring together all the comments made by a particular speaker in one series of concatenated video segments without including the segments associated with the interviewer. This is a very fast way of assembling the rough-cut of only those video segments associated with a specific interviewee.
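The grouping of segments by speaker label can be illustrated with a short Python sketch. It assumes, purely for illustration, that each transcribed segment already carries a speaker label and in and out times in milliseconds; the `Segment` record and the `segments_for_speaker` helper are hypothetical names, and how the labels are produced is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A labeled text segment with its in and out times in milliseconds."""
    speaker: str
    text: str
    in_ms: int
    out_ms: int


def segments_for_speaker(segments, speaker):
    """Return the given speaker's segments ordered by their in timecode."""
    matches = [s for s in segments if s.speaker == speaker]
    return sorted(matches, key=lambda s: s.in_ms)


# Hypothetical interview transcript.
interview = [
    Segment("Interviewer", "What drew you to editing?", 0, 2500),
    Segment("Guest", "I started by cutting home movies.", 2600, 6000),
    Segment("Interviewer", "And today?", 6100, 7000),
    Segment("Guest", "Today I mostly work from transcripts.", 7100, 10500),
]

# Rough-cut of the interviewee only, expressed as (in, out) pairs.
guest_cut = segments_for_speaker(interview, "Guest")
cut_list = [(s.in_ms, s.out_ms) for s in guest_cut]
```

Concatenating the video ranges in `cut_list` would yield a rough-cut that omits the interviewer's segments.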
System Architecture
The following describes in architectural form implementations for a browser-based video editing and video assembly method and system 100, which is intentionally simplified to improve clarity in the description.
Additionally, remote computers, tablets, and smart phones may have access through the network 122 to the application 114 as authorized collaborators. As part of the video editing and video assembly method, users 134, 136, 138 may be specified as reviewers and authorized to provide comments or editing suggestions to the finalized video program, as will be described. The authorized collaborators may review the video and comment in virtual real-time, so that the commentary may be received by the user while the video program is being finalized.
Cloud-based services 124 and cloud-based storage 125 provide the end users with a convenient method of storing video files, which may be uploaded to computer 110, and in particular to audio/video storage bin 122, to provide the raw video clips which will be used for the assembly of a finished video program.
The interconnection of the elements of system 100 will now be described. The network 122 couples the computers 130, smart phones 128, and the computer tablets 126 with each other.
The communication path can be point-to-point over public and/or private networks. The communication can occur over a variety of networks, including private networks, VPN, MPLS circuit, or Internet, and can use appropriate application programming interfaces (APIs) and data interchange formats such as Representational State Transfer (REST), JavaScript Object Notation (JSON), Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), Java Message Service (JMS), and/or Java Platform Module System.
Communications may be encrypted. The communication is generally over a network such as the LAN (local area network), WAN (wide area network), telephone network (Public Switched Telephone Network (PSTN)), Session Initiation Protocol (SIP), wireless network, point-to-point network, star network, token ring network, hub network, Internet, inclusive of the mobile Internet, via protocols such as EDGE, 3G, 4G LTE, Wi-Fi and WiMAX.
The system components of
The cloud-based services 124 provide functionality to users that is implemented in the cloud or on the Internet. The cloud-based services 124 can include Internet hosted services such as news web sites, blogs, video streaming web sites, social media web sites, hosted services, cloud applications, cloud stores, cloud collaboration and messaging platforms, and/or cloud customer relationship management (CRM) platforms. Cloud-based services 124 can be accessed using a browser (via a URL) or a native application (a sync client).
Categories of cloud-based services 124 include software-as-a-service (SaaS) offerings, platform-as-a-service (PaaS) offerings, and infrastructure-as-a-service (IaaS) offerings.
Examples of common web services today include YouTube™, Facebook™, Twitter™, Google™, LinkedIn™, Wikipedia™, Yahoo™, Baidu™, Amazon™, MSN™, Pinterest™, Taobao™, Instagram™, Tumblr™, eBay™, Hotmail™, Reddit™, IMDb™, Netflix™, PayPal™, Imgur™, Snapchat™, Yammer™, Skype™, Slack™, HipChat™, Confluence™, TeamDrive™, Taskworld™, Chatter™, Zoho™, ProsperWorks™, Google's Gmail™, Salesforce.com™, Box™, Dropbox™, Google Apps™, Amazon AWS™, Microsoft Office365™, Workday™, Oracle on Demand™, Taleo™, Jive™, and Concur™.
In a browser-based system 100 such as disclosed, network security features, including encryption, may be provided. In a corporate organization, users may access hundreds of providers of cloud-based services to generate, store, collaborate, and share data with other users inside or outside of the organization. The network interface includes security protocols to protect the user's data and prevent infiltration of malware into the video editing system.
In
The transcription engine 152 may be operated independently of the editing and assembly module as a separate service, to provide accurate transcriptions of audio content, such as video interviews, on a pay-as-you-go or subscription basis.
When a new editing and assembly project is initiated, the user's transcribed text files are transferred into the application's video assembly editor 154, where a finished video program is assembled. The video assembly editor 154 is a graphical user interface and a set of user tools for creating a video program from the user's uploaded video files 150. The disclosed technology uses the transcribed text to select and sequence video segments into an assembled video program. Using text transcriptions of the audio track, the user selects segments of text, and drags and drops them onto a visual timeline in a chosen sequence. The video frames and audio frames associated with the selected text are moved to the timeline along with the text. The assembly editor 154 allows the user to experiment with different cuts and sequences to assemble a video program. The resultant video program may be a rough-cut video program in which various video segments are assembled in proper sequence. In this case, the user may export 158 the rough-cut video program to a secondary video editing system such as Avid Media Composer, Adobe Premiere, or Final Cut Pro, to provide finishing touches to the rough cut and arrive at a fine-cut version. In another embodiment, the uploading may occur directly from the application's video assembly editor and the transcription engine may be operated as a component of the application's video assembly editor.
In some uses of the disclosed technology, the video assembly editor 154 will produce a completed video program, without the need for post-processing in a nonlinear editor.
The application provides a collaboration interface 156 for authorized collaborators to sign into the application, play the assembled video, and provide comments to the user while the video is being assembled. This may occur in virtual real-time, so that the comments will be received while the video is still being assembled by the user.
A video clip typically includes a plurality of sequential image frames and an audio track associated with the video clip, by time code, and other metadata, such as resolution and frame rate. The audio track typically includes a speaker or speakers' vocal track comprising spoken words and sentences.
As part of the first phase, the audio track is transcribed into text comprising a plurality of text segments, wherein the text segments map to a plurality of video segments. The audio frames and image frames per segment are always synchronized.
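One way to picture the mapping from text segments to video segments is the following Python sketch. The disclosure does not state how segment boundaries are chosen; splitting at pauses longer than a threshold is an assumption made here for illustration, as are the `split_into_segments` and `timestamp_label` names and the `(text, in_ms, out_ms)` word layout.

```python
def split_into_segments(words, max_gap_ms=700):
    """Group timed words into text segments, starting a new segment at long pauses.

    words is a list of (text, in_ms, out_ms) tuples in spoken order.
    Each returned segment carries the in and out times of the video it maps to.
    """
    groups, current = [], []
    for word in words:
        # Assumed rule: a silence longer than max_gap_ms closes the current segment.
        if current and word[1] - current[-1][2] > max_gap_ms:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [
        {
            "text": " ".join(w[0] for w in group),
            "in_ms": group[0][1],
            "out_ms": group[-1][2],
        }
        for group in groups
    ]


def timestamp_label(ms):
    """Format a millisecond offset as an MM:SS label for display beside a segment."""
    total_seconds = ms // 1000
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


# Hypothetical timed words with a long pause before the third word.
words = [("Hi", 0, 300), ("there", 350, 700), ("Next", 22000, 22400), ("topic", 22450, 23000)]
segments = split_into_segments(words)
labels = [timestamp_label(s["in_ms"]) for s in segments]
```

Because each segment keeps the in time of its first word and the out time of its last word, the audio and image frames for the segment can be located from the same pair of values.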
In
Each uploaded video clip 150 is stored in the user's video bin 122. The user may select one or more uploaded video clips 150 to be dragged and dropped into the video assembly editor 154 to create a new video program.
As illustrated in
Synchronized Timecodes
Timecode is a video synchronizing standard which is based on a 24-hour clock readout. Each frame of video is assigned a unique timecode value. The timecode 330 is a sequence of numeric codes generated at regular intervals by a timing synchronization system. Timecode is used in video production applications which require a temporal coordination of elements related to video frames and audio frames.
Basic timecode standards assume a video frame rate of 30 frames per second or 25 frames per second, depending on the country. The NTSC video standard used principally in the United States and some other countries has a frame rate of 30 frames per second (fps), and PAL and SECAM, used mostly in Europe, have a frame rate of 25 fps. In NTSC, 30 frames equal one second. Video in color or DTV/HDTV actually has a frame rate in the US of 29.97 fps. Synchronization between 29.97 fps and 30 fps timecodes is maintained by methods such as periodically dropping timecode frames according to a formula, known as drop-frame timecode. Generally in the disclosed technology, drop-frame timecodes are not used or required. In the present technology, additional frame rates may be used, including 23.976, 59.94, 60, and other widely used frame rates.
Timecoding is also referred to as time stamping. One well-known type of timecode used in video production is SMPTE timecode. A SMPTE timecode is used to identify a precise location on time-based media such as audio or video in digital systems. SMPTE refers to the standards organization Society of Motion Picture and Television Engineers. The SMPTE standards actually describe a family of timecode encoding formats used in film, video, and audio production, including: Linear timecode (LTC), Vertical interval timecode (VITC), AES-EBU embedded timecode, Burnt-in timecode, CTL timecode (control track), and MIDI timecode.
Timecode is displayed in the format HH:MM:SS:FF (hours, minutes, seconds, frames). For example, if a video clip begins at timecode 14:34:01:22, this translates into 14 hours, 34 minutes, 1 second, 22 frames. In this way, each frame of a video track 310 and each frame of the corresponding audio track 315 includes a precise digital address.
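The arithmetic behind such a frame address can be shown with a brief Python sketch for non-drop-frame timecode at an integer frame rate. The function names are hypothetical, and the colon-separated form follows the example above; drop-frame timecode is deliberately not handled.

```python
def timecode_to_frames(timecode: str, fps: int) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode: {timecode}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(total_frames: int, fps: int) -> str:
    """Convert an absolute frame count back to HH:MM:SS:FF (non-drop-frame)."""
    seconds_total, frames = divmod(total_frames, fps)
    minutes_total, seconds = divmod(seconds_total, 60)
    hours, minutes = divmod(minutes_total, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# The example timecode from the text, interpreted at 30 frames per second.
address = timecode_to_frames("14:34:01:22", 30)
```

At 30 fps the example works out to ((14 × 60 + 34) × 60 + 1) × 30 + 22 = 1,573,252 frames, and converting that count back yields the original timecode.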
In the technology disclosed, the video assembly editor 154 tracks the timecode associated with each spoken word in the transcribed text extracted from the audio track portion of the audio/video clip. A high quality transcription is performed on the audio track so that each individual speaker's audio is provided with a text transcription of that speaker's speech.
The transcription of each word spoken by a speaker is correlated to the audio track 315 timecode and the video track 310 timecode, such that the video corresponding to each spoken word is identifiable in the text transcript 325 of the audio/video track. The transcribed audio text is presented in a text window in the user interface and may be searched as in a conventional word processing program such as Microsoft Word.
As described further on in more detail, a video program may be constructed by selecting words and phrases in text form from the transcribed audio speech in the text window. These selected soundbites may be arranged in a video program window. The video segments corresponding to the soundbites will be automatically selected and arranged on a timeline.
The video program is assembled in video assembly editor 154 essentially by moving around text transcribed from the audio track 315.
In the video assembly editor 154, the user selects soundbites or text segments 345 in a text display window, using conventional word processing editing functions. The user moves the chosen soundbite text 345 onto a visual timeline 350. The clips represented on the timeline 355 may be reordered, at least by a drag-and-drop method, and trimmed to assemble a video rough-cut, which can be exported in any of a large number of video media formats, including H.264/MPEG-4, the most popular format. The fine cut or formal cut can be performed on a dedicated nonlinear editing system such as Adobe Premiere Pro, Final Cut Pro X, SRT Subtitles, Avid Media Composer, DaVinci Resolve, Avid Pro Tools, and Adobe Audition.
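The step of turning a text selection into a video clip can be sketched as follows. This is an illustration only, assuming each transcribed word carries start and end times in seconds; the names `Word`, `Clip`, and `clip_from_selection` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the beginning of the source clip
    end: float


@dataclass(frozen=True)
class Clip:
    text: str
    start: float
    end: float


def clip_from_selection(words: list[Word], first: int, last: int) -> Clip:
    """Build a clip spanning the selected words (indices are inclusive)."""
    selected = words[first:last + 1]
    if not selected:
        raise ValueError("selection is empty")
    text = " ".join(word.text for word in selected)
    # The clip runs from the first word's start to the last word's end.
    return Clip(text, selected[0].start, selected[-1].end)


words = [Word("I", 0.0, 0.2), Word("think", 0.2, 0.6), Word("there's", 0.6, 0.9)]
timeline = [clip_from_selection(words, 1, 2)]
```

In this sketch the timeline is simply a list of clips, so dropping another soundbite onto it amounts to appending or inserting another `Clip`.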
The transcribed text and video are always locked in synchronization by the associated timecode 330. If the user moves the text pointer to a different location in the transcribed text 325, the corresponding video in the video window jumps to the new location.
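One way to picture this lock between the text pointer and the video is a lookup from a caret position to the start time of the word under it. The sketch below assumes the transcript is rendered as words joined by single spaces and that word start times are given in seconds; `word_index_at_offset` and `seek_time` are hypothetical helpers, not the disclosed implementation.

```python
def word_index_at_offset(words: list[str], caret: int) -> int:
    """Return the index of the word containing the caret's character offset."""
    if not words:
        raise ValueError("transcript is empty")
    position = 0
    for index, word in enumerate(words):
        end = position + len(word)
        if caret <= end:
            return index
        position = end + 1  # skip the single joining space
    return len(words) - 1  # caret beyond the text: clamp to the last word


def seek_time(words: list[str], start_times: list[float], caret: int) -> float:
    """Return the playback time the video window should jump to."""
    return start_times[word_index_at_offset(words, caret)]


words = ["I", "think", "there's", "a", "general"]
start_times = [0.0, 0.2, 0.6, 0.9, 1.0]  # illustrative values only
```

With the caret at character offset 4 (inside "think"), `seek_time` returns 0.2, the start time of that word.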
Continuing now to
The transcribed text 506 is structured by timecode. The spoken text is divided into text segments 508 or paragraphs, and each text segment is provided with a visual timestamp 510 that indicates a time associated with each text segment 508. The timestamps (00:22) 510 are derived from the transcript map shown in
In the upper part of the display 500, the audio waveform 512 corresponding to the audio track 315 is also shown. Any selected point in the audio waveform 512 will correspond to a text segment 508 and also to the corresponding video segment 502. The user may scroll through the audio waveform 512 to select and play a particular segment of the audio track 315 for that video clip 150.
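The correspondence between a point on the waveform and a text segment can be illustrated with a sorted list of segment start times and a binary search. This is a sketch under the assumption that segment start times are available in seconds; the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def segment_at(segment_starts: list[float], seconds: float) -> int:
    """Return the index of the text segment that contains the given time."""
    if not segment_starts:
        raise ValueError("no segments")
    # bisect_right counts the segments that start at or before `seconds`.
    return max(bisect_right(segment_starts, seconds) - 1, 0)


def format_timestamp(seconds: float) -> str:
    """Format a segment start time as the MM:SS label shown beside the text."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


segment_starts = [0.0, 22.0, 47.0]  # illustrative start times, in seconds
```

For these sample values, a point 30 seconds into the waveform falls in the second segment, whose label is "00:22".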
The Transcript Map
Turning now to
In
In this illustration, the timecode tracks five spoken words in succession by the start timecode and the end timecode. The spoken word “I” is associated with the first text element. The spoken word “think” is associated with the second text element. The spoken word “there's” is associated with the third text element. The spoken word “a” is associated with the fourth text element. And the spoken word “general” is associated with the fifth text element. Together, the spoken phrase is “I think there's a general” in which each text element has a very specific start timecode and end timecode. The editing application tracks all of these individual text elements and the values of each text element. As is shown in
Video Assembly Editor
The rightmost panel 800 in the user interface shows the graphical elements that are used in the assembly of a new video program. A video window 820 is located in the topmost area of the display. The video window 820 includes play 830 and pause 832 controls. Below this video window is an open area 840 for the video timeline 842, as will be described. The video assembly timeline area 840 is the space where the selected soundbites and video preview thumbnails will be dragged and dropped and sequenced during the editing and assembly process.
If the user chooses for the text segment and corresponding video segment to be positioned between two previous video segments, the first icon 1002 is selected. If the user chooses for the text segment and corresponding video segment to be positioned at the end of the sequence of video segments, a second icon 1004 is selected. The selected transcribed text segment 1006 is shown highlighted on the text screen.
Turning now to
The preview frame 1104 is displayed alongside the selected text 1102. The preview frame 1104 is similar to a thumbnail or head frame in video editing, providing a visual representation of the video clip as a guide to sequencing the video segments 502 in the assembly of the video program. It is a convenient way for the editor to reorder the clips by dragging and dropping during program creation. In the rightmost panel of the screen, the project name 1110 is shown alongside the timecode 1108 or timestamp associated with the last text element or video frame for that selection. From this point onward, the user may “click” the preview frame to play the video.
The user interacts with the program by user actions, which may include drag-and-drop actions. Selections of text segments 1006 and the associated video segments 502 are made by such user actions. The order of the video segments 502 may be arranged and reordered by reordering commands. Reordering of clips is preferably performed by drag-and-drop actions, but other user reordering commands are contemplated.
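A reordering command of this kind reduces to moving one entry of an ordered list to a new position. The following sketch assumes the timeline is held as a simple list; `move_clip` is a hypothetical name.

```python
def move_clip(timeline: list, from_index: int, to_index: int) -> list:
    """Return a new timeline with one clip moved, as a drag-and-drop would do."""
    reordered = list(timeline)  # leave the caller's list untouched
    clip = reordered.pop(from_index)
    reordered.insert(to_index, clip)
    return reordered


original = ["intro", "answer", "closing"]
rearranged = move_clip(original, 2, 0)
```

Returning a new list rather than mutating in place is a design choice of this sketch; it keeps the earlier ordering available, which could support an undo operation.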
As illustrated in
The timeline 842 is configured to play back the successive video segments in the order in which the text segments are dropped onto the timeline. The video corresponding to a text segment is displayed, showing the actual text along with the starting and ending timecode of the text segment. A preview frame 1104 is displayed with the selected text segment along with the starting timecode for the video segment.
The video player 820 includes a play/resume button 830 and a pause button 832. The video assembly editor plays the video segments in the order the text segments were selected. The timeline 1650 is configured, in response to the selection, to graphically render respective previews of the video segments in that order. The timeline 1650 of the video assembly editor is also configured to graphically render the text in the text segments, along with their respective timestamps, adjacent to the respective previews. A second text and video segment 1504 is positioned below the first selected segment 1104, and the third text and video segment 2004 is positioned below the second selected text and video segment 1504. A progress bar may be displayed under the video so that a user may see the playback position within the timeline. The current playback position of collaborators may also be displayed. The user may also skip back and forth between segments, or jump directly to one by clicking on the thumbnail 1104, 1504, or 2004.
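Sequential playback and jumping to a thumbnail both depend on knowing where each segment begins within the assembled program. As a minimal sketch, assuming each timeline entry is a (start, end) pair of source times in seconds, the program offsets can be accumulated as follows; `playback_offsets` is a hypothetical name.

```python
from itertools import accumulate


def playback_offsets(clips: list[tuple[float, float]]) -> list[float]:
    """Return the time at which each clip begins within the assembled program."""
    if not clips:
        return []
    durations = [end - start for start, end in clips]
    # Each clip starts where the running total of the earlier durations ends.
    return list(accumulate(durations[:-1], initial=0.0))


clips = [(10.0, 15.0), (2.0, 5.0), (30.0, 34.0)]  # illustrative source ranges
offsets = playback_offsets(clips)
```

Clicking the third thumbnail in this example would move the progress bar to 8.0 seconds, the sum of the first two clip durations.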
Operation of the Disclosed Technology
In creating a new project, the user drags and drops audio and video files, often in the form of MP4 videos, by accessing cloud storage applications. The downloaded video clips will include an audio track, a video track, and synchronizing timecode for the audio track and the video track. The first step is to transcribe the original audio into a transcription language. In this form, a text file is created which corresponds to the spoken words in the audio track. The timecode associates each transcribed word with the corresponding spoken word in the audio track. The text transcription is in the original language of the audio track, but there is a provision to convert the text into other languages as a secondary step, as needed.
In the next phase, the video assembly mode, the video is cut by editing text. By selecting and sequencing the text representations of the soundbites from speakers' voices, transcribed into text, the corresponding video is also sequenced on a graphical timeline. Once a rough-cut video is created from sequencing the soundbites, other video editing operations may be performed, such as removing time gaps and bloopers. The video can then be exported using a variety of export formats, such as the popular MP4 format. Subsequently, the rough-cut video can be further processed in a nonlinear video editing application.
In the speaker video resources, multiple speakers can be identified in a single transcript. Each clip can include one or more speakers. Multiple speaker video resources can be created by transcribing Zoom meetings and other video conferences.
A video conferencing center can facilitate generation of transcripts by marking original audio channel sources in the video recording, to help the transcription process distinguish among speakers and recognize multiple speakers talking at the same time. This is practical during multi-speaker video conferences, leveraging the separate channel origins of separate speakers. Further, speaker identification can be annotated as captions in the video recording file for speaker channels, based on speaker logins and channel origin attributes.
Removing Bloopers
A blooper may result in a recorded interview or informational video interview, when a speaker states something that is improper, embarrassing, or otherwise inappropriate for the context of the program. It is usually desirable to remove these “bloopers” from the video program. Selected content may be identified as bloopers by providing a listing of keywords to search. The user may excise portions of the recording, both audio and video, by manipulating the words in the transcript based on this keyword search. The solution can be extended by training the transcription system to alert the user to a blooper keyword and the content immediately preceding the blooper keyword, so that the user may choose to delete the inappropriate content.
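The keyword search described above can be sketched as a scan over the transcribed words that also reports the content immediately preceding each hit. The keyword list, the three-word context window, and the name `find_bloopers` are assumptions made for illustration.

```python
import re


def find_bloopers(words: list[str], keywords: set[str], context: int = 3) -> list[tuple[int, int]]:
    """Return (first_index, keyword_index) spans for each blooper keyword found.

    Each span starts `context` words before the keyword so the user can review
    the preceding content and choose whether to delete it as well.
    """
    hits = []
    for index, word in enumerate(words):
        # Strip punctuation and ignore case before comparing with the list.
        normalized = re.sub(r"[^\w']", "", word).lower()
        if normalized in keywords:
            hits.append((max(0, index - context), index))
    return hits


transcript = "Well that was a Darn, mess honestly".split()
spans = find_bloopers(transcript, {"darn"})
```

Here the keyword is found at word index 4, and the reported span begins three words earlier, at index 1.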
Removing Disfluencies
Also, the disclosed technology provides for the automatic removal of disfluencies. Breaks or disruptions in the flow of speech are labelled “disfluencies” and these encompass speech hesitations and irregularities such as “uhm's” or pauses caused by stuttering. Disfluencies may be identified by detecting excessive pauses between spoken words, or using a keyword list of known disfluency utterances. Once identified, the software may automatically delete the disfluencies from the transcript. Additionally, disfluencies may be detected by machine learning methods.
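The two detection rules named above, a keyword list and excessive pauses, can be illustrated together. This sketch assumes words are (text, start, end) tuples in seconds; the filler list and the one-second pause threshold are illustrative assumptions, and `find_disfluencies` is a hypothetical name.

```python
FILLERS = {"uh", "um", "uhm", "er"}  # illustrative list of known utterances


def find_disfluencies(words, max_pause: float = 1.0):
    """Return (filler_indices, pause_ranges) for a list of timed words."""
    filler_indices = [
        index
        for index, (text, _start, _end) in enumerate(words)
        if text.lower().strip(".,!?") in FILLERS
    ]
    # A pause is the silent gap between one word's end and the next word's start.
    pause_ranges = [
        (words[i][2], words[i + 1][1])
        for i in range(len(words) - 1)
        if words[i + 1][1] - words[i][2] > max_pause
    ]
    return filler_indices, pause_ranges


timed_words = [("I", 0.0, 0.2), ("uhm", 0.3, 0.7), ("think", 2.5, 2.9)]
fillers, pauses = find_disfluencies(timed_words)
```

In this example the filler at index 1 and the silent range from 0.7 to 2.5 seconds are both flagged as candidates for removal.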
Collaboration Mode
The creation of a video program has been and continues to be a collaborative effort. The technology of the present invention provides for the originator of the project to share the project with other collaborators by inviting others to collaborate.
In one aspect, a user may invite others to collaborate and play the rough-cut video program. A collaborator may input security credentials into the application user interface, which grants that collaborator authorized status to provide near real-time commentary on a video program under development in the video assembly editor.
Collaborators may view and approve the rough-cut, or make suggestions for alternatives or improvements, and actually create different versions. For example, collaborators may play back different versions to view alternative sequencing of video segments and provide comments or suggested edits to the user.
Collaborators may indicate approval or disapproval on the timeline with visual icons (emojis).
User Interface
In one aspect, the disclosed technology is a method and system for video editing and for assembling a video program, implemented on a dedicated website. The disclosed technology provides an enhanced user experience (UX) superior to competitive products. The workflow is an easy-to-follow guided method for assembling a video program. The disclosed technology provides an enhanced user experience by simplifying the video program creation process, which results in a final program that can be exported for post-processing or final processing in a dedicated video editing system.
Various competitive products have the drawback of not being user-friendly. It is known that when the workflow and user experience (UX) are complex, illogical, or non-intuitive, users can be discouraged from using the website at all. Providing the user with a pleasing user experience (UX) promotes further use of the website application. The user interface may support viewing transcripts for two interviews at the same time on multiple monitors to make cutting between subjects faster.
In another aspect, the timeline may be arranged by topics and nested sequences.
Computer System
In one implementation, the video assembly editor 2240 is communicably linked to the storage subsystem 2210 and the user interface input devices 2238.
User interface input devices 2238 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computer system 2200.
User interface output devices 2276 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display such as audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computer system 2200 to the user or to another machine or computer system.
Storage subsystem 2210 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by processors 2278.
Processors 2278 can be graphics processing units (GPUs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and/or coarse-grained reconfigurable architectures (CGRAs). Processors 2278 can be hosted by a deep learning cloud platform such as Google Cloud Platform™, Xilinx™, and Cirrascale™. Examples of processors 2278 include Google's Tensor Processing Unit (TPU)™, rackmount solutions like GX4 Rackmount Series™, GX22 Rackmount Series™, NVIDIA DGX-1™, Microsoft's Stratix V FPGA™, Graphcore's Intelligent Processor Unit (IPU)™, Qualcomm's Zeroth Platform™ with Snapdragon Processors™, NVIDIA's Volta™, NVIDIA's DRIVE PX™, NVIDIA's JETSON TX1/TX2 MODULE™, Intel's Nirvana™, Movidius VPU™, Fujitsu DPI™, ARM's DynamicIQ™, IBM TrueNorth™, Lambda GPU Server with Tesla V100s™, and others.
Memory subsystem 2222 used in the storage subsystem 2210 can include a number of memories including a main random access memory (RAM) 2232 for storage of instructions and data during program execution and a read only memory (ROM) 2234 in which fixed instructions are stored. A file storage subsystem 2236 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by file storage subsystem 2236 in the storage subsystem 2210, or in other machines accessible by the processor.
Bus subsystem 2255 provides a mechanism for letting the various components and subsystems of computer system 2200 communicate with each other as intended. Although bus subsystem 2255 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple busses.
Computer system 2200 itself can be of varying types including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely-distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of computer system 2200 depicted in
Each of the processors or modules discussed herein may include an algorithm (e.g., instructions stored on a tangible and/or non-transitory computer readable storage medium) or sub-algorithms to perform particular processes. A module is illustrated conceptually as a collection of modules, but may be implemented utilizing any combination of dedicated hardware boards, DSPs, processors, etc. Alternatively, the module may be implemented utilizing an off-the-shelf PC with a single processor or multiple processors, with the functional operations distributed between the processors. As a further option, the modules described below may be implemented utilizing a hybrid configuration in which certain modular functions are performed utilizing dedicated hardware, while the remaining modular functions are performed utilizing an off-the-shelf PC and the like. The modules also may be implemented as software modules within a processing unit.
Various processes and steps of the methods set forth herein can be carried out using a computer. The computer can include a processor that is part of a detection device, networked with a detection device used to obtain the data that is processed by the computer or separate from the detection device. In some implementations, information (e.g., image data) may be transmitted between components of a system disclosed herein directly or via a computer network. A local area network (LAN) or wide area network (WAN) may be a corporate computing network, including access to the Internet, to which computers and computing devices comprising the system are connected. In one implementation, the LAN conforms to the transmission control protocol/internet protocol (TCP/IP) industry standard. In some instances, the information (e.g., image data) is input to a system disclosed herein via an input device (e.g., disk drive, compact disk player, USB port etc.). In some instances, the information is received by loading the information, e.g., from a storage device such as a disk or flash drive.
A processor that is used to run an algorithm or other process set forth herein may comprise a microprocessor. The microprocessor may be any conventional general purpose single- or multi-chip microprocessor such as a Pentium™ processor made by Intel Corporation. A particularly useful computer can utilize an Intel Ivybridge dual-12 core processor, LSI raid controller, having 128 GB of RAM, and 2 TB solid state disk drive. In addition, the processor may comprise any conventional special purpose processor such as a digital signal processor or a graphics processor. The processor typically has conventional address lines, conventional data lines, and one or more conventional control lines.
The technology disclosed herein may be implemented as a method, apparatus, system or article of manufacture using standard programming or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein refers to code or logic implemented in hardware or computer readable media such as optical storage devices, and volatile or non-volatile memory devices. Such hardware may include, but is not limited to, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), complex programmable logic devices (CPLDs), programmable logic arrays (PLAs), microprocessors, or other similar processing devices. In particular implementations, information or algorithms set forth herein are present in non-transient storage media.
Other implementations of the method described in this section can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation of the method described in this section can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.
One or more implementations of the technology disclosed, or elements thereof can be implemented in the form of a computer product including a non-transitory computer readable storage medium with computer usable program code for performing the method steps indicated. Furthermore, one or more implementations of the technology disclosed, or elements thereof can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps. Yet further, in another aspect, one or more implementations of the technology disclosed or elements thereof can be implemented in the form of means for carrying out one or more of the method steps described herein; the means can include (i) hardware module(s), (ii) software module(s) executing on one or more hardware processors, or (iii) a combination of hardware and software modules; any of (i)-(iii) implement the specific techniques set forth herein, and the software.
While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.
Particular Implementations
Some particular implementations and features for assembling a video program using soundbite-based editing are described in the following discussion. The disclosed technology is a time-saving way to produce a rough-cut video using an intuitive user interface that requires no special training or tutorials for its use. In one disclosed implementation, a system and a computer-implemented method are provided for assembling and editing a video program based on spoken words and soundbites. The disclosed technology imports source A/V files over a network and provides the user with a way of assembling and editing a video program from spoken words associated with the imported video files. In the past, nonlinear video editing has been performed on complex and expensive dedicated machines with dedicated software. The disclosed technology brings video editing and the assembly of the video program within the grasp of most Internet users from remote locations using cloud-based applications or a network.
Video files generally consist of an audio track and a video track. The audio track and video track are synchronized by timecodes so that each frame of video is locked to a frame of audio. One frame of audio is associated by timecode with one frame of video. A video clip also contains other metadata of various kinds, including frame rate and image resolution, along with the synchronizing timecode. Source audio/video clips may be imported in any of the popular video formats, such as MPEG-4.
In one aspect, a video track including spoken audio with spoken voices is transcribed into searchable text by a speech-to-text engine. The text transcript appears as an editable text document. In the transcription process a timecode is associated with each transcribed word from the audio track. The words are mapped to the video, and every spoken word has an associated timecode, with a start timecode and a stop timecode. Each word of transcribed text has a corresponding start timecode and end timecode. Each spoken word corresponds to a timecode marker, which in turn corresponds to a video frame or frames.
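The word-to-frame mapping described here can be pictured as a list of text elements, each holding a start and end time from which a range of video frames follows. The sketch below assumes times are stored as integer milliseconds and that the frame rate is an integer; `TextElement`, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextElement:
    text: str
    start_ms: int  # start time of the spoken word, in milliseconds
    end_ms: int    # end time of the spoken word, in milliseconds

    def frame_range(self, fps: int) -> tuple[int, int]:
        """Return the first and last video frame indices covering this word."""
        first = self.start_ms * fps // 1000
        # Ceiling division finds the frame boundary at or after the end time.
        last = -(-self.end_ms * fps // 1000) - 1
        return first, max(first, last)


transcript_map = [
    TextElement("I", 0, 200),
    TextElement("think", 200, 600),
    TextElement("there's", 600, 900),
]
```

At 30 fps, the word "think" in this sample spans frames 6 through 17, so selecting that word identifies exactly those frames of the video track.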
In one implementation, the transcribed text may be edited using conventional word processing operations and text editing functions. The text may be searched by the use of keywords. By searching the corresponding text, exact fragments or soundbites can be identified in the transcribed text and selected as transcribed text segments. By selecting text and arranging text segments on a timeline display, a video program can be assembled. The video frames in the selected video segments correspond precisely to the selected text segments. Multiple text segments may be selected by the user, moved to the timeline by drag-and-drop actions, and sequenced as a playlist. In this way, a video program is assembled which can be played back while the user experiments until satisfied with the final sequence. The final sequence can then be exported as a complete program, or as a rough-cut video program to a dedicated nonlinear editing system or application for fine-cut finishing. The disclosed technology can be used to assemble a video program on a timeline display in any order chosen by the user. The sequence of video segments may be ordered, reordered, and edited, as desired. Sequences can be changed, along with the video segments themselves, until the user is fully satisfied with the composition of the video program and is ready to export.
In the assembly of a video program, the disclosed technology allows the user to engage authorized collaborators to view the video project as it is being assembled. Authorized collaborators can log on and play the video, provide comments on the project, provide suggestions for alternatives or improvements, or simply provide positive indicators in the form of emojis.
The method described in this section and other sections of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this method can readily be combined with sets of base features identified as implementations.
In another aspect, the present invention provides a very fast way to search for relevant content. The alternative is to view an entire video and audio track, which can be time-consuming, even when viewed at faster than real-time speed. By using transcribed text which can be searched, locating relevant text is expedited. It is no longer necessary to view the entire video or listen to the entire audio associated with the transcribed text to find relevant content. When the user highlights text in the transcription, the audio and video corresponding to that text are immediately cued up for viewing. This feature also expedites the assembly of the video program, because the user can quickly identify the particular soundbites which are essential to assembling a particular video program.
In one aspect of the disclosed technology, a computer-implemented method is provided for assembling an audio/video program from source A/V files. The method includes importing at least one source audio/video file, where the audio/video file comprises an audio track and a video track. The audio track includes audio frames and the video track includes video frames, synchronized by timecode. The audio track further includes spoken speech from one or more speakers' voices, which is in synchronization with the corresponding video frames. The method includes creating a text transcription of the spoken speech using a speech-to-text engine. During the creation of the text transcription, each word in the audio track is mapped to corresponding video frames. On a monitor screen, the transcription of the audio is displayed as text in a text window, alongside an indication of the corresponding video frames. The present technology maps the text elements to the corresponding video frames in the transcript map, where each word has a start timecode and an end timecode. A user interacts through a user interface to select a text segment and move the selected text segment, with the corresponding video segment, into a visual timeline on the monitor screen. The text segment and corresponding video segment are displayed together on the visual timeline.
The disclosed technology tracks spoken content for multiple speakers, and the content is automatically labeled for each speaker based on the speech characteristics of each speaker. In one aspect, the spoken content is labeled with speakers' names. In another aspect, the identification label for each speaker is manually applied by the user. Users may assign a speaker identification label to selected text, and it will propagate to all other paragraphs that are determined by the software to be the same speaker.
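Propagating a manually assigned speaker label can be sketched as follows. The sketch assumes the software has already grouped paragraphs by voice and stored a group identifier on each paragraph; the `cluster` and `speaker` keys and the name `assign_speaker` are hypothetical.

```python
def assign_speaker(paragraphs: list[dict], selected: int, name: str) -> list[dict]:
    """Label the selected paragraph and every paragraph from the same voice group."""
    cluster = paragraphs[selected]["cluster"]
    return [
        {**paragraph, "speaker": name} if paragraph["cluster"] == cluster else dict(paragraph)
        for paragraph in paragraphs
    ]


paragraphs = [
    {"cluster": 0, "speaker": None, "text": "Thanks for joining us."},
    {"cluster": 1, "speaker": None, "text": "Happy to be here."},
    {"cluster": 0, "speaker": None, "text": "Let's get started."},
]
labeled = assign_speaker(paragraphs, 0, "Alice")
```

Labeling the first paragraph as "Alice" also labels the third, which belongs to the same voice group, while the second paragraph stays unlabeled.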
In one aspect, the computer-implemented method provides a timeline for displaying the selected video clips arranged in a chosen display sequence. In some implementations of the method for assembling a video program, multiple text segments and corresponding video segments are moved to the visual timeline and displayed in a user-selected playback order on the timeline. In one example, three or more text and video segments may comprise the assembled video program. In another example of the disclosed method, in response to a play command, the selected segments may be played in the order of selection, as indicated by the sequence on the timeline. Alternatively, in response to a user selection, the sequence of playback may be reordered in a second selected order for playback. In like manner, in response to a further user selection, the selected video segments on the timeline may be reordered in a third selected order for playback. The reorder command allows the user to drag and drop the text and video selections into a different chosen order. A user may skip forwards or backwards by clicking on a chosen thumbnail 1104, 1504, or 2004 to jump to that point in the playback.
In one aspect, the user may authorize remote collaborators to play the video program and comment in real time or provide feedback to the user. The collaborators must be authorized by using passwords and other network security measures. In another implementation of the disclosed technology, it may be desirable to eliminate some video content in situations where the speakers use inappropriate language or there is other unwanted content. By performing a keyword search on the transcribed text, the unwanted content may be located and deleted from the assembled program. A list of keywords can also be applied throughout the transcribed text to identify all instances of unwanted content. By deleting unwanted text content, the corresponding video segments will also be deleted.
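The effect of deleting text on the corresponding video can be illustrated by computing which time ranges remain after words are removed. This is a sketch only, assuming words are (text, start, end) tuples in seconds and that deleted words are identified by index; `keep_ranges` is a hypothetical name.

```python
def keep_ranges(words, deleted: set[int]) -> list[tuple[float, float]]:
    """Return the source time ranges that survive after deleting some words."""
    ranges: list[tuple[float, float]] = []
    run_open = False  # True while the previous word was kept
    for index, (_text, start, end) in enumerate(words):
        if index in deleted:
            run_open = False  # a deletion closes the current kept range
            continue
        if run_open:
            ranges[-1] = (ranges[-1][0], end)  # extend the current range
        else:
            ranges.append((start, end))
            run_open = True
    return ranges


spoken = [("that", 0.0, 0.3), ("darn", 0.3, 0.7), ("thing", 0.7, 1.1), ("works", 1.1, 1.5)]
kept = keep_ranges(spoken, {1})
```

Deleting the second word leaves two ranges, 0.0 to 0.3 and 0.7 to 1.5 seconds, so the video between 0.3 and 0.7 seconds is cut along with the text.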
In another disclosed implementation, video frames are retrieved corresponding to text selections and the retrieved text and video selections are moved to a timeline display for further editing and assembly of the video program. A completely assembled video program may take the form of a rough-cut video, which will be exported to a secondary video editing system for fine-cut processing.
The disclosed technology may be implemented as a system for assembling a audio/video program. The system includes an A/V storage bin for storing uploaded video files from a user. The video files comprise an audio track and a video track. The audio track includes audio frames and the video track includes video frames synchronized by time code. The audio track further including spoken speech from one or more speakers, in synchronization with corresponding video frames. A speech-to-text engine is provided for creating a text transcription of the spoken speech on the audio track, wherein each transcribed word is mapped to corresponding video frames. Aa monitor screen displays the text transcription. User-selected text segments and corresponding video segments are shown. A visual timeline displayed on the monitor screen for sequencing user-selected text segments and corresponding video segments into a user-selected playback order on the timeline.
Some disclosed implementations provide for refining the playback order of user-selected text segments and corresponding video segments through a user's drag-and-drop actions. The system provides for accepting authorized collaborator interaction with the video program, including playback, suggestions, and comments. The collaborators' changes may be reflected in the timeline in real time.
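Reordering by drag-and-drop reduces, at the data level, to moving one entry of an ordered list. The following is a minimal sketch under that assumption; `move_segment` and the string labels standing in for text/video segment pairs are hypothetical and not taken from the disclosure.

```python
def move_segment(timeline, from_index, to_index):
    """Return a new timeline with one segment moved; the input is not modified."""
    if not 0 <= from_index < len(timeline):
        raise IndexError("from_index out of range")
    if not 0 <= to_index < len(timeline):
        raise IndexError("to_index out of range")
    reordered = list(timeline)
    segment = reordered.pop(from_index)   # lift the dragged segment out
    reordered.insert(to_index, segment)   # drop it at the target position
    return reordered


timeline = ["intro", "interview A", "interview B", "closing"]
print(move_segment(timeline, 2, 1))
# ['intro', 'interview B', 'interview A', 'closing']
```

Returning a new list rather than mutating in place is one possible design choice; it makes it straightforward to keep the previous order available, for example when a collaborator's change needs to be reviewed before it is accepted.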
In one implementation, a system is provided for creating and assembling a video program over a network. A host computer is in operable connection with a network interface. The network interface is in operable connection with a network. An application server is in operable communication with the network interface. The application server is configured to host a website application for assembling a video program, the server having a user interface for providing a remote user with web-page access to the host computer. The user interface is in operable communication with an A/V storage bin and video editing resources. The user interface comprises a webpage screen for assembling a video program. The user interface comprises a webpage screen configured for uploading a user's A/V files into the application. The user interface further comprises a webpage screen configured for displaying a text window for transcribed text from the audio track of selected A/V files, where the displayed text words correspond to timecode-synchronized video content. A webpage screen is configured to provide a timeline, wherein a user may select text segments and video segments corresponding to the transcribed text segments. A webpage screen is configured to show the sequence of selected text segments alongside corresponding video segments in a vertical timeline configuration. A sequence of text segments and corresponding video segments is listed on a timeline display in a chosen playback sequence. The sequence may be reordered by user drag-and-drop actions.
In a further aspect of the disclosed system, a text window is provided for displaying transcribed text from an audio track, a video window is provided for displaying image frames corresponding to the transcribed text, and a timeline is provided for dragging and dropping transcribed text segments onto the timeline in a user-chosen playback sequence. The timeline displays selected text segments, video preview frames, and a video playback window, wherein the selected text segments are moved by drag-and-drop actions into a user-selected sequence, so that the assembled video program may be played in the selected sequence.
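Playing the assembled program in the selected sequence implies that each segment, wherever it sits in its source clip, is assigned a position in the output program. The sketch below shows one way to derive those positions. The `Segment` record, its field names, and `playback_schedule` are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A selected text segment and its source timecode span (hypothetical)."""
    text: str
    src_start_ms: int
    src_end_ms: int


def playback_schedule(timeline):
    """Return ([(program_start_ms, segment), ...], total_duration_ms)."""
    schedule = []
    cursor = 0  # running position in the assembled program
    for seg in timeline:
        schedule.append((cursor, seg))
        cursor += seg.src_end_ms - seg.src_start_ms
    return schedule, cursor


# The user has placed a later source segment ahead of an earlier one.
timeline = [
    Segment("closing remarks", 9000, 12000),
    Segment("opening line", 0, 2500),
]
schedule, total_ms = playback_schedule(timeline)
print([start for start, _ in schedule])  # [0, 3000]
print(total_ms)                          # 5500
```

The source timecodes stay attached to each segment, so the order on the timeline can change freely without losing the link back to the original video frames.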
The methods described in this section and other sections of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this method can readily be combined with sets of base features identified as implementations.
Allibhai, Shamir, Hodgson, Roderick Neil
Patent | Priority | Assignee | Title |
10945040, | Oct 15 2019 | Adobe Inc.; Adobe Inc | Generating and providing topic visual elements based on audio content and video content of a digital video |
7298930, | Nov 29 2002 | Ricoh Company, LTD | Multimodal access of meeting recordings |
20020069073, | |||
20070050837, | |||
20110060751, | |||
20110202848, | |||
20110239107, | |||
20120210221, | |||
20140169767, | |||
20150378998, | |||
20200126559, | |||
20200126583, | |||
20200273493, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jul 15 2021 | ALLIBHAI, SHAMIR | SIMON SAYS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 056915 | /0611 | |
Jul 16 2021 | HODGSON, RODERICK NEIL | SIMON SAYS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 056915 | /0611 | |
Jul 18 2021 | META PLATFORMS TECHNOLOGIES, LLC | (assignment on the face of the patent) | / | |||
Jan 27 2022 | Facebook, Inc | Facebook Technologies, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 059392 | /0564 | |
Mar 10 2022 | SIMON SAYS, INC | Facebook, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 059394 | /0443 | |
Mar 18 2022 | Facebook Technologies, LLC | META PLATFORMS TECHNOLOGIES, LLC | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 060199 | /0876 |
Date | Maintenance Fee Events |
Jul 18 2021 | BIG: Entity status set to Undiscounted (note the period is included in the code). |
Jul 29 2021 | SMAL: Entity status set to Small. |
Aug 30 2021 | PTGR: Petition Related to Maintenance Fees Granted. |
Date | Maintenance Schedule |
Apr 11 2026 | 4 years fee payment window open |
Oct 11 2026 | 6 months grace period start (w surcharge) |
Apr 11 2027 | patent expiry (for year 4) |
Apr 11 2029 | 2 years to revive unintentionally abandoned end. (for year 4) |
Apr 11 2030 | 8 years fee payment window open |
Oct 11 2030 | 6 months grace period start (w surcharge) |
Apr 11 2031 | patent expiry (for year 8) |
Apr 11 2033 | 2 years to revive unintentionally abandoned end. (for year 8) |
Apr 11 2034 | 12 years fee payment window open |
Oct 11 2034 | 6 months grace period start (w surcharge) |
Apr 11 2035 | patent expiry (for year 12) |
Apr 11 2037 | 2 years to revive unintentionally abandoned end. (for year 12) |