Methods and apparatus for extracting structured data from weblogs are disclosed. In some examples, the methods and apparatus include a web crawler to access a home page of a weblog, and identify a feed associated with the weblog. The methods and apparatus also include a feed finder to determine whether items in the feed contain sufficient content for feed-guided segmentation. The methods and apparatus also include a feed classifier to determine whether the items in the feed contain full content of the weblog. The methods and apparatus also include a wrapper to map data found in the feed into a representation of a weblog post, and screen scrape the weblog into the representation of the weblog post.

Patent 11556598
Priority: Jun 16 2005
Filed: Jan 14 2019
Issued: Jan 17 2023
Expiry: Apr 21 2029 (terminal disclaimer)
Extension: 1040 days
Original assignee entity status: Large
Status: currently OK
1. An apparatus, comprising:
at least one memory;
instructions; and
processor circuitry to execute the instructions to:
access a home page of a weblog;
identify a feed associated with the weblog;
determine whether items in the feed contain sufficient content for feed-guided segmentation;
when the items in the feed contain sufficient content for feed-guided segmentation, determine whether the items in the feed contain full content of the weblog;
when the items in the feed contain full content of the weblog, map data found in the feed into a representation of a weblog post; and
when the items in the feed contain partial content, screen scrape the weblog into the representation of the weblog post.
2. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to receive a feed URL for the weblog from a ping relayed by a ping server from which a weblog update is collected.
3. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to, when the weblog is hosted by a weblog host that publishes a full content feed for the weblog, map a URL of the weblog to a feed URL.
4. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to use RSS auto-discovery to identify a URL of the feed from metadata in HTML for the home page of the weblog.
5. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to:
extract a set of hyperlinks from a body of the weblog; and
filter the set of extracted hyperlinks to identify hyperlinks that belong to the feed for the weblog.
6. The apparatus of claim 5, wherein the processor circuitry is to execute the instructions to use a set of heuristics to filter the set of extracted hyperlinks to identify the feed for the weblog from the extracted hyperlinks.
7. The apparatus of claim 5, wherein the processor circuitry is to execute the instructions to identify the feed for the weblog based on at least one of a URL that allows readers to subscribe to the feed in an RSS reader, a URL matching a host name of the weblog and having a common feed suffix including at least one of atom.xml, .xml, .rss, or .rdf, or a URL matching a domain name of the weblog and having a common feed prefix including at least one of xml or rss.
8. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to determine the feed contains content for feed-guided segmentation when an item in the feed contains a date field and at least one of a content field or a description field.
9. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to determine whether the feed contains full content or partial content of the weblog based on heuristics and features of content and description text in the feed.
10. The apparatus of claim 9, wherein the features of the content and the description text include at least one of a presence or absence of HTML tags, a percent of posts ending in ellipses, or a type of feed.
11. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to, when the feed contains partial content of the weblog, create the representation of the weblog post based on data available from the feed.
12. The apparatus of claim 11, wherein the representation includes at least one of a URL of the weblog, a date on which the weblog post was posted, partial content of the weblog post, a title of the weblog post, an author of the weblog post, or a permalink of the weblog post.
13. The apparatus of claim 12, wherein the processor circuitry is to execute the instructions to search the home page of the weblog for content matching the representations.
14. The apparatus of claim 13, wherein the processor circuitry is to execute the instructions to:
upon finding content on the home page of the weblog matching the representations, identify an enclosing node associated with the content matching the representations in extensible HyperText Markup Language (XHTML) for the home page of the weblog; and
extract text contained within the enclosing node to supplement the representation.
15. The apparatus of claim 12, wherein the processor circuitry is to execute the instructions to exclude summarization artifacts and advertisements from the representations.
16. The apparatus of claim 1, wherein the processor circuitry is to execute the instructions to extract dates from XHTML for a page of the weblog, and wherein when the feed contains insufficient content for feed-guided segmentation, the processor circuitry is to execute the instructions to:
sort the extracted dates into ordered lists, each ordered list corresponding to a relative XPath;
filter the ordered lists according to a set of heuristics to determine which of the lists correspond to entry dates of the weblog post;
segment the weblog into entries based on dates from the determined list as markers for the entries;
extract the weblog post from the weblog entries based on post title markers; and
identify a permalink and an author for the weblog post.
17. The apparatus of claim 16, wherein, to filter the ordered lists based on the set of heuristics, the processor circuitry is to execute the instructions to:
extract lists whose dates all belong to the current year or the past year;
extract non-singleton date lists;
extract lists including dates that conform to a similar format;
extract lists including dates that decrease monotonically;
extract lists with most recent dates and without future data;
extract a list with a longest date string representation;
extract a list with a greatest number of dates; and
extract a first list.
18. The apparatus of claim 16, wherein the processor circuitry is to execute the instructions to segment the weblog into entries based on a set of heuristics to identify an end of the weblog entries.
19. The apparatus of claim 18, wherein the set of heuristics includes at least one of a start of a sidebar on the page of the weblog, a copyright notice on the page of the weblog, a form, or a comment.
20. The apparatus of claim 19, wherein the processor circuitry is to execute the instructions to detect a node in a document object model whose XPath is analogous in structure to an XPath of a last node in a previous weblog entry.
21. The apparatus of claim 19, wherein the processor circuitry is to execute the instructions to:
search nodes of an entry to identify a title node;
determine that titles of subsequent posts have a same relative XPath as the title node identified in searching the nodes of the entry; and
in response to failing to identify a title node, determine that the entry is a single post.
22. The apparatus of claim 21, wherein the processor circuitry is to execute the instructions to identify the title node when a class attribute of the node is at least one of a title, a subtitle, or a blogpost.
23. The apparatus of claim 16, wherein, to identify the permalink and author, the processor circuitry is to execute the instructions to identify hrefs in content of the weblog post that match.
24. The apparatus of claim 16, wherein, to identify the permalink and author, the processor circuitry is to execute the instructions to identify patterns indicative of an author byline.
25. A computer readable storage device or storage disc comprising instructions that, when executed, cause a machine to at least:
access a home page of a weblog;
identify a feed associated with the weblog;
determine whether the feed contains sufficient content for feed-guided segmentation;
when the feed contains sufficient content for feed-guided segmentation, determine whether the feed contains full content or partial content of the weblog;
when the feed contains full content of the weblog, map data found in the feed into a representation of a weblog post; and
when the feed contains partial content of the weblog, screen scrape the weblog into the representation of the weblog post.

This application arises from a continuation of U.S. patent application Ser. No. 14/881,071, which was filed on Oct. 12, 2015, which is a continuation of U.S. patent application Ser. No. 11/454,301, which was filed on Jun. 16, 2006, which claims the priority of U.S. Provisional Patent Application Ser. No. 60/691,200, which was filed Jun. 16, 2005. U.S. patent application Ser. No. 14/881,071, U.S. patent application Ser. No. 11/454,301 and U.S. Provisional Patent Application Ser. No. 60/691,200 are incorporated by reference in their entireties.

Weblogging or “blogging” has emerged in the past few years as a new grassroots publishing medium. Like electronic mail and the web itself, weblogging has taken off, and by some estimates the number of weblogs is doubling every 6 months. As of June 2006, BlogPulse estimates place the number of active weblogs at nearly 10 million blogs, of which about 36% have had at least one post in the past 3 months. BlogPulse finds approximately 60,000 new weblogs each day. Statistics published by other blog search engines such as Technorati and PubSub are similar. However, these estimates may well exclude large numbers of non-English-language weblogs.

A weblog is commonly defined as a web page with a set of dated entries, in reverse chronological order, maintained by its writer via a weblog publishing software tool. We can define each entry as a set of one or more time-stamped posts; an author may typically post several times a day. This is a matter of style: some authors post at most once a day in an all-inclusive entry, while others prefer to micro-post, making each published item a separate post in the day's entry.

Due to the popularity of weblogs, there is a need for a method of searching individual posts within weblogs. The present invention addresses this need.

FIG. 1 shows a sample page from a weblog.

The present invention provides a process for segmenting weblogs into posts. Weblogs can facilitate communication and dissemination of content in any environment having two or more workstations in mutual communication. While weblogs are typically hosted by a server connected to the Internet, the concept can include other types of networks, such as local area networks (LANs), wide area networks (WANs), and public data networks, by which client workstations obtain data from a server workstation.

Each workstation may comprise a microcomputer such as a personal computer, for example, including a system bus that is connected to a central processing unit (CPU) and to memory, including read only memory (ROM) and random access memory (RAM). The system bus can be connected, via appropriate interfaces known to persons skilled in the art, to various input/output devices, including additional nonvolatile data storage devices, video and audio adapters, keyboard, mouse, and other devices that provide input to the workstation or receive output from the workstation. The workstation can also include a data port for communicating with other constituents of the collaborative data processing environment. The data port may be a serial port for linking the workstation to a modem, or a communications adapter for connecting the workstation to a LAN.

Each workstation also typically includes software programs that are stored on the data storage devices or retrieved from other parts of the collaborative data processing system, loaded into RAM, and then executed by the CPU. Among those programs is a client program that receives messages from, and transmits messages to, other workstations connected to the network.

Web search engines such as Google, Yahoo, and MSN Search index the entire content of a web page, typically every few days. However, for weblogs, users want to be able to search over individual posts, and in near real-time. Weblog search portals such as Technorati, Feedster, PubSub and BlogPulse have gained in popularity over the past year and a half, as people begin to turn to weblogs to get up-to-the-minute breaking news and to get fresh angles on news stories.

In addition, marketers have awakened to the possibility of mining consumer sentiment from weblogs. In order to produce accurate analytics, it is first necessary to identify individual weblog posts. Examples of consumer sentiment analytics are the buzz surrounding a product (number of mentions), the number of links to a company website, trends in the number of mentions and links, and the ratio of positive to negative mentions. N. Glance, M. Hurst, K. Nigam, M. Siegler, R. Stockton, and T. Tomokiyo. Analyzing online discussion for marketing intelligence. In Proceedings WWW-2005, Chiba, Japan, 2005 (incorporated herein by reference).

Researchers, too, are turning to blogs to gauge opinion and community structure. For example, Adamic and Glance recently analyzed the linking behavior of political bloggers during the 2004 U.S. Presidential Election and found that conservative bloggers link to each other more frequently and in a denser pattern than liberal bloggers. L. Adamic and N. Glance, The political blogosphere and the 2004 U.S. election: Divided they blog, In Proceedings WWW-2005 2nd Annual Workshop on the Weblogging Ecosystem, Chiba, Japan, 2005 (incorporated herein by reference). Marlow has studied the structure and authority in weblogs using inter-post citation counts. Marlow. Audience, structure and authority in the weblog community, In International Communication Association Conference, New Orleans, La., 2004 (incorporated herein by reference). Adar et al. have explored how memes thread through the blogosphere from post to post. Adar, L. Zhang, L. A. Adamic, and R. M. Lukose, Implicit structure and the dynamics of blogspace, In Proceedings WWW-2004 Workshop on the Weblogging Ecosystem, New York City, N.Y., 2004 (incorporated herein by reference). The Global Attention Profiles project tracks the attention that bloggers pay to different nations of the world, in comparison with selected mainstream media outlets.

To enable sophisticated analytics over weblogs, a blog search engine typically uses an indexing mechanism that indexes a weblog one post at a time, as opposed to one HTML page at a time. In order to index blogs one post at a time, the indexing system should be able to segment the weblog HTML into individual posts and extract metadata associated with the posts, such as the posting date, title, permalink, and author.

The present invention provides a method for segmenting weblogs into individual posts using a combination of weblog feeds (such as RSS and Atom) and model-based wrapper segmentation. RSS is a family of web feed formats, specified in XML and used for Web syndication. Web feeds provide web content or summaries of web content together with links to the full versions of the content, and other metadata. RSS, in particular, delivers this information as an XML file called an RSS feed, webfeed, RSS stream, or RSS channel. In addition to facilitating syndication, web feeds allow a website's frequent readers to track updates on the site using an aggregator. From a user's perspective, web feeds allow Internet users to subscribe to websites that change or add content regularly. Atom is the name of a specific web feed format; from a technical perspective, Atom is an open standard that includes an XML-based web syndication format used by weblogs, news websites and web mail.

The following definitions are used throughout this description:

Weblog or blog: a weblog is a website where an individual or group of individuals publishes posts periodically. The posts are usually displayed in reverse chronological order. Each post generally consists of: a date, a title, the body of the post, a permalink to the post, an author, and one or more categorizations.

Weblog entry: a post or a set of posts published on a specific day.

Post: an item published to a weblog at a specific time of day.

Weblog feed/syndication: weblogs may or may not make posts available via syndication using RSS or Atom feeds. Web feeds provide web content or summaries of web content together with links to the full versions of the content, and other metadata. Both RSS and Atom feeds are XML documents, and several versions of the RSS standard are in use.

Weblog host: a company or website that hosts weblogs for individuals. Examples of popular weblog hosts are: livejournal.com, xanga.com, spaces.msn.com, blogspot.com, and the family of per-country TypePad domains.

Weblog software: software that enables creation and publishing of weblog posts to a weblog host, or to a self-hosted weblog. Each weblog host has its own weblog software tool for publishing posts. In addition, there are a number of weblog software tools for publishing a self-hosted weblog, such as TypePad, Movable Type, and WordPress.

Weblog ping: A weblog ping is an XML-RPC mechanism that notifies a ping server, such as weblogs.com or blo.gs, that the weblog has changed (e.g., the author has written a new post). Many weblog software tools can be set (or are automatically pre-set) to ping centralized servers whenever the weblog is updated. Example ping servers are http://blogs/ping.php and http://rpc.technorati.com/rpc/pingl. Some ping servers accept “extended pings” that include both the URL and feed URL of the updated weblog.
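To make the extended-ping case concrete, here is a minimal Python sketch of how a receiver might decode an incoming ping with the standard xmlrpc.client module. The method names weblogUpdates.ping and weblogUpdates.extendedPing, and the position of the feed URL as the fourth parameter of an extended ping, are assumed to follow the weblogs.com convention; parse_ping is a hypothetical helper, not part of any ping server's API.

```python
import xmlrpc.client


def parse_ping(body: str) -> dict:
    """Decode an XML-RPC ping body into weblog name, URL and (if present) feed URL.

    Assumption: extended pings carry (name, weblog URL, changes URL, feed URL, ...).
    """
    params, method = xmlrpc.client.loads(body)
    if method not in ("weblogUpdates.ping", "weblogUpdates.extendedPing"):
        raise ValueError(f"unsupported ping method: {method}")
    info = {"name": params[0], "url": params[1], "feed_url": None}
    if method == "weblogUpdates.extendedPing" and len(params) >= 4:
        info["feed_url"] = params[3]
    return info


# Build sample request bodies locally (no network) with xmlrpc.client.dumps.
simple = xmlrpc.client.dumps(("My Blog", "http://example.com/"),
                             methodname="weblogUpdates.ping")
extended = xmlrpc.client.dumps(
    ("My Blog", "http://example.com/", "http://example.com/",
     "http://example.com/atom.xml"),
    methodname="weblogUpdates.extendedPing")
print(parse_ping(simple)["feed_url"])    # None
print(parse_ping(extended)["feed_url"])  # http://example.com/atom.xml
```

A plain ping yields no feed URL, so feed discovery would continue with the other methods described below.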

Crawl: A web crawler (also known as a web spider or web robot) is a program which browses the World Wide Web in a methodical, automated manner. A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

Screen scraping: a technique in which a computer program extracts data from the display output of another program. The program doing the scraping is called a screen scraper. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was nominally intended for human consumption, not machine interpretation. There are a number of synonyms for screen scraping, including data scraping, data extraction, web scraping, page scraping, and HTML scraping (the last three being specific to scraping web pages).

Wrapper: a program that performs screen scraping.

“Document Object Model” (DOM): a description of how an HTML or XML document is represented in an object-oriented fashion. DOM provides an application programming interface to access and modify the content, structure and style of the document.

Permalink: a term used in the world of blogging to indicate a URL which points to a specific blog entry.

XPath (XML Path Language): a terse (non-XML) syntax for addressing portions of an XML document.

Here we describe a process for extracting individual posts from a weblog, according to an exemplary embodiment of the present invention. First we describe the typical layout of a weblog.

FIG. 1 shows the home page of a well-known weblog. Notice the extraneous content on the page: header, footer (not displayed) and sidebars (in this example, ads). However, the main content is a sequence of entries in reverse chronological order, with each entry consisting of a sequence of posts, also in reverse chronological order.

A weblog can be described formally as follows:

The ordering of the sub-elements for the Entry elements and the Post elements is typically not standardized across weblogs, although it is assumed to be fixed within a weblog.

Also, the model assumes that the entry dates are monotonically decreasing.
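The formal model itself is not reproduced in this text. As a rough illustration only, the Python sketch below encodes the simplified form used later for model-based segmentation, (date ([title] content)+)+, together with the assumption of decreasing entry dates. The class and field names are hypothetical, and strictly decreasing dates are assumed because each entry corresponds to one day.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Post:
    content: str
    title: Optional[str] = None       # [title] is optional in the model
    permalink: Optional[str] = None
    author: Optional[str] = None


@dataclass
class Entry:
    day: date                         # each entry carries exactly one date
    posts: List[Post] = field(default_factory=list)


@dataclass
class Weblog:
    url: str
    entries: List[Entry] = field(default_factory=list)

    def conforms(self) -> bool:
        """True if the page matches (date ([title] content)+)+ with decreasing dates."""
        if not self.entries or any(not e.posts for e in self.entries):
            return False
        days = [e.day for e in self.entries]
        return all(newer > older for newer, older in zip(days, days[1:]))


blog = Weblog("http://example.com/", [
    Entry(date(2005, 4, 13), [Post("Second post", "Two"), Post("First post", "One")]),
    Entry(date(2005, 4, 12), [Post("Untitled post")]),
])
print(blog.conforms())  # True
```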

Many weblog publishing software tools also publish a feed in association with the weblog. The feed is updated whenever a new item is posted to the weblog. Like the weblog page, the feed is a “pull” mechanism: it must be accessed in order to find out whether the weblog has been updated. However, feeds are designed to be read via a feed reader/aggregator (such as Bloglines or NewsGator, or via an extension to a mail reader), which polls the feed on behalf of the user(s). Thus, the end user who reads feeds via a feed reader experiences weblogs as a “push” phenomenon: newly published weblog posts are pushed to the user's screen.

Some weblog software tools have provided customization of the weblog's feed: the publication of the feed can be turned on or off, the feed can be updated whenever a new item is posted or modified, and the feed can be full content or partial content. Full vs. partial content is an important distinction. We define a full content feed as a feed that publishes the entire content of the post as viewable on the front page of the weblog. We define a partial content feed as a feed that publishes a summary of the post content available via the weblog.

With respect to feed publication, weblog software tools fall into three categories: (1) automatic generation of feeds (partial or full); (2) customized generation of feeds; or (3) no feed generation capability. In the last case, some tech-savvy bloggers will use custom software to create a feed and associate it with their weblog, or turn to a third-party feed generator to host a feed for the weblog (e.g., FeedBurner: http://www.feedburner.com/).

This section describes our approach for segmenting weblogs into posts, according to an exemplary embodiment of the present invention. It would be costly to manually create individual wrappers for each weblog. However, weblogs tend to conform to a common model, as described above. Thus, we have focused on developing an approach that generalizes well over the majority of weblogs.

If a full content feed is available for a weblog, then extracting posts from the weblog is a straightforward mapping of the XML format to an internal format. If a partial content feed exists for a weblog, then we use the partial content to guide the extraction process. If neither a full nor a partial content feed exists for a weblog, then we apply a model-based approach to extracting posts from the weblog page, taking advantage of regularities common to most weblogs. Our work on model-based segmentation is similar to that of Nanno et al. Nanno, Automatic collection and monitoring of Japanese weblogs, In Proceedings WWW-2004 Workshop on the Weblogging Ecosystem, New York City, N.Y., 2004 (incorporated herein by reference).

Accordingly, the algorithm used for extracting posts from a weblog, according to an exemplary embodiment of the present invention, proceeds as follows.

After reaching the home page of the weblog, the first step consists of discovering the feed(s) for the weblog. If the weblog update was collected from a ping server relaying extended pings, and if the accepted ping includes the feed URL for the weblog, then we have located the feed. Alternatively, if the weblog is hosted by a weblog host which publishes full content feeds for its weblogs, then we need only map the weblog URL to the feed URL.

Otherwise, the next step in discovering the feed(s) for a weblog is to use “RSS auto-discovery.” RSS auto-discovery is an agreed-upon standard for specifying the location(s) of a weblog feed(s) as metadata in the HTML for the weblog home page.
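RSS auto-discovery places link elements with rel="alternate" and a feed MIME type in the page head. The following is a minimal sketch using Python's standard html.parser, assuming the common types application/rss+xml and application/atom+xml; application/rdf+xml is included as an assumption for RSS 1.0 feeds. FeedLinkParser and autodiscover_feeds are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class FeedLinkParser(HTMLParser):
    """Collect hrefs of <link rel="alternate" type="<feed type>"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}   # attribute names arrive lowercased
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the home page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def autodiscover_feeds(html, base_url):
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds


page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rss">
<link rel="alternate" type="application/atom+xml" href="http://example.com/atom.xml">
</head><body>...</body></html>"""
print(autodiscover_feeds(page, "http://example.com/blog/"))
# ['http://example.com/index.rss', 'http://example.com/atom.xml']
```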

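If RSS auto-discovery fails to find a set of feeds for the weblog, the next step is to search for links to feeds in the body of the weblog. First, all hyperlinks are extracted from the weblog. Next, the set of extracted hyperlinks is filtered using a classifier to identify which one(s) belong to the set of feeds for the weblog. Currently, we use a set of heuristics to identify the feed(s) for a weblog from the extracted hyperlinks. The following is a non-exclusive list of criteria that can be used to identify the feed: (1) a URL that allows readers to subscribe to the feed in an RSS reader; (2) a URL that matches the host name of the weblog and has a common feed suffix, such as atom.xml, .xml, .rss, or .rdf; and (3) a URL that matches the domain name of the weblog and has a common feed prefix, such as xml or rss.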
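The sketch below illustrates the host-name and domain-name criteria recited in claim 7 as a simple URL filter. It is an approximation built on several assumptions. The domain is taken naively as the last two host labels. A "feed prefix" is read as the first host label or the first path segment being exactly xml or rss. The subscribe-in-a-reader criterion is omitted because it depends on reader-specific URL formats. looks_like_feed is a hypothetical name.

```python
from urllib.parse import urlparse

FEED_SUFFIXES = ("atom.xml", ".xml", ".rss", ".rdf")
FEED_PREFIXES = ("xml", "rss")


def base_domain(host: str) -> str:
    # Naive assumption: the domain is the last two labels (ignores e.g. co.uk).
    return ".".join(host.lower().split(".")[-2:])


def looks_like_feed(link: str, blog_url: str) -> bool:
    blog_host = (urlparse(blog_url).hostname or "").lower()
    u = urlparse(link)
    host = (u.hostname or "").lower()
    path = u.path.lower()
    # Same host name as the weblog and a common feed suffix.
    if host == blog_host and path.endswith(FEED_SUFFIXES):
        return True
    # Same domain as the weblog and a common feed prefix (rss.example.com, /xml/...).
    if host and base_domain(host) == base_domain(blog_host):
        first_label = host.split(".")[0]
        first_segment = path.lstrip("/").split("/")[0]
        if first_label in FEED_PREFIXES or first_segment in FEED_PREFIXES:
            return True
    return False


links = [
    "http://example.com/atom.xml",
    "http://example.com/2005/04/post.html",
    "http://example.com/xmlrpc.php",
    "http://rss.example.com/blog",
    "http://other.org/index.rss",
]
feeds = [link for link in links if looks_like_feed(link, "http://example.com/")]
print(feeds)  # ['http://example.com/atom.xml', 'http://rss.example.com/blog']
```

Exact matching on the prefix avoids false positives such as an xmlrpc.php endpoint, at the cost of missing less regular feed URLs.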

The multiple XML standards for weblog feeds (several versions of RSS and Atom) all satisfy the following minimal conditions:

Our feed finder considers an item in the feed content to be sufficient if it contains, at minimum, the following fields: date-posted AND (content OR description). If no items in the feed contain sufficient content, the feed is rejected and weblog segmentation falls back upon model-based weblog segmentation (aka screen scraping).

The actual names of the fields depend on the feed standard being used. For example, for RSS v0.91, date-posted maps onto the XPath /item/title; content maps onto the XPath /item/description; and description maps onto the XPath /item/description. (There is no separate content field in the RSS v0.91 specification.)

Typically, the description field is used to provide a summary of the post (usually the first few lines) while the content field is used to provide either the full content of the post or a summary. Some feeds contain both, in which case, typically, the description contains the summary and the content contains the full post.
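The following is a minimal Python sketch of the sufficiency test date-posted AND (content OR description). It maps fields for two formats only, assuming that RSS 2.0 uses pubDate, content:encoded and description, and that Atom 1.0 uses published (or updated), content and summary. It reads element text only, so Atom content of type xhtml, which is carried as child elements, is not handled properly by this sketch. Other feed roots (for example RSS 1.0 RDF) yield no items here.

```python
import xml.etree.ElementTree as ET

CONTENT = "{http://purl.org/rss/1.0/modules/content/}"   # content:encoded module
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_items(xml_text):
    """Yield (date_posted, content, description) for each RSS 2.0 item or Atom entry."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        for item in root.iter("item"):
            yield (item.findtext("pubDate"),
                   item.findtext(CONTENT + "encoded"),
                   item.findtext("description"))
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            yield (entry.findtext(ATOM + "published") or entry.findtext(ATOM + "updated"),
                   entry.findtext(ATOM + "content"),
                   entry.findtext(ATOM + "summary"))


def feed_is_sufficient(xml_text):
    """Accept the feed if any item has date-posted AND (content OR description)."""
    return any(posted and (content or description)
               for posted, content, description in feed_items(xml_text))


rss = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello</title><pubDate>Wed, 13 Apr 2005 10:30:00 GMT</pubDate>
<description>The first few lines of the post...</description></item>
</channel></rss>"""
print(feed_is_sufficient(rss))  # True
```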

The feed classifier, which classifies the feed as full content or partial content, takes as input features of the content and description text, such as the presence or absence of HTML tags, the percentage of posts ending in ellipses, and the type of feed. Based on these features, it uses heuristics to decide whether or not the items in the feed are full content. Other features could be added, such as the variance in the length of the text.
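The classifier's exact rules are not given here. The sketch below shows one hypothetical way to combine the three named features in Python. The 50% thresholds, the feed-type labels, the rule ordering and the fallback on a populated content field are all assumptions made for illustration.

```python
import re

TAG = re.compile(r"<[a-zA-Z][^>]*>")          # crude test for HTML markup
ELLIPSES = ("...", "\u2026", "[...]")         # common truncation artifacts


def is_full_content(items, feed_type):
    """Classify a feed: True = full content, False = partial content.

    items: (content, description) pairs, None where a field is absent.
    feed_type: e.g. "rss0.91", "rss2.0", "atom1.0" (hypothetical labels).
    """
    texts = [(c or d or "").strip() for c, d in items]
    if not texts:
        return False
    n = len(texts)
    pct_ellipsis = sum(t.endswith(ELLIPSES) for t in texts) / n
    pct_html = sum(bool(TAG.search(t)) for t in texts) / n
    if pct_ellipsis >= 0.5:          # most items look truncated
        return False
    if pct_html >= 0.5:              # markup suggests the post body was copied verbatim
        return True
    if feed_type == "rss0.91":       # description-only format: assume summaries
        return False
    return any(c for c, _ in items)  # plain text: trust a populated content field


print(is_full_content([("<p>Whole post.</p>", "Whole post.")], "rss2.0"))    # True
print(is_full_content([(None, "First lines..."), (None, "More\u2026")], "rss2.0"))  # False
```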

If the feed is classified as full content, then we map the data found in the feed into our own internal representation for weblog posts, an XML representation of the post content plus metadata. Elements in the XML representation include: weblog URL, permalink, weblog title, post title, date posted, time posted, and content.
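For illustration, a post could be serialized with Python's xml.etree.ElementTree as below. The element names (post, weblog-url, date-posted, and so on) are hypothetical stand-ins for the fields listed above, not the actual internal schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def to_post_xml(weblog_url, weblog_title, post_title, permalink, posted, content):
    """Serialize one post as XML; element names are illustrative only."""
    post = ET.Element("post")
    for tag, value in (("weblog-url", weblog_url),
                       ("permalink", permalink),
                       ("weblog-title", weblog_title),
                       ("post-title", post_title),
                       ("date-posted", posted.strftime("%Y-%m-%d")),
                       ("time-posted", posted.strftime("%H:%M:%S")),
                       ("content", content)):
        ET.SubElement(post, tag).text = value   # markup in content is escaped
    return ET.tostring(post, encoding="unicode")


xml = to_post_xml("http://example.com/", "Example Blog", "Hello",
                  "http://example.com/2005/04/hello.html",
                  datetime(2005, 4, 13, 10, 30), "<p>Full text & more</p>")
print(ET.fromstring(xml).findtext("time-posted"))  # 10:30:00
```

RSS 2.0 pubDate values are RFC 822 dates, which Python's email.utils.parsedate_to_datetime can convert into the datetime this function expects.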

If the feed is not full content, then we create skeletal posts from the data in the feed. For each post, we fill in the following data: weblog url; date-posted; partial content; post title (if found); post author (if found); and permalink (if found).

The next step is to fill in the skeletal posts constructed from the feed by using the content of the weblog page itself. Missing from the skeletal posts is the full content of the post. To find the full content, the partial content is first processed to remove summarization artifacts (e.g., ending ellipsis). Then, we search for the partial content in the weblog. If the partial content is not found, then we will omit that particular post from our segmentation because not enough information can be located to construct the post. If we end up finding insufficient information on all posts, then we will fall back on model-based segmentation.

If the partial content matches text on the weblog home page, then we find the enclosing node for the matching text in the tidied XHTML for the weblog page. The Extensible HyperText Markup Language, or XHTML, is a markup language that has the same expressive possibilities as HTML, but a stricter syntax. The text inside the enclosing node is then used as the content for the post. If enclosing nodes for successive posts overlap, then we throw an error indicating that feed-guided segmentation has failed for the weblog, and, again, fall back on model-based segmentation.
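The following Python sketch illustrates feed-guided matching under several assumptions. The tidied XHTML is assumed to parse with xml.etree.ElementTree and to have no XML namespace. "Enclosing node" is interpreted as the innermost element containing the cleaned summary, widened upward while its parent contains no other post's summary. Two enclosing nodes are taken to overlap when one contains the other. The names enclosing_nodes and FeedGuidedSegmentationError are hypothetical, and the repeated text scans make the sketch quadratic in page size.

```python
import re
import xml.etree.ElementTree as ET


class FeedGuidedSegmentationError(Exception):
    """Raised when the enclosing nodes of two posts overlap."""


def norm(text):
    return " ".join(text.split())


def text_of(el):
    # Join fragments with spaces so adjacent block elements do not glue words together.
    return norm(" ".join(el.itertext()))


def strip_artifacts(summary):
    # Remove summarization artifacts: a trailing "...", ellipsis character, or "[...]".
    return re.sub(r"(\s*(\.\.\.|\u2026|\[\.\.\.\]))+$", "", norm(summary))


def enclosing_nodes(xhtml, summaries):
    """Map each feed summary to the page node holding the full post (None if not found)."""
    root = ET.fromstring(xhtml)
    parent = {child: p for p in root.iter() for child in p}
    needles = [strip_artifacts(s) for s in summaries]
    found = []
    for i, needle in enumerate(needles):
        hits = [el for el in root.iter() if needle and needle in text_of(el)]
        if not hits:
            found.append(None)        # post omitted: partial content not on the page
            continue
        node = min(hits, key=lambda el: len(text_of(el)))   # innermost match
        others = [n for j, n in enumerate(needles) if j != i and n]
        # Climb while the parent does not also contain another post's summary.
        while node in parent and not any(o in text_of(parent[node]) for o in others):
            node = parent[node]
        found.append(node)
    located = [n for n in found if n is not None]
    for a in range(len(located)):
        for b in range(len(located)):
            if a != b and any(d is located[b] for d in located[a].iter()):
                raise FeedGuidedSegmentationError("enclosing nodes overlap")
    return found


page = """<html><body>
<div id="posts">
<div class="post"><h3>First</h3><p>Alpha beta gamma.</p><p>Rest of first.</p></div>
<div class="post"><h3>Second</h3><p>Delta epsilon zeta.</p><p>Rest of second.</p></div>
</div>
<div id="sidebar">Archives</div>
</body></html>"""
nodes = enclosing_nodes(page, ["Alpha beta...", "Delta epsilon [...]", "Not on page"])
print([text_of(n) if n is not None else None for n in nodes])
# ['First Alpha beta gamma. Rest of first.', 'Second Delta epsilon zeta. Rest of second.', None]
```

A None result corresponds to a post omitted because its partial content could not be located. A caller would fall back on model-based segmentation if every result is None or if the overlap error is raised.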

If there are no feeds with sufficient full or partial content, then we attempt to segment the weblog into posts using screen scraping of the weblog. Screen scraping uses a model-based approach to segment the weblog page into posts using textual and HTML elements of the page as markers.

Model-based weblog segmentation assumes that weblogs can be modeled as described above. Our approach then starts from a simplification of that model: (date ([title] content)+)+. This model assumes that dates appear first, which means that if we are able to extract the weblog entry dates, then we can use the dates as markers for the entries. Of course, a weblog page may contain many dates apart from those marking the entries: dates in the content of the posts, and dates in the sidebars or in other non-weblog content included in the HTML page. However, because weblogs are produced by weblog software, we can expect certain regularities in the underlying DOM of the generated HTML. In particular, we expect the relative XPaths of the weblog entry dates to be identical. A relative XPath is an XPath that is defined relative to a location (XML node) in an XML document. In practice, we have found that the relative XPaths of the entry dates are identical if we ignore certain elements in the XPath: /align/ and repeated /font/ steps.

The first step in our model-based segmentation algorithm consists of extracting all the dates from the tidied XHTML for the weblog page using a date extractor. The dates are sorted into ordered lists, one list for each unique relative XPath. The order within each list corresponds to the ordering of the dates within the DOM for the weblog page.
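The following is a minimal sketch of date extraction and grouping in Python. It assumes namespace-free XHTML and uses a hypothetical date extractor that recognizes only two illustrative formats; real pages need many more. Paths are built from tag names without positional indices so that dates in sibling entries share one path. Following the description above, align steps are dropped and repeated font steps are collapsed.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Hypothetical date extractor covering two illustrative formats only.
DATE_PATTERNS = [
    (re.compile(r"\b(?:January|February|March|April|May|June|July|August|"
                r"September|October|November|December) \d{1,2}, \d{4}\b"), "%B %d, %Y"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
]


def relative_xpath(tags):
    """Index-free tag path; drops align steps and collapses repeated font steps."""
    steps = []
    for tag in tags:
        if tag == "align" or (tag == "font" and steps and steps[-1] == "font"):
            continue
        steps.append(tag)
    return "/" + "/".join(steps)


def extract_date_lists(xhtml):
    """Return {relative XPath: [(date, raw text, format), ...]} in document order."""
    lists = defaultdict(list)

    def scan(text, tags):
        hits = sorted((m.start(), m.group(0), fmt)
                      for regex, fmt in DATE_PATTERNS for m in regex.finditer(text))
        for _, raw, fmt in hits:
            lists[relative_xpath(tags)].append(
                (datetime.strptime(raw, fmt).date(), raw, fmt))

    def walk(el, tags):
        tags = tags + [el.tag]
        if el.text:
            scan(el.text, tags)
        for child in el:
            walk(child, tags)
            if child.tail:             # tail text belongs to the parent element
                scan(child.tail, tags)

    root = ET.fromstring(xhtml)
    body = root.find("body")
    walk(body if body is not None else root, [])
    return dict(lists)


page = """<html><body>
<div><h2>April 13, 2005</h2><p>Post text that mentions March 3, 2004.</p></div>
<div><h2>April 12, 2005</h2><p>Another post.</p></div>
<div><ul><li>4/13/2005</li><li>4/12/2005</li></ul></div>
</body></html>"""
for path, hits in extract_date_lists(page).items():
    print(path, [raw for _, raw, _ in hits])
# /body/div/h2 ['April 13, 2005', 'April 12, 2005']
# /body/div/p ['March 3, 2004']
# /body/div/ul/li ['4/13/2005', '4/12/2005']
```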

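We then filter the lists according to a set of heuristics in order to identify which list corresponds to the actual weblog entry dates. The filtering process for the date lists can be performed using the following sequence of steps: (1) keep lists whose dates all belong to the current year or the past year; (2) keep non-singleton date lists; (3) keep lists whose dates conform to a similar format; (4) keep lists whose dates decrease monotonically; (5) keep lists with the most recent dates and without future dates; (6) keep the list with the longest date string representation; (7) keep the list with the greatest number of dates; and (8) take the first remaining list.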

One might think that after step 5 in the filtering process, we would be left with at most one list of dates. In practice, this is frequently not the case, because weblogs often have a sidebar with a dated list of recent posts that corresponds exactly to the full set of posts in the main part of the weblog. The last few filtering steps help correctly identify the weblog entry dates as opposed to the dates in the sidebar.
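The filtering sequence can be sketched in Python as follows, using the order of the steps recited in claim 17. The interpretations are assumptions. "Similar format" means an identical format string, and "decrease monotonically" is read as non-increasing. The most-recent-dates step keeps lists with no future dates whose newest date is the newest overall. The final "first list" step relies on the lists arriving in document order. pick_entry_dates is a hypothetical name.

```python
from datetime import date


def pick_entry_dates(lists, today):
    """Return the XPath of the list judged to hold the entry dates, or None.

    lists: {relative XPath: [(date, raw text, format), ...]} in document order.
    """
    cands = dict(lists)

    def keep(pred):
        nonlocal cands
        cands = {path: hits for path, hits in cands.items() if pred(hits)}

    # Current or past year only; non-singleton; one format; non-increasing dates.
    keep(lambda h: all(d.year in (today.year, today.year - 1) for d, _, _ in h))
    keep(lambda h: len(h) > 1)
    keep(lambda h: len({fmt for _, _, fmt in h}) == 1)
    keep(lambda h: all(a[0] >= b[0] for a, b in zip(h, h[1:])))
    # No future dates, then only the lists holding the most recent date.
    keep(lambda h: all(d <= today for d, _, _ in h))
    if not cands:
        return None                  # no conforming list: model-based segmentation fails
    newest = max(max(d for d, _, _ in h) for h in cands.values())
    keep(lambda h: max(d for d, _, _ in h) == newest)
    # Tie-breakers: longest date string, most dates, then first in document order.
    longest = max(max(len(raw) for _, raw, _ in h) for h in cands.values())
    keep(lambda h: max(len(raw) for _, raw, _ in h) == longest)
    most = max(len(h) for h in cands.values())
    keep(lambda h: len(h) == most)
    return next(iter(cands))


F1, F2 = "%B %d, %Y", "%m/%d/%Y"
lists = {
    "/body/div/h2": [(date(2005, 4, 13), "April 13, 2005", F1),
                     (date(2005, 4, 12), "April 12, 2005", F1)],
    "/body/div/p": [(date(2004, 3, 3), "March 3, 2004", F1)],
    "/body/div/ul/li": [(date(2005, 4, 13), "4/13/2005", F2),
                        (date(2005, 4, 12), "4/12/2005", F2)],
}
print(pick_entry_dates(lists, today=date(2005, 4, 14)))  # /body/div/h2
```

Here the sidebar list survives every hard filter, and only the longest-date-string tie-breaker separates it from the entry dates, mirroring the sidebar case described above.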

If we fail to find a conforming list of dates, then model-based segmentation fails. There are some known cases where our approach fails: when only one entry appears on the home page of the weblog; or when weblog software for some reason generates irregular XPaths for the dates and/or content. But in many cases, segmentation fails when the HTML page in question is not actually a weblog. Thus, our model-based segmentation algorithm has the additional functionality of serving as a classifier that identifies whether or not an HTML page is indeed a weblog.

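Once we have identified the entry dates for the weblog, model-based segmentation proceeds in three steps: (1) segment the weblog into entries, using the entry dates as markers; (2) split each entry into posts, using post titles as markers; and (3) identify the permalink and author of each post.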

In step 1, we assume that all DOM nodes between subsequent entry dates form the weblog entry associated with the earlier date. The main difficulty is identifying the end of the last post on the page. For this, we use a set of heuristics that identify the end of the blog entry by looking for the start of the boilerplate that ends the weblog template. Example end markers include the start of a sidebar, a copyright notice, a form, or a comment. Another heuristic for finding the end of the blog entry is to look for a node in the DOM whose XPath is analogous in structure to the XPath of the last node in the previous weblog entry.
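The following is a simplified Python sketch of step 1. It assumes a flat layout in which date nodes and entry content are siblings under one container. The end-marker tests are hypothetical renderings of the markers listed above: a class or id containing sidebar, copyright, comment or footer; a form element; or copyright text. The XPath-analogy heuristic is not implemented.

```python
import xml.etree.ElementTree as ET

END_HINTS = ("sidebar", "copyright", "comment", "footer")   # hypothetical class/id hints


def is_end_marker(node):
    """Heuristic start of the template boilerplate that follows the last entry."""
    if node.tag == "form":
        return True
    label = (node.get("class", "") + " " + node.get("id", "")).lower()
    text = " ".join(node.itertext()).lower()
    return any(h in label for h in END_HINTS) or "copyright" in text or "\u00a9" in text


def segment_entries(container, is_date_node):
    """Split the container's children into (date node, [content nodes]) entries."""
    children = list(container)
    starts = [i for i, c in enumerate(children) if is_date_node(c)]
    entries = []
    for k, start in enumerate(starts):
        last = k + 1 == len(starts)
        stop = len(children) if last else starts[k + 1]
        body = children[start + 1:stop]
        if last:
            # Only the last entry needs an explicit end: stop at the first marker.
            for j, node in enumerate(body):
                if is_end_marker(node):
                    body = body[:j]
                    break
        entries.append((children[start], body))
    return entries


main = ET.fromstring(
    '<div id="main">'
    '<h2 class="date">April 13, 2005</h2><p>A1</p><p>A2</p>'
    '<h2 class="date">April 12, 2005</h2><p>B1</p>'
    '<div class="sidebar">Archives</div><p>Copyright 2005</p></div>')
entries = segment_entries(main, lambda n: n.get("class") == "date")
print([(d.text, [n.text for n in body]) for d, body in entries])
# [('April 13, 2005', ['A1', 'A2']), ('April 12, 2005', ['B1'])]
```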

In step 2, we attempt to use post titles to demarcate boundaries between posts for an entry. First, we iterate over the nodes of the entry searching for a node that matches one of our conditions for being a title node. These conditions include: the class attribute of the node equals ‘title’, ‘subtitle’, or ‘blogpost’, etc. Once we have found the first matching title, we then assume that all subsequent post titles will have the same relative XPath. Again, we assume that all DOM nodes between subsequent title nodes are associated with the earlier title.

If we are unable to find titles, we treat the entire entry as a single post. This is a reasonable default: we have found that the majority of bloggers do not post more than once per day.
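Step 2, including the single-post fallback, might look like the sketch below. It assumes each node carries an XPath relative to the entry root (`rel_xpath`, a hypothetical field) and uses the example class values from the text; comparing the class attribute case-insensitively is an additional assumption.

```python
from dataclasses import dataclass
from typing import List

TITLE_CLASSES = {"title", "subtitle", "blogpost"}  # example conditions from the text


@dataclass
class Node:
    rel_xpath: str  # hypothetical: XPath relative to the entry root
    cls: str = ""
    text: str = ""


def split_posts(entry: List[Node]) -> List[List[Node]]:
    # Find the first node whose class attribute marks it as a title.
    first = next(
        (i for i, n in enumerate(entry) if n.cls.strip().lower() in TITLE_CLASSES),
        None,
    )
    if first is None:
        return [entry]  # no titles found: the whole entry is a single post
    # Assume every later title shares the first title's relative XPath.
    title_xpath = entry[first].rel_xpath
    starts = [i for i in range(first, len(entry)) if entry[i].rel_xpath == title_xpath]
    ends = starts[1:] + [len(entry)]
    # Nodes between consecutive titles belong to the earlier title;
    # nodes before the first title (e.g. the date header) are left out here.
    return [entry[s:e] for s, e in zip(starts, ends)]
```

Note that a later title is recognized by its relative XPath alone, even if it lacks a title-like class.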

The final post-processing step identifies the permalink and author of each extracted post from its content, using common patterns for permalinks and author signatures. To find authors, we look for patterns like “posted by.” To find permalinks, we look for hrefs (hyperlinks) in the post content that match, for example, “comment” or “archive.” Some permalink patterns are given higher priority than others.
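A possible sketch of the author and permalink patterns, using only the Python standard library. The regular expression for author signatures (a capitalized name following “posted by”) and the priority order of permalink patterns (“archive” before “comment”) are assumptions chosen for illustration; the text does not specify either.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

# Assumed signature pattern: "posted by" (any case) followed by capitalized words.
AUTHOR_RE = re.compile(r"(?i:posted\s+by)\s*:?\s*([A-Z][\w.'-]*(?:\s+[A-Z][\w.'-]*)*)")

# Hypothetical priority order: earlier patterns win over later ones.
PERMALINK_PATTERNS = ["archive", "comment"]


class _HrefCollector(HTMLParser):
    """Collects the href of every <a> tag in a post's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_author(post_text: str) -> Optional[str]:
    match = AUTHOR_RE.search(post_text)
    return match.group(1) if match else None


def find_permalink(post_html: str) -> Optional[str]:
    collector = _HrefCollector()
    collector.feed(post_html)
    # Try patterns in priority order; return the first href that matches.
    for pattern in PERMALINK_PATTERNS:
        for href in collector.hrefs:
            if pattern in href.lower():
                return href
    return None
```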

A weakness of our current implementation of model-based wrapper segmentation is that it assumes the date field comes first in a weblog entry. While most blogs exhibit the pattern date ([title] content)+, others use (title date content)+ or even ([title] content date)+. Our approach can still segment blogs exhibiting these less common patterns, but the segmentation associates each date with the wrong content. That is, given a sequence of N posts (post 1 through post N), the date for post 1 will be associated with the content of post 2, and so on. In addition, we will fail to extract the content of post 1. We call this error a parity error.
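The parity error can be reproduced with a toy date-first segmenter. The token representation below is hypothetical; the point is that a segmenter which opens a new post at every date handles date ([title] content)+ correctly but, on ([title] content date)+, pairs each date with the following post's content and drops the content that precedes the first date.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical: ("date" or "content", value)


@dataclass
class Post:
    date: str
    content: List[str] = field(default_factory=list)


def segment_date_first(tokens: List[Token]) -> List[Post]:
    """Open a new post at every date, as the date-first model assumes."""
    posts: List[Post] = []
    for kind, value in tokens:
        if kind == "date":
            posts.append(Post(value))
        elif posts:
            posts[-1].content.append(value)
        # Content seen before the first date is silently dropped.
    return posts


# date (content)+ : segmented correctly.
DATE_FIRST = [("date", "D1"), ("content", "C1"), ("date", "D2"), ("content", "C2")]
# (content date)+ : D1 is paired with C2, and C1 is lost (a parity error).
CONTENT_FIRST = [("content", "C1"), ("date", "D1"), ("content", "C2"), ("date", "D2")]
```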

We have implemented weblog segmentation as part of the BlogPulse weblog post collection, indexing and search system.

In tests of the model-based segmentation algorithm, we have found that its precision is about 90%: about 90% of extracted posts have date, title and content fields that correspond to those of actual posts on the weblogs. Its recall is approximately 70%: we are able to extract posts from about 70% of true weblogs.
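Written out, the two measures are taken over different units: precision over extracted posts, recall over weblogs.

```latex
\[
\text{precision} = \frac{\lvert\{\text{extracted posts whose date, title and content match an actual post}\}\rvert}{\lvert\{\text{extracted posts}\}\rvert} \approx 0.90
\]
\[
\text{recall} = \frac{\lvert\{\text{true weblogs from which posts are extracted}\}\rvert}{\lvert\{\text{true weblogs}\}\rvert} \approx 0.70
\]
```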

TABLE 1
Segmentation statistics for Apr. 13, 2005

Segmentation method           % of weblogs
Full content feed             78%
Feed-guided segmentation      11%
Model-based segmentation      11%

Table 1 shows the percentage of weblogs handled by each method in our segmentation process: (1) full content feeds (78%); (2) feed-guided segmentation (11%); and (3) model-based segmentation (11%).
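The order in which the three methods are tried can be sketched as a simple cascade that follows the claimed flow: check whether the feed has sufficient content for feed-guided segmentation, and if so, whether it carries full content. Treating model-based segmentation as the fallback for weblogs with no usable feed is an assumption, as are the boolean inputs, which stand in for the results of the crawler, feed finder, and feed classifier.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class Method(Enum):
    FULL_CONTENT_FEED = "Full content feed"
    FEED_GUIDED = "Feed-guided segmentation"
    MODEL_BASED = "Model-based segmentation"


def choose_method(has_feed: bool, feed_sufficient: bool, feed_full: bool) -> Method:
    """Full content feed first, then feed-guided, else model-based (assumed fallback)."""
    if has_feed and feed_sufficient:
        return Method.FULL_CONTENT_FEED if feed_full else Method.FEED_GUIDED
    return Method.MODEL_BASED


def method_shares(blogs: Iterable[Tuple[bool, bool, bool]]) -> Dict[str, int]:
    """Percentage of weblogs routed to each method, rounded to whole percent."""
    counts = Counter(choose_method(*flags) for flags in blogs)
    total = sum(counts.values())
    if total == 0:
        return {m.value: 0 for m in Method}
    return {m.value: round(100 * counts[m] / total) for m in Method}
```

With made-up inputs, `method_shares` tallies the share of weblogs routed to each method, in the same form as Table 1.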

Running segmentation within BlogPulse's post collection subsystem enables BlogPulse to provide search over individual blog posts. Furthermore, the corpus of dated weblog posts serves as a data set for tracking trends over time and for analyzing how memes spread through the blogosphere.

Having described the invention with reference to embodiments, it is to be understood that the invention is defined by the claims, and it is not intended that any limitations or elements describing the embodiments set forth herein are to be incorporated into the meanings of the claims unless such limitations or elements are explicitly listed in the claims. Likewise, it is to be understood that it is not necessary to meet any or all of the identified advantages or objects of the invention disclosed herein in order to fall within the scope of any claims, since the invention is defined by the claims and since inherent and/or unforeseen advantages of the present invention may exist even though they may not have been explicitly discussed herein.

Glance, Natalie

Executed on    Assignor           Assignee                                                  Conveyance                                                    Reel/Frame
Oct 06 2006    GLANCE, NATALIE    BUZZMETRICS, LTD                                          ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS)   0532680851
Jan 14 2019                       BuzzMetrics, Ltd. (assignment on the face of the patent)
Jun 04 2020NIELSEN AUDIO, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NETRATINGS, LLCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020GRACENOTE MEDIA SERVICES, LLCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020GRACENOTE DIGITAL VENTURES, LLCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020GRACENOTE, INCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020EXELATE, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020CZT ACN TRADEMARKS, L L C CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020ATHENIAN LEASING CORPORATIONCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020ART HOLDING, L L C CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020AFFINNOVA, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020ACNIELSEN ERATINGS COMCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020ACNIELSEN CORPORATIONCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020ACN HOLDINGS INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NIELSEN CONSUMER INSIGHTS, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NIELSEN CONSUMER NEUROSCIENCE, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020VIZU CORPORATIONCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020VNU INTERNATIONAL B V CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020THE NIELSEN COMPANY B V CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NIELSEN HOLDING AND FINANCE B V CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NMR LICENSING ASSOCIATES, L P CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020VNU MARKETING INFORMATION, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020THE NIELSEN COMPANY US , LLCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020TNC US HOLDINGS, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020TCG DIVESTITURE INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NMR INVESTING I, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NIELSEN MOBILE, LLCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NIELSEN INTERNATIONAL HOLDINGS, INC CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020NIELSEN FINANCE CO CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020A C NIELSEN COMPANY, LLCCITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020A C NIELSEN ARGENTINA S A CITIBANK, N A CORRECTIVE ASSIGNMENT TO CORRECT THE PATENTS LISTED ON SCHEDULE 1 RECORDED ON 6-9-2020 PREVIOUSLY RECORDED ON REEL 053473 FRAME 0001 ASSIGNOR S HEREBY CONFIRMS THE SUPPLEMENTAL IP SECURITY AGREEMENT 0540660064 pdf
Jun 04 2020 | A C NIELSEN COMPANY, LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN AUDIO, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NETRATINGS, LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | GRACENOTE MEDIA SERVICES, LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | GRACENOTE DIGITAL VENTURES, LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | GRACENOTE, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | EXELATE, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | CZT ACN TRADEMARKS, L L C | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | ATHENIAN LEASING CORPORATION | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | ART HOLDING, L L C | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | AFFINNOVA, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | ACNIELSEN ERATINGS COM | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | ACNIELSEN CORPORATION | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | ACN HOLDINGS INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN CONSUMER INSIGHTS, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN CONSUMER NEUROSCIENCE, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | VNU INTERNATIONAL B V | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | THE NIELSEN COMPANY B V | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN HOLDING AND FINANCE B V | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NMR LICENSING ASSOCIATES, L P | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | VNU MARKETING INFORMATION, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | VIZU CORPORATION | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | THE NIELSEN COMPANY (US), LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | TNC US HOLDINGS, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN FINANCE CO | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN INTERNATIONAL HOLDINGS, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN MOBILE, LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NIELSEN UK FINANCE I, LLC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | NMR INVESTING I, INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Jun 04 2020 | TCG DIVESTITURE INC | CITIBANK, N A | SUPPLEMENTAL SECURITY AGREEMENT | Reel/Frame 053473/0001
Oct 11 2022 | CITIBANK, N A | NETRATINGS, LLC | RELEASE (REEL 054066 / FRAME 0064) | Reel/Frame 063605/0001
Oct 11 2022 | CITIBANK, N A | THE NIELSEN COMPANY (US), LLC | RELEASE (REEL 054066 / FRAME 0064) | Reel/Frame 063605/0001
Oct 11 2022 | CITIBANK, N A | GRACENOTE MEDIA SERVICES, LLC | RELEASE (REEL 054066 / FRAME 0064) | Reel/Frame 063605/0001
Oct 11 2022 | CITIBANK, N A | GRACENOTE, INC | RELEASE (REEL 054066 / FRAME 0064) | Reel/Frame 063605/0001
Oct 11 2022 | CITIBANK, N A | EXELATE, INC | RELEASE (REEL 054066 / FRAME 0064) | Reel/Frame 063605/0001
Oct 11 2022 | CITIBANK, N A | A C NIELSEN COMPANY, LLC | RELEASE (REEL 054066 / FRAME 0064) | Reel/Frame 063605/0001
Oct 11 2022 | CITIBANK, N A | NETRATINGS, LLC | RELEASE (REEL 053473 / FRAME 0001) | Reel/Frame 063603/0001
Oct 11 2022 | CITIBANK, N A | THE NIELSEN COMPANY (US), LLC | RELEASE (REEL 053473 / FRAME 0001) | Reel/Frame 063603/0001
Oct 11 2022 | CITIBANK, N A | GRACENOTE, INC | RELEASE (REEL 053473 / FRAME 0001) | Reel/Frame 063603/0001
Oct 11 2022 | CITIBANK, N A | EXELATE, INC | RELEASE (REEL 053473 / FRAME 0001) | Reel/Frame 063603/0001
Oct 11 2022 | CITIBANK, N A | A C NIELSEN COMPANY, LLC | RELEASE (REEL 053473 / FRAME 0001) | Reel/Frame 063603/0001
Oct 11 2022 | CITIBANK, N A | GRACENOTE MEDIA SERVICES, LLC | RELEASE (REEL 053473 / FRAME 0001) | Reel/Frame 063603/0001
Maintenance Fee Events
Jan 14 2019 | BIG.: Entity status set to Undiscounted (note the period is included in the code).


Maintenance Schedule
Jan 17 2026 | 4-year fee payment window opens
Jul 17 2026 | 6-month grace period starts (with surcharge)
Jan 17 2027 | patent expiry (for year 4)
Jan 17 2029 | end of 2-year period to revive if unintentionally abandoned (for year 4)
Jan 17 2030 | 8-year fee payment window opens
Jul 17 2030 | 6-month grace period starts (with surcharge)
Jan 17 2031 | patent expiry (for year 8)
Jan 17 2033 | end of 2-year period to revive if unintentionally abandoned (for year 8)
Jan 17 2034 | 12-year fee payment window opens
Jul 17 2034 | 6-month grace period starts (with surcharge)
Jan 17 2035 | patent expiry (for year 12)
Jan 17 2037 | end of 2-year period to revive if unintentionally abandoned (for year 12)