A pattern matching system, method and computer program product are provided. In use, a plurality of components of data is received, such that each component of the data is compared against a plurality of patterns. To this end, more than one match between the components of the data and the patterns can be identified.
1. A method, comprising:
receiving a plurality of components of data;
comparing each component of the data against a plurality of patterns, utilizing a micro-processor, wherein the patterns are representative of intrusion attacks;
establishing a first list including patterns that are potentially matched against the components of the data, wherein the patterns are determined to be potentially matched upon a first identification of a match between a portion of the plurality of components of the data and a portion of a plurality of portions of the patterns; and
identifying more than one complete match between the components of the data and each of the plurality of portions of the patterns in the first list;
wherein in response to the identification of a complete match between the plurality of components of the data and one or more of the patterns in the first list, each pattern on the first list for which a complete match is identified is moved from the first list to a second list;
wherein each component of the data is compared against each pattern of the plurality of patterns only once;
wherein one or more of the patterns are determined to be completely matched to the plurality of components of data upon a match between all of the plurality of components of the data and each of the plurality of portions of the pattern.
18. A system, comprising:
an intrusion detection system including a micro-processor coupled to a memory for comparing each component of data against a plurality of patterns, wherein the patterns are representative of intrusion attacks;
wherein the system is operable such that a first list is established including patterns that are potentially matched against the components of the data, wherein the patterns are determined to be potentially matched upon a first identification of a match between a portion of the plurality of components of the data and a portion of a plurality of portions of the patterns; and
wherein the system is operable such that more than one complete match is identified between the components of the data and each of the plurality of portions of the patterns in the first list;
wherein the system is operable such that in response to the identification of a complete match between the plurality of components of the data and one or more of the patterns in the first list, each pattern on the first list for which a complete match is identified is moved from the first list to a second list;
wherein each component of the data is compared against each pattern of the plurality of patterns only once;
wherein the system is operable such that one or more of the patterns are determined to be completely matched to the plurality of components of data upon a match between all of the plurality of components of the data and each of the plurality of portions of the pattern.
17. A computer program product embodied on a non-transitory computer readable medium, comprising:
computer code for receiving a plurality of components of data;
computer code for comparing, utilizing a hardware processor, the components of the data against a plurality of patterns, wherein the patterns are representative of intrusion attacks;
computer code for establishing a first list including patterns that are potentially matched against the components of the data, wherein the patterns are determined to be potentially matched upon a first identification of a match between a portion of the plurality of components of the data and a portion of a plurality of portions of the patterns; and
computer code for identifying more than one complete match between the components of the data and each of the plurality of portions of the patterns in the first list;
wherein the computer program product is operable such that in response to the identification of a complete match between the plurality of components of the data and one or more of the patterns in the first list, each pattern on the first list for which a complete match is identified is moved from the first list to a second list;
wherein each component of the data is compared against each pattern of the plurality of patterns only once;
wherein the computer program product is operable such that one or more of the patterns are determined to be completely matched to the plurality of components of data upon a match between all of the plurality of components of the data and each of the plurality of portions of the pattern.
2. The method of
4. The method of
5. The method of
9. The method of
11. The method of
12. The method of
13. The method of
14. The method of
15. The method of
16. The method of
The present invention relates to pattern matching, and more particularly to increasing the efficiency of pattern matching.
Pattern matching is utilized by a wide variety of applications in both security and non-security-related environments. In the realm of security-related applications, pattern matching has been used to combat intrusion attacks. Intrusion attacks on computer networks are a major problem in today's networked computing environment. An intrusion attack occurs when an intruder either breaches a network and/or computer, or at least temporarily has an unwanted influence on it.
A variety of intrusion detection systems (IDSs) have been developed to detect and moreover prevent intrusion attacks. In order to detect intrusion attacks, IDSs typically include an intrusion scanning engine with one or more files known as attack signature files, which contain patterns pertaining to known types of intrusion attacks. Using such attack signature files, IDSs examine packets that pass on a network and attempt to identify the various patterns of known attacks. When an IDS detects characteristics of a known intrusion attack, a system administrator is typically notified along with any other desired response.
IDSs typically require near real-time testing for the presence of thousands of patterns in network packets. Sequential scanning of each network packet for pattern matches is far too slow for achieving desired throughput. Thus, state of the art IDSs either use hardware accelerated pattern matching devices, a costly solution, or utilize string search algorithms, such as Boyer-Moore.
In any case, to further maintain the desired throughput, traditional IDSs utilize pattern matching systems which stop after having detected a first pattern match. When stopped, the scanning is terminated and a desired response to such first pattern match is carried out, without scanning for any remaining untested patterns. Unfortunately, this early termination of the scanning means that, in some situations, fewer than all possible matches (and the associated responses, etc.) are identified.
There is thus a need for overcoming these and/or other problems associated with the prior art.
A pattern matching system, method and computer program product are provided. In use, a plurality of components of data is received, such that each component of the data is compared against a plurality of patterns. To this end, more than one match between the components of the data and the patterns can be identified.
Coupled to the networks 102 are data server computers 104 which are capable of communicating over the networks 102. Also coupled to the networks 102 and the data server computers 104 is a plurality of end user computers 106. Such data server computers 104 and/or end user computers 106 may each include a desktop computer, lap-top computer, hand-held computer, mobile phone, peripheral (e.g. printer, etc.), any component of a computer, and/or any other type of logic. In order to facilitate communication among the networks 102, at least one gateway or router 108 is optionally coupled therebetween.
It should be noted that any of the foregoing network devices in the present network architecture 100, as well as any other unillustrated hardware and/or software, may be equipped with various pattern matching features. For example, the various data server computers 104 and/or end user computers 106 may be equipped with a pattern matching technique for comparing each component of data against a plurality of patterns, such that more than one match is identified between the components of the data and the patterns. More information regarding optional functionality and optional architectural components associated with such feature will now be set forth for illustrative purposes.
The workstation shown in
The workstation may have resident thereon any desired operating system. It will be appreciated that an embodiment may also be implemented on platforms and operating systems other than those mentioned. One embodiment may be written using JAVA, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications.
Of course, the various embodiments set forth herein may be implemented utilizing hardware, software, or any desired combination thereof. For that matter, any type of logic may be utilized which is capable of implementing the various functionality set forth herein.
As shown, a plurality of components of data is received in operation 302. In the context of the present description, the data may include any computer readable information and the components may include any character, word, string, number of bits (e.g. byte), and/or any other portion of the data. Still yet, the data may be received via a network, from another local sub-system, and/or in any other desired manner.
Next, in operation 304, each component of the data is compared against a plurality of patterns. Each pattern may include one or more components, which each include any predetermined character, word, string, token, key, number of bits (e.g. byte), and/or any other portion of potentially received data that may exist, and/or further be of interest. In one embodiment, such pattern may be representative of unwanted activity or even an intrusion attack (e.g. network intrusion, computer intrusion, etc.) in the context of a security system. Of course, it is also contemplated that the patterns may represent non-security-related activity.
To this end, the method 300 may operate such that more than one match is identified between the components of the data and the patterns. Note operation 306. For example, in one embodiment, the aforementioned comparison of operation 304 may be continued subsequent to or in parallel with a first pattern match, so that multiple matches are identified. In another embodiment, the method 300 may even continue until substantially all possible pattern matches have been identified and reported.
By this design, the present method 300 is capable of providing a more comprehensive set of pattern matching results. Further, in an optional embodiment where different pattern matches prompt different responses (e.g. see Table 1 below), the present method 300 may optionally ensure that each of the different responses (i.e. a complete set) is carried out, by not stopping at a single pattern match and associated response.
TABLE 1
Pattern match_1 | Response_1 |
Pattern match_2 | Response_2 |
Pattern match_3 | Response_3 |
In the context of the aforementioned security embodiment, the different pattern matches may each represent different types of intrusions (e.g. system compromise, distributed denial of service attack, Trojan, zombie, worm, etc.), and the responses (e.g. disconnect network, alert administrator, block port, etc.) may be tailored thereto. Again, however, it should be noted that it is also contemplated that such technique may be equally applicable to non-security-related environments.
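The pairing of pattern matches with responses can be sketched as follows. This is only an illustration under stated assumptions: the pattern strings, the response functions, and the name `respond_to_all_matches` are hypothetical and do not appear in the present description, and a naive substring test stands in for the comparison of operation 304. The point shown is that scanning continues past the first match, so that every applicable response is carried out.

```python
from typing import Callable, Dict, List

# Hypothetical responses; the names are illustrative only.
def alert_administrator() -> str:
    return "alert administrator"

def block_port() -> str:
    return "block port"

def disconnect_network() -> str:
    return "disconnect network"

# Hypothetical pattern-to-response table in the spirit of Table 1.
RESPONSES: Dict[str, Callable[[], str]] = {
    "worm": alert_administrator,
    "trojan": block_port,
    "zombie": disconnect_network,
}

def respond_to_all_matches(data: str) -> List[str]:
    """Test every pattern and run each matching response (no early exit)."""
    carried_out = []
    for pattern, response in RESPONSES.items():
        if pattern in data:  # naive substring test, for illustration only
            carried_out.append(response())
    return carried_out

print(respond_to_all_matches("a worm and a trojan"))
```

A system that stopped at the first match would carry out only the first response for this input, whereas the sketch carries out both.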
In one optional embodiment, a particular data structure of patterns and associated technique may be provided for improving an overall efficiency of the pattern matching comparison algorithm, thus further making it feasible to identify more than one (and even substantially all) pattern matches. More illustrative information will now be set forth regarding such optional features with which the foregoing method 300 may or may not be implemented, per the desires of the user. It should be strongly noted that the following information is set forth for illustrative purposes and should not be construed as limiting in any manner. Any of the following features may be optionally incorporated with or without the exclusion of other features described.
As shown, the data structure 400 includes a plurality of indices 402. Further, each index 402 correlates with a list of stored patterns 406, 408. As an option, a character associated with each index 402 is the same as a first character of each of the patterns 406, 408 in the corresponding list. Table 2 illustrates an example of a couple of indices and associated lists of patterns. Of course, such table is set forth for illustrative purposes and should not be construed as limiting in any manner whatsoever.
TABLE 2
Index | Patterns |
O | OM, OU |
U | UY, UT, UXE, UTER |
In use, the data structure 400 may optionally be used when comparing the patterns against data components (e.g. see, for example, operation 304 of
In one embodiment, the patterns in the list may optionally be prioritized based on various factors such as a length of the patterns. For example, patterns of shorter length may be listed prior to patterns of longer length. As will soon become apparent, the data components may be compared against the patterns of specific lists only, for efficiency purposes.
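One possible way to build such an indexed structure is sketched below. It assumes string patterns and single-character components. The function name `build_index` and the use of a Python dictionary are choices made for this example, not details taken from the present description. Each list is ordered so that shorter patterns come first, as one form of the optional prioritization.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def build_index(patterns: Iterable[str]) -> Dict[str, List[str]]:
    """Group patterns by first character; shorter patterns are listed first."""
    index: Dict[str, List[str]] = defaultdict(list)
    for pattern in patterns:
        if pattern:  # skip empty patterns, which have no first component
            index[pattern[0]].append(pattern)
    for pattern_list in index.values():
        pattern_list.sort(key=len)  # stable sort keeps ties in insertion order
    return dict(index)

# The patterns of Table 2, supplied in arbitrary order.
index = build_index(["UTER", "OM", "UY", "UXE", "OU", "UT"])
print(index)
```

With this structure, a data component such as "U" selects only the list of patterns beginning with "U", so patterns in other lists need not be considered for that component.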
As shown, a variable (i) is initialized as zero in operation 502. For reasons that will soon become apparent, it is then determined whether such variable is less than a length of a buffer containing a plurality of components of data. See decision 504. In use, such buffer may be utilized for storing incoming data received via a network or via any other framework.
If, at any time, it is found that the variable is not less than the length of such buffer containing the data components, such provides an indication that all of the components of data in the buffer have been tested, after which the present method 500 may be terminated. Prior to such termination, the following operations are repeated for each component of data, for the purpose of comparing the same against the appropriate patterns.
This is first accomplished by identifying an appropriate list of patterns to test against the present component of data. This is carried out by utilizing the component of data itself as an index into the data structure 400 of
Next, in operation 508, each pattern in the current list is compared with respect to the current data component. In particular, the comparison of operation 508 involves both a first and last component of each pattern in the list identified in operation 506. Further, since the current data component was used to look up the appropriate list of patterns (and thus the first component of each pattern inherently represents a match), the only pattern component that need be compared in operation 508 is the last component. Of course, another component of the pattern (other than the first and last) may be utilized in operation 508.
Thus, in the context of the previous example, it would be found that, while of course the first component of each pattern (“U”) matches the current data component, only the last component of the patterns “UT,” “UXE,” and “UTER” matches the corresponding data component of the string “COMPUTER.” To this end, all of the patterns in the current list, except for the pattern “UY,” would be eligible to be added to the first pending list.
It should be noted, however, that in cases where a pattern consists of only two components, it is not necessary to add such pattern to the first pending list, since the foregoing comparison would establish the same as a complete match or not. Thus, in the context of the present example, the patterns “UXE” and “UTER” would be added to the first pending list, and only the pattern “UT” would be added to a second result list, including only complete matches.
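The first-and-last-component test described above may be sketched as follows, again assuming string data and single-character components. The function name `first_last_check` and its return shape are hypothetical. Candidates are assumed to have been selected already by their first component, so only the last component is compared here.

```python
from typing import List, Tuple

def first_last_check(buffer: str, i: int,
                     candidates: List[str]) -> Tuple[List[str], List[str]]:
    """Split candidates starting at buffer[i] into (pending, complete)."""
    pending: List[str] = []
    complete: List[str] = []
    for pattern in candidates:
        last = i + len(pattern) - 1  # buffer position of the pattern's last component
        if last >= len(buffer) or buffer[last] != pattern[-1]:
            continue  # last component does not match: not eligible
        if len(pattern) <= 2:
            complete.append(pattern)  # first and last cover the whole pattern
        else:
            pending.append(pattern)   # interior components remain to be tested
    return pending, complete

pending, complete = first_last_check("COMPUTER", 4, ["UY", "UT", "UXE", "UTER"])
print(pending, complete)
```

For the string "COMPUTER" and the component "U" at position 4, the sketch places "UXE" and "UTER" in the pending list and "UT" in the result list, while "UY" is discarded.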
To this end, a first pending list may be established by including patterns that are at least potentially matched against the component of the data (i.e. where only a portion of the pattern is matched, etc.). As an option, the first pending list may take the form of a linked list. Further, as mentioned earlier, the patterns in the list may optionally be prioritized based on various factors, such as a length of the patterns, etc. To this end, as is now apparent, the data components may be compared against the patterns of each list based on such prioritization, for efficiency purposes.
Next, in operation 510, all of the patterns in the first pending list may be tested against the current component to determine eligibility to either remain in the first pending list, be removed from the first pending list, or be added to the second result list.
As is now readily apparent, various previously tested patterns would have been added to the first pending list, since at least a portion of the components in such patterns have already been matched. In order to ensure that the appropriate character of a previously stored pattern in the first pending list is compared against the current data component, an index or offset value may be tracked with respect to each pattern in the first pending list. Such index or offset may be altered (e.g. incremented by one, etc.) after each iteration of the method 500 to track a current location in each of the patterns in the first pending list.
Thus, in one possible scenario, a pattern in the first pending list may be included in the second result list if it is determined, based on a match of the current data component, that the pattern is completely matched against the components of the data (i.e. all components of the data and pattern match). In other words, if a last component of a pattern is matched in operation 510, it is stored in the second result list. Of course, since the final component of each pattern would have already been determined to be a match in order for such pattern to be included in the first pending list (note operation 508 above), the aforementioned last component may actually refer to a second-to-final component.
In another possible scenario, a pattern in the first pending list may be removed from the first pending list upon violating a rule. For example, if it is determined that the current data component does not match the current corresponding component of the associated pattern, it may be concluded that such pattern is incapable of a complete match. Thus, such pattern may be removed from the first pending list to avoid unnecessary processing in subsequent iterations of method 500.
In still yet another possible scenario, a pattern in the first pending list may be maintained in the first pending list if it cannot yet be confirmed that the pattern is completely matched against the components of the data. For example, if the current data component is successfully compared to the corresponding component of the pattern (where such pattern component is not the last), the pattern may be maintained in the first pending list for further processing.
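The three scenarios above can be sketched as a single step over the first pending list. This is a minimal sketch under two assumptions: each pending entry is a (pattern, offset) pair whose offset names the next pattern component to test, and the final component of every pending pattern was already verified when the pattern was added. The names `Pending` and `step_pending` are hypothetical.

```python
from typing import List, Tuple

Pending = Tuple[str, int]  # (pattern, offset of the next component to test)

def step_pending(pending: List[Pending],
                 component: str) -> Tuple[List[Pending], List[str]]:
    """Advance every pending pattern by one data component."""
    still_pending: List[Pending] = []
    completed: List[str] = []
    for pattern, offset in pending:
        if offset == len(pattern) - 1:
            # Only the pre-verified final component remains: complete match.
            completed.append(pattern)
        elif pattern[offset] == component:
            still_pending.append((pattern, offset + 1))  # keep and advance
        # Otherwise the pattern cannot match and is dropped.
    return still_pending, completed

print(step_pending([("UXE", 1), ("UTER", 1)], "T"))
print(step_pending([("UTER", 3)], "R"))
```

Because the final component is not compared a second time, this variant is consistent with each data component being compared with a particular pattern only once.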
Finally, the variable (i) is incremented in operation 512 to afford the appropriate number of iterations of operations 504-510, as set forth above.
Table 3 illustrates an example of the various scenarios associated with operation 510 in the specific context of the abovementioned example. Of course, such table is set forth for illustrative purposes and should not be construed as limiting in any manner whatsoever.
TABLE 3
Buffer: COMPUTER
Iteration | First pending list | Second result list |
#1 (“C”) | N/A | N/A |
#2 (“O”) | (empty) | OM |
#3 (“M”) | (empty) | OM |
#4 (“P”) | (empty) | OM |
#5 (“U”) | UXE (index = 1), UTER (index = 1) | OM, UT |
#6 (“T”) | UTER (index = 2) | OM, UT |
#7 (“E”) | UTER (index = 3) | OM, UT |
#8 (“R”) | (empty) | OM, UT, UTER |
By this design, each component of the data is compared with a particular pattern only once, thus allowing for more efficient operation.
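The overall loop of method 500 may be sketched end to end as follows. The sketch assumes string data and an index of the kind shown in Table 2. The function name `scan` is hypothetical. The choice to advance older pending entries before adding new ones and the reporting of each match together with its start position are choices of this example rather than details of the present description.

```python
from typing import Dict, List, Tuple

def scan(buffer: str, index: Dict[str, List[str]]) -> List[Tuple[int, str]]:
    """Return every (start, pattern) complete match found in buffer."""
    pending: List[Tuple[int, str, int]] = []  # (start, pattern, next offset)
    results: List[Tuple[int, str]] = []
    i = 0
    while i < len(buffer):  # decision 504
        component = buffer[i]
        # Operation 510: advance patterns added in earlier iterations.
        survivors = []
        for start, pattern, offset in pending:
            if offset == len(pattern) - 1:
                results.append((start, pattern))  # final component pre-verified
            elif pattern[offset] == component:
                survivors.append((start, pattern, offset + 1))
        pending = survivors
        # Operations 506 and 508: look up the list and test last components.
        for pattern in index.get(component, []):
            last = i + len(pattern) - 1
            if last < len(buffer) and buffer[last] == pattern[-1]:
                if len(pattern) <= 2:
                    results.append((i, pattern))
                else:
                    pending.append((i, pattern, 1))
        i += 1  # operation 512
    return results

table_2 = {"O": ["OM", "OU"], "U": ["UY", "UT", "UXE", "UTER"]}
matches = scan("COMPUTER", table_2)
print(matches)
```

For the buffer "COMPUTER", the sketch reports "OM", "UT", and "UTER", in the same order in which they enter the second result list in Table 3.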
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. For example, any of the network elements may employ any of the desired functionality set forth hereinabove. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Nedbal, Manuel, Mayr, Johannes, Steiner, Thomas C. H.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Oct 13 2005 | NEDBAL, MANUEL | McAfee, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017117 | /0743 | |
Oct 13 2005 | STEINER, THOMAS C H | McAfee, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017117 | /0743 | |
Oct 13 2005 | MAYR, JOHANNES | McAfee, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017117 | /0743 | |
Oct 18 2005 | McAfee, Inc. | (assignment on the face of the patent) | / | |||
Dec 20 2016 | McAfee, Inc | McAfee, LLC | CHANGE OF NAME AND ENTITY CONVERSION | 043665 | /0918 | |
Sep 29 2017 | McAfee, LLC | JPMORGAN CHASE BANK, N A | CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE PATENT 6336186 PREVIOUSLY RECORDED ON REEL 045055 FRAME 786 ASSIGNOR S HEREBY CONFIRMS THE SECURITY INTEREST | 055854 | /0047 | |
Sep 29 2017 | McAfee, LLC | MORGAN STANLEY SENIOR FUNDING, INC | CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE PATENT 6336186 PREVIOUSLY RECORDED ON REEL 045056 FRAME 0676 ASSIGNOR S HEREBY CONFIRMS THE SECURITY INTEREST | 054206 | /0593 | |
Sep 29 2017 | McAfee, LLC | MORGAN STANLEY SENIOR FUNDING, INC | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 045056 | /0676 | |
Sep 29 2017 | McAfee, LLC | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 045055 | /0786 | |
Oct 26 2020 | JPMORGAN CHASE BANK, N A , AS COLLATERAL AGENT | McAfee, LLC | RELEASE OF INTELLECTUAL PROPERTY COLLATERAL - REEL FRAME 045055 0786 | 054238 | /0001 | |
Mar 01 2022 | McAfee, LLC | JPMORGAN CHASE BANK, N A , AS ADMINISTRATIVE AGENT | CORRECTIVE ASSIGNMENT TO CORRECT THE THE PATENT TITLES AND REMOVE DUPLICATES IN THE SCHEDULE PREVIOUSLY RECORDED AT REEL: 059354 FRAME: 0335 ASSIGNOR S HEREBY CONFIRMS THE ASSIGNMENT | 060792 | /0307 | |
Mar 01 2022 | McAfee, LLC | JPMORGAN CHASE BANK, N A , AS ADMINISTRATIVE AGENT AND COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 059354 | /0335 | |
Mar 01 2022 | MORGAN STANLEY SENIOR FUNDING, INC , AS COLLATERAL AGENT | McAfee, LLC | RELEASE OF INTELLECTUAL PROPERTY COLLATERAL - REEL FRAME 045056 0676 | 059354 | /0213 |
Date | Maintenance Fee Events |
Mar 07 2011 | ASPN: Payor Number Assigned. |
Aug 20 2014 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Feb 25 2015 | RMPN: Payer Number De-assigned. |
Feb 26 2015 | ASPN: Payor Number Assigned. |
Sep 04 2018 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Oct 17 2022 | REM: Maintenance Fee Reminder Mailed. |
Apr 03 2023 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Mar 01 2014 | 4 years fee payment window open |
Sep 01 2014 | 6 months grace period start (w surcharge) |
Mar 01 2015 | patent expiry (for year 4) |
Mar 01 2017 | 2 years to revive unintentionally abandoned end. (for year 4) |
Mar 01 2018 | 8 years fee payment window open |
Sep 01 2018 | 6 months grace period start (w surcharge) |
Mar 01 2019 | patent expiry (for year 8) |
Mar 01 2021 | 2 years to revive unintentionally abandoned end. (for year 8) |
Mar 01 2022 | 12 years fee payment window open |
Sep 01 2022 | 6 months grace period start (w surcharge) |
Mar 01 2023 | patent expiry (for year 12) |
Mar 01 2025 | 2 years to revive unintentionally abandoned end. (for year 12) |