IPMES+ is a system that performs incremental pattern matching over audit event streams (provenance graph). It is the successor of the original IPMES.
The core concept of the original IPMES involves decomposing a target behavioral pattern into multiple total-ordered subpatterns (Preprocessing), matching and reordering events (Matching Layer), composing events against these subpatterns (Composition Layer), and then combining subpattern matches into complete instances (Join Layer). IPMES+ retains a similar architecture with key differences:
- Integrate frequency and flow semantics by extending event pattern types and merging the Matching Layer into the Composition Layer for efficient support.
- Enhancing event matching and state management through Shared Entity Filtration, Flow Contraction, and Sibling Entity Sharing Enforcement to reduce search space and state explosion.
- Port the prototype from Java to Rust for better memory control and locality.
The simplified flow chart of IPMES+ is illustrated in the below figure.
The only build dependencies of IPMES+ is rust compiler and cargo, which can be install with the following commands on linux:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# vefify installation
rustc -VGet the latest version of IPMES+ and build it:
git clone https://github.com/XYFC128/IPMES_PLUS.git
cd IPMES_PLUS/
cargo build --releaseThe output binary is located in target/release/ipmes-rust. The first build will take longer due to downloading the dependencies.
IPMES implemented in rust
Usage: ipmes-rust [OPTIONS] <PATTERN_FILE> <DATA_GRAPH>
Arguments:
<PATTERN_FILE> The path to the pattern file in json format, e.g. data/universal_patterns/SP12.json
<DATA_GRAPH> The path to the preprocessed data graph (provenance graph) in csv format
Options:
-w, --window-size <WINDOW_SIZE> Window size (sec) [default: 1800]
-s, --silent Enable silent mode will not print individual pattern matches
-h, --help Print help
-V, --version Print version
./target/release/ipmes-rust -w 1800 data/paper/behavioral_pattern.json data/paper/data_graph.csv-w 1800: Set the time window size to be1800seconds.data/paper/behavioral_pattern.json: An example pattern used in our paper. See the section below for more information.data/paper/data_graph.csv: Input data graph to search for pattern. See the section below for its format.
The program output for the above example is shown below:
Pattern Match: <5.000, 11.000>[(1 -> 3), (3, 5), (4, 6)]
Total number of matches: 1
CPU time elapsed: 0.000108047 secs
Peak memory usage: 8604 kB
The output message contains:
- Zero or multiple lines of Pattern Match: Each entry of
Pattern Matchdenotes a matched instance of the pattern such that they are in the following format:<StartTime, EndTime>[MatchEventList...], where- StartTime: The timestamp of the earliest event of this match instance.
- EndTime: The timestamp of the latest event of this match instance.
- MatchEventList: A comma seperated list of the matched input events, whose index in this array corresponds to the pattern event they are matched to.
- If the corresponding pattern event is a flow event, it will be in the format
(StartEntityID -> EndEntityID). In this example, the pattern event 0 is a flow pattern, and IPMES+ found the flow from entity 1 to entity 3 that matches the pattern event. - If the corresponding pattern is a frequency event, the match id format is
(EventID, ...). The numbers in the parentheses is a list of matched input event IDs for that frequency pattern event. In this example, the pattern event 1 is a frequency event, and input event 3 and 5 both match that frequency event. - If the corresponding pattern is a normal regex pattern, the match id is simply the ID of the matched input event.
- If the corresponding pattern event is a flow event, it will be in the format
- Total number of matches: The number of matched instances of the pattern on the data graph.
- CPU time elapsed: The CPU time spent for pattern matching.
- Peak memory usage: The maximum total system memory usage in kilobytes.
IPMES+ takes 2 files as input: The pattern description file and the data graph file. IPMES+ will search for pattern in the data graph.
The data graph is a streaming provenance graph where each line is an audit event. We use normal file in our experiments, but it can also be an UNIX PIPE to support ture streaming input. An event in provenance graph is associated with 2 entities: the subject that initiates the event and the object affected by the event.
An event is given by a CSV format in each line of the input data graph. The columns in this CSV format are: [start_time, end_time, event_id, event_sig, subject_id, subject_sig, object_id, object_sig], which represents:
start_time: the event start time in seconds. e.g.5.000end_time: the event end time in seconds. e.g.10.000event_id: the numerical event idevent_sig: the event signature stringsubject_id: the subject entity numerical idsubject_sig: the subject signature stringobject_id: the object numerical idobject_sig: the object signature string
See data/paper/data_graph.csv for example.
A pattern describes a subgraph of the data graph by specifying the signature of events and entities of the subgraph. IPMES+ additionally support flow and frequency event pattern to match high-level event patterns. The format of pattern description file is in this JSON scheme:
{
"Version": "0.2.0",
"UseRegex": true,
"Entities": [
{
"ID": 1,
"Signature": "Socket::ip::.*"
},
{
"ID": 2,
"Signature": "Process::name::.*"
},
{
"ID": 3,
"Signature": "File::path::/.*/crontabs/root"
}
],
"Events": [
{
"ID": 1,
"Type": "Flow",
"SubjectID": 1,
"ObjectID": 2,
"Parents": []
},
{
"ID": 2,
"Signature": "read",
"Type": "Frequency",
"Frequency": 2,
"SubjectID": 3,
"ObjectID": 2,
"Parents": [
1
]
},
{
"ID": 3,
"Signature": "write",
"Frequency": 2,
"SubjectID": 2,
"ObjectID": 3,
"Parents": [
1
]
}
]
}Version: the version of the pattern format, the latest version is0.2.0.UseRegex: theSignaturein this pattern is supposed to be treated as regex expressions. We use the regex crate to handle regex expresions, the supported regex syntax can be found here.Entities: an array of Pattern Entity Object.Events: an array of Pattern Event Object.
Pattern Entity Object:
ID: the unique id of this pattern entity.Signaturethe signature of this pattern entity. It will match input entities in data graph with the same signature.
Pattern Event Object:
-
ID: the unique id of this pattern event. -
Type:Default,FrequencyorFlow.-
Default: The default event pattern that matches the input event with the signature specified inSignature. IfUseRegexis set totrue, the signature will be treated as a regex expression to match the signatures of input events in the data graph. Can ignoreTypefor default event pattern. -
Frequency: Similar to the default event pattern except it must be matched$f$ times to count as a frequency pattern match (i.e. there must be at least$f$ events in data graph that matches the signature of this pattern event). The parameter$f$ is specifed by theFrequencyattribute of this event. -
Flow: This pattern event has no signature. It finds a flow from the entity matches its subject pattern to the entity matches its object pattern.
-
-
SubjectID: the subject of this event. If 2 events are arise from the same subject, they share the subject id. -
ObjectID: the object of this event. If 2 events act on the same object, they share the object id. -
Parents: an array of pattern event id. This determins the dependency of a pattern event. The pattern event should be matched after all of its parents are matched.
data/: Example input data for the program. Check data/README.md for more information.docs/: Documentations.scripts/: Helper scripts to carry out expriment and data preprocessing. Check scripts/README.md for more information.src/: Source codes of IPMES+. Document are written as code comments.testcases/: Test data for the experiments and code tests.
- All the files in
data/are licensed under CC BY-NC 4.0. - The license for the source code of IPMES+ is stated in
LICENSE.
- Hong-Wei Li (Research Center for Information Technology Innovation, Academia Sinica, Taiwan) g6_7893000@hotmail.com
- Ping-Ting Liu (Department of Computer Science, National Yang Ming Chiao Tung University, Taiwan) xyfc128@gmail.com
- Bo-Wei Lin (Department of Computer Science, National Yang Ming Chiao Tung University, Taiwan) 0800680274united@gmail.com
- Yennun Huang (Research Center for Information Technology Innovation, Academia Sinica, Taiwan) yennunhuang@citi.sinica.edu.tw
