A method for operating a distributed key-value store includes processing a data set comprised of data records each associated with a unique identifier and having one or more values associated with one or more attributes using a private key provided at a client device, thereby partitioning each of the data records based on the identifier and forming a plurality of encrypted identifier-value pairs for distributed storage across a plurality of server nodes operably connectable to the client device. The method also includes building, at the client device, encrypted indexes based on the type of query; and executing a query protocol in response to receiving a query from the client device so as to identify, using the built encrypted indexes, data distributively stored in the server nodes which matches the query. The invention also provides a related system for operating a distributed key-value store.
18. An apparatus for operating a nosql database with distributed key-value store, comprising a processor and a memory, operably connected with a plurality of servers, the apparatus being configured to:
process a data set comprised of data records, each of the data records having a unique identifier and including a respective value associated with one or more attributes using a private key provided at the client device, wherein the encrypted identifier-value pairs are distributively stored at the servers based on a consistent hashing ring maintained at the client device and which indicates a range of identifiers associated with each of the servers;
partition the data records based on the identifier and form a plurality of encrypted identifier-value pairs for distributed storage across a plurality of servers operably connected to the client device; and
build encrypted indexes for the encrypted identifier-value pairs for storage at the servers, wherein each server stores indexes associated with the local encrypted identifier-value pairs and not with encrypted identifier-value pairs in other servers;
wherein the client device is arranged to cooperate with the servers to execute a query protocol in response to receiving a query from the client device so as to identify, using the built encrypted indexes, data distributively stored in the servers and which matches the query,
wherein the client device is arranged to:
generate a token set including a plurality of tokens based on a condition attribute of the query;
transmit the token set to each of the servers for processing at the servers using local indexes associated with each respective server;
decrypt one or more encrypted identifiers of encrypted matched record provided from the servers; and
generate labels for obtaining the matched record.
1. A method for operating a nosql database with distributed key-value store, comprising:
(a) processing a data set comprised of data records, each of the data records having a unique identifier and including a respective value associated with one or more attributes using a private key provided at a client;
(b) partitioning the data records based on the identifier and forming a plurality of encrypted identifier-value pairs for distributed storage across a plurality of servers operably connected to the client device;
(c) storing the encrypted identifier-value pairs at the servers, wherein the distribution of the encrypted identifier-value pairs to the servers is based on a consistent hashing ring maintained at the client device and which indicates a range of identifiers associated with each of the servers;
(d) building, at the client device, encrypted indexes for the encrypted identifier-value pairs;
(e) storing, at the servers, the respective built encrypted indexes, wherein each server stores local indexes associated with the local encrypted identifier-value pairs and not with encrypted identifier-value pairs in other servers;
(f) executing a query protocol in response to receiving a query from the client device so as to identify, using the built encrypted indexes, data distributively stored in the servers and which matches the query, wherein the execution comprises:
generating, at the client device, a token set including a plurality of tokens based on a condition attribute of the query;
transmitting the token set to each of the servers;
processing the tokens at each of the servers using local indexes associated with each respective server;
providing one or more encrypted identifiers of encrypted matched record to the client device;
decrypting, at the client device, the one or more encrypted identifiers; and
generating, at the client device, labels for obtaining the matched record.
15. A system for operating a nosql database with distributed key-value store, comprising:
a client device with a processor and a memory, configured to:
process a data set comprised of data records, each of the data records having a unique identifier and including a respective value associated with one or more attributes using a private key provided at the client device;
partition the data records based on the identifier and form a plurality of encrypted identifier-value pairs for distributed storage across a plurality of servers operably connected to the client device; and
build encrypted indexes for the encrypted identifier-value pairs;
a plurality of servers each having a processor and a memory, configured to:
store the encrypted identifier-value pairs, wherein the encrypted identifier-value pairs are distributively stored at the servers based on a consistent hashing ring maintained at the client device and which indicates a range of identifiers associated with each of the servers;
store the respective built encrypted indexes; wherein each server stores indexes associated with the local encrypted identifier-value pairs and not with encrypted identifier-value pairs in other servers;
wherein the client device and the servers are arranged to execute a query protocol in response to receiving a query from the client device so as to identify, using the built encrypted indexes, data distributively stored in the servers and which matches the query,
wherein the client device is arranged to:
generate a token set including a plurality of tokens based on a condition attribute of the query;
transmit the token set to each of the servers;
decrypt one or more encrypted identifiers of encrypted matched record provided from the servers; and
generate labels for obtaining the matched record; and
wherein each of the servers is arranged to:
process the tokens using local indexes associated with each respective server; and provide one or more encrypted identifiers of encrypted matched record to the client device.
2. The method of
3. The method of
4. The method of
7. The method of
8. The method of
9. The method of
the token set further includes a token containing encrypted order information.
10. The method of
11. The method of
12. The method of
13. The method of
16. The system of
The present invention relates to a method for operating a distributed key-value store and particularly, although not exclusively, to a method for operating an encrypted key-value store with rich queries.
A new group of distributed storage systems known as NoSQL data stores has rapidly emerged in the past decade for handling data in large-scale applications such as online gaming and product recommendations. Among the various distributed storage systems, key-value (KV) stores are one of the most popular types of distributed data store, due to their performance, scalability, and fault tolerance. Exemplary KV store systems include Bigtable, Redis, DynamoDB, and RAMCloud. Recent advances in KV stores have made it possible to use secondary indexes to enrich their features, i.e., to support multiple data models and enable rich queries via attributes other than the primary key.
Against the backdrop of these advancements, and with frequent incidents of massive data breaches, privacy concerns are becoming increasingly serious with large volumes of data stored in distributed KV stores such as in public cloud or private data warehouses. Indeed, these distributed KV stores do not provide a strong protection for data confidentiality. Conventional security mechanisms for KV stores mainly rely on access control that specifies the access scope at user or group levels, or transparent server-side encryption that asks the servers (not the data owners) to encrypt data. These mechanisms cannot provide full protection against serious threats of data theft.
There is a need to provide a method and related system that can provide a more secure and efficient key-value (KV) store.
In accordance with a first aspect of the present invention, there is provided a method for operating a distributed key-value store, comprising: processing a data set comprised of data records each associated with a unique identifier and having one or more values associated with one or more attributes using a private key provided at a client device, thereby partitioning each of the data records based on the identifier and forming a plurality of encrypted identifier-value pairs for distributed storage across a plurality of server nodes operably connectable to the client device; building, at the client device, encrypted indexes based on the type of query; and executing a query protocol in response to receiving a query from the client device so as to identify, using the built encrypted indexes, data distributively stored in the server nodes which matches the query.
In one embodiment of the first aspect, the encrypted indexes are exact-match indexes which index the identifiers which respectively match the same value for a corresponding attribute.
In one embodiment of the first aspect, step (b) comprises tracking the values and recording a count for each of the values on the server nodes.
In one embodiment of the first aspect, step (b) utilizes a searchable symmetric encryption method.
In one embodiment of the first aspect, the method further comprises storing the encrypted indexes at the client device.
In one embodiment of the first aspect, the method further comprises storing the encrypted indexes in the plurality of server nodes.
In one embodiment of the first aspect, step (c) comprises: generating, at the client device, a token set with a plurality of tokens based on a condition attribute of the query; transmitting the token set to each of the plurality of nodes; processing the tokens at each of the plurality of nodes using local indexes associated with the respective node; providing one or more encrypted identifiers of encrypted matched records to the client device; decrypting, at the client device, the one or more encrypted identifiers; and generating, at the client device, labels for obtaining the matched values.
In one embodiment of the first aspect, the encrypted indexes are range-match indexes.
In one embodiment of the first aspect, step (b) comprises tracking the values, recording a count for each of the values on the server nodes, and tracking order information of the values.
In one embodiment of the first aspect, the order information is randomized.
In one embodiment of the first aspect, step (c) comprises: generating, at the client device, a token set having a plurality of tokens based on a condition attribute of the query and one token containing encrypted order information; transmitting the token set to each of the plurality of nodes; processing the tokens at each of the plurality of nodes using local indexes associated with the respective node; providing one or more encrypted identifiers of encrypted matched records to the client device; decrypting, at the client device, the one or more encrypted identifiers; and generating, at the client device, labels for obtaining the matched value.
In one embodiment of the first aspect, the identifiers are stored in the form of ciphertext at the client.
In one embodiment of the first aspect, the identifiers are stored in the form of ciphertext in the plurality of server nodes.
In one embodiment of the first aspect, the method further comprises inserting dummy identifier-value pairs into the plurality of nodes.
In one embodiment of the first aspect, the method further comprises maintaining, at the client, a hashing ring which indicates the range of identifiers associated with each of the plurality of nodes.
In one embodiment of the first aspect, step (c) is performed in at least two batches for the data distributively stored in the server nodes.
In one embodiment of the first aspect, the server nodes are arranged in a cloud computing network.
In accordance with a second aspect of the present invention, there is provided a system for operating a distributed key-value store, comprising: means for processing a data set comprised of data records each associated with a unique identifier and having one or more values associated with one or more attributes using a private key provided at a client device, thereby partitioning each of the data records based on the identifier and forming a plurality of encrypted identifier-value pairs for distributed storage across a plurality of server nodes operably connectable to the client device; means for building, at the client device, encrypted indexes based on the type of query; and means for executing a query protocol in response to receiving a query from the client device so as to identify, using the built encrypted indexes, data distributively stored in the server nodes which matches the query.
In one embodiment of the second aspect, the encrypted indexes are exact-match indexes which index the identifiers which respectively match the same value for a corresponding attribute.
In one embodiment of the second aspect, the encrypted indexes are range-match indexes.
Embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings in which:
Referring to
Overview
System Architecture
In one embodiment, to insert a data record to EncKV, the client 102 has to utilize its private, preferably secured, key to generate encrypted label-value (LV) pair(s). If the data is formatted in rich data models other than the simple key-value model, then it will randomly be mapped into a set of encrypted LV pairs. This treatment allows EncKV in some embodiments of the invention to use a standard data partition algorithm (i.e., consistent hashing) for distributing encrypted pairs across the nodes.
For the purpose of building local indexes, EncKV in one embodiment may require the client 102 to maintain a small-sized consistent hashing ring which indicates the label range associated with each node. With the hashing ring, the client 102 can then directly insert LV pairs to targeted nodes 104A-104F. To retrieve data records via record identifiers, the client 102 generates corresponding labels for the targeted nodes 104A-104F.
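The routing role of the client-side hashing ring can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `HashRing`, the use of SHA-256 to place points on the ring, and the virtual-node count are all introduced here for the example.

```python
import bisect
import hashlib


def _ring_position(data: bytes) -> int:
    """Map bytes to a position on the ring (assumption: first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class HashRing:
    """Small client-side ring: each node owns the ranges ending at its points."""

    def __init__(self, nodes, vnodes=64):
        points = []
        for node in nodes:
            for v in range(vnodes):
                points.append((_ring_position(f"{node}#{v}".encode()), node))
        points.sort()
        self._positions = [pos for pos, _ in points]
        self._owners = [owner for _, owner in points]

    def route(self, identifier: bytes) -> str:
        """Return the node whose range covers the identifier (wraps around the ring)."""
        idx = bisect.bisect_right(self._positions, _ring_position(identifier))
        return self._owners[idx % len(self._owners)]
```

With such a ring, `HashRing(["104A", "104B"]).route(b"record-1")` always returns the same node for the same identifier, which is what lets the client build an index entry for the node that will actually hold the record.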
To submit a secure query via secondary attributes of data, the client 102 first generates a token set from the query condition attribute, and then broadcasts the tokens to each of the nodes 104A-104F, respectively. Afterwards, each of the nodes 104A-104F processes the tokens on its local index, preferably simultaneously, and returns the encrypted record identifiers of matched records. Finally, the client 102 decrypts the record identifiers and generates labels to be transmitted to the corresponding nodes 104A-104F for fetching the encrypted result values.
In a preferred example, EncKV can allow dummy records to be inserted to mitigate inference attacks and leakage-abuse attacks. In a preferred embodiment, the query protocols of EncKV require two rounds of interaction. The first is to obtain the encrypted record identifiers, and the second is to fetch the matched results. This treatment facilitates an immediate security improvement to hide the associations between data values on different attributes. Again, in the present embodiment, the index framework of EncKV requires the client 102 to generate query tokens for all the nodes 104A-104F, and each of the nodes 104A-104F at least produces partial results.
Assumptions
In the present embodiment of EncKV, it is assumed that the client 102 is secure and trusted, i.e., it will not expose the keys to server nodes 104A-104F, and the keys are securely stored at the client 102. One embodiment of EncKV also assumes that the attackers will never have access to private keys of the client 102, although they can dump all the encrypted indexes and data records from server nodes 104A-104F. They can also monitor the query protocols and learn about the query tokens, accessed index entries, and encrypted result values. In one example, EncKV does not consider the case where attackers can access the background information about the queries and datasets, e.g., the partial (or entire) distribution or the content of queries or records. EncKV in the present embodiment also preferably does not consider the case where malicious attackers intentionally modify or delete the indexes and records.
Cryptographic Primitives
An embodiment of a symmetric encryption scheme (KGen, Enc, Dec) contains three algorithms: the key generation algorithm KGen takes a security parameter k and returns a secret key K; the encryption algorithm Enc takes a key K and a value v ∈ {0,1}* and returns a ciphertext v* ∈ {0,1}*; the decryption algorithm Dec takes K and v* and returns v. A family of functions F: 𝒦 × X → ℛ, with key space 𝒦, domain X, and range ℛ, is pseudo-random if, for all probabilistic polynomial-time distinguishers Y, |Pr[Y^{F(k,·)} = 1 | k ← 𝒦] − Pr[Y^{g} = 1 | g ← {Func: X → ℛ}]| < negl(k), where negl(k) is a negligible function in k.
The ENCKV Design
The Underlying Encrypted KV Store
EncKV, in one embodiment of the present invention, is built based on an encrypted KV store illustrated in X. Yuan et al. Building an encrypted, distributed, and searchable key-value store. In Proc. ACM AsiaCCS, 2016. This prior design by the same inventor of the present invention has two main features. First, it proposes a secure data partition algorithm that dispatches encrypted data records across distributed nodes, while preserving horizontal scalability and fault tolerance. Second, it sketches an encrypted local index framework towards efficient queries via secondary attributes of data in distributed data stores.
The index designs of EncKV in the present invention are improved based on this framework for the practical performance of secure rich queries. Before introducing EncKV of the present embodiment in greater detail, a summary of the underlying encrypted KV store proposed in X. Yuan et al. Building an encrypted, distributed, and searchable key-value store. In Proc. ACM AsiaCCS, 2016 is provided.
As an example,
P(kl,C∥R),Enc(kv,v)
kl, kv are private keys, P is a secure pseudo random function (PRF), R is the record identifier, C is a column (secondary) attribute, v is a value on C, and Enc is a symmetric key encryption algorithm.
In X. Yuan et al. Building an encrypted, distributed, and searchable key-value store. In Proc. ACM AsiaCCS, 2016, P(kl,C∥R) is used as the label for partition. However, EncKV in the present embodiment uses the unique record identifier R instead to preserve the locality for queries via multiple attributes. As a result of this improvement, all the encrypted values for a given record are stored at the same node, and yet they are still fully scattered for protecting the schema and associations between the underlying values. In the present embodiment the record identifiers can be stored at either the client or server nodes in ciphertexts for system scaling.
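The mapping of one record into encrypted label-value pairs can be sketched as below. This is an illustration only: HMAC-SHA256 stands in for the PRF P, and `toy_enc`/`toy_dec` are a hypothetical HMAC-based keystream used purely so the example runs with the standard library; they are not a vetted cipher and are not the Enc of the embodiment. The function name `record_to_lv_pairs` is likewise an assumption.

```python
import hashlib
import hmac
import os


def prf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed parts (stand-in for the PRF P)."""
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    block = 0
    while len(out) < length:
        out += prf(key, nonce, block.to_bytes(4, "big"))
        block += 1
    return out[:length]


def toy_enc(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized encryption for illustration only (NOT secure for real use)."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_dec(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def record_to_lv_pairs(k_l: bytes, k_v: bytes, record_id: bytes, record: dict) -> dict:
    """One pair per attribute: label = P(k_l, C || R), value = Enc(k_v, v)."""
    return {
        prf(k_l, column.encode(), record_id): toy_enc(k_v, value.encode())
        for column, value in record.items()
    }
```

Because the node is chosen from R rather than from the label, every pair produced for one record would be sent to the same node, while the labels themselves reveal neither the column nor the record they belong to.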
Regarding the encrypted local index framework, the client in the present embodiment is required to maintain a consistent hashing ring so that it can trace the locations of values and build encrypted indexes that index the values stored on the same node. The benefits of maintaining such a hashing ring are two-fold: (1) Inter-node interaction can be avoided during the query process. Otherwise, if generic primitives are adopted, additional dedicated index nodes are needed to store the encrypted global indexes. (2) The nodes can process the queries in parallel, i.e., at the same time. Otherwise, one needs to add more index nodes and specifically design concurrent algorithms to improve the query latency and throughput in global indexes.
Exact-Match Index and Query Protocol
—Encrypted Index Design
Algorithm 1 Buildext: build exact-match indexes
Input: Private key ke; secure PRFs {G1, G2, H1, H2}; values {v1, ..., vm} on attribute C.
Output: Encrypted indexes {I1ext, ..., Inext}.
1: Initialize a hash table S to maintain counters;
2: for vj ∈ {v1, ..., vm} do
3:   i ← route(R); // R is vj's ID, i ∈ {1, ..., n} is the node ID
4:   t1 ← G1(ke, C∥vj∥i);
5:   t2 ← G2(ke, C∥vj∥i);
6:   if S.find(i∥j) = ⊥ then
7:     cij ← 0;
8:   else
9:     cij ← S.find(i∥j);
10:   end if
11:   α ← H1(t1, cij);
12:   β ← H2(t2, cij) ⊕ Enc(kR, R);
13:   cij++;
14:   S.put(i∥j, cij);
15:   Iiext.put(α, β);
16: end for
In the present embodiment, the construction of encrypted indexes in EncKV for secure exact-match queries is based on the SSE scheme illustrated in D. Cash et al. Dynamic Searchable Encryption in Very Large Databases: Data Structures and Implementation. In Proc. NDSS, 2014. The design in Cash uses KV pairs to index files that match the same keyword, with each file of the same keyword being distinguished by a stateful counter. EncKV in the present embodiment is built upon this idea—it indexes the record identifiers that match the same values on a certain column attribute. To integrate the design into the distributed local index framework, the client in EncKV is requested to track the values and maintain counters for each distinct value on different nodes during the index building procedure.
Algorithm 1 shows the detailed algorithm to index values {v1, . . . , vm} for a given column attribute C. The illustrated algorithm is preferably executed at the client. In the present embodiment, for each vj with j from 1 to m, n counters are first initialized, where n corresponds to the number of nodes. The client then finds the target node i for vj based on the position of its record identifier R on the hashing ring. After that, the client generates two (or more) tokens by embedding the value securely via secure PRFs, i.e., t1=G1(ke,C∥vj∥i) and t2=G2(ke,C∥vj∥i). The client further uses the corresponding counter to generate the encrypted index entry, i.e., α=H1(t1,cij), β=H2(t2,cij)⊕Enc(kR,R), where R is securely indexed.
In the present embodiment, advantageously, only the index size is known to the nodes. In other words, without querying, no other information about the underlying content can be learned. In one embodiment, the counters are not used in subsequent query protocols, and so they can be dropped if no further records will be added. To support incremental index updates, they can be stored either at the client or at the nodes in encrypted form. Regarding performance, in the present embodiment, the query time is linear in the number of result values.
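The index-building loop of Algorithm 1 can be sketched as follows. This is a hedged illustration: HMAC-SHA256 with domain-separation tags stands in for G1, G2, H1 and H2, counters are kept per (node, value) pair, and `route` and `enc_id` are caller-supplied placeholders for the hashing-ring lookup and for Enc(kR, R). The helper assumes `enc_id` returns exactly 32 bytes so that it can be masked by one PRF output; all of these choices are assumptions made for the example.

```python
import hashlib
import hmac
from collections import defaultdict


def prf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed parts (stand-in for G1, G2, H1, H2)."""
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def build_exact_index(k_e, column, rows, route, enc_id, n_nodes):
    """rows: iterable of (record_id, value); returns one dict index per node."""
    indexes = [dict() for _ in range(n_nodes)]
    counters = defaultdict(int)  # stateful counter per (node, value)
    for record_id, value in rows:
        i = route(record_id)  # node that stores this record
        node = i.to_bytes(4, "big")
        t1 = prf(k_e, b"G1", column, value, node)
        t2 = prf(k_e, b"G2", column, value, node)
        c = counters[(i, value)].to_bytes(8, "big")
        ciphertext = enc_id(record_id)
        assert len(ciphertext) == 32, "sketch assumes 32-byte encrypted identifiers"
        alpha = prf(t1, b"H1", c)
        beta = xor(prf(t2, b"H2", c), ciphertext)
        counters[(i, value)] += 1
        indexes[i][alpha] = beta
    return indexes
```

Each node receives only pseudo-random `alpha` keys and masked `beta` values, so before any query the only thing it can observe is how many entries its index holds.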
—Secure Query Protocol
Algorithm 2 Queryext: secure exact-match query protocol
Input: Private key ke; query condition value v; query condition attribute Cv; result value attribute Cr.
Output: Encrypted matched results {vr}.
Client.Token
1: for i ∈ {1, ..., n} do
2:   t1 ← G1(ke, Cv∥v∥i);
3:   t2 ← G2(ke, Cv∥v∥i);
4:   Send (t1, t2) to node i;
5: end for
Nodei.ExtQuery
1: ci ← 0;
2: α ← H1(t1, ci);
3: while Iiext.find(α) ≠ ⊥ do
4:   β ← Iiext.find(α);
5:   r ← H2(t2, ci) ⊕ β;
6:   ci++; α ← H1(t1, ci);
7:   Return r to client for decryption
Client
8:   R ← Dec(kR, r);
9:   l ← P(kl, Cr∥R);
10:   Fetch vr via l;
11: end while
12: // Note: in the implementation, all matched {r} are sent back in a batch, and {vr} are fetched in a batch.
The corresponding query protocol in the present embodiment is executed between the client and the nodes, as presented in Algorithm 2, following the index construction. In one embodiment, given a query via two attributes, the client may find all the values {vr} in attribute Cr on the matching condition that the value of another attribute Cv is the same as value v. First, the client generates query tokens {t1,t2} for each node i, where t1=G1(ke,Cv∥v∥i) and t2=G2(ke,Cv∥v∥i). Preferably, the nodes process these tokens in parallel. In particular, each node increments a counter ci to locate all the matched index entries via H1(t1,ci) (until no entry is returned), and each entry is unmasked by XORing it with H2(t2,ci) to obtain r, the encrypted record identifier. After that, all matched {r} are sent back to the client for decryption. For each decrypted identifier R, the client generates the corresponding label via P(kl,Cr∥R) for fetching the encrypted result value.
In the present embodiment, data values and attributes are strongly protected during the query procedure. In particular, each node only learns the query tokens, accessed index entries, and encrypted result values. Due to the deterministic property of tokens, it also learns of repeated queries on the same attribute. The query protocol in the present embodiment requires two rounds of interaction between the client and each node. This arrangement is advantageous in that it can minimize the leakage of queries. Also, each node only learns the matched values associated with the same column, and it will not learn the associations between values in different column attributes, thereby effectively addressing inference attacks. Formal security analysis will be provided below.
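The two rounds of the exact-match protocol can be sketched as follows. As before, this is an illustration under assumptions rather than the embodiment itself: HMAC-SHA256 with domain-separation tags stands in for G1, G2, H1, H2 and P, a node's local index is modelled as a Python dict, and the function names `client_tokens`, `node_query` and `fetch_label` are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed parts (stand-in for the PRFs)."""
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def client_tokens(k_e, cond_column, value, n_nodes):
    """Round 1, client side: one (t1, t2) pair per node."""
    tokens = []
    for i in range(n_nodes):
        node = i.to_bytes(4, "big")
        tokens.append((prf(k_e, b"G1", cond_column, value, node),
                       prf(k_e, b"G2", cond_column, value, node)))
    return tokens


def node_query(index: dict, t1: bytes, t2: bytes) -> list:
    """Round 1, node side: walk the counter until no entry is found."""
    results = []
    c = 0
    while True:
        counter = c.to_bytes(8, "big")
        beta = index.get(prf(t1, b"H1", counter))
        if beta is None:
            break
        results.append(xor(prf(t2, b"H2", counter), beta))  # encrypted record id
        c += 1
    return results


def fetch_label(k_l: bytes, result_column: bytes, record_id: bytes) -> bytes:
    """Round 2, client side: label P(k_l, Cr || R) for a decrypted identifier R."""
    return prf(k_l, result_column, record_id)
```

The node stops at the first missing counter value, so its work is proportional to the number of matches it holds; it never sees the label computation of the second round, which happens at the client after decryption.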
Range-Match Index and Query Protocol
—Encrypted Index Design
Algorithm 3 Buildrng: build range-match indexes
Input: Private keys kr, ko; secure PRFs {G1, G2, H1, H3}; values {v1, ..., vm} on attribute C.
Output: Encrypted indexes {I1rng, ..., Inrng}.
1: Initialize a hash table S to maintain counters;
2: for vj ∈ {v1, ..., vm} do
3:   i ← route(R); // R is vj's ID, i ∈ {1, ..., n} is the node ID
4:   t1 ← G1(kr, C∥i);
5:   t2 ← G2(kr, C∥i);
6:   if S.find(i) = ⊥ then
7:     ci ← 0;
8:   else
9:     ci ← S.find(i);
10:   end if
11:   α ← H1(t1, ci);
12:   ctR ← OREenc(ko, vj, ci); // shown in Algorithm 4
13:   β ← H3(t2, ci) ⊕ (ctR∥Enc(kR, R));
14:   ci++;
15:   S.put(i, ci);
16:   Iirng.put(α, β);
17: end for
The construction of encrypted range-match indexes in the present embodiment follows the same treatment as the encrypted exact-match indexes discussed above. For security reasons, each index entry has to be strongly encrypted, and the information on which entries are associated with the same column attribute should also be hidden before querying. This objective is achieved in this example through searchable encryption techniques. For index and data locality, the client in EncKV is required to track the locations of data values. Algorithm 3 illustrates the index building procedure in one embodiment of the present invention. For each value vj on a column attribute C, the client first locates the node in which the record is stored. The client then generates the encrypted index entry (α, β) by securely embedding C and the counter ci. It should be noted that the underlying content of β also contains the ORE ciphertext ctR, which is computed from an enhanced ORE scheme to be introduced in the next section. As a result, the encrypted range-match index in the present embodiment is integrated into the local index framework of EncKV.
Enhanced ORE scheme: The basic concept of the ORE scheme proposed in K. Lewi and D. J. Wu. Order-Revealing Encryption: New Constructions, Applications, and Lower Bounds. In Proc. ACM CCS, 2016 is to split a message into bit blocks of equal length, and then compare two messages block by block, starting from their most significant blocks. For example, if the message space is 4 bits and the block size is 2 bits, each message is encrypted into 2 blocks. Specifically, each block has a total of 2^2 = 4 possible values {00, 01, 10, 11}. The message block to be encrypted, say "10", is transformed into 4 sub blocks, where the order information {>, >, =, <} relative to each of the values above is securely embedded with its prefix block. In this example, the order cmp is defined as the output of the comparison CMP(m1,m2) for blocks m1 and m2.
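The block decomposition in the example above can be reproduced on plaintext values with a short sketch. Nothing here is encrypted; the code only illustrates how a block yields one order symbol per possible block value and how two messages are compared from the most significant block. The function names are assumptions introduced for the illustration.

```python
def cmp_symbol(a: int, b: int) -> str:
    """Plain comparison result, playing the role of CMP(a, b)."""
    return ">" if a > b else "<" if a < b else "="


def split_blocks(message: int, total_bits: int, block_bits: int) -> list:
    """Split a message into equal-length blocks, most significant block first."""
    assert total_bits % block_bits == 0
    count = total_bits // block_bits
    mask = (1 << block_bits) - 1
    return [(message >> (block_bits * (count - 1 - i))) & mask for i in range(count)]


def block_orders(block: int, block_bits: int) -> list:
    """Order of one block relative to every possible block value (the sub blocks)."""
    return [cmp_symbol(block, candidate) for candidate in range(1 << block_bits)]


def compare_blockwise(m1: int, m2: int, total_bits: int, block_bits: int) -> str:
    """Compare two messages via the first block in which they differ."""
    blocks1 = split_blocks(m1, total_bits, block_bits)
    blocks2 = split_blocks(m2, total_bits, block_bits)
    for b1, b2 in zip(blocks1, blocks2):
        if b1 != b2:
            return cmp_symbol(b1, b2)
    return "="
```

For the block "10" with a 2-bit block size, `block_orders(0b10, 2)` gives `['>', '>', '=', '<']`, matching the order information listed in the text.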
The inventors of the present invention note that the original ORE scheme may reveal the order information between the query value and each ciphertext on the column. Such leakage is dangerous because it discloses partial order information between ciphertexts, i.e., that some ciphertexts are smaller or greater than others. Even if the order is one-way transformed into a pseudo-random tag, such tags would be sent along with the queries and could be used as frequency information by attackers who know the query distribution. In short, security is compromised in the original ORE scheme.
To minimize the abovementioned leakage, the present embodiment protects order information by embedding it securely via PRF with the column attribute, the block index, and the stateful counter, as shown in Lines 7 and 8 of Algorithm 4. This forms the enhanced ORE encryption algorithm in one embodiment of the present invention. The sub block j in block i is encrypted as Q1(si,j, c) + Q2(F1(k1, v|i−1∥j), γ), where si,j = F3(k3, CMP(j*, v|i)∥C∥j), v|i is the block value, v|i−1 is the prefix block value, C is the column attribute, and c is the counter of this value. j* is the securely permuted j in one of the possible values to this block, where j∈[1,
This improved construction in the present embodiment guarantees that the order in each sub block is different, and the order conditions for different values are also different. Due to the deterministic property of PRF, the query comparison can still correctly be performed via token matching, as illustrated below.
—Secure Query Protocol
Algorithm 4 OREenc: enhanced ORE encryption
Input: Private key ko; secure PRFs {F1, F2, F3}; secure PRP π; value v; counter c;
Output: ORE ciphertext ctR
1: Derive k1, k2, k3 from ko;
2: Generate a nonce γ;
3: for i ∈ {1, ..., b} do
4:   for j ∈ {1, ..., 2^b} do
5:     j* ← π⁻¹(F2(k2, v|i−1), j);
6:     if CMP(j*, v|i) ≠ 0 then
7:       si,j ← F3(k3, CMP(j*, v|i)∥C∥j);
8:       zi,j ← Q1(si,j, c) + Q2(F1(k1, v|i−1∥j), γ);
9:     else
10:       zi,j ← "equal" + Q2(F1(k1, v|i−1∥j), γ);
11:     end if
12:   end for
13:   ctR|i ← zi,1, ..., zi,2^b;
14: end for
15: ctR ← γ, ctR|1, ..., ctR|b;
Algorithm 5 OREcmp: ORE compare operation
Input: ORE query token ctL; ORE ciphertext ctR;
Output: true or false.
1: γ, u′1, ..., u′b ← ctR;
2: u1, ..., ub ← ctL;
3: for i ∈ {1, ..., b} do
4:   xi, ṽ|i, qi ← ui;
5:   zi,1, ..., zi,2^b ← u′i;
6:   si ← zi,v
7:   if si ≠ 0 and si = Q1(qi, c) then
8:     return true; // condition matched
9:   end if
10: end for
11: return false;
Algorithm 6 Queryrng: secure range-match query protocol
Input: Private keys kr, ko; query condition value v; order condition cmp ∈ {>, <}; query condition attribute Cv; result value attribute Cr.
Output: Encrypted matched results {vr}.
Client.Token
1: for i ∈ {1, ..., n} do
2:   t1 ← G1(kr, Cv∥i);
3:   t2 ← G2(kr, Cv∥i);
4:   for i ∈ {1, ..., b} do
5:     ṽ|i ← π(F2(k2, v|i−1), v|i);
6:     qi ← F3(k3, cmp∥C∥ṽ|i);
7:     ui ← (F1(k1, v|i−1∥ṽ|i), ṽ|i, qi);
8:   end for
9:   ctL ← (u1, ..., ub);
10:   Send (t1, t2, ctL) to node i;
11: end for
Nodei.RngQuery
1: ci ← 0;
2: α ← H1(t1, ci);
3: while Iirng.find(α) ≠ ⊥ do
4:   β ← Iirng.find(α);
5:   r ← H3(t2, ci) ⊕ β;
6:   Parse r as rx ← Enc(kR, R), ry ← ctR;
7:   ci++; α ← H1(t1, ci);
8:   // ORE compare operation shown in Algorithm 5
9:   if OREcmp(ctL, ctR) = true then
10:     Return rx to the client;
11:   end if
12: end while
13: // Note: the steps to fetch the final results are omitted; they are the same as Lines 7 to 10 in Algorithm 2.
Based on the index construction, the range match query protocol in one embodiment of the present invention is presented in details in Algorithm 6. In one example, given a query via two attributes, the client wants to find all values {vr} in attribute Cr on the matching condition such that the value of another attribute Cv should be smaller than value v. Similar to the exact-match query in one embodiment of the invention, the client generates query tokens for each node {t1, t2} from Cv. For ORE comparison, the client needs to compute another token ctL which contains the encrypted blocks (u1, . . . , ub) with distinct encrypted order condition qi of each block. Preferably, each node processes {t1, t2} in parallel, i.e., unmasking the corresponding ORE index entries via incremental counters. Afterwards, each node calls the ORE compare operation OREcmp to compare ctL and ctR in each entry above, as presented in Algorithm 5. The process is conducted from the most significant block. Symmetric to the block encryption, the encrypted order is obtained via si=zi,v
It should be noted that the design in the present embodiment reveals the equality of the query value and the ciphertexts, and indicates the position of the first block in which two values differ. This leakage is the same as that of the adopted ORE scheme in K. Lewi and D. J. Wu. Order-Revealing Encryption: New Constructions, Applications, and Lower Bounds. In Proc. ACM CCS, 2016.
The query time complexity in the current treatment is O(mC), where mC is the number of values on attribute C at a certain node. The performance can be further improved by sorting the values before encryption. The client then notifies the node of mC so that the node can perform a binary search.
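The binary-search improvement can be illustrated with a minimal sketch. It assumes that the node holds the mC entries of one attribute in ascending order and has access to a comparison oracle. Here a plaintext comparison stands in for the ORE compare operation, and the names `first_greater` and `is_greater` are hypothetical.

```python
from typing import Callable, List, Tuple


def first_greater(entries: List[int],
                  is_greater: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (index of the first entry satisfying is_greater, comparisons used).

    Assumes entries are sorted ascending, so is_greater is False on a prefix
    of the list and True on the remaining suffix.
    """
    lo, hi, comparisons = 0, len(entries), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if is_greater(entries[mid]):   # stand-in for OREcmp(ctL, ctR)
            hi = mid
        else:
            lo = mid + 1
    return lo, comparisons


# Plaintext illustration of "SELECT ... WHERE age > 20" over 8 sorted values.
ages = sorted([18, 19, 20, 21, 25, 30, 42, 57])
idx, used = first_greater(ages, lambda x: x > 20)
matches = ages[idx:]
```

With 8 sorted values, the boundary is located in 3 comparisons rather than 8, which is the O(log mC) behaviour that sorting is meant to enable.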
Secure Rich Query Instantiation
The encrypted indexes in EncKV of the present embodiment readily enable the rich queries supported in existing NoSQL data stores. These stores implement SQL-like query languages for easy data management. SQL-like query examples are used below to show how EncKV in the embodiment of the present invention supports rich queries.
Keyword Search, Equality and Count queries: Given a keyword search or equality query “SELECT name WHERE city=LA” in
Range and Like queries: In terms of range-match queries such as the example “SELECT name WHERE age>20” in
Preferably, EncKV also supports the LIKE (i.e., prefix) query, which is a common query operation. For instance, the query “SELECT City WHERE City LIKE ‘A%’” obtains answers such as “Argentina” or “Australia”. The adopted ORE scheme in one embodiment of the present invention supports comparison of both numeric values and alphanumeric strings. Recall that each ORE ciphertext is encrypted block by block, and the content of the preceding blocks is also embedded in the current block ciphertext for prefix matching. During the comparison, the first block in which ctL and ctR differ indicates that all preceding blocks are identical.
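The prefix-matching idea can be sketched on plaintext blocks. This is an illustration only, not the ORE scheme itself: it assumes 8-bit blocks (one byte per block) and zero-padding of the shorter input, and the helper names are hypothetical. A value matches a prefix when the first differing block lies at or beyond the end of the prefix.

```python
from typing import Optional


def first_diff_block(a: bytes, b: bytes) -> Optional[int]:
    """Index of the first block (byte) in which a and b differ, or None if equal.

    The shorter input is padded with zero bytes (an assumption of this sketch).
    """
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    for i in range(n):
        if a[i] != b[i]:
            return i
    return None


def matches_prefix(value: str, prefix: str) -> bool:
    """True if all blocks covered by the prefix are identical in value."""
    p = prefix.encode()
    d = first_diff_block(value.encode(), p)
    return d is None or d >= len(p)


answers = [c for c in ["Argentina", "Brazil", "Australia"] if matches_prefix(c, "A")]
```

In the encrypted setting the node would learn the position of the first differing block from the ORE comparison instead of from the plaintext bytes.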
Join Queries: EncKV in one embodiment of the present invention supports Join queries such that any attributes of two tables can be joined together. First, define a generic join query statement, FROM T1 JOIN T2 ON T1.C1=T2.C2 WHERE field, where T1 and T2 are two tables being joined, C1 and C2 are attributes of T1 and T2, and field is a join condition such as an exact-match or range match operation. For example, given a query “SELECT T2.GPA from T1 JOIN T2 on T1.id=T2.id where T1.age>20”, the client first parses the query and performs “SELECT T1.id from T1 WHERE T1.age>20” via the range-match indexes, which derives the matched record identifier set R. Then, the client generates label P(kl,T2.GPA∥Ri)R
Sum and Average queries: Following the treatment in prior encrypted databases, nodes in EncKV can perform aggregation on encrypted data values by using an additive homomorphic encryption (HOM) scheme as illustrated in, for example, R. A. Popa et al. CryptDB: protecting confidentiality with encrypted query processing. In Proc. ACM SOSP, 2011. Values to be aggregated are encrypted via a certain HOM encryption scheme, i.e., (P(kl,C∥R∥HOM), HEnc(kv,v)). When the client issues a query in the form “SELECT SUM(score) FROM Score WHERE age>20”, it first queries the matched record identifier set R via the range-match indexes from “SELECT stu_ID FROM Score WHERE age>20”. Then each node locates and aggregates the HOM ciphertext via label {P(kl,score∥Rs∥HOM)}R
Group By queries: In one embodiment, EncKV performs Group By queries by combining exact-match queries with aggregation computation. Suppose the client issues a group-by request such as “SELECT city, sum(age) GROUP BY city”. It first finds the specific record identifier set R for each group such as “LA” via the exact-match query “SELECT stu_ID where city=LA”. Here, it is assumed that the client knows all the distinct city names. Then it can use the HOM label {P(kl,age∥Rj∥HOM)}n
Max and Min queries: To support Max and Min queries, the client inserts a specific LV pair for each of the MAX and MIN data values on a column attribute. Using the maximum value as an example, if the maximum value of the attribute “age” is “100”, the client generates the LV pair (P(kl,age∥MAX), E(kv,100)). When the client wants to query the maximum value, it computes the label P(kl,age∥MAX) and uses it to fetch the maximum value. The same applies to the minimum value.
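A minimal sketch of the MAX label-value pair follows. It assumes HMAC-SHA256 as the PRF P, whereas the prototype described later uses AES. The function `toy_encrypt` is only a placeholder for E, built from the same PRF so that the example runs with the standard library alone; a real deployment would use a vetted cipher. All function names are hypothetical.

```python
import hashlib
import hmac
import os


def prf(key: bytes, message: bytes) -> bytes:
    """PRF stand-in (assumption: HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def label(kl: bytes, *parts: str) -> bytes:
    """Label P(kl, part1 || part2 || ...)."""
    return prf(kl, "||".join(parts).encode())


def toy_encrypt(kv: bytes, plaintext: bytes) -> bytes:
    """Placeholder for E(kv, .): XOR with a PRF-derived pad under a fresh nonce."""
    nonce = os.urandom(16)
    pad = prf(kv, nonce)
    if len(plaintext) > len(pad):
        raise ValueError("toy cipher handles at most 32 bytes")
    return nonce + bytes(p ^ k for p, k in zip(plaintext, pad))


def toy_decrypt(kv: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    pad = prf(kv, nonce)
    return bytes(c ^ k for c, k in zip(body, pad))


kl, kv = os.urandom(32), os.urandom(32)
# The client inserts the LV pair (P(kl, age||MAX), E(kv, 100)).
store = {label(kl, "age", "MAX"): toy_encrypt(kv, b"100")}
# Later, the client recomputes the label and decrypts the fetched value.
max_age = toy_decrypt(kv, store[label(kl, "age", "MAX")])
```

Because the label is deterministic under kl, the client can recompute it at query time without keeping any per-attribute state.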
Batch Queries
In one embodiment, the query protocols of EncKV across different attributes are conducted in two phases. The first phase finds the encrypted record identifiers that match the query condition on a specific query attribute, and the second phase fetches the values of these matched records on a targeted attribute. Such queries allow the nodes to learn the associations (e.g., the schema) between index entries of different attributes and values of the same records.
To reduce the above leakage, in a preferred embodiment the queries are conducted in two rounds of interaction, i.e., in a batched manner. For a batch of queries, the client first parses the query conditions so that overlapping or repeated query conditions are queried only once in the batch. After receiving the encrypted result identifiers, the client decrypts them and eliminates any duplicated identifiers. Then the client generates the labels from the distinct identifiers and the targeted attributes. In addition, the client permutes those labels before fetching the final result values from the corresponding nodes. In one example, such an improvement can be realized via a dedicated query planner. Based on the above treatment, the associations between values and index entries on the same records can be better protected.
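The two-round batch procedure can be sketched as a small client-side planner. This is a hedged illustration, not the planner of the embodiment: `plan_batch`, `run_condition`, and `make_label` are hypothetical names, `run_condition` is assumed to return already-decrypted record identifiers, and `make_label` stands in for the label PRF.

```python
import random
from typing import Callable, Iterable, List, Sequence, Tuple

Condition = Tuple[str, str, str]   # (attribute, operator, value); hypothetical shape
Query = Tuple[Condition, str]      # (condition, targeted attribute)


def plan_batch(
    queries: Sequence[Query],
    run_condition: Callable[[Condition], Iterable[str]],
    make_label: Callable[[str, str], bytes],
) -> Tuple[List[Condition], List[bytes]]:
    # Round 1: each distinct condition is queried only once.
    distinct = list(dict.fromkeys(cond for cond, _ in queries))
    matched = {cond: set(run_condition(cond)) for cond in distinct}
    # Duplicated identifiers collapse because the pairs are kept in a set.
    wanted = {(target, rid) for cond, target in queries for rid in matched[cond]}
    labels = [make_label(target, rid) for target, rid in sorted(wanted)]
    # Round 2: permute the labels before fetching the final values.
    random.SystemRandom().shuffle(labels)
    return distinct, labels


# Toy data standing in for the encrypted indexes.
table = {("age", ">", "20"): ["r1", "r2"], ("city", "=", "LA"): ["r2", "r3"]}
calls: List[Condition] = []


def run_condition(cond: Condition) -> List[str]:
    calls.append(cond)
    return table[cond]


def make_label(attr: str, rid: str) -> bytes:
    return f"{attr}|{rid}".encode()   # placeholder for P(kl, attr || rid)


batch = [(("age", ">", "20"), "name"),
         (("city", "=", "LA"), "name"),
         (("age", ">", "20"), "name")]
conditions, labels = plan_batch(batch, run_condition, make_label)
```

Three queries result in two index lookups and three distinct labels, fetched in a random order.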
Secure Update Operations
In one embodiment, EncKV provides two ways of updating when new data records have to be added, namely, bulk update and incremental update.
Bulk update is suitable for cases in which a large number of data records have to be added, e.g., when migrating an unencrypted database to EncKV. Encrypted exact-match and range-match indexes can be built via their index building functions in Algorithm 1 and Algorithm 3 respectively.
Incremental update is suitable in cases where data records are only occasionally inserted into EncKV. As a result, new index entries need to be added to existing indexes. To implement incremental update, the state information (i.e., counters) on each indexed attribute should carefully be maintained either at the client or at the nodes in encrypted form so that the client can generate the corresponding index entries without affecting the following queries. Once the attributes are queried, the nodes will know whether the newly inserted index entries are associated with those attributes.
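The role of the counter state in incremental update can be sketched as follows. This is a simplified single-node illustration under several assumptions: HMAC-SHA256 plays the role of G1, G2, H1 and H2; counters are kept at the client per attribute value; and each entry masks a fixed 32-byte payload. The class and function names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List, Tuple


def prf(key: bytes, *parts: bytes) -> bytes:
    """PRF stand-in (assumption: HMAC-SHA256 over length-prefixed parts)."""
    msg = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Client:
    def __init__(self, key: bytes) -> None:
        self.key = key
        # State: next unused counter for each indexed (attribute, value).
        self.counters: Dict[Tuple[str, str], int] = defaultdict(int)

    def tokens(self, attribute: str, value: str) -> Tuple[bytes, bytes]:
        scope = f"{attribute}={value}".encode()
        return prf(self.key, b"G1", scope), prf(self.key, b"G2", scope)

    def new_entry(self, attribute: str, value: str, payload: bytes) -> Tuple[bytes, bytes]:
        """Build one index entry (alpha, beta) using the next counter."""
        assert len(payload) == 32
        t1, t2 = self.tokens(attribute, value)
        c = self.counters[(attribute, value)]
        self.counters[(attribute, value)] = c + 1
        cb = c.to_bytes(4, "big")
        return prf(t1, b"H1", cb), xor(prf(t2, b"H2", cb), payload)


def node_query(index: Dict[bytes, bytes], t1: bytes, t2: bytes) -> List[bytes]:
    """Unmask entries with increasing counters until a label is missing."""
    results, c = [], 0
    while True:
        cb = c.to_bytes(4, "big")
        beta = index.get(prf(t1, b"H1", cb))
        if beta is None:
            return results
        results.append(xor(prf(t2, b"H2", cb), beta))
        c += 1


client, index = Client(b"k" * 32), {}
for rid in (b"rec-1", b"rec-2"):                      # initial build
    alpha, beta = client.new_entry("city", "LA", rid.ljust(32, b"\x00"))
    index[alpha] = beta
alpha, beta = client.new_entry("city", "LA", b"rec-3".ljust(32, b"\x00"))
index[alpha] = beta                                    # incremental insert
found = [r.rstrip(b"\x00") for r in node_query(index, *client.tokens("city", "LA"))]
```

Because the inserted entry continues the counter sequence, a later query reaches it without the index being rebuilt. If the counter state were lost, a new entry could overwrite an existing label or be unreachable.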
Security Analysis
Security on Exact-Match Queries
Since the secure exact-match queries of EncKV are realized in the framework of SSE, the nodes learn only the controlled leakage and never learn the underlying contents of queries and results. Basically, the index size is learned once the index is uploaded to the server. The search pattern and the access pattern are learned as queries are issued, where the search pattern indicates repeated queries and the access pattern indicates the accessed ciphertexts. Targeted queries contain multiple query attributes, and thus the access pattern also includes the associations between values across those attributes. Following the notion of SSE, in one example, the leakage functions in EncKV are defined as follows:
$\mathcal{L}_1^{ext}(C) = (\{m_i\}_n, |\alpha|, |\beta|)$
where $C$ is the set of secondary attributes, $m_i$ is the size of the local index $I_i^{ext}$ of node $i$, $n$ is the number of nodes, and $|\alpha|$, $|\beta|$ are the lengths of the label and the value in an index entry.
$\mathcal{L}_2^{ext}(v_C, C_v, C_r) = (\{t_1^i, t_2^i\}_n, \{\{\alpha, \beta, l, v^*\}_{c_i}\}_n)$
where $v_C$ is the query value, $C_v$ is the attribute of $v_C$, $C_r$ is the attribute of the result values, and $\{t_1^i, t_2^i\}_n$ are the tokens for the $n$ nodes respectively. Given a query, the matched index entries and results $\{\alpha, \beta, l, v^*\}_{c_i}$ at each node are known.
$\mathcal{L}_3^{ext}(Q) = (M_{q \times q}, T_{v^* \to \alpha})$
where $Q$ is a set of $q$ adaptive queries, and $M_{q \times q}$ is a symmetric bit matrix that traces repeated queries. $M_{i,j}$ and $M_{j,i}$ are equal to 1 if $t_1^i = t_1^j$ for $i, j \in [1, q]$. Otherwise, they are equal to 0. $T_{v^* \to \alpha}$ is an inverted list that traces the index entries that match each result value, which can also be referred to as inference information. For each posting list $v^* | \{\alpha_1, \ldots, \alpha_a\}$ in $T$, the associations between the index entries of different attributes and the result value are learned.
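The search-pattern matrix in the third leakage function can be illustrated with a short sketch. It assumes that query tokens are deterministic PRF outputs, modelled here with HMAC-SHA256, so that a repeated query produces an identical token that the node can recognise. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import List, Sequence


def token(key: bytes, attribute: str, value: str) -> bytes:
    """Deterministic query token (assumption: HMAC-SHA256 as the PRF)."""
    return hmac.new(key, f"{attribute}||{value}".encode(), hashlib.sha256).digest()


def search_pattern(tokens: Sequence[bytes]) -> List[List[int]]:
    """Symmetric q x q bit matrix: entry (i, j) is 1 iff tokens i and j are equal."""
    q = len(tokens)
    return [[1 if tokens[i] == tokens[j] else 0 for j in range(q)] for i in range(q)]


key = b"\x01" * 32
observed = [token(key, "city", v) for v in ("LA", "NY", "LA")]
M = search_pattern(observed)
```

Here the first and third queries repeat, so the corresponding off-diagonal entries are 1 while the node still does not learn the query value itself.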
In terms of the quantified leakage, the security definition of exact-match queries is presented as:
Definition 1. Let Ext=(KGen, Buildext, Queryext) be the encrypted exact-match index construction in EncKV. Given leakage $\mathcal{L}_1^{ext}$, $\mathcal{L}_2^{ext}$ and $\mathcal{L}_3^{ext}$, a probabilistic polynomial time (PPT) adversary A, and a PPT simulator S, define the following experiments.
RealA(k): The client calls KGen($1^k$) to output a private key K. A selects a dataset D and asks the client to build $\{I_1^{ext}, \ldots, I_n^{ext}\}$ via Buildext. Then A performs a polynomial number q of adaptive queries, and asks the client for tokens and ciphertexts. Finally, A outputs a bit.
IdealA,S(k): A selects D. S generates $\{I'^{ext}_1, \ldots, I'^{ext}_n\}$ for A based on $\mathcal{L}_1^{ext}$. A performs a polynomial number q of adaptive queries. From $\mathcal{L}_2^{ext}$ and $\mathcal{L}_3^{ext}$, S returns the simulated ciphertexts and tokens. Finally, A outputs a bit.
Ext is adaptively secure with ($\mathcal{L}_1^{ext}$, $\mathcal{L}_2^{ext}$ and $\mathcal{L}_3^{ext}$) if for all PPT adversaries A, there exists a simulator S such that $\Pr[\mathbf{Real}_A(k) = 1] - \Pr[\mathbf{Ideal}_{A,S}(k) = 1] \le negl(k)$, where negl(k) is a negligible function in k.
Theorem 1. Ext is adaptively secure with ($\mathcal{L}_1^{ext}$, $\mathcal{L}_2^{ext}$ and $\mathcal{L}_3^{ext}$) under the random-oracle model if G1, G2, H1, H2, P are secure PRFs.
The security notion of exact-match queries in EncKV is stronger than that of the deterministic encryption (DET) used in existing encrypted databases. DET-based designs expose to the server all equal values on an attribute, while EncKV in the present embodiment does not disclose such information. As for other auxiliary information, associations between values across attributes (i.e., inter-column and intra-column associations) are directly exposed in existing encrypted databases with legacy compatibility, while the exposure of such information in EncKV of the present embodiment is greatly reduced. On the one hand, because the attribute is secretly embedded in the encrypted index, the server never learns whether two tokens query values on the same attribute. On the other hand, the batch query mechanism in EncKV of the present embodiment further reduces the exposed associations between columns.
Security on Range-Match Queries
The secure range-match queries in EncKV in the present embodiment improve on the ORE scheme proposed in K. Lewi and D. J. Wu. Order-Revealing Encryption: New Constructions, Applications, and Lower Bounds. In Proc. ACM CCS, 2016. Therefore, the security in the present embodiment achieves at least the same level as that scheme. That is, briefly, the ciphertexts are semantically secure, and a comparison reveals the first block that differs between the two values. To achieve general protection and integrate the indexes into the local index framework, EncKV leverages SSE techniques as an overlay to protect the ORE ciphertexts in the encrypted indexes. Yet, similar to exact-match queries, inference information will also be learned since queries may involve multiple query attributes. Accordingly, the leakage functions are defined as follows:
$\mathcal{L}_1^{rng}(C) = (\{m_i\}_n, |\alpha|, |\beta|)$
where $C$ is the set of secondary attributes, $m_i$ is the size of the local index $I_i^{rng}$ of node $i$, $n$ is the number of nodes, and $|\alpha|$, $|\beta|$ are the lengths of the label and the value in an index entry.
$\mathcal{L}_2^{rng}(v_C, C_v, C_r) = (\{t_1^i, t_2^i\}_n, ct_L, \{\{\alpha, \beta, l, v^*\}_{c_i}\}_n)$
where $v_C$ is the query value, $C_v$ is the query attribute, $C_r$ is the attribute of the result values, $ct_L$ is the token for ORE comparison, and $\{t_1^i, t_2^i\}_n$ are the tokens for the $n$ nodes respectively. Given a query, the matched index entries and result pairs $\{\alpha, \beta, l, v^*\}_{c_i}$ at each server node are known. In addition, the rest of the index entries on this column will also be learned.
$\mathcal{L}_3^{rng}(v_C, cmp) = (\{\{b_{dif}\}_{c_i}\}_n)$
where $b_{dif}$ is the first block that differs in the comparison of matched ORE ciphertexts.
$\mathcal{L}_4^{rng}(Q) = (M_{q \times q}, T_{v^* \to \alpha})$
where $Q$ is a set of $q$ adaptive queries, and $M_{q \times q}$ is a symmetric bit matrix that traces repeated queries. $M_{i,j}$ and $M_{j,i}$ are equal to 1 if $t_1^i = t_1^j$ for $i, j \in [1, q]$. Otherwise, they are equal to 0. $T_{v^* \to \alpha}$ is an inverted list that indicates the associations between the index entries of different attributes and the result values, as defined for exact-match queries. Accordingly, the security definition of range-match queries can be presented as follows:
Definition 2. Let Rng=(KGen, Buildrng, Queryrng) be the encrypted range-match index construction of EncKV. Given leakage $\mathcal{L}_1^{rng}$, $\mathcal{L}_2^{rng}$, $\mathcal{L}_3^{rng}$ and $\mathcal{L}_4^{rng}$, a PPT adversary A, and a PPT simulator S, define the following experiments.
RealA(k): The client calls KGen($1^k$) to output a private key K. A selects a dataset D and asks the client to build $\{I_1^{rng}, \ldots, I_n^{rng}\}$ via Buildrng. Then A performs a polynomial number q of adaptive queries, and asks the client for tokens and ciphertexts. Finally, A outputs a bit.
IdealA,S(k): A selects D. S generates $\{I'^{rng}_1, \ldots, I'^{rng}_n\}$ for A based on $\mathcal{L}_1^{rng}$. A performs a polynomial number q of non-adaptive queries. From $\mathcal{L}_2^{rng}$, $\mathcal{L}_3^{rng}$ and $\mathcal{L}_4^{rng}$, S returns the simulated ciphertexts and tokens. Finally, A outputs a bit.
Rng is non-adaptively secure with ($\mathcal{L}_1^{rng}$, $\mathcal{L}_2^{rng}$, $\mathcal{L}_3^{rng}$ and $\mathcal{L}_4^{rng}$) if for all PPT adversaries A, there exists a simulator S such that $\Pr[\mathbf{Real}_A(k) = 1] - \Pr[\mathbf{Ideal}_{A,S}(k) = 1] \le negl(k)$, where negl(k) is a negligible function in k.
Theorem 2. Rng is non-adaptively secure with ($\mathcal{L}_1^{rng}$, $\mathcal{L}_2^{rng}$, $\mathcal{L}_3^{rng}$, $\mathcal{L}_4^{rng}$) if G1, G2, H1, H3, P, F1, F2, F3 are secure PRFs.
The enhanced ORE scheme in the present embodiment protects the order information in queries and ciphertexts even after comparison. As illustrated in line 6 of Algorithm 6, the query order is protected in the ORE query token, i.e., qi=F3(k3,cmp∥C∥ṽ|i), where cmp is the order, C is the query attribute, and ṽ|i is the permuted ith block of the query value. As a result, different query values or attributes will result in different ORE query tokens. Then qi is used by the server node to compute Q1(qi,c) in line 7 of Algorithm 5. If the output matches si in the ciphertext, the entry is considered a match.
Experimental Evaluation
Prototype Implementation
To assess the performance of EncKV in the present embodiment of the invention, a prototype was implemented and deployed onto Amazon Web Services. Four AWS M4-xlarge instances were created to operate as clients. Also created was a Redis (v3.2.0) cluster consisting of 9 AWS M4-xlarge instances as the nodes that store the encrypted indexes and records. Each instance was assigned 4 vcores (2.4 GHz Intel Xeon® E5-2676 v3 CPU), 16 GB RAM and 40 GB SSD, and had Ubuntu server 14.04 installed. The EncKV prototype utilized Apache Thrift (v0.9.2) to implement the remote procedure call (RPC).
EncKV used OpenSSL (v1.0.1f) for the implementation of the cryptographic building blocks. The secure PRF was implemented via the AES cipher (128 bits). The enhanced ORE scheme was implemented on top of the implementation of the ORE scheme in K. Lewi and D. J. Wu. Order-Revealing Encryption: New Constructions, Applications, and Lower Bounds. In Proc. ACM CCS, 2016. In the evaluation, 8 bits were set as the block size for ORE encryption. The encrypted exact-match and range-match indexes were integrated into the implementation of the distributed encrypted index framework X. In total, the EncKV in this embodiment contained about 10144 lines of C++ code.
Performance Evaluation
The evaluation on EncKV of the present embodiment mainly focuses on the encrypted index and query performance.
TABLE 1
Encrypted index space consumption
# Indexed values | 400K | 600K | 800K |
(a) Encrypted exact-match index | | | |
Size (GB) | 0.012 | 0.018 | 0.024 |
(b) Encrypted range-match index | | | |
Size (2 bit block) (GB) | 0.209 | 0.313 | 0.417 |
Size (4 bit block) (GB) | 0.399 | 0.599 | 0.799 |
Size (8 bit block) (GB) | 3.070 | 4.604 | 6.139 |
Index evaluation: The index space consumption is shown in Table 1. For the encrypted exact-match index, the size of each entry <α,β> is 256 bits, where α and β are each 128 bits long. Table 1(a) shows that the index size increases linearly from 0.012 GB (400K indexed values) to 0.024 GB (800K indexed values). For the encrypted range-match index, each entry also needs to store the ORE ciphertext ctR for comparison. As an ORE ciphertext is encrypted by blocks, the size of ctR depends on the block size b. Each block ciphertext contains $2^b$ sub-blocks, each 64 bits long (truncated from the AES cipher output). With α, β, a 128-bit nonce, and ctR, the size of an entry for a 32-bit value is $128 + 128 + 64 \times 2^b \times 32/b + 128$ bits. As mentioned in K. Lewi and D. J. Wu. Order-Revealing Encryption: New Constructions, Applications, and Lower Bounds. In Proc. ACM CCS, 2016, there is a tradeoff between security and space. A larger block size gives stronger security but incurs a higher space cost, as shown in Table 1(b).
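The entry-size arithmetic can be checked against Table 1 with a short calculation. The sketch assumes that each block ciphertext holds 2^b sub-blocks of 64 bits, that "K" denotes 1,000 values, and that the table reports sizes in units of 2^30 bytes. The last two are inferences that reproduce the reported figures, not statements from the evaluation itself, and the function names are hypothetical.

```python
GIB = 2 ** 30  # assumed unit behind the "GB" column


def exact_entry_bits() -> int:
    """Exact-match entry: 128-bit label alpha plus 128-bit value beta."""
    return 128 + 128


def range_entry_bits(block_bits: int, value_bits: int = 32) -> int:
    """Range-match entry: alpha + beta + ORE ciphertext ctR + 128-bit nonce."""
    blocks = value_bits // block_bits            # 32 / b blocks per value
    ore_bits = 64 * (2 ** block_bits) * blocks   # 2^b sub-blocks of 64 bits per block
    return 128 + 128 + ore_bits + 128


def index_size_gb(entry_bits: int, num_values: int) -> float:
    return entry_bits / 8 * num_values / GIB


for b in (2, 4, 8):
    sizes = [round(index_size_gb(range_entry_bits(b), n), 3)
             for n in (400_000, 600_000, 800_000)]
    print(f"{b}-bit block: {sizes}")
```

For example, with b = 8 an entry takes 65,920 bits (8,240 bytes), so 400K entries occupy about 3.07 in these units, in line with Table 1(b).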
Query evaluation: To evaluate the scalability of EncKV in the present invention, the query throughput for exact match indexes and range match indexes, respectively, in one embodiment of the present invention, is evaluated, and the results are shown in
In order to gain a deeper understanding on the query performance of EncKV of the present embodiment, the latency for exact-match and range-match queries, respectively, were further evaluated. In
Since EncKV in the present embodiment also supports incremental updates for newly added records, the cost for index entry insertion in the present embodiment is evaluated in
The adopted local index framework in the present embodiment requires the client to generate query tokens for each node. To understand the bandwidth overhead, the ratio between the query token size and the result size is shown in
System
Referring to
Conclusion
The above embodiments of the present invention provide EncKV, an encrypted key-value store with secure rich query support. To support exact-match queries (keyword search, equality test, counting, and enumeration) and range-match queries (range search and prefix match), EncKV in the above embodiments leverages two primitives: searchable symmetric encryption (SSE) and order-revealing encryption (ORE). For high performance queries, EncKV in one embodiment follows the guideline of the encrypted local index framework proposed in X. Yuan et al. Building an encrypted, distributed, and searchable key-value store. In Proc. ACM AsiaCCS, 2016; that is, the client needs to know the location of each data record so that it can build the local encrypted indexes that index the data records on each node respectively. This requirement naturally requires EncKV to inherit the secure data partition algorithm, which allows the client to track the locations of encrypted data records.
In one embodiment for exact-match queries, EncKV carefully integrates an efficient SSE scheme into its local index framework, with customization and improvements made to support exact-match queries via encrypted single or multiple secondary attributes of data. As a result, the encrypted local indexes of EncKV have at least the same level of security as SSE, and can readily be stored in any KV store back end for easy deployment.
For range-match queries, EncKV is developed based on an ORE scheme, which achieves the “best-possible” security notion for ORE. Similarly, the ORE scheme is heavily customized, improved, and then integrated into the index framework of EncKV.
Advantageously, in some embodiments, EncKV further reduces the leakage during ciphertext comparisons by randomizing the query order (i.e., “>” and “<”). Accordingly, the servers will not know whether the matched results are greater or smaller than the query values.
EncKV introduces an interactive batch query mechanism to reduce the leakage of data correlations on different attributes. EncKV of the present invention provides, among other things, the following technical advantages:
Other advantages of the present invention will become apparent to the person skilled in the art upon referring to the description and the appended drawings.
Although not required, the embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality desired herein.
It will also be appreciated that where the methods and systems of the present invention are either wholly or partly implemented by computing systems, any appropriate computing system architecture may be utilized. This will include stand-alone computers, network computers and dedicated hardware devices. Where the terms “computing system” and “computing device” are used, these terms are intended to cover any appropriate arrangement of computer hardware capable of implementing the function described.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.
Wang, Cong, Wang, Xinyu, Guo, Yu, Yuan, Xingliang
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 07 2017 | YUAN, XINGLIANG | City University of Hong Kong | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041979 | /0219 | |
Apr 07 2017 | GUO, YU | City University of Hong Kong | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041979 | /0219 | |
Apr 07 2017 | WANG, XINYU | City University of Hong Kong | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041979 | /0219 | |
Apr 07 2017 | WANG, CONG | City University of Hong Kong | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041979 | /0219 | |
Apr 10 2017 | City University of Hong Kong | (assignment on the face of the patent) | / |