[UPDATE 1/21/2010] Search term seperators no longer include period (as version 5.1). They are space, semi-colon, backslash, foward slash, and hypen.
In the recent month, we’ve had a lot of questions about how searching works in Secret Server, so I thought now would be a good time to answer as many questions about searching as possible.
Searching pre 5.0
Before the 5.0 edition of Secret Server searching was fairly limited. The only thing you could search on was the Secret’s name. Over time, the Search criteria grew a little, but still this main limitation was always there. As soon as you wanted to search on the actual values in the secrets, you were out of luck. The ability to search by the values in a secret was one of our most requested features.
The Secret Server development team has always had a keen sense to what customers wanted, and we typically implement feature requests based on feedback. However, this particular feature had a lot of technical barriers to solve before it could be implemented.
The main barrier we had to deal with was the concept of Secret Server itself. Secret Server is designed to be as secure as possible, and one of the pieces of this design is full data encryption. All of the values of a secret, aside from its name, are stored in the database encrypted. This makes searching the database impossible. If we wanted to perform a search, we would have had to pull back every secret from the database, decrypt it, and then search it. This clearly wouldn’t work from a performance angle, and didn’t scale well.
Searching as of 5.0
We realized we wouldn’t be able to do real-time searches on secrets. The barrier still remained though, how do we search secrets and not expose sensitive information? Our solution was a hash based index table.
Many systems, such as Windows and search providers like Google keep a search index. When you search Google, you really aren’t searching the entire Internet all at once, you are searching a dictionary of content that Google has built over time. Secret Server does something similar. The trick is to build an index but also keep it secure.
Secret Server 5.0 has a background monitor, the Search Indexer, that looks for changed secrets, about every 60 seconds it queries the database looking for unindexed secrets or changes in secrets. When you create or modify a secret, we flag that secret to tell the Search Indexer to re-index it.
The Search Indexer creates hashed terms from the values in a secret. More specifically for those technically interested, we use the HMAC-512 algorithm. A quick explanation of what this algorithm does is creates a one-way code. For example, if the word “book” was hashed, it would produce a unique output. However this output cannot be converted back into the original data, “book” in our case.
This technique is used when creating indexes. Let’s say we have a secret with a field called “Server” with a value of “OFFICE\Webserver01″. When the search indexer got around to indexing this secret, it would create a hashed value of “office\webserver01″. Whenever we create hashed terms, we always convert it to lowercase so that searching isn’t case-sensitive. This search index record would become associated with the secret.
Now, when a user does a search, we use the same hash algorithm to compute the hash term of what you are searching for (again converted to lowercase). We when search our index table for a match.
What About Partial Matches?
When we have a term like “OFFICE\Webserver01″, we produce hashes of “pieces” of the word. In this case, we would also produce specific hashes for “office” and “webserver01″. Notice that we split on the letter “\”. The same happens when a search is performed. This way if you searched for “OFFICE\Webserver02″, it would still come back with the OFFICE\Webserver01″ because the “OFFICE” term still matched. We do this for other letters as well, that includes spaces, backslashes, slashes, periods, commas, and semicolons.
Search Index Modes
The Search Indexer has two modes. Standard, and Enhanced. So far, all of the behavior I have described has been the “Standard” mode. The Enhanced mode works very similarly, however it also produces three letter partials. Using out “OFFICE\Webserver01″ example, we produce our hashes normally, but we also produce the partials. We would add hashes for “OFF”, “FFI”, “FIC”, “ICE”, etc. This allows partial matches to return.
So Many Results
The implementation sounds correct, but it has some room for improvement. Note that I said we split the terms on periods. That means if you searched for “firstname.lastname@example.org”, it would return everything that had “com” in it, and chances are there are a lot of results. The splitting on the period seems to be the biggest culprit for undesired results coming back. Once you throw the Enhanced mode into the mix, it gets even more complicated.
Nothing has been set in stone in terms of changes and when it will be implemented, but we have been kicking around a lot of ideas. The immediate one might be to consider removing the period from the characters that we split on. Another idea was ranking the results. Secret Server right now always returns secrets sorted by their name. It would make more sense if we returned results in order of the number of hash terms that matched and if the name matched as well.
I hope that clarifies some of the mystery surrounding search. If you have any additional feedback or questions, be sure to drop by our forums and let us know!