The National Software Reference Library (NSRL) is a collection of traceable software that is processed and made available to the public (primarily law enforcement and forensic analysts) for use in investigations. Identifying information for each file is stored in a database, and the following are computed and published in a Reference Data Set (RDS) every three months:
- Cryptographic hash values (MD5 and SHA-1) of the file’s content;
- The file’s origin, including the software package(s) containing the file and the manufacturer of the package;
- The original file name and file size.
Often there is confusion over the types of software that the NSRL tracks. One important goal of the NSRL is that the information will be accepted in court. As such, the NSRL maintains tight control over the software (originals are stored in a vault and can be used to prove traceability) and adheres to strict standards.
Known (vs. Known Good and Known Bad)
Also, the NSRL does not identify software as good or malicious but instead provides a simple automated file classification. The reasoning for this approach is that software can be malicious in some settings and not in others. If you are using the NSRL to reduce the number of files you must analyze during an investigation, it is important to review the “ApplicationType” field and eliminate only the files you deem irrelevant. The NSRL FAQ nicely sums things up by stating the files in the NSRL database are “known” – NOT “known good” OR “known bad” – just “known application files.”
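The review step described above can be sketched as a simple partition of matched files by application type. Only the field name “ApplicationType” comes from the text; the record layout, the sample file names, and the type values below are made up for illustration, and the set of types to ignore is a judgment the investigator must make for each case.

```python
def split_by_application_type(records, ignorable_types):
    """Partition known-file records into (set_aside, keep).

    `records` is a list of dicts with an "ApplicationType" key;
    `ignorable_types` is the set of types the investigator has
    decided are irrelevant to this particular case.
    """
    set_aside, keep = [], []
    for record in records:
        if record["ApplicationType"] in ignorable_types:
            set_aside.append(record)
        else:
            keep.append(record)
    return set_aside, keep


# Hypothetical matched records; names and type values are invented.
matched = [
    {"FileName": "editor.exe", "ApplicationType": "Word Processor"},
    {"FileName": "hidetool.exe", "ApplicationType": "Steganography"},
    {"FileName": "kernel.dll", "ApplicationType": "Operating System"},
]

set_aside, keep = split_by_application_type(
    matched, ignorable_types={"Word Processor", "Operating System"}
)
```

In this sketch a “known” file of a type the investigator cares about stays in the keep list, which reflects the point that “known” does not mean “known good”.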
NSRL and Hashkeeper
The NSRL RDS and the NDIC’s Hashkeeper are collections of File Identification Information (FII) which are typically used to identify computer files during forensic investigations of computer systems. The principal differences between the two collections are as follows:
- Provenance. All NSRL data is derived from purchased or donated software which is retained in a secure facility at NIST. NSRL FII is thus traceable to the original software. Hashkeeper is a central repository of FII donated by various sources, usually obtained by law enforcement during the course of forensic investigations of suspect systems. Hashkeeper FII is not traceable to its source.
- Court worthiness. The FII published by the NSRL is designed to be admissible as evidence in a U.S. court of law: NSRL data is traceable and verifiable back to the original software packages which are maintained under evidence locker conditions. The Hashkeeper collection is designed to be of use to forensic investigators but is not generally admissible as evidence in a court of law.
- Scope. NSRL data is strictly limited to that which can be traced to physical installation media which can be obtained and held as evidence at NIST. Hashkeeper data has a potentially broader scope, dependent only on the nature and quantity of FII that law enforcement officials decide to donate.
- Illicit file data. The NSRL is prohibited by law from obtaining and storing certain forms of illicit file data (e.g. child pornography images) and contains no FII on such data. The Hashkeeper collection is known to contain FII on illicit file data.
- Specificity of data. The NSRL uses five measures to identify each file (the SHA-1 cryptographic hash, the MD5 cryptographic hash, the CRC32 checksum, the file name and the file size), whereas Hashkeeper supplies only the file name and the MD5 cryptographic hash. There is thus a lower level of confidence in the Hashkeeper data.
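To make the specificity point concrete, the sketch below builds the five measures named above for a block of bytes and reports which of them agree between two records. The helper names five_measures and agreeing_fields, and the sample file names and contents, are hypothetical. CRC32 is computed with zlib.crc32 and formatted as eight hex digits, which is an assumed presentation rather than a documented RDS convention.

```python
import hashlib
import zlib


def five_measures(name, data):
    """Return the SHA-1, MD5, CRC32, file name and size for `data` (bytes)."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        # Mask to 32 bits and pad to eight hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "name": name,
        "size": len(data),
    }


def agreeing_fields(candidate, reference):
    """List the fields of `reference` whose values `candidate` shares."""
    return [field for field in reference
            if candidate.get(field) == reference[field]]


# Hypothetical example: same content, different file name.
reference = five_measures("setup.exe", b"example installer bytes")
candidate = five_measures("renamed.exe", b"example installer bytes")
agreement = agreeing_fields(candidate, reference)
```

Here the content-based measures and the size agree while the name does not. A comparison limited to file name and MD5 has fewer independent checks to draw on, which is the reason given above for the lower confidence in two-field data.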