Important: This documentation covers Yarn 1 (Classic).
For Yarn 2+ docs and migration guide, see yarnpkg.com.

Package detail

similarity-sensitive-hash

JDvorak36MIT1.0.0

Lightning-fast cosine similarity search using locality-sensitive hashing. Compress high-dimensional vectors into tiny fingerprints for 10× faster similarity search with 99% less memory usage.

lsh, srp, sign-random-projection, cosine-similarity, count-sketch, hcs, higher-order-count-sketch, hashing, similarity, locality-sensitive-hashing, vector-similarity, embeddings, fingerprinting, fast-search, memory-efficient, recommendations, product-similarity, image-similarity, text-similarity, audio-fingerprinting, high-dimensional, approximate-similarity, microsecond-search

readme

SimilaritySensitiveHash - Lightning-Fast Cosine Similarity Search

npm package

Find similar items in microseconds. This library compresses high-dimensional vectors (embeddings, features, user preferences) into tiny fingerprints that let you estimate cosine similarity 10× faster while using 99% less memory than naive cosine similarity comparison while retaining 95 % accuracy in most production use cases, and within 0.05 absolute error for ≥ 0.7 cosine pairs, but up to 0.15 absolute error below 0.3.

What’s the Problem?

You have 100 000 image vectors and want “similar” ones.

Works for anything vector-shaped:

  • Product feature vectors
  • Image / text embeddings
  • User preference vectors
  • Audio fingerprints

How It Works (30-Second Version)

  1. Feed your vector into encodeSRP() → get a 32-bit to 512-bit fingerprint. Tiny!
  2. Compare fingerprints with estimateSimilarity() → cosine similarity in micro-seconds.
  3. Profit.

Trade-off: ~5 % error for an order of magnitude speed-up and massive memory savings.

Bundle Size & Performance

Tiny footprint: Core library is only ~5KB (gzipped: ~2KB).

Works everywhere:

  • Node.js (v14+)
  • Browser (ES modules)
  • Web Workers (thread-safe)

Quick Start

npm install similarity-sensitive-hash
import { encodeSRP, estimateSimilarity } from 'similarity-sensitive-hash';

const productA = [0.8, 0.2, 0.9, /*...etc...*/  0.1, 0.7]; // Big old vectors
const productB = [0.7, 0.3, 0.8, /*...etc...*/ 0.2,  0.6]; // Big old vectors

// 1. Hash once (64 bits)
const fpA = encodeSRP(productA, 64);
const fpB = encodeSRP(productB, 64);

// 2. Compare in <1 μs
const sim = estimateSimilarity(fpA, fpB);
console.log('Similarity ≈', sim); // 0.0 – 1.0

Two Algorithms, One API

Name Function When to Pick
HCS-SRP (default) encodeSRP() Faster comparison (recommended)
CS-SRP encodeCSSRP() Faster hashing **

Both give ~5 % error; pick based on your hot path.

API Cheatsheet

encodeSRP(vector, bits, seed = 42)
// vector: number[]           – your data
// bits: int                  – fingerprint size
// seed: int                  – reproducibility
// returns: Uint8Array        – compact fingerprint

estimateSimilarity(fpA, fpB)
// fpA, fpB: Uint8Array       – fingerprints
// returns: float 0-1         – cosine similarity estimate

Need raw speed? Use encodeCSSRP(vec, 64) for a flat 64-bit signature.

Real-World Recipes

const catalog = [...];                           // 100 k products
const fps = catalog.map(p => encodeSRP(p.vec, 64);

function findSimilar(target, threshold = 0.7) {
  const tgt = encodeSRP(target, 64);
  return fps
    .map((fp, i) => ({ i, sim: estimateSimilarity(tgt, fp) }))
    .filter(x => x.sim > threshold)
    .sort((a, b) => b.sim - a.sim);
}

Tiered Accuracy

// Phase 1: 16-bit fast filter
const coarse = encodeSRP(vec, 16);
// Phase 2: 256-bit accurate compare
const fine   = encodeSRP(vec, 256);

Benchmarks

Task Ops / sec vs Vanilla
HCS-SRP encode (512d -> 64-bit) 73 000 sigs/sec
HCS-SRP compare (512d -> 64-bit) 9.9 M ops/sec 10× faster
Vanilla cosine (512-d) 1 M ops/sec baseline

Accuracy: 95 % average absolute error < 0.05.

Size Generation Generation Comparison Comparison Memory
16 808.5 ms 123.7 k/s 58.2 ns 17.3 M/s 2 B
32 1145.0 ms 87.3 k/s 69.1 ns 14.5 M/s 4 B
64 1368.9 ms 73.0 k/s 101.8 ns 9.9 M/s 8 B
128 2304.0 ms 43.4 k/s 202.7 ns 4.9 M/s 16 B

Average Error by Fingerprint Size and Similarity Level

Size High Similarity Medium-High Similarity Medium Similarity Medium-Low Similarity Low Similarity Orthogonal Negative Similarity
16 0.1221 0.2194 0.2670 0.2937 0.2940 0.2917 0.2898
32 0.0891 0.1657 0.1914 0.2037 0.2166 0.2175 0.2154
64 0.0599 0.1101 0.1374 0.1537 0.1448 0.1538 0.1535
128 0.0553 0.1322 0.1925 0.2259 0.2536 0.2725 0.2945

Size Reduction Benefits

Massive memory savings compared to storing full vectors:

Vector Size Full Vector 64-bit Fingerprint Reduction
512-d float 2,048 bytes 8 bytes 256× smaller
1024-d float 4,096 bytes 8 bytes 512× smaller
2048-d float 8,192 bytes 8 bytes 1024× smaller

Real-world impact:

  • 1M product vectors (512-d): 2GB → 8MB (99.6% reduction)
  • 10M user vectors (1024-d): 40GB → 80MB (99.8% reduction)
  • 100M image vectors (2048-d): 800GB → 800MB (99.9% reduction)

Wait, what's this about 15% Error?

The approximation gets shakey when the two vectors are dissimilar. However, since most use cases concern similarity, and not fine grained ranking of dissimilarity this tends not to matter. Now, if you threshold at 0.5, then it'll be noisy, but so would vanilla cosine similarity in any high dimensional space. This library assumes you are filtering high dimensional space for similarity, and not for finding exact opposites or orthogonal vectors.

When to Use / Not Use

Use for

  • large datasets (≥ 10 k items)
  • real-time recommendations
  • memory-constrained environments
  • any cosine-similarity task that can tolerate ~5 % error

Don’t use for

  • exact similarity (error must be 0 %)
  • Jaccard similarity on sets (use Grouped-OPH)
  • Euclidean distance (use DistanceSensitiveHash)
  • Edit distance (use WarpDistanceHash)

Academic Roots

Implements Count-Sketch Sign Random Projections (CS-SRP) and Higher-Order Count-Sketch SRP (HCS-SRP) from:

Verma & Pratap, Faster and Space Efficient Indexing for Locality Sensitive Hashing, arXiv:2503.06737

Built on the classic LSH framework (Gionis, Indyk, Motwani, VLDB 1999).

License

MIT