Duplicate Detection via Random Hashing
To find duplicate columns efficiently:
- Assign random weight r_i to each row
- Compute column hash: h_j = Σ_i r_i · a_ij
- Columns with the same hash are candidates; verify element-wise (see the sketch after this list)
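A minimal sketch of the hashing pass, assuming a CSC-style sparse matrix; the struct and function names here are illustrative, not the CoinUtils API:

```cpp
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Column-major sparse matrix (CSC); layout is an assumption for this sketch.
struct SparseMatrixCSC {
    int nRows, nCols;
    std::vector<int> colStart;   // size nCols + 1
    std::vector<int> rowIndex;   // row index of each nonzero
    std::vector<double> value;   // value of each nonzero
};

// Bucket columns by their random-weighted hash h_j = sum_i r_i * a_ij.
// Identical columns stored with the same row ordering produce bitwise-identical
// hashes; a production version would bucket with a tolerance or integer hash.
std::vector<std::vector<int>> duplicateColumnCandidates(const SparseMatrixCSC& A) {
    std::mt19937_64 rng(12345);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<double> r(A.nRows);          // random weight r_i per row
    for (double& w : r) w = dist(rng);

    std::unordered_map<double, std::vector<int>> buckets;
    for (int j = 0; j < A.nCols; ++j) {
        double h = 0.0;                      // h_j = sum over nonzeros of column j
        for (int p = A.colStart[j]; p < A.colStart[j + 1]; ++p)
            h += r[A.rowIndex[p]] * A.value[p];
        buckets[h].push_back(j);
    }

    // Only buckets with 2+ columns are candidates; each still needs
    // exact element-wise verification before any merge.
    std::vector<std::vector<int>> candidates;
    for (auto& kv : buckets)
        if (kv.second.size() > 1) candidates.push_back(kv.second);
    return candidates;
}
```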
Duplicate columns (same cost): combine bounds
- x_j and x_k identical → replace with x_new, where l_new = l_j + l_k and u_new = u_j + u_k (sketch below)
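A small sketch of the bound merge, assuming finite bounds (infinite bounds need separate handling); the names are illustrative:

```cpp
struct Bounds { double lower, upper; };

// x_new = x_j + x_k, so the merged variable's bounds are the sums.
Bounds mergeDuplicateColumns(const Bounds& bj, const Bounds& bk) {
    return { bj.lower + bk.lower, bj.upper + bk.upper };
}
```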
Duplicate rows: keep tighter constraint, remove redundant
- If the rows share coefficients and [lhs_i, rhs_i] ⊆ [lhs_k, rhs_k] (interval containment), row_k is redundant → remove row_k
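A sketch of the containment test with hypothetical names; since both rows have the same activity a·x, the row whose bound interval is contained in the other's is at least as tight:

```cpp
struct RowBounds { double lhs, rhs; };  // lhs <= a·x <= rhs

// True if row i makes row k redundant: i's interval lies inside k's,
// so any x satisfying row i automatically satisfies row k.
bool rowDominates(const RowBounds& i, const RowBounds& k) {
    return i.lhs >= k.lhs && i.rhs <= k.rhs;
}
```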
Mathematical Formulation
- Column equivalence: a_j = a_k (element-wise), with equal costs c_j = c_k
- Combined variable: x_new = x_j + x_k
- Postsolve: split x_new* back into a feasible pair (x_j*, x_k*)
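A sketch of one valid postsolve split, assuming finite bounds; `postsolveSplit` is a hypothetical helper, not the library's routine:

```cpp
#include <algorithm>

struct Split { double xj, xk; };

// Given x_new* in [l_j + l_k, u_j + u_k], assign x_j as much of the value
// as its bounds allow (pretending x_k sits at l_k); clamping keeps x_j
// feasible, and x_k absorbs the remainder, which lands in [l_k, u_k].
Split postsolveSplit(double xNew, double lj, double uj, double lk, double uk) {
    double xj = std::clamp(xNew - lk, lj, uj);
    return { xj, xNew - xj };  // x_j* + x_k* = x_new* by construction
}
```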
Complexity
- Time: O(nnz) for hashing, O(candidates × nnz) for exact verification
- Random weights make spurious hash matches between distinct columns unlikely, so few candidates survive to verification
- Duplicates are common in models with symmetry or copy-paste construction
Implementations
CoinUtils
- CoinPresolveDupcol.hpp - Detect and remove duplicate columns and rows