Duplicate Detection via Random Hashing
To find duplicate columns efficiently:
- Assign random weight r_i to each row
- Compute column hash: h_j = Σ_i r_i · a_ij
- Columns with the same hash are candidates; verify element-wise (see the sketch after this list)
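A minimal sketch of the hashing pass, assuming a CSC-style sparse matrix; the struct and function names here are illustrative, not the CoinUtils API:

```cpp
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Column-major sparse matrix (CSC); layout is an assumption for this sketch.
struct SparseMatrixCSC {
    int nRows, nCols;
    std::vector<int> colStart;   // size nCols + 1
    std::vector<int> rowIndex;   // row index of each nonzero
    std::vector<double> value;   // value of each nonzero
};

// Bucket columns by their random-weighted hash h_j = sum_i r_i * a_ij.
// Identical columns stored with the same row ordering produce bitwise-identical
// hashes; a production version would bucket with a tolerance or integer hash.
std::vector<std::vector<int>> duplicateColumnCandidates(const SparseMatrixCSC& A) {
    std::mt19937_64 rng(12345);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<double> r(A.nRows);          // random weight r_i per row
    for (double& w : r) w = dist(rng);

    std::unordered_map<double, std::vector<int>> buckets;
    for (int j = 0; j < A.nCols; ++j) {
        double h = 0.0;                      // h_j = sum over nonzeros of column j
        for (int p = A.colStart[j]; p < A.colStart[j + 1]; ++p)
            h += r[A.rowIndex[p]] * A.value[p];
        buckets[h].push_back(j);
    }

    // Only buckets with 2+ columns are candidates; each still needs
    // exact element-wise verification before any merge.
    std::vector<std::vector<int>> candidates;
    for (auto& kv : buckets)
        if (kv.second.size() > 1) candidates.push_back(kv.second);
    return candidates;
}
```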
Duplicate columns (same cost): combine bounds
- x_j and x_k identical → replace with x_new, where l_new = l_j + l_k and u_new = u_j + u_k (sketch below)
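A small sketch of the bound merge, assuming finite bounds (infinite bounds need separate handling); the names are illustrative:

```cpp
struct Bounds { double lower, upper; };

// x_new = x_j + x_k, so the merged variable's bounds are the sums.
Bounds mergeDuplicateColumns(const Bounds& bj, const Bounds& bk) {
    return { bj.lower + bk.lower, bj.upper + bk.upper };
}
```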
Duplicate rows: keep tighter constraint, remove redundant
- If the rows share coefficients and [lhs_i, rhs_i] ⊆ [lhs_k, rhs_k] (interval containment), row_k is redundant → remove row_k
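A sketch of the containment test with hypothetical names; since both rows have the same activity a·x, the row whose bound interval is contained in the other's is at least as tight:

```cpp
struct RowBounds { double lhs, rhs; };  // lhs <= a·x <= rhs

// True if row i makes row k redundant: i's interval lies inside k's,
// so any x satisfying row i automatically satisfies row k.
bool rowDominates(const RowBounds& i, const RowBounds& k) {
    return i.lhs >= k.lhs && i.rhs <= k.rhs;
}
```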
Mathematical Formulation
- Column equivalence: a_j = a_k (element-wise), with equal costs c_j = c_k
- Combined variable: x_new = x_j + x_k
- Postsolve: split x_new* back into a feasible pair (x_j*, x_k*)
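A sketch of one valid postsolve split, assuming finite bounds; `postsolveSplit` is a hypothetical helper, not the library's routine:

```cpp
#include <algorithm>

struct Split { double xj, xk; };

// Given x_new* in [l_j + l_k, u_j + u_k], assign x_j as much of the value
// as its bounds allow (pretending x_k sits at l_k); clamping keeps x_j
// feasible, and x_k absorbs the remainder, which lands in [l_k, u_k].
Split postsolveSplit(double xNew, double lj, double uj, double lk, double uk) {
    double xj = std::clamp(xNew - lk, lj, uj);
    return { xj, xNew - xj };  // x_j* + x_k* = x_new* by construction
}
```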
Complexity
- Time: O(nnz) for hashing, O(candidates × nnz) for exact verification
- Random weights make spurious hash matches between distinct columns unlikely, so few candidates survive to verification
- Duplicates are common in models with symmetry or copy-paste construction
Implementations
CoinUtils
- CoinPresolveDupcol.hpp - Detect and remove duplicate columns and rows