On the Approximability of Geometric and Geographic Generalization and the Min-Max Bin Covering Problem

Michael T. Goodrich
Dept. of Computer Science
joint with Wenliang Du, David Eppstein, and George Lueker

Motivation
- Privacy is a concern with respect to information in relational databases:
  - rows are associated with people
  - columns are attributes

K-anonymity
- No query should reveal fewer than K individuals.
image source: /2007/09/neutral-mask-masterclass.html

Generalization
- Replace specific attributes with more general ones, so that no category has fewer than K members.
source: ℓ-Diversity: Privacy Beyond k-Anonymity, Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam, Department of Computer Science, Cornell University

Data Types
- Linear: an easy greedy algorithm is optimal
- Unordered: arbitrary groupings are possible
- GPS coordinates: group using rectangles
- Zip codes: should use proximity, not text
image source: /Applications/ZIPScribbleMap.html

Previous Work
- [Samarati, Sweeney, 98] introduce the concept of k-anonymization and generalization to achieve it.
- [Meyerson, Williams, 04] show that optimal generalization of unordered data is NP-hard, but their proof requires as many attributes as people. Similar proofs are due to [Aggarwal et al., 05] and [Byun et al., 07].
- [Khanna, Muthukrishnan, Paterson, 98] study a rectangle tiling problem similar to GPS coordinate generalization, showing that 5/4-approximations are not possible unless P=NP.
- Lots more work on k-anonymization and its variants…

Our Results
- Zip codes: has a 4-approximation, but no 4/3-approximation unless P=NP
- GPS coordinates: has a 5-approximation, but no 4/3-approximation unless P=NP
- Unordered: is NP-hard but has a PTAS. This version of the problem also gives rise to a new type of bin-packing problem.

Min-Max Bin Covering
Figure labels: max, min (k)
image source: /article/5540/bin-packing/4/

Min-Max Bin Cover is NP-hard
- Reduction from:
- Reduction method:

A Next-Fit Method: “Fold”
Theorem: There is a linear-ti
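
To make the min-max bin covering objective concrete, here is a minimal Python sketch of a plain next-fit heuristic. It assumes the problem is read as: partition the items into bins so that every bin has total size at least k, while keeping the largest bin total small. The function names `next_fit_cover` and `max_level` are hypothetical, and this generic heuristic is not claimed to be the "Fold" method named on the slide, whose details are not given here.

```python
from typing import List


def next_fit_cover(sizes: List[float], k: float) -> List[List[float]]:
    """Partition sizes into bins whose totals are all >= k (plain next-fit).

    Illustrative heuristic only: items are taken in the given order, a bin
    is closed as soon as its level reaches k, and any leftover items are
    merged into the last closed bin.
    """
    bins: List[List[float]] = []
    current: List[float] = []
    level = 0.0
    for s in sizes:
        current.append(s)
        level += s
        if level >= k:  # the bin is covered, so close it
            bins.append(current)
            current, level = [], 0.0
    if not bins:
        raise ValueError("total size is below k, so no bin can be covered")
    if current:  # leftover items cannot cover a bin on their own
        bins[-1].extend(current)
    return bins


def max_level(bins: List[List[float]]) -> float:
    """Objective value: the largest total size of any bin."""
    return max(sum(b) for b in bins)


if __name__ == "__main__":
    result = next_fit_cover([3, 4, 2, 5, 1, 6, 2], k=7)
    print(result, max_level(result))
```

In this sketch, if every item is smaller than k, a closed bin has level below 2k and the merged leftover adds less than k, so no bin exceeds 3k. The sketch makes a single pass over the items and does not try to find an optimal partition.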


