Please use this identifier to cite or link to this item: https://rda.sliit.lk/handle/123456789/2045
Title: Dataset Reconstruction Attack against Language Models
Authors: Panchendrarajan, R
Bhoi, S
Keywords: Language Models
Dataset Reconstruction Attack
Information Leakage
Issue Date: Jul-2021
Series/Report no.: CEUR Workshop Proceedings; Vol. 2942, Pages 1-17
Abstract: With the advances of deep learning techniques in Natural Language Processing, the last few years have witnessed the release of powerful language models such as BERT and GPT-2. However, applying these general-purpose language models to domain-specific applications requires further fine-tuning on domain-specific private data. Since private data is mostly confidential, any information that an adversary with access to the model can extract can lead to serious privacy risks. The majority of privacy attacks on language models infer either targeted information or a few instances from the training dataset. However, inferring the whole training dataset has not been explored in depth, even though it poses far greater risks than disclosure of some instances or partial information of the training data. In this work, we propose a novel dataset reconstruction attack that also infers the informative words present in the private dataset. Experimental results show that an adversary with black-box query access to a fine-tuned language model can infer the informative words with an accuracy of about 75% and can reconstruct nearly 46.67% of the sentences in the private dataset.
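The abstract above gives no implementation details, but the general idea of black-box query access to a fine-tuned masked language model can be illustrated with a minimal sketch. The snippet below is not the authors' attack: it only shows how an adversary could query a BERT-style model to score candidate words in a masked context, using a public checkpoint ("bert-base-uncased") as a stand-in for the victim model and a hypothetical probe context and candidate list.

```python
# Illustrative sketch only: black-box-style probing of a fine-tuned
# masked language model. The model name, context, and candidate words
# are hypothetical placeholders, not taken from the paper.
from transformers import pipeline

# A public checkpoint stands in for the fine-tuned victim model the
# adversary would query in the attack setting described in the abstract.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

context = "The patient was prescribed [MASK] for the infection."
candidates = ["amoxicillin", "aspirin", "rest"]  # hypothetical probe words

# Query the model and keep only the scores of the probe words.
# Unusually high scores for domain-specific words can hint that they
# were frequent (i.e. informative) in the fine-tuning data.
predictions = fill_mask(context, targets=candidates)
for pred in predictions:
    print(f"{pred['token_str']:>12s}  score={pred['score']:.4f}")
```

Repeating such queries over many contexts and aggregating the scores is one plausible way to rank candidate words, but the exact procedure and the sentence-reconstruction step are described only in the full paper linked below.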
URI: http://rda.sliit.lk/handle/123456789/2045
Appears in Collections: Research Papers - Dept of Computer Systems Engineering
Research Papers - Open Access Research
Research Papers - SLIIT Staff Publications

Files in This Item:
DatasetReconstructionAttackagainstLanguageModels.pdf (1.43 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.