LSTM-Based Analysis of De-Identification Techniques for Protecting Sensitive Data

Authors

  • Cik Feresa Mohd Foozy
  • K. Ravindran
  • Naqliyah Zainuddin
  • Ahmad S. Mohd Rozi
  • Muhammad H. A. Fakhrudin

DOI:

https://doi.org/10.69513/jnfit.v1.i0.a1

Abstract

This research examines how effectively de-identification techniques enhance privacy protection for sensitive data, using Long Short-Term Memory (LSTM) models. The study follows a structured five-step methodology: Dataset Collection, Data Preparation, Feature Extraction, Classification, and Performance Evaluation. It evaluates LSTM performance on datasets from the Resume, Construction, and Medical domains. The primary goal is to assess how well de-identification methods hide sensitive information, as measured by classification accuracy. Results indicate that LSTM achieves 97.14% accuracy on unmodified data, demonstrating its success in detecting sensitive information. However, after de-identification is applied in the pre-processing phase, using a Java program to remove sensitive keywords, accuracy drops to 78.30%, a decrease of 18.84 percentage points. These findings highlight the effectiveness of de-identification techniques in enhancing data privacy, especially in fields that require strict confidentiality.
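
As an illustration of the keyword-removal step described in the abstract, the following is a minimal Java sketch of keyword-based de-identification during pre-processing. It is not the authors' implementation: the class name DeIdentifier, the method deIdentify, the [REDACTED] placeholder, and the sample keywords and sentence are all hypothetical, and the abstract does not state whether removed keywords are masked or deleted outright.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the paper: masks sensitive keywords in raw text.
final class DeIdentifier {

    // Assumed placeholder token; use "" instead to delete keywords outright.
    static final String MASK = "[REDACTED]";

    // Replaces every whole-word, case-insensitive occurrence of each keyword with MASK.
    static String deIdentify(String text, List<String> sensitiveKeywords) {
        String result = text;
        for (String keyword : sensitiveKeywords) {
            // Pattern.quote treats the keyword as a literal; \b restricts matches to whole words.
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(keyword) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            result = pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(MASK));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical keyword list and sample sentence, for illustration only.
        List<String> keywords = List.of("diabetes", "Jane Doe");
        String sample = "Patient Jane Doe was diagnosed with Diabetes.";
        System.out.println(deIdentify(sample, keywords));
        // prints: Patient [REDACTED] was diagnosed with [REDACTED].
    }
}
```

Once the keywords a classifier relies on are masked in this way, the text carries fewer cues for detecting sensitive content, which is consistent with the accuracy drop the abstract reports after de-identification.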

Published

2024-12-20