dc.title : Data gathering and analysis: recognising Personally Identifiable Information
dc.creator : Vanbrabant, Casper
dc.subject :
dc.description.abstract : The internship assignment consists of building and maintaining a web scraper. The goal of the assignment is to collect Big Data sets from social media websites, which the Institute then uses to perform dialect analyses. Many web scraping tools are available; they often differ in features, and some can be quite costly. First, a web scraping technology has to be selected according to the needs of the assignment. In this case the tool has to be able to scrape data, filter it and subsequently store it in a database. After a comparison of several web scraping tools, the most suitable one is selected and implemented. The research assignment focuses on Personally Identifiable Information (PII). This type of information can be found almost everywhere on the World Wide Web, especially on social media. Most people do not understand the possible danger of their personal information falling into the wrong hands. A literature study explains the definition of Personally Identifiable Information, describes how it differs from Personal Data as defined by the General Data Protection Regulation, and demonstrates how criminals can (ab)use Personally Identifiable Information. Furthermore, the thesis outlines a basic principle for training a model that could be used to recognise PII in the data sets collected by the web scraper.
dc.publisher : Hogeschool PXL
dc.contributor :
dc.date : 2019
dc.type : Bachelor's thesis
dc.format : application/pdf
dc.identifier : http://doks.pxl.be/doks/do/record/Get?dispatch=view&recordId=SEtd8ab2a8216cd2dafb016cd2eb92f602a1
dc.language : eng
dc.rights : All rights reserved
etd.degree.name : Professionele bachelor in de toegepaste informatica
etd.degree.level : Bachelor
etd.degree.discipline : Systemen en netwerkbeheer
etd.degree.grantor : Hogeschool PXL

