Hackers and Cybercriminals Use Dark Web Data to Train DarkBERT AI

 

A team of South Korean researchers has released a paper describing how they developed a machine-learning model from a large dark web corpus collected by crawling the Tor network. The data included many shady sites, spanning categories such as cryptocurrency, pornography, hacking, and weapons. Because of this, and citing ethical concerns, the team decided not to use the data as collected.
DarkBERT's pre-training corpus was cleaned through filtering before it was fed to the model, so that sensitive data would not be included in training, since bad actors could otherwise extract that data from the model.
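The article does not say which filtering rules the researchers applied, so the following is only a minimal sketch of what such a step could look like, assuming that sensitive identifiers are replaced with placeholder tokens and that very short or duplicate pages are dropped. The patterns, the placeholder names, the `min_words` threshold, and the functions `mask_sensitive` and `filter_corpus` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical patterns: the article does not list the real filter rules.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
ONION_RE = re.compile(r"\b[a-z2-7]{16,56}\.onion\b")  # base32 host + ".onion"
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Each pattern is paired with the placeholder token that replaces a match.
PLACEHOLDERS = [
    (EMAIL_RE, "[EMAIL]"),
    (ONION_RE, "[ONION]"),
    (IPV4_RE, "[IP]"),
]


def mask_sensitive(text: str) -> str:
    """Replace every match of the assumed patterns with its placeholder."""
    for pattern, token in PLACEHOLDERS:
        text = pattern.sub(token, text)
    return text


def filter_corpus(pages, min_words=5):
    """Mask identifiers, then drop short pages and exact duplicates.

    min_words is an assumed threshold chosen for illustration only.
    """
    seen = set()
    cleaned = []
    for page in pages:
        masked = mask_sensitive(page)
        if len(masked.split()) < min_words:
            continue  # too little text to be useful for training
        if masked in seen:
            continue  # identical to a page already kept
        seen.add(masked)
        cleaned.append(masked)
    return cleaned
```

Masking before deduplication means two pages that differ only in, say, a contact address collapse into one entry. Whether that is desirable depends on the corpus, and it is a design choice of this sketch rather than something the article reports.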
To some, DarkBERT may sound like a nightmare, but the researchers say it is a promising project that will do more than help combat cybercrime: it will also contribute to the advancement of natural language processing, a field that has grown considerably.
To build the DarkBERT language model, the team connected to the dark web through the Tor network, which allows access without logging in. In the process, the crawl produced a raw database of the data it found, which was then put into a search engine.
There has been a recent explosion

[…]
Content was cut in order to protect the source. Please visit the source for the rest of the article.

This article has been indexed from CySecurity News – Latest Information Security and Hacking Incidents
