Meta torrented & seeded 81.7 TB dataset containing copyrighted data
gameshot911
arstechnica.com
1,270 points, 938 comments
Summary (provided by metafa.st)
Meta has been accused of using over 81.7 TB of pirated books to train its artificial intelligence language model, raising concerns among authors and publishers about the legality and ethics of the practice.
