Researchers studying online toxicity found a way to link supposedly anonymous posts made over the past dozen years on the site Economics Job Market Rumors (EJMR) to IP addresses, according to a draft paper that leaked online early.
While EJMR is an academic jobs forum, it “also includes much content that is abusive, defamatory, racist, misogynistic or otherwise ‘toxic,’” the paper says.
“EJMR is sometimes dismissed as not being representative of the economics profession, including claims that the most frequent users on the platform are not actually economists,” the paper says. “However, our analysis reveals that the users who post on EJMR are predominantly economists, including those working in the upper echelons of academia, government and the private sector. In this paper, we identify the scheme used to assign usernames for each post written by an anonymous user on EJMR. We show how the statistical properties of that algorithm do not anonymize posts, but instead allows the IP address from which each post was made to be determined with high probability.”
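The paper's point about the username algorithm, together with EJMR's own description of the method as reverse engineering "partial hash codes" by brute force, hints at why short hashed usernames leak information. The following is a minimal illustration under an explicitly hypothetical scheme: the `username` function takes the first four hex characters of SHA-1 over the IP address and a thread ID. EJMR's actual algorithm is not described in this article and may differ. The sketch shows how intersecting candidate sets across several posts can narrow a pool of addresses to one.

```python
import hashlib
import ipaddress

# HYPOTHETICAL username scheme, for illustration only: the first 4 hex characters
# of SHA-1("<ip>|<thread_id>"). EJMR's real algorithm may differ.
USERNAME_LEN = 4


def username(ip: str, thread_id: int) -> str:
    """Short, deterministic username derived from the poster's IP and the thread."""
    digest = hashlib.sha1(f"{ip}|{thread_id}".encode()).hexdigest()
    return digest[:USERNAME_LEN]


def candidate_ips(observed, candidates):
    """Keep only candidate IPs consistent with every observed (thread_id, username).

    A 4-hex-character username takes 16**4 = 65,536 values, so each post keeps
    roughly 1 in 65,536 wrong candidates; intersecting posts shrinks the set fast.
    """
    remaining = set(candidates)
    for thread_id, name in observed:
        remaining = {ip for ip in remaining if username(ip, thread_id) == name}
    return remaining


if __name__ == "__main__":
    # Toy pool: the private range 10.0.0.0/16 (65,536 addresses).
    pool = [str(ip) for ip in ipaddress.ip_network("10.0.0.0/16")]
    true_ip = "10.0.42.7"  # made-up poster address
    # Usernames this poster would display in three made-up threads.
    posts = [(t, username(true_ip, t)) for t in (101, 202, 303)]
    print(candidate_ips(posts, pool))
```

Because a four-character username has only 65,536 possible values, a single post already eliminates nearly every address in a /16 pool, and three posts from the same address in different threads will typically leave only the true one. The pool, address and thread IDs here are invented for the example.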
Florian Ederer, one of the authors and Questrom Professor of Economics and Management at Boston University, said an updated version of the paper will be presented at the National Bureau of Economic Research this afternoon.
Paul Goldsmith-Pinkham, a co-author and an assistant professor of finance at the Yale School of Management, said, “We have no intention of releasing personally identifiable information.”
EJMR posters are aware of the leaked paper. One top thread there is titled, “How is what Ederer did not illegal?”
“EJMR is currently melting down with people convinced their careers are in danger, presumably because they’ve said some very nasty and/or stupid things in locations that will easily identify them,” tweeted Ben Harrell, an assistant professor of economics at Trinity University, in Texas. “In the end, nothing of value will be lost.”
Asked for comment Wednesday afternoon, EJMR sent an email saying, “you may wish to consider what a neutral actor (ChatGPT) thinks about the study.”
The email then included a question posed to that artificial intelligence program: “Would reverse engineering partial hash codes of thousands of website users to get their IPs with brute force be considered hacking?” ChatGPT, according to the email, replied, “Yes, that activity would certainly be considered hacking, and more specifically, it would be illegal and unethical.”
Later in the day, the website sent another email: "It is essential to maintain an anonymous forum in the economics profession. EJMR has been used to expose multiple counts of plagiarism, corruption and serious professional misconduct that would not likely have been shared for fear of retaliation by their higher ups or colleagues. Indeed one of the co-authors of the paper had their own likely plagiarism exposed by anonymous EJMR users, calling into question the motivation for the study. This paper’s attempt to expose the identities of the vast majority of good natured users, using the excuse of there being a very small number of toxic posts on the site, is something that many people find troubling, deeply unethical, and may well be illegal."
Ederer said, “What we’re doing is not hacking.”
Goldsmith-Pinkham said the draft paper was placed in a private cache online. It ended up on GitHub, and at least one professor who wasn’t an author said he downloaded it and shared it on Twitter.
Kyle Jensen, the third author, is also at the Yale School of Management.
The draft paper does include a chart showing, for each of the top 25 economics departments in the U.S. News rankings, the percentage of posts labeled toxic.
The University of California, Los Angeles, ranked first, with nearly 15 percent of posts labeled toxic, followed by Yale and the University of California, San Diego, both above 10 percent.
“Although posting on EJMR is generally frowned upon in the economics profession, 10.2 percent of all posts to which we assign IP addresses originate directly from IP addresses associated with universities or research institutions,” they write. “Although some universities also are the internet service provider for some of their faculty and students (e.g., through university-provided faculty or student housing), this means that a substantial number of posts on EJMR occur while users are at their workplace. Perhaps even more surprisingly, there are EJMR posts from identified IP addresses located at literally all the leading universities in the United States.”
They write that “among the top 10 IP addresses with the highest number of toxic posts, there is not a single one from a university IP address. However, among the top 10 toxic university IP addresses there are several from leading U.S. universities including the University of Rochester and the University of Chicago.”
The researchers write that they “recover[ed] 47,630 distinct IP addresses of EJMR posters and match[ed] these to 66.1 percent of the roughly 7 million posts made over the past 12 years. We geolocate posts and describe aggregated cross-sectional variation—particularly regarding toxic speech—across sub-forums, geographies, institutions and contributors.”
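Once each post carries either an assigned IP address or none, plus a toxicity label, the aggregate figures the researchers describe (the share of posts matched to an IP, and toxicity rates by geography or institution) reduce to simple grouped ratios. This sketch assumes a hypothetical `Post` record with illustrative field names; it is not the paper's data format or code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Hypothetical record layout; field names are illustrative, not the paper's schema.
    post_id: int
    ip: Optional[str]     # None when no IP address could be assigned to the post
    group: Optional[str]  # e.g. a country or institution derived from the IP
    toxic: bool           # output of some toxicity classifier


def coverage(posts):
    """Share of posts to which an IP address was assigned."""
    if not posts:
        return 0.0
    return sum(p.ip is not None for p in posts) / len(posts)


def toxic_share_by_group(posts):
    """Fraction of toxic posts within each group, using only IP-matched posts."""
    totals = defaultdict(int)
    toxic = defaultdict(int)
    for p in posts:
        if p.ip is None or p.group is None:
            continue  # unmatched posts cannot be attributed to any group
        totals[p.group] += 1
        toxic[p.group] += p.toxic  # bool counts as 0 or 1
    return {g: toxic[g] / totals[g] for g in totals}


if __name__ == "__main__":
    posts = [
        Post(1, "10.0.0.1", "US", True),
        Post(2, "10.0.0.1", "US", False),
        Post(3, "10.0.0.2", "CA", False),
        Post(4, None, None, True),
    ]
    print(coverage(posts))              # 0.75
    print(toxic_share_by_group(posts))  # {'US': 0.5, 'CA': 0.0}
```

Excluding unmatched posts from the per-group rates mirrors the fact that only the matched share (66.1 percent in the paper) can be geolocated at all.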
The study also involved developing software and a dictionary of obfuscated terms to catch this “toxic” speech.
“These posts are obfuscated to such an extent that we found most machine learning models failed to accurately classify them as toxic,” researchers wrote. “To address this, we developed software to deobfuscate such speech. First, we classified posts into commonly occurring natural languages on EJMR (Stahl, 2023): English, German, Chinese, Korean and a few others. Then we collected high-frequency non-English words in the English posts, which we used to develop a dictionary mapping text like ‘f**k,’ ‘secks’ and ‘GTFO’ to canonical forms. We used this dictionary to deobfuscate some of the most commonly obfuscated terms. Then, we checked each word in each post for common symbol-based obfuscations like ‘fa//g//g//ot,’ removing symbols where doing so resulted in an English word or well-known profanity. Finally, we transformed so-called leetspeak—such as ‘d4mn j3ws’—to its canonical form. We did this by attempting common leetspeak substitutions and checking if those substitutions resulted in an English word or a well-known profanity. Our goal in this effort was not perfection, but rather some improvement in the performance of machine learning models for this content.”
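The dictionary, symbol-stripping and leetspeak steps the researchers describe can be sketched as a per-word pipeline. This is a minimal illustration, not the authors' software: the vocabulary, the `OBFUSCATION_DICT` entries and the `LEET` substitution table are tiny hypothetical stand-ins, and the first step (classifying posts by language) is omitted.

```python
import re
from itertools import product

# Hypothetical, deliberately tiny resources for illustration; the study's actual
# dictionary, word lists and substitution rules are not given in this article.
VOCAB = {"damn", "sex", "hello", "economist", "job", "market", "fuck"}
OBFUSCATION_DICT = {"f**k": "fuck", "secks": "sex"}
# Each leetspeak character maps to the letters it commonly stands for.
LEET = {"4": "a", "@": "a", "3": "e", "1": "il", "0": "o", "5": "s", "$": "s", "7": "t"}
MAX_LEET_CANDIDATES = 256  # cap the combinatorial search per token


def strip_symbols(token: str) -> str:
    """Remove everything except lowercase letters and digits."""
    return re.sub(r"[^a-z0-9]", "", token)


def undo_leet(token: str):
    """Return the first leetspeak reading of token that is a known word, or None."""
    options = [LEET.get(ch, ch) for ch in token]
    for i, letters in enumerate(product(*options)):
        if i >= MAX_LEET_CANDIDATES:
            break
        candidate = "".join(letters)
        if candidate in VOCAB:
            return candidate
    return None


def deobfuscate_token(token: str) -> str:
    t = token.lower()
    # Step 1: direct dictionary lookup for common obfuscations.
    if t in OBFUSCATION_DICT:
        return OBFUSCATION_DICT[t]
    # Step 2: drop symbols only if doing so yields a known word.
    stripped = strip_symbols(t)
    if stripped != t and stripped in VOCAB:
        return stripped
    # Step 3: try leetspeak substitutions on the raw and symbol-free forms.
    for form in (t, stripped):
        word = undo_leet(form)
        if word is not None:
            return word
    return token  # leave unrecognized tokens unchanged


def deobfuscate(text: str) -> str:
    return " ".join(deobfuscate_token(tok) for tok in text.split())
```

For example, `deobfuscate("Secks d4mn d//a//m//n h3ll0")` returns `"sex damn damn hello"`. Each transformation is accepted only when it produces a known word, the same check the researchers describe; a realistic version would use a full English word list and profanity list in place of `VOCAB`.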
Goldsmith-Pinkham said the study hasn’t yet been peer reviewed, but the authors plan to publish it in a peer-reviewed journal.
“EJMR is very popular,” the paper says.
“SimilarWeb estimates that EJMR receives 2.5 million visits per month with an average of 6.45 pages viewed per visit. In comparison, the same figures for the NBER [National Bureau of Economic Research] and AEA [American Economic Association] competitors are 1.1 million and 991,000 visits and 2.09 and 2.76 pages per visit, respectively.”
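Multiplying visits by pages per visit turns the SimilarWeb figures quoted in the paper into a rough monthly page-view estimate for each site. This back-of-the-envelope product is not a number the paper itself reports.

```python
# SimilarWeb figures as quoted in the paper: (monthly visits, pages per visit).
SITES = {
    "EJMR": (2_500_000, 6.45),
    "NBER": (1_100_000, 2.09),
    "AEA": (991_000, 2.76),
}


def monthly_pageviews(visits: int, pages_per_visit: float) -> int:
    """Approximate page views per month, rounded to a whole number."""
    return round(visits * pages_per_visit)


if __name__ == "__main__":
    views = {site: monthly_pageviews(*figs) for site, figs in SITES.items()}
    for site, n in views.items():
        print(f"{site}: {n:,} page views/month")
    # EJMR: 16,125,000; NBER: 2,299,000; AEA: 2,735,160
    for other in ("NBER", "AEA"):
        print(f"EJMR/{other}: {views['EJMR'] / views[other]:.1f}x")
```

On these numbers, EJMR draws roughly seven times NBER's estimated page views and about six times AEA's.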
While researchers found “the vast majority of EJMR posts comes from residential IP addresses located in the United States and in particular in cities with elite universities … there is also a significant share of other countries including Canada, the United Kingdom, Hong Kong, Australia, Germany, Italy and France.”
from Hacker News https://ift.tt/QYEoq1N