Sunday, April 16, 2023

The ChatGPT vs. Bear Blog spam war

Ever since Bear Blog's infancy, spam has been an issue. Free services tend to attract those seeking to exploit them for backlinks and the alleged SEO benefits (although this is debatable given updates to the Google algorithm). I've previously discussed this in a post, detailing the manual review process which has been holding up well for the past 3 years.

But alas, change is upon us.

Spam used to be quite easy to spot: poorly worded, low-effort paragraphs sprinkled with backlinks to products or services. But ChatGPT's availability has altered the landscape, enabling people to generate entire blogs' worth of content at a single click. The economics of creating spam now heavily favours the spammer. This isn’t great for a platform that wants to host good content for a great user experience, and also doesn’t want to run afoul of Google’s ban-hammer.

Other spam hallmarks, such as backlinks, have also become less prevalent. I've seen a tech school, for example, create 20 "personal" blogs just to praise themselves without adding any backlinks (frustrating, since it makes detection harder).

This has led me down a path of conversations and experiments to refine Bear's review process. My time investment has been growing, not just because of the increase in the number of blogs, but also because of the difficulty of separating the wheat from the chaff.

First, let's establish that there's no real economic incentive for AI-generated content on personal blogs. Personal blogs are like shouts into the internet void: an expression of writing on the internet that is, I think, pretty fundamental to human nature (see the rise of Twitter and Facebook). AI may assist some people in writing, but that's not really an issue; it can make the process of self-expression easier for some. On the flip side, fuck spammers.

So the question is: How can I prevent spam on Bear while keeping personal blogs and websites awesome?

I came up with a few ideas:

Option 1 - Use AI to check for spam

This is a pretty obvious first choice. Fight fire with fire and all that. Train a custom AI to determine if a blog is selling something, if it is inappropriate or hateful, etc. Run it alongside the current review process until satisfied, then let it work autonomously.

While this has the upside of keeping the existing review process in place and allowing all new users to benefit from the Discovery feed and SEO juju, it doesn't work very well. Since these models are trained using an antagonistic process, they are (by design) pretty terrible at recognising AI-generated content. In practice this meant too many false positives and false negatives, which is unacceptable: I don’t want to ban legit blogs, and I also want to reliably remove spam blogs. I tried many variations of spam blocking with AI in my experiments and unfortunately didn’t find a process I was happy with.
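To make "too many false positives and false negatives" concrete, here is a minimal sketch of how a classifier could be scored while it runs alongside the manual review. It is not Bear's actual code: the function name and the shape of the input (pairs of human and model verdicts) are assumptions for illustration only.

```python
def shadow_mode_report(decisions):
    """Compare model verdicts against human review verdicts.

    `decisions` is an iterable of (human_says_spam, model_says_spam)
    boolean pairs. Both the name and the input shape are hypothetical.
    """
    fp = fn = tp = tn = 0
    for human_says_spam, model_says_spam in decisions:
        if model_says_spam and not human_says_spam:
            fp += 1  # a legit blog the model would have banned
        elif human_says_spam and not model_says_spam:
            fn += 1  # a spam blog the model would have let through
        elif human_says_spam and model_says_spam:
            tp += 1
        else:
            tn += 1

    legit_total = fp + tn
    spam_total = fn + tp
    return {
        "false_positive_rate": fp / legit_total if legit_total else 0.0,
        "false_negative_rate": fn / spam_total if spam_total else 0.0,
    }
```

The false positive rate is the share of legit blogs that would be banned, and the false negative rate is the share of spam that would slip through. Both need to be low before the model could be trusted to work autonomously.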

Option 2 - Free accounts are not shown on the Discovery feed or indexed

New free accounts never receive the reviewed flag, never show up on the Discovery feed, and retain nofollow and noindex tags, so they aren’t indexed by Google and other search engines.
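As a rough illustration of what this option means in practice, the sketch below gates both the robots meta tag and the Discovery feed on a single flag. The `Blog` class, the `reviewed` field name, and both function names are assumptions made for this example, not Bear's real code.

```python
from dataclasses import dataclass


@dataclass
class Blog:
    # Hypothetical model: field names are illustrative only.
    subdomain: str
    reviewed: bool = False


def robots_meta(blog: Blog) -> str:
    """Return the robots <meta> tag to render in the blog's <head>."""
    if blog.reviewed:
        return '<meta name="robots" content="index, follow">'
    # Unreviewed blogs ask search engines not to index or follow them.
    return '<meta name="robots" content="noindex, nofollow">'


def discovery_feed(blogs: list[Blog]) -> list[Blog]:
    """Only reviewed blogs are eligible for the Discovery feed."""
    return [blog for blog in blogs if blog.reviewed]
```

Under this option a free blog's flag would simply never flip to `True`, so it stays reachable by direct link but invisible to search engines and the feed.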

This has one or two benefits and quite a few downsides. The benefit is that there are no changes to the signup process, and it completely removes the necessity of reviewing content since the blogs are essentially invisible to the outside web unless the link is explicitly shared. And spammers don’t share links to their spam blogs. The first downside is that the platform would be seen as having “poor SEO”, which would be true for free blogs, even though I’ve worked pretty hard to make Bear the most SEO friendly blogging platform out there.

It also means that there would be less activity on the Discovery feed, which would be a shame since about 70% of the posts on there are from free blogs. And finally, Bear would technically still host bad content, some of which could be really bad (e.g. crypto scams, revenge porn, etc.).

Option 3 - Grandfather all existing users and make Bear paid with a free trial

This option is pretty self-explanatory. If Bear is no longer a free service, it won’t be exploited by spammers, since that changes the economics. Bar that upside, it would be pretty terrible for Bear, which thrives as a free platform. This was only considered as a last-resort option.

After quite a bit of hard thinking and experimentation, I've concocted a neat way to blend the options together for an improved process.

First, use AI to flag the most blatant spam, employing a general GPT instance to ban accounts that clearly violate basic human decency. This still leaves most mundane spam on the site, however, which I still don't want on the Discovery feed.

Next, new users will remain in an “unreviewed” state and won’t show up on the Discovery feed or have their blogs indexable (this doesn’t affect existing users who are “reviewed”). They can then opt into a human review (handled by yours truly). The option for opting in only appears a day after the blog's creation, hopefully deterring spammers who are incentivised to spend as little time as possible on the site. New blogs are then reviewed by me once a week. This wait can be skipped by upgrading (which you should totally do anyway to support the platform).
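The rules in the paragraph above can be sketched as a few small predicates. This is only an illustration under stated assumptions: the `Blog` fields, the function names, and the choice to treat an upgraded blog as immediately visible are mine, not a description of how Bear implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The opt-in for review only appears a day after the blog is created.
OPT_IN_DELAY = timedelta(days=1)


@dataclass
class Blog:
    # Hypothetical model: all field names are illustrative only.
    created: datetime
    reviewed: bool = False
    upgraded: bool = False
    review_requested: bool = False


def can_request_review(blog: Blog, now: datetime) -> bool:
    """True once an unreviewed free blog is old enough to opt in."""
    if blog.reviewed or blog.upgraded:
        return False
    return now - blog.created >= OPT_IN_DELAY


def is_publicly_listed(blog: Blog) -> bool:
    """Assumption: upgrading skips the wait, so it counts like a review."""
    return blog.reviewed or blog.upgraded


def weekly_review_queue(blogs: list[Blog]) -> list[Blog]:
    """Blogs that opted in and still need a human look, oldest first."""
    pending = [
        blog for blog in blogs
        if blog.review_requested and not blog.reviewed and not blog.upgraded
    ]
    return sorted(pending, key=lambda blog: blog.created)
```

Blogs that never opt in never reach the queue, which is where the reduction in manual work comes from.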

This method drastically reduces manual reviews, as I no longer need to review abandoned or test blogs until there's content to show. I don’t even have to see the nastiest of things that show up on the platform, since those are dealt with automatically. It retains a fantastic free user base, prevents the worst content from being hosted on the platform, and while it technically still hosts some marketing spam, that content is invisible to the Internet at large and is utterly useless to spammers.

War won…for now…


