  The most prestigious law school admissions discussion board in the world.

Hundreds of thousands of identities stolen and used for pro Net Neutrality







Date: November 25th, 2017 1:09 AM
Author: Multi-colored Soggy Office

https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6

More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked

I used natural language processing techniques to analyze net neutrality comments submitted to the FCC from April-October 2017, and the results were disturbing.

NY Attorney General Schneiderman estimated that hundreds of thousands of Americans’ identities were stolen and used in spam campaigns that support repealing net neutrality. My research found at least 1.3 million fake pro-repeal comments, with suspicions about many more. In fact, the sum of fake pro-repeal comments in the proceeding may number in the millions. In this post, I will point out one particularly egregious spambot submission, make the case that there are likely many more pro-repeal spambots yet to be confirmed, and estimate the public position on net neutrality in the “organic” public submissions.¹

Key Findings:²

One pro-repeal spam campaign used mail-merge to disguise 1.3 million comments as unique grassroots submissions.

There were likely multiple other campaigns aimed at injecting what may total several million pro-repeal comments into the system.

It’s highly likely that more than 99% of the truly unique comments³ were in favor of keeping net neutrality.

Breaking Down the Submissions

Given the well-documented irregularities throughout the comment submission process, it was clear from the start that the data was going to be duplicative and messy. If I wanted to do the analysis without having to set up the tools and infrastructure typically used for “big data,” I needed to break down the 22M+ comments and 60GB+ worth of text data and metadata into smaller pieces.⁴

Thus, I tallied up the many duplicate comments⁵ and arrived at 2,955,182 unique comments and their respective duplicate counts. I then mapped each comment into semantic space vectors⁶ and ran some clustering algorithms on the meaning of the comments.⁷ This method identified nearly 150 clusters of comment submission texts of various sizes.⁸
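The two steps described above (tally exact duplicates, then group the remaining unique texts by similarity) can be sketched roughly as follows. This is a toy illustration only, not the author's pipeline: the post does not say which embedding or clustering algorithm was used, so a bag-of-words vector and a greedy cosine-similarity grouping stand in for the "semantic space vectors" and "clustering algorithms". The function names, the 0.6 threshold, and the sample comments are all made up for the example.

```python
from collections import Counter
import math
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def tally_duplicates(comments):
    """Map each distinct (normalized) comment to how many times it was submitted."""
    return Counter(normalize(c) for c in comments)


def bag_of_words(text):
    """Toy stand-in for a semantic vector: word -> count (an assumption, not the post's method)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(texts, threshold=0.6):
    """Put each text in the first cluster whose seed is similar enough, else start a new one."""
    clusters = []  # list of (seed_vector, [member texts])
    for text in texts:
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]


# Hypothetical sample data, not taken from the FCC docket.
comments = [
    "Please keep net neutrality rules.",
    "please keep  net neutrality rules.",
    "Please keep the net neutrality rules in place.",
    "Repeal the 2015 order now.",
]
counts = tally_duplicates(comments)       # unique text -> duplicate count
clusters = greedy_clusters(list(counts))  # groups of near-identical unique texts
```

A word-count vector only catches comments that share vocabulary. Catching comments that say the same thing in different words would need real semantic embeddings, which is what the post describes using.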

After clustering comment categories and removing duplicates, I found that fewer than 800,000 of the 22M+ comments submitted to the FCC (3-4%) could be considered truly unique.

Here are the top 20 comment ‘campaigns’, accounting for a whopping 17M+ of the 22M+ submissions:

The vast majority of FCC comments were submitted as exact duplicates or as part of letter-writing/spam campaigns.

So how do we know which of these are legitimate public mailing campaigns, and which of these are bots?

Identifying 1.3 Million Mail-Merged Spam Comments

The first and largest cluster of pro-repeal documents was especially notable. Unlike the other clusters I found (which contained a lot of repetitive language), each of the comments here was unique; however, the tone, language, and meaning were largely uniform across the comments. The language was also a bit stilted. Curious to dig deeper, I used regular expressions⁹ to match up the words in the clustered comments:

I found the term “People like me” particularly ironic.

It turns out that there are 1.3 million of these. Each sentence in the faked comments looks like it was generated by a computer program. A mail merge swapped in a synonym for each term to generate unique-sounding comments.¹⁰ It was like mad-libs, except for astroturf.
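To make the mail-merge idea concrete, here is a minimal sketch of how a regular expression with one alternation per template slot can match every variant of such a comment. Only the phrase “People like me” comes from the post; every other phrase in SLOTS, along with the names SLOTS, build_pattern, and PATTERN, is a hypothetical placeholder, and the real template is not reproduced here.

```python
import re
from math import prod

# Each inner list is one template slot holding interchangeable phrases.
# Only "People like me" appears in the post; the rest are hypothetical.
SLOTS = [
    ["People like me", "Citizens like me", "Americans like me"],
    ["urge", "ask", "encourage"],
    ["the FCC", "the commission"],
    ["to repeal", "to undo", "to reverse"],
    ["the 2015 rules", "the 2015 order"],
]


def build_pattern(slots):
    """Build one regex with a non-capturing alternation group per slot."""
    groups = ["(?:" + "|".join(re.escape(p) for p in slot) + ")" for slot in slots]
    return re.compile(r"\s+".join(groups) + r"\.?", re.IGNORECASE)


PATTERN = build_pattern(SLOTS)

# Number of distinct sentences this small template can produce: 3*3*2*3*2 = 108.
variants = prod(len(slot) for slot in SLOTS)

merged = "Citizens like me ask the commission to undo the 2015 order."
organic = "I am 82 and I rely on the open internet."
is_merged = PATTERN.fullmatch(merged) is not None    # True
is_organic = PATTERN.fullmatch(organic) is not None  # False
```

Even this five-slot toy yields 108 distinct sentences, so a longer template with more slots could plausibly produce well over a million comments with no two exactly alike.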

When laying just five of these side-by-side with highlighting, as above, it’s clear that there’s something fishy going on. But when the comments are scattered among 22+ million, often with vastly different wordings between comment pairs, I can see how it’s hard to catch. Semantic clustering techniques, and not typical string-matching techniques, did a great job at nabbing these.

Finally, it was particularly chilling to see these spam comments all in one place, as they are exactly the type of policy arguments and language you expect to see in industry comments on the proposed repeal¹¹, or, these days, in the FCC Commissioner’s own statements lauding the repeal.¹²

Pro-Repeal Comments were more Duplicative and in Much Larger Blocks

But just because the largest block of pro-repeal submissions turned out to be a premeditated and orchestrated spam campaign¹³, it doesn’t necessarily follow that there are many more pro-repeal spambots to be verified, right?

As it turns out, the next two highest comments on the list (“In 2015, Chairman Tom Wheeler’s …” and “The unprecedented regulatory power the Obama Administration imposed …”) have already been flagged in previous reporting as possible astroturf as well.

Going down the list, each comment cluster/duplicate would need its own investigation, which is beyond the scope of this post. We can, however, still gain an understanding of the distribution of comments by taking a broader view. Reprising the bar chart above breaking down the top FCC comments, let’s look at the top 300 comment campaigns that comprise an astonishing 21M+ of the 22M+ submissions¹⁴:

Keep-Net Neutrality comments were much more likely to deviate from the form letter, and dominated in the long tail.

From this chart we can see that the pro-repeal comments (there are approximately 8.6 million of them) are much more likely to be exact duplicates (dark red bars) and are submitted in much larger blocks. If even 25% of these pro-repeal comments are found to have been spam, that would still result in more than 2 million faked pro-repeal comments, each with an email address attached. Further verification should be done on the email addresses used to submit these likely spam comments.
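The arithmetic behind that estimate is simple enough to check directly. In the sketch below, the 8.6 million total and the 25% share come from the paragraph above; the function name estimated_fakes and the other two shares are hypothetical and included only for comparison.

```python
PRO_REPEAL_TOTAL = 8_600_000  # approximate pro-repeal comment count given in the post


def estimated_fakes(spam_share, total=PRO_REPEAL_TOTAL):
    """Number of fake comments if spam_share of the pro-repeal comments were spam."""
    return round(total * spam_share)


# 0.25 is the share used in the post; 0.10 and 0.50 are hypothetical alternatives.
scenarios = {share: estimated_fakes(share) for share in (0.10, 0.25, 0.50)}

for share, fakes in scenarios.items():
    print(f"{share:.0%} spam -> {fakes:,} fake comments")
```

At the 25% share this comes to 2,150,000, consistent with the “more than 2 million” figure above.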

On the other hand, comments in favor of net neutrality were more likely to deviate from a form letter (light green, as opposed to dark green bars) and were much more numerous in the long tail. If the type, means of submission, and ‘spamminess’ of comments from both sides were equal, we would expect a roughly even distribution of light and dark, red and green, throughout the bars. This is evidently not the case here.¹⁵

Organic Public Comments: 99%+ Support Keeping Net Neutrality

And what of the fewer than 800,000 submitted comments that were neither duplicates nor clustered as part of a comment category? Does the trend of comments turning in favor of net neutrality continue in the long tail?

It turns out old-school statistics allows us to take a representative sample and get a pretty good approximation of the population proportion and a confidence interval. After taking a 1,000-comment random sample of the 800,000 organic comments and scanning through them, I was only able to find three comments that were clearly pro-repeal.¹⁶ That puts the estimated share of organic comments in favor of keeping net neutrality at 99.7%. In fact, we are so near 100% pro net neutrality that the confidence interval goes outside of 100%.¹⁷ At the very minimum, we can conclude that the vast preponderance of individuals passionate enough about the issue to write up their own comment are for keeping net neutrality.
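For anyone who wants to check the interval, here is a minimal sketch. The post does not say which formula was used, so this assumes the textbook normal-approximation (Wald) interval at 95% confidence (z = 1.96), which is the one that can overshoot 100% when the proportion is this extreme. The Wilson score interval is shown alongside as a common alternative that stays inside 0 to 100%. The function names are my own.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval: p +/- z * sqrt(p(1-p)/n). Can leave [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval, which always stays within [0, 1]."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


n = 1000        # sample size from the post
pro_repeal = 3  # clearly pro-repeal comments found in the sample

wald = wald_interval(n - pro_repeal, n)
wilson = wilson_interval(n - pro_repeal, n)

print(f"Wald:   {wald[0]:.4f} to {wald[1]:.4f}")
print(f"Wilson: {wilson[0]:.4f} to {wilson[1]:.4f}")
```

Under these assumptions the Wald interval runs from roughly 99.4% to 100.04%, which is the overshoot mentioned above, while the Wilson interval runs from roughly 99.1% to 99.9%. Either way the lower bound stays above 99%.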

Oh, and please do take a minute to scan through the samples I’ve provided. Those are the comments of real people being affected by this decision, who speak most personally and devastatingly about its impacts:

I am 82, handicapped, and home bound, but not lonely, because I have the free internet. I can roam the world. use Facebook to visit family friends. I can sell my work on Etsy without fear of Amazon getting preference should the 2015 law be repealed. If you (The FCC) no longer had oversight, my ISP could raise its prices so that I couldn’t afford to have the Internet at all! I am relying on the FCC to protect me and others like me.¹⁸

Conclusion

Public participation and civic engagement are fundamental to a functioning democracy. It’s scary to think that organic, authentic voices in the public debate — more than 99% of which are in favor of keeping net neutrality — are being drowned out by a chorus of spambots.¹⁹ We already live in a time of low faith in public institutions, and given these findings, I fear that the federal regulatory public comment process may be yet another public forum lost to spam and disinformation.

With the overwhelming actual public support for keeping net neutrality, it’s irresponsible for the FCC majority to simply wave their hand and disregard public opinion in the latest draft order, merely because of irregularities in the public record, or because the public comments weren’t written in legalese.

FCC chairman Ajit Pai’s office not only needs to furnish the evidence sought by AG Schneiderman, but also needs to respond to the FOIA requests regarding the net neutrality public comments candidly and transparently, to restore public confidence in the FCC rulemaking process.

(http://www.autoadmit.com/thread.php?thread_id=3808210&forum_id=2#34766377)




Date: November 25th, 2017 1:10 AM
Author: swashbuckling sanctuary

your thread title is backward bro, the pro-repeal side was the one faking the comments

(http://www.autoadmit.com/thread.php?thread_id=3808210&forum_id=2#34766386)




Date: November 25th, 2017 1:12 AM
Author: Trip Amber Market Brethren

OMG the call came from INSIDE THE HOUSE!

(http://www.autoadmit.com/thread.php?thread_id=3808210&forum_id=2#34766400)