I accidentally started a movement – Policing the Police by scraping court data

Almost 3 years ago, I posted a story to r/privacy and hackernews about how a post I wrote on using county-level police data to "police the police" accidentally started a movement. https://old.reddit.com/r/privacy/comments/gr11aw/i_think_i_accidentally_started_a_movement/

The idea quickly evolved into a real goal, to make good on the promise of free and open policing data. By freeing policing data from antiquated and difficult-to-access county data systems, and compiling that data in a rigorous way, we could create a valuable new tool to level the playing field and help provide community oversight of police behavior and activity.

In the almost 3 years since the first post, something amazing has happened.

The idea turned into something real. Something called The Police Data Accessibility Project. (https://www.pdap.io)

More than 2,000 people joined the initial community, and while those numbers dwindled after the initial excitement, a core group of highly committed and passionate folks remained. In these 3 years, this team has worked incredibly hard to lay the groundwork necessary to enable us to realistically accomplish the monumental data collection task ahead of us.

Let me tell you a bit about what the team has accomplished in these 3 years.

-Established the community and identified volunteer leaders who were willing and able to assume consistent responsibility.

-Gained a pro-bono law firm to assist us in navigating the legal waters. Arnold + Porter is our pro-bono law firm.

-Arnold + Porter helped us establish ourselves as a legal entity and apply for 501c3 status

-501c3 status granted

-We've carefully defined our goals and set a clear roadmap for the future

-Hired first full-time staff.

-PDAP was awarded a $250,000 grant by The Heinz Endowments

So now, I'm asking for help, because scraping, cleaning, and validating data from 18,000 police departments is no easy task. There are two ways you can help.

The first is to join us and help the team. Perhaps you joined initially, realized we weren't organized yet, and left? Now is the time to come back. Or maybe you are just hearing of it now. Either way, the more people we have working on this, the faster we can get this done. Those with scraping experience are especially needed. The second is to donate or help us spread the message. The more donations, the more data we can gather. I want to thank the r/privacy community especially. It was here that things really began.

TL;DR: I accidentally started a movement from a blog post I wrote about policing the police with data. The movement turned into something real because of r/privacy and hackernews: the Police Data Accessibility Project. 3 years later, the groundwork has been laid, a non-profit established, full-time staff hired, and $250,000 raised in grant money and donations so far!

Scrapers so far (GitHub): https://github.com/Police-Data-Accessibility-Project/Scrapers

Discord, if you would like to join the efforts: https://discord.com/invite/wMqex8nKZJ

*This is US-centric



joshenders 9d
Got excited by this call to action and then discovered this filth by your executive director in your discord.

What a huge letdown.

Take your pathetic crypto grift elsewhere please.

«I had a meeting with Dave @ Future Foundation discord: https://discord.gg/vHqwbDkX.

I shared that what we need help with most now is expertise to help us make quick decisions about what directions to go with crypto

I listed the 3 main ways we've thought of for using web3 tools:

1. DAO for the PDAP scrapers → community ownership

2. PDAP NFT to incentivize code contributions → contributor recruitment

3. Use distributed ledger tech for data storage → transparency + traceability

He said he's going to put together a *brainstorm session just for PDAP* with some of the experts who have collected in that Discord! They will help us answer our foundational questions and move us along the path to making stuff.»

ben174 9d
A while back I created www.bartcrimes.com to publish police reports that were intentionally hidden behind a mailing list you must be approved to join. Turns out, the public loves this kind of thing.
Jcowell 9d
Posts like this are interesting because, as an idea, you would think that HN would be the best target. Even if no one here provides a single character of code, they can provide insight into pitfalls and experiences they've run into when doing this sort of thing. I hope the comment section is fruitful in advice.
bckr 9d
I'm curious if there are opportunities to be a force multiplier here. I see that the Readme says "there's no automated scraper farm" yet. Getting that set up seems crucial. Will jump on the Discord :)
meteor333 9d
Thanks for sharing about your project!

Do you mind giving us a brief on what kind of data you are collecting and highlighting any interesting findings so far?

elicash 9d
I'd be interested in helping scrape, but I have no experience. I'd presume every county is different, so there's no simple training you can put folks through? Are there other tasks, like monitoring for things breaking?
ALittleLight 9d
I like writing web scrapers and this is an interesting project idea. If I understand right you are looking for volunteers to write scrapers that would take a police department, scrape the PD website, and download any PDFs or documents that gather data about the police department. Is that right? If so, I feel that's not super clearly communicated - I had to look at a couple example scrapers before arriving at this guess.

I do have a few questions too:

1. Will this scale? One problem with scrapers is that they break when people update their website. I'm imagining this problem multiplied by 18,000 and compounded by each scraper potentially being written by a different volunteer.

2. Where are the scrapers getting run?

3. How do the documents that the scrapers collect get transformed into usable data?

4. It seems to me like a scalable solution would be a standard to report data, a law to compel police departments to follow that standard, and then a system to collect that data and make it available. Do you work with police departments at all on data reporting?
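
To make the breakage concern in question 1 concrete, here is a minimal sketch of the kind of health check that could flag a broken scraper. It is an illustration, not anything PDAP is known to run: `ScraperCheck`, `check_scraper`, and the field names are hypothetical, and it assumes each scraper can be called as a function that returns a list of record dicts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ScraperCheck:
    """Hypothetical description of one scraper and what its output should contain."""
    name: str
    run: Callable[[], Iterable[dict]]  # assumed: calling it returns scraped records
    required_fields: frozenset


def check_scraper(check: ScraperCheck) -> str:
    """Return 'ok', 'empty', 'missing-fields', or 'error' for one scraper."""
    try:
        rows = list(check.run())
    except Exception:
        # The site may be down or its markup may have changed enough to crash parsing.
        return "error"
    if not rows:
        # A redesign often makes selectors match nothing instead of raising.
        return "empty"
    if any(not check.required_fields.issubset(row) for row in rows):
        return "missing-fields"
    return "ok"


if __name__ == "__main__":
    demo = ScraperCheck(
        name="example-county",
        run=lambda: [{"date": "2022-01-01", "agency": "Example PD"}],
        required_fields=frozenset({"date", "agency"}),
    )
    print(demo.name, check_scraper(demo))
```

Running a check like this on a schedule would not stop scrapers from breaking, but it would turn silent failures into a list that volunteers can work through.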

treis 9d
[deleted by user]
Pelerin 9d
Thank you for your work with this! One question I have:

You say in your FAQ "We aren't a watchdog—our activism is data collection and accessibility, not analysis or research."

Can you note any instances of other people using your data for analysis or research?

Is it possible to see the data the PDAP has scraped? I visited the website but I don't see any actual data.
enviclash 9d
I would like to do research on the data; is it available as a source? (Email in profile)
contingencies 9d
I wonder if it is legal / possible to record police radio traffic and associate it with the records?
curiousllama 9d
Really love the idea, and the passion behind it. Def could have legs.

Here are the pitfalls I see you falling into:

(1) seriously, what data are you collecting? “Everything” isn’t a great answer (who’s supposed to use ‘everything’, anyway? “Anyone”?). “Apples-to-apples police misconduct statistics” is a good one.

(2) it’s important to clarify 1 because you need to know who you’re serving, and why. Different activists need different data. “Have all data” sounds good until you need to decide how to allocate your resources.

(3) more deeply, data is the land of edge cases. Even just with police misconduct, you need to get DEEP to rigorously compare seemingly-simple stats like “# of unjustified police killings”. If you don’t start narrow, you’ll never show value. If you don’t show value, nobody will ever care you exist.

When I look at the data you’ve collected, it ranges from annual reports, to municipal contact info, to crime stats. What’s important to collect at scale? To whom? What do they need it for?

Again - great, ambitious idea! But $250k goes fast. Show value before it runs out!

debacle 9d
This is important. Locally, we had a sheriff who was being heavily, heavily criticized due to several deaths at the county facility. This was at the height of the protests a few years ago.

It was a lot of work to find data on policing nationwide, because the question really was "Is the sheriff doing a bad job, or do bad things happen sometimes?"

After some hard work trying to identify other cities with similar socioeconomic circumstances and populations, it became clear that our local sheriff was actually better than average, and that much of the outrage was fabricated.

That's also when I learned that many people don't want to listen to statistics unless they agree with their own preconceptions.

celestialcheese 9d
For folks who do this kind of disparate data-source scraping at scale, what do best practices look like? What kind of tools are used in industry?

Maintaining scrapers for 18k county websites and PDs is no small task and looking through the docs for PDAP, it seems like this is still a very open question.

KennyBlanken 9d
Are you also working on pushing standards for data sources, such as a state-level standard? Ideally federal standards?

Maintaining thousands of scrapers for different formats seems like a nightmare, and it won't take long for departments to learn they can slightly tweak the format of their reporting to cause extra work for you.

On the plus side, working with all this data probably makes you all very qualified to advise on developing standards.

josh-pdap 9d
Hello! I'm the executive director. I have a design background, have done product management in the past, and aside from keeping the lights on at PDAP and making sure we're tax-compliant I am in a product role. I talk to people using police data, and figure out where we can add value to make the data more accessible.

TL;DR: If you want to write scrapers: go for it! Run your scraper, share the results in Discord and with your friends, and talk about the process. We'll be listening, and it will help us build tools to support this important work.

A few things to clarify:

a. The source of truth for "what are we doing right now" and "how can I contribute" is https://docs.pdap.io/.

b. Empowering people who write scrapers is a part of our broad mission of "police data accessibility", but we have some foundational work to do first! Right now our primary project is creating a database of police agencies and data sources. This will help people understand what kinds of data are available, at which agencies, with which steps to access it. It will also help us create archives of the primary sources, so that if they get taken offline we can still go back and scrape them.

c. What we have realized in the past few years: there are already a ton of people writing and using web scrapers for their day to day work. They are as decentralized as our police system. Our scrapers repo will reflect that. We shouldn't all rely on one library, or even one language. The people who need the data are most motivated to maintain scrapers, and we expect that maintenance will be ad-hoc and as-needed for the immediate future. In most cases, data already published on the internet is useful to local users as-is.

d. If you have a question you'd like to answer about the police, here's the investigation process:

1. Determine whether public data exists to answer your question. Use Google to find the appropriate agency, and see what they're publishing.

2. Determine how it can be accessed; do you need to make a FOIA request? Is there a URL?

3. If there's a URL, determine whether you need to write a scraper to access the records. Often, the records can simply be downloaded.

4. Write and run a scraper, if you need one!

5. If there's not a URL, make a records request for the public information. This is a long and complicated process.

6. Share the data with your friends.

This means that scrapers are helpful and necessary some of the time; but not always, and not as the first step. We're trying to help with steps 1, 2, 3, 5, and 6. The theory is that writing scrapers is something people can easily slot in and help with; and that, depending on what question you're trying to answer, two scrapers for the same data source might look wildly different.

Scrapers are an important part of the ecosystem, but they're one piece of the puzzle.
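
As a rough illustration of steps 2 through 5 of the investigation process, the decision about whether a scraper is needed at all can be sketched in a few lines. This is only a sketch under assumptions: `DataSource`, its fields, and `next_step` are hypothetical names and do not reflect the actual PDAP database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSource:
    """Hypothetical entry in a database of agencies and their data sources."""
    agency: str
    record_type: str
    url: Optional[str] = None      # None means nothing is published online
    direct_download: bool = False  # True if the records are a plain file download


def next_step(source: DataSource) -> str:
    """Pick the next action for a data source, following the steps above."""
    if source.url is None:
        # Step 5: no URL, so the data has to be requested.
        return "make a records request"
    if source.direct_download:
        # Step 3: often the records can simply be downloaded.
        return "download the records"
    # Step 4: a URL exists but the records are not directly downloadable.
    return "write and run a scraper"


if __name__ == "__main__":
    source = DataSource("Example County Sheriff", "annual reports",
                        url="https://example.org/reports", direct_download=True)
    print(next_step(source))
```

The point of the sketch is that a scraper is only one of three outcomes, which is why cataloguing the sources comes first.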

account-5 9d
Apologies for my ignorance, but how is this going to police the police? I read the original blog post; there were lots of inferences, coulds, and might-be's, but little in the way of proof of anything. What's to stop the police saying it was just circumstance that provided your results?

I'm not here defending the police, or denigrating the project, just playing devil's advocate. What happens if the police just ignore you?

vgeek 9d
Of all news outlets you'd never expect, USA Today did a good amount of FOIA requests and made them searchable at https://www.usatoday.com/in-depth/news/investigations/2019/0...

There are other sources regarding Brady lists like https://giglio-bradylist.com/ and http://bradycops.org/, but they are obviously not 100% complete.

tmaly 9d
Is there anything like this for regulatory capture in federal and state governments?

I could imagine a revolving door between people working in the regulatory bodies and the industry they regulate.

xenadu02 9d
You might try defining what the "ideal" department's data would look like: what categories of data, what columns each record has, what the values are for each, etc. Ideally you'd stamp it with a year and give it a spiffy name so it could be the National Police Data Reporting Standard 2022 (NPDRS.2022) or something.

Departments that are trying to be transparent (or who just don't want to deal with figuring it all out from scratch) may be happy to adopt something considered a "standard" for tracking and reporting data. In some cases it means it is a checkbox they can check without having to deal with annoying people and their annoying questions... but that hardly matters so long as the data is made available. It would also give companies developing software for police departments a target to aim for.
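
For a sense of what such a standard could look like in practice, here is a minimal sketch of a record check. Everything in it is an assumption made for illustration: the column names and categories are invented, and "NPDRS.2022" is only the example name from above, not an existing standard.

```python
from datetime import date

SCHEMA_VERSION = "NPDRS.2022"  # example name from the comment above, not a real standard

# Hypothetical columns and allowed values, chosen only for illustration.
REQUIRED_COLUMNS = ("agency_id", "incident_date", "category")
ALLOWED_CATEGORIES = {"arrest", "complaint", "use_of_force"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record; an empty list means it conforms."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if column not in record:
            problems.append(f"missing column: {column}")
    if "incident_date" in record:
        try:
            date.fromisoformat(record["incident_date"])
        except (TypeError, ValueError):
            problems.append("incident_date is not an ISO date")
    if "category" in record and record["category"] not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {record['category']}")
    return problems


if __name__ == "__main__":
    record = {"agency_id": "EX-001", "incident_date": "2022-03-14", "category": "complaint"}
    print(SCHEMA_VERSION, validate_record(record))
```

A published check like this would give departments and software vendors the "target to aim for" described above, since conformance becomes something they can test themselves.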

liamtuohyff 8d
I was an early helper when I saw that on reddit and joined your slack before you had a discord. I was also one of the ones you mentioned that fizzled out after the initial excitement died down. But I didn't stop helping because the excitement died down. I stopped helping because I felt like we weren't "doing" anything other than raising money and getting paperwork in order. Have you guys actually "done" anything in the three years since? Other than, you know, collecting data and sitting around talking about "stuff"?