I accidentally started a movement – Policing the Police by scraping court data
The idea quickly evolved into a real goal: to make good on the promise of free and open policing data. By freeing policing data from antiquated and difficult-to-access county data systems, and compiling that data in a rigorous way, we could create a valuable new tool to level the playing field and help provide community oversight of police behavior and activity.
In the almost 3 years since the first post, something amazing has happened.
The idea turned into something real. Something called The Police Data Accessibility Project. (https://www.pdap.io)
More than 2,000 people joined the initial community, and while those numbers dwindled after the initial excitement, a core group of highly committed and passionate folks remained. In these 3 years, this team has worked incredibly hard to lay the groundwork necessary to enable us to realistically accomplish the monumental data collection task ahead of us.
Let me tell you a bit about what the team has accomplished in these 3 years.
- Established the community and identified volunteer leaders who were willing and able to take on consistent responsibility.
- Secured Arnold + Porter as our pro-bono law firm to help us navigate the legal waters.
- With their help, established PDAP as a legal entity and applied for 501(c)(3) status.
- 501(c)(3) status granted.
- Carefully defined our goals and set a clear roadmap for the future.
- Hired our first full-time staff.
- Awarded a $250,000 grant by The Heinz Endowments.
So now, I'm asking for help, because scraping, cleaning, and validating data from 18,000 police departments is no easy task.
There are two ways to help. The first is to join us and work with the team. Perhaps you joined initially, realized we weren't organized yet, and left? Now is the time to come back. Or maybe you're just hearing about this now. Either way, the more people we have working on this, the faster we can get it done. People with scraping experience are especially needed. The second is to donate, or to help us spread the message. The more donations we receive, the more data we can gather. I want to thank the r/privacy community especially; it was here that things really began.
TL;DR: I accidentally started a movement from a blog post I wrote about policing the police with data. The movement turned into something real because of r/privacy and Hacker News: the Police Data Accessibility Project. 3 years later, the groundwork has been laid, a non-profit established, full-time staff hired, and $250,000 in grant money and donations raised so far!
Scrapers so far, on GitHub: https://github.com/Police-Data-Accessibility-Project/Scrapers
Discord, if you would like to join the effort: https://discord.com/invite/wMqex8nKZJ
*This is US-centric.
What a huge letdown.
Take your pathetic crypto grift elsewhere please.
«I had a meeting with Dave @ Future Foundation discord: https://discord.gg/vHqwbDkX.
I shared that what we need help with most right now is expertise to help us make quick decisions about which directions to go with crypto.
I listed the 3 main ways we've thought of for using web3 tools:
1. DAO for the PDAP scrapers → community ownership
2. PDAP NFT to incentivize code contributions → contributor recruitment
3. Use distributed ledger tech for data storage → transparency + traceability
He said he's going to put together a *brainstorm session just for PDAP* with some of the experts who have collected in that Discord! They will help us answer our foundational questions and move us along the path to making stuff.»
Do you mind giving us a brief overview of what kind of data you are collecting, and highlighting any interesting findings so far?
I do have a few questions too:
1. Will this scale? One problem with scrapers is that they break when people update their website. I'm imagining this problem multiplied by 18,000 and compounded by each scraper potentially being written by a different volunteer.
2. Where are the scrapers getting run?
3. How do the documents that the scrapers collect get transformed into usable data?
4. It seems to me like a scalable solution would be a standard to report data, a law to compel police departments to follow that standard, and then a system to collect that data and make it available. Do you work with police departments at all on data reporting?
You say in your FAQ "We aren't a watchdog—our activism is data collection and accessibility, not analysis or research."
Can you note any instances of other people using your data for analysis or research?
Here are the pitfalls I see you falling into:
(1) seriously, what data are you collecting? “Everything” isn’t a great answer (who’s supposed to use ‘everything’, anyway? “Anyone”?). “Apples-to-apples police misconduct statistics” is a good one.
(2) it’s important to clarify 1 because you need to know who you’re serving, and why. Different activists need different data. “Have all data” sounds good until you need to decide how to allocate your resources.
(3) more deeply, data is the land of edge cases. Even just with police misconduct, you need to get DEEP to rigorously compare seemingly-simple stats like “# of unjustified police killings”. If you don’t start narrow, you’ll never show value. If you don’t show value, nobody will ever care you exist.
When I look at the data you’ve collected, it ranges from annual reports, to municipal contact info, to crime stats. What’s important to collect at scale? To whom? What do they need it for?
Again - great, ambitious idea! But $250k goes fast. Show value before it runs out!
It was a lot of work to find data on policing nationwide, because the question really was "Is the sheriff doing a bad job, or do bad things happen sometimes?"
After some hard work trying to identify other cities with similar socioeconomic circumstances and populations, it became clear that our local sheriff was actually better than average, and that much of the outrage was fabricated.
That's also when I learned that many people don't want to listen to statistics unless they agree with their own preconceptions.
Maintaining scrapers for 18k county websites and PDs is no small task and looking through the docs for PDAP, it seems like this is still a very open question.
Maintaining thousands of scrapers for different formats seems like a nightmare, and it won't take long for departments to learn they can slightly tweak the format of their reporting to cause extra work for you.
On the plus side, working with all this data probably makes you all very qualified to advise on developing standards.
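The format-drift problem described above can at least be detected cheaply: validate every scraped row against the shape you expect before ingesting it, so a renamed or dropped field raises a flag instead of silently corrupting the dataset. A minimal sketch, with hypothetical column names (PDAP's actual schemas may differ):

```python
# Expected shape of one scraped record; names and types are hypothetical.
EXPECTED_COLUMNS = {"date": str, "incident_type": str, "location": str}

def validate_row(row: dict) -> list:
    """Return a list of problems; an empty list means the row matches."""
    problems = []
    for name, typ in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], typ):
            problems.append(f"bad type for {name}: {type(row[name]).__name__}")
    for name in row:
        if name not in EXPECTED_COLUMNS:
            problems.append(f"unexpected column: {name}")
    return problems

ok_row = {"date": "2021-04-01", "incident_type": "Traffic stop", "location": "Main St"}
drifted = {"date": "2021-04-01", "incident": "Traffic stop"}  # department renamed a field

print(validate_row(ok_row))   # []
print(validate_row(drifted))  # flags the missing and unexpected columns
```

A check like this won't stop a department from tweaking its reports, but it turns a silent breakage into a maintenance ticket, which is most of the battle when thousands of sources are involved.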
TL;DR: If you want to write scrapers, go for it! Run your scraper, share the results in Discord and with your friends, and talk about the process. We'll be listening, and it will help us build tools to support this important work.
A few things to clarify:
a. The source of truth for "what are we doing right now" and "how can I contribute" is https://docs.pdap.io/.
b. Empowering people who write scrapers is a part of our broad mission of "police data accessibility", but we have some foundational work to do first! Right now our primary project is creating a database of police agencies and data sources. This will help people understand what kinds of data are available, at which agencies, with which steps to access it. It will also help us create archives of the primary sources, so that if they get taken offline we can still go back and scrape them.
c. What we have realized in the past few years: there are already a ton of people writing and using web scrapers for their day to day work. They are as decentralized as our police system. Our scrapers repo will reflect that. We shouldn't all rely on one library, or even one language. The people who need the data are most motivated to maintain scrapers, and we expect that maintenance will be ad-hoc and as-needed for the immediate future. In most cases, data already published on the internet is useful to local users as-is.
d. If you have a question you'd like to answer about the police, here's the investigation process:
1. Determine whether public data exists to answer your question. Use Google to find the appropriate agency, and see what they're publishing.
2. Determine how it can be accessed: do you need to make a FOIA request? Is there a URL?
3. If there's a URL, determine whether you need to write a scraper to access the records. Often, the records can simply be downloaded.
4. Write and run a scraper, if you need one!
5. If there's no URL, make a records request for the public information. This is a long and complicated process.
6. Share the data with your friends.
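For step 4, a scraper can be as small as a few dozen lines of standard-library Python. A minimal sketch, assuming a hypothetical department page that publishes its log as a plain HTML table (real pages vary wildly, which is exactly why two scrapers for the same source can look so different):

```python
from html.parser import HTMLParser

class RecordTableParser(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows
        self._row = None     # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Hypothetical snippet standing in for a department's published incident log;
# in practice you would fetch the page and feed its body to the parser.
html = """
<table>
  <tr><td>2021-04-01</td><td>Traffic stop</td><td>Main St</td></tr>
  <tr><td>2021-04-02</td><td>Noise complaint</td><td>Oak Ave</td></tr>
</table>
"""
parser = RecordTableParser()
parser.feed(html)
print(parser.rows)
```

From here, the rows can be written to CSV and shared as-is. The point is that there's no required framework or library: anything that turns a published page into structured records counts.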
This means that scrapers are helpful and necessary some of the time; but not always, and not as the first step. We're trying to help with steps 1, 2, 3, 5, and 6. The theory is that writing scrapers is something people can easily slot in and help with; and that, depending on what question you're trying to answer, two scrapers for the same data source might look wildly different.
Scrapers are an important part of the ecosystem, but they're one piece of the puzzle.
I'm not here to defend the police or to denigrate the project, just playing devil's advocate. What happens if the police just ignore you?
I could imagine a revolving door between people working in the regulatory bodies and the industry they regulate.
Departments that are trying to be transparent (or who just don't want to deal with figuring it all out from scratch) may be happy to adopt something considered a "standard" for tracking and reporting data. In some cases it means it is a checkbox they can check without having to deal with annoying people and their annoying questions... but that hardly matters so long as the data is made available. It would also give companies developing software for police departments a target to aim for.