<!doctype html><script src="eveal.js"></script>
by [traceypooh](https://twitter.com/tracey_pooh)
https://traceypooh.github.io/mozfest17 _?_ for key shortcuts
git clone https://github.com/traceypooh/mozfest17; open mozfest17/index.html
decentralized research and AI
built on top of
a library of stable, untampered worldwide TV recordings
- WayBack Machine
- past copies of 300B+ pages
- 15M books, lendable
- ~4M videos, ~4M audio & live concerts
- 3M images
- 200K software items & emulation (in JS!)
- Absolute browser Privacy
- no personal data or IP addresses extracted
- Validation & nontampering
- keep original versions with 2+ checksums and logs
<file name="commute.mp4" source="derivative">
<title>commute</title>
<format>h.264</format>
<original>commute.avi</original>
<mtime>1325973601</mtime>
<size>11919082</size>
<md5>ff17ed66e7db5693dd208dd6ac488ff8</md5>
<crc32>ad1df03a</crc32>
<sha1>e9f9de8379cd25653d487ab30d198fc61a050091</sha1>
<length>115.61</length>
<height>480</height>
<width>640</width>
</file>
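A minimal sketch of how a reader could re-check a downloaded file against the `<md5>`, `<crc32>` and `<sha1>` values in a metadata record like the one above. The function names `file_checksums` and `matches_metadata` are made up for this example and are not part of any archive.org tooling.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return md5, crc32 and sha1 hex digests for one file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {
        "md5": md5.hexdigest(),
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
        "sha1": sha1.hexdigest(),
    }


def matches_metadata(path, expected):
    """True when every checksum present in `expected` matches the file on disk.

    `expected` is a dict such as {"md5": "...", "sha1": "..."}; keys other
    than md5/crc32/sha1 are ignored.
    """
    actual = file_checksums(path)
    return all(
        actual[name] == expected[name].lower()
        for name in ("md5", "crc32", "sha1")
        if name in expected
    )
```

Checking two or more independent digests, as the slide suggests, means a collision in any single algorithm is not enough to hide tampering.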
- OpenTimestamps
- uses SHA-1 and Merkle trees
- by Peter Todd - blog
- brand new!
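To make the Merkle tree idea concrete, here is a generic sketch that folds a list of items into one root digest. It is not the OpenTimestamps file format or client code. The default hash is SHA-1 only because the slide names it, and duplicating the last node on odd levels is one common convention, assumed here for simplicity.

```python
import hashlib


def merkle_root(leaves, hash_name="sha1"):
    """Fold a list of byte strings into a single Merkle root (hex string)."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Level 0: hash every leaf on its own.
    level = [hashlib.new(hash_name, leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            # Odd count: pair the last node with itself (assumed convention).
            level.append(level[-1])
        # Each parent is the hash of its two children concatenated.
        level = [
            hashlib.new(hash_name, level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Changing any single leaf changes the root, so one small timestamped value can vouch for a large batch of files.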
- recording 50 - 100 channels
- 24 x 7
- around the world
- since 2000
- 2 million+ news shows
- search captions/metadata
- new Trump Administration and Congress subsets
- citable reference clips
- Popcorn editing/mashup clips
- for AI experiments
- text:
- chyron ("lower third") scanning OCR (Third Eye)
- caption alignment
- OCR captions from DVB-S
- BBC News
- speech to text (VoiceBase)
- Al Jazeera English
- Deutsche Welle English
- image:
- public officials facial detection
(Faceomatic <-- Matroid <-- FaceNet)
- audio:
- fingerprinting
- audfprint - free/open like shazam
- political Ad tracking
- Duplitron 5000
- twitter bots & TSV
- slack bot
- continuous captions feed from CSPAN
- OCR 'lower third'
- chyrons
- overlaid text on broadcasts
- not captions or descriptive text
- editorial / summarizing in nature
- 4 TV channels, 24x7, ~1 min from realtime
- CNN
- MSNBC
- Fox News
- BBC News
---
# bots
- twitter bots
- https://twitter.com/tvThirdEye
- https://twitter.com/tvThirdEyeB
- https://twitter.com/tvThirdEyeF
- https://twitter.com/tvThirdEyeM
- https://twitter.com/tvThirdEye/lists/all
- Tab Separated Values
- https://archive.org/services/third-eye.php
- nice for command-line
- import to google and excel spreadsheets
- filtered
- raw (~25MB / day)
- more errors
- 3rd-party filtering possible
- TSV files uploaded to https://archive.org/details/third-eye
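A small sketch of reading the TSV output from Python. The sample rows are made up, and the column order (time, channel, text) is an assumption for illustration, so check the real feed's layout before relying on it.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The (time, channel, text) column order is assumed
# for illustration and may not match the real feed.
SAMPLE_TSV = (
    "2017-10-27T14:00:05Z\tCNN\tEXAMPLE CHYRON ONE\n"
    "2017-10-27T14:00:09Z\tBBCNEWS\tEXAMPLE CHYRON TWO\n"
    "2017-10-27T14:01:02Z\tCNN\tEXAMPLE CHYRON THREE\n"
)


def read_chyrons(tsv_text):
    """Yield (time, channel, text) tuples, skipping short or malformed rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 3:
            yield row[0], row[1], row[2]


def count_by_channel(tsv_text):
    """Count how many chyron rows each channel contributed."""
    return Counter(channel for _, channel, _ in read_chyrons(tsv_text))


print(count_by_channel(SAMPLE_TSV).most_common())
```

`QUOTE_NONE` keeps stray quote characters from OCR output from being treated as CSV quoting.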
- tesseract OCR
- free; errors
- simhash
- groups 'nearly the same'
- character flips
- word off in time
- look for vowels
- pick 'most seen' group every minute
- and tweet
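The grouping step can be sketched as below: a simhash over character trigrams, Hamming distance to decide "nearly the same", and a pick of the largest group. This is an illustration, not the Third Eye code. The trigram features, the distance threshold of 12, and reading "look for vowels" as "drop OCR strings that contain no vowels" are all assumptions.

```python
import hashlib
from collections import Counter


def simhash(text, bits=64):
    """Simhash of a string, built from its character trigrams."""
    text = " ".join(text.upper().split())          # normalize case and spacing
    grams = [text[i:i + 3] for i in range(max(1, len(text) - 2))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


def has_vowels(text):
    """Crude OCR sanity check (assumed meaning of 'look for vowels')."""
    return any(ch in "AEIOU" for ch in text.upper())


def most_seen(chyrons, max_distance=12):
    """Group near-duplicate strings and return the top text of the largest group."""
    groups = []                                    # (representative hash, texts)
    for text in filter(has_vowels, chyrons):
        h = simhash(text)
        for rep, members in groups:
            if hamming(rep, h) <= max_distance:
                members.append(text)
                break
        else:
            groups.append((h, [text]))
    if not groups:
        return None
    biggest = max(groups, key=lambda g: len(g[1]))[1]
    return Counter(biggest).most_common(1)[0][0]
```

Run once per minute of OCR output, `most_seen` would return the single string to tweet for that minute.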
- Vox found that Fox News paid little attention to Puerto Rico
- audio fingerprints
- presented keynote paper on CSPAN floor speeches and vocal pitch (Bryce Dietrich, UIowa)
- discovered 375K political Ads
- find sound bites of speeches
- little JSON annotations
- associate metadata to program start/end time range
- auto expands each clip to a "synthetic" document
- to elastic search
- JSONPatch for changes
- track play counts, some referers
- allows for decentralized annotations to other IA / research
{
"268.1|269.1": {
"subject": [
"Criminal Activity",
"Crime"
],
"factcheck": [
"http://www.factcheck.org/2016/07/factchecking-trumps-big-speech/"
]
},
"266.7|267.2": {
"ad_id": "PolAd_DonaldTrump_d9dsn",
"type": "campaign",
"race": "PRES",
"cycle": "2016",
"message": "pro",
"sponsor": [
"Republican National Cmte"
],
"sponsor_type": "PAC",
"subject": [
"Job Accomplishments"
],
"person": [
"Donald Trump"
]
},
"268.1|269.1": {
"collection": [
"nancy_pelosi_archive"
],
"subject": [
"Voting"
]
}
}
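A sketch of how each `"start|end"` key could be expanded into its own "synthetic" document before indexing. The field names `program`, `start` and `end`, and the `EXAMPLE_PROGRAM` identifier, are hypothetical. The slides do not state the unit of the time range, so the numbers are passed through unchanged.

```python
import json

# One annotation taken from the sample above, trimmed for brevity.
ANNOTATIONS = json.loads("""
{
  "266.7|267.2": {
    "ad_id": "PolAd_DonaldTrump_d9dsn",
    "type": "campaign",
    "person": ["Donald Trump"]
  }
}
""")


def expand_annotations(annotations, program_id):
    """Turn {"start|end": fields} into a list of flat per-clip documents."""
    docs = []
    for key, fields in annotations.items():
        start, end = (float(part) for part in key.split("|"))
        doc = {"program": program_id, "start": start, "end": end}
        doc.update(fields)          # copy the annotation's own metadata
        docs.append(doc)
    return sorted(docs, key=lambda d: d["start"])


for doc in expand_annotations(ANNOTATIONS, "EXAMPLE_PROGRAM"):
    print(json.dumps(doc, sort_keys=True))
```

Note that a JSON object cannot usefully hold the same `"start|end"` key twice: most parsers, including Python's `json`, silently keep only the last one.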
- https://archive.org/details/TVNewsKitchen
- want to serve journalists, researchers, librarians & more
- responsible behavior and access to data
- non-consumptive use
An imposter does not have Imposter Syndrome
- Convolutional Neural Network
- filtered neural network
- each layer uses output from prior layer as input
- instead of rule-based learning, use classified datasets to learn
- multi-node connections (but not "fully connected")
- "data squashers"
- feed in image
- node looking for eyelash
- node looking for iris
- could feed to node looking for eye
- meanwhile... nose node
- all feed to face recognizer node
- could feed to "is this Barack Obama?"
Rik Heijdens from jwplayer
- Demuxed 2017 talk
- feed in video - for each shot, make 3 vectors:
- image Inception CNN (tensorflow)
- audio CNN spectrogram
- text transcripts/STT into Word2Vec
- concat vectors, compare (cosine similarity), and graph
- ... yields scene detection
- all just for ideal Ad insertion!
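The "concat vectors, compare" step can be sketched in plain Python as follows. This is not the jwplayer implementation: the toy vectors stand in for real CNN and Word2Vec outputs, and treating a similarity drop below a fixed threshold as a scene boundary is an assumption about how the comparison might be used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shot_vector(image_vec, audio_vec, text_vec):
    """Concatenate the three per-shot vectors into one."""
    return list(image_vec) + list(audio_vec) + list(text_vec)


def scene_boundaries(shots, threshold=0.5):
    """Indices i where shot i is dissimilar to shot i-1 (a new scene starts)."""
    return [
        i for i in range(1, len(shots))
        if cosine_similarity(shots[i - 1], shots[i]) < threshold
    ]


# Toy data: shots 0-1 look alike, shots 2-3 look alike.
shots = [
    shot_vector([1.0, 0.0], [0.0], [0.0]),
    shot_vector([1.0, 0.1], [0.0], [0.0]),
    shot_vector([0.0, 0.0], [1.0], [0.0]),
    shot_vector([0.0, 0.0], [1.0], [0.1]),
]
print(scene_boundaries(shots))
```

In practice the three component vectors would probably need scaling before concatenation so that no single modality dominates the cosine.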
- pixel diff algorithms (MAE, RMSE, MSE)
- perceptual hashing pHash.org
- image => 8x8 grayscale
- convolve to 8x8 image with DCT
- reduce to 64bit number
- hamming distance Int64 pairs
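The perceptual hashing steps above, sketched in plain Python for an image that has already been reduced to an 8x8 grayscale matrix. Thresholding each DCT coefficient against the median of the non-DC terms is an assumption. Real pHash implementations commonly run the DCT on a larger image and keep only the low-frequency block, so treat this as an illustration of the idea rather than the pHash.org algorithm.

```python
import math
import statistics


def dct_2d(pixels):
    """Naive, unnormalized 2-D DCT-II of a square matrix (fine for 8x8)."""
    n = len(pixels)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (pixels[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = total
    return out


def phash_8x8(pixels):
    """Reduce an 8x8 grayscale matrix to a 64-bit integer hash."""
    coeffs = [c for row in dct_2d(pixels) for c in row]
    median = statistics.median(coeffs[1:])       # ignore the DC term
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small values mean similar images."""
    return bin(a ^ b).count("1")
```

Two images are then compared with `hamming(phash_8x8(a), phash_8x8(b))`, which is far cheaper than a per-pixel MAE or MSE once the hashes are stored.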
<style> .hashes img { width:150px; } </style>
- https://www.tensorflow.org/tutorials/image_recognition
- trained CNNs, locally run
- GoogLeNet Inception general classifier
- retrainable / customizable
- redo 'top layer' (Rik idea)
- https://www.tensorflow.org/tutorials/image_retraining
- 2048 multi-byte vectors (floats)
- iOS smaller single-byte vectors
- cosine distance comparisons
- can just compare vectors (and ignore readable classification labels (Rik idea))
- implementation of FaceNet
- https://cmusatyalab.github.io/openface/demo-3-classifier
- similar to tensorflow (Torch..)
- 3+ images per person/face
- avoid 'overfit'
- align eyes + nose (nostrils?)
- Rik idea
- differentiate instead of classify
- learns similarity of 2 inputs
- face tracking only public figures
- https://www.itic.org/resources/AI-Policy-Principles-FullReport2.pdf
- min. government regulation & access
- public/private partner; diversity/inclusion++
- preserve human dignity, rights, freedoms
- min. risk to humans; human control
- large datasets -- avoid harmful bias
- open discussion
- Siamese network
- miniARchive
- tensorflow
- google translate
- extend/shape our APIs
- AI ideas
- research, visualizations
- tag clips with AI metadata or pointers to Decentralized metadata
- more!
decentralized research and AI
built on top of
a library of stable, untampered worldwide TV recordings