Utopia Talk / Politics / Hard Software Request: The Sequel
murder
Member
Tue Jun 28 05:21:11
"I've read communists manifesto, it's been a while (back in HS) and I still prefer Milton Friedman" -- Habebe

Ah yes, the good old days of "Obama is a communist".

Remember all those communist policies he enacted?

Good times! :o)
Habebe
Member
Tue Jun 28 05:28:33
Didn't he give us Kagan and Sotomayor?

Nobody wants extremists on the SC.
murder
Member
Tue Jun 28 05:38:22

Does anyone know why Rugian was advocating assassinating political leaders?


==============================================
Rugian Member Sat Oct 04 18:50:28

I understand now why the occasional assassination of political leaders is a necessary part of democracy.

http://utopiaforums.com/boardthread?id=politics&thread=4036

==============================================


I'm guessing something extreme like ... government spending?

Habebe
Member
Tue Jun 28 05:40:49
Perhaps he meant like Allende.

Best CIA assassination ever.
murder
Member
Tue Jun 28 05:41:05

habebe Member Wed Oct 08 23:43:56

People called me crazy and radical to say Obama is a Marxist. Many didn't even think he was an extreme radical left winger.

http://utopiaforums.com/boardthread?id=politics&thread=4327


HAHAHAHAHAHAHA!!!!! Blue Republicans accused of marxism. :o)
Habebe
Member
Tue Jun 28 05:55:29
Good morning Murder. You're going to have some fun today.
Nimatzo
iChihuaha
Tue Jun 28 06:09:26
So how do we go about searching?
murder
Member
Tue Jun 28 06:10:08

I don't know. I was just looking at some threads linked by nhill.

nhill
Member
Tue Jun 28 08:09:22
I let it run over night, will check on it when I get to the computer.

Is there anything in particular that you want searched? I’ll post the database so people can do their own searching today. I think the easiest way is to use SQL directly, as then you have full query power.
nhill
Member
Tue Jun 28 08:10:18
The code only took about 15-20 minutes to write, but individually scraping a million posts without access to an API takes a while.
Nimatzo
iChihuaha
Tue Jun 28 08:28:18
Post modernism
Foucault
Immanuel Kant

Are three things that come to mind :)
nhill
Member
Tue Jun 28 08:56:16
http://utopiaforums.com/boardthread?id=politics&thread=78919
nhill
Member
Tue Jun 28 08:56:20
http://utopiaforums.com/boardthread?id=politics&thread=79020
nhill
Member
Tue Jun 28 08:56:40
http://utopiaforums.com/boardthread?id=politics&thread=90078
nhill
Member
Tue Jun 28 08:57:19
http://utopiaforums.com/boardthread?id=politics&thread=52208
nhill
Member
Tue Jun 28 08:57:47
http://utopiaforums.com/boardthread?id=politics&thread=54414
nhill
Member
Tue Jun 28 08:57:57
http://utopiaforums.com/boardthread?id=politics&thread=78911
nhill
Member
Tue Jun 28 08:58:24
http://utopiaforums.com/boardthread?id=politics&thread=26384
nhill
Member
Tue Jun 28 08:58:31
http://utopiaforums.com/boardthread?id=politics&thread=55289
nhill
Member
Tue Jun 28 08:59:35
some reading material :)

i'm uploading the database as we speak

you can use http://sqlitebrowser.org/dl/ to query once it's done. database is over 600MB, that's a lot of text ;)
murder
Member
Tue Jun 28 09:00:46

"that's a lot of text"

How much of it is CCs? ;o)

murder
Member
Tue Jun 28 09:01:33

btw did this pick up the deleted posts and threads too?

nhill
Member
Tue Jun 28 09:02:23
Yes it should
nhill
Member
Tue Jun 28 09:03:12
>How much of it is CCs? ;o)

We can answer that now with a bit of SQL query magic ;)
murder
Member
Tue Jun 28 09:03:31

So you got all the spam posts? lol

No wonder it took so long.

nhill
Member
Tue Jun 28 09:07:39
It went through each thread from 1 to 90149 and saved the posts. But only UP, not UGT.
nhill
Member
Tue Jun 28 09:16:17
http://dri...cJ3ynNHV4KU8s/view?usp=sharing

Here's the database. It's not very clean, but you should be able to use db browser and submit sql queries or use any other sql compatible tools.

The easiest way to search, provided it's a simple text search, is to use the "filter" functionality in DB browser.

Otherwise, if you're more familiar with SQL you can use FTS5 virtual tables.

Now that we have the engine so to speak, we can do whatever we want with it once the data is cleaned. There's a lot of messed up formatting at the moment.

Would be nice if TC would just share a JSON export then we don't have to scrape. Sigh.
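A minimal sketch of the FTS5 route mentioned above, using Python's built-in sqlite3 module on a toy table. Only the "post" table and its "username" and "body" columns come from the thread; the index name post_fts and the sample rows are made up for illustration, and the code falls back to a plain LIKE search if the local SQLite build lacks FTS5.

```python
import sqlite3

# Toy stand-in for the scraped database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [
        ("nimatzo", "Foucault and post modernism again"),
        ("murder", "Remember all those communist policies?"),
        ("nimatzo", "Immanuel Kant would disagree"),
    ],
)


def build_index(conn):
    """Create and fill the full-text index; return False if FTS5 is missing."""
    try:
        # post_fts is a hypothetical name, not something from the real database.
        conn.execute("CREATE VIRTUAL TABLE post_fts USING fts5(username, body)")
    except sqlite3.OperationalError:
        return False
    conn.execute(
        "INSERT INTO post_fts (username, body) SELECT username, body FROM post"
    )
    return True


HAS_FTS5 = build_index(conn)


def search_posts(conn, term):
    """Return (username, body) rows whose body mentions `term`."""
    if HAS_FTS5:
        # Column filter: only look inside the body column.
        sql = ("SELECT username, body FROM post_fts "
               "WHERE post_fts MATCH ? ORDER BY rowid")
        arg = f'body:"{term}"'
    else:
        # Substring fallback, roughly what a simple text filter does.
        sql = "SELECT username, body FROM post WHERE body LIKE ? ORDER BY rowid"
        arg = f"%{term}%"
    return conn.execute(sql, (arg,)).fetchall()


for row in search_posts(conn, "kant"):
    print(row)
```

On a 600MB database the index takes a while to build once, but word searches against it should then be much faster than scanning every body with LIKE.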
nhill
Member
Tue Jun 28 09:26:04
Looks like there's about 2000 duplicates in there too. Not that it matters much, those are easy. The harder ones are the 1830 posts where the username & body failed to parse.

So about 3830 rows of junk data out of 1.2 million on a first (very ugly) pass. Is acceptable error rate :P
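A rough sketch of how those junk rows could be counted, again with Python's sqlite3 on a toy table. It assumes a duplicate means an identical (username, body) pair and that a failed parse shows up as a NULL or empty username or body; neither detail is confirmed in the thread. The last lines just redo the error-rate arithmetic from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [
        ("murder", "That's a lot of words. :o)"),
        ("murder", "That's a lot of words. :o)"),  # scraped twice
        ("nhill", "Yes it should"),
        (None, None),                              # failed to parse
        ("", ""),                                  # failed to parse
    ],
)

# Extra copies: every row beyond the first in a group of identical rows.
duplicates = conn.execute(
    """
    SELECT COALESCE(SUM(n - 1), 0) FROM (
        SELECT COUNT(*) AS n
        FROM post
        WHERE username <> '' AND body <> ''
        GROUP BY username, body
    )
    """
).fetchone()[0]

# Rows where the username or body did not come through.
unparsed = conn.execute(
    """
    SELECT COUNT(*) FROM post
    WHERE username IS NULL OR username = ''
       OR body IS NULL OR body = ''
    """
).fetchone()[0]

print(duplicates, unparsed)

# Error rate using the figures quoted in the thread.
junk_rate = (2000 + 1830) / 1_200_000
print(f"{junk_rate:.2%}")
```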
nhill
Member
Tue Jun 28 09:33:40
CC has written approximately 800,000 words in UP.
murder
Member
Tue Jun 28 09:35:28

That's a lot of words. :o)

nhill
Member
Tue Jun 28 09:35:33
I have written approximately 482,000 words. CC wins this round. I musta copy pasted some long articles to pad that word count, because I keep things digestible unless talking on a technical topic.
nhill
Member
Tue Jun 28 09:37:38
Murder, you have written almost 700,000 words. It's a crude estimate because I'm summing the number of characters and dividing by the average length of a word (4.7).

And, also, I imagine you copy-pasted long articles at times.
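The crude estimate described above could look something like this: sum the characters per user and divide by 4.7. The toy rows are invented; the table and column names are the ones used elsewhere in the thread. Since LENGTH counts spaces and punctuation too, the result is only a ballpark figure.

```python
import sqlite3

AVG_WORD_LENGTH = 4.7  # characters per word, the figure quoted in the thread

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [
        ("cherub cow", "a" * 47),
        ("cherub cow", "b" * 47),
        ("murder", "c" * 47),
    ],
)

# Total characters per user divided by the average word length.
estimates = dict(
    conn.execute(
        """
        SELECT username, SUM(LENGTH(body)) / ?
        FROM post
        GROUP BY username
        """,
        (AVG_WORD_LENGTH,),
    )
)

for username, words in sorted(estimates.items()):
    print(username, round(words))
```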
nhill
Member
Tue Jun 28 09:40:23
Nimatzo has ~31713 posts. Murder ~16453. CC ~6778. CC wins on amount of words per post by far. ;)
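To put a number on the words-per-post claim, here is the arithmetic using only the approximate figures quoted in this thread (the word counts are themselves crude estimates):

```python
# Approximate figures quoted in the thread: (estimated words, post count).
stats = {
    "CC": (800_000, 6778),
    "Murder": (700_000, 16453),
}

# Average estimated words per post for each poster.
words_per_post = {name: words / posts for name, (words, posts) in stats.items()}

for name, average in words_per_post.items():
    print(f"{name}: about {average:.0f} words per post")
```

That works out to roughly 118 words per post for CC against roughly 43 for Murder.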
nhill
Member
Tue Jun 28 09:41:09
Wow I have 9533 posts in UP. That's way more than I expected. Must have gone schizo for a while. I'm not even interested in politics. :P
nhill
Member
Tue Jun 28 09:42:18
Habebe has 25577 posts.

Looks like Nim may be the winner so far. I can write a quick script for a leaderboard, but what's the fun in that when we can stay in suspense ;)
nhill
Member
Tue Jun 28 09:59:50
Murder, it wouldn't pull deleted posts because I didn't turn on &showdeleted=true

It did pull deleted threads, though, as those appear to work in the original URL scheme
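For anyone curious, the URL side of the scrape could be sketched like this. The URL pattern, the thread range of 1 to 90149, and the showdeleted parameter are taken from this thread; the function names are made up, and fetching and parsing are left out because the page markup isn't described here.

```python
from urllib.parse import urlencode

BASE_URL = "http://utopiaforums.com/boardthread"
LAST_THREAD = 90149  # highest thread number mentioned in the thread


def thread_url(thread_id, board="politics", show_deleted=False):
    """Build the URL for one thread, optionally asking for deleted posts."""
    params = {"id": board, "thread": thread_id}
    if show_deleted:
        params["showdeleted"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


def all_thread_urls(first=1, last=LAST_THREAD, show_deleted=False):
    """Yield the URL of every thread number from first to last, inclusive."""
    for thread_id in range(first, last + 1):
        yield thread_url(thread_id, show_deleted=show_deleted)


print(thread_url(1132))
print(thread_url(1132, show_deleted=True))
```

A rerun that passes show_deleted=True for every thread would presumably pick up the deleted posts that the first pass missed.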
murder
Member
Tue Jun 28 10:01:38

Yeah, my posts are usually tweet sized or less except for when I post articles.

Nimatzo
iChihuaha
Tue Jun 28 10:04:01
Leaderboards!
Nimatzo
iChihuaha
Tue Jun 28 10:07:22
Have to do the usual suspects!

Sam adams
Seb
Jergul
Paramount
The Children
Rugian
nhill
Member
Tue Jun 28 10:10:27
SELECT user.*, count(post.username)
FROM post LEFT JOIN user ON user.username=post.username
GROUP BY post.username
ORDER BY count(post.username) DESC

:)
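Here is a runnable variant of the leaderboard query on toy data, using Python's sqlite3. It is a simplified sketch: it skips the join to the "user" table and just counts rows per username in "post", with a LIMIT for a top-N list. The sample rows and the leaderboard function name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [
        ("hot rod", "one"),
        ("hot rod", "two"),
        ("hot rod", "three"),
        ("sam adams", "four"),
        ("sam adams", "five"),
        ("rugian", "six"),
    ],
)


def leaderboard(conn, limit=10):
    """Return (username, post count) pairs, most prolific first."""
    return conn.execute(
        """
        SELECT username, COUNT(*) AS posts
        FROM post
        GROUP BY username
        ORDER BY posts DESC, username
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


for username, posts in leaderboard(conn, limit=2):
    print(username, posts)
```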
nhill
Member
Tue Jun 28 10:10:55
UP Leaderboard

hot rod 90292
sam adams 49357
rugian 46635
crownroyal 44573
jergul 36927
paramount 36282
tumbleweed 33747
aeros 33650
nimatzo 31713
earthpig 26907
camaban 25911
habebe 25577
the children 25082
seb 24595
dakyron 21696
river of blood 21457
cthulhu 19258
mckobb 17278
forwyn 17024
williamthebastard 16480
murder 16453
cloud strife 16168
kargen 15987
pillz 14867
hood 14828
asgard 14642
roland 13859
obaminated 13754
fred felcher 11880
renzo marquez 11511
milton bradley 11132
garyd 10934
swordtail 9831
nhill 9533
tj 8888
mexicantornado 8866
yankeessuck123 8771
mrpresident07 8716
dukhat 8506
nekran 8400
neverwoods 8183
patom 7980
y2a 7946
miltonfriedman 7757
real fred 7672
wrath of orion 7605
ehcks 7571
licker 7516
billah 7441
werewolf dictator 6973
phunkyphishstyle 6931
kreel 6805
cherub cow 6778
clitoral hood 6619
master bates 6549
mavl 6282
canadian 5837
pissflaps mcgee 5513
adolf hitler 5469
saiko 5380
madc0w 5378
hellfire 5306
chen 5232
charper 4535
smart dude 4227
ork 4184
hip 4167
hrothgar 4057
chuck 3869
isaksson 3865
osamaisdaworstpresid 3802
oddfish 3771
average ameriacn 3702
iii 3690
so what 3573
im better then you 3488
hoer 3450
eikeys ghost 3276
daemon 3194
the guardian 3172
superdude 2990
honest politician 2968
delude 2821
dickhead uper 2774
kilo 2675
arab 2635
firestorm phoenix 2623
goreth 2567
valishin 2477
muslim 2435
snuke 2420
earthpig epchmuftfoi 2391
turtle crawler 2326
state department 2232
freddy 2199
ninja 2192
allahuakbar 2141
j.b. 2096
still well 2060
jesse malcolm barack 2050
Paramount
Member
Tue Jun 28 12:07:21
I wonder how Still Well is doing.
nhill
Member
Tue Jun 28 12:11:58
He's fine, posted in UGT recently.
nhill
Member
Tue Jun 28 12:12:12
Last time he posted in UP was 2016, so maybe he got sick of you guys?
Paramount
Member
Tue Jun 28 12:20:14
Maybe. Or maybe he is posting with another name in UP.
nhill
Member
Tue Jun 28 12:21:00
*plot thickens again*
nhill
Member
Tue Jun 28 12:22:56
Could do lexical analysis with AI and get a rough idea of who is a multi for whom.

But not sure I want to spoil that. :)

TC probably tracks IPs and the version of Apache he uses is old enough to be hacked easily. But I'll be nice.
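As a very rough illustration of the lexical-analysis idea, the sketch below compares writing samples by character-trigram cosine similarity. This is a simple stand-in technique rather than AI, nobody in the thread ran it, and the sample texts are invented. A higher score only means two samples share more letter patterns; it proves nothing on its own.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping three-character sequences in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram profiles (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented samples standing in for three posters' collected text.
poster_a = trigram_profile("lol, just kidding :P lol lol")
poster_b = trigram_profile("lol, that was funny :P lol")
poster_c = trigram_profile("The committee reviewed fiscal policy thoroughly.")

print(cosine_similarity(poster_a, poster_b))
print(cosine_similarity(poster_a, poster_c))
```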
Paramount
Member
Tue Jun 28 12:28:11
Can you find which was my first Israel-thread? Where Israel is included in the title?
nhill
Member
Tue Jun 28 12:28:45
Sure
nhill
Member
Tue Jun 28 12:29:20
http://utopiaforums.com/boardthread?id=politics&thread=1132
and justice 4 all
Member
Tue Jun 28 12:29:23
and the first Israel-thread with my original name: and justice 4 all
nhill
Member
Tue Jun 28 12:29:40
http://utopiaforums.com/boardthread?id=politics&thread=1452
nhill
Member
Tue Jun 28 12:30:14
and justice 4 all:

http://utopiaforums.com/boardthread?id=politics&thread=5501
Paramount
Member
Tue Jun 28 12:30:14
I mean, the first Israel-thread that I created.
nhill
Member
Tue Jun 28 12:30:25
oh ok paramount
nhill
Member
Tue Jun 28 12:31:31
Paramount:

Here's your first Israel thread:
http://utopiaforums.com/boardthread?id=politics&thread=3044
nhill
Member
Tue Jun 28 12:32:15
and justice 4 all:

http://utopiaforums.com/boardthread?id=politics&thread=8454
Paramount
Member
Tue Jun 28 12:43:13
Thanks :)

It was fun looking through these old threads.
nhill
Member
Tue Jun 28 12:45:32
http://utopiaforums.com/boardthread?id=politics&thread=3512

Here's another one of yours :)
nhill
Member
Tue Jun 28 12:47:02
http://utopiaforums.com/boardthread?id=politics&thread=4051

another for paramount
Paramount
Member
Tue Jun 28 12:49:18
It was so easy to create an account back then. I typed my name wrong (Paranount instead of Paramount) and then I had a new account. Lol
Nimatzo
iChihuaha
Tue Jun 28 12:52:09
Even when paramount is sentimental, he is a nazi. lol, just kidding :P
nhill
Member
Tue Jun 28 13:04:54
"I only need to search my pants to find the worlds biggest penis, and the ladies are waiting in line to suck on it."

^some vintage Paramount going on here ;)
nhill
Member
Tue Jun 28 13:06:19
1,960 posts in UP refer to a penis.

6,223 refer to "dick", but that could be a name too.
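Counts like these can be reproduced with a LIKE filter, sketched below on invented rows with Python's sqlite3. LIKE is a substring match, so it also counts longer words that contain the term; the second function shows a whole-word count using a regular expression. Neither approach can tell a name from an insult. The function names are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [
        ("a", "Dick Cheney said so"),
        ("b", "Moby Dick is long"),
        ("c", "don't be a dick"),
        ("d", "Dickens wrote a lot"),
    ],
)


def count_substring(conn, term):
    """Posts whose body contains `term` anywhere, ignoring ASCII case."""
    return conn.execute(
        "SELECT COUNT(*) FROM post WHERE body LIKE '%' || ? || '%'",
        (term,),
    ).fetchone()[0]


def count_whole_word(conn, term):
    """Posts whose body contains `term` as a standalone word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    bodies = conn.execute("SELECT body FROM post WHERE body IS NOT NULL")
    return sum(1 for (body,) in bodies if pattern.search(body))


print(count_substring(conn, "dick"), count_whole_word(conn, "dick"))
```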
nhill
Member
Tue Jun 28 13:10:21
5,415 posts have the word "asshole" in them. Kinda expected that to be higher
nhill
Member
Tue Jun 28 13:11:41
"nazi" is the winner so far-- 7,314 posts
nhill
Member
Tue Jun 28 13:16:36
"Paramount
Member Tue Sep 02, 2008 11:48:02

Israel can lick my hairy balls."

Paramount has the world's biggest penis with some hairy balls, apparently.
Pillz
Member
Tue Jun 28 16:06:22
"you can use http://sqlitebrowser.org/dl/ to query once it's done. database is over 600MB, that's a lot of text ;)"

Anything specific we need for the search like a dB name or just go for it?
Nimatzo
iChihuaha
Tue Jun 28 16:43:40
Nice work Nhill.

Now we can all grow old reliving the same fruitless discussions and debates :)
Paramount
Member
Tue Jun 28 16:50:52
” "I only need to search my pants to find the worlds biggest penis, and the ladies are waiting in line to suck on it." ”


I don’t have any memory of ever writing this, but I can see in that old thread that I did write this.
nhill
Member
Tue Jun 28 16:52:01
There is a "post" table, and a "user" table. Select which one you want to check, and then type some text into the filter function.

You can also query with SQL, it's a query language, easy to learn, such as:

SELECT * FROM post WHERE username = 'pillz'

That would be all of your posts.
nhill
Member
Tue Jun 28 16:55:17
Anything you want to check, feel free to ask and I'll create a SQL script for you to execute.

Like if you wanted to check a list of users aggregated by number of posts that are 100 characters or more (so eliminating short posts):

SELECT user.*, count(post.username)
FROM post LEFT JOIN user ON user.username=post.username
WHERE length(body) >= 100
GROUP BY post.username
ORDER BY count(post.username) DESC
nhill
Member
Tue Jun 28 16:56:50
>Nice work Nhill.

Thanks! I'll make a visual tool that is easier to use eventually, but DB Browser (or any database viewer that supports sqlite) takes care of most of the basics.
nhill
Member
Tue Jun 28 17:02:06
http://i.imgur.com/oN8ckhK.mp4

^here's a tutorial
nhill
Member
Tue Jun 28 17:04:12
>I don’t have any memory of ever writing this, but I can see in that old thread that I did write this.

Is it true that you have the world's biggest penis?