Utopia Talk / Politics / Hard Software Request: The Sequel
murder
Member | Tue Jun 28 05:21:11 "I've read communists manifesto, it's been a while (back in HS) and I still prefer Milton Friedman" -- Habebe Ah yes, the good old days of "Obama is a communist". Remember all those communist policies he enacted? Good times! :o) |
Habebe
Member | Tue Jun 28 05:28:33 Didn't he give us Kagan and Sotomayor? Nobody wants extremists on the SC. |
murder
Member | Tue Jun 28 05:38:22 Does anyone know why Rugian was advocating assassinating political leaders? ============================================== Rugian Member Sat Oct 04 18:50:28 I understand now why the occasional assassination of political leaders is a necessary part of democracy. http://utopiaforums.com/boardthread?id=politics&thread=4036 ============================================== I'm guessing something extreme like ... government spending? |
Habebe
Member | Tue Jun 28 05:40:49 Perhaps he meant like Allende. Best CIA assassination ever. |
murder
Member | Tue Jun 28 05:41:05 habebe Member Wed Oct 08 23:43:56 People called me crazy and radical to say Obama is a Marxist. Many didn't even think he was an extreme radical left winger. http://utopiaforums.com/boardthread?id=politics&thread=4327 HAHAHAHAHAHAHA!!!!! Blue Republicans accused of marxism. :o) |
Habebe
Member | Tue Jun 28 05:55:29 Good morning Murder. You're going to have some fun today. |
Nimatzo
iChihuaha | Tue Jun 28 06:09:26 So how do we go about searching? |
murder
Member | Tue Jun 28 06:10:08 I don't know. I was just looking at some threads linked by nhill. |
nhill
Member | Tue Jun 28 08:09:22 I let it run overnight, will check on it when I get to the computer. Is there anything in particular that you want searched? I’ll post the database so people can do their own searching today. I think the easiest way is to use SQL directly, as then you have full query power. |
nhill
Member | Tue Jun 28 08:10:18 The code only took about 15-20 minutes to write, but individually scraping a million posts without access to an API takes a while. |
Nimatzo
iChihuaha | Tue Jun 28 08:28:18 Postmodernism, Foucault, and Immanuel Kant are three things that come to mind :) |
nhill
Member | Tue Jun 28 08:56:16 http://utopiaforums.com/boardthread?id=politics&thread=78919 |
nhill
Member | Tue Jun 28 08:56:20 http://utopiaforums.com/boardthread?id=politics&thread=79020 |
nhill
Member | Tue Jun 28 08:56:40 http://utopiaforums.com/boardthread?id=politics&thread=90078 |
nhill
Member | Tue Jun 28 08:57:19 http://utopiaforums.com/boardthread?id=politics&thread=52208 |
nhill
Member | Tue Jun 28 08:57:47 http://utopiaforums.com/boardthread?id=politics&thread=54414 |
nhill
Member | Tue Jun 28 08:57:57 http://utopiaforums.com/boardthread?id=politics&thread=78911 |
nhill
Member | Tue Jun 28 08:58:24 http://utopiaforums.com/boardthread?id=politics&thread=26384 |
nhill
Member | Tue Jun 28 08:58:31 http://utopiaforums.com/boardthread?id=politics&thread=55289 |
nhill
Member | Tue Jun 28 08:59:35 some reading material :) i'm uploading the database as we speak you can use http://sqlitebrowser.org/dl/ to query once it's done. database is over 600MB, that's a lot of text ;) |
murder
Member | Tue Jun 28 09:00:46 "that's a lot of text" How much of it is CCs? ;o) |
murder
Member | Tue Jun 28 09:01:33 btw did this pick up the deleted posts and threads too? |
nhill
Member | Tue Jun 28 09:02:23 Yes it should |
nhill
Member | Tue Jun 28 09:03:12 >How much of it is CCs? ;o) We can answer that now with a bit of SQL query magic ;) |
murder
Member | Tue Jun 28 09:03:31 So you got all the spam posts? lol No wonder it took so long. |
nhill
Member | Tue Jun 28 09:07:39 It went through each thread from 1 to 90149 and saved the posts. But only UP, not UGT. |
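The thread does not show the scraper itself, so what follows is only a rough sketch of the loop described above: walk thread ids from 1 to 90149 and store whatever comes back. The URL scheme is taken from the links posted in this thread; the `raw_thread` table, the function names, and the pluggable `fetch` callable are all made up for illustration, and HTML parsing is left out entirely.

```python
import sqlite3
from typing import Callable, Iterable, Iterator, Tuple
from urllib.parse import urlencode

BASE_URL = "http://utopiaforums.com/boardthread"  # scheme seen in the links in this thread


def thread_url(thread_id: int, board: str = "politics", show_deleted: bool = False) -> str:
    """Build the URL of one thread; showdeleted is the flag mentioned later in the thread."""
    params = {"id": board, "thread": thread_id}
    if show_deleted:
        params["showdeleted"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


def crawl(first: int, last: int, fetch: Callable[[str], str]) -> Iterator[Tuple[int, str]]:
    """Yield (thread_id, raw_html) for every thread id from first to last, inclusive."""
    for thread_id in range(first, last + 1):
        yield thread_id, fetch(thread_url(thread_id))


def save_raw(conn: sqlite3.Connection, pages: Iterable[Tuple[int, str]]) -> None:
    """Store raw pages in a hypothetical raw_thread table so parsing can be redone later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_thread (thread_id INTEGER PRIMARY KEY, html TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO raw_thread VALUES (?, ?)", pages)
    conn.commit()
```

A real run would pass a `fetch` that does an HTTP GET (with a polite delay between requests) and call `crawl(1, 90149, fetch)`. Keeping the raw HTML around means a parsing bug can be fixed without scraping the site again.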
nhill
Member | Tue Jun 28 09:16:17 http://dri...cJ3ynNHV4KU8s/view?usp=sharing Here's the database. It's not very clean, but you should be able to use db browser and submit sql queries or use any other sql compatible tools. The easiest way to search, provided it's a simple text search, is to use the "filter" functionality in DB browser. Otherwise, if you're more familiar with SQL you can use FTS5 virtual tables. Now that we have the engine so to speak, we can do whatever we want with it once the data is cleaned. There's a lot of messed up formatting at the moment. Would be nice if TC would just share a JSON export then we don't have to scrape. Sigh. |
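For anyone curious what the FTS5 route mentioned above might look like, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the `post` table has `username` and `body` columns (the only two named in this thread), that the SQLite build behind Python has FTS5 compiled in, and it uses a tiny in-memory table instead of the real 600MB file. The `post_fts` name and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real database would be opened by file path
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?)", [
    ("nimatzo", "Foucault and post modernism again"),
    ("murder", "Government spending is not communism"),
    ("habebe", "I still prefer Milton Friedman"),
])

# Build a full-text index over the posts (needs an SQLite build with FTS5 enabled).
conn.execute("CREATE VIRTUAL TABLE post_fts USING fts5(username, body)")
conn.execute("INSERT INTO post_fts (username, body) SELECT username, body FROM post")


def search(term):
    """Return (username, body) rows matching the term, best match first."""
    rows = conn.execute(
        "SELECT username, body FROM post_fts WHERE post_fts MATCH ? ORDER BY rank",
        (term,),
    )
    return rows.fetchall()


print(search("foucault"))
```

Matching is case-insensitive with the default tokenizer, and on a table of this size an FTS index should be much faster than scanning every body with LIKE.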
nhill
Member | Tue Jun 28 09:26:04 Looks like there's about 2000 duplicates in there too. Not that it matters much, those are easy. The harder ones are the 1830 posts where the username & body failed to parse. So about 3830 rows of junk data out of 1.2 million on a first (very ugly) pass. Is acceptable error rate :P |
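For scale, 3830 junk rows out of roughly 1.2 million is about 0.3%. A cleanup pass along these lines could be sketched as below. Only `username` and `body` are column names confirmed in this thread; `thread_id`, `posted_at`, the sample rows, and the definition of "junk" as NULL or empty fields are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the scraped database file
# Hypothetical schema: thread_id and posted_at are not named anywhere in the thread.
conn.execute("CREATE TABLE post (thread_id INTEGER, username TEXT, posted_at TEXT, body TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", [
    (1, "murder", "t1", "Good times!"),
    (1, "murder", "t1", "Good times!"),   # exact duplicate of the row above
    (1, "habebe", "t2", "Good morning."),
    (2, None, "t3", None),                # username and body failed to parse
    (2, "", "t4", ""),                    # parsed, but as empty strings
])

# Assumed definition of a junk row: missing or empty username or body.
JUNK = "username IS NULL OR username = '' OR body IS NULL OR body = ''"


def count_junk():
    return conn.execute(f"SELECT count(*) FROM post WHERE {JUNK}").fetchone()[0]


def count_duplicates():
    # Count extra copies only: a group of n identical rows contributes n - 1.
    sql = f"""
        SELECT coalesce(sum(n - 1), 0) FROM (
            SELECT count(*) AS n FROM post
            WHERE NOT ({JUNK})
            GROUP BY thread_id, username, posted_at, body
        )"""
    return conn.execute(sql).fetchone()[0]


def clean():
    """Drop junk rows, then keep the first copy of each duplicated post."""
    conn.execute(f"DELETE FROM post WHERE {JUNK}")
    conn.execute("""
        DELETE FROM post WHERE rowid NOT IN (
            SELECT min(rowid) FROM post
            GROUP BY thread_id, username, posted_at, body
        )""")
    conn.commit()
```

Counting before deleting gives numbers that can be compared with the estimates above before anything is thrown away.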
nhill
Member | Tue Jun 28 09:33:40 CC has written approximately 800,000 words in UP. |
murder
Member | Tue Jun 28 09:35:28 That's a lot of words. :o) |
nhill
Member | Tue Jun 28 09:35:33 I have written approximately 482,000 words. CC wins this round. I musta copy pasted some long articles to pad that word count, because I keep things digestible unless talking on a technical topic. |
nhill
Member | Tue Jun 28 09:37:38 Murder, you have written almost 700,000 words. It's a crude estimate because I'm summing the number of characters and dividing by the average length of a word (4.7). And, also, I imagine you copy-pasted long articles at times. |
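The estimate described above (total characters divided by an average word length of 4.7) can be sketched like this, next to a direct whitespace-split count for comparison. The sample rows are invented, and the actual query that was run is not shown in the thread, so treat this as one plausible reading of it.

```python
import sqlite3

AVG_WORD_LEN = 4.7  # average word length quoted in the thread

conn = sqlite3.connect(":memory:")  # stand-in for the scraped database file
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?)", [
    ("murder", "That's a lot of words."),
    ("nhill", "CC wins this round."),
])


def estimated_words(username):
    """Crude estimate: total characters in a user's posts / average word length."""
    total_chars = conn.execute(
        "SELECT coalesce(sum(length(body)), 0) FROM post WHERE username = ?",
        (username,),
    ).fetchone()[0]
    return total_chars / AVG_WORD_LEN


def counted_words(username):
    """Slower but more direct: split every body on whitespace and count the pieces."""
    rows = conn.execute("SELECT body FROM post WHERE username = ?", (username,))
    return sum(len(body.split()) for (body,) in rows)
```

Since `length(body)` also counts spaces and punctuation, the character-based figure may run somewhat high; the split-based count is a useful sanity check on a sample of users.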
nhill
Member | Tue Jun 28 09:40:23 Nimatzo has ~31713 posts. Murder ~16453. CC ~6778. CC wins on amount of words per post by far. ;) |
nhill
Member | Tue Jun 28 09:41:09 Wow I have 9533 posts in UP. That's way more than I expected. Must have gone schizo for a while. I'm not even interested in politics. :P |
nhill
Member | Tue Jun 28 09:42:18 Habebe has 25577 posts. Looks like Nim may be the winner so far. I can write a quick script for a leaderboard, but what's the fun in that when we can stay in suspense ;) |
nhill
Member | Tue Jun 28 09:59:50 Murder, it wouldn't pull deleted posts because I didn't turn on &showdeleted=true It did pull deleted threads, though, as those appear to work in the original URL scheme |
murder
Member | Tue Jun 28 10:01:38 Yeah, my posts are usually tweet sized or less except for when I post articles. |
Nimatzo
iChihuaha | Tue Jun 28 10:04:01 Leaderboards! |
Nimatzo
iChihuaha | Tue Jun 28 10:07:22 Have to do the usual suspects! Sam Adams, Seb, Jergul, Paramount, The Children, Rugian |
nhill
Member | Tue Jun 28 10:10:27 SELECT user.*, count(post.username) FROM post LEFT JOIN user ON user.username=post.username GROUP BY post.username ORDER BY count(post.username) DESC :) |
nhill
Member | Tue Jun 28 10:10:55 UP Leaderboard
hot rod 90292
sam adams 49357
rugian 46635
crownroyal 44573
jergul 36927
paramount 36282
tumbleweed 33747
aeros 33650
nimatzo 31713
earthpig 26907
camaban 25911
habebe 25577
the children 25082
seb 24595
dakyron 21696
river of blood 21457
cthulhu 19258
mckobb 17278
forwyn 17024
williamthebastard 16480
murder 16453
cloud strife 16168
kargen 15987
pillz 14867
hood 14828
asgard 14642
roland 13859
obaminated 13754
fred felcher 11880
renzo marquez 11511
milton bradley 11132
garyd 10934
swordtail 9831
nhill 9533
tj 8888
mexicantornado 8866
yankeessuck123 8771
mrpresident07 8716
dukhat 8506
nekran 8400
neverwoods 8183
patom 7980
y2a 7946
miltonfriedman 7757
real fred 7672
wrath of orion 7605
ehcks 7571
licker 7516
billah 7441
werewolf dictator 6973
phunkyphishstyle 6931
kreel 6805
cherub cow 6778
clitoral hood 6619
master bates 6549
mavl 6282
canadian 5837
pissflaps mcgee 5513
adolf hitler 5469
saiko 5380
madc0w 5378
hellfire 5306
chen 5232
charper 4535
smart dude 4227
ork 4184
hip 4167
hrothgar 4057
chuck 3869
isaksson 3865
osamaisdaworstpresid 3802
oddfish 3771
average ameriacn 3702
iii 3690
so what 3573
im better then you 3488
hoer 3450
eikeys ghost 3276
daemon 3194
the guardian 3172
superdude 2990
honest politician 2968
delude 2821
dickhead uper 2774
kilo 2675
arab 2635
firestorm phoenix 2623
goreth 2567
valishin 2477
muslim 2435
snuke 2420
earthpig epchmuftfoi 2391
turtle crawler 2326
state department 2232
freddy 2199
ninja 2192
allahuakbar 2141
j.b. 2096
still well 2060
jesse malcolm barack 2050 |
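The leaderboard query posted above can also be run from a short script instead of DB Browser. This is a minimal sketch against a tiny in-memory database: the `post` and `user` table names and the `username` column come from the thread, while the sample rows, the `LIMIT`, and the tie-break on username are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real database would be opened by file path
conn.execute("CREATE TABLE user (username TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany("INSERT INTO user VALUES (?)", [("hot rod",), ("sam adams",), ("rugian",)])
conn.executemany("INSERT INTO post VALUES (?, ?)", [
    ("hot rod", "a"), ("hot rod", "b"), ("hot rod", "c"),
    ("sam adams", "d"), ("sam adams", "e"),
    ("rugian", "f"),
])


def leaderboard(limit=100):
    """Top posters by post count, same shape as the query in the thread plus a LIMIT."""
    sql = """
        SELECT user.username, count(post.username) AS posts
        FROM post
        LEFT JOIN user ON user.username = post.username
        GROUP BY post.username
        ORDER BY posts DESC, post.username ASC
        LIMIT ?"""
    return conn.execute(sql, (limit,)).fetchall()


for name, posts in leaderboard():
    print(name, posts)
```

Binding the limit as a parameter makes it easy to ask for a top 10 or top 500 without editing the SQL string.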
Paramount
Member | Tue Jun 28 12:07:21 I wonder how Still Well is doing. |
nhill
Member | Tue Jun 28 12:11:58 He's fine, posted in UGT recently. |
nhill
Member | Tue Jun 28 12:12:12 Last time he posted in UP was 2016, so maybe he got sick of you guys? |
Paramount
Member | Tue Jun 28 12:20:14 Maybe. Or maybe he is posting with another name in UP. |
nhill
Member | Tue Jun 28 12:21:00 *plot thickens again* |
nhill
Member | Tue Jun 28 12:22:56 Could do lexical analysis with AI and get a rough idea of who is a multi for whom. But not sure I want to spoil that. :) TC probably tracks IPs and the version of Apache he uses is old enough to be hacked easily. But I'll be nice. |
Paramount
Member | Tue Jun 28 12:28:11 Can you find which was my first Israel-thread? Where Israel is included in the title? |
nhill
Member | Tue Jun 28 12:28:45 Sure |
nhill
Member | Tue Jun 28 12:29:20 http://utopiaforums.com/boardthread?id=politics&thread=1132 |
and justice 4 all
Member | Tue Jun 28 12:29:23 and the first Israel-thread with my original name: and justice 4 all |
nhill
Member | Tue Jun 28 12:29:40 http://utopiaforums.com/boardthread?id=politics&thread=1452 |
nhill
Member | Tue Jun 28 12:30:14 and justice 4 all: http://utopiaforums.com/boardthread?id=politics&thread=5501 |
Paramount
Member | Tue Jun 28 12:30:14 I mean, the first Israel-thread that I created. |
nhill
Member | Tue Jun 28 12:30:25 oh ok paramount |
nhill
Member | Tue Jun 28 12:31:31 Paramount: Here's your first Israel thread: http://utopiaforums.com/boardthread?id=politics&thread=3044 |
nhill
Member | Tue Jun 28 12:32:15 and justice 4 all: http://utopiaforums.com/boardthread?id=politics&thread=8454 |
Paramount
Member | Tue Jun 28 12:43:13 Thanks :) It was fun looking through these old threads. |
nhill
Member | Tue Jun 28 12:45:32 http://utopiaforums.com/boardthread?id=politics&thread=3512 Here's another one of yours :) |
nhill
Member | Tue Jun 28 12:47:02 http://utopiaforums.com/boardthread?id=politics&thread=4051 another for paramount |
Paramount
Member | Tue Jun 28 12:49:18 It was so easy to create an account back then. I typed my name wrong (Paranount instead of Paramount) and then I had a new account. Lol |
Nimatzo
iChihuaha | Tue Jun 28 12:52:09 Even when paramount is sentimental, he is a nazi. lol, just kidding :P |
nhill
Member | Tue Jun 28 13:04:54 "I only need to search my pants to find the worlds biggest penis, and the ladies are waiting in line to suck on it." ^some vintage Paramount going on here ;) |
nhill
Member | Tue Jun 28 13:06:19 1,960 posts in UP refer to a penis. 6,223 refer to "dick", but that could be a name too. |
nhill
Member | Tue Jun 28 13:10:21 5,415 posts have the word "asshole" in them. Kinda expected that to be higher |
nhill
Member | Tue Jun 28 13:11:41 "nazi" is the winner so far-- 7,314 posts |
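Counts like the ones above can be reproduced with a substring search, and the "could be a name too" caveat is easy to see by comparing it with a whole-word match. A rough sketch, assuming a `post` table with a `body` column; the sample rows and both function names are invented, and the thread does not say which kind of match was actually used.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the scraped database file
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?)", [
    ("a", "Dick Cheney was vice president"),
    ("b", "Read some Dickens over the weekend"),
    ("c", "nazi comparisons again"),
    ("d", "He is a Nazi, just kidding"),
])


def count_posts_containing(term):
    """Case-insensitive substring match done in SQL (ASCII case folding only)."""
    sql = "SELECT count(*) FROM post WHERE instr(lower(body), lower(?)) > 0"
    return conn.execute(sql, (term,)).fetchone()[0]


def count_posts_with_word(term):
    """Stricter whole-word match done in Python with a regular expression."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    rows = conn.execute("SELECT body FROM post")
    return sum(1 for (body,) in rows if pattern.search(body))
```

The substring version also counts words that merely contain the term, so it gives an upper bound. Neither version can tell a name from an insult; that still needs a human reading the hits.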
nhill
Member | Tue Jun 28 13:16:36 "Paramount Member Tue Sep 02, 2008 11:48:02 Israel can lick my hairy balls." Paramount has the world's biggest penis with some hairy balls, apparently. |
Pillz
Member | Tue Jun 28 16:06:22 "you can use http://sqlitebrowser.org/dl/ to query once it's done. database is over 600MB, that's a lot of text ;)" Anything specific we need for the search like a dB name or just go for it? |
Nimatzo
iChihuaha | Tue Jun 28 16:43:40 Nice work Nhill. Now we can all grow old reliving the same fruitless discussions and debates :) |
Paramount
Member | Tue Jun 28 16:50:52 "I only need to search my pants to find the worlds biggest penis, and the ladies are waiting in line to suck on it." I don’t have any memory of ever writing this, but I can see in that old thread that I did write this. |
nhill
Member | Tue Jun 28 16:52:01 There is a "post" table, and a "user" table. Select which one you want to check, and then type some text into the filter function. You can also query with SQL, a query language that is easy to learn. For example: SELECT * FROM post WHERE username = 'pillz' That would return all of your posts. |
nhill
Member | Tue Jun 28 16:55:17 Anything you want to check, feel free to ask and I'll create a SQL script for you to execute. Like if you wanted to check a list of users aggregated by number of posts that are 100 characters or more (so eliminating short posts): SELECT user.*, count(post.username) FROM post LEFT JOIN user ON user.username=post.username WHERE length(body) >= 100 GROUP BY post.username ORDER BY count(post.username) DESC |
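A variation on the query above, sketched in Python: instead of filtering short posts out with WHERE, it counts all posts and the long ones side by side, so each user's share of long posts is visible. This is a different query from the one in the post, not a replacement for it; the sample rows and the `long_post_share` name are invented, and "long" here means 100 characters or more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the scraped database file
conn.execute("CREATE TABLE post (username TEXT, body TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?)", [
    ("cherub cow", "x" * 250),
    ("cherub cow", "y" * 120),
    ("murder", "lol"),
    ("murder", "z" * 100),
    ("nhill", "ok"),
])


def long_post_share(min_length=100):
    """Per user: total posts and how many are at least min_length characters long."""
    sql = """
        SELECT username,
               count(*) AS total,
               sum(length(body) >= ?) AS long_posts
        FROM post
        GROUP BY username
        ORDER BY long_posts DESC, username ASC"""
    return conn.execute(sql, (min_length,)).fetchall()


for username, total, long_posts in long_post_share():
    print(username, total, long_posts)
```

In SQLite a comparison evaluates to 1 or 0, so summing it counts the rows where it holds. Rows with a NULL body would need a `coalesce` around the comparison; the sample data here has none.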
nhill
Member | Tue Jun 28 16:56:50 >Nice work Nhill. Thanks! I'll make a visual tool that is easier to use eventually, but DB Browser (or any database viewer that supports sqlite) takes care of most of the basics. |
nhill
Member | Tue Jun 28 17:02:06 http://i.imgur.com/oN8ckhK.mp4 ^here's a tutorial |
nhill
Member | Tue Jun 28 17:04:12 >I don’t have any memory of ever writing this, but I can see in that old thread that I did write this. Is it true that you have the world's biggest penis? |